R語言學習筆記(七)：機器與深度學習

Yanwei Liu

30 min readDec 2, 2019

本文參考至以下連結及書籍：

RPubs

Edit description

rpubs.com

HyRead ebook 高雄空中大學圖書館-R語言資料分析:從機器學習、資料探勘、文字探勘到巨量資料分析

內容簡介資訊爆量的時代，你需要學習R語言成為統計分析達人！ R是統計軟體也是一種程式設計語言，具有Windows、Unix、Linux及Apple MacOS ...

ouk.ebook.hyread.com.tw

套件：

20200210更新：21個機器學習套件

21 Best R Machine learning Packages | Dunebook

In this article, we have come with 21 interesting and top R machine learning packages for you. R is among the commonest…

www.dunebook.com

20191228更新：mlr3，此套件是目前唯一整合所有ML演算法的套件

mlr3 manual | mlr3 manual

As a 30-second introductory example, we will train a decision tree model on the first 120 rows of iris data set and…

mlr3book.mlr-org.com

決策樹用rpart, C50
SVM用e1071
ANN用neuralnet
隨機森林用randomForest(基本款)、ranger(快速版)
AdaBoost用adabga
K-Means用kmeans()函式
基因演算法用GA
Apriori演算法用arules
文字雲用worldcloud
GUI版本的工具用rattle

安裝Tensorflow和Keras

install.packages('devtools')

library(devtools)

devtools::install_github("rstudio/tensorflow") 

devtools::install_github("rstudio/keras")

匯入套件與測試

#測試Tensorflowlibrary(tensorflow)
sess = tf$Session()
hello <- tf$constant('Hello, TensorFlow!')
sess$run(hello)#測試Keras
library(keras)
model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>% 
  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")

summary(model)

迴歸分析

羅輯迴歸

require(datasets)  # source package
str(iris)          # check structure of iris
head(iris, n=6)
summary(iris)
#內建繪圖
plot(x=iris$Sepal.Length, y=iris$Sepal.Width,pch=2) 
#ggplot2
require(ggplot2)
  ggplot(data=iris) +
    geom_point(aes(x=Sepal.Length,
                   y=Sepal.Width)) +
    theme_bw()#上色
###內建繪圖
#  plot(x=iris$Petal.Length, y=iris$Petal.Width,pch=16)
  #  d1 <- iris[iris$Species=="versicolor", ]
  #  points(x=d1$Petal.Length, y=d1$Petal.Width,pch=16, col="green")
  #  d2 <- iris[iris$Species=="setosa", ]
  #  points(x=d2$Petal.Length, y=d2$Petal.Width,pch=16, col="red")
  #  d3 <- iris[iris$Species=="virginica", ]
  #  points(x=d3$Petal.Length, y=d3$Petal.Width,pch=16, col="blue")
  #  legend("topleft", pch=16
  #         legend=c("setosa","versicolor","virginica"), 
  #         col=c("red", "green", "blue")
  #        )
### ggplot2
require(ggplot2)
  ggplot(data=iris) +                        # 準備畫布
    geom_point(aes(x=Petal.Length,           # 散布圖
                   y=Petal.Width,
                   color=Species)) +         # 把不同品種的資料標上顏色
    
    theme_bw()                               # 改變主題背景成白色
#資料預處理
data <- data.frame(x=c(1,2,3,NA,5),
                   y=c(4,5,3,NA,NA))
is.na(data)
table(is.na(data)) #有多少遺漏值
na.omit(data) #移除遺漏值
data[is.na(data[,"y"]), "y"] <- mean(data[,"y"], na.rm=T) #用平均值填補遺漏值
table(is.na(iris)) #檢查還有沒有遺漏值#線性回歸lm()建立
model <- lm(formula= Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
            data=iris)
summary(model)
#預測
new.iris <- data.frame(Sepal.Width=3.456, Petal.Length=1.535, Petal.Width=0.341)
predict(model, new.iris)

關聯分析

資料載點load("titanic.raw.rdata")  #匯入.rdata檔
###匯入CSV
data <- read.csv("C:/data.csv")str(titanic.raw)require(arules) # apriori關聯式法則的套件支持度(min support)：「規則」在資料內具有普遍性，也就是這些 A 跟 B 同時出現的機率多少。信賴度(min confidence)：「規則」要有一定的信心水準，也就是當購買 A 狀態下，也會購買 B 的條件機率。rule <- apriori(titanic.raw, 
                parameter=list(minlen=3, supp=0.1, conf=0.7),  
                appearance = list(default="lhs",
                                  rhs=c("Survived=No", "Survived=Yes") 
                                  # 右手邊顯示的特徵
                                  )
                )
inspect(rule) #觀察rule

決策樹

KNN

require(rpart) #建立決策樹load("titanic.raw.rdata")  #匯入.rdata檔# 先把資料區分成 train=0.8, test=0.2 
set.seed(22)
train.index <- sample(x=1:nrow(titanic.raw), size=ceiling(0.8*nrow(titanic.raw) ))
train <- titanic.raw[train.index, ]
test <- titanic.raw[-train.index, ]

# CART的模型：把存活與否的變數(Survived)當作Y，剩下的變數當作X
cart.model<- rpart(Survived ~. , 
                   data=train)#決策樹視覺化
require(rpart.plot) 
prp(cart.model,         # 模型
    faclen=0,           # 呈現的變數不要縮寫
    fallen.leaves=TRUE, # 讓樹枝以垂直方式呈現
    shadow.col="gray",  # 最下面的節點塗上陰影
    # number of correct classifications / number of observations in that node
    extra=2)
#預測
pred <- predict(cart.model, newdata=test, type="class")

# 用table看預測的情況
table(real=test$Survived, predict=pred)# 計算預測準確率 = 對角線的數量/總數量
confus.matrix <- table(real=test$Survived, predict=pred)
sum(diag(confus.matrix))/sum(confus.matrix) # 對角線的數量/總數量

類神經網路

#安裝以下套件
require(neuralnet) # for neuralnet(), nn model
require(nnet)      # for class.ind()
require(caret)     # for train(), tune parametersdata <- iris

# 因為Species是類別型態，這邊轉換成三個output nodes，使用的是class.ind函式()
head(class.ind(data$Species))# 並和原始的資料合併在一起，cbind意即column-bind
data <- cbind(data, class.ind(data$Species))

# 原始資料就會變成像這樣
head(data)formula.bpn <- setosa + versicolor + virginica ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Widthbpn <- neuralnet(formula = formula.bpn, 
                  data = data,
                  hidden = c(2),       # 一個隱藏層：2個node
                  learningrate = 0.01, # learning rate
                  threshold = 0.01,    # partial derivatives of the error function, a stopping criteria
                  stepmax = 5e5        # 最大的ieration數 = 500000(5*10^5)

                  )# nrow()是用來擷取資料筆數，乘上0.8後，表示我們的train set裡面要有多少筆資料(data size)
smp.size <- floor(0.8*nrow(data)) 
# 因為是抽樣，有可能每次抽樣結果都不一樣，因此這裡規定好亂數表，讓每次抽樣的結果一樣
set.seed(131)                     
# 從原始資料裡面，抽出train set所需要的資料筆數(data size)
train.ind <- sample(seq_len(nrow(data)), smp.size)
# 分成train/test
train <- data[train.ind, ]
test <- data[-train.ind, ]
# bpn模型會長得像這樣
plot(bpn)#預測
pred <- compute(bpn, test[, 1:4])  

# 預測結果
pred$net.result

分群K-Means

1. 階層式分群(Hierarchical Clustering)data <- iris[, -5] # 因為Species是第五欄位，故移除掉E.dist <- dist(data, method="euclidean") # 歐式距離
M.dist <- dist(data, method="manhattan") # 曼哈頓距離par(mfrow=c(1,2)) # 讓圖片以1x2的方式呈現，詳情請見(4)繪圖-資料視覺化

# 使用歐式距離進行分群
h.E.cluster <- hclust(E.dist)
plot(h.E.cluster, xlab="歐式距離")

# 使用曼哈頓距離進行分群
h.M.cluster <- hclust(M.dist) 
plot(h.M.cluster, xlab="曼哈頓距離")#用歐式距離搭配華德法，來進行階層式分群：E.dist <- dist(data, method="euclidean")      # 歐式距離
h.cluster <- hclust(E.dist, method="ward.D2") # 華德法

# 視覺化
plot(h.cluster)
abline(h=9, col="red")# 減少階層結構cut.h.cluster <- cutree(h.cluster, k=3)  # 分成三群
cut.h.cluster                            # 分群結果table(cut.h.cluster, iris$Species)       # 分群結果和實際結果比較2. 切割式分群(Partitional Clustering)#K-Means# 分成三群
kmeans.cluster <- kmeans(data, centers=3) 

# 群內的變異數
kmeans.cluster$withinss# 分群結果和實際結果比較
table(kmeans.cluster$cluster, iris$Species)# 視覺化 k-means 分群結果(基於ggplot2的語法)
require(factoextra)
fviz_cluster(kmeans.cluster,           # 分群結果
             data = data,              # 資料
             geom = c("point","text"), # 點和標籤(point & label)
             frame.type = "norm")      # 框架型態#找出最佳的K值# Elbow Method 應用在 K-Means
fviz_nbclust(data, 
             FUNcluster = kmeans,# K-Means
             method = "wss",     # total within sum of square
             k.max = 12          # max number of clusters to consider
             ) +
    
labs(title="Elbow Method for K-Means") +
    
geom_vline(xintercept = 3,        # 在 X=3的地方 
           linetype = 2)          # 畫一條垂直虛線

支持向量機SVM、SVR

SVMrequire(e1071)require("mlbench")
data(Glass, package="mlbench")
data = Glasssmp.size = floor(0.8*nrow(data)) 
set.seed(516)                     
train.ind = sample(seq_len(nrow(data)), smp.size)
train = data[train.ind, ] # 80%
test = data[-train.ind, ] # 20%model = svm(formula = Type ~ .,  # 依變數(在這裡是Type)的資料形態要是Factor
            data = train)

# 可以看到SVM預設的參數設定
summary(model)# 預測
train.pred = predict(model, train)
test.pred = predict(model, test)

# 訓練資料的混淆矩陣
table(real=train$Type, predict=train.pred)# 訓練資料的分類準確率
confus.matrix = table(real=train$Type, predict=train.pred)
sum(diag(confus.matrix))/sum(confus.matrix)# 測試資料的混淆矩陣
table(real=test$Type, predict=test.pred)# 測試資料的分類準確率
confus.matrix = table(real=test$Type, predict=test.pred)
sum(diag(confus.matrix))/sum(confus.matrix)###調整參數
# data
require("mlbench")
data(Glass, package="mlbench")
data = Glass

# tune cost and gamma in SVM(soft-margin)
tune.model = tune(svm,
                  Type~.,
                  data=data,
                  kernel="radial", # RBF kernel function
                  range=list(cost=10^(-1:2), gamma=c(.5,1,2))# 調參數的最主要一行
)summary(tune.model)
plot(tune.model)
# Best model in set of tuning models
tune.model$best.model

Ensemble Learning(集成學習)

data(Prostate, package="lasso2")
str(Prostate)# 先把資料區分成 train=0.8, test=0.2 
set.seed(22)
train.index <- sample(x=1:nrow(Prostate), size=ceiling(0.8*nrow(Prostate) ))

train = Prostate[train.index, ]
test = Prostate[-train.index, ]建立3個子集# Subset-1
set.seed(1)
ind_1 = sample(1:nrow(train), 33)
subset_1 = train[ind_1, ]

# Subset-1
set.seed(2)
ind_2 = sample(1:nrow(train), 33)
subset_2 = train[ind_2, ]

# Subset-3
set.seed(3)
ind_3 = sample(1:nrow(train), 33)
subset_3 = train[ind_3, ]針對3個子集建立迴歸模型
# Model-1 : linear regression
model_1 = lm(lpsa~., subset_1)
y1 = predict(model_1, test)
  
# Model-2 : linear regression
model_2 = lm(lpsa~., subset_2)
y2 = predict(model_2, test)

# Model-3 : linear regression
model_3 = lm(lpsa~., subset_3)
y3 = predict(model_3, test)

# Average Prediction Results
ave_y = (y1+y2+y3)/3# MSE Comparision between three models and the bagging model
c(mean((y1 - test$lpsa)^2),   # linear regression of subset_1
  mean((y2 - test$lpsa)^2),   # linear regression of subset_2
  mean((y3 - test$lpsa)^2),   # linear regression of subset_3
  mean((ave_y - test$lpsa)^2))  # bagging model

隨機森林

data(Prostate, package="lasso2")
str(Prostate)# 先把資料區分成 train=0.8, test=0.2 
set.seed(22)
train.index <- sample(x=1:nrow(Prostate), size=ceiling(0.8*nrow(Prostate) ))

train = Prostate[train.index, ]
test = Prostate[-train.index, ]require(randomForest)

rf_model = randomForest(lpsa~.,
                        data=train,
                        ntree=150        # 決策樹的數量
                        )
# 預測
rf_y = predict(rf_model, test)
mean((rf_y - test$lpsa)^2) # MSE# Observe that what is the best number of trees
plot(rf_model)tuneRF(train[,-9], train[,9]) #找出抽N個變數時錯誤最低# 改寫隨機森林的模型
rf_model = randomForest(lpsa~.,
                        data = train,
                        ntree = 70,        # 決策樹的數量
                        mtry = 4
                        )
# 預測
rf_y = predict(rf_model, test)
mean((rf_y - test$lpsa)^2) # MSE#特徵重要性
rf_model$importance
varImpPlot(rf_model)

XGBoost

Bagging用多個強模型；Boosting用多個弱模型( 三個臭皮匠，勝過一個諸葛亮)

使用SOP：

將資料格式(Data.frame)，用xgb.DMatrix()轉換為 xgboost 的稀疏矩陣
設定xgb.params，也就是 xgboost 裡面的參數
使用xgb.cv()，tune 出最佳的決策樹數量(nrounds)
用xgb.train()建立模型

# 1. 將資料格式(Data.frame)，用`xgb.DMatrix()`轉換為 xgboost 的稀疏矩
require(xgboost)

dtrain = xgb.DMatrix(data = as.matrix(train[,1:8]),
                     label = train$lpsa)
dtest = xgb.DMatrix(data = as.matrix(test[,1:8]),
                     label = test$lpsa)
# 2. 設定xgb.params，也就是 xgboost 裡面的參數

xgb.params = list(
  #col的抽樣比例，越高表示每棵樹使用的col越多，會增加每棵小樹的複雜度
  colsample_bytree = 0.5,                    
  # row的抽樣比例，越高表示每棵樹使用的col越多，會增加每棵小樹的複雜度
  subsample = 0.5,                      
  booster = "gbtree",
  # 樹的最大深度，越高表示模型可以長得越深，模型複雜度越高
  max_depth = 2,           
  # boosting會增加被分錯的資料權重，而此參數是讓權重不會增加的那麼快，因此越大會讓模型愈保守
  eta = 0.03,
  # 或用'mae'也可以
  eval_metric = "rmse",                      
  objective = "reg:linear",
  # 越大，模型會越保守，相對的模型複雜度比較低
  gamma = 0)
# 3. 使用xgb.cv()，tune 出最佳的決策樹數量
cv.model = xgb.cv(
  params = xgb.params, 
  data = dtrain,
  nfold = 5,     # 5-fold cv
  nrounds=200,   # 測試1-100，各個樹總數下的模型
  # 如果當nrounds < 30 時，就已經有overfitting情況發生，那表示不用繼續tune下去了，可以提早停止                
  early_stopping_rounds = 30, 
  print_every_n = 20 # 每20個單位才顯示一次結果，
)
#繪圖
tmp = cv.model$evaluation_log

plot(x=1:nrow(tmp), y= tmp$train_rmse_mean, col='red', xlab="nround", ylab="rmse", main="Avg.Performance in CV") 
points(x=1:nrow(tmp), y= tmp$test_rmse_mean, col='blue') 
legend("topright", pch=1, col = c("red", "blue"), 
       legend = c("Train", "Validation") )
# 獲得 best nround
best.nrounds = cv.model$best_iteration 
best.nrounds
# 4. 用xgb.train()建立模型
xgb.model = xgb.train(paras = xgb.params, 
                      data = dtrain,
                      nrounds = best.nrounds) 

# 如果要畫出 xgb 內的所有決策樹，可以用以下函式(但因為會很多，這裡就不畫了)
# xgb.plot.tree(model = xgb.model) 

# 預測
xgb_y = predict(xgb.model, dtest)
mean((xgb_y - test$lpsa)^2) # MSE

深度學習DNN&CNN

#前處理# 42000個觀測值，每一筆代表一張手寫數字的圖片
# 784個自變數： 28 x 28 pixels，以灰階值0~255表示。
# 應變數label，代表這張圖片象徵的數字，
# 也是在測試資料(test.csv)中要預測的值。
train <- read.csv('data/train.csv')
dim(train)# 資料轉換成 28x28 的矩陣
obs.matrix <- matrix(unlist(train[1, -1]), # ignore 'label'
                     nrow = 28,            
                     byrow=T)
str(obs.matrix)# 用 image 畫圖，顏色指定為灰階值 0~255
image(obs.matrix, 
      col=grey.colors(255))# 由於原本的圖是倒的，因此寫一個翻轉的函式：
# rotates the matrix
rotate <- function(matrix){
  t(apply(matrix, 2, rev)) 
} 

# 畫出第1筆~第8筆資料的圖
par(mfrow=c(2,4))
for(x in 1:8){
  obs.matrix <- matrix(unlist(train[x, -1]), # ignore 'label'
                              nrow = 28,            
                              byrow=T)
         
  image(rotate(obs.matrix),
               col=grey.colors(255),
               xlab=paste("Label", train[x, 1], sep=": "),
               cex.lab = 1.5
       )
}# data preparation將pixels值從0~255轉換到介於0~1之間
train.x <- t(train[, -1]/255) # train: 28 x 28 pixels
train.y <- train[, 1]         # train: label
test.x <- t(test/255)         # test: 28 x 28 pixels
#安裝mxnet# download mxnet package from github 
install.packages("drat", repos="https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")
require(mxnet)#建立模型
# 輸入層
data <- mx.symbol.Variable("data")

# 第一隱藏層: 500節點，狀態是Full-Connected
fc1 <- mx.symbol.FullyConnected(data, name="1-fc", num_hidden=500)
# 第一隱藏層的激發函數: Relu
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
# 這裡引入dropout的概念
drop1 <- mx.symbol.Dropout(data=act1, p=0.5)

# 第二隱藏層: 400節點，狀態是Full-Connected
fc2 <- mx.symbol.FullyConnected(drop1, name="2-fc", num_hidden=400)
# 第二隱藏層的激發函數: Relu
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
# 這裡引入dropout的概念
drop2 <- mx.symbol.Dropout(data=act2, p=0.5)

# 輸出層：因為預測數字為0~9共十個，節點為10
output <- mx.symbol.FullyConnected(drop2, name="output", num_hidden=10)
# Transfer the Evidence to Probability by Softmax-function
dnn <- mx.symbol.SoftmaxOutput(output, name="dnn")# 神經網路中的各個參數的資訊
arguments(dnn)

# 視覺化DNN結構
graph.viz(dnn$as.json(),
          graph.title= "DNN for Kaggle－Digit Recognizer"
          )#訓練模型
mx.set.seed(0) 

# 訓練剛剛創造/設計的模型
dnn.model <- mx.model.FeedForward.create(
      dnn,       # 剛剛設計的DNN模型
      X=train.x,  # train.x
      y=train.y,  #  train.y
      ctx=mx.cpu(),  # 可以決定使用cpu或gpu
      num.round=10,  # iteration round
      array.batch.size=100, # batch size
      learning.rate=0.07,   # learn rate
      momentum=0.9,         # momentum  
      eval.metric=mx.metric.accuracy, # 評估預測結果的基準函式*
      initializer=mx.init.uniform(0.07), # 初始化參數
      epoch.end.callback=mx.callback.log.train.metric(100)
)# define my own evaluate function (R-squared)
my.eval.metric <- mx.metric.custom(
                        name = "R-squared", 
                        function(real, pred) {
                            mean_of_obs <- mean(real)
                        
                            SS_tot <- sum((real - mean_of_obs)^2)
                            SS_reg <- sum((predict - mean_of_obs)^2)
                            SS_res <- sum((real - predict)^2)
                            
                            R_squared <- 1 - (SS_res/SS_tot)
                            R_squared
                        }
                  )#預測# test prediction 
test.y <- predict(dnn.model, test.x)
test.y <- t(test.y)
test.y.label <- max.col(test.y) - 1

# Submission format for Kaggle
result <- data.frame(ImageId = 1:length(test.y.label),
                     label = test.y.label)#CNN#建立模型# 輸入層
data <- mx.symbol.Variable('data')

# 第一卷積層，windows的視窗大小是 5x5, filter=20
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20, name="1-conv")
# 第一卷積層的激發函數：tanh
conv.act1 <- mx.symbol.Activation(data=conv1, act_type="tanh", name="1-conv.act")
# 第一卷積層後的池化層，max，大小縮為 2x2
pool1 <- mx.symbol.Pooling(data=conv.act1, pool_type="max", name="1-conv.pool",
                           kernel=c(2,2), stride=c(2,2))

# 第二卷積層，windows的視窗大小是 5x5, filter=50
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50, name="2-conv")
# 第二卷積層的激發函數：tanh
conv.act2 <- mx.symbol.Activation(data=conv2, act_type="tanh", name="2-conv.act")
# 第二卷積層後的池化層，max，大小縮為 2x2
pool2 <- mx.symbol.Pooling(data=conv.act2, pool_type="max", name="2-conv.pool",
                           kernel=c(2,2), stride=c(2,2))

# Flatten
flatten <- mx.symbol.Flatten(data=pool2)
# 建立一個Full-Connected的隱藏層，500節點
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500, name="1-fc")
# 隱藏層的激發函式：tanh
fc.act1 <- mx.symbol.Activation(data=fc1, act_type="tanh", name="1-fc.act")

# 輸出層：因為預測數字為0~9共十個，節點為10
output <- mx.symbol.FullyConnected(data=fc.act1, num_hidden=10, name="output")
# Transfer the Evidence to Probability by Softmax-function
cnn <- mx.symbol.SoftmaxOutput(data=output, name="cnn")# 神經網路中的各個參數的資訊
arguments(cnn)

# 視覺化CNN結構
graph.viz(cnn$as.json(),
          graph.title= "CNN for Kaggle－Digit Recognizer"
          )#訓練模型
# Transform matrix format for LeNet 
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
test.array <- test.x
dim(test.array) <- c(28, 28, 1, ncol(test.x))mx.set.seed(0)

# 訓練剛剛設計的CNN模型
cnn.model <- mx.model.FeedForward.create(
      cnn,       # 剛剛設計的CNN模型
      X=train.array,  # train.x
      y=train.y,  #  train.y
      ctx=mx.cpu(),  # 可以決定使用cpu或gpu
      num.round=10,  # iteration round
      array.batch.size=100, # batch size
      learning.rate=0.07,   # learn rate
      momentum=0.7,         # momentum  
      eval.metric=mx.metric.accuracy, # 評估預測結果的基準函式*
      initializer=mx.init.uniform(0.05), # 初始化參數
      epoch.end.callback=mx.callback.log.train.metric(100)
)#預測
# test prediction 
test.y <- predict(cnn.model, test.array)
test.y <- t(test.y)
test.y.label <- max.col(test.y) - 1

# Submission format for Kaggle
result <- data.frame(ImageId = 1:length(test.y.label),
                     label = test.y.label)

R語言學習筆記(七)：機器與深度學習

RPubs

Edit description

HyRead ebook 高雄空中大學圖書館-R語言資料分析:從機器學習、資料探勘、文字探勘到巨量資料分析

內容簡介資訊爆量的時代，你需要學習R語言成為統計分析達人！ R是統計軟體也是一種程式設計語言，具有Windows、Unix、Linux及Apple MacOS ...

套件：

21 Best R Machine learning Packages | Dunebook

In this article, we have come with 21 interesting and top R machine learning packages for you. R is among the commonest…

mlr3 manual | mlr3 manual

As a 30-second introductory example, we will train a decision tree model on the first 120 rows of iris data set and…

安裝Tensorflow和Keras

匯入套件與測試

迴歸分析

羅輯迴歸

關聯分析

決策樹

KNN

類神經網路

分群K-Means

支持向量機SVM、SVR

Ensemble Learning(集成學習)

隨機森林

XGBoost

使用SOP：

深度學習DNN&CNN

自然語言處理

Written by Yanwei Liu

No responses yet

R語言學習筆記(七)：機器與深度學習

RPubs

Edit description

HyRead ebook 高雄空中大學圖書館-R語言資料分析:從機器學習、資料探勘、文字探勘到巨量資料分析

內容簡介 資訊爆量的時代，你需要學習R語言成為統計分析達人！ R是統計軟體也是一種程式設計語言，具有Windows、Unix、Linux及Apple MacOS ...

套件：

21 Best R Machine learning Packages | Dunebook

In this article, we have come with 21 interesting and top R machine learning packages for you. R is among the commonest…

mlr3 manual | mlr3 manual

As a 30-second introductory example, we will train a decision tree model on the first 120 rows of iris data set and…

安裝Tensorflow和Keras

匯入套件與測試

關聯分析

支持向量機SVM、SVR

Ensemble Learning(集成學習)

XGBoost

使用SOP：

深度學習DNN&CNN

Written by Yanwei Liu

No responses yet

內容簡介資訊爆量的時代，你需要學習R語言成為統計分析達人！ R是統計軟體也是一種程式設計語言，具有Windows、Unix、Linux及Apple MacOS ...