
Data preprocessing in R: standardization and splitting into training and test sets

Posted: 2020-03-10 21:58:28

# load the dataset into data frame
# (spaces in the path need no backslash inside an R string; "\ " is an invalid escape)
credit.df <- read.csv("/Users/Mac/Desktop/Code/Chapter 6 code files/credit_dataset_final.csv", header = TRUE, sep = ",")

Read the data.
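After loading, it can help to look at the column types before transforming anything. The sketch below substitutes a tiny inline CSV for the real file so that it runs on its own; the three rows are made-up values, and only the column names are borrowed from the variables used later in this post.

```r
# Hypothetical three-row sample standing in for credit_dataset_final.csv
csv.text <- "credit.rating,account.balance,credit.duration.months,age,credit.amount
1,1,18,21,1049
1,2,9,36,2799
0,1,12,23,841"

sample.df <- read.csv(text = csv.text, header = TRUE, sep = ",")

# Show the structure: numerically coded categories are read in as integers,
# which is why they are converted to factors in a later step
str(sample.df)
column.classes <- sapply(sample.df, class)
print(column.classes)
```

Categorical variables stored as numeric codes will show up as integer here, not as factors.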

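## data type transformations - factoring
# convert each named column of df to a factor
to.factors <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- as.factor(df[[variable]])
  }
  return(df)
}

## normalizing - scaling
# standardize each named column to mean 0 and standard deviation 1;
# as.numeric() drops the one-column matrix that scale() returns
scale.features <- function(df, variables){
  for (variable in variables){
    df[[variable]] <- as.numeric(scale(df[[variable]], center=TRUE, scale=TRUE))
  }
  return(df)
}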

Define helper functions that convert variables to factors and standardize numeric variables.
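To make the two transformations concrete, here is a minimal sketch on made-up vectors (the values are illustrative, not from the credit dataset). It checks that scale() with centering and scaling matches the usual z-score formula, and shows what as.factor() does to numeric codes.

```r
# Standardization: subtract the mean, divide by the sample standard deviation
x <- c(10, 20, 30, 40, 50)
z.manual <- (x - mean(x)) / sd(x)
z.scale <- as.numeric(scale(x, center = TRUE, scale = TRUE))

print(all.equal(z.manual, z.scale))  # the two computations agree
print(mean(z.scale))                 # approximately 0
print(sd(z.scale))                   # approximately 1

# Factoring: numeric codes become category labels
codes <- c(1, 2, 1, 3)
f <- as.factor(codes)
print(levels(f))
print(is.numeric(f))
```

Once a column is a factor, modelling functions treat its values as unordered categories instead of quantities.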

 

# normalize variables
numeric.vars <- c("credit.duration.months", "age", "credit.amount")
credit.df <- scale.features(credit.df, numeric.vars)
# factor variables
categorical.vars <- c("credit.rating", "account.balance", "previous.credit.payment.status",
                      "credit.purpose", "savings", "employment.duration", "installment.rate",
                      "marital.status", "guarantor", "residence.duration", "current.assets",
                      "other.credits", "apartment.type", "bank.credits", "occupation",
                      "dependents", "telephone", "foreign.worker")
credit.df <- to.factors(df=credit.df, variables=categorical.vars)

Standardize the numeric variables and convert the categorical variables to factors.
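The same two steps can also be written without explicit loops by assigning lapply() results back to a column subset. This is an alternative sketch, not the approach used in this post; toy.df and its values are made up for illustration.

```r
# Hypothetical toy data frame with two categorical and two numeric columns
toy.df <- data.frame(
  credit.rating = c(1, 0, 1, 1),
  savings       = c(1, 3, 2, 1),
  age           = c(21, 36, 23, 39),
  credit.amount = c(1049, 2799, 841, 2122)
)
numeric.cols <- c("age", "credit.amount")
factor.cols  <- c("credit.rating", "savings")

# Standardize the numeric columns (as.numeric drops scale()'s matrix wrapper)
toy.df[numeric.cols] <- lapply(toy.df[numeric.cols],
                               function(x) as.numeric(scale(x)))
# Convert the categorical columns to factors
toy.df[factor.cols] <- lapply(toy.df[factor.cols], as.factor)

# Check the result: column classes, then means and standard deviations
print(sapply(toy.df, class))
print(round(colMeans(toy.df[numeric.cols]), 10))
print(sapply(toy.df[numeric.cols], sd))
```

Both forms give the same result; the loop version may be easier to read, while the lapply() version is more compact.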

# split data into training and test datasets in 60:40 ratio
indexes <- sample(1:nrow(credit.df), size=0.6*nrow(credit.df))
train.data <- credit.df[indexes,]
test.data <- credit.df[-indexes,]

Split the data 60:40 into a training set and a test set.
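Two refinements are worth considering here, sketched below on made-up data. Calling set.seed() before sample() makes the split repeatable, and computing the scaling mean and standard deviation from the training rows only keeps test-set information out of the preprocessing. Note that this differs from the code above, which standardizes the whole data frame before splitting; the seed value, demo.df, and its contents are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only so the split is repeatable

# Hypothetical ten-row data frame
demo.df <- data.frame(
  age = c(21, 36, 23, 39, 38, 48, 40, 65, 24, 31),
  credit.rating = factor(c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0))
)

# 60:40 split on row indexes
n <- nrow(demo.df)
train.idx <- sample(seq_len(n), size = round(0.6 * n))
demo.train <- demo.df[train.idx, ]
demo.test  <- demo.df[-train.idx, ]

# Scaling parameters come from the training rows only...
age.mean <- mean(demo.train$age)
age.sd   <- sd(demo.train$age)

# ...and are then applied to both sets
demo.train$age <- (demo.train$age - age.mean) / age.sd
demo.test$age  <- (demo.test$age - age.mean) / age.sd

print(nrow(demo.train))
print(nrow(demo.test))
```

With this approach the test-set column will generally not have mean exactly 0 or standard deviation exactly 1, which is expected.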


Source: https://www.cnblogs.com/ahualualua/p/12458782.html
