Simplifying the Modeling Process with sklearn's Pipeline

Date: 2019-11-27 11:48:31

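Many frameworks provide a Pipeline mechanism: a series of operations is wrapped into a single flow, and calling it runs those operations in the planned order. For example, netty has ChannelPipeline, and TensorFlow's computation graph follows the same idea.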
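As a rough illustration of the idea (a minimal sketch, not how netty, TensorFlow, or sklearn actually implement it), a pipeline can be thought of as an ordered list of named steps, each fed the output of the previous one. The class name `SimplePipeline` and the three string steps are hypothetical and chosen only for this example.

```python
from typing import Any, Callable, List, Tuple


class SimplePipeline:
    """Hypothetical toy pipeline: runs named steps in the order given."""

    def __init__(self, steps: List[Tuple[str, Callable[[Any], Any]]]):
        self.steps = steps

    def run(self, value: Any) -> Any:
        # Feed the output of each step into the next one.
        for _name, step in self.steps:
            value = step(value)
        return value


toy_pipeline = SimplePipeline(steps=[
    ("strip", str.strip),   # remove surrounding whitespace
    ("lower", str.lower),   # normalize case
    ("split", str.split),   # break into words
])

result = toy_pipeline.run("  Hello Pipeline  ")
print(result)  # ['hello', 'pipeline']
```

The caller only sees `run`; the order and content of the steps are fixed once, when the pipeline is defined.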

The following is a brief introduction to using Pipeline in sklearn:

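from sklearn.pipeline import Pipeline

from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Preprocessor for categorical features:
# fill missing values with the most frequent category, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Preprocessor for numerical features:
# fill missing values with a constant (0 by default for numeric data)
numerical_transformer = SimpleImputer(strategy='constant')

# Apply the numerical and categorical preprocessors to their respective columns
# (columns not listed here are dropped by default)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, ['Age']),
        ('cat', categorical_transformer, ['Embarked'])
    ])

# Define the Pipeline from the preprocessor and the chosen model
my_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestClassifier(n_estimators=100, random_state=0))
])

# Use the pipeline
# X (a DataFrame containing the 'Age' and 'Embarked' columns) and y (the labels)
# are assumed to be loaded already
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)
# Train. Copies are passed defensively so that preprocessing cannot alter the
# original data (the default SimpleImputer(copy=True) already avoids in-place changes)
my_pipeline.fit(X_train.copy(), y_train.copy())
preds = my_pipeline.predict(X_valid)  # Predict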
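The listing above assumes that X and y already exist. A self-contained sketch on made-up data might look like the following. The values in the toy DataFrame are invented purely for illustration and do not come from the original post; only the column names 'Age' and 'Embarked' are taken from it. The sketch also checks that fitting leaves the training frame unchanged, and uses `score`, which for a classifier pipeline reports accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: made-up values, including some missing entries.
X = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0, 27.0, np.nan, 14.0, 58.0],
    "Embarked": ["S", "C", "S", np.nan, "S", "Q", "S", "C", "C", "S"],
})
y = pd.Series([0, 1, 1, 1, 0, 0, 1, 1, 1, 0])

# Same structure as the pipeline in the post.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
numerical_transformer = SimpleImputer(strategy="constant")
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["Age"]),
    ("cat", categorical_transformer, ["Embarked"]),
])
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep a snapshot so we can verify that fit() does not modify X_train.
X_train_before = X_train.copy()

my_pipeline.fit(X_train, y_train)      # imputation, encoding and training in one call
preds = my_pipeline.predict(X_valid)   # the same preprocessing is applied automatically
score = my_pipeline.score(X_valid, y_valid)  # accuracy on the validation rows

print("predictions:", preds)
print("validation accuracy:", score)
print("training frame unchanged:", X_train.equals(X_train_before))
```

Because the imputers and the encoder are fitted only on the training rows inside `fit`, the validation rows are transformed with statistics learned from the training data, which is the main convenience the pipeline provides.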

Original article: https://www.cnblogs.com/lunge-blog/p/11940377.html
