Filling Gaps in Machine Learning Knowledge (Random Forests and ExtraTrees)

Date: 2018-04-01 15:55:29

Random Forests

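A random forest trains many decision trees, each on a random draw of the training samples and of the features. Combining the trees guards against overfitting and improves generalization.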

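Typical characteristics of a random forest:

1. Sampling with replacement (so the dataset used to grow each tree contains duplicate rows).

2. Each node is split at the optimal split point.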
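To make "optimal split" concrete, here is a minimal sketch of an exhaustive split search. It assumes Gini impurity as the criterion and midpoints between consecutive distinct values as candidate thresholds. Both are common choices but are assumptions of this example, and the helper names (`gini`, `weighted_gini`, `best_split`) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(X, y, feature, threshold):
    """Size-weighted Gini impurity of the two children of a split."""
    left = [label for row, label in zip(X, y) if row[feature] <= threshold]
    right = [label for row, label in zip(X, y) if row[feature] > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(X, y):
    """Score every candidate threshold of every feature; return the best."""
    best = None  # (impurity, feature index, threshold)
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            score = weighted_gini(X, y, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
impurity, feature, threshold = best_split(X, y)
print(impurity, feature, threshold)  # 0.0 0 2.5
```

On this toy data, feature 0 separates the two classes perfectly at 2.5, so the search returns an impurity of 0.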

Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n′, by sampling from D uniformly and with replacement. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
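The bagging procedure quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not a real learner: the "model" fitted on each bootstrap sample is simply the sample mean, n′ is set equal to n, and the function names are made up for this example.

```python
import random
from collections import Counter


def bootstrap_sample(data, size, rng):
    """Draw `size` items from `data` uniformly, with replacement."""
    return [rng.choice(data) for _ in range(size)]


def fit_mean(sample):
    """Toy base model (assumption for illustration): predict the sample mean."""
    return sum(sample) / len(sample)


def bagging_regression(data, m, rng):
    """Fit m toy models on m bootstrap samples and average their outputs."""
    models = [fit_mean(bootstrap_sample(data, len(data), rng)) for _ in range(m)]
    return sum(models) / m


def majority_vote(predictions):
    """Combine classifier outputs by voting."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
D = [2.0, 4.0, 6.0, 8.0, 10.0]

sample = bootstrap_sample(D, len(D), rng)        # may contain repeated values
estimate = bagging_regression(D, m=200, rng=rng)  # average of 200 toy models
vote = majority_vote(["cat", "dog", "cat"])       # classification case

print(sample, estimate, vote)
```

Averaging is used for regression and voting for classification, matching the two combination rules in the quoted passage.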

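The ExtraTrees algorithm adds one more layer of randomness. When looking for a split on a continuous feature, it does not evaluate every candidate split value in order to choose the splitting feature.

Instead, for each feature it draws one random split value within that feature's range, and then evaluates which feature gives the best split.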
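The random-threshold idea can be sketched as follows. The example assumes Gini impurity as the scoring criterion and a uniform draw between the feature's minimum and maximum. It scans every feature, following the description above, and the function names are hypothetical rather than taken from any library.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(X, y, feature, threshold):
    """Size-weighted Gini impurity of the two children of a split."""
    left = [label for row, label in zip(X, y) if row[feature] <= threshold]
    right = [label for row, label in zip(X, y) if row[feature] > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def random_split(X, y, rng):
    """Draw ONE random threshold per feature, then keep the best feature."""
    best = None  # (impurity, feature index, threshold)
    for feature in range(len(X[0])):
        column = [row[feature] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature: nothing to split on
        threshold = rng.uniform(lo, hi)  # no search over candidate values
        score = weighted_gini(X, y, feature, threshold)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
rng = random.Random(7)
score, feature, threshold = random_split(X, y, rng)
print(score, feature, threshold)
```

Only one threshold per feature is scored, so each node costs far less than an exhaustive search, at the price of a split that is usually not the optimal one.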

1. Empirically good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks (where n_features is the number of features in the data).

2. In addition, note that in random forests, bootstrap samples are used by default (bootstrap=True), while the default strategy for extra-trees is to use the whole dataset (bootstrap=False).
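The two notes above can be illustrated without any library. The sketch below is plain Python, not scikit-learn code: `n_candidate_features` and `training_rows` are hypothetical helpers, and rounding sqrt(n_features) down to an integer is an assumption of this example.

```python
import math
import random


def n_candidate_features(n_features, task):
    """Features examined per split under the quoted default values."""
    if task == "regression":
        return n_features
    if task == "classification":
        # Rounding down (and using at least 1) is an assumption of this sketch.
        return max(1, int(math.sqrt(n_features)))
    raise ValueError(f"unknown task: {task!r}")


def training_rows(n_rows, bootstrap, rng):
    """Row indices a single tree would be trained on."""
    if bootstrap:
        # Bootstrap sample: n_rows indices drawn with replacement.
        return [rng.randrange(n_rows) for _ in range(n_rows)]
    # Whole dataset: every row exactly once.
    return list(range(n_rows))


rng = random.Random(42)
n_rows, n_features = 10_000, 16

k_clf = n_candidate_features(n_features, "classification")  # 4
k_reg = n_candidate_features(n_features, "regression")      # 16
# Feature indices one classification split would consider.
split_features = sorted(rng.sample(range(n_features), k_clf))

rows_bootstrap = training_rows(n_rows, bootstrap=True, rng=rng)
rows_whole = training_rows(n_rows, bootstrap=False, rng=rng)

# Share of distinct rows each tree sees.
unique_bootstrap = len(set(rows_bootstrap)) / n_rows  # close to 1 - 1/e
unique_whole = len(set(rows_whole)) / n_rows          # exactly 1.0

print(k_clf, k_reg, split_features, unique_bootstrap, unique_whole)
```

With bootstrap sampling, each tree sees roughly 63% of the distinct rows (the limit is 1 - 1/e); without it, every tree sees all rows and the diversity between trees has to come from the random features and thresholds alone.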

Original post: https://www.cnblogs.com/hugh-tan/p/8686701.html
