
The steps that may be taken to solve a feature selection problem

Posted: 2015-08-12 21:47:22

Reference: the JMLR paper "An Introduction to Variable and Feature Selection".


We summarize the steps that may be taken to solve a feature selection problem in a checklist:


1. Do you have domain knowledge? If yes, construct a better set of “ad hoc” features.


2. Are your features commensurate (i.e. measurable in the same units)? If no, consider normalizing them.
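As an illustration of step 2, a minimal z-score normalization sketch in NumPy (the feature matrix `X` here is made-up data, not from the paper):

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

# Z-score normalization: subtract the per-feature mean, divide by the
# per-feature standard deviation, so every column becomes commensurate.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transform every column has mean 0 and standard deviation 1, so distance-based or regularized methods no longer favor the large-scale feature.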


3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features (i.e. treating combinations of several variables, or higher-order terms, as new features), as much as your computer resources allow (see example of use in Section 4.4).
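A small sketch of step 3's product-feature construction, assuming we simply append all pairwise products to the matrix (the input `X` is invented for illustration):

```python
import numpy as np
from itertools import combinations

def product_features(X):
    """Append every pairwise product x_i * x_j (i < j) as a new feature."""
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + pairs)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
X_exp = product_features(X)   # 3 original columns + 3 product columns
```

For d features this adds d(d-1)/2 columns, which is why the paper qualifies the step with "as much as your computer resources allow".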


4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features (i.e. summarizing several variables as one feature, e.g. by clustering or matrix factorization, see Section 5).
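A toy sketch of the weighted-sum construction in step 4. The grouping of redundant features is assumed known here; in practice it would come from clustering the features or from a matrix factorization, as the text says:

```python
import numpy as np

# Hypothetical grouping: features 0 and 1 are redundant, feature 2 stands
# alone. Replace each group by the equal-weight mean of its members.
groups = [[0, 1], [2]]

X = np.array([[1.0, 3.0, 10.0],
              [2.0, 4.0, 20.0]])

X_summarized = np.column_stack([X[:, g].mean(axis=1) for g in groups])
```

The result has one column per group, trading individual variables for more robust aggregate features.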


5. Do you need to assess features individually (e.g. to understand their influence on the system, or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (Section 2 and Section 7.2); else, do it anyway to get baseline results.
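One of the simplest variable ranking methods mentioned in the paper is ranking by the absolute Pearson correlation with the target; a sketch on synthetic data (the generating model is invented):

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank variables by |Pearson correlation| with the target, best first."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(-scores), scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)   # only feature 1 matters

order, scores = rank_by_correlation(X, y)        # feature 1 ranks first
```

Correlation ranking is cheap and gives the baseline that step 5 asks for, but being univariate it can miss features that are only useful in combination.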


6. Do you need a predictor? If no, stop.


7. Do you suspect your data is "dirty" (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top ranking variables obtained in step 5 as representation; check and/or discard them (note: "them" refers to the examples, not the features).
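A minimal sketch of step 7: flag examples with extreme z-scores on the top-ranked variables (here the "top variables" and the planted outlier are assumptions made for the demo; the paper does not prescribe this particular detector):

```python
import numpy as np

def flag_outliers(X, top_vars, thresh=3.0):
    """Flag examples whose z-score exceeds `thresh` on a top-ranked variable."""
    Z = X[:, top_vars]
    z = np.abs((Z - Z.mean(axis=0)) / Z.std(axis=0))
    return np.where((z > thresh).any(axis=1))[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[7, 0] = 50.0                       # plant one corrupted example
outliers = flag_outliers(X, top_vars=[0])
```

The flagged indices are *examples* to inspect or discard, matching the note in the step above.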


8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method (Section 4.2) with the "probe" method as a stopping criterion (Section 6), or use the ℓ0-norm embedded method (Section 4.3). For comparison, following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
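A bare-bones sketch of forward selection with a linear predictor, as suggested in step 8. For simplicity the stopping criterion is a fixed number of features `k` rather than the paper's "probe" method, and the data is synthetic:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: at each round, add the feature that most
    reduces the least-squares training error of a linear predictor."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            A = np.column_stack([X[:, selected + [j]], np.ones(len(X))])
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.sum((A @ w - y) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 2] - 1.0 * X[:, 4] + 0.1 * rng.normal(size=100)
chosen = forward_select(X, y, k=2)   # recovers features 2 and 4
```

Greedy forward selection is O(k·d) model fits, which is why it is a reasonable first thing to try before heavier search.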


9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods (Section 4). Use linear and non-linear predictors. Select the best approach with model selection (Section 6).
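The model-selection comparison in step 9 is usually done with cross-validation. A minimal k-fold sketch that compares two candidate feature subsets by CV error of a least-squares linear predictor (data and subsets invented):

```python
import numpy as np

def cv_mse(X, y, folds=5):
    """K-fold cross-validated MSE of a least-squares linear predictor."""
    idx, errs = np.arange(len(y)), []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        A_tr = np.column_stack([X[train], np.ones(len(train))])
        w, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([X[test], np.ones(len(test))])
        errs.append(np.mean((A_te @ w - y[test]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

# Compare two candidate feature subsets; keep the one with lower CV error.
err_relevant   = cv_mse(X[:, [0]], y)
err_irrelevant = cv_mse(X[:, [1]], y)
```

The same harness extends to comparing selection methods: run each method, then score its chosen subset with `cv_mse` and pick the winner.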


10. Do you want a stable solution (to improve performance and/or understanding)? If yes, sub-sample your data and redo your analysis for several "bootstraps" (Section 7.1).
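A sketch of the bootstrap stability check in step 10: resample the data with replacement many times, re-run a (here, correlation-based) ranking each time, and count how often each feature comes out on top. The data and the choice of ranking method are assumptions for the demo:

```python
import numpy as np

def bootstrap_top_feature(X, y, n_boot=50, seed=0):
    """Count how often each feature ranks first (by |correlation| with y)
    across bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample with replacement
        Xb, yb = X[idx], y[idx]
        scores = [abs(np.corrcoef(Xb[:, j], yb)[0, 1]) for j in range(d)]
        counts[int(np.argmax(scores))] += 1
    return counts

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 2] + 0.1 * rng.normal(size=100)
counts = bootstrap_top_feature(X, y)
```

A feature that wins on nearly every bootstrap is a stable selection; a feature that wins only occasionally is likely an artifact of the particular sample.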




Copyright notice: this is the blogger's original article; reproduction without the blogger's permission is prohibited.


Original: http://blog.csdn.net/mmc2015/article/details/47449765
