The Categorical Method in pandas

Date: 2018-09-09 18:53:54

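　　When we do not know in advance how many classes the labels of a dataset contain, we can tally the label column of the dataset. pandas' Categorical (pd.Categorical) makes this quick: it finds the distinct classes and records which class each sample belongs to.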
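As a rough illustration of tallying a label column, here is a minimal sketch. The label values and the variable names (labels, n_classes, counts) are made up for this example and do not come from the original post.

```python
import pandas as pd

# Hypothetical label column; the values are invented for illustration.
labels = ["a", "a", "b", "c", "c"]

cat = pd.Categorical(labels)

# Number of distinct classes found in the labels.
n_classes = len(cat.categories)

# Samples per class, converted to a plain dict for easy inspection.
counts = pd.Series(cat).value_counts().to_dict()

print(n_classes)
print(counts)
```

Wrapping the Categorical in a Series before calling value_counts is just one way to get the per-class counts; other approaches work as well.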

1. Notes:

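　　Your data should preferably be a Series (any list-like also works). Calling pd.Categorical(series) returns a categorical object, which you can inspect through its categories attribute (the distinct classes) or its codes attribute (the integer class index of each element).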
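The two attributes mentioned above can be sketched as follows. The Series contents and the variable names are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical Series of labels; the values are invented for illustration.
series = pd.Series(["cat", "dog", "cat", "bird"])

cat = pd.Categorical(series)

# Distinct classes; for sortable values pandas stores them in sorted order.
categories = list(cat.categories)

# Integer position of each element's class within cat.categories.
codes = cat.codes.tolist()

print(categories)  # ['bird', 'cat', 'dog']
print(codes)       # [1, 2, 1, 0]
```

Note that the codes follow the sorted order of the categories, not the order in which the values first appear in the data.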

2. Usage:

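pd.Categorical(list).codes directly returns the integer codes that correspond to the original data. This turns the class information into numeric information, which can then be fed into a model.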
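Before feeding the codes to a model, it helps to know how missing values are encoded and how to map codes back to labels. The sketch below uses made-up data and variable names; it is an illustration under those assumptions, not part of the original post.

```python
import pandas as pd

# Hypothetical labels with one missing value (None).
labels = ["red", "green", None, "red"]

cat = pd.Categorical(labels)

# codes is a NumPy integer array; missing values are encoded as -1.
codes = cat.codes

# Map the valid codes back to their original labels.
valid = codes[codes >= 0]
decoded = list(cat.categories[valid])

print(codes.tolist())  # [1, 0, -1, 1]
print(decoded)         # ['red', 'green', 'red']
```

Because -1 is a sentinel rather than a real class, it usually makes sense to filter or impute missing labels before training.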

Code:

import pandas as pd


class Deng(object):
    def __init__(self):
        pass

    def main(self):
        temp = ['a', 'a', 'b', 'c', 'c']
        st = pd.Categorical(temp)
        print(st)
        # Printed output (exact formatting varies by pandas version):
        # [a, a, b, c, c]
        # Categories (3, object): [a, b, c]

        # For each element of temp, codes gives the position index
        # of its class within st.categories
        st2 = st.codes
        print(st2)
        # [0 0 1 2 2]


if __name__ == '__main__':
    obj = Deng()
    obj.main()

Source: https://www.cnblogs.com/demo-deng/p/9614377.html