DataFrame.
dropna
(self,axis = 0,how =‘any‘,thresh = None,subset = None,inplace = False )
axis:
0:删除包含缺失值的行。
1:删除包含缺失值的列。
how:
any‘:如果一行/列存在任何NA值,则删除该行或列。
‘all‘:如果一行/列所有值均为NA,则删除该行或列。
thresh:一个可选的int
subset:在哪些列中查看是否有NAN值
inplace
sample_incomplete_rows.dropna(subset=["total_bedrooms"])
DataFrame.
drop
(self,labels = None,axis = 0,index = None,column = None,level = None,inplace = False,errors =‘raise‘ )
labels:要删除的索引index或列column的标签
axis:
column:
level:代表标签所在级别,接受int/索引名,默认为None
errors:默认为"raise"
ignore:抑制错误,仅当标签存在时才会删除,需要删除的标签不存在也不会报错
sample_incomplete_rows.drop("total_bedrooms", axis=1)
DataFrame.
fillna
(self,value = None,method = None,axis = None,inplace = False,limit = None,downcast = None ) →Union [ForwardRef(‘DataFrame‘),NoneType]
method:{‘backfill‘,‘bfill‘,‘pad‘,‘ffill‘,None}
,默认为None
limit:int, 默认值None
pd.cut(d_cut[‘number‘], 4)
downcast:如果可以,将向下转换为适当的相等类型,如float64->int64
median = housing["total_bedrooms"].median()
sample_incomplete_rows["total_bedrooms"].fillna(median, inplace=True)
pandas.
cut
(x, bins, right: bool = True, labels=None, retbins: bool = False, precision: int = 3, include_lowest: bool = False, duplicates: str = ‘raise‘)
x:
要合并的数组,必须是一维的。
bins:
right:
labels:
rebins:
precision:
include_lowest:
duplicates:
import pandas as pd
import numpy as np
factors = np.array([29, 37, 46, 52, 77])
print(‘原数组为:‘,factors)
print(‘bins=4的情况下:‘,pd.cut(factors, 4))
print(‘bins为列表的情况下:‘,pd.cut(factors, bins=[25,36,43,50,57,67,80]))
print(pd.cut(factors, 4, labels=False))
print(pd.cut(factors, 4, labels=["分组1", "分组2", "分组3", "分组4"]))
如结果所示,其中labels是根据分组来确定他是哪个标签的,如图(28.952, 41.0]是第一个划分区间,20,37都在内,所以他们的标签都是0.
只要该列或行有空值或NA值,就返回True,否则返回False
pandas系列:drop,dropna,fillna,cut,isnull用法
原文:https://www.cnblogs.com/ApStar/p/13029047.html