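About this dataset: in this challenge, you are given a list of users along with their demographics, web session records, and some summary statistics. You are asked to predict which country a new user's first booking destination will be. All users in this dataset are from the USA.
There are 12 possible outcomes for the destination country: 'US', 'FR', 'CA', 'GB', 'ES', 'IT', 'PT', 'NL', 'DE', 'AU', 'NDF' (no destination found), and 'other'. Note that 'NDF' differs from 'other': 'other' means a booking was made, but to a country not included in the list, while 'NDF' means no booking was made.
The dataset consists of 6 csv files in total.
The analysis is done in a Jupyter notebook with Python 3.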
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
%matplotlib inline
import datetime
import os
import seaborn as sns # data visualization
from datetime import date
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelBinarizer
import pickle # used to store models
from sklearn.metrics import *
from sklearn.model_selection import *
# Training data
train = pd.read_csv("train_users_2.csv")
# Test data
test = pd.read_csv("test_users.csv")
# Column names of the training data
print('the columns name of training dataset:\n',train.columns)
# Column names of the test data
print('the columns name of test dataset:\n',test.columns)
the columns name of training dataset:
Index(['id', 'date_account_created', 'timestamp_first_active',
'date_first_booking', 'gender', 'age', 'signup_method', 'signup_flow',
'language', 'affiliate_channel', 'affiliate_provider',
'first_affiliate_tracked', 'signup_app', 'first_device_type',
'first_browser', 'country_destination'],
dtype='object')
the columns name of test dataset:
Index(['id', 'date_account_created', 'timestamp_first_active',
'date_first_booking', 'gender', 'age', 'signup_method', 'signup_flow',
'language', 'affiliate_channel', 'affiliate_provider',
'first_affiliate_tracked', 'signup_app', 'first_device_type',
'first_browser'],
dtype='object')
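Analysis: the training set has one column that the test set lacks, country_destination, which is the target variable to predict.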
print(train.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213451 entries, 0 to 213450
Data columns (total 16 columns):
id 213451 non-null object
date_account_created 213451 non-null object
timestamp_first_active 213451 non-null int64
date_first_booking 88908 non-null object
gender 213451 non-null object
age 125461 non-null float64
signup_method 213451 non-null object
signup_flow 213451 non-null int64
language 213451 non-null object
affiliate_channel 213451 non-null object
affiliate_provider 213451 non-null object
first_affiliate_tracked 207386 non-null object
signup_app 213451 non-null object
first_device_type 213451 non-null object
first_browser 213451 non-null object
country_destination 213451 non-null object
dtypes: float64(1), int64(2), object(13)
memory usage: 26.1+ MB
None
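Analysis: date_first_booking, age, and first_affiliate_tracked contain missing values (88908, 125461, and 207386 non-null entries out of 213451 rows).
View the first few rows of date_account_created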
print(train.date_account_created.head())
0 2010-06-28
1 2011-05-25
2 2010-09-28
3 2011-12-05
4 2010-09-14
Name: date_account_created, dtype: object
Count the occurrences of each date_account_created value
print(train.date_account_created.value_counts().head())
print(train.date_account_created.value_counts().tail())
2014-05-13 674
2014-06-24 670
2014-06-25 636
2014-05-20 632
2014-05-14 622
Name: date_account_created, dtype: int64
2010-01-01 1
2010-01-02 1
2010-06-18 1
2010-01-31 1
2010-02-14 1
Name: date_account_created, dtype: int64
Get summary information for date_account_created
print(train.date_account_created.describe())
count 213451
unique 1634
top 2014-05-13
freq 674
Name: date_account_created, dtype: object
Observe user growth
dac_train = train.date_account_created.value_counts()
dac_test = test.date_account_created.value_counts()
# Convert the dates to datetime type
dac_train_date = pd.to_datetime(dac_train.index)
dac_test_date = pd.to_datetime(dac_test.index)
# Number of days elapsed since the first registration date in the training set
dac_train_day = dac_train_date - dac_train_date.min()
dac_test_day = dac_test_date - dac_train_date.min()
# Plot with matplotlib
plt.scatter(dac_train_day.days, dac_train.values, color = 'r', label = 'train dataset')
plt.scatter(dac_test_day.days, dac_test.values, color = 'b', label = 'test dataset')
plt.title("Accounts created vs day")
plt.xlabel("Days")
plt.ylabel("Accounts created")
plt.legend(loc = 'upper left')
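Analysis: the number of accounts created per day grows over time, from 1 per day in early 2010 to more than 600 per day in mid-2014.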
View the first few rows of timestamp_first_active
print(train.timestamp_first_active.head())
0 20090319043255
1 20090523174809
2 20090609231247
3 20091031060129
4 20091208061105
Name: timestamp_first_active, dtype: int64
Count how often each value occurs and list the distinct counts
print(train.timestamp_first_active.value_counts().unique())
[1]
Analysis: the result [1] shows that every value occurs exactly once, so timestamp_first_active contains no duplicates.
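The same check can be written more directly. The following is a minimal sketch on a small made-up Series (a stand-in for train.timestamp_first_active, not the real data), using the pandas `is_unique` attribute and the `duplicated()` method:

```python
import pandas as pd

# Toy stand-in for train.timestamp_first_active (three values from the output above)
ts = pd.Series([20090319043255, 20090523174809, 20090609231247])

# True when no value occurs more than once
print(ts.is_unique)
# Number of rows that repeat an earlier value
print(ts.duplicated().sum())
```

On the real column, `is_unique` being True corresponds to the [1] result above.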
Convert the timestamps to datetime and get summary information
tfa_train_dt = train.timestamp_first_active.astype(str).apply(lambda x:
datetime.datetime(int(x[:4]),
int(x[4:6]),
int(x[6:8]),
int(x[8:10]),
int(x[10:12]),
int(x[12:])))
print(tfa_train_dt.describe())
count 213451
unique 213451
top 2013-07-01 05:26:34
freq 1
first 2009-03-19 04:32:55
last 2014-06-30 23:58:24
Name: timestamp_first_active, dtype: object
tfa_train_dt.head()
0 2009-03-19 04:32:55
1 2009-05-23 17:48:09
2 2009-06-09 23:12:47
3 2009-10-31 06:01:29
4 2009-12-08 06:11:05
Name: timestamp_first_active, dtype: datetime64[ns]
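The row-by-row lambda works, but the 14-digit values follow the fixed pattern YYYYMMDDHHMMSS, so they can also be parsed in one vectorized call. A sketch, assuming every value has exactly 14 digits; the toy Series below stands in for the real column:

```python
import pandas as pd

# Toy stand-in for train.timestamp_first_active (first two values from the output above)
raw = pd.Series([20090319043255, 20090523174809])

# Parse the YYYYMMDDHHMMSS integers in a single vectorized call
parsed = pd.to_datetime(raw.astype(str), format='%Y%m%d%H%M%S')
print(parsed)
```

On millions of rows this is usually much faster than building each datetime with `apply`.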
Get summary information for date_first_booking
print(train.date_first_booking.describe())
print(test.date_first_booking.describe())
count 88908
unique 1976
top 2014-05-22
freq 248
Name: date_first_booking, dtype: object
count 0.0
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
Name: date_first_booking, dtype: float64
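Analysis: date_first_booking is entirely empty in the test set (count 0), so it cannot serve as a feature and is dropped later.
Count the occurrences of each age value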
print(train.age.value_counts().head())
30.0 6124
31.0 6016
29.0 5963
28.0 5939
32.0 5855
Name: age, dtype: int64
Analysis: user ages are concentrated around 30
Bar chart of the age groups
# First split the ages into 4 groups: missing values, too small age, reasonable age, too large age
age_train =[train[train.age.isnull()].age.shape[0],
train.query('age < 15').age.shape[0],
train.query("age >= 15 & age <= 90").age.shape[0],
train.query('age > 90').age.shape[0]]
age_test = [test[test.age.isnull()].age.shape[0],
test.query('age < 15').age.shape[0],
test.query("age >= 15 & age <= 90").age.shape[0],
test.query('age > 90').age.shape[0]]
columns = ['Null', 'age < 15', 'age', 'age > 90']
# plot
fig, (ax1,ax2) = plt.subplots(1, 2, sharex=True, sharey = True, figsize=(10,5))
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=columns, y=age_train, ax=ax1)
sns.barplot(x=columns, y=age_test, ax=ax2)
ax1.set_title('training dataset')
ax2.set_title('test dataset')
ax1.set_ylabel('counts')
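Analysis: abnormal ages are rare, but a substantial share of the values is missing (only 125461 of the 213451 training rows have an age)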
Other features
A bar chart is used for each of the remaining features
def feature_barplot(feature, df_train=train, df_test=test, figsize=(10, 5), rot=90, saveimg=False):
    # Count the occurrences of each value in the training and test sets
    feat_train = df_train[feature].value_counts()
    feat_test = df_test[feature].value_counts()
    fig_feature, (axis1, axis2) = plt.subplots(1, 2, sharex=True, sharey=True, figsize=figsize)
    # Recent seaborn versions require x and y to be passed as keyword arguments
    sns.barplot(x=feat_train.index.values, y=feat_train.values, ax=axis1)
    sns.barplot(x=feat_test.index.values, y=feat_test.values, ax=axis2)
    axis1.set_xticklabels(axis1.xaxis.get_majorticklabels(), rotation=rot)
    axis2.set_xticklabels(axis1.xaxis.get_majorticklabels(), rotation=rot)
    axis1.set_title(feature + ' of training dataset')
    axis2.set_title(feature + ' of test dataset')
    axis1.set_ylabel('Counts')
    plt.tight_layout()
    # Optionally save the figure as <feature>.png
    if saveimg:
        figname = feature + ".png"
        fig_feature.savefig(figname, dpi=75)
feature_barplot('gender', saveimg = True)
feature_barplot('signup_method')
feature_barplot('signup_flow')
feature_barplot('language')
feature_barplot('affiliate_channel')
feature_barplot('first_affiliate_tracked')
feature_barplot('signup_app')
feature_barplot('first_device_type')
feature_barplot('first_browser')
Load the sessions data and view the first 10 rows
df_sessions = pd.read_csv('sessions.csv')
df_sessions.head(10)
| user_id | action | action_type | action_detail | device_type | secs_elapsed |
---|---|---|---|---|---|---|
0 | d1mm9tcy42 | lookup | NaN | NaN | Windows Desktop | 319.0 |
1 | d1mm9tcy42 | search_results | click | view_search_results | Windows Desktop | 67753.0 |
2 | d1mm9tcy42 | lookup | NaN | NaN | Windows Desktop | 301.0 |
3 | d1mm9tcy42 | search_results | click | view_search_results | Windows Desktop | 22141.0 |
4 | d1mm9tcy42 | lookup | NaN | NaN | Windows Desktop | 435.0 |
5 | d1mm9tcy42 | search_results | click | view_search_results | Windows Desktop | 7703.0 |
6 | d1mm9tcy42 | lookup | NaN | NaN | Windows Desktop | 115.0 |
7 | d1mm9tcy42 | personalize | data | wishlist_content_update | Windows Desktop | 831.0 |
8 | d1mm9tcy42 | index | view | view_search_results | Windows Desktop | 20842.0 |
9 | d1mm9tcy42 | lookup | NaN | NaN | Windows Desktop | 683.0 |
Rename user_id to id
# This prepares the later merge with the user data
df_sessions['id'] = df_sessions['user_id']
df_sessions = df_sessions.drop(['user_id'],axis=1) # axis=1 drops the column
View the shape of the data
df_sessions.shape
(10567737, 6)
Analysis: the sessions file has 10567737 rows and 6 columns
Check the missing values
df_sessions.isnull().sum()
action 79626
action_type 1126204
action_detail 1126204
device_type 0
secs_elapsed 136031
id 34496
dtype: int64
Analysis: action, action_type, action_detail, and secs_elapsed have many missing values; id is also missing in 34496 rows
Fill the missing values
df_sessions.action = df_sessions.action.fillna('NAN')
df_sessions.action_type = df_sessions.action_type.fillna('NAN')
df_sessions.action_detail = df_sessions.action_detail.fillna('NAN')
df_sessions.isnull().sum()
action 0
action_type 0
action_detail 0
device_type 0
secs_elapsed 136031
id 34496
dtype: int64
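Analysis: after filling, only secs_elapsed and id still contain missing values.
Now that we have a basic understanding of the data, we move on to feature extraction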
action
df_sessions.action.head()
0 lookup
1 search_results
2 lookup
3 search_results
4 lookup
Name: action, dtype: object
df_sessions.action.value_counts().min()
1
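Analysis: counting the values of action shows that there are many kinds of user actions and that the rarest occurs only once. We therefore group the rarely occurring actions into a single OTHER category
Relabel every action that occurs fewer than 100 times as OTHER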
# Action values with low frequency are changed to 'OTHER'
act_freq = 100 # Threshold of frequency
act = dict(zip(*np.unique(df_sessions.action, return_counts=True)))
df_sessions.action = df_sessions.action.apply(lambda x: 'OTHER' if act[x] < act_freq else x)
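# np.unique(df_sessions.action, return_counts=True) returns the distinct action values and their counts as arrays
# zip(*(a, b)) pairs the elements of a and b one by one; dict() turns the pairs into a value -> count lookup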
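The same grouping can be expressed with pandas alone. The following is a sketch on made-up data; `min_freq` is a hypothetical name for the threshold, set to 2 here so that the effect is visible on six rows (the text above uses 100):

```python
import pandas as pd

# Toy stand-in for df_sessions.action (values are made up)
action = pd.Series(['lookup', 'lookup', 'lookup', 'show', 'show', 'rare_click'])
min_freq = 2  # hypothetical threshold; the notebook uses 100

# Frequency of each row's own value, aligned with the original rows
freq = action.map(action.value_counts())
# Keep frequent values and replace the rest with 'OTHER'
action_grouped = action.where(freq >= min_freq, 'OTHER')
print(action_grouped.tolist())
```

`Series.where` keeps the entries where the condition holds and substitutes the second argument elsewhere, which avoids a Python-level lambda call for each of the roughly 10 million rows.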
Derive per-user features from action, action_detail, action_type, device_type, and secs_elapsed
# Map every distinct value of each feature to a column position
# (argsort of the value counts yields a distinct integer for each value)
f_act = df_sessions.action.value_counts().argsort()
f_act_detail = df_sessions.action_detail.value_counts().argsort()
f_act_type = df_sessions.action_type.value_counts().argsort()
f_dev_type = df_sessions.device_type.value_counts().argsort()
# Group the rows by id (a plain column name keeps the group keys scalar in recent pandas)
dgr_sess = df_sessions.groupby('id')
# Loop over dgr_sess to build all the features
samples = []  # one feature list per user
ln = len(dgr_sess)  # number of groups, i.e. users with session data
# Iterate over the rows of each id in dgr_sess
for g in dgr_sess:
    gr = g[1]  # DataFrame that contains all the rows of one group, e.g. id 'zzywmcn0jv'
    l = []  # temporary list that collects the features of this user
    # the id, for example 'zzywmcn0jv'
    l.append(g[0])
    # number of total actions
    l.append(len(gr))
    # Fill missing secs_elapsed values with 0 and keep the durations
    sev = gr.secs_elapsed.fillna(0).values  # These values are used later.
    # action features
    # (how many times each value occurs, number of unique values, mean and std of the counts)
    c_act = [0] * len(f_act)
    for i, v in enumerate(gr.action.values):  # i is the position starting at 0, v is the action value
        c_act[f_act[v]] += 1
    _, c_act_uqc = np.unique(gr.action.values, return_counts=True)
    # number of distinct actions, plus mean and standard deviation of their counts
    c_act += [len(c_act_uqc), np.mean(c_act_uqc), np.std(c_act_uqc)]
    l = l + c_act
    # action_detail features
    # (how many times each value occurs, number of unique values, mean and std)
    c_act_detail = [0] * len(f_act_detail)
    for i, v in enumerate(gr.action_detail.values):
        c_act_detail[f_act_detail[v]] += 1
    _, c_act_det_uqc = np.unique(gr.action_detail.values, return_counts=True)
    c_act_detail += [len(c_act_det_uqc), np.mean(c_act_det_uqc), np.std(c_act_det_uqc)]
    l = l + c_act_detail
    # action_type features (click, view, data, ...)
    # (how many times each value occurs, number of unique values, mean and std
    # + log of the sum of secs_elapsed for each value)
    l_act_type = [0] * len(f_act_type)
    c_act_type = [0] * len(f_act_type)
    for i, v in enumerate(gr.action_type.values):
        l_act_type[f_act_type[v]] += sev[i]  # total time spent per action type
        c_act_type[f_act_type[v]] += 1
    # The totals per action type vary widely, so take the log
    l_act_type = np.log(1 + np.array(l_act_type)).tolist()
    _, c_act_type_uqc = np.unique(gr.action_type.values, return_counts=True)
    c_act_type += [len(c_act_type_uqc), np.mean(c_act_type_uqc), np.std(c_act_type_uqc)]
    l = l + c_act_type + l_act_type
    # device_type features
    # (how many times each value occurs, number of unique values, mean and std)
    c_dev_type = [0] * len(f_dev_type)
    for i, v in enumerate(gr.device_type.values):
        c_dev_type[f_dev_type[v]] += 1
    # Note: the number of distinct devices is appended here and once more in the list below
    c_dev_type.append(len(np.unique(gr.device_type.values)))
    _, c_dev_type_uqc = np.unique(gr.device_type.values, return_counts=True)
    c_dev_type += [len(c_dev_type_uqc), np.mean(c_dev_type_uqc), np.std(c_dev_type_uqc)]
    l = l + c_dev_type
    # secs_elapsed features
    l_secs = [0] * 5
    l_log = [0] * 15
    if len(sev) > 0:
        # Simple statistics about the secs_elapsed values.
        l_secs[0] = np.log(1 + np.sum(sev))
        l_secs[1] = np.log(1 + np.mean(sev))
        l_secs[2] = np.log(1 + np.std(sev))
        l_secs[3] = np.log(1 + np.median(sev))
        l_secs[4] = l_secs[0] / float(l[1])  # log of the total time divided by the number of actions
        # Values are grouped in 15 intervals. Compute the number of values
        # in each interval.
        log_sev = np.log(1 + sev).astype(int)
        # np.bincount(): Count number of occurrences of each value in array of non-negative ints.
        l_log = np.bincount(log_sev, minlength=15).tolist()
    l = l + l_secs + l_log
    # The list l has the feature values of one sample.
    samples.append(l)
# preparing objects
samples = np.array(samples)
samp_ar = samples[:, 1:].astype(np.float16) # all feature columns except id
samp_id = samples[:, 0] # the id, stored in the first column
# Create a dataframe for the extracted features
col_names = [] # names of the columns
for i in range(len(samples[0])-1): # minus 1 because the first entry is the id
    col_names.append('c_' + str(i)) # columns are named c_0, c_1, ...
df_agg_sess = pd.DataFrame(samp_ar, columns=col_names)
df_agg_sess['id'] = samp_id
df_agg_sess.index = df_agg_sess.id # use id as the index
df_agg_sess.head()
| c_0 | c_1 | c_2 | c_3 | c_4 | c_5 | c_6 | c_7 | c_8 | c_9 | ... | c_448 | c_449 | c_450 | c_451 | c_452 | c_453 | c_454 | c_455 | c_456 | id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
00023iyk9l | 40.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 12.0 | 6.0 | 2.0 | 3.0 | 3.0 | 1.0 | 0.0 | 1.0 | 0.0 | 00023iyk9l |
0010k6l0om | 63.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 8.0 | 12.0 | 2.0 | 8.0 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0010k6l0om |
001wyh0pz8 | 90.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 27.0 | 30.0 | 9.0 | 8.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 001wyh0pz8 |
0028jgx1x1 | 31.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 2.0 | 3.0 | 5.0 | 4.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0028jgx1x1 |
002qnbzfs5 | 789.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 111.0 | 102.0 | 104.0 | 57.0 | 28.0 | 9.0 | 4.0 | 1.0 | 1.0 | 002qnbzfs5 |
5 rows × 458 columns
Analysis: after feature extraction, the 6 columns of the sessions file have become 458 columns (457 features plus id)
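The explicit loop makes every feature visible, but the core idea, one row per user with counts per categorical value plus summary statistics of secs_elapsed, can also be sketched with pandas built-ins. This is an illustration on made-up data, not a drop-in replacement: it reproduces only the per-action counts and three of the time statistics, without the log transforms.

```python
import pandas as pd

# Toy stand-in for df_sessions (ids and values are made up)
sessions = pd.DataFrame({
    'id': ['u1', 'u1', 'u1', 'u2', 'u2'],
    'action': ['lookup', 'search', 'lookup', 'search', 'search'],
    'secs_elapsed': [10.0, 20.0, None, 5.0, 15.0],
})

# One row per user, one column per action value, cells hold occurrence counts
action_counts = pd.crosstab(sessions['id'], sessions['action'])

# Per-user summary of secs_elapsed, with missing values treated as 0 as in the loop
secs = (sessions['secs_elapsed'].fillna(0)
        .groupby(sessions['id'])
        .agg(['sum', 'mean', 'median']))

features = action_counts.join(secs)
print(features)
```

`pd.crosstab` plays the role of the `c_act` counters, and the `groupby(...).agg(...)` call corresponds to the `l_secs` statistics before the logarithm is applied.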
Record the number of rows in train and store the target variable that we want to predict
train = pd.read_csv("train_users_2.csv")
test = pd.read_csv("test_users.csv")
# Number of rows in train, used later to split the combined data back into train and test
train_row = train.shape[0]
# The label we need to predict
labels = train['country_destination'].values
Drop date_first_booking from both files and country_destination from train
train.drop(['country_destination', 'date_first_booking'], axis = 1, inplace = True)
test.drop(['date_first_booking'], axis = 1, inplace = True)
Combine the train and test files
# Concatenate train and test
df = pd.concat([train, test], axis = 0, ignore_index = True)
tfa = df.timestamp_first_active.astype(str).apply(lambda x: datetime.datetime(int(x[:4]),
int(x[4:6]),
int(x[6:8]),
int(x[8:10]),
int(x[10:12]),
int(x[12:])))
# create tfa_year, tfa_month, tfa_day feature
df['tfa_year'] = np.array([x.year for x in tfa])
df['tfa_month'] = np.array([x.month for x in tfa])
df['tfa_day'] = np.array([x.day for x in tfa])
# isoweekday() returns the day of the week as an integer, Monday: 1 through Sunday: 7
df['tfa_wd'] = np.array([x.isoweekday() for x in tfa])
df_tfa_wd = pd.get_dummies(df.tfa_wd, prefix = 'tfa_wd') # one hot encoding
df = pd.concat((df, df_tfa_wd), axis = 1) #添加df['tfa_wd'] 编码后的特征
df.drop(['tfa_wd'], axis = 1, inplace = True)#删除原有未编码的特征
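Python's `isoweekday()` numbers the days from Monday = 1 to Sunday = 7, while `weekday()` runs from Monday = 0 to Sunday = 6. A quick check on the first timestamp shown earlier (2009-03-19, a Thursday), followed by the same one-hot encoding on a few made-up weekday numbers:

```python
import datetime
import pandas as pd

# First timestamp_first_active value from the output earlier in the text
d = datetime.datetime(2009, 3, 19, 4, 32, 55)
print(d.isoweekday())  # ISO numbering: Monday = 1 ... Sunday = 7
print(d.weekday())     # zero-based numbering: Monday = 0 ... Sunday = 6

# One-hot encode a few weekday numbers (6 and 2 are made-up extra rows)
wd = pd.Series([d.isoweekday(), 6, 2])
wd_ohe = pd.get_dummies(wd, prefix='tfa_wd')
print(wd_ohe)
```

This is why the encoded columns are named tfa_wd_1 through tfa_wd_7 rather than starting at 0.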
Y = 2000
seasons = [(0, (date(Y, 1, 1), date(Y, 3, 20))), #'winter'
(1, (date(Y, 3, 21), date(Y, 6, 20))), #'spring'
(2, (date(Y, 6, 21), date(Y, 9, 22))), #'summer'
(3, (date(Y, 9, 23), date(Y, 12, 20))), #'autumn'
(0, (date(Y, 12, 21), date(Y, 12, 31)))] #'winter'
def get_season(dt):
    dt = dt.date()  # keep only the date part
    dt = dt.replace(year=Y)  # map every date onto the year 2000
    return next(season for season, (start, end) in seasons if start <= dt <= end)
df['tfa_season'] = np.array([get_season(x) for x in tfa])
df_tfa_season = pd.get_dummies(df.tfa_season, prefix = 'tfa_season') # one hot encoding
df = pd.concat((df, df_tfa_season), axis = 1)
df.drop(['tfa_season'], axis = 1, inplace = True)
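To see how the season lookup behaves at the boundaries, here is a self-contained copy of the function with a few sample dates. The sample dates are chosen for illustration; note that 2000 is a leap year, so replacing the year also works for 29 February.

```python
from datetime import date, datetime

Y = 2000  # reference year; a leap year, so Feb 29 can be represented
seasons = [(0, (date(Y, 1, 1), date(Y, 3, 20))),     # winter
           (1, (date(Y, 3, 21), date(Y, 6, 20))),    # spring
           (2, (date(Y, 6, 21), date(Y, 9, 22))),    # summer
           (3, (date(Y, 9, 23), date(Y, 12, 20))),   # autumn
           (0, (date(Y, 12, 21), date(Y, 12, 31)))]  # winter again

def get_season(dt):
    # Map the date onto the reference year, then find the matching interval
    d = dt.date().replace(year=Y)
    return next(season for season, (start, end) in seasons if start <= d <= end)

print(get_season(datetime(2009, 3, 19, 4, 32, 55)))   # 19 March: still winter
print(get_season(datetime(2014, 6, 30, 23, 58, 24)))  # 30 June: summer
print(get_season(datetime(2012, 2, 29)))              # leap day: winter
print(get_season(datetime(2013, 12, 25)))             # late December: winter
```

Winter appears twice in the list because it spans the turn of the year.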
Convert date_account_created to datetime type
dac = pd.to_datetime(df.date_account_created)
# create year, month, day feature for dac
df['dac_year'] = np.array([x.year for x in dac])
df['dac_month'] = np.array([x.month for x in dac])
df['dac_day'] = np.array([x.day for x in dac])
# create features of weekday for dac
df['dac_wd'] = np.array([x.isoweekday() for x in dac])
df_dac_wd = pd.get_dummies(df.dac_wd, prefix = 'dac_wd')
df = pd.concat((df, df_dac_wd), axis = 1)
df.drop(['dac_wd'], axis = 1, inplace = True)
# create season features for dac
df['dac_season'] = np.array([get_season(x) for x in dac])
df_dac_season = pd.get_dummies(df.dac_season, prefix = 'dac_season')
df = pd.concat((df, df_dac_season), axis = 1)
df.drop(['dac_season'], axis = 1, inplace = True)
dt_span = dac.subtract(tfa).dt.days
dt_span.value_counts().head(10)
-1 275369
0 7
6 4
5 4
1 4
2 3
3 3
4 3
28 3
94 2
dtype: int64
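Analysis: the values are concentrated at -1. date_account_created carries only a date (midnight), while timestamp_first_active carries the time of day, so for a user who registers on the day of first activity the difference is a negative fraction of a day, which .dt.days reports as -1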
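The -1 can be reproduced on a single pair of values. A small sketch with a made-up time of day: subtracting a later moment on the same day from midnight gives a negative Timedelta, and pandas stores it as -1 days plus a positive remainder.

```python
import pandas as pd

dac = pd.Timestamp('2014-05-13')           # date only, i.e. midnight
tfa = pd.Timestamp('2014-05-13 05:26:34')  # same day, a few hours later (made-up time)

span = dac - tfa
print(span)       # a negative Timedelta, normalized as "-1 days" plus a remainder
print(span.days)  # the days component of that Timedelta
```

So a dt_span of -1 means that the account was created on the same calendar day as the first activity.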
# create categorical feature: span == -1; -1 < span < 30; 30 <= span <= 365; anything else is 'other'
def get_span(dt):
    # dt is an integer number of days
    if dt == -1:
        return 'OneDay'
    elif -1 < dt < 30:
        return 'OneMonth'
    elif 30 <= dt <= 365:
        return 'OneYear'
    else:
        return 'other'
df['dt_span'] = np.array([get_span(x) for x in dt_span])
df_dt_span = pd.get_dummies(df.dt_span, prefix = 'dt_span')
df = pd.concat((df, df_dt_span), axis = 1)
df.drop(['dt_span'], axis = 1, inplace = True)
df.drop(['date_account_created','timestamp_first_active'], axis = 1, inplace = True)
# Get the age values
av = df.age.values
# Values between 1900 and 2000 are birth years rather than ages (estimate the age as 2014 - value)
# The data was collected in 2014, hence 2014 - value
av = np.where(np.logical_and(av<2000, av>1900), 2014-av, av)
df['age'] = av
E:\Anaconda3\envs\sklearn\lib\site-packages\ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in less
This is separate from the ipykernel package so we can avoid doing imports until
E:\Anaconda3\envs\sklearn\lib\site-packages\ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in greater
This is separate from the ipykernel package so we can avoid doing imports until
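The RuntimeWarning above comes from comparing NaN entries with a number; the comparison evaluates to False, so missing ages pass through unchanged. A sketch on a made-up array that makes this explicit and silences the warning locally with `np.errstate`:

```python
import numpy as np

# Made-up sample: a plain age, a birth year, a missing value, a year-like outlier, an old age
av = np.array([30.0, 1985.0, np.nan, 2014.0, 105.0])

# Comparisons with NaN are False; errstate suppresses the "invalid value" warning
with np.errstate(invalid='ignore'):
    is_birth_year = np.logical_and(av > 1900, av < 2000)

# Convert birth years to ages, leave everything else untouched
age = np.where(is_birth_year, 2014 - av, av)
print(age)
```

Only 1985 is converted (to 29); the NaN stays NaN and 2014 stays an implausible value for the bucketing step to handle.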
# Age has many abnormal values that we need to deal with.
age = df.age.fillna(-1)  # fill missing values with -1
div = 15
def get_age(age):
    # age is a float; convert the continuous value into a discrete bucket
    if age < 0:
        return 'NA'  # missing value
    elif age < div:
        return div  # younger than 15: bucket 15
    elif age <= div * 2:
        return div * 2  # from 15 up to and including 30: bucket 30
    elif age <= div * 3:
        return div * 3
    elif age <= div * 4:
        return div * 4
    elif age <= div * 5:
        return div * 5
    elif age <= 110:
        return div * 6
    else:
        return 'Unphysical'  # implausible age
df['age'] = np.array([get_age(x) for x in age])
df_age = pd.get_dummies(df.age, prefix = 'age')
df = pd.concat((df, df_age), axis = 1)
df.drop(['age'], axis = 1, inplace = True)
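The same bucketing can be sketched with `pd.cut`. This version assumes whole-number ages, which is why the first bin edge is 14 rather than 15: `get_age` puts an age of exactly 15 into the 30 bucket, and right-closed intervals would otherwise assign it to the 15 bucket. The sample ages are made up.

```python
import numpy as np
import pandas as pd

ages = pd.Series([5.0, 15.0, 30.0, 31.0, 90.0, 110.0, 2014.0, np.nan])

# Right-closed intervals: (-inf, 14], (14, 30], (30, 45], ..., (75, 110], (110, inf)
bins = [-np.inf, 14, 30, 45, 60, 75, 110, np.inf]
labels = ['15', '30', '45', '60', '75', '90', 'Unphysical']

bucket = pd.cut(ages, bins=bins, labels=labels, right=True)
# Missing ages fall into no bin; give them their own 'NA' category
bucket = bucket.cat.add_categories('NA').fillna('NA')
print(bucket.tolist())
```

The result is a categorical Series that can be passed to `pd.get_dummies` in the same way as the output of `get_age`.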
feat_toOHE = ['gender',
'signup_method',
'signup_flow',
'language',
'affiliate_channel',
'affiliate_provider',
'first_affiliate_tracked',
'signup_app',
'first_device_type',
'first_browser']
# One-hot encode the remaining categorical features
for f in feat_toOHE:
    df_ohe = pd.get_dummies(df[f], prefix=f, dummy_na=True)
    df.drop([f], axis = 1, inplace = True)
    df = pd.concat((df, df_ohe), axis = 1)
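With `dummy_na=True`, `pd.get_dummies` adds an extra indicator column for missing values instead of encoding them as all zeros. A sketch on a made-up column:

```python
import numpy as np
import pandas as pd

# Toy column with one missing entry (values are made up)
toy = pd.DataFrame({'first_affiliate_tracked': ['linked', np.nan, 'omg', 'linked']})

ohe = pd.get_dummies(toy['first_affiliate_tracked'],
                     prefix='first_affiliate_tracked', dummy_na=True)
print(ohe.columns.tolist())
print(ohe.astype(int))
```

The missing row is marked in the column whose name ends in `_nan`. For columns without any missing values, that indicator column is still created and is simply all zeros.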
We now merge the features extracted from the sessions file with those from the train and test files
# Attach the session features to the user features.
# df_agg_sess has id both as its index and as a column, which makes the merge key ambiguous,
# so the index is reset and the merge is done explicitly on the id column
df_all = pd.merge(df, df_agg_sess.reset_index(drop=True), how='left', on='id')
df_all = df_all.drop(['id'], axis=1) # drop id
df_all = df_all.fillna(-2) # users without session data get -2 for every session feature
# Add a column that counts the negative (i.e. filled-in missing) values in each row; it also serves as a feature
df_all['all_null'] = np.array([sum(r<0) for r in df_all.values])
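A small sketch of the merge step on made-up frames shows what happens to users without session data. The `indicator=True` argument is an addition for illustration only; it reports which rows found a match.

```python
import pandas as pd

# Toy user features and toy session features (all values are made up)
users = pd.DataFrame({'id': ['u1', 'u2', 'u3'], 'tfa_year': [2009, 2014, 2014]})
sess = pd.DataFrame({'id': ['u2', 'u3'], 'c_0': [40.0, 63.0], 'c_1': [0.0, 2.0]})

# Left merge keeps every user; unmatched users get NaN in the session columns
merged = pd.merge(users, sess, how='left', on='id', indicator=True)
print(merged['_merge'].value_counts())

# Fill the gaps with -2 and count the negative entries per row
merged = merged.drop(columns=['id', '_merge']).fillna(-2)
merged['all_null'] = (merged < 0).sum(axis=1)
print(merged)
```

User u1 has no sessions, so both session columns become -2 and all_null is 2. This mirrors the value 457 that all_null takes for users without session data in the real feature table.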
Split the combined data back into train and test
Xtrain = df_all.iloc[:train_row, :]
Xtest = df_all.iloc[train_row:, :]
Write the extracted features to csv files
Xtrain.to_csv("Airbnb_xtrain_v2.csv")
Xtest.to_csv("Airbnb_xtest_v2.csv")
# labels.tofile(): Write array to a file as text or binary (default)
labels.tofile("Airbnb_ytrain_v2.csv", sep='\n', format='%s') # store the target variable
Read the feature files
xtrain = pd.read_csv("Airbnb_xtrain_v2.csv",index_col=0)
ytrain = pd.read_csv("Airbnb_ytrain_v2.csv", header=None)
xtrain.head()
| tfa_year | tfa_month | tfa_day | tfa_wd_1 | tfa_wd_2 | tfa_wd_3 | tfa_wd_4 | tfa_wd_5 | tfa_wd_6 | tfa_wd_7 | ... | c_448 | c_449 | c_450 | c_451 | c_452 | c_453 | c_454 | c_455 | c_456 | all_null |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2009 | 3 | 19 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | 457 |
1 | 2009 | 5 | 23 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | 457 |
2 | 2009 | 6 | 9 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | 457 |
3 | 2009 | 10 | 31 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | 457 |
4 | 2009 | 12 | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | 457 |
5 rows × 661 columns
ytrain.head()
| 0 |
---|---|
0 | NDF |
1 | NDF |
2 | US |
3 | other |
4 | US |
Analysis: after feature extraction, the feature file xtrain has grown to 661 columns, and ytrain contains the target variable of the training set
Encode the target variable with label encoding
le = LabelEncoder()
# ravel() flattens the single-column array to the 1-d shape (n_samples,) that LabelEncoder expects
ytrain_le = le.fit_transform(ytrain.values.ravel())
ytrain_le
array([ 7, 7, 10, ..., 7, 7, 7])
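`LabelEncoder` assigns integer codes in sorted order of the class names, which explains the 7 and 10 in the output. A sketch that fits the encoder on the 12 destination labels listed in the introduction (rather than on the real ytrain):

```python
from sklearn.preprocessing import LabelEncoder

# The 12 possible destinations from the dataset description
classes = ['US', 'FR', 'CA', 'GB', 'ES', 'IT', 'PT', 'NL', 'DE', 'AU', 'NDF', 'other']
le = LabelEncoder().fit(classes)

# Sorted class names and their integer codes
print(dict(zip(le.classes_, range(len(le.classes_)))))

codes = le.transform(['NDF', 'NDF', 'US'])
print(codes)
# inverse_transform maps codes back to country labels, e.g. for a submission file
decoded = le.inverse_transform([7, 10, 11])
print(decoded)
```

Uppercase letters sort before lowercase ones, so 'other' receives the highest code.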
Take 10% of the data for model training
# Let us take 10% of the data for faster training.
n = int(xtrain.shape[0]*0.1)
xtrain_new = xtrain.iloc[:n, :] # training features
ytrain_new = ytrain_le[:n] # target variable of the training data
Standardize the dataset
X_scaler = StandardScaler()
xtrain_new = X_scaler.fit_transform(xtrain_new)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\preprocessing\data.py:617: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
return self.partial_fit(X, y)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\base.py:462: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
return self.fit(X, **fit_params).transform(X)
from sklearn.metrics import make_scorer
def dcg_score(y_true, y_score, k=5):
    """
    Computes the DCG of a single sample.
    y_true : array with one entry per class
        Ground truth (true relevance of each class, one-hot encoded).
    y_score : array, shape = [n_classes]
        Predicted scores of each class.
    k : int
        Rank, i.e. how many top-scored classes are taken into account.
    """
    order = np.argsort(y_score)[::-1]  # class indices sorted from highest to lowest score
    y_true = np.take(y_true, order[:k])  # relevance of the k highest-scored classes
    gain = 2 ** y_true - 1
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gain / discounts)
def ndcg_score(ground_truth, predictions, k=5):
    """
    Parameters
    ----------
    ground_truth : array, shape = [n_samples]
        Ground truth (true labels represented as integers).
    predictions : array, shape = [n_samples, n_classes]
        Predicted probabilities.
    k : int
        Rank.
    """
    lb = LabelBinarizer()
    # One indicator column per class (plus one spare column so that the two-class case
    # is not collapsed into a single column). Fitting on the number of classes instead of
    # the number of samples gives the same scores and avoids building an n_samples x n_samples matrix.
    lb.fit(range(predictions.shape[1] + 1))
    T = lb.transform(ground_truth)
    scores = []
    # Iterate over each y_true and compute the DCG score
    for y_true, y_score in zip(T, predictions):
        actual = dcg_score(y_true, y_score, k)
        best = dcg_score(y_true, y_true, k)
        score = float(actual) / float(best)
        scores.append(score)
    return np.mean(scores)
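Because each user has exactly one true destination, the metric simplifies considerably. The gain 2^rel - 1 is 1 for the true class and 0 for every other class, and the ideal DCG is 1 (true class ranked first), so NDCG@k for one sample is 1 / log2(rank + 1) when the true class is at position rank within the top k, and 0 otherwise. The helper below, `ndcg_single`, is a hypothetical name introduced only to illustrate this special case on made-up scores:

```python
import numpy as np

def ndcg_single(true_class, scores, k=5):
    """NDCG@k for one sample with exactly one relevant class (illustrative helper)."""
    # Class indices ordered from highest to lowest predicted score, cut at k
    ranking = np.argsort(scores)[::-1][:k]
    hits = np.where(ranking == true_class)[0]
    if len(hits) == 0:
        return 0.0  # the true class is not among the top k
    rank = hits[0] + 1  # 1-based position of the true class
    return 1.0 / np.log2(rank + 1)

# Made-up scores for 4 classes; class 1 is ranked first, class 2 second, class 3 last
scores = np.array([0.10, 0.60, 0.25, 0.05])
print(ndcg_single(1, scores))       # true class ranked first
print(ndcg_single(2, scores))       # true class ranked second: 1 / log2(3)
print(ndcg_single(3, scores, k=3))  # true class outside the top 3
```

A correct first guess therefore scores 1, a correct second guess about 0.63, and the score keeps decaying for lower positions.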
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
lr = LogisticRegression(C = 1.0, penalty='l2', multi_class='ovr')
RANDOM_STATE = 2017  # random seed; it only takes effect when the folds are shuffled
# k-fold cross-validation with 5 folds. The folds are not shuffled, and recent scikit-learn
# versions raise an error if random_state is set while shuffle=False, so it is not passed here.
kf = KFold(n_splits=5)
train_score = []
cv_score = []
# select k, the number of top-ranked predictions that NDCG@k takes into account
k_ndcg = 3
# kf.split: Generate indices to split data into training and test set.
for train_index, test_index in kf.split(xtrain_new, ytrain_new):
    # Split the training data into a training part and a validation part; y is the target variable
    X_train, X_test = xtrain_new[train_index, :], xtrain_new[test_index, :]
    y_train, y_test = ytrain_new[train_index], ytrain_new[test_index]
    lr.fit(X_train, y_train)
    y_pred = lr.predict_proba(X_test)
    train_ndcg_score = ndcg_score(y_train, lr.predict_proba(X_train), k = k_ndcg)
    cv_ndcg_score = ndcg_score(y_test, y_pred, k=k_ndcg)
    train_score.append(train_ndcg_score)
    cv_score.append(cv_ndcg_score)
print ("\nThe training score is: {}".format(np.mean(train_score)))
print ("\nThe cv score is: {}".format(np.mean(cv_score)))
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
The training score is: 0.7595157690333219
The cv score is: 0.7417455860527811
Learning curve of logistic regression
# set the iterations
iteration = [1,5,10,15,20, 50, 100]
# 3 unshuffled folds; random_state is omitted because it has no effect without shuffling
kf = KFold(n_splits=3)
train_score = []
cv_score = []
# select a k:
k_ndcg = 5
for i, item in enumerate(iteration):
    # Train with an increasing cap on the number of solver iterations
    lr = LogisticRegression(C=1.0, max_iter=item, tol=1e-5, solver='newton-cg', multi_class='ovr')
    train_score_iter = []
    cv_score_iter = []
    for train_index, test_index in kf.split(xtrain_new, ytrain_new):
        X_train, X_test = xtrain_new[train_index, :], xtrain_new[test_index, :]
        y_train, y_test = ytrain_new[train_index], ytrain_new[test_index]
        lr.fit(X_train, y_train)
        y_pred = lr.predict_proba(X_test)
        train_ndcg_score = ndcg_score(y_train, lr.predict_proba(X_train), k = k_ndcg)
        cv_ndcg_score = ndcg_score(y_test, y_pred, k=k_ndcg)
        train_score_iter.append(train_ndcg_score)
        cv_score_iter.append(cv_ndcg_score)
    # Average over the folds for this iteration cap
    train_score.append(np.mean(train_score_iter))
    cv_score.append(np.mean(cv_score_iter))
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
# pad the y-axis slightly beyond the observed score range
ymin = np.min(cv_score)-0.05
ymax = np.max(train_score)+0.05
plt.figure(figsize=(9,4))
plt.plot(iteration, train_score, 'ro-', label = 'training')
plt.plot(iteration, cv_score, 'b*-', label = 'Cross-validation')
plt.xlabel("iterations")
plt.ylabel("Score")
plt.xlim(-5, np.max(iteration)+10)
plt.ylim(ymin, ymax)
# dashed vertical line marking iteration = 20
plt.plot(np.linspace(20,20,50), np.linspace(ymin, ymax, 50), 'g--')
plt.legend(loc = 'lower right', fontsize = 12)
plt.title("Score vs iteration learning curve")
plt.tight_layout()
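Analysis: as the number of iterations increases, the logistic regression model's score keeps rising; once the iteration count exceeds 20, the score is essentially unchanged.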
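The plateau read off the plot can also be located programmatically. The following is a minimal sketch, not part of the original notebook: `plateau_start` is a hypothetical helper, `tol` is an assumed tolerance, and the scores are made-up toy numbers rather than the values measured above.

```python
def plateau_start(iterations, scores, tol=1e-3):
    """Return the first iteration value after which no later score
    improves on the current one by tol or more."""
    for idx, it in enumerate(iterations):
        # Check every later point against the score at this iteration.
        if all(later - scores[idx] < tol for later in scores[idx + 1:]):
            return it
    raise ValueError("iterations must not be empty")


# Toy values for illustration only (not the notebook's measured scores).
toy_iterations = [1, 5, 10, 20, 50, 100]
toy_cv_scores = [0.60, 0.70, 0.76, 0.80, 0.8004, 0.8005]
print(plateau_start(toy_iterations, toy_cv_scores))
```

With these toy numbers the helper returns 20, because the later scores exceed the score at 20 by less than `tol`. In the notebook, the `iteration` and `cv_score` lists could be passed in the same way.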
# Changing the sample size
# fix max_iter at the best value found above: max_iter = 20
perc = [0.01,0.02,0.05,0.1,0.2,0.5,1]
# random_state has no effect without shuffle=True (newer scikit-learn versions
# reject that combination), so it is omitted; the folds remain unshuffled
kf = KFold(n_splits=3)
train_score = []
cv_score = []
# select a k:
k_ndcg = 5
for i, item in enumerate(perc):
    lr = LogisticRegression(C=1.0, max_iter=20, tol=1e-6, solver='newton-cg', multi_class='ovr')
    train_score_iter = []
    cv_score_iter = []
    # keep only the first `item` fraction of the training data
    n = int(xtrain_new.shape[0]*item)
    xtrain_perc = xtrain_new[:n, :]
    ytrain_perc = ytrain_new[:n]
    for train_index, test_index in kf.split(xtrain_perc, ytrain_perc):
        X_train, X_test = xtrain_perc[train_index, :], xtrain_perc[test_index, :]
        y_train, y_test = ytrain_perc[train_index], ytrain_perc[test_index]
        print(X_train.shape, X_test.shape)
        lr.fit(X_train, y_train)
        y_pred = lr.predict_proba(X_test)
        train_ndcg_score = ndcg_score(y_train, lr.predict_proba(X_train), k = k_ndcg)
        cv_ndcg_score = ndcg_score(y_test, y_pred, k=k_ndcg)
        train_score_iter.append(train_ndcg_score)
        cv_score_iter.append(cv_ndcg_score)
    # average the fold scores for this sample size
    train_score.append(np.mean(train_score_iter))
    cv_score.append(np.mean(cv_score_iter))
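The loop above scores each fold with NDCG at k = 5. As a rough illustration of what that metric measures when each user has exactly one true destination, here is a minimal pure-Python sketch. It is not the notebook's own `ndcg_score` helper: the names `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical, and the sketch assumes labels are integer class indices and relevance is 1 for the true class and 0 otherwise.

```python
import math


def ndcg_at_k(true_index, probs, k=5):
    """NDCG@k for one sample with exactly one relevant class."""
    # Rank class indices by descending predicted probability.
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    top_k = ranked[:k]
    if true_index not in top_k:
        return 0.0
    position = top_k.index(true_index) + 1  # 1-based rank of the true class
    # The ideal DCG is 1 (true class ranked first), so NDCG equals DCG here.
    return 1.0 / math.log2(position + 1)


def mean_ndcg_at_k(true_indices, prob_rows, k=5):
    """Average NDCG@k over all samples."""
    scores = [ndcg_at_k(t, p, k) for t, p in zip(true_indices, prob_rows)]
    return sum(scores) / len(scores)


# Toy probabilities over three classes (illustrative values only).
print(ndcg_at_k(0, [0.7, 0.2, 0.1]))  # true class ranked first
print(ndcg_at_k(1, [0.7, 0.2, 0.1]))  # true class ranked second
```

A correct top-ranked prediction scores 1, a true class in second place scores 1/log2(3), and a true class outside the top k scores 0, so the metric rewards placing the right country near the top of the ranked list.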
(142, 661) (71, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(142, 661) (71, 661)
(142, 661) (71, 661)
(284, 661) (142, 661)
(284, 661) (142, 661)
(284, 661) (142, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(711, 661) (356, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(711, 661) (356, 661)
(712, 661) (355, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(1422, 661) (712, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(1423, 661) (711, 661)
E:\Anaconda3\envs\sklearn\lib\site-packages\sklearn\utils\optimize.py:203: ConvergenceWarning: newton-cg failed to converge. Increase the number of iterations.
"number of iterations.", ConvergenceWarning)
(1423, 661) (711, 661)
(2846, 661) (1423, 661)
(2846, 661) (1423, 661)
(2846, 661) (1423, 661)
(7114, 661) (3558, 661)
(7115, 661) (3557, 661)
(7115, 661) (3557, 661)
(14230, 661) (7115, 661)
(14230, 661) (7115, 661)
(14230, 661) (7115, 661)
ymin = np.min(cv_score)-0.1
ymax = np.max(train_score)+0.1
plt.figure(figsize=(9,4))
plt.plot(np.array(perc)*100, train_score, 'ro-', label = 'training')
plt.plot(np.array(perc)*100, cv_score, 'bo-', label = 'Cross-validation')
plt.xlabel("Sample size (unit %)")
plt.ylabel("Score")
plt.xlim(-5, np.max(perc)*100+10)
plt.ylim(ymin, ymax)
plt.legend(loc = 'lower right', fontsize = 12)
plt.title("Score vs sample size learning curve")
plt.tight_layout()
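Analysis: as the amount of training data increases, the logistic regression model's prediction score on the held-out data (blue line) keeps rising. Because only 10% of the data was used to train the model, using all of the data may give better results.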
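The shape pairs printed during the learning-curve run, such as (711, 661) (356, 661), can be checked by hand. Adding each pair gives the subsample size (711 + 356 = 1067), and the split sizes are what an unshuffled 3-fold partition produces. The following is a minimal sketch of that arithmetic, assuming 3 folds where the first `n_samples % n_splits` folds receive one extra sample; `kfold_sizes` is a hypothetical helper written for this illustration, not part of the notebook or of scikit-learn.

```python
def kfold_sizes(n_samples, n_splits=3):
    """Return (train_size, test_size) for each fold of an unshuffled k-fold split."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds get one additional test sample.
    test_sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    return [(n_samples - t, t) for t in test_sizes]


# Subsample sizes obtained by adding the printed train and test row counts.
for n in (1067, 2134, 4269):
    print(n, kfold_sizes(n))
```

With 1067 rows the folds come out as 711/356, 711/356 and 712/355, which matches the row counts in the printed shapes.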
The tree-based models compared here are DecisionTree, RandomForest, AdaBoost, Bagging, ExtraTree and GraBoost (gradient boosting).
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import *
from sklearn.svm import SVC, LinearSVC, NuSVC
LEARNING_RATE = 0.1
N_ESTIMATORS = 50
RANDOM_STATE = 2017
MAX_DEPTH = 9
# Dictionary of tree-based classifiers to compare
clf_tree = {
    'DTree': DecisionTreeClassifier(max_depth=MAX_DEPTH,
                                    random_state=RANDOM_STATE),
    'RF': RandomForestClassifier(n_estimators=N_ESTIMATORS,
                                 max_depth=MAX_DEPTH,
                                 random_state=RANDOM_STATE),
    'AdaBoost': AdaBoostClassifier(n_estimators=N_ESTIMATORS,
                                   learning_rate=LEARNING_RATE,
                                   random_state=RANDOM_STATE),
    'Bagging': BaggingClassifier(n_estimators=N_ESTIMATORS,
                                 random_state=RANDOM_STATE),
    'ExtraTree': ExtraTreesClassifier(max_depth=MAX_DEPTH,
                                      n_estimators=N_ESTIMATORS,
                                      random_state=RANDOM_STATE),
    'GraBoost': GradientBoostingClassifier(learning_rate=LEARNING_RATE,
                                           max_depth=MAX_DEPTH,
                                           n_estimators=N_ESTIMATORS,
                                           random_state=RANDOM_STATE)
}
train_score = []
cv_score = []
# random_state is omitted: it has no effect without shuffle=True,
# and recent scikit-learn versions raise an error for that combination.
kf = KFold(n_splits=3)
k_ndcg = 5
for key in clf_tree.keys():
    clf = clf_tree.get(key)
    train_score_iter = []
    cv_score_iter = []
    for train_index, test_index in kf.split(xtrain_new, ytrain_new):
        X_train, X_test = xtrain_new[train_index, :], xtrain_new[test_index, :]
        y_train, y_test = ytrain_new[train_index], ytrain_new[test_index]
        clf.fit(X_train, y_train)
        y_pred = clf.predict_proba(X_test)
        train_ndcg_score = ndcg_score(y_train, clf.predict_proba(X_train), k=k_ndcg)
        cv_ndcg_score = ndcg_score(y_test, y_pred, k=k_ndcg)
        train_score_iter.append(train_ndcg_score)
        cv_score_iter.append(cv_ndcg_score)
    # Average the per-fold scores for this classifier
    train_score.append(np.mean(train_score_iter))
    cv_score.append(np.mean(cv_score_iter))
train_score_tree = train_score
cv_score_tree = cv_score
ymin = np.min(cv_score)-0.05
ymax = np.max(train_score)+0.05
x_ticks = list(clf_tree.keys())  # a plain list of labels for plt.xticks
plt.figure(figsize=(8,5))
plt.plot(range(len(x_ticks)), train_score_tree, 'ro-', label = 'training')
plt.plot(range(len(x_ticks)),cv_score_tree, 'bo-', label = 'Cross-validation')
plt.xticks(range(len(x_ticks)),x_ticks,rotation = 45, fontsize = 10)
plt.xlabel("Tree method", fontsize = 12)
plt.ylabel("Score", fontsize = 12)
plt.xlim(-0.5, 5.5)
plt.ylim(ymin, ymax)
plt.legend(loc = 'best', fontsize = 12)
plt.title("Different tree methods")
plt.tight_layout()
XGBoost, a model commonly used in Kaggle competitions
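import xgboost as xgb

def customized_eval(preds, dtrain):
    """Custom evaluation metric: mean NDCG@5 over all samples."""
    labels = dtrain.get_label()
    top = []
    # For each sample, keep the indices of the 5 highest-probability classes
    for i in range(preds.shape[0]):
        top.append(np.argsort(preds[i])[::-1][:5])
    # mat[i, j] is 1 if the j-th ranked class of sample i is the true label
    mat = np.reshape(np.repeat(labels, np.shape(top)[1]) == np.array(top).ravel(),
                     np.array(top).shape).astype(int)
    # Discount each hit by log2(rank + 1), with rank starting at 1
    score = np.mean(np.sum(mat / np.log2(np.arange(2, mat.shape[1] + 2)), axis=1))
    return 'ndcg5', score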
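To make the metric in `customized_eval` concrete, here is a minimal single-sample sketch in plain Python. It assumes, as in this task, that each user has exactly one correct destination, so the ideal DCG is 1 and NDCG@5 reduces to 1 / log2(rank + 1) when the true class is ranked in the top 5, and 0 otherwise. The function name `ndcg_at_k` and the probabilities are made up for illustration.

```python
import math


def ndcg_at_k(true_label, scores, k=5):
    """NDCG@k for one sample that has exactly one relevant class."""
    # Rank class indices by descending score and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    if true_label not in ranked:
        return 0.0
    position = ranked.index(true_label)  # 0-based rank of the true class
    # DCG is 1 / log2(rank + 1) with a 1-based rank; the ideal DCG is 1.
    return 1.0 / math.log2(position + 2)


probs = [0.05, 0.60, 0.25, 0.10]  # made-up class probabilities
print(ndcg_at_k(1, probs))  # true class ranked first
print(ndcg_at_k(2, probs))  # true class ranked second
```

A true class ranked first scores 1, ranked second scores 1 / log2(3), and anything outside the top 5 scores 0. Averaging these per-sample values gives the same quantity that `customized_eval` computes in vectorized form.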
# xgboost parameters
NUM_XGB = 200
params = {}
params['colsample_bytree'] = 0.6
params['max_depth'] = 6
params['subsample'] = 0.8
params['eta'] = 0.3
params['seed'] = RANDOM_STATE
params['num_class'] = 12
params['objective'] = 'multi:softprob' # output the probability instead of class.
train_score_iter = []
cv_score_iter = []
# random_state is omitted: it has no effect without shuffle=True,
# and recent scikit-learn versions raise an error for that combination.
kf = KFold(n_splits=3)
k_ndcg = 5
for train_index, test_index in kf.split(xtrain_new, ytrain_new):
    X_train, X_test = xtrain_new[train_index, :], xtrain_new[test_index, :]
    y_train, y_test = ytrain_new[train_index], ytrain_new[test_index]
    train_xgb = xgb.DMatrix(X_train, label=y_train)
    test_xgb = xgb.DMatrix(X_test, label=y_test)
    watchlist = [(train_xgb, 'train'), (test_xgb, 'test')]
    bst = xgb.train(params,
                    train_xgb,
                    num_boost_round=NUM_XGB,
                    evals=watchlist,
                    feval=customized_eval,  # newer xgboost releases call this custom_metric
                    maximize=True,  # NDCG is higher-is-better; the custom name 'ndcg5' is not auto-detected
                    verbose_eval=3,
                    early_stopping_rounds=5)
    #bst = xgb.train( params, dtrain, num_round, evallist )
    y_pred = np.array(bst.predict(test_xgb))
    y_pred_train = np.array(bst.predict(train_xgb))
    train_ndcg_score = ndcg_score(y_train, y_pred_train, k=k_ndcg)
    cv_ndcg_score = ndcg_score(y_test, y_pred, k=k_ndcg)
    train_score_iter.append(train_ndcg_score)
    cv_score_iter.append(cv_ndcg_score)
# Average the per-fold scores
train_score_xgb = np.mean(train_score_iter)
cv_score_xgb = np.mean(cv_score_iter)
print("\nThe training score is: {}".format(train_score_xgb))
print("The cv score is: {}\n".format(cv_score_xgb))
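The training call uses `early_stopping_rounds = 5` together with a custom metric, so the direction of the metric matters. Below is a minimal sketch of patience-based early stopping. It is not XGBoost's actual implementation; `early_stop` is a hypothetical helper and the scores are invented for illustration. It shows how a higher-is-better metric such as NDCG, if treated as lower-is-better, makes round 0 look like the best round, which may be what the "Best iteration: [0]" message in the log reflects.

```python
def early_stop(scores, patience=5, maximize=True):
    """Return (best_round, stop_round) for per-round validation scores."""
    best_round, best_score = 0, scores[0]
    for rnd, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if improved:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_round, rnd
    return best_round, len(scores) - 1


# Hypothetical NDCG-like scores that improve slowly and then plateau.
scores = [0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.75, 0.75]
print(early_stop(scores, maximize=True))
print(early_stop(scores, maximize=False))
```

With `maximize=True` the best round is 5 and training runs to the end of the list; with `maximize=False` every later round counts as worse than round 0, so training stops at round 5 with round 0 reported as best.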
[0] train-merror:0.432818 test-merror:0.509487 train-ndcg5:0.793868 test-ndcg5:0.746247
Multiple eval metrics have been passed: 'test-ndcg5' will be used for early stopping.
Will train until test-ndcg5 hasn't improved in 5 rounds.
[3] train-merror:0.414266 test-merror:0.492762 train-ndcg5:0.805691 test-ndcg5:0.753109
Stopping. Best iteration:
[0] train-merror:0.432818 test-merror:0.509487 train-ndcg5:0.793868 test-ndcg5:0.746247
[0] train-merror:0.453619 test-merror:0.47688 train-ndcg5:0.780043 test-ndcg5:0.771609
Multiple eval metrics have been passed: 'test-ndcg5' will be used for early stopping.
Will train until test-ndcg5 hasn't improved in 5 rounds.
[3] train-merror:0.433661 test-merror:0.451441 train-ndcg5:0.793304 test-ndcg5:0.783746
Stopping. Best iteration:
[0] train-merror:0.453619 test-merror:0.47688 train-ndcg5:0.780043 test-ndcg5:0.771609
[0] train-merror:0.450949 test-merror:0.478426 train-ndcg5:0.782735 test-ndcg5:0.756588
Multiple eval metrics have been passed: 'test-ndcg5' will be used for early stopping.
Will train until test-ndcg5 hasn't improved in 5 rounds.
[3] train-merror:0.425088 test-merror:0.459873 train-ndcg5:0.798643 test-ndcg5:0.771855
Stopping. Best iteration:
[0] train-merror:0.450949 test-merror:0.478426 train-ndcg5:0.782735 test-ndcg5:0.756588
The training score is: 0.8033695668676714
The cv score is: 0.7713294556308351
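A note on the early-stopping lines in the log: in both folds shown here, round 0 is reported as the best iteration even though test-ndcg5 is higher at round 3 than at round 0 (0.783746 vs 0.771609 in the first fold). That is the pattern one would expect if early stopping treated the custom ndcg5 metric as something to minimize. `xgb.train` has a `maximize` argument for this case; whether the original call set it is not visible in this part of the notebook, so take this as a likely explanation rather than a confirmed one. The sketch below is a plain-Python imitation of the stopping rule, not XGBoost's own code. The function name `best_round` and the score list are made up for illustration.

```python
def best_round(scores, patience, maximize):
    """Return (best_index, stop_index) for a simple early-stopping rule.

    scores   : validation metric per boosting round
    patience : stop once this many rounds pass without improvement
    maximize : True if a larger metric is better (as with NDCG)
    """
    best_idx = 0
    best = scores[0]
    for i in range(1, len(scores)):
        improved = scores[i] > best if maximize else scores[i] < best
        if improved:
            best, best_idx = scores[i], i
        elif i - best_idx >= patience:
            return best_idx, i
    return best_idx, len(scores) - 1


# Rounds 0 and 3 are copied from the first fold's log. The other values are
# made-up placeholders, because the log does not print those rounds.
ndcg5 = [0.771609, 0.775, 0.780, 0.783746, 0.784, 0.785, 0.786]

print(best_round(ndcg5, patience=5, maximize=False))  # (0, 5): NDCG treated as a loss
print(best_round(ndcg5, patience=5, maximize=True))   # (6, 6): NDCG treated as a score
```

With `maximize=False` the rule stops at round 5 and reports round 0 as best, which matches the shape of the log above. If that is indeed what happened, passing `maximize=True` to `xgb.train` would let training continue while test-ndcg5 keeps rising.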
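# Combine the CV scores of the tree-based models with the XGBoost CV score
model_cvscore = np.hstack((cv_score_tree, cv_score_xgb))
# Model names, in the same order as the scores above
model_name = np.array(['ExtraTree','DTree','RF','GraBoost','Bagging','AdaBoost','Xgboost'])
fig = plt.figure(figsize=(8,4))
# Horizontal bar chart; x and y are passed by keyword, which newer seaborn versions require
sns.barplot(x=model_cvscore, y=model_name, palette="Blues_d")
plt.xticks(rotation=0, size = 10)
plt.xlabel("CV score", fontsize = 12)
plt.ylabel("Model", fontsize = 12)
plt.title("Cross-validation score for different models")
plt.tight_layout()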
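The bar chart depends on `model_cvscore` and `model_name` having the same length and the same order. Here is a small helper that checks this and sorts the models from best to worst before plotting. It is only a sketch: `rank_models` is a hypothetical name, and the demo scores are placeholders, not results from this notebook.

```python
def rank_models(names, scores):
    """Pair each model name with its CV score, best score first."""
    names, scores = list(names), list(scores)
    if len(names) != len(scores):
        raise ValueError(
            "got %d names but %d scores" % (len(names), len(scores)))
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only; NOT results from this notebook.
demo_names = ['ExtraTree', 'DTree', 'RF']
demo_scores = [0.70, 0.65, 0.75]

for name, score in rank_models(demo_names, demo_scores):
    print("%-10s %.4f" % (name, score))
```

In the notebook itself, the call would be `rank_models(model_name, model_cvscore)`, and the sorted pairs could then be unzipped and passed to `sns.barplot`.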
Original: https://www.cnblogs.com/chenxiangzhen/p/10799924.html