首页 > 其他 > 详细

机器学习实战-学习笔记-第二章

时间:2015-12-05 15:56:24      阅读:180      评论:0      收藏:0      [点我收藏+]

2.1节

1.切换到工作目录

2.在工作目录下新建一个python脚本文件kNN.py,内容如下:

from numpy import *
import operator

def createDataSet():
    group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels = [A,A,B,B]
    return group, labels



def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()
    classCount={}
    for i in range(k):
        voteILabel = labels[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
    

然后进入Python REPL:

F:\studio\MachineLearningInAction\ch02>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> from numpy import *
>>> import operator
>>> import kNN
>>> group, labels = kNN.createDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
[A, A, B, B]
>>> kNN.classify0([0,0], group, labels, 3)
B
>>>

 

2.2节

修改kNN.py文件,增加如下内容:

def getLabelID(labelName):
    if (labelName == "largeDoses"): return 3
    elif (labelName == "smallDoses"): return 2
    else: return 1

    
def file2matrix(filename):
    fr = open(filename)
    arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFromLine = line.split(\t)
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(getLabelID(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

注意这里和原书的代码不一样,这里我定义了一个从labelName到labelID的函数,由于测试数据中的label是名称。

 

输入如下脚本

>>> from numpy import *
>>> import operator
>>> import kNN
>>> datingDataMat,datingLabels = kNN.file2matrix(datingTestSet.txt)
>>> datingLabels[0:20]
[3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]
>>> import matplotlib
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.scatter(datingDataMat[:,1],datingDataMat[:,2])
<matplotlib.collections.PathCollection object at 0x0000000002E19278>
>>> plt.show()

注意最后一句plt.show(),如果没有这一句是没有显示的。

>>> ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15*array(datingLabels), 15*array(datingLabels))
<matplotlib.collections.PathCollection object at 0x00000000035A9908>
>>> plt.show()
>>>
from numpy import *
import operator
import kNN
datingDataMat,datingLabels = kNN.file2matrix(datingTestSet.txt)
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:,0],datingDataMat[:,1],15*array(datingLabels), 15*array(datingLabels))
plt.show()

接着对数据进行规范化:

reload(kNN)
normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
normMat
ranges
minVals

结果如下:

F:\studio\MachineLearningInAction\ch02>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> from numpy import *
>>> import operator
>>> import kNN
>>> datingDataMat,datingLabels = kNN.file2matrix(datingTestSet.txt)
>>> normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
>>> normMat
array([[ 0.44832535,  0.39805139,  0.56233353],
       [ 0.15873259,  0.34195467,  0.98724416],
       [ 0.28542943,  0.06892523,  0.47449629],
       ...,
       [ 0.29115949,  0.50910294,  0.51079493],
       [ 0.52711097,  0.43665451,  0.4290048 ],
       [ 0.47940793,  0.3768091 ,  0.78571804]])
>>> ranges
array([  9.12730000e+04,   2.09193490e+01,   1.69436100e+00])
>>> minVals
array([ 0.      ,  0.      ,  0.001156])
>>>

测试算法:向kNN.py中增加如下内容:

def datingClassTest():
    hoRatio = 0.10
    datingDataMat, datingLabels = file2matrix(datingTestSet.txt)
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:],datingLabels[numTestVecs:m],3)
        print "the classifier came back with: %d,  the real answer is: %d" %(classifierResult, datingLabels[i])
        if (classifierResult != datingLabels[i]): 
            errorCount += 1.0
            print "ERROR: the classifier came back with: %d,  the real answer is: %d" %(classifierResult, datingLabels[i])
    print "the total error rate is: %f" %(errorCount/float(numTestVecs))
    

在Python REPL中运行脚本

>>> reload(kNN)
<module kNN from kNN.py>
>>> kNN.datingClassTest()
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 2
ERROR: the classifier came back with: 1,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 3
ERROR: the classifier came back with: 2,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the total error rate is: 0.050000

 

机器学习实战-学习笔记-第二章

原文:http://www.cnblogs.com/littlesuccess/p/5021535.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!