Model Evaluation & AUC

Posted: 2015-03-23 00:26:30

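In machine learning there are many criteria for judging how good a model is; commonly used ones include accuracy, recall, and AUC. This post introduces AUC and how to compute it.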

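AUC is commonly used to evaluate a binary classification model. A binary classifier's prediction has 4 possible outcomes. Taking a hypertension diagnosis as the example: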

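True positive (TP): diagnosed as having hypertension, and actually has it.
False positive (FP): diagnosed as having hypertension, but actually does not.
True negative (TN): diagnosed as not having hypertension, and actually does not.
False negative (FN): diagnosed as not having hypertension, but actually has it.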

From these counts we obtain the true positive rate (TPR) and the false positive rate (FPR):

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
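As a minimal sketch of the two formulas above, the following computes TPR and FPR at a single threshold. The function name `tpr_fpr`, the threshold 0.5, and the rule "predict positive when score >= threshold" are illustrative assumptions, not part of the original post. The scores and labels are the four samples used later in this article, with label 2 treated as the positive class.

```python
def tpr_fpr(scores, labels, pos_label, threshold):
    """Return (TPR, FPR) when samples with score >= threshold are predicted positive.

    Assumes both classes are present, otherwise a denominator is zero.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_pos = score >= threshold
        actual_pos = label == pos_label
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

scores = [0.1, 0.4, 0.35, 0.8]
labels = [1, 1, 2, 2]
tpr, fpr = tpr_fpr(scores, labels, pos_label=2, threshold=0.5)
print(tpr, fpr)  # 0.5 0.0
```

Changing the threshold moves this point; each threshold gives one (FPR, TPR) pair.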

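If we take many different thresholds, we get a series of (FPR, TPR) points. Connecting these points gives a curve called the ROC (Receiver Operating Characteristic) curve; the area between this curve and the horizontal axis is the AUC. We can therefore compute AUC as follows: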

#!/usr/bin/env python3
import sys

def get_auc(arr_score, arr_label, pos_label):
    # Pair each score with its label, then sort by score, highest first.
    score_label_list = [(float(s), int(l)) for s, l in zip(arr_score, arr_label)]
    score_label_list_sorted = sorted(score_label_list, key=lambda pair: pair[0], reverse=True)

    fp, tp = 0, 0            # running false/true positive counts
    lastfp, lasttp = 0, 0    # counts at the previous distinct score
    A = 0.0                  # accumulated area, in count units
    lastscore = None

    for score, label in score_label_list_sorted:
        # Each new distinct score acts as a new threshold: close one trapezoid.
        if score != lastscore:
            A += trapezoid_area(fp, lastfp, tp, lasttp)
            lastscore = score
            lastfp, lasttp = fp, tp
        if label == pos_label:
            tp += 1
        else:
            fp += 1

    A += trapezoid_area(fp, lastfp, tp, lasttp)
    if fp == 0 or tp == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    # Convert counts to rates: divide by (#negatives * #positives).
    A /= (fp * tp)
    return A

def trapezoid_area(x1, x2, y1, y2):
    # Area of the trapezoid between x1 and x2 with side heights y1 and y2.
    delta = abs(x2 - x1)
    return delta * 0.5 * (y1 + y2)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Error!\nUsage: %s pred_model_file" % sys.argv[0])
        sys.exit(1)
    arr_score, arr_label = [], []
    # Each input line has the form: score<TAB>label
    with open(sys.argv[1]) as f:
        for line in f:
            fields = line.strip().split('\t')
            if len(fields) < 2:
                continue
            arr_score.append(fields[0])
            arr_label.append(fields[1])
    print(arr_score)
    print(arr_label)
    # Label 2 is treated as the positive class.
    print("AUC = %s" % get_auc(arr_score, arr_label, 2))

F:\python_workspace\offline_evaluation>python model_evaluation.py pred_model_file.txt
['0.1', '0.4', '0.35', '0.8']
['1', '1', '2', '2']
AUC = 0.75
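To make the 0.75 above easier to follow, here is a rough sketch that lists the ROC points for the same four samples and integrates them with the trapezoid rule. The helper names `roc_points` and `area_under` are hypothetical and not part of the original script; each distinct score is used as a threshold, and a sample is predicted positive when its score is at or above the threshold.

```python
def roc_points(scores, labels, pos_label):
    """Return (FPR, TPR) points, one per distinct score used as a threshold.

    Assumes both classes are present, otherwise a denominator is zero.
    """
    n_pos = sum(1 for l in labels if l == pos_label)
    n_neg = len(labels) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == pos_label)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l != pos_label)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under(points):
    """Trapezoid-rule area under a curve given as points ordered by x."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

points = roc_points([0.1, 0.4, 0.35, 0.8], [1, 1, 2, 2], pos_label=2)
print(points)              # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(area_under(points))  # 0.75
```

This recomputes the counts for every threshold, which is quadratic in the number of samples; the single sorted pass in the script above does the same work more efficiently.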

The same AUC value can also be obtained with scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html#sklearn.metrics.auc

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> pred = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
0.75

Another example:

F:\python_workspace\offline_evaluation>python model_evaluation.py tmp.txt
['0.1', '0.2', '0.4', '0.5', '0.35', '0.8', '0.9', '0.95']
['1', '2', '1', '1', '2', '2', '2', '1']
AUC = 0.5

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1,2,1,1,2,2,2,1])
>>> pred = np.array([0.1,0.2,0.4,0.5,0.35,0.8,0.9,0.95])
>>> fpr, tpr, ths = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr,tpr)
0.5

In both examples, scikit-learn returns the same AUC value as the hand-written code above.
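One further way to sanity-check both results uses a well-known property of AUC: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below is an addition and not the original author's method; the function name `pairwise_auc` is hypothetical, and label 2 is again assumed to be the positive class.

```python
def pairwise_auc(scores, labels, pos_label):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half.

    Assumes both classes are present, otherwise the denominator is zero.
    """
    pos = [s for s, l in zip(scores, labels) if l == pos_label]
    neg = [s for s, l in zip(scores, labels) if l != pos_label]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc_first = pairwise_auc([0.1, 0.4, 0.35, 0.8], [1, 1, 2, 2], 2)
auc_second = pairwise_auc([0.1, 0.2, 0.4, 0.5, 0.35, 0.8, 0.9, 0.95],
                          [1, 2, 1, 1, 2, 2, 2, 1], 2)
print(auc_first)   # 0.75
print(auc_second)  # 0.5
```

An AUC of 0.5 in the second example therefore means that the scores order a random positive/negative pair correctly only half the time, which is no better than chance.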

Source: http://blog.csdn.net/lming_08/article/details/44284155
