Evaluation metrics for machine learning classification models

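Theoretical background on classification metrics

Precision and recall

Evaluation metrics in machine learning

Understanding the ROC curve and the area under it (AUC)

How is the ROC curve drawn?

A detailed look at sklearn's evaluation metrics for multiclass models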

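Testing the sklearn API

Using a classic binary classification dataset (the breast cancer dataset) as an example, this section tries out the model evaluation API in sklearn.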

# Imports
import sklearn.datasets as datasets
import sklearn.metrics as metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the dataset and hold out 20% of it for testing
data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=1)

# Train the model and predict
model = LogisticRegression()
model.fit(X_train, y_train)
y_predict = model.predict(X_test)
decision_scores = model.decision_function(X_test)  # decision scores, not predicted probabilities

# Evaluate
metrics.accuracy_score(y_test, y_predict)  # accuracy: 0.956
metrics.confusion_matrix(y_test, y_predict)  # confusion matrix: [[37,5],[0,72]]
tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_predict).ravel()
metrics.precision_score(y_test, y_predict)  # precision: 0.935
metrics.recall_score(y_test, y_predict)  # recall: 1.0
metrics.f1_score(y_test, y_predict)  # f1_score: 0.966
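
The four scores above follow directly from the confusion-matrix counts. As a sanity check, the sketch below recomputes them by hand from the matrix quoted in the comments ([[37, 5], [0, 72]]). The helper name `scores_from_counts` is made up for illustration and is not part of sklearn.

```python
def scores_from_counts(tn, fp, fn, tp):
    """Recompute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Counts read off the matrix quoted above, laid out as [[tn, fp], [fn, tp]].
acc, prec, rec, f1 = scores_from_counts(tn=37, fp=5, fn=0, tp=72)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# prints: 0.956 0.935 1.0 0.966
```

The rounded values agree with the numbers in the comments, which suggests those comments were copied from a real run on this split.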

PR curve

# Precision and recall at every threshold over the decision scores
precision, recall, thresholds = metrics.precision_recall_curve(y_test, decision_scores)
# Average precision (AP), a summary of the PR curve
AP = metrics.average_precision_score(y_test, decision_scores)

# Plot recall on the x-axis and precision on the y-axis, shading the area under the curve
plt.plot(recall, precision, label='AP = %0.4f'% AP)
plt.title("PR-curve")
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.legend(loc="lower right")
plt.fill_between(recall, precision, facecolor='purple', alpha=0.2)
plt.show()

[Figure: PR curve]
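
To make the threshold sweep behind the PR curve concrete, here is a minimal pure-Python sketch on a tiny made-up dataset. The labels, scores, and the helper names `pr_points` and `average_precision` are all hypothetical. The AP sum follows the step-wise formula given in the sklearn documentation, AP = Σ (R_n − R_{n−1}) · P_n, but this is an illustration rather than sklearn's actual implementation.

```python
def pr_points(labels, scores):
    """Return (threshold, precision, recall) per distinct score, highest threshold first."""
    total_pos = sum(labels)  # assumes at least one positive label
    points = []
    for t in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points

def average_precision(points):
    """Step-wise sum: each gain in recall is weighted by the precision at that threshold."""
    ap, prev_recall = 0.0, 0.0
    for _, p, r in points:
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Hypothetical toy data: 3 positives, 3 negatives, all scores distinct.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_points = pr_points(toy_labels, toy_scores)
toy_ap = average_precision(toy_points)
print(round(toy_ap, 4))  # prints: 0.8056
```

Lowering the threshold can only keep recall the same or raise it, while precision may move in either direction, which is why PR curves often look jagged.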

ROC curve

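# False positive rate and true positive rate at every threshold over the decision scores
fpr, tpr, thresholds = metrics.roc_curve(y_test, decision_scores)
# Area under the ROC curve
AUC = metrics.roc_auc_score(y_test, decision_scores)

# Plot FPR on the x-axis and TPR on the y-axis, shading the area under the curve
plt.plot(fpr, tpr, label='AUC = %0.4f'% AUC)
plt.title("ROC-curve")
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.legend(loc="lower right")
plt.fill_between(fpr, tpr, facecolor='purple', alpha=0.2)
plt.show()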

[Figure: ROC curve]
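
The ROC curve can be traced by hand in the same way. The pure-Python sketch below uses made-up toy data and hypothetical helper names (`roc_points`, `trapezoid_area`, `pairwise_auc`). It computes AUC two ways: as the trapezoidal area under the (FPR, TPR) points, and as the fraction of (positive, negative) pairs in which the positive sample gets the higher score. It also checks that a sigmoid applied to the scores leaves AUC unchanged, since AUC depends only on how the samples are ranked.

```python
import math

def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold."""
    total_pos = sum(labels)               # assumes both classes are present
    total_neg = len(labels) - total_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / total_neg, tp / total_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 3 positives, 3 negatives, all scores distinct.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_area = trapezoid_area(roc_points(toy_labels, toy_scores))
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(round(toy_area, 4), round(toy_auc, 4))  # prints: 0.7778 0.7778

# A strictly increasing transform such as the sigmoid keeps the ranking, so AUC is unchanged.
toy_squashed = [1 / (1 + math.exp(-s)) for s in toy_scores]
toy_auc_squashed = pairwise_auc(toy_labels, toy_squashed)
```

This ranking view is also why the decision scores can be passed to the ROC functions in place of predicted probabilities.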

References

Model evaluation in sklearn

sklearn metrics API