Summary of the KNN Algorithm
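Given a training set whose data and labels are known, KNN takes a test sample, compares its features with the corresponding features of every training sample, and finds the K training samples most similar to it. The test sample is then assigned the class that occurs most often among those K samples. The algorithm proceeds as follows:
- Compute the distance between the test sample and each training sample;
- Sort the training samples by increasing distance;
- Select the K points with the smallest distances;
- Count how often each class occurs among those K points;
- Return the most frequent class among the K points as the predicted class of the test sample.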
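The five steps above can be sketched in plain Python. This is only an illustration on a toy 2-D data set: the helper names `manhattan` and `knn_predict` and the sample points are made up for this sketch, and the Manhattan distance is chosen because the walkthrough later in this post uses it.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k):
    # Step 1: distance from the query to every training point.
    distances = [(manhattan(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 2: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the K nearest points.
    nearest = distances[:k]
    # Step 4: count how often each class occurs among them.
    votes = Counter(label for _, label in nearest)
    # Step 5: return the most frequent class.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
print(prediction)  # a
```

When two classes tie, `Counter.most_common` returns the one that was counted first, so a tie goes to the class of the nearer neighbour in this sketch.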
Loading the MNIST data
import tensorflow as tf
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
print(mnist.train.images.shape)
(55000, 784)
(10000, 784)
Setting parameters
trainNum = 55000  # total number of training images
Splitting the data
# generate non-repeating random numbers
trainData.shape= (5000, 784)
trainLabel.shape= (5000, 10)
testData.shape= (5, 784)
testLabel.shape= (5, 10)
testLabel= [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
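One way to draw such a non-repeating random subset is to sample row indices without replacement and use them to index the images and labels together. The sketch below uses NumPy on random stand-in data rather than the real MNIST arrays. The sizes are scaled down (550 rows, 50 sampled) to keep it light, and every name in it is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Scaled-down stand-ins for the 55000-image pool and the 5000-image sample.
pool_size, sample_size = 550, 50
num_features, num_classes = 784, 10

pool_images = rng.random((pool_size, num_features))
pool_digits = rng.integers(0, num_classes, size=pool_size)
pool_labels = np.eye(num_classes)[pool_digits]  # one-hot rows, like the labels above

# replace=False guarantees that no index is drawn twice.
sample_index = rng.choice(pool_size, size=sample_size, replace=False)

# The same indices select images and labels, so each image keeps its label.
sample_images = pool_images[sample_index]
sample_labels = pool_labels[sample_index]

print("sample_images.shape=", sample_images.shape)
print("sample_labels.shape=", sample_labels.shape)
```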
Training on the data
1 Define the variables
trainDataInput = tf.placeholder(shape=[None,784],dtype=tf.float32)
2 Compute the KNN distances, using the Manhattan distance
# expand_dims() adds a dimension
p1= (5, 1, 784)
p2= (5, 5000, 784)
p3= (5, 5000)
p3[0,0]= 107.035324
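The shapes printed above, (5, 1, 784) then (5, 5000, 784) then (5, 5000), are what broadcasting produces when every test image is subtracted from every training image. The NumPy sketch below reproduces that pattern on random stand-in data with only 50 training rows. The names `p1`, `p2` and `p3` are borrowed from the printed output, and the operations behind them are an assumption, not the post's TensorFlow code.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed
test_data = rng.random((5, 784))   # stand-in for the 5 test images
train_data = rng.random((50, 784)) # stand-in for the training images

# Add an axis so each test image can be paired with every training image.
p1 = np.expand_dims(test_data, 1)  # shape (5, 1, 784)

# Broadcasting stretches the size-1 axis across all training rows.
p2 = p1 - train_data               # shape (5, 50, 784)

# Manhattan distance: sum of absolute differences over the pixel axis.
p3 = np.sum(np.abs(p2), axis=2)    # shape (5, 50)

print("p1=", p1.shape)
print("p2=", p2.shape)
print("p3=", p3.shape)

# Cross-check one entry against a direct computation.
direct = float(np.sum(np.abs(test_data[0] - train_data[0])))
print(bool(np.isclose(p3[0, 0], direct)))
```

The intermediate array `p2` holds one difference per (test, train, pixel) triple, so its memory cost grows with the product of all three sizes.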
3 Select the K images with the smallest distances
# tf.negative(x, name=None): negation (f4 = -f3)
p4= (5, 5000)
p4[0,0]= -107.035324
p5= (5, 4)
p6= (5, 4)
p5 [[-58.270588 -63.31764 -66.56078 -66.59606 ]
[-50.70195 -59.564705 -60.10588 -60.713737 ]
[-10.211766 -13.3529415 -13.843139 -14.133332 ]
[-24.886272 -35.011753 -36.38429 -36.733334 ]
[ -8.498037 -9.266665 -11.807843 -12.474509 ]]
p6 [[3015 3148 3455 3798]
[4024 937 4708 4898]
[2627 4520 4514 3382]
[1312 4535 1769 3221]
[2512 4388 2169 2942]]
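The distances are negated because a top-k selection returns the largest values, and the largest negated distances are the smallest original ones. The NumPy sketch below shows the same idea on a small hand-made distance matrix. The names `p4`, `p5` and `p6` follow the printed output, while the data and the use of `argsort` are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical distances from 2 test samples to 6 training samples.
distances = np.array([
    [4.0, 1.0, 3.0, 0.5, 2.0, 9.0],
    [7.0, 6.0, 0.2, 8.0, 0.1, 5.0],
])
k = 4

p4 = -distances  # negated distances

# The k largest entries of p4 are the k smallest distances, so sorting the
# original distances in ascending order yields the same indices.
p6 = np.argsort(distances, axis=1)[:, :k]  # indices of the k nearest
p5 = np.take_along_axis(p4, p6, axis=1)    # their negated distances

print("p5", p5)
print("p6", p6)
```

As in the printed `p5` above, each row runs from the nearest neighbour (value closest to zero) to the farthest of the K.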
4 Count how often each class appears among the K images
# look up the label for each index
p7= (5, 4, 10)
p7[] [[[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]]
p8= (5, 10)
p8[]= [[0. 0. 0. 0. 4. 0. 0. 0. 0. 0.]
[0. 0. 4. 0. 0. 0. 0. 0. 0. 0.]
[0. 4. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 4. 0. 0. 0.]
[0. 4. 0. 0. 0. 0. 0. 0. 0. 0.]]
p9= (5,)
p9[]= [4 2 1 6 1]
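The progression from (5, 4, 10) to (5, 10) to (5,) is a vote: look up the one-hot label of each neighbour, add the K one-hot rows so that each column holds a class count, then take the column with the highest count. The NumPy sketch below follows that pattern on a small made-up label set. The names `p6` to `p9` mirror the printed output, but the data is hypothetical.

```python
import numpy as np

num_classes = 10

# Hypothetical digit labels of 12 training samples, as one-hot rows.
train_digits = np.array([4, 4, 2, 1, 4, 2, 1, 6, 6, 4, 2, 1])
train_labels = np.eye(num_classes)[train_digits]  # shape (12, 10)

# Hypothetical neighbour indices for 3 test samples, K = 4.
p6 = np.array([
    [0, 1, 4, 9],   # neighbours labelled 4, 4, 4, 4
    [2, 5, 10, 0],  # neighbours labelled 2, 2, 2, 4
    [3, 6, 11, 7],  # neighbours labelled 1, 1, 1, 6
])

p7 = train_labels[p6]   # one-hot label per neighbour, shape (3, 4, 10)
p8 = p7.sum(axis=1)     # votes per class, shape (3, 10)
p9 = p8.argmax(axis=1)  # winning class per test sample, shape (3,)

print("p7=", p7.shape)
print("p8[]=", p8)
print("p9[]=", p9)
```

Unlike the unanimous votes printed above, the second and third rows here are split 3 to 1, which shows that `argmax` still picks the majority class.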
5 Check the results
with tf.Session() as sess:
p10[]= [4 2 1 6 1]
ac= 100.0
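The accuracy figure can be reproduced by hand from the values printed in this post: decode each one-hot row of `testLabel` to a digit and compare it with the predicted digits. The sketch below does this in plain Python. The variable names are hypothetical, and reading `ac` as a percentage of matching predictions is an assumption based on the printed 100.0.

```python
# One-hot test labels, copied from the testLabel output earlier in the post.
test_label = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]
predicted = [4, 2, 1, 6, 1]  # the p9 values printed above

# Position of the 1 in each one-hot row is the true digit.
actual = [row.index(max(row)) for row in test_label]

# Percentage of test samples whose prediction matches the true digit.
matches = sum(1 for p, a in zip(predicted, actual) if p == a)
accuracy = matches * 100.0 / len(actual)

print("actual=", actual)      # actual= [4, 2, 1, 6, 1]
print("accuracy=", accuracy)  # accuracy= 100.0
```

With only 5 test images, each one is worth 20 percentage points, so a larger test sample would give a more informative figure.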