Python Machine Learning: Implementing KNN from Scratch

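1. Importing the data

Importing the data is straightforward with the pandas library (a third-party package that is installed separately from Python). The data used here is the red wine dataset, downloaded to a local file.

The code is as follows (example):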

import pandas as pd
def read_xlsx(csv_path):
    # Despite its name, this function reads a CSV file, not an Excel file
    data = pd.read_csv(csv_path)
    print(data)
    return data
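The later steps assume that the last column of the file holds the label and that every other column is a feature. As a minimal sketch of that layout, the example below reads a tiny in-memory CSV instead of the real wine file. The column names, the three sample rows, and the helper `load_features_and_labels` are made up for illustration and are not part of the original project.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the real wine file has more rows and columns.
SAMPLE_CSV = """alcohol,malic_acid,label
14.23,1.71,0
13.20,1.78,0
12.37,0.94,1
"""

def load_features_and_labels(source):
    """Read a CSV and split it into features (all but the last column) and labels (last column)."""
    data = pd.read_csv(source)
    features = data.iloc[:, :-1]
    labels = data.iloc[:, -1]
    return features, labels

features, labels = load_features_and_labels(io.StringIO(SAMPLE_CSV))
print(features.shape)   # (3, 2)
print(labels.tolist())  # [0, 0, 1]
```

If the label sits in a different column of your file, the slicing in the later functions has to change accordingly.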

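2. Normalization

KNN relies on distances between samples, so scaling is an important step: it removes the effect of the features' differing units and ranges. I used min-max normalization here. Standardization would also remove the units, but as a beginner I find normalization simpler.

The minimum and maximum are computed with the numpy library. Data imported by pandas is a DataFrame, and np.array() converts it into an ndarray that numpy can compute on.

The code is as follows (example):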

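import numpy as np
def MinMaxScaler(data):
    col = data.shape[1]
    # Scale every column except the last one, which holds the labels
    for i in range(0, col-1):
        arr = data.iloc[:, i]
        arr = np.array(arr) # convert the DataFrame column to an ndarray so numpy can work on it
        col_min = np.min(arr) # renamed to avoid shadowing the built-ins min and max
        col_max = np.max(arr)
        # Note: a constant column (col_max == col_min) would divide by zero here
        arr = (arr-col_min)/(col_max-col_min)
        data.iloc[:, i] = arr
    return data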
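To make the difference between normalization and standardization concrete, here is a minimal sketch that applies both to a small made-up array. The function names `min_max_scale` and `standardize`, the sample values, and the guard for constant columns are assumptions for illustration, not code from the original project.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each column to the range [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (x - col_min) / col_range

def standardize(x):
    """Shift each column to mean 0 and scale it to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std

# Two features on very different scales (illustrative values).
features = np.array([[1.0, 100.0],
                     [2.0, 300.0],
                     [3.0, 500.0]])

scaled = min_max_scale(features)
standardized = standardize(features)
print(scaled)        # each column becomes 0, 0.5, 1
print(standardized)  # each column has mean 0 and standard deviation 1
```

Either approach puts the two features on a comparable scale, so neither dominates the distance computation.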

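3. Splitting into training and test sets

First separate the feature values and the label values into x and y. Then set the random seed random_state; if it is not set, every run produces a different split. test_size is the fraction of the data used for the test set.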

def train_test_split(data, test_size=0.2, random_state=None):
    col = data.shape[1]
    x = data.iloc[:, 0:col-1]
    y = data.iloc[:, -1]
    x = np.array(x)
    y = np.array(y)
    # Fix the random seed when one is given (compared with None so that a seed of 0 also works)
    if random_state is not None:
        np.random.seed(random_state)
    # Shuffle the sample indices:
    # permutation returns the integers 0..len(x)-1 in random order
    shuffle_indexs = np.random.permutation(len(x))
    # Number of test samples (20% of the data with the default test_size)
    test_size = int(len(x) * test_size)
    # The first test_size shuffled indices become the test indices
    test_indexs = shuffle_indexs[:test_size]
    # The remaining shuffled indices become the training indices
    train_indexs = shuffle_indexs[test_size:]
    # Extract the training and test sets by index
    x_train = x[train_indexs]
    y_train = y[train_indexs]
    x_test = x[test_indexs]
    y_test = y[test_indexs]
    # Return the split datasets
    return x_train, x_test, y_train, y_test
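The effect of the seed can be checked directly. The sketch below is an assumption-labeled variant, not the author's function: it uses a hypothetical helper `split_indices` built on NumPy's `np.random.default_rng`, and it only shuffles indices, to show that the same seed always yields the same split.

```python
import numpy as np

def split_indices(n_samples, test_size=0.2, seed=None):
    """Return (train_indices, test_indices) from a seeded random shuffle."""
    rng = np.random.default_rng(seed)        # a local generator, no global state
    shuffled = rng.permutation(n_samples)    # 0..n_samples-1 in random order
    n_test = int(n_samples * test_size)      # number of test samples
    return shuffled[n_test:], shuffled[:n_test]

train_a, test_a = split_indices(10, test_size=0.2, seed=42)
train_b, test_b = split_indices(10, test_size=0.2, seed=42)

print(np.array_equal(test_a, test_b))  # True: same seed, same split
print(len(train_a), len(test_a))       # 8 2
```

Using a local generator rather than `np.random.seed` keeps the split reproducible without changing the random state seen by the rest of the program.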

4. Computing the distance

Euclidean distance is used here. The pow() function computes powers. length is the number of features, and it is used when finding the nearest neighbors.

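def CountDistance(train,test,length):
    distance = 0
    for x in range(length):
        # Accumulate the squared difference of each feature
        distance += pow(test[x] - train[x], 2)
    # Take the square root once, after summing; taking the root of each term
    # would give the Manhattan distance instead of the Euclidean distance
    return distance**0.5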
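The Euclidean distance between two samples a and b is the square root of the sum of squared feature differences, sqrt(sum_i (a_i - b_i)^2). As a minimal sketch, the same computation can be done for all training samples at once with numpy broadcasting. The function name `euclidean_distances` and the sample points are assumptions for illustration.

```python
import numpy as np

def euclidean_distances(x_train, sample):
    """Distance from one sample to every row of x_train."""
    diff = np.asarray(x_train, dtype=float) - np.asarray(sample, dtype=float)
    # Sum the squared differences per row, then take one square root per row.
    return np.sqrt((diff ** 2).sum(axis=1))

x_train = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [1.0, 1.0]])
sample = np.array([0.0, 0.0])

distances = euclidean_distances(x_train, sample)
print(distances)  # 0, 5, and the square root of 2
```

This avoids the Python-level loop over features and over training rows, which matters once the training set grows.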

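5. Selecting the nearest neighbors

Compute the distance between one test sample and every training sample, pick the k closest, and decide the label by majority vote. argsort returns the indices that would sort the values in ascending order; these indices are used to look up the corresponding labels.

Tip: computing the mode with numpy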

import numpy as np
nums = np.array([1, 2, 2, 3]) # example labels (illustrative values)
# bincount(): counts the occurrences of each non-negative integer; it does not accept floats
counts = np.bincount(nums)
# Return the mode
np.argmax(counts)
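Because `np.bincount` only accepts non-negative integers, labels stored as strings or floats need a different approach. One option, sketched here under the assumption that any comparable label type may occur, is `np.unique` with `return_counts=True`. The helper name `majority_label` is hypothetical.

```python
import numpy as np

def majority_label(labels):
    """Return the most frequent label; ties go to the smallest label in sort order."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

print(majority_label(["red", "white", "red"]))  # red
print(majority_label([1.5, 2.5, 2.5]))          # 2.5
```

`np.unique` returns its values sorted, and `np.argmax` picks the first maximum, which is why a tie resolves to the smallest label.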

Following the majority-vote rule, compute the mode of the neighbors' labels and return it as the predicted label.

def getNeighbor(x_train,test,y_train,k):
    distance = []
    # Number of features per sample
    length = x_train.shape[1]
    # Distance from the test sample to every training sample
    for x in range(x_train.shape[0]):
        dist = CountDistance(test, x_train[x], length)
        distance.append(dist)
    distance = np.array(distance)
    # Indices of the training samples, sorted by ascending distance
    distanceSort = distance.argsort()
    # Collect the labels of the k nearest training samples
    neighbors =[]
    for x in range(k):
        labels = y_train[distanceSort[x]]
        neighbors.append(labels)
    # Majority vote (bincount requires non-negative integer labels)
    counts = np.bincount(neighbors)
    label = np.argmax(counts)
    return label

To call the function:

getNeighbor(x_train,x_test[0],y_train,3)
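The whole neighbor search can also be written without explicit loops. The following is a minimal vectorized sketch, not the author's implementation: `knn_predict` is a hypothetical name, the five training points are made up, and the vote uses `np.unique` so that labels need not be non-negative integers.

```python
import numpy as np

def knn_predict(x_train, y_train, sample, k=3):
    """Predict the label of one sample by majority vote among its k nearest neighbors."""
    # Euclidean distance from the sample to every training row.
    distances = np.sqrt(((x_train - sample) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Two small clusters (illustrative values): class 0 near the origin, class 1 near (1, 1).
x_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [1.0, 1.0], [0.9, 1.0]])
y_train = np.array([0, 0, 0, 1, 1])

print(knn_predict(x_train, y_train, np.array([0.05, 0.05]), k=3))  # 0
print(knn_predict(x_train, y_train, np.array([0.95, 0.95]), k=3))  # 1
```

An odd k is a common choice for two-class problems because it rules out tied votes.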

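6. Computing the accuracy

Use the KNN algorithm above to predict the label of every sample in the test set and store the predictions in the result list. Then compare the predictions with the true values: the accuracy is the number of correct predictions divided by the total number of test samples.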

def getAccuracy(x_test,x_train,y_train,y_test):
    result = []
    k = 3
    # arr_label = getNeighbor(x_train, x_test[0], y_train, k)
    for x in range(len(x_test)):
        arr_label = getNeighbor(x_train, x_test[x], y_train, k)
        result.append(arr_label)
    correct = 0
    for x in range(len(y_test)):
        if result[x] == y_test[x]:
            correct += 1
    # print(correct)
    accuracy = (correct / float(len(y_test))) * 100.0
    print("Accuracy:", accuracy, "%")
    return accuracy
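The counting loop can be condensed with numpy: comparing two arrays element-wise gives booleans, and their mean is the fraction of correct predictions. This is a minimal sketch with a hypothetical helper name `accuracy_percent` and made-up labels.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # The mean of a boolean array is the fraction of True values.
    return float(np.mean(y_true == y_pred) * 100.0)

print(accuracy_percent([0, 1, 1, 2], [0, 1, 2, 2]))  # 75.0
```

The explicit shape check prevents a silent broadcast when the two label lists differ in length.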

Summary

KNN is arguably the simplest algorithm in machine learning and is relatively easy to implement, but as a beginner I still spent most of a day getting it to work.

I uploaded the project to GitHub: https://github.com/chenyi369/KNN
