The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
KNN stores all available cases and classifies a new case (or, in regression, predicts its value) based on a similarity measure.
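To make the idea concrete, here is a toy illustration (in Python, with made-up data) of classifying one new case by Euclidean distance and majority vote; the same logic underlies the R and Python implementations below.
import numpy as np
from collections import Counter

# Toy training data: two predictors and binary labels (hypothetical values)
Xtr = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.9]])
ytr = np.array([0, 0, 1, 1, 0])
xnew = np.array([1.4, 1.5])   # new case to classify
k = 3

# Euclidean distance from the new case to every stored case
dist = np.linalg.norm(Xtr - xnew, axis=1)

# Labels of the k nearest neighbours, then majority vote
nearest = np.argsort(dist)[:k]
print(Counter(ytr[nearest]).most_common(1)[0][0])
## 0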
Data Description: The bank possesses demographic and transactional data on its loan customers. If the bank has a robust model to predict defaulters, it can allocate its resources better.
Objective: To predict whether a customer applying for a loan will be a defaulter.
Importing data and removing unwanted variables
# Read the data; drop the serial number, age, and response variables,
# keeping only the continuous predictors
bankloan <- read.csv("BANK LOAN KNN.csv", header = TRUE)
bankloan2 <- subset(bankloan, select = c(-AGE, -SN, -DEFAULTER))
head(bankloan2)
## EMPLOY ADDRESS DEBTINC CREDDEBT OTHDEBT
## 1 17 12 9.3 11.36 5.01
## 2 2 0 17.3 1.79 3.06
## 3 12 11 3.6 0.13 1.24
## 4 3 4 24.4 1.36 3.28
## 5 24 14 10.0 3.93 2.47
## 6 6 9 16.3 1.72 3.01
Scaling variables
# Standardize each predictor to mean 0 and standard deviation 1, since
# KNN's distance calculations are sensitive to the scale of the variables
bankloan3 <- scale(bankloan2)
head(bankloan3)
## EMPLOY ADDRESS DEBTINC CREDDEBT OTHDEBT
## 1 1.5656796 0.6216799 -0.2881684 3.8774339687 0.51519694
## 2 -0.8239988 -1.1852951 0.7889154 0.0289356115 -0.02571385
## 3 0.7691201 0.4710987 -1.0555906 -0.6386200074 -0.53056393
## 4 -0.6646869 -0.5829701 1.7448273 -0.1439854223 0.03531198
## 5 2.6808628 0.9228424 -0.1939235 0.8895193612 -0.18937404
## 6 -0.1867512 0.1699362 0.6542799 0.0007856758 -0.03958336
Creating training and testing data sets
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# createDataPartition() returns row indices for a 70% training split;
# list = FALSE returns the indices as a matrix rather than a list
index <- createDataPartition(bankloan$SN, p = 0.7, list = FALSE)
head(index)
## Resample1
## [1,] 3
## [2,] 4
## [3,] 5
## [4,] 7
## [5,] 8
## [6,] 10
# Scaled predictors for the training and test rows
traindata <- bankloan3[index, ]
testdata  <- bankloan3[-index, ]
dim(traindata)
## [1] 273 5
dim(testdata)
## [1] 116 5
Creating class vectors
# Response labels corresponding to the training and test rows
Ytrain <- bankloan$DEFAULTER[index]
Ytest  <- bankloan$DEFAULTER[-index]
KNN classification (Continuous predictors)
knn() in the package "class" performs k-nearest neighbour classification of the test set using the training data. Distance is calculated by the Euclidean measure, and the classification is decided by majority vote among the k nearest neighbours, with ties broken at random.
library(class)
# Classify each test case by majority vote among its 20 nearest training cases
model <- knn(train = traindata, test = testdata, cl = Ytrain, k = 20)
The same BANK LOAN data is now used to fit a KNN classifier in Python.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, accuracy_score, roc_curve,
                             roc_auc_score)
Importing data and removing unwanted variables
# Read the data and drop the serial number and age; DEFAULTER is kept
# here because the predictors and response are separated below
bankloan = pd.read_csv("BANK LOAN KNN.csv")
bankloan1 = bankloan.drop(['SN', 'AGE'], axis=1)
bankloan1.head()
## EMPLOY ADDRESS DEBTINC CREDDEBT OTHDEBT DEFAULTER
## 0 17 12 9.3 11.36 5.01 1
## 1 2 0 17.3 1.79 3.06 1
## 2 12 11 3.6 0.13 1.24 0
## 3 3 4 24.4 1.36 3.28 1
## 4 24 14 10.0 3.93 2.47 0
Creating training and testing data sets
# Separate the predictors from the response
X = bankloan1.loc[:, bankloan1.columns != 'DEFAULTER']
y = bankloan1.loc[:, 'DEFAULTER']

# 70/30 train-test split; random_state fixed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.30,
                                                    random_state=999)
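The split above is not stratified on DEFAULTER, so it can be worth checking that both parts contain a similar share of defaulters; a quick sketch (the printed proportions depend on the data):
# Proportion of defaulters in the training and test responses
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))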
Preparing/Scaling variables
# Fit the scaler on the training data only, then apply it to both sets,
# so the test data is standardized with the training means and SDs
scaler = StandardScaler()
scaler.fit(X_train)
## StandardScaler(copy=True, with_mean=True, with_std=True)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
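As a quick sanity check, the scaled training columns should now have mean approximately 0 and standard deviation 1 (the test columns only approximately so, since they were scaled with the training parameters):
print(X_train.mean(axis=0).round(2))  # approximately all zeros
print(X_train.std(axis=0).round(2))   # approximately all ones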
Building the KNN Classifier (Continuous Predictors)
KNeighborsClassifier() from sklearn.neighbors performs k-nearest neighbour classification of the test set using the training data. Here the number of neighbours is set by a common rule of thumb: the square root of the number of observations.
# Rule-of-thumb choice of k: the square root of the sample size, rounded
KNNclassifier = KNeighborsClassifier(n_neighbors =
                                     int(np.sqrt(len(X)).round()))
KNNclassifier.fit(X_train, y_train)
## KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
## metric_params=None, n_jobs=None, n_neighbors=20, p=2,
## weights='uniform')
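The metrics imported earlier can now be used to assess the fitted classifier on the held-out test data; a minimal sketch (the printed values depend on the actual data):
# Predicted labels and defaulter probabilities for the test set
y_pred = KNNclassifier.predict(X_test)
y_prob = KNNclassifier.predict_proba(X_test)[:, 1]

# Confusion matrix and summary measures
print(confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, y_prob))
In practice, k is often tuned, for example by cross-validation, rather than fixed by the square-root rule.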