Support Vector Machine: R vs Python

About Support Vector Machines

Support Vector Machines (SVMs) are a supervised learning method used mainly for classification problems. The first paper dates back to the early 1960s, but it was only in 1992-1995 that this powerful method was widely adopted as a mainstream machine learning paradigm.

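The basic idea is to find a hyperplane that separates the d-dimensional data into its classes with the largest possible margin. The margin is the distance from the hyperplane to the nearest training points of each class, and those points are called the support vectors. In practice a "soft" margin allows some points to be misclassified, with the trade-off controlled by the cost parameter C. Training data is often not linearly separable. To handle this, SVMs introduce the notion of a “kernel-induced feature space”, which maps the data into a higher-dimensional space where a separating hyperplane can be found.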
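The following minimal sketch illustrates this idea. The toy XOR data and the feature map are illustrative assumptions and are not part of the bank loan example. No straight line in two dimensions can split the four XOR points. Adding the product x1*x2 as a third coordinate makes them separable by a plane.

```python
import numpy as np
from sklearn.svm import SVC

# Toy XOR data: opposite corners share a class, so no straight line separates them
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# Linear SVM in the original 2-D space: cannot classify all four points correctly
raw_acc = SVC(kernel='linear').fit(X, y).score(X, y)

# Hypothetical explicit feature map phi(x) = (x1, x2, x1*x2) into 3-D space
X_mapped = np.column_stack([X, X[:, 0] * X[:, 1]])

# The same linear SVM on the mapped data separates the classes with the plane x1*x2 = 0
mapped_acc = SVC(kernel='linear').fit(X_mapped, y).score(X_mapped, y)

print("2-D training accuracy:", raw_acc)
print("3-D training accuracy:", mapped_acc)
```

A kernel achieves the same effect without building the extra coordinates explicitly.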

SVC Kernels
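In Python, scikit-learn's SVC class supports several kernels, so you can choose one to suit the dataset. A kernel is a function that returns the inner product of two observations in a transformed feature space without computing that transformation explicitly. This is known as the “kernel trick”. It lets an SVM fit a non-linear decision boundary in the original input space by working implicitly in a higher-dimensional space, where the classes may be separable. Kernels are therefore most useful for problems that are not linearly separable, and a suitable kernel can produce a more accurate classifier.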
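Here is a small worked check of the kernel trick. The degree-2 kernel and the two vectors are illustrative choices. The kernel K(x, z) = (x · z)^2 is computed entirely in 2-D. It equals the dot product of the explicit 3-D feature maps phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

via_kernel = poly2_kernel(x, z)        # (1*3 + 2*4)^2 = 11^2 = 121
via_features = np.dot(phi(x), phi(z))  # 9 + 48 + 64 = 121, up to floating-point rounding
print(via_kernel, via_features)
```

For higher degrees, and especially for the RBF kernel, the explicit feature space grows very large or becomes infinite. Evaluating the kernel directly is then the only practical option.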

Linear Kernel: The linear kernel is the ordinary dot product of two observations, K(x, z) = x · z. It is the sum of the products of each pair of corresponding input values.
Polynomial Kernel: The polynomial kernel generalizes the linear kernel: K(x, z) = (γ x · z + r)^d. The symbols γ, r and d correspond to SVC's gamma, coef0 and degree parameters. With d > 1 it can model curved (nonlinear) decision boundaries.
Radial Basis Function Kernel: The radial basis function (RBF) kernel, K(x, z) = exp(−γ ||x − z||²), is one of the most popular kernels for SVM classification. Its implicit feature space is infinite-dimensional.
Sigmoid Kernel: The sigmoid kernel, K(x, z) = tanh(γ x · z + r), resembles a neural network activation function. It can be used as a proxy for a simple neural network.
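The four formulas above can be written directly in NumPy and checked against scikit-learn's pairwise kernel functions. This sketch uses random data, and the values of gamma, coef0 (r) and degree (d) are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 illustrative observations with 3 features
Z = rng.normal(size=(4, 3))   # 4 more observations

gamma, coef0, degree = 0.5, 1.0, 3   # arbitrary illustrative hyperparameters

dot = X @ Z.T                                                  # pairwise x . z
sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # pairwise ||x - z||^2

K_linear = dot
K_poly = (gamma * dot + coef0) ** degree
K_rbf = np.exp(-gamma * sq_dist)
K_sigmoid = np.tanh(gamma * dot + coef0)

# Compare each hand-written kernel matrix with scikit-learn's implementation
print(np.allclose(K_linear, linear_kernel(X, Z)))
print(np.allclose(K_poly, polynomial_kernel(X, Z, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_rbf, rbf_kernel(X, Z, gamma=gamma)))
print(np.allclose(K_sigmoid, sigmoid_kernel(X, Z, gamma=gamma, coef0=coef0)))
```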

Data Description: The bank holds demographic and transactional data on its loan customers. With a robust model to predict defaulters, the bank can allocate its resources better.

Objective: To predict whether the customer applying for the loan will be a defaulter.

SVM using R

Importing data

bankloan<-read.csv("BANK LOAN.csv",header=TRUE)
bankloan$AGE<-as.factor(bankloan$AGE)
str(bankloan)

## 'data.frame':    700 obs. of  8 variables:
##  $ SN       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ AGE      : Factor w/ 3 levels "1","2","3": 3 1 2 3 1 3 2 3 1 2 ...
##  $ EMPLOY   : int  17 10 15 15 2 5 20 12 3 0 ...
##  $ ADDRESS  : int  12 6 14 14 0 5 9 11 4 13 ...
##  $ DEBTINC  : num  9.3 17.3 5.5 2.9 17.3 10.2 30.6 3.6 24.4 19.7 ...
##  $ CREDDEBT : num  11.36 1.36 0.86 2.66 1.79 ...
##  $ OTHDEBT  : num  5.01 4 2.17 0.82 3.06 ...
##  $ DEFAULTER: int  1 0 0 0 1 0 0 0 1 0 ...

Running SVM model

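library(e1071)  # svm() is e1071's interface to the libsvm library
# Linear-kernel C-classification of DEFAULTER on all predictors except SN;
# probability=TRUE also fits a model for class probability estimates
model<-svm(formula=DEFAULTER~AGE+EMPLOY+ADDRESS+
           DEBTINC+CREDDEBT+OTHDEBT,data=bankloan,
           type="C",probability=TRUE,kernel="linear")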
model

## 
## Call:
## svm(formula = DEFAULTER ~ AGE + EMPLOY + ADDRESS + DEBTINC + 
##     CREDDEBT + OTHDEBT, data = bankloan, type = "C", probability = TRUE, 
##     kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
## 
## Number of Support Vectors:  312
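The R call above relies on two e1071 defaults that matter when porting it to scikit-learn. First, svm() standardizes the predictors to zero mean and unit variance by default (scale = TRUE), whereas SVC does not. Second, the cost parameter defaults to 1, which matches SVC's C=1.0. Below is a rough Python counterpart, sketched under these assumptions. It uses synthetic make_classification data as a hypothetical stand-in for BANK LOAN.csv, so its support-vector count will not reproduce the 312 reported by R.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the bank loan data: 700 rows, 6 predictors, binary target
X, y = make_classification(n_samples=700, n_features=6, n_informative=4,
                           n_redundant=0, random_state=999)

# Standardize predictors as e1071::svm() does by default, then fit a linear
# C-classification SVM with cost 1 and probability estimates enabled
model = make_pipeline(StandardScaler(),
                      SVC(kernel='linear', C=1.0, probability=True, random_state=999))
model.fit(X, y)

svc = model.named_steps['svc']
print("Number of Support Vectors:", svc.n_support_.sum())  # total over both classes
print(model.predict_proba(X[:5]))  # class probabilities, like probability=TRUE in R
```

Like the R example, this fits on all rows. The Python walkthrough that follows fits on a 70% training split instead.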

SVM using Python

Importing libraries and data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score, accuracy_score, roc_curve, roc_auc_score

bankloan = pd.read_csv("BANK LOAN.csv")
bankloan['AGE'] = pd.Categorical(bankloan['AGE'])
bankloan.info()

## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 700 entries, 0 to 699
## Data columns (total 8 columns):
## SN           700 non-null int64
## AGE          700 non-null category
## EMPLOY       700 non-null int64
## ADDRESS      700 non-null int64
## DEBTINC      700 non-null float64
## CREDDEBT     700 non-null float64
## OTHDEBT      700 non-null float64
## DEFAULTER    700 non-null int64
## dtypes: category(1), float64(3), int64(4)
## memory usage: 39.2 KB

Running SVM model

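# Drop the serial number SN (an identifier, excluded from the R model too) and the target;
# one-hot encode AGE, as R's formula interface does, so SVC does not treat it as numeric
X = pd.get_dummies(bankloan.drop(columns=['SN', 'DEFAULTER']), columns=['AGE'], drop_first=True)
y = bankloan['DEFAULTER']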
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                   test_size=0.30, 
                                                   random_state = 999)

svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

## SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
##     decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
##     max_iter=-1, probability=False, random_state=None, shrinking=True,
##     tol=0.001, verbose=False)

SVC() creates a support vector classifier, and fit() trains it on the training data (X_train, y_train).

kernel= specifies the kernel type used by the algorithm: 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'. If it is not given, the default is 'rbf'.
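The kernel is just a parameter, so the kernels listed above can be compared on a held-out test set using the metrics imported earlier. The sketch below is illustrative only. It again uses synthetic make_classification data as a hypothetical stand-in for the bank loan file. It also adds feature scaling, because the polynomial, RBF and sigmoid kernels are sensitive to feature scale. For each kernel it reports accuracy and ROC AUC.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data, split 70/30 with the same random_state as the tutorial
X, y = make_classification(n_samples=700, n_features=6, n_informative=4,
                           n_redundant=0, random_state=999)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    random_state=999)

results = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # decision_function gives a continuous score, which is sufficient for ROC AUC
    scores = clf.decision_function(X_test)
    results[kernel] = (accuracy_score(y_test, y_pred), roc_auc_score(y_test, scores))

for kernel, (acc, auc) in results.items():
    print(f"{kernel:8s} accuracy={acc:.3f}  AUC={auc:.3f}")
```

Which kernel performs best depends on the data. The ranking on this synthetic set says nothing about the bank loan data.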