AI learning 2:SVM Algorithm

Exchange blogroll: laker.me
GitHub: https://github.com/younglaker

Preparation

Environment:

  • Anaconda Navigator
  • Python 3.6

Python packages (all included in Anaconda, so there is no need to install them separately):

  • numpy: for scientific computing
  • pandas: for data analysis
  • sklearn (scikit-learn): simple and efficient tools for data mining and data analysis

Code and data: https://pan.baidu.com/s/1wX3rXqWvjlnCFRk6yyXznQ

Coding

Import the packages and set their aliases.

import numpy as np
import pandas as pd
# Import SVC from the svm module
from sklearn.svm import SVC
# Import train_test_split from the model_selection module
from sklearn.model_selection import train_test_split

Load the training file 16.MNIST.train.csv. Each row is one picture flattened into a row of pixels: the first column is the label (the type) of that row, and the remaining numbers are the grayscale values of the picture's pixels.
We only use the last 5000 rows for training.

# inputData is a pandas DataFrame; the pandas indexer iloc selects the last 5000 rows.
inputData = pd.read_csv('./16.MNIST.train.csv')
inputData = inputData.iloc[-5000:, :]
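
To make the layout concrete, here is a small sketch that builds a stand-in table with the shape described above (a label column followed by pixel columns) and keeps only its last rows. The 28x28 image size, the column names pixel0, pixel1, ... and the random values are assumptions for illustration; check them against the actual CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 8 rows, a 'label' column plus 784 pixel columns.
rng = np.random.RandomState(0)
pixels = rng.randint(0, 256, size=(8, 784))
columns = ['pixel{}'.format(i) for i in range(784)]
fake_input = pd.DataFrame(pixels, columns=columns)
fake_input.insert(0, 'label', rng.randint(0, 10, size=8))

# Keep only the last 5 rows, the same way the tutorial keeps the last 5000.
last_rows = fake_input.iloc[-5:, :]

# One row without its label, reshaped into a 28x28 grayscale image.
image = last_rows.iloc[0, 1:].values.reshape(28, 28)
print(last_rows.shape, image.shape)
```

Reshaping a row like this is only needed for viewing a picture; the SVM itself is trained on the flat rows.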

============== Usage of pandas function iloc ==============
https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
There are two “arguments” to iloc – a row selector, and a column selector.

data.iloc[rows, columns]

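data.iloc[a:b, c:d] selects rows a up to (but not including) b and columns c up to (but not including) d, counting positions from 0.

data.iloc[a:, :d] selects from row a to the last row, and from the first column up to (but not including) column d.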

Single selections using iloc and DataFrame

# Rows:
data.iloc[0] # first row of data frame - Note a Series data type output.
data.iloc[1] # second row of data frame
data.iloc[-1] # last row of data frame
# Columns:
data.iloc[:,0] # first column of data frame
data.iloc[:,1] # second column of data frame
data.iloc[:,-1] # last column of data frame

Multiple columns and rows can be selected together using the .iloc indexer.

data.iloc[0:5] # first five rows of dataframe
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and 6th, 7th, 8th columns of data frame
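
The snippets above assume a DataFrame called data already exists. A runnable sketch on a small hypothetical frame (the name demo and its values are made up for illustration) shows that iloc slices exclude their end position:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: 4 rows, 5 columns, values 0..19.
demo = pd.DataFrame(np.arange(20).reshape(4, 5), columns=list('ABCDE'))

first_row = demo.iloc[0]       # Series holding the first row
block = demo.iloc[1:3, 2:4]    # rows at positions 1-2, columns C-D (ends excluded)
tail = demo.iloc[-2:, :]       # last two rows, all columns

print(block)
print(tail.shape)
```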

Python lists: https://www.w3schools.com/python/python_lists.asp

============== / End Usage of pandas function iloc ==============

In the training data, the first column is the label. We remove the label column with drop(['label'], axis=1) to get the pixel data, extract the labels into the variable 'label' with inputData['label'], and convert both to arrays with np.array().

# axis is passed by keyword; recent pandas versions no longer accept it positionally
data = np.array(inputData.drop(['label'], axis=1))
label = np.array(inputData['label'])
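
The same feature/label separation can be tried on a tiny table. This is a sketch with made-up values; the names tiny, features and targets are hypothetical and only stand in for inputData, data and label.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the training table: a label plus three pixel columns.
tiny = pd.DataFrame({
    'label':  [7, 2, 1],
    'pixel0': [0, 255, 12],
    'pixel1': [130, 0, 90],
    'pixel2': [200, 45, 101],
})

features = np.array(tiny.drop(columns=['label']))  # shape (3, 3), label column removed
targets = np.array(tiny['label'])                  # shape (3,)

print(features.shape, targets)
```

drop() returns a new DataFrame, so the original table still contains its label column afterwards.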

============== Usage of pandas function drop ==============
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html

>>> df = pd.DataFrame(np.arange(12).reshape(3,4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

# axis : {0 or ‘index’, 1 or ‘columns’}, default 0
>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11

or:

>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

============== / End Usage of pandas function drop ==============

============== Usage of numpy function array ==============
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html

>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])

============== / End Usage of numpy function array ==============

Next, binarize the data, which is a little tricky: set every value less than or equal to 100 to 0, and every value greater than 100 to 1.

# Binarizing the pixels improves accuracy from 0.1 to 0.9
data[data<=100] = 0
data[data>100] = 1
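
A minimal sketch of this thresholding on a made-up array (the name gray and its values are hypothetical) shows what happens at the boundary value 100, and an equivalent one-line form:

```python
import numpy as np

# Hypothetical grayscale values, including the boundary value 100.
gray = np.array([[0, 99, 100],
                 [101, 180, 255]])

# Same rule written as one expression that leaves `gray` untouched.
binary = (gray > 100).astype(int)

# In-place version, in the same order as the tutorial code.
in_place = gray.copy()
in_place[in_place <= 100] = 0
in_place[in_place > 100] = 1

print(binary)
```

The order of the two in-place assignments matters: if the `> 100` step ran first, those pixels would become 1 and the `<= 100` step would then reset them to 0.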

Preview the first 5 rows.

inputData.head()

============== Usage of pandas function head ==============
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html

DataFrame.head(n)

Return the first n rows, default n=5.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

See also

DataFrame.tail(n)

Returns the last n rows.

>>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

============== / End Usage of pandas function head ==============

Use 70% of the data for training and 30% for testing the training result by setting test_size=0.3. We don't specify train_size here, so it is automatically set to the complement of the test size, which is 1 - 0.3 = 0.7.

data_train, data_test, label_train, label_test = train_test_split(data, label, test_size=0.3, random_state=87)
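
To see the 70/30 proportions and the effect of random_state, the split can be tried on a small made-up dataset. This is only a sketch; X and y are hypothetical stand-ins for data and label.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and alternating labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=87)

print(X_train.shape, X_test.shape)

# Using the same seed again reproduces the same split.
X_train2, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=87)
print(np.array_equal(X_train, X_train2))
```

Fixing random_state makes the split repeatable, so accuracy numbers can be compared between runs.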

======= Usage of sklearn.model_selection.train_test_split =======
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split

Parameters:

  • *arrays : sequence of indexables with same length / shape[0]
    Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

  • test_size : float, int or None, optional (default=0.25)
    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.25.

  • train_size : float, int, or None, (default=None)
    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • shuffle : boolean, optional (default=True)
    Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.

  • stratify : array-like or None (default=None)
    If not None, data is split in a stratified fashion, using this as the class labels.

Returns:

  • splitting : list, length=2 * len(arrays)
    List containing train-test split of inputs.
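
The stratify parameter is not used in this tutorial, but a short sketch on made-up, imbalanced labels shows what it does. The data and the 80/20 class ratio are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y asks for the same class proportions in both subsets.
split = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
X_train, X_test, y_train, y_test = split

# Count how many samples of each class ended up in each subset.
print(np.bincount(y_train), np.bincount(y_test))
```

Two input arrays produce a list of four outputs, matching the "length = 2 * len(arrays)" note above.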

======= /End Usage of sklearn.model_selection.train_test_split =======

Create an SVC model and train it with fit(). Setting more parameters in SVC() can improve accuracy:

SVM_model = SVC()
SVM_model.fit(data_train, label_train)
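
As a sketch of what setting more parameters might look like, the example below passes kernel, C and gamma explicitly. It runs on scikit-learn's bundled 8x8 digits dataset as a stand-in, not on the tutorial's CSV, and the parameter values are illustrative assumptions rather than tuned settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled 8x8 digits, not the tutorial's CSV.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=87)

# Illustrative settings only; C and gamma usually need tuning per dataset.
model = SVC(kernel='rbf', C=10, gamma='scale')
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
```

Larger C penalizes misclassified training samples more heavily, and gamma controls how far the influence of a single sample reaches with the rbf kernel.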

======= Usage of sklearn.svm.SVC =======

http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)

http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.fit

fit(X, y, sample_weight=None)

Fit the SVM model according to the given training data.

======= /End Usage of sklearn.svm.SVC =======

Predict the first 10 test rows and compare the predictions with the first 10 labels. Then call score() on the whole test set to get the accuracy, which is the fraction of correct predictions.

SVM_model.predict(data_test[:10])
label_test[:10]
SVM_model.score(data_test, label_test)

We can also write our own scoring code like this instead of using score().

correct = 0
prediction = SVM_model.predict(data_test)
for i, lab in enumerate(label_test):
    # if prediction == label, correct+1
    if prediction[i] == lab:
        correct += 1
# accuracy = number of correct predictions / number of test samples
accuracy = correct / len(label_test)
print("Accuracy is:", accuracy)
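
The same count can be written without a loop. The sketch below uses made-up arrays (predicted and actual are hypothetical stand-ins for SVM_model.predict(data_test) and label_test) and compares a NumPy one-liner with scikit-learn's accuracy_score helper:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical predictions and true labels.
predicted = np.array([3, 8, 1, 1, 4, 9])
actual = np.array([3, 8, 7, 1, 4, 4])

# Element-wise comparison gives booleans; their mean is the share of matches.
manual_accuracy = np.mean(predicted == actual)

# scikit-learn's helper computes the same number.
library_accuracy = accuracy_score(actual, predicted)

print(manual_accuracy, library_accuracy)
```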

This article is original to http://laker.me/blog. Please credit the source when reposting. Link exchanges are welcome.