memo: Scene Recognition by Chainer

Introduction

　On the previous page, scene recognition (15-class classification) was performed using Caffe. On this page, the same problem is addressed by fine-tuning a pre-trained Caffe model with Chainer.

Computation Environment

　A g2.2xlarge instance on Amazon EC2, which is equipped with a GPU, is used.

Dataset

　The CNN is trained on the LSP15 dataset, which consists of the following 15 directories:

MITcoast
MITforest
MIThighway
MITinsidecity
MITmountain
MITopencountry
MITstreet
MITtallbuilding
bedroom
CALsuburb
industrial
kitchen
livingroom
PARoffice
store

The name of each directory represents the category of the scene. Each directory contains roughly 200 to 400 images.

Data Augmentation

　To augment the dataset, a mirror image of each image is added to it. The images are then split into two groups at a ratio of 6:1. The larger group is the training dataset, and the smaller one is the testing dataset. The number of images in each is shown below.
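The mirroring and the 6:1 split described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `add_mirrors` and `split_six_to_one`, the shuffling, the rounding, and the file names are all hypothetical, and the counts it produces need not match the table in this post. A mirrored sample is represented by a flag here rather than by actually flipping pixels.

```python
import random


def add_mirrors(paths):
    """Pair every image path with a flag telling whether to mirror it.

    Each path appears twice: once as-is (False) and once mirrored (True),
    which doubles the size of the dataset.
    """
    return [(path, mirrored) for path in paths for mirrored in (False, True)]


def split_six_to_one(samples, seed=0):
    """Shuffle the samples and split them into (train, test) at 6:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 7  # one part in seven goes to testing
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names, used only to exercise the helpers.
paths = ["MITcoast/image_%04d.jpg" % i for i in range(35)]
samples = add_mirrors(paths)
train, test = split_six_to_one(samples)
print(len(samples), len(train), len(test))
```

Fixing the seed keeps the split reproducible between runs, so the training and testing lists can be regenerated consistently.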

label	name	number of training images	number of testing images
0	MITcoast	610	100
1	MIThighway	440	70
2	MITmountain	630	100
3	MITstreet	490	80
4	MITforest	550	90
5	MITinsidecity	520	80
6	MITopencountry	690	110
7	MITtallbuilding	600	100
8	bedroom	360	60
9	CALsuburb	400	60
10	industrial	520	80
11	kitchen	360	60
12	livingroom	490	80
13	PARoffice	360	60
14	store	540	90
	total	7560	1220

The largest square centered on each image is cropped and resized to $256 \times 256$. The text files that list the paths to the image files and their labels look like this:

--- test.txt ---
--- train_valid.txt ---
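The center-crop step mentioned above can be sketched as follows. It is a minimal sketch, not the code used for this post: the function names are hypothetical, and the use of Pillow in `crop_and_resize` is an assumption, since the post does not say which image library was used.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_and_resize(src_path, dst_path, size=256):
    """Crop the centered square of an image and resize it to size x size.

    Pillow is an assumption; any image library with crop and resize
    operations would do.
    """
    from PIL import Image  # imported lazily so the box helper needs no Pillow

    with Image.open(src_path) as image:
        box = center_square_box(*image.size)  # image.size is (width, height)
        image.crop(box).resize((size, size)).save(dst_path)


print(center_square_box(640, 480))
```

For a landscape image the box trims equal margins from the left and right, and for a portrait image it trims from the top and bottom.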

Load and Save of Caffe Model

　In this work, bvlc_reference_caffenet.caffemodel, which Caffe provides as a pre-trained model, is employed. It is loaded with CaffeFunction. Since CaffeFunction takes a long time to load a Caffe model, it is a good idea to save the loaded model using cPickle for later use.
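A minimal sketch of this load-once, cache-as-pickle idea is given below. It makes several assumptions: the helper names are hypothetical, `pickle` stands in for the Python 2 `cPickle` mentioned above, and the import path `chainer.links.caffe.CaffeFunction` is the one I believe applies (older Chainer releases expose the same class under `chainer.functions.caffe`). The round trip at the end uses a stand-in object, so it runs without Chainer or the Caffe model.

```python
import os
import pickle  # the post uses cPickle, the Python 2 counterpart
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Restore an object saved by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def convert_caffemodel(caffemodel_path, pickle_path):
    """Load a .caffemodel once (slow) and cache it as a pickle (fast)."""
    # Imported lazily: only this function needs Chainer (assumed import path).
    from chainer.links.caffe import CaffeFunction

    model = CaffeFunction(caffemodel_path)
    save_pickle(model, pickle_path)
    return model


# Round trip with a stand-in object instead of a real Caffe model.
cache_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_pickle({"layers": ["conv1", "fc7"]}, cache_path)
restored = load_pickle(cache_path)
print(restored)
```

Later scripts would then call `load_pickle` on the cached file instead of parsing the Caffe model again.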

Implementation of Network

　The network equivalent to the Caffe model (bvlc_reference_caffenet.caffemodel) needs to be expressed in Chainer before proceeding. The sample code that Chainer provides (chainer-1.6.0/examples/imagenet/alex.py) is a helpful starting point, as is the prototxt file distributed with bvlc_reference_caffenet.caffemodel. My implementation looks like this:

--- reference_caffenet.py ---
Starting from that code, the class used in the current work is easy to implement. The top layer of the network, fc8, is renamed to scene_fc8, and its number of output units is changed from 1000 to 15. The modified class can be written as follows:

--- modified_reference_caffenet.py ---

Fine-Tuning

　The Caffe model saved using cPickle contains instances of L.Convolution2D and L.Linear that hold the pre-trained parameters $W$ and $b$. The class ModifiedReferenceCaffeNet defined above has the same layers as the original Caffe model except for the layer scene_fc8. The parameters $W$ and $b$ of conv1, conv2, conv3, conv4, conv5, fc6, and fc7 in the original Caffe model are copied to the corresponding layers of the modified model (ModifiedReferenceCaffeNet). The modified model is then fine-tuned on the current 15-class dataset.
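The parameter transfer described above can be illustrated with the sketch below. It is not the copy_model function used in this post. The helper `copy_pretrained_params` is hypothetical and relies only on each layer exposing `W.data` and `b.data`, which is how Chainer links such as L.Convolution2D and L.Linear hold their parameters. The demonstration uses plain stand-in objects (`fake_layer`, also hypothetical) with made-up values, so it runs without Chainer.

```python
from types import SimpleNamespace

# Layers whose parameters are taken over from the original Caffe model.
PRETRAINED_LAYERS = ("conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7")


def copy_pretrained_params(src, dst, layer_names=PRETRAINED_LAYERS):
    """Copy W and b of every named layer from src to dst.

    Layers of dst that are not named (such as scene_fc8) keep whatever
    values they were initialized with.
    """
    for name in layer_names:
        src_layer = getattr(src, name)
        dst_layer = getattr(dst, name)
        dst_layer.W.data = src_layer.W.data.copy()
        dst_layer.b.data = src_layer.b.data.copy()


def fake_layer(w, b):
    """Stand-in for a Chainer link, used only for this demonstration."""
    return SimpleNamespace(W=SimpleNamespace(data=w), b=SimpleNamespace(data=b))


# Stand-in "original" model with fc8 and "modified" model with scene_fc8.
src = SimpleNamespace(**{n: fake_layer([1.0, 2.0], [0.5]) for n in PRETRAINED_LAYERS})
src.fc8 = fake_layer([9.0] * 1000, [9.0])
dst = SimpleNamespace(**{n: fake_layer([0.0, 0.0], [0.0]) for n in PRETRAINED_LAYERS})
dst.scene_fc8 = fake_layer([0.1] * 15, [0.0])

copy_pretrained_params(src, dst)
print(dst.conv1.W.data, dst.scene_fc8.W.data[:2])
```

Copying the arrays rather than sharing them keeps later updates to the fine-tuned model from altering the cached original.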

Training

　The training script is shown below:

--- train.py ---

15th line : a path to the Caffe model saved using cPickle
16th line : a path to the file in which the fine-tuned model is saved
17th line : a path to the mean image
21st line : the learning rate
22nd line : the mini-batch size
23rd line : the number of epochs
24th line : the decay rate of the learning rate per epoch
29th line : a path to the training dataset
30th line : a path to the testing dataset
33rd-45th lines : The function test calculates both the average loss and the average accuracy over the testing dataset.
52nd-63rd lines : These lines load the datasets. The class DataLoader is discussed later.
75th line : The line loads the original Caffe model.
78th line : The line creates an instance of ModifiedReferenceCaffeNet.
81st line : The line copies the pre-trained parameters $W$ and $b$ of the original model to the instance of ModifiedReferenceCaffeNet. The parameters of its top layer, scene_fc8, keep their default initial values. The function copy_model is discussed later.
85th line : The line copies the model to the GPU device.
86th line : The line selects SGD as the optimization algorithm.
87th line : The line passes the model to be optimized to the SGD optimizer.
93rd-115th lines : These lines form the fine-tuning loop.
123rd line : The line saves the fine-tuned model.
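Three pieces of bookkeeping from the script described above (mini-batches, a per-epoch learning-rate decay, and averaging over a dataset) can be sketched without any framework as follows. The function names and all numeric settings are hypothetical, not the values used in train.py. The decay is assumed to be multiplicative, that is, the learning rate is multiplied by the decay rate once per epoch.

```python
def minibatch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs covering the dataset in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def decayed_learning_rates(base_lr, decay, n_epochs):
    """Learning rate used in each epoch: base_lr * decay ** epoch."""
    return [base_lr * decay ** epoch for epoch in range(n_epochs)]


def dataset_average(batch_means, batch_sizes):
    """Combine per-batch means into a dataset mean, weighting by batch size."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / float(sum(batch_sizes))


# Hypothetical settings, chosen only for illustration.
slices = list(minibatch_slices(1220, 50))
rates = decayed_learning_rates(0.01, 0.5, 3)
accuracy = dataset_average([1.0, 0.5], [50, 20])
print(len(slices), slices[-1], rates, accuracy)
```

Weighting by batch size matters when the last mini-batch is smaller than the others, because a plain mean of the per-batch values would overweight it.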

The definitions of DataLoader and ImageCropper are as follows:

--- data_loader.py ---

--- image_cropper.py ---
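As a rough idea of the first thing a loader has to do, the list files can be parsed as sketched below. This is not the DataLoader of this post. It assumes one sample per line, with the integer label as the last whitespace-separated field (the convention used by Caffe list files). The function name and the example lines are hypothetical.

```python
def parse_list_file(lines):
    """Parse 'path label' lines into (path, int_label) pairs.

    Assumes the label is the last whitespace-separated field on each line.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(None, 1)
        samples.append((path, int(label)))
    return samples


# Hypothetical entries, used only to exercise the parser.
example_lines = [
    "images/MITcoast/a.jpg 0\n",
    "\n",
    "images/store/b.jpg 14\n",
]
parsed = parse_list_file(example_lines)
print(parsed)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.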
The function copy_model is described in this page.

Training Results

　The accuracy and the loss on the training and testing datasets are shown below.

[Figure: accuracy and loss for the training dataset]

[Figure: accuracy and loss for the testing dataset]

The accuracy on the testing dataset reaches about 96.23%.

Construction of Classifier

　The classifier class is implemented as shown below.

--- modified_reference_caffenet_with_predictor.py ---
The usage of the classifier looks like this:

--- predict.py ---

12th line : a path to the testing dataset
13th line : a path to the mean image
14th line : a path to the fine-tuned model
18th line : The line loads the model.
21st line : The line creates an instance of the class ModifiedReferenceCaffeNetWithPredictor.
24th line : The line copies the parameters to the predictor. If ModifiedReferenceCaffeNetWithPredictor had been used to train the model, this step would not have been needed.
27th-32nd lines : These lines load the testing dataset.
35th-40th lines : These lines take a subset of the testing dataset and construct instances of Variable.
43rd-44th lines : These lines classify the data.
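The final classification step can be pictured as applying softmax to the 15 output scores and taking the most probable class. The sketch below shows that idea in plain Python. It is an assumption about what the predictor does internally, not the code of ModifiedReferenceCaffeNetWithPredictor, and the function names and scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores):
    """Return (label, probability) of the most probable class."""
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical scores for three classes.
scores = [0.1, 2.0, -1.0]
label, prob = predict_label(scores)
print(label, prob)
```

Softmax preserves the ordering of the scores, so the predicted label is the same as the index of the largest raw score. The probabilities are useful only when a confidence value is wanted.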

Running the script predict.py yields the following output. The predictor infers all labels correctly except for the first one.

Construction of Feature Extractor

　The following code specifies how feature vectors are extracted from the layer fc7.

--- modified_reference_caffenet_with_feature_extractor.py ---
The following code shows how the input data for liblinear is generated from the feature vectors.

--- extract.py ---

59th line : The line loads the fine-tuned model.
62nd line : The line creates the extractor.
65th line : The line copies the parameters of the fine-tuned model to those of the extractor.
71st line : The line makes the input data for liblinear.
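liblinear reads its data in the sparse text format shared with LIBSVM, where each line is a label followed by `index:value` pairs with 1-based, ascending indices. The sketch below formats one feature vector that way. The function name and the sample values are hypothetical, and this is not the code of extract.py.

```python
def to_liblinear_line(label, features):
    """Format one sample as '<label> <index>:<value> ...'.

    Indices are 1-based and ascending. Zero-valued features are omitted,
    which the sparse format permits.
    """
    pairs = [
        "%d:%r" % (index, float(value))
        for index, value in enumerate(features, start=1)
        if value != 0
    ]
    return " ".join([str(label)] + pairs)


# Hypothetical three-dimensional feature vector with label 3.
line = to_liblinear_line(3, [0.5, 0.0, 1.25])
print(line)
```

Writing one such line per image, for the training and testing sets separately, gives the two input files that liblinear needs.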

Execution of SVM

　The SVM is trained with one liblinear command, and labels are predicted with another. As shown above, the Softmax layer achieves an accuracy of 96.23%. The SVM improves on it by about 1%.
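For reference, liblinear ships two command-line programs, `train` and `predict`. The sketch below builds their argument lists in the basic form `train training_file model_file` and `predict test_file model_file output_file`. The options actually used for this post are not shown here, so default options are assumed, and all file names and helper names are hypothetical.

```python
import subprocess


def train_command(train_file, model_file):
    """Argument list for liblinear's `train` with default options."""
    return ["train", train_file, model_file]


def predict_command(test_file, model_file, output_file):
    """Argument list for liblinear's `predict`."""
    return ["predict", test_file, model_file, output_file]


def run(command):
    """Run a command and return its standard output as text."""
    return subprocess.check_output(command).decode()


# Hypothetical file names. The commands are only built here, not executed.
train_args = train_command("train_features.txt", "scene.model")
predict_args = predict_command("test_features.txt", "scene.model", "predicted.txt")
print(" ".join(train_args))
print(" ".join(predict_args))
```

Passing either list to `run` would execute it, provided the liblinear binaries are on the PATH.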

Monday, March 28, 2016
