## Monday, March 28, 2016

### Scene Recognition with Chainer


#### Introduction

In a previous post, scene recognition (a 15-class classification task) was performed with Caffe. In this post, the same problem is addressed by fine-tuning a pre-trained Caffe model with Chainer.

#### Computation Environment

A GPU-equipped Amazon EC2 g2.2xlarge instance is used.

#### Dataset

The CNN is trained on the LSP15 dataset, which consists of the following 15 directories:
1. MITcoast
2. MITforest
3. MIThighway
4. MITinsidecity
5. MITmountain
6. MITopencountry
7. MITstreet
8. MITtallbuilding
9. bedroom
10. CALsuburb
11. industrial
12. kitchen
13. livingroom
14. PARoffice
15. store

The name of each directory is the scene category. Each directory contains roughly 200 to 400 images.

#### Data Augmentation

To augment the dataset, a mirror image of every image is added to it. The images are then split in a 6:1 ratio: the larger group is the training dataset and the smaller one the testing dataset. The number of images in each is shown below.

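| Label | Name | Training images | Testing images |
|---:|---|---:|---:|
| 0 | MITcoast | 610 | 100 |
| 1 | MIThighway | 440 | 70 |
| 2 | MITmountain | 630 | 100 |
| 3 | MITstreet | 490 | 80 |
| 4 | MITforest | 550 | 90 |
| 5 | MITinsidecity | 520 | 80 |
| 6 | MITopencountry | 690 | 110 |
| 7 | MITtallbuilding | 600 | 100 |
| 8 | bedroom | 360 | 60 |
| 9 | CALsuburb | 400 | 60 |
| 10 | industrial | 520 | 80 |
| 11 | kitchen | 360 | 60 |
| 12 | livingroom | 490 | 80 |
| 13 | PARoffice | 360 | 60 |
| 14 | store | 540 | 90 |
| | **Total** | 7560 | 1220 |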
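
A minimal sketch of the augmentation and the 6:1 split, assuming that "mirror image" means a horizontal flip. The helper names `hflip` and `split_with_mirrors` are hypothetical, and the post does not say how the split was made. This sketch shuffles the original images, sends about one seventh of them to the testing set, and keeps each image and its mirror in the same set so that the mirror of a test image is never seen during training.

```python
import random


def hflip(image):
    """Mirror an image stored as a list of pixel rows (left-right flip)."""
    return [row[::-1] for row in image]


def split_with_mirrors(paths, ratio=6, seed=0):
    """Split image paths train:test ~= ratio:1 and add a mirrored copy of each.

    Returns (train, test) lists of (path, mirrored) pairs, where `mirrored`
    says whether the sample is the flipped version of the image.
    """
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) / (ratio + 1))

    def expand(sources):
        # Each source image yields the original and its mirror.
        return [(p, mirrored) for p in sources for mirrored in (False, True)]

    return expand(shuffled[n_test:]), expand(shuffled[:n_test])
```

With 70 images per class, 10 go to the testing set and 60 to the training set, giving 20 and 120 samples once the mirrors are added.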

The largest square centered in each image is cropped and resized to $256 \times 256$. The text files listing the image paths and their labels look like this:

--- test.txt ---
--- train_valid.txt ---
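
The centered-square crop described above reduces to computing a box. `center_square_box` is a hypothetical helper that returns the box in the (left, upper, right, lower) order used by Pillow's `Image.crop`; the post does not name its image library, so Pillow is an assumption here.

```python
def center_square_box(width, height):
    """Box (left, upper, right, lower) of the largest square centered in a
    width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side
```

With Pillow, `img.crop(center_square_box(*img.size)).resize((256, 256))` would then give the $256 \times 256$ image, since `img.size` is `(width, height)`.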

#### Load and Save of Caffe Model

In this work, bvlc_reference_caffenet.caffemodel, a pre-trained model provided by Caffe, is employed and loaded with Chainer's CaffeFunction. Since CaffeFunction takes a long time to load a Caffe model, it is a good idea to save the loaded model with cPickle for later use.
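
The load-once-then-pickle idea can be sketched with the standard library alone. `load_with_cache` is a hypothetical helper: in this work the slow `loader` would be Chainer's CaffeFunction applied to bvlc_reference_caffenet.caffemodel, and Python 3's `pickle` takes the place of cPickle.

```python
import os
import pickle


def load_with_cache(model_path, cache_path, loader):
    """Return loader(model_path), caching the result in cache_path.

    The first call runs the slow loader and pickles its result; later
    calls only unpickle the cached object and never call the loader.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = loader(model_path)
    with open(cache_path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)
    return model
```

Note that unpickling requires the same class definitions (here, the installed Chainer version) to be importable.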

#### Implementation of Network

Before proceeding, a network equivalent to the Caffe model (bvlc_reference_caffenet.caffemodel) must be defined in Chainer. The sample code shipped with Chainer (chainer-1.6.0/examples/imagenet/alex.py) is a useful starting point, and the prototxt file distributed with bvlc_reference_caffenet.caffemodel gives the layer definitions. My implementation is as follows:

--- reference_caffenet.py ---

From that code, the class used in this work is easy to derive: the top layer fc8 is renamed scene_fc8, and its number of output units is changed from 1000 to 15. The modified class is as follows:

--- modified_reference_caffenet.py ---
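
As a check on the layer sizes, the sketch below traces the feature-map shape through CaffeNet for a $227 \times 227$ input, using the kernel, stride, padding and channel values of the public bvlc_reference_caffenet prototxt (LRN layers are omitted because they do not change the shape). It shows where the 9216 inputs of fc6 come from and how few parameters scene_fc8 has compared with fc8. The function names are illustrative only.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window.

    Caffe rounds pooling sizes up, but for CaffeNet's sizes the division
    is exact, so floor and ceil agree.
    """
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad, out_channels); pooling keeps the channel count.
FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=227, in_channels=3):
    """Map each layer name to its (channels, height, width) output shape."""
    size, channels = input_size, in_channels
    shapes = {}
    for name, kernel, stride, pad, out_channels in FEATURE_LAYERS:
        size = conv_out(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes[name] = (channels, size, size)
    return shapes


def linear_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out
```

pool5 outputs $256 \times 6 \times 6 = 9216$ values, which is the input size of fc6. Replacing fc8 (4096 to 1000, 4,097,000 parameters) with scene_fc8 (4096 to 15, 61,455 parameters) is the only change in shape.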

#### Fine-Tuning

The Caffe model saved with cPickle contains instances of L.Convolution2D and L.Linear that hold the pre-trained parameters $W$ and $b$. The class ModifiedReferenceCaffeNet defined above has the same layers as the original Caffe model except for scene_fc8. The parameters $W$ and $b$ of conv1, conv2, conv3, conv4, conv5, fc6, and fc7 in the original Caffe model are copied to the corresponding layers of ModifiedReferenceCaffeNet, and the modified model is then fine-tuned on the 15-class dataset.
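
The copy step can be illustrated without Chainer. In this hypothetical sketch each model is a dict mapping layer names to their `W` and `b` (nested lists stand in for arrays), and `copy_matching_params` is an illustrative stand-in for copy_model, not its actual code. A layer is copied only when its name exists in both models and the shapes agree, so fc8 of the original model is ignored and scene_fc8 keeps its initial values. With Chainer the same walk would run over the child links of the two chains.

```python
import copy


def shape(x):
    """Shape of a nested list, e.g. [[0, 0, 0], [0, 0, 0]] -> (2, 3)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def copy_matching_params(src, dst):
    """Copy W and b from src layers into same-named dst layers.

    Layers missing from src (such as scene_fc8) or whose parameter shapes
    differ keep their values. Returns the names of the copied layers.
    """
    copied = []
    for name, dst_layer in dst.items():
        src_layer = src.get(name)
        if src_layer is None:
            continue
        if all(shape(src_layer[k]) == shape(dst_layer[k]) for k in ("W", "b")):
            for k in ("W", "b"):
                # Deep copy so later training does not modify the source.
                dst_layer[k] = copy.deepcopy(src_layer[k])
            copied.append(name)
    return copied
```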

#### Training

The script to train is shown below:

--- train.py ---

1. Line 15: path to the Caffe model saved with cPickle
2. Line 16: path to the file in which the fine-tuned model is saved
3. Line 17: path to the mean image
4. Line 21: learning rate
5. Line 22: mini-batch size
6. Line 23: number of epochs
7. Line 24: decay rate of the learning rate per epoch
8. Line 29: path to the training dataset
9. Line 30: path to the testing dataset
10. Lines 33 to 45: the function `test` computes the average loss and the average accuracy over the testing dataset.
11. Lines 52 to 63: the datasets are loaded. The class DataLoader is discussed later.
12. Line 75: the original Caffe model is loaded.
13. Line 78: an instance of ModifiedReferenceCaffeNet is created.
14. Line 81: the pre-trained parameters $W$ and $b$ of the original model are copied to the ModifiedReferenceCaffeNet instance. Its top layer, scene_fc8, keeps its default initial values. The function copy_model is discussed later.
15. Line 85: the model is copied to the GPU.
16. Line 86: SGD is selected as the optimization algorithm.
17. Line 87: the model to be optimized is passed to the SGD optimizer.
18. Lines 93 to 115: the fine-tuning loop.
19. Line 123: the fine-tuned model is saved.
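
The schedule and bookkeeping behind items 7, 10, and 18 can be sketched in plain Python. Everything here is hypothetical, since the script is not reproduced: the sketch assumes the learning rate is multiplied by the decay rate after every epoch, and that the test loss and accuracy are averaged over mini-batches weighted by their size. In Chainer the learning rate of an SGD optimizer is its `lr` attribute, so the per-epoch update would amount to `optimizer.lr *= decay`.

```python
import random


def learning_rates(base_lr, decay, n_epochs):
    """Learning rate of each epoch when it is multiplied by `decay`
    after every epoch (an assumed schedule)."""
    return [base_lr * decay ** epoch for epoch in range(n_epochs)]


def minibatches(n_samples, batch_size, rng=None):
    """Yield shuffled lists of sample indices, at most batch_size each."""
    rng = rng or random.Random()
    order = list(range(n_samples))
    rng.shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def average_over_batches(results):
    """Dataset-level (loss, accuracy) from per-batch results.

    `results` holds (batch_size, mean_loss, mean_accuracy) tuples. Each
    batch is weighted by its size so a short final batch is not
    over-counted.
    """
    total = sum(n for n, _, _ in results)
    loss = sum(n * l for n, l, _ in results) / total
    accuracy = sum(n * a for n, _, a in results) / total
    return loss, accuracy
```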

The definitions of DataLoader and ImageCropper are as follows:

--- image_cropper.py ---

The function copy_model is described on a separate page.
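
Two pieces of what a loader and cropper typically do can be sketched in a few lines: parsing "path label" lines of the list files, and choosing where to crop. The helpers `read_list_file` and `crop_origin` are hypothetical, and so are the details: the sketch assumes $227 \times 227$ crops (the CaffeNet input size) taken from the $256 \times 256$ images, at a random position for training and at the center for testing, which is the convention of Caffe's data layer.

```python
import random

IMAGE_SIZE = 256  # side of the stored square images
CROP_SIZE = 227   # CaffeNet input size (assumed crop size)


def read_list_file(text):
    """Parse 'path label' lines into (path, label) tuples.

    Blank lines are skipped; splitting on the last whitespace lets the
    path itself contain spaces.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, label = line.rsplit(None, 1)
        samples.append((path, int(label)))
    return samples


def crop_origin(rng=None, size=IMAGE_SIZE, crop=CROP_SIZE):
    """Top-left corner (x, y) of a crop x crop window in a size x size image.

    With an rng the corner is random (training); without one the window
    is centered (testing).
    """
    margin = size - crop
    if rng is None:
        return margin // 2, margin // 2
    return rng.randint(0, margin), rng.randint(0, margin)
```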

#### Training Results

The accuracies and the losses for the training and testing datasets are shown below.

*Figures: accuracy and loss curves (panels: Training dataset, Testing dataset).*

The accuracy on the testing dataset reaches 96.23%.
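
Assuming this accuracy is the fraction of correctly classified images over all 1220 testing images listed in the table, 96.23% corresponds to 1174 correct and 46 misclassified images:

$$
\text{accuracy} = \frac{1174}{1220} \approx 0.9623
$$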

#### Construction of Classifier

The classifier class is implemented as follows:

--- modified_reference_caffenet_with_predictor.py ---

The classifier is used as follows:

--- predict.py ---

1. Line 12: path to the testing dataset
2. Line 13: path to the mean image
3. Line 14: path to the fine-tuned model
4. Line 18: the model is loaded.
5. Line 21: an instance of ModifiedReferenceCaffeNetWithPredictor is created.
6. Line 24: the parameters are copied to the predictor. Had ModifiedReferenceCaffeNetWithPredictor been used for training, this step would not be needed.
7. Lines 27 to 32: the testing dataset is loaded.
8. Lines 35 to 40: a subset of the testing dataset is selected and wrapped in Variable instances.
9. Lines 43 and 44: the data are classified.

Running predict.py yields the following output. The predictor infers every label correctly except the first one.
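
Turning the 15 scores of scene_fc8 into a label comes down to a softmax followed by an argmax. A minimal plain-Python sketch, assuming the scores of one image are given as a list and that the labels follow the order of the table in the Data Augmentation section; `predict_label` is a hypothetical helper, not the method of ModifiedReferenceCaffeNetWithPredictor.

```python
import math

# Label order taken from the table in the Data Augmentation section.
LABELS = [
    "MITcoast", "MIThighway", "MITmountain", "MITstreet", "MITforest",
    "MITinsidecity", "MITopencountry", "MITtallbuilding", "bedroom",
    "CALsuburb", "industrial", "kitchen", "livingroom", "PARoffice", "store",
]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores):
    """Return (label name, probability) of the highest-scoring class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Because softmax is monotonic, the argmax of the raw scores already gives the label; the softmax is only needed when a probability is wanted alongside it.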

#### Construction of Feature Extractor

The following code extracts feature vectors from layer fc7:

--- modified_reference_caffenet_with_feature_extractor.py ---

The following code converts the feature vectors into the input data for liblinear:

--- extract.py ---

1. Line 59: the fine-tuned model is loaded.
2. Line 62: the extractor is created.
3. Line 65: the parameters of the fine-tuned model are copied to the extractor.
4. Line 71: the input data for liblinear are generated.
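
liblinear reads the same sparse text format as LIBSVM: one sample per line, a label followed by `index:value` pairs with 1-based, increasing indices. A minimal sketch of the conversion with the hypothetical helpers `to_liblinear_line` and `write_liblinear`; zero values are omitted, as the format allows, and six significant digits is a precision chosen for the sketch, not taken from the post.

```python
def to_liblinear_line(label, features):
    """One line of liblinear input: 'label idx:val idx:val ...'.

    Indices start at 1 and increase; zero-valued features are skipped
    and values are written with 6 significant digits.
    """
    items = [f"{i}:{v:.6g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + items)


def write_liblinear(path, samples):
    """Write (label, feature_vector) pairs to a liblinear input file."""
    with open(path, "w") as f:
        for label, features in samples:
            f.write(to_liblinear_line(label, features) + "\n")
```

For a 4096-dimensional fc7 vector, each line holds the label followed by the nonzero entries only.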

#### Execution of SVM

The SVM is trained with liblinear, and the labels of the testing dataset are then predicted with it. As shown above, the softmax layer achieves an accuracy of 96.23%; the SVM improves on this by about 1%.
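
liblinear ships two executables, `train` and `predict`: `train` reads the training file and writes a model file, and `predict` reads the test file and the model file, writes the predicted labels to an output file, and reports the accuracy against the labels in the test file. The sketch below only assembles those command lines from Python; `liblinear_commands` and `run_svm` are hypothetical helpers, and the solver (`-s`) and cost (`-c`) values are placeholders, since the post does not give the options it used.

```python
import subprocess


def liblinear_commands(train_file, test_file, model_file, output_file,
                       solver=None, cost=None):
    """Argument lists for liblinear's `train` and `predict` executables."""
    train_cmd = ["train"]
    if solver is not None:
        train_cmd += ["-s", str(solver)]  # -s: solver type
    if cost is not None:
        train_cmd += ["-c", str(cost)]    # -c: cost parameter C
    train_cmd += [train_file, model_file]
    predict_cmd = ["predict", test_file, model_file, output_file]
    return train_cmd, predict_cmd


def run_svm(*args, **kwargs):
    """Run training, then prediction; needs liblinear's binaries on PATH."""
    for cmd in liblinear_commands(*args, **kwargs):
        subprocess.run(cmd, check=True)
```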