Introduction
In the previous page, scene recognition (15-class classification) was performed using Caffe. In this page, the same problem is addressed by fine-tuning a pre-trained Caffe model with Chainer.
Computation Environment
An Amazon EC2 g2.2xlarge instance, which provides a GPU, is used.
Dataset
The CNN is trained on the LSP15 dataset, which consists of the following 15 directories:
- MITcoast
- MITforest
- MIThighway
- MITinsidecity
- MITmountain
- MITopencountry
- MITstreet
- MITtallbuilding
- bedroom
- CALsuburb
- industrial
- kitchen
- livingroom
- PARoffice
- store
Data Augmentation
To augment the dataset, the mirror image of each picture is added to it. The images are then split at a ratio of 6:1; the former group is the training dataset and the latter the testing dataset. Their sizes are shown below.
label | name | number of train | number of test
---|---|---|---
0 | MITcoast | 610 | 100
1 | MIThighway | 440 | 70
2 | MITmountain | 630 | 100
3 | MITstreet | 490 | 80
4 | MITforest | 550 | 90
5 | MITinsidecity | 520 | 80
6 | MITopencountry | 690 | 110
7 | MITtallbuilding | 600 | 100
8 | bedroom | 360 | 60
9 | CALsuburb | 400 | 60
10 | industrial | 520 | 80
11 | kitchen | 360 | 60
12 | livingroom | 490 | 80
13 | PARoffice | 360 | 60
14 | store | 540 | 90
 | total | 7560 | 1220
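The augmentation and split described above can be sketched as follows. The function name and arguments are illustrative (the actual data_loader.py is listed later, not here); the sketch only shows the bookkeeping, not the image processing itself.

```python
import random

def mirror_and_split(paths, ratio=6, seed=0):
    # Pair every image path with a flag telling whether the horizontally
    # mirrored copy is meant; this doubles the dataset.
    augmented = []
    for p in paths:
        augmented.append((p, False))  # original image
        augmented.append((p, True))   # mirrored copy
    # Shuffle, then split so that train:test is about ratio:1.
    random.Random(seed).shuffle(augmented)
    n_test = len(augmented) // (ratio + 1)
    return augmented[n_test:], augmented[:n_test]
```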
The largest square centered on each image is cropped out and resized to $256 \times 256$. Text files that list the paths to the image files and their labels look like this:
--- test.txt ---
--- train_valid.txt ---
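The square-cropping step can be sketched as below; center_square_box is an illustrative helper (the actual preprocessing script is not shown), and the returned box would be handed to an image library such as PIL before resizing to $256 \times 256$.

```python
def center_square_box(width, height):
    # The largest square that fits in a width x height image has side
    # min(width, height); center it along the longer dimension.
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    # (left, top, right, bottom): the box format PIL's Image.crop expects.
    return (left, top, left + side, top + side)

# e.g. Image.open(path).crop(center_square_box(w, h)).resize((256, 256))
```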
Load and Save of Caffe Model
In this work, bvlc_reference_caffenet.caffemodel, which Caffe provides as a pre-trained model, is employed. The script to load it is as follows:
Since CaffeFunction takes a long time to load a Caffe model, it is a good idea to save the loaded object with cPickle for later use.
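The caching pattern can be sketched as follows. Here loader stands in for chainer.links.caffe.CaffeFunction, and the file names are illustrative; the original code uses cPickle (Python 2), while this sketch uses the pickle module.

```python
import os
import pickle

def load_model_cached(caffemodel_path, cache_path, loader):
    # Parsing a .caffemodel with CaffeFunction is slow; unpickling the
    # already-parsed object is much faster, so cache it on first load.
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    model = loader(caffemodel_path)
    with open(cache_path, 'wb') as f:
        pickle.dump(model, f)
    return model
```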
Implementation of Network
The network equivalent to the Caffe model (bvlc_reference_caffenet.caffemodel) needs to be represented in Chainer before proceeding. The sample code Chainer provides (chainer-1.6.0/examples/imagenet/alex.py) helps us implement it. Referring to the prototxt file distributed with bvlc_reference_caffenet.caffemodel also makes the work easier. My implementation looks like this:
--- reference_caffenet.py ---
That code makes it easy to implement the class used in the current work. The top layer of the network, fc8, is renamed to scene_fc8, and the number of output units is changed from 1000 to 15. The modified class can be written as follows:
--- modified_reference_caffenet.py ---
Fine-Tuning
The Caffe model saved with cPickle contains instances of L.Convolution2D and L.Linear that hold the pre-trained parameters $W$ and $b$. The class ModifiedReferenceCaffeNet defined above has the same layers as the original Caffe model except for the layer scene_fc8. The $W$s and $b$s of conv1, conv2, conv3, conv4, conv5, fc6, and fc7 of the original Caffe model are copied to the corresponding layers of the modified model (ModifiedReferenceCaffeNet). Then the modified model is fine-tuned on the current 15-class dataset.
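The parameter copy can be sketched as below. The real copy_model walks Chainer links; here each model is reduced to a dict of layers holding numpy arrays, so all names in the sketch are illustrative.

```python
import numpy as np

def copy_model(src, dst):
    # Copy W and b for every layer name that exists in both models and
    # whose shapes agree. fc8 (src only) and scene_fc8 (dst only) never
    # match, so the new top layer keeps its fresh initialization.
    for name, src_layer in src.items():
        dst_layer = dst.get(name)
        if dst_layer is None:
            continue
        if src_layer['W'].shape != dst_layer['W'].shape:
            continue
        dst_layer['W'][...] = src_layer['W']
        dst_layer['b'][...] = src_layer['b']
```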
Training
The script to train is shown below:
--- train.py ---
- 15th line : a path to the Caffe model saved using cPickle
- 16th line : a path to a file to save the fine-tuned model
- 17th line : a path to the mean image
- 21st line : a learning rate
- 22nd line : a mini batch size
- 23rd line : the number of epochs
- 24th line : a decay rate of the learning rate per epoch
- 29th line : a path to a training dataset
- 30th line : a path to a testing dataset
- 33〜45th lines : The Python function test calculates both the average loss and the average accuracy over the testing dataset.
- 52〜63rd lines : These lines load the dataset. The class DataLoader is discussed later.
- 75th line : The line loads the original Caffe model.
- 78th line : The line generates an instance of the type ModifiedReferenceCaffeNet.
- 81st line : The line copies the pre-trained parameters $W$s and $b$s of the original model to those of the ModifiedReferenceCaffeNet instance. The top layer scene_fc8 keeps its default-initialized parameters. The function copy_model is discussed later.
- 85th line : The line copies the model to the GPU device.
- 86th line : The line selects the SGD as the optimization algorithm.
- 87th line : The line passes to the SGD algorithm the model that is to be optimized using the SGD.
- 93〜115th lines : These lines form the fine-tuning loop.
- 123rd line : The line saves the fine-tuned model.
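The shape of the fine-tuning loop can be sketched as follows. The helper minibatches is an illustrative name; the real loop in train.py additionally forwards each batch through the model on the GPU and decays the learning rate after every epoch.

```python
import random

def minibatches(pairs, batch_size, seed=0):
    # Shuffle the (image, label) pairs once per epoch and yield them in
    # chunks of batch_size; the last batch may be smaller.
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]
```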
DataLoader and ImageCropper are as follows:
--- data_loader.py ---
--- image_cropper.py ---
The function copy_model is described in this page.
Results
The accuracies and the losses for the training and testing datasets are shown below.
(Figures: accuracy and loss curves for the training dataset and for the testing dataset.)
The accuracy for the testing dataset achieves about 96.23%.
Construction of Classifier
The class for a classifier is implemented as shown below.
--- modified_reference_caffenet_with_predictor.py ---
The usage of the classifier looks like this:
--- predict.py ---
- 12th line : a path to the testing dataset
- 13th line : a path to the mean image
- 14th line : a path to the fine-tuned model
- 18th line : The line loads the model.
- 21st line : The line generates an instance of the class ModifiedReferenceCaffeNetWithPredictor.
- 24th line : The line copies the parameters to the predictor. If ModifiedReferenceCaffeNetWithPredictor had been used to train the model, this procedure would not have been needed.
- 27〜32nd lines : The lines load the testing dataset.
- 35〜40th lines : The lines reduce the size of the testing dataset and construct instances of Variable.
- 43rd and 44th lines : The lines classify the data.
predict.py yields the following output.
The predictor correctly infers labels except for the first one.
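What the predictor adds on top of the base network can be sketched as follows: a softmax over the scene_fc8 outputs followed by an argmax. This is illustrative numpy code; the real class uses Chainer functions.

```python
import numpy as np

def predict_labels(logits):
    # logits: array of shape (batch, n_classes) -- (batch, 15) here,
    # the scene_fc8 outputs. Subtract the row-wise max for numerical
    # stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1)
```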
Construction of Feature Extractor
The following code specifies how feature vectors are extracted from the layer fc7:
--- modified_reference_caffenet_with_feature_extractor.py ---
The following code shows how the input data that is passed to liblinear is generated from the feature vectors:
--- extract.py ---
- 59th line : The line loads the fine-tuned model.
- 62nd line : The line generates the extractor.
- 65th line : The line copies the parameters of the fine-tuned model to those of the extractor.
- 71st line : The line makes the input data for liblinear.
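The conversion to liblinear's input format can be sketched as below; to_liblinear is an illustrative name. liblinear expects one line per sample of the form "label index:value ...", with 1-based feature indices.

```python
def to_liblinear(labels, features):
    # Format each (label, feature vector) pair as one liblinear line.
    # Zero entries may be omitted because the format is sparse.
    lines = []
    for y, x in zip(labels, features):
        items = ['%d:%g' % (i + 1, v) for i, v in enumerate(x) if v != 0]
        lines.append('%d %s' % (y, ' '.join(items)))
    return lines
```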
Execution of SVM
The following command is run to train the SVM. The following command is run to predict labels. As shown above, the accuracy of the Softmax layer reaches 96.23%; the SVM improves it by about 1%.