memo: November 2015

Sunday, November 22, 2015

3D Object Recognition by CNN and SVM

Introduction

　A method for 3D object recognition based on the work of H. Su et al. has already been proposed and evaluated on two 3D object datasets, ModelNet10 and ModelNet40. The classification accuracy for ModelNet10 has been shown here, and that for ModelNet40 here. That method uses a convolutional neural network (CNN) as both the feature extractor and the classifier. In this page the method is extended so that the CNN serves only as a feature extractor and classification is performed by a support vector machine (SVM). The resulting classification accuracy is shown to be slightly higher than those of the previous methods on both datasets.

Classification Accuracies

　The following table shows the classification accuracies of previous methods and of the proposed method. The results of the previous methods are quoted from the Princeton ModelNet page.

Algorithm	Accuracy (ModelNet10)	Accuracy (ModelNet40)
MVCNN	-	90.1%
VoxNet	92%	80.3%
DeepPano	85.45%	77.63%
3DShapeNets	83.5%	77%
the proposed method	92.84%	90.92%

Proposed Method

　In the proposed method the CNN is used as a feature extractor and the SVM is applied as a classifier.

Feature Extractor:

　The CNN model provided by Caffe (bvlc_reference_caffenet.caffemodel) is fine-tuned on the training set of either ModelNet10 or ModelNet40. The fine-tuning procedure is described in detail in this page. After training, the output of layer "fc7" is used as a 4096-dimensional feature vector. As explained here, in the proposed method each 3D object yields 20 gray images converted from depth images. The fine-tuned CNN is applied to each of them, giving 20 feature vectors per 3D object.

Following the work of H. Su et al., the element-wise maximum across the 20 vectors is taken to form a single 4096-dimensional feature vector, as shown below.
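The element-wise maximum ("view pooling") step can be sketched with NumPy. This is a minimal illustration, assuming the 20 fc7 vectors of one object have already been stacked into a (20, 4096) array; the random input only stands in for real CNN features, and the function name `view_pool` is hypothetical.

```python
import numpy as np

NUM_VIEWS = 20      # gray images per 3D object (from the text)
FEATURE_DIM = 4096  # size of the CaffeNet "fc7" output

def view_pool(view_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: element-wise maximum across views.

    view_features has shape (views, features), one fc7 vector per view.
    Returns one feature vector describing the whole 3D object.
    """
    if view_features.ndim != 2:
        raise ValueError("expected a (views, features) array")
    return view_features.max(axis=0)

# Random stand-in features; real ones would come from the fine-tuned CNN.
rng = np.random.default_rng(0)
fake_fc7 = rng.random((NUM_VIEWS, FEATURE_DIM)).astype(np.float32)
object_feature = view_pool(fake_fc7)
print(object_feature.shape)  # (4096,)
```

Because the maximum is taken per dimension, the pooled vector does not depend on the order in which the 20 views are rendered.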

Classifier:

　Since each 3D object is now represented by a single feature vector, classification by an SVM is straightforward. In the current work, LIBLINEAR, a library implementing linear SVMs, is used. For the ModelNet10 training set the following command is run to train the SVM, and for the ModelNet40 training set the command is as follows. It is worth noting that, to improve the accuracy, the square root of every element of the feature vector is computed before the linear SVM is applied; in other words, a Hellinger kernel classifier is employed. After training, the prediction command is run on the testing sets of both ModelNet10 and ModelNet40 as follows. The programs "train" and "predict" are provided by LIBLINEAR.
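Why the square root amounts to a Hellinger kernel: for non-negative vectors, a linear kernel applied to element-wise square roots equals the Hellinger kernel, sum_i sqrt(x_i * y_i). The sketch below checks this identity on tiny vectors and formats one sample in LIBLINEAR's sparse text input (`label index:value ...`, 1-based indices), which is what `train` and `predict` read. It assumes non-negative features (for example post-ReLU fc7 activations); the helper names are hypothetical, and the actual LIBLINEAR options used in this work are not reproduced here.

```python
import numpy as np

def hellinger_map(x: np.ndarray) -> np.ndarray:
    """Element-wise square root; assumes non-negative features."""
    if np.any(x < 0):
        raise ValueError("Hellinger mapping requires non-negative features")
    return np.sqrt(x)

def to_liblinear_line(label: int, x: np.ndarray) -> str:
    """Hypothetical helper: one sample as 'label index:value ...' with
    1-based indices, omitting zero entries (sparse format)."""
    items = [f"{i + 1}:{v:g}" for i, v in enumerate(x) if v != 0]
    return " ".join([str(label)] + items)

x = np.array([0.25, 0.0, 4.0])
y = np.array([1.0, 9.0, 1.0])

# Linear kernel on sqrt-mapped vectors vs. the Hellinger kernel itself.
linear_on_sqrt = hellinger_map(x) @ hellinger_map(y)
hellinger = np.sum(np.sqrt(x * y))
print(linear_on_sqrt, hellinger)               # 2.5 2.5
print(to_liblinear_line(3, hellinger_map(x)))  # 3 1:0.5 3:2
```

Mapping the features once and then training a linear SVM keeps LIBLINEAR's speed while obtaining the behaviour of this non-linear kernel.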

References

Saturday, November 14, 2015

3D Object Recognition by Caffe 〜 40-class classification 〜

Introduction

　In the previous page, a new method for 3D object recognition was proposed. The method is based on the work of H. Su et al. and was applied to a 10-class classification task using the ModelNet10 dataset, where a high classification accuracy of 90.2% was achieved. In this page, the method is evaluated on the ModelNet40 dataset, and a classification accuracy of 87.16% is reached.

Dataset

　The CNN model provided by Caffe (bvlc_reference_caffenet.caffemodel) is fine-tuned using the ModelNet40 dataset, which consists of 40 categories. Each category has training and testing 3D models in Object File Format (OFF). The numbers of models are as follows:

training models	testing models
9843	2468

Fine-tuning Result

　The following figure shows the fine-tuning process.

The x-axis indicates the iteration number and the y-axis the recognition accuracy. Because the total number of iterations in the current case is 90,000 and the solver is configured to output the accuracy once every 1000 iterations, the maximum value on the x-axis is 90 (= 90,000/1000). The recognition accuracy reaches about 83.6%.

Classification Accuracy

　As described in the previous page, each 3D model yields 20 gray images. The fine-tuned CNN is applied to each of them, giving 20 labels per 3D model, and the final label is decided by majority vote. This algorithm is evaluated on the 3D models of the test set, of which there are 2468 as shown above. The classification accuracy reaches 87.16% (= 2151/2468). The following table is quoted from the Princeton ModelNet page. Despite its simple procedure, the algorithm proposed here performs reasonably well.
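The majority-vote decision and the accuracy arithmetic can be sketched in a few lines of Python. The per-view labels below are made up for illustration, the helper names are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the text does not say how ties among the 20 votes are resolved.

```python
from collections import Counter
from typing import Sequence

def majority_vote(view_labels: Sequence[int]) -> int:
    """Most frequent label among the per-view predictions.

    Ties go to the label that appears first in view_labels
    (an assumption; the original text does not specify this).
    """
    if not view_labels:
        raise ValueError("need at least one view label")
    return Counter(view_labels).most_common(1)[0][0]

def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of 3D models whose final label matches the ground truth."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

# 20 hypothetical per-view labels for one object: label 8 wins with 12 votes.
views = [8] * 12 + [2] * 5 + [12] * 3
print(majority_vote(views))  # 8

# The reported figure: 2151 correct out of 2468 test models.
print(f"{2151 / 2468:.2%}")  # 87.16%
```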

Algorithm	Accuracy
MVCNN	90.1%
VoxNet	80.3%
DeepPano	77.63%
3DShapeNets	77%
the proposed algorithm	87.16%

Detailed per-category accuracies are shown below.

References

Wednesday, November 11, 2015

3D Object Recognition by Caffe 〜 20-class classification 〜

Introduction

　In the previous page, a new method for 3D object recognition was proposed. The method was applied to a 10-class classification task, achieving a classification accuracy of 90.2%. In this page, the method is evaluated on a 20-class classification task, and a high classification accuracy of 95.3% is reached.

Dataset

　The pre-trained CNN model provided by Caffe (bvlc_reference_caffenet.caffemodel) is fine-tuned using a dataset of 20 categories chosen from the ModelNet40 dataset. These categories are shown below.

airplane
bathtub
bed
bench
bookshelf
bottle
bowl
car
chair
cone
cup
curtain
desk
door
dresser
flower_pot
glass_box
guitar
keyboard
lamp

Each category has training and testing 3D models in Object File Format (OFF). The numbers of models in each category are as follows:

label	name	training models	testing models
0	airplane	626	100
1	bathtub	106	50
2	bed	515	100
3	bench	173	20
4	bookshelf	572	100
5	bottle	335	100
6	bowl	64	20
7	car	197	100
8	chair	889	100
9	cone	167	20
10	cup	79	20
11	curtain	138	20
12	desk	200	86
13	door	109	20
14	dresser	200	86
15	flower_pot	149	20
16	glass_box	171	100
17	guitar	155	100
18	keyboard	145	20
19	lamp	124	20
total		5114	1202
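The labels 0 to 19 in the table coincide with the alphabetical order of the category names. The sketch below reproduces that label assignment and re-checks the column totals from the table; deriving labels by sorting names is an assumption about how the labels were produced, not a documented step. The last line multiplies by the 20 gray images generated per 3D model.

```python
# Per-category (training, testing) model counts, copied from the table above.
COUNTS = {
    "airplane": (626, 100), "bathtub": (106, 50), "bed": (515, 100),
    "bench": (173, 20), "bookshelf": (572, 100), "bottle": (335, 100),
    "bowl": (64, 20), "car": (197, 100), "chair": (889, 100),
    "cone": (167, 20), "cup": (79, 20), "curtain": (138, 20),
    "desk": (200, 86), "door": (109, 20), "dresser": (200, 86),
    "flower_pot": (149, 20), "glass_box": (171, 100), "guitar": (155, 100),
    "keyboard": (145, 20), "lamp": (124, 20),
}

# Assumed label assignment: category names in alphabetical order, from 0.
LABELS = {name: i for i, name in enumerate(sorted(COUNTS))}

train_total = sum(train for train, _ in COUNTS.values())
test_total = sum(test for _, test in COUNTS.values())
print(LABELS["chair"], LABELS["lamp"])    # 8 19
print(train_total, test_total)            # 5114 1202
print(train_total * 20, test_total * 20)  # 102280 24040 gray images
```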

Results of CNN and Classification

　Fine-tuning yields a high recognition accuracy of 93%, as shown below.

As described in the previous page, each 3D model yields 20 gray images. The fine-tuned CNN is applied to each of them, giving 20 labels per 3D model, and the final label is decided by majority vote. This algorithm is evaluated on the 3D models of the test set, which number 1202 as shown above. A classification accuracy of 95.3% is reached.
