Saturday, November 14, 2015

3D Object Recognition by Caffe 〜 40-class classification 〜

Introduction


 The previous page proposed a new method for 3D object recognition. The method is based on the work of H. Su et al. and was applied to 10-class classification using the ModelNet10 dataset, where it achieved a classification accuracy of 90.2%. This page evaluates the same method on the ModelNet40 dataset and shows that it reaches a classification accuracy of 87.16%.

Dataset


 The CNN model that Caffe provides (bvlc_reference_caffenet.caffemodel) is fine-tuned on the ModelNet40 dataset, which consists of 40 categories. Each category contains training and testing 3D models stored in Object File Format (OFF). The numbers of models are as follows:

Training models    Testing models
9843               2468
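
 The counts above can be checked against a local copy of the dataset. The following is a minimal sketch, assuming the extracted dataset is laid out as <root>/<category>/train/*.off and <root>/<category>/test/*.off; the function name count_models and the path in the usage comment are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_models(root):
    """Count .off files per split, assuming root/<category>/<split>/*.off."""
    counts = Counter()
    for off_file in Path(root).glob("*/*/*.off"):
        # The parent directory name is taken as the split ("train" or "test").
        counts[off_file.parent.name] += 1
    return counts


# Usage (hypothetical path to the extracted dataset):
# counts = count_models("ModelNet40")
# print(counts["train"], counts["test"])
```

 If the layout assumption holds, the two printed numbers should match the table above.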

Fine-tuning Result


 The following figure shows the fine-tuning process.
The x-axis indicates the iteration number and the y-axis the recognition accuracy. The total number of iterations is 90,000 and the solver outputs the accuracy once every 1,000 iterations, so the maximum value on the x-axis is 90 (= 90,000/1,000). The recognition accuracy reaches about 83.6%.
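
 The relation between the iteration count and the x-axis can be sketched as follows. The names max_iter and test_interval are illustrative, and whether the accuracy is also reported at iteration 0 depends on the solver settings, so that part is an assumption.

```python
def reporting_iterations(max_iter, test_interval):
    """Iterations at which the accuracy is reported (iteration 0 assumed included)."""
    return list(range(0, max_iter + 1, test_interval))


iterations = reporting_iterations(90_000, 1_000)
# Each x-axis position is the iteration number divided by the reporting interval.
x_axis = [it // 1_000 for it in iterations]
print(x_axis[-1])  # 90
```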

Classification Accuracy


 As described in the previous page, each 3D model yields 20 gray images. The fine-tuned CNN is applied to each of them, so 20 labels are obtained per 3D model, and the final label is decided by majority vote. This algorithm is evaluated on the 3D models in the test set, of which there are 2468 as shown above. The classification accuracy reaches 87.16% (= 2151/2468). The following table is quoted from the Princeton ModelNet page, with the proposed algorithm added for comparison. The proposed algorithm performs well considering how simple its procedure is.

Algorithm                 Accuracy
MVCNN                     90.1%
VoxNet                    80.3%
DeepPano                  77.63%
3DShapeNets               77%
The proposed algorithm    87.16%
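
 The majority vote and the accuracy figure described above can be sketched as follows. This is a minimal sketch, not the original implementation: the function names are hypothetical, and ties are broken in favor of the label seen first, which is an assumption because the tie-breaking rule is not stated.

```python
from collections import Counter


def majority_vote(view_labels):
    """Return the most frequent label among the per-view predictions.

    Ties go to the label that appears first (an assumption).
    """
    return Counter(view_labels).most_common(1)[0][0]


def classification_accuracy(predicted, truth):
    """Fraction of 3D models whose voted label matches the true label."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Example: 20 views of one model, 12 voting for class 7 and 8 for class 3.
views = [7] * 12 + [3] * 8
print(majority_vote(views))  # 7

# The accuracy reported above: 2151 correct out of 2468 test models.
print(round(100 * 2151 / 2468, 2))  # 87.16
```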

The detailed information on accuracy for each category is shown below.

References

  1. H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. ICCV 2015.
  2. D. Maturana and S. Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. IROS 2015.
  3. B. Shi, S. Bai, Z. Zhou, and X. Bai. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition. Signal Processing Letters 2015.
  4. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR 2015.
