Introduction
On this page, I describe my attempt to implement scene recognition. This site explains the procedure in enough detail for my purpose. As shown there, scene recognition is realized in the following steps:
- extraction of local features
- detection of visual words
- construction of histograms by visual words
- training of a Support Vector Machine (SVM)
- prediction by the SVM
Extraction of Local Features
OpenCV 2 (opencv2) supplies the class DescriptorExtractor, which has the static method create. When a string naming a type of local feature is passed to it as an argument, it returns one of the following extractors:
- SIFT
- SURF
- ORB
- BRIEF
In the extraction call, src is the input image, keypoints holds the cv::KeyPoint objects that carry the coordinates of the grid points, and descriptors_ is the output (i.e. the descriptors of the local features), whose type is cv::Mat.
Detection of Visual Words
From the local features obtained, I selected a subset and ran k-means clustering over it. OpenCV 2 provides the class BOWKMeansTrainer for this. After the selected local features are added with the method add, the method cluster runs the clustering. Here, trainer_ is a BOWKMeansTrainer object, descriptors is the set of selected local features, and vocabulary is the output (i.e. the centroids of the clusters).
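To make the clustering step concrete, here is a minimal standard-library sketch of Lloyd's k-means, the kind of procedure BOWKMeansTrainer carries out internally. It is not the OpenCV implementation: the names Descriptor, nearest_centroid and kmeans_vocabulary are hypothetical, and the centroids are simply initialized from the first k descriptors, which is an assumption made to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<double>;  // hypothetical stand-in for one row of a cv::Mat

// Squared Euclidean distance between two descriptors of equal length.
double squared_distance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the centroid closest to x (ties go to the lower index).
std::size_t nearest_centroid(const Descriptor& x,
                             const std::vector<Descriptor>& centroids) {
    assert(!centroids.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double d = squared_distance(x, centroids[c]);
        if (d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

// Lloyd's k-means. Assumption: centroids start at the first k descriptors;
// a real trainer would normally use a randomized initialization.
std::vector<Descriptor> kmeans_vocabulary(const std::vector<Descriptor>& data,
                                          std::size_t k, int iterations) {
    assert(k > 0 && data.size() >= k);
    std::vector<Descriptor> centroids(
        data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));
    const std::size_t dim = data[0].size();
    for (int it = 0; it < iterations; ++it) {
        std::vector<Descriptor> sums(k, Descriptor(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        // Assignment step: accumulate each descriptor into its nearest cluster.
        for (const Descriptor& x : data) {
            const std::size_t c = nearest_centroid(x, centroids);
            for (std::size_t i = 0; i < dim; ++i) sums[c][i] += x[i];
            ++counts[c];
        }
        // Update step: move each non-empty cluster's centroid to its mean.
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its centroid
            for (std::size_t i = 0; i < dim; ++i) {
                centroids[c][i] = sums[c][i] / static_cast<double>(counts[c]);
            }
        }
    }
    return centroids;
}
```

The returned centroids play the role of the vocabulary: each one is a visual word.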
Construction of Histograms
For this purpose, I used the class BOWImgDescriptorExtractor supplied by OpenCV 2. After setting the vocabulary calculated earlier on a BOWImgDescriptorExtractor object bow_extractor_, I calculated a histogram. Here, keypoints is the same set of cv::KeyPoint objects used in Extraction of Local Features, and histogram is the output. One histogram is obtained per image. Note that the histogram calculated this way is already normalized.
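The idea behind the histogram can be sketched without OpenCV: each local descriptor votes for its nearest visual word, and the votes are divided by the number of descriptors. The function name bow_histogram and the Descriptor alias are hypothetical, and normalizing so that the bins sum to 1 is my assumption about what "normalized" means here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<double>;  // hypothetical stand-in for one descriptor row

// Squared Euclidean distance between two descriptors of equal length.
double squared_distance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Build a bag-of-visual-words histogram for one image.
// Each descriptor is matched to its nearest visual word by brute-force search.
// Assumption: the histogram is normalized so that its bins sum to 1.
std::vector<double> bow_histogram(const std::vector<Descriptor>& descriptors,
                                  const std::vector<Descriptor>& vocabulary) {
    assert(!vocabulary.empty());
    std::vector<double> histogram(vocabulary.size(), 0.0);
    for (const Descriptor& d : descriptors) {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            const double dist = squared_distance(d, vocabulary[w]);
            if (dist < best_dist) {
                best_dist = dist;
                best = w;
            }
        }
        histogram[best] += 1.0;  // one vote for the nearest visual word
    }
    if (!descriptors.empty()) {
        for (double& bin : histogram) {
            bin /= static_cast<double>(descriptors.size());
        }
    }
    return histogram;
}
```

With a vocabulary of 10000 words, the result is a 10000-bin vector per image, which is what the SVM receives as a feature.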
Training of SVM
Kernels that are effective for scene recognition include the following:
- χ² kernel
- histogram-intersection kernel
The arguments positive and negative passed to the method SvmTrainer::execute are the histograms calculated from the indoor and outdoor images, respectively. On the fourth line, they are combined into a single quantity, features. On the ninth and tenth lines, I create labels to distinguish positive (indoor) from negative (outdoor) data. The kernel is constructed and initialized on the twelfth and thirteenth lines. Lastly, on the sixteenth and seventeenth lines, the SVM is trained. As the code shows, the backend library is libsvm. After training, the svm object is saved as follows:
The shogun library provides the method save_serializable, which serializes the object.
At this point, the classifier is complete.
Prediction by SVM
To restore the svm object, I used the method load_serializable.
On the ninth line, the svm object is restored, and the prediction is executed on the twenty-first line. The result is returned as a shogun::CLabels object and is saved on the twenty-third line.
Parameter Setting
extraction of local features
I selected SIFT as the descriptor algorithm. Its dimension is 128. The grid interval is 5 pixels. The member variables size and angle of the class cv::KeyPoint take the values (16, 24, 32) pixels and (0, 120, 240) degrees, respectively.
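As an illustration of these settings, the sketch below generates keypoints on a regular grid. GridKeypoint is a hypothetical stand-in for cv::KeyPoint that keeps only the fields mentioned above, and make_grid_keypoints is a hypothetical helper. It assumes that every size is combined with every angle at each grid point, which the text does not state explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::KeyPoint, keeping only the fields used here.
struct GridKeypoint {
    float x;      // column of the grid point, in pixels
    float y;      // row of the grid point, in pixels
    float size;   // diameter of the described neighborhood, in pixels
    float angle;  // orientation, in degrees
};

// Place keypoints every `step` pixels, starting at (0, 0).
// Assumption: each grid point gets one keypoint per (size, angle) combination.
std::vector<GridKeypoint> make_grid_keypoints(int width, int height, int step,
                                              const std::vector<float>& sizes,
                                              const std::vector<float>& angles) {
    assert(width > 0 && height > 0 && step > 0);
    std::vector<GridKeypoint> keypoints;
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            for (float size : sizes) {
                for (float angle : angles) {
                    keypoints.push_back({static_cast<float>(x),
                                         static_cast<float>(y), size, angle});
                }
            }
        }
    }
    return keypoints;
}
```

Calling make_grid_keypoints(width, height, 5, {16.f, 24.f, 32.f}, {0.f, 120.f, 240.f}) would reproduce the parameters listed above under that assumption, giving nine keypoints per grid point.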
detection of visual words
The number of clusters is 10000. The number of descriptors passed as input is 648659. These descriptors are calculated from about one sixtieth of all images.
construction of histograms
I selected BruteForce as the search algorithm for matching local features.
others
As I said at the start of this page, the 15 categories are divided into an indoor group and an outdoor group. The indoor group has about half as many images as the outdoor group. By adding mirror-reversed images to the indoor group, I balanced the number of indoor images with that of the outdoor images.
Results
I carried out K-fold cross-validation with K=3. I partitioned all images into 3 groups; one group was retained as validation data for testing the SVM predictor, and the remaining groups were used as training data. This process was repeated 3 times, so that each group served once as validation data. The 3 results were averaged to produce a single estimate.
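The partitioning can be sketched as follows. The names Fold, k_fold_split and mean_rate are hypothetical, and assigning sample i to group i % K is an assumption made for simplicity; in practice the sample order would be shuffled first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of the samples used for training and validation in one round.
struct Fold {
    std::vector<std::size_t> training;
    std::vector<std::size_t> validation;
};

// Split sample indices 0..n_samples-1 into k rounds of cross-validation.
// Assumption: sample i belongs to group i % k (shuffle the samples beforehand
// if their order is not random).
std::vector<Fold> k_fold_split(std::size_t n_samples, std::size_t k) {
    assert(k >= 2 && n_samples >= k);
    std::vector<Fold> folds(k);
    for (std::size_t i = 0; i < n_samples; ++i) {
        for (std::size_t f = 0; f < k; ++f) {
            if (i % k == f) {
                folds[f].validation.push_back(i);  // held out in round f
            } else {
                folds[f].training.push_back(i);    // used for training in round f
            }
        }
    }
    return folds;
}

// Average the per-round recognition rates into a single estimate.
double mean_rate(const std::vector<double>& rates) {
    assert(!rates.empty());
    double sum = 0.0;
    for (double r : rates) sum += r;
    return sum / static_cast<double>(rates.size());
}
```

For each round, the SVM would be trained on the training indices and evaluated on the validation indices, and mean_rate would then combine the 3 recognition rates.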
χ² kernel
width=0.5, c=10
| | indoor | outdoor |
|---|---|---|
| recognition rate | 0.820 | 0.819 |
histogram-intersection kernel
β=1, c=10
| | indoor | outdoor |
|---|---|---|
| recognition rate | 0.781 | 0.827 |
Gaussian kernel
width=0.5, c=100
| | indoor | outdoor |
|---|---|---|
| recognition rate | 0.813 | 0.805 |
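For readers unfamiliar with the parameters width and β, the sketch below writes out commonly used forms of the χ² and histogram-intersection kernels on two histograms. The function names are hypothetical, and the exact parameterization used inside shogun may differ from what is assumed here, so treat this as an illustration rather than the library's definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Histogram = std::vector<double>;

// Chi-squared kernel, assumed form:
//   k(a, b) = exp(-(1 / width) * sum_i (a_i - b_i)^2 / (a_i + b_i))
// Bins where a_i + b_i == 0 contribute nothing and are skipped.
double chi2_kernel(const Histogram& a, const Histogram& b, double width) {
    assert(a.size() == b.size() && width > 0.0);
    double dist = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double denom = a[i] + b[i];
        if (denom > 0.0) {
            const double diff = a[i] - b[i];
            dist += diff * diff / denom;
        }
    }
    return std::exp(-dist / width);
}

// Histogram-intersection kernel, assumed form:
//   k(a, b) = sum_i min(a_i^beta, b_i^beta)
double histogram_intersection_kernel(const Histogram& a, const Histogram& b,
                                     double beta) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::min(std::pow(a[i], beta), std::pow(b[i], beta));
    }
    return sum;
}
```

Under these forms, identical histograms give a χ² kernel value of 1, and with β=1 the intersection kernel measures the overlapping mass of two normalized histograms.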
Todo List
- I will publish my source code.
- My purpose was to construct a framework for scene recognition, so I used standard algorithms. I will consider the Gaussian Mixture Model, which has a continuous distribution in contrast to the histogram. I would also like to study the Fisher vector, which is being actively investigated.
- I will extend the two-class classifier to a 15-class one.
- I will consider parallelization with Hadoop for training the SVM and predicting with it.
- I will implement a web application that receives an image from a user and returns a prediction to the user.
Reference Sites
- n_hidekey's diary: This site gives a detailed description of the procedure for scene recognition.
- LSP15: the dataset used on this page
- shogun: SVM library
- More Than Technical: a sample implementation of scene recognition with OpenCV
- "Trend and Theory in Large-Scale Object and Scene Recognition", Tatsuya Harada.