Introduction
In this page, I will describe a brief explanation on the theory of the K-means clustering and implement a simple image segmentation by means of a functioncv::kmeans
the OpenCV provides to us.
Theory
Suppose we have a data setOur goal is to partition the data set into some number K of clusters, where the value of K is given.
When we introduce a vector
where,
The procedures to minimize J are as follows:
- Initialize the cluster centers
in some way.
- Minimize J with respect to the
, keeping the
fixed.
-
Minimize J with respect to the
, keeping the
fixed.
-
Repeat these optimizations until both of the
and the
converge.
In order to consider the 3rd procedure, we set J's derivative with respect to
It results in the following equation,
The denominator in this expression is equal to the number of points assigned to the cluster k. Namely,
- Decide the initial centers of the clusters in some way.
- Assign points to the closest clusters.
- Calculate the mean of the points assigned to the cluster, and replace the center by the mean.
- Repeat the 2nd and the 3rd procedures until the centers of the clusters converge.
Practice in OpenCV
A function to execute the K-means clustering is cv::kmeans. In the following program, the 3 dimensional space (RGB) is considered. A pixel on an image corresponds to a point in 3D space. The K-means clustering yields the K clusters each of which has a set of points with similar color.- 3rd-11th lines : Display an input image. A variable
image
indicates a H x W matrix with 3 channels. H and W are the height and the width of the image, respectively. - 13th-15th lines : The function
cv::kmeans
requires the matrix in which each row corresponds to a coordinate of a point. In this sample program, the coordinate of the point is (B,G,R). To meet the requirement, the matriximage
must be modified by using a functionreshape
. After the modification,image
translates into a T x 3 matrix with 1 channel (reshaped_image
), where T=W * H. - 18th-21th lines : As the function
cv::kmeans
also requires the matrix consisting of floating-point elements,reshaped_image
must be converted to the floating-point matrix (reshaped_image32f
) by applying a functionconvertTo
. - 23th-27th lines : After preparing two matrices for outputs (
labels
andcenters
) and creating an instance ofcv::TermCriteria
which represents conditions to quit the algorithm, we execute the functioncv::kmeans
. In this case, I chose the maximum number of iteration (100) as the condition to quit the algorithm. The final argument of the constructor ofcv::TermCriteria
is not used. The 6th argument of the functioncv::kmeans
is set tocv::KMEANS_RANDOM_CENTERS
. This means that the initial centers of the clusters are randomly decided. - 29th line : Display result. A function
show_result
is as follows:
- 3rd-7th lines : A variable
labels
indicates a T x 1 matrix. The each row of it has the value of the label. Ifcluster_number
is 3, the value is either 0, 1, or 2. A variablecenters
is acluster_number
x 3 matrix. The each row of it has a coordinate of a center of a cluster. In this case, the coordinate corresponds to the vector specified in the RGB space. - 9th line : Prepare a matrix
rgb_image
in which the final image is stored. - 14th-16th lines : The matrix
centers
has floating-point values. These values are converted into std::uint8_t-type ones with the range [0,255]. Moreover, the number of channels is also translated into 3. - 18th-23th lines:The label is replaced by the corresponding (B,G,R) to make the final image
rgb_image
.
cluster_number=2
cluster_number=3
cluster_number=5
cluster_number=10
original image
Development Environment
- Mac OS X 10.8.2
- Processor:3.06 GHz Intel Core 2 Duo
- Memory:4GB
- Xcode4.6 with Apple LLVM 4.2(C++ Language Dialect → GNU++11, C++ Standard Library → libc++)
- boost-1.51.0 built by the Apple LLVM 4.1. See here
- opencv-2.4.3 built by the Apple LLVM 4.2. See here.
Reference
- Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, p424-430
- OpenCV Document