memo

Introduction

In this page, I will describe a brief explanation on the theory of the K-means clustering and implement a simple image segmentation by means of a function cv::kmeans the OpenCV provides to us.

Theory

Suppose we have a data set

consisting of N points each of which is defined in the D-dimensional Euclidean space as

.
Our goal is to partition the data set into some number K of clusters, where the value of K is given.
When we introduce a vector

to represent a center of each cluster, an objective function we should minimize is defined as

where,

is a variable that equals 1 if the point

belongs to the cluster k, otherwise 0.
The procedures to minimize J are as follows:

Initialize the cluster centers in some way.
Minimize J with respect to the , keeping the fixed.
Minimize J with respect to the , keeping the fixed.
Repeat these optimizations until both of the and the converge.

It's easy to realize the 2nd procedure. It's only necessary to choose

which has the minimum distance to the point

we now consider and set

to 1.
In order to consider the 3rd procedure, we set J's derivative with respect to

to zero,

It results in the following equation,

The denominator in this expression is equal to the number of points assigned to the cluster k. Namely,

indicates the mean of all of the data points assigned to the cluster k. On the basis of these discussion, the procedures to minimize J are rewritten as follows:

Decide the initial centers of the clusters in some way.
Assign points to the closest clusters.
Calculate the mean of the points assigned to the cluster, and replace the center by the mean.
Repeat the 2nd and the 3rd procedures until the centers of the clusters converge.

A direct implementation of the 2nd procedure is to compute the distance between every center and every point and to find the minimum distance. As it is relatively slow, we have to introduce a data structure such as a tree.

Practice in OpenCV

A function to execute the K-means clustering is cv::kmeans. In the following program, the 3 dimensional space (RGB) is considered. A pixel on an image corresponds to a point in 3D space. The K-means clustering yields the K clusters each of which has a set of points with similar color.

3rd-11th lines : Display an input image. A variable image indicates a H x W matrix with 3 channels. H and W are the height and the width of the image, respectively.
13th-15th lines : The function cv::kmeans requires the matrix in which each row corresponds to a coordinate of a point. In this sample program, the coordinate of the point is (B,G,R). To meet the requirement, the matrix image must be modified by using a function reshape. After the modification, image translates into a T x 3 matrix with 1 channel (reshaped_image), where T=W * H.
18th-21th lines : As the function cv::kmeans also requires the matrix consisting of floating-point elements, reshaped_image must be converted to the floating-point matrix (reshaped_image32f) by applying a function convertTo.
23th-27th lines : After preparing two matrices for outputs (labels and centers) and creating an instance of cv::TermCriteria which represents conditions to quit the algorithm, we execute the function cv::kmeans. In this case, I chose the maximum number of iteration (100) as the condition to quit the algorithm. The final argument of the constructor of cv::TermCriteria is not used. The 6th argument of the function cv::kmeans is set to cv::KMEANS_RANDOM_CENTERS. This means that the initial centers of the clusters are randomly decided.
29th line : Display result. A function show_result is as follows:

3rd-7th lines : A variable labels indicates a T x 1 matrix. The each row of it has the value of the label. If cluster_number is 3, the value is either 0, 1, or 2. A variable centers is a cluster_number x 3 matrix. The each row of it has a coordinate of a center of a cluster. In this case, the coordinate corresponds to the vector specified in the RGB space.
9th line : Prepare a matrix rgb_image in which the final image is stored.
14th-16th lines : The matrix centers has floating-point values. These values are converted into std::uint8_t-type ones with the range [0,255]. Moreover, the number of channels is also translated into 3.
18th-23th lines：The label is replaced by the corresponding (B,G,R) to make the final image rgb_image.

The results are as follows:
cluster_number=2

cluster_number=3

cluster_number=5

cluster_number=10

original image

Development Environment

Mac OS X 10.8.2
Processor：3.06 GHz Intel Core 2 Duo
Memory：4GB
Xcode4.6 with Apple LLVM 4.2（C++ Language Dialect → GNU++11, C++ Standard Library → libc++）
boost-1.51.0 built by the Apple LLVM 4.1. See here
opencv-2.4.3 built by the Apple LLVM 4.2. See here.

Reference

Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, p424-430
OpenCV Document

in English

はじめに

K-meansクラスタリングの理論とOpenCVを使った実践をまとめます。

理論

D次元ユークリッド空間内の点

を考える。

いま、N個の点

が与えられたとき、これらをK個のクラスタに分割することを考える。
各クラスタの中心座標を

としたとき、最小にすべき目的関数Jを次式で定義する。

ここで、

は、点

がクラスタkに属するなら1を、そうでなければ0をとる変数である。
Jを最小にする手順は以下の通りである。

に適当な初期値を与える。
を固定し、についてJを最小にする。
を固定し、についてJを最小にする。
とが収束するまで、2,3の手順を繰り返す。

手順2を実現することは容易である。ある点

を考えたとき、それに最も近い

を検索すればよい。
手順3を実現するため、Jを

で偏微分する。

これより次式を得る。

上式の分母はクラスタkに属する点の総数である。すなわち、

は、クラスタkに属する点の重心となる。これらを踏まえて、Jを最小にする手順を書き直すと以下のようになる。

クラスタの中心座標を適当に決める。
各点を一番近いクラスタに割り振る。
割り振られた点の重心を計算し、それを新たなクラスタの中心座標とする。
クラスタの中心座標が変化しなくなるまで、2と3の手順を繰り返す。

手順2の最も直接的な実現方法は、各点と全てのクラスタの中心座標との距離を計算し、その最小値を探すというものである。これは大変時間がかかる。木構造などを導入し高速化すべきである。

OpenCVによる実践

K-meansクラスタリングを実行する関数はcv::kmeansである。以下の実装では、3次元色空間内の点を考え、類似色が同一のクラスタに属するような領域分割を行っている。

3行目から11行目まで：入力画像を読み込み表示する。変数imageはチャンネル数3のH x Wの行列である。Hは画像の高さ、Wは画像の幅を表す。
13行目から15行目まで：関数kmeansは、各行に点の座標を持つ行列を要求する。従って、入力画像を保持する行列imageの形状を、reshapeを使って変更する。この変更により、各行にB,G,Rの値の並ぶT x 3の行列になる。ここでT=W * Hである。
18行目から21行目まで：関数kmeansは浮動小数点の行列を要求するので、convertToを使って値の型を変更する。
23行目から27行目まで：関数kmeansを実行する。出力結果を格納する2つの行列labels, centersを用意する。さらに、アルゴリズムを停止させる条件を表現したクラス cv::TermCriteria のインスタンスを作成する。ここでは、最大繰り返し数を100とした。最後の引数の1はここではダミーである。kmeansの第6引数にcv::KMEANS_RANDOM_CENTERSを指定した。これは、クラスタの中心座標値の初期値を乱数で決めることを意味する。
29行目：結果を表示する。その中身は以下の通りである。

3行目から7行目まで：labelsはT x 1の行列である。各行にはラベルの値が収められている。cluster_numberを3とすれば、整数値0,1,2のいずれかとなる。centersはcluster_number x 3の行列である。各行にクラスタの中心座標が収められる。いまの場合、(B,G,R)の値が並ぶことになる。
9行目から12行目まで：最終結果を収める行列rgb_imageを作る。
14行目から16行目まで：centersの要素の型は浮動小数点である。これを0から255までのstd::uint8_tの型に変更する。さらに、チャンネル数を3に変更する。
18行目から23行目まで：ラベルの値を(B,G,R)に置き換え、最終画像rgb_imageを作る。

結果を示す。
cluster_number=2

cluster_number=3

cluster_number=5

cluster_number=10

元画像

開発環境

Mac OS X 10.8.2
プロセッサ：3.06 GHz Intel Core 2 Duo
メモリ：4GB
Xcode4.6 with Apple LLVM 4.2（C++ Language Dialect → GNU++11, C++ Standard Library → libc++）
boost-1.51.0（Apple LLVM 4.1でコンパイルしたもの。こちら）
opencv-2.4.3（Apple LLVM 4.2でコンパイルしたもの。こちら）

参考文献

Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, p424-430
OpenCVドキュメント