Saturday, April 30, 2016

How to Render Shadows in OpenGL

Introduction


 Consider an object and a plane. The simplest way to draw the object's shadow on the plane in OpenGL is to use what is called a shadow matrix. In this post, I derive this shadow matrix and implement it.

Deriving the Shadow Matrix


 We work in homogeneous coordinates; that is, all vectors are 4-dimensional and all matrices are $4\times4$. Let $\vec{p}$ be an arbitrary point on the object, $\vec{l}$ the position of the light source, $\vec{n}$ the normal vector of the plane (as a 4-vector whose fourth component carries the plane's offset), and $\vec{q}$ the projection of $\vec{p}$ onto the plane (see the figure above). Since $\vec{q}$ lies on the plane, \begin{equation} <\vec{n},\vec{q}> = 0 \label{eq_1} \end{equation} holds, where $<\vec{n},\vec{q}>$ denotes the inner product $\vec{n}^T \vec{q}$, a scalar quantity. Moreover, since $\vec{q}$ lies on the line through $\vec{l}$ and $\vec{p}$, \begin{equation} \vec{q}-\vec{l} = t\;(\vec{p}-\vec{l}) \label{eq_2} \end{equation} for some real number $t$ to be determined. Taking the inner product of both sides of Eq. (\ref{eq_2}) with $\vec{n}$ gives \begin{equation} <\vec{n},\vec{q}>-<\vec{n},\vec{l}>=t<\vec{n},\vec{p}-\vec{l}> \end{equation} and, by Eq. (\ref{eq_1}), \begin{equation} -<\vec{n},\vec{l}>=t<\vec{n},\vec{p}-\vec{l}> \end{equation} so that \begin{equation} t=\frac{-<\vec{n},\vec{l}>}{<\vec{n},\vec{p}-\vec{l}>}. \label{eq_3} \end{equation}

Substituting this into Eq. (\ref{eq_2}) and putting everything over the common denominator $<\vec{n},\vec{p}-\vec{l}>$ (the two $<\vec{n},\vec{l}>\vec{l}$ terms cancel), we obtain \begin{eqnarray} \vec{q} &=&\vec{l} - \frac{<\vec{n},\vec{l}>}{<\vec{n},\vec{p}-\vec{l}>} (\vec{p}-\vec{l}) \\ &=&\frac{1}{<\vec{n},\vec{p}-\vec{l}>}\left[ -<\vec{n},\vec{l}>\vec{p}\;+<\vec{n},\vec{p}>\vec{l} \right]\\ &=&\frac{1}{<\vec{n},\vec{l}-\vec{p}>}\left[ <\vec{n},\vec{l}>\vec{p}\;-<\vec{n},\vec{p}>\vec{l} \right]\\ &=&\alpha\left[ <\vec{n},\vec{l}>\vec{p}\;-<\vec{n},\vec{p}>\vec{l} \right] \label{eq_4} \end{eqnarray} where $\alpha\equiv 1/<\vec{n},\vec{l}-\vec{p}>$; since we are working in homogeneous coordinates, $\alpha$ may in fact be taken to be any nonzero real number.

Now consider the $i$-th component of the second term on the right-hand side, $<\vec{n},\vec{p}>l_i$: \begin{eqnarray} <\vec{n},\vec{p}>l_i &=&\sum_j\;n_j p_j l_i \\ &=&\sum_j\;l_i n_j p_j \\ &=&\left[\left(\vec{l}\cdot\vec{n}^{T}\right)\vec{p}\right]_i \end{eqnarray} where $\vec{l}\cdot\vec{n}^{T}$ denotes the outer product, a $4\times4$ matrix. Therefore \begin{eqnarray} \vec{q} &=& \alpha\left[<\vec{n},\vec{l}>\vec{p}\;-\left(\vec{l}\cdot\vec{n}^{T}\right)\vec{p}\right] \\ &=& \alpha\left[ <\vec{n},\vec{l}>E-\left(\vec{l}\cdot\vec{n}^{T}\right)\right] \vec{p} \end{eqnarray} where $E$ is the identity matrix. In summary, \begin{eqnarray} \vec{q}&=&\alpha\;M\;\vec{p}\\ M&\equiv& <\vec{n},\vec{l}>E -\left(\vec{l}\cdot\vec{n}^{T}\right) \end{eqnarray} and $M$ is the shadow matrix.
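As a quick worked example (not part of the original derivation), the components of $M$ are $M_{ij}=<\vec{n},\vec{l}>\delta_{ij}-l_i n_j$. For the floor plane $z=0$, i.e. $\vec{n}=(0,0,1,0)^T$, and a light source at $\vec{l}=(l_x,l_y,l_z,1)^T$, the shadow matrix becomes \begin{equation} M=\begin{pmatrix} l_z & 0 & -l_x & 0\\ 0 & l_z & -l_y & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & -1 & l_z \end{pmatrix} \end{equation} so a point $\vec{p}=(p_x,p_y,p_z,1)^T$ is mapped to $\vec{q}=(l_z p_x - l_x p_z,\; l_z p_y - l_y p_z,\; 0,\; l_z - p_z)^T$, which, after dividing by its $w$ component, indeed lies on $z=0$.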

Implementation

The shadow matrix is computed on lines 207 to 222 of the source. I set $\alpha=1$. Also, the floor is placed at $z=-0.005$ rather than $z=0$, so that the shadow is rendered cleanly (the shadow lies on $z=0$, and the small offset keeps the floor from being drawn at exactly the same depth).
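The construction itself is short; below is a minimal sketch in Python/numpy (the post's actual implementation is the OpenGL source referenced above, and the light position here is just an example):

import numpy as np

def shadow_matrix(n, l):
    # M = <n, l> E - l n^T, with alpha = 1
    return np.dot(n, l) * np.identity(4) - np.outer(l, n)

n = np.array([0.0, 0.0, 1.0, 0.0])   # the shadow-receiving plane z = 0 in homogeneous form
l = np.array([2.0, 1.0, 3.0, 1.0])   # light source position (w = 1), an arbitrary example
M = shadow_matrix(n, l)

p = np.array([0.5, 0.5, 1.0, 1.0])   # a vertex of the object
q = M.dot(p)
q = q / q[3]                         # divide by w; q[2] is 0, i.e. the point lies on the plane
# The floor quad itself is drawn slightly below, at z = -0.005, so it does not
# compete with the shadow for the same depth.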

Results


Development Environment


 Xcode Version 6.2

Question


 If I choose a negative value for $\alpha$, the shadow is not rendered. Am I misunderstanding something?

Sunday, April 24, 2016

How to Create a Branch in git

  1. Create a branch.
  2. List the branches.
  3. Switch to a branch.
  4. Push to a branch.
  5. Clone a specific branch.
  6. Switch to a specific tag (without creating a branch), and then go back afterwards.
The corresponding commands are sketched below.
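A minimal sketch of the commands for each item above (branch, tag, and repository names here are placeholders, not the ones in the original gists):

git branch my-branch                       # 1. create a branch
git branch                                 # 2. list branches
git checkout my-branch                     # 3. switch to a branch
git push origin my-branch                  # 4. push to a branch
git clone -b my-branch <repository-url>    # 5. clone a specific branch
git checkout v1.0                          # 6. switch to a specific tag (detached HEAD, no branch created)
git checkout master                        #    and go back afterwards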

Monday, April 4, 2016

Fully Convolutional Networks by Chainer

in Japanese

Introduction


 In the previous page, scene recognition (15-class classification) was performed using Chainer. In this page, a simplified Fully Convolutional Network (FCN) is implemented with Chainer.

Computation Environment


 The same instance as in the previous page, a g2.2xlarge on Amazon EC2, is used.

Dataset


 In this work, an FCN is trained on the VOC2012 dataset, which includes ground-truth segmentations. Some examples are given below:
The dataset contains 2913 images. I split them at a ratio of 4:1; the former set serves as the training dataset and the latter as the testing one.

number of training images: 2330
number of testing images: 580

The number of training (testing) images is rounded down to a multiple of 10 (5), where 10 (5) is the minibatch size for training (testing); splitting the 2913 images 4:1 gives 2330.4 and 582.6, which become 2330 and 580. Although the original FCN operates on an input of any size, for simplicity the FCN in this work takes a fixed-size (224$\times$224) input. Therefore all images are resized in advance as shown below:
I cropped the maximum square centered on each image and resized it to 224$\times$224.
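For reference, a minimal sketch of this preprocessing with Pillow (not the author's actual script):

from PIL import Image

def crop_and_resize(path, size=224, resample=Image.BILINEAR):
    # For the ground-truth label images, pass resample=Image.NEAREST so that
    # label values are not blended by interpolation.
    img = Image.open(path)
    w, h = img.size
    s = min(w, h)                                   # side of the maximum centered square
    left, top = (w - s) // 2, (h - s) // 2
    img = img.crop((left, top, left + s, top + s))  # centered square crop
    return img.resize((size, size), resample)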
 VOC2012 provides 21 classes as specified below:

label category
0 background
1 aeroplane
2 bicycle
3 bird
4 boat
5 bottle
6 bus
7 car
8 cat
9 chair
10 cow
11 diningtable
12 dog
13 horse
14 motorbike
15 person
16 potted plant
17 sheep
18 sofa
19 train
20 tv/monitor

Network Structure


 The detailed structure of the network used in this work is as follows:
In the original FCN, there are also some layers that follow the layer pool5. For simplicity, these layers are removed in this work.
  1. name: the layer name
  2. input: the size of the input feature map
  3. in_channels: the number of input feature maps
  4. out_channels: the number of output feature maps
  5. ksize: the kernel size
  6. stride: the stride
  7. pad: the padding size
  8. output: the size of the output feature map
pool3, pool4, and pool5 are followed by score-pool3, score-pool4, and score-pool5, respectively. Their parameters are shown below:
I call the outputs of score-pool3, score-pool4, and score-pool5 p3, p4, and p5, respectively.
The layers upsampled_pool4 and upsampled_pool5 shown in the table above are applied to p4 and p5, respectively. Their outputs and p3 are summed, and the sum is upsampled back to the original image size by the layer upsample_final shown in the table. This structure corresponds to the FCN-8s described in the original FCN paper; a sketch of the combination is given below.
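The following is only a sketch of the score-map fusion described above, written in Chainer (it is not the author's myfcn.py; the layer names follow the text, and the channel and kernel sizes are assumptions chosen so that the feature-map sizes work out for a 224$\times$224 input):

import chainer
import chainer.links as L

class Fcn8sSketch(chainer.Chain):
    def __init__(self, n_class=21):
        super(Fcn8sSketch, self).__init__(
            # 1x1 convolutions turning pool3/pool4/pool5 into per-class score maps
            score_pool3=L.Convolution2D(256, n_class, 1),
            score_pool4=L.Convolution2D(512, n_class, 1),
            score_pool5=L.Convolution2D(512, n_class, 1),
            # learned upsampling (deconvolution) layers
            upsampled_pool4=L.Deconvolution2D(n_class, n_class, ksize=4, stride=2, pad=1),
            upsampled_pool5=L.Deconvolution2D(n_class, n_class, ksize=8, stride=4, pad=2),
            upsample_final=L.Deconvolution2D(n_class, n_class, ksize=16, stride=8, pad=4),
        )

    def __call__(self, pool3, pool4, pool5):
        p3 = self.score_pool3(pool3)                        # 28x28 score map
        p4 = self.upsampled_pool4(self.score_pool4(pool4))  # 14x14 -> 28x28
        p5 = self.upsampled_pool5(self.score_pool5(pool5))  # 7x7   -> 28x28
        h = p3 + p4 + p5                                    # fuse the three score maps
        return self.upsample_final(h)                       # 28x28 -> 224x224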

Implementation of Network


 The network can be written in Chainer like this:

-- myfcn.py --
I assigned the pixels on the border between objects and the background a label of -1. These pixels then make no contribution when softmax_cross_entropy is calculated (see the Chainer specification for details). The function calculate_accuracy likewise discards the contribution from the border pixels. The function add is defined as:

-- add.py --
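For reference, a rough guess at these two helpers (the actual add.py and calculate_accuracy are in the gists above and may differ): add is assumed to be an elementwise sum of score maps, and the accuracy is assumed to ignore the label -1 pixels as described above.

import numpy as np

def add(*score_maps):
    # elementwise sum of the score maps (p3 and the upsampled p4 and p5)
    total = score_maps[0]
    for s in score_maps[1:]:
        total = total + s
    return total

def pixel_accuracy(prediction, truth):
    # prediction, truth: integer label arrays of shape (H, W);
    # border pixels carry the label -1 and are excluded from the count
    valid = truth != -1
    return float(np.sum((prediction == truth) & valid)) / float(np.sum(valid))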

Training


 The script used for training is as follows:

-- train.py --
-- mini_batch_loader.py --
I used the copy_model function described in the linked page. The file VGGNet.py was downloaded from the linked page. The procedure carried out in train.py is as follows:
  1. make an instance of the type MiniBatchLoader
  2. make an instance of the type VGGNet
  3. make an instance of the type MyFcn
  4. copy the parameters of the VGGNet instance to those of the MyFcn one
  5. select MomentumSGD as an optimization algorithm
  6. run a loop to train the net
Loading all the training data into GPU memory at once causes a "cudaErrorMemoryAllocation: out of memory" error, so only one minibatch's worth of data is loaded each time the training procedure requires it. A rough outline of the loop is sketched below.
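This is only a sketch of the steps listed above, not the author's train.py; MiniBatchLoader, VGGNet, MyFcn, and copy_model are the classes and function referred to in the text and are not reproduced here, the loader method name and the learning rate are placeholders, and the transfer of data to the GPU is omitted.

import chainer
from chainer import optimizers

loader = MiniBatchLoader()                                 # step 1 (mini_batch_loader.py)
vgg = VGGNet()                                             # step 2 (downloaded VGGNet.py)
model = MyFcn()                                            # step 3 (myfcn.py)
copy_model(vgg, model)                                     # step 4: reuse pretrained VGG weights
optimizer = optimizers.MomentumSGD(lr=1e-3, momentum=0.9)  # step 5 (lr is a placeholder)
optimizer.setup(model)

n_iterations = 10000                                       # placeholder
for i in range(n_iterations):                              # step 6
    # only one minibatch is loaded per iteration to stay within GPU memory
    x, t = loader.load_training_batch(i)                   # hypothetical loader method
    model.zerograds()
    loss = model(chainer.Variable(x), chainer.Variable(t))
    loss.backward()
    optimizer.update()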

Results


 Training is terminated at the 62nd epoch. The accuracies and losses for the training and testing datasets are shown below.

-- Training Dataset --
-- Testing Dataset --
 The accuracy is defined as the percentage of pixels per image whose labels are classified correctly, averaged over the training or testing dataset. The behavior of the accuracy and the loss for the training dataset is what we would expect, whereas for the testing dataset the loss curve surges at some point, which seems to indicate overfitting. The final accuracies are 99% for the training dataset and 82% for the testing dataset. A sample set consisting of an input image, a predicted segmentation, and a ground truth is shown with its accuracy below:
Other samples of predicted and ground-truth images are given below:

-- Training Dataset --

-- Testing Dataset --
Because the accuracy benefits from the background pixels, I think it has to exceed 90% before the results can be considered satisfactory.