Deep Adaptive Log-Demons: Diffeomorphic Image Registration with Very Large Deformations

This paper proposes a new framework for capturing large and complex deformations in image registration. Traditionally, this challenging problem is handled first by a preregistration, usually an affine matrix containing rotation, scale, and translation, and then by a nonrigid transformation. In preregistration, the directly calculated affine matrix is obtained from limited pixel information and may misregister the images when large biases exist, which severely misleads the subsequent registration. To address this problem for two-dimensional (2D) images, the two-layer deep adaptive registration framework proposed in this paper first classifies the rotation parameter through multilayer convolutional neural networks (CNNs) and then identifies the scale and translation parameters separately. For three-dimensional (3D) images, the affine matrix is located through feature correspondences by triplanar 2D CNNs. Deformation is then removed iteratively through preregistration and demons registration. Compared with state-of-the-art registration frameworks, our method obtains more accurate registration results on both synthetic and real datasets. In addition, principal component analysis (PCA) is combined with correlation measures such as Pearson and Spearman to form new similarity metrics for 2D and 3D registration. Experimental results also show faster convergence.


Introduction
The aim of image registration is to establish spatial correspondences between two or more images of the same or different scenes acquired at different times, from different viewpoints, and/or by different sensors. The ability to capture complex and large image deformations is vital to many computer vision applications, including image registration and atlas construction. The problem becomes more challenging when the object in the image or the edge of the image undergoes severe deformation [1].
Take medical image registration as an example: tissues, organs, and the body itself are prone to deform, move, and rotate under most circumstances. Most methods iteratively reach a satisfying overlap under a specific mathematical criterion, maximizing or minimizing the deformation energy as described in (1), written here in its minimization form. The fixed image is defined as $F$ and the moving image as $M$. Registration aims to find the optimal model $T$ that best satisfies the energy $E$:
$$\hat{T} = \arg\min_{T} E(F, M \circ T). \tag{1}$$
As a result, the model $T$, the objective function (similarity metric), and the optimization method constitute the three main components of image registration. According to a state-of-the-art survey [2], registration can be classified into rigid and nonrigid registration. Rigid models restrain the optimum to a few parameters to achieve global registration, while nonrigid models recover local deformation through a physical model (elastic or viscous), a statistical model, a support vector regression framework, and so forth. In order to fully overlap two images, researchers commonly adopt the two-step strategy, which consists of an initial registration followed by an iterative registration [3].
In the two-step strategy, registration first begins with a global affine transformation for initial global alignment; the state-of-the-art methods FLIRT [4] and ELASTIX [5,6] are examples. Alternatively, fiducial markers are first detected through feature descriptors, for example, the SIFT method [7], and the initial registration is carried out to establish correspondences between these point sets. In the preregistration procedure, the rotation, scale, and translation of the moving image are modified by the calculated affine matrix. After that, nonrigid registration proceeds iteratively. One severe problem of affine-matrix preregistration is that, when large distortion and rotation both exist, accuracy is limited by the correspondences between region-based descriptors. If the descriptor itself is not accurate, the problem becomes more severe. Once descriptors fail to discover point correspondences, the imprecision propagates and badly degrades the accuracy of the following registration. Besides, the traditional FLIRT and ELASTIX methods state that images for registration must be of the best quality; otherwise poor registration may occur. In order to address these limitations and capture very complex and large deformations, we propose a new approach for image registration based on a two-layer deep adaptive registration framework. First, in the preregistration procedure, the rotation, scale, and translation between two images are obtained separately to achieve initial registration. This is quite different from the traditional affine matrix that is calculated only once. For the rotation parameter, a CNN classifier is trained offline in order to identify the level of current image rotation under severe distortion. Then the scale and translation parameters are obtained. An optimum preregistration is calculated from the parameters gained above.
As for 3D images, triplanar 2D CNNs [8] around each voxel are utilized to calculate the final affine matrix. At this point, preregistration is done. Second, the rectified images are further recovered through the following nonrigid demons registration procedure. In the next cycle, the result of the previous iteration serves as the starting point of the next preregistration, so that each registration step facilitates the one that follows. This iterative procedure is carried out until an optimum overlap between the two images is achieved. Besides, PCA is introduced to extract the most valuable features, and the detected features are put into SSD, Kendall, Pearson, Spearman, and so forth to form new similarity metrics. Also, a triplanar 2D PCA is proposed to process the 3D registration problem, and Figure 1 gives details of the algorithm. As a result, convergence is accelerated while the same registration accuracy is maintained. Figures 1 and 2 and Algorithm 3 illustrate the workflow of our framework for processing 2D and 3D image registration.
The work introduced in this paper contributes in the following aspects:
(i) A new preregistration is proposed. For 2D images, a CNN classifier identifies the rotation, and the scale and translation are then obtained separately; for 3D images, triplanar 2D CNNs are constructed to estimate the parameters of the affine matrix. This new preregistration performs better than the state-of-the-art ELASTIX and SURF-based methods.
(ii) A two-layer adaptive registration framework is constructed, and it performs better than other so-called two-step strategies.
(iii) PCA is used to extract valuable features and is introduced into traditional similarity metrics such as SSD, Pearson, and so forth. For 3D images, triplanar 2D PCA is proposed to process the 3D registration problem. Experimental results show that convergence is accelerated with the new similarity metric.
(iv) The proposed framework is tested on both synthetic and natural 2D and 3D images under various extents of deformation. Experimental results show that our two-layer deep adaptive registration framework identifies the extent of rotation under severe deformation more precisely and corrects large and complex distortions with a higher Dice ratio than the comparative methods, as it adaptively corrects differences between images, whereas the other methods have no deep insight into the deformation between images.
The rest of the paper is organized as follows. The whole architecture of the proposed two-layer adaptive registration framework for 2D and 3D images is illustrated in Section 2; Section 3 explains methodology of our CNNs classifier preregistration; Section 4 introduces our preregistration in combination with demons nonrigid registration and our new PCA related similarity metric; the proposed methods are evaluated in Section 5 under different datasets and evaluation principles; finally, the conclusion of this work is given in Section 6.

Architecture
2.1. CNNs for 2D Images. The whole workflow of our 2D image preregistration, compared with the traditional method, is illustrated in Figure 1. In traditional algorithms, an affine matrix containing rotation, scale, and translation information is calculated through correspondences between detected features. This procedure is significantly influenced by the accuracy of the detected feature points, and for images under severe deformation, traditional feature methods usually fail. Our algorithm processes each of the three elements separately. By refining each procedure, accurate correspondences between the fixed and moving images are obtained. It works as follows.
(i) For rotation, the CNNs classifier is first trained offline in order to identify the extent of rotation of an image under severe deformation. The trained CNNs classifier can identify as many as 360 classes of rotation.
(ii) For scale, image size information is utilized to achieve consistency between fixed and moving image.
(iii) For translation, centroid of each image is calculated through statistical algorithm and translation is achieved by utilizing position information of centroids.
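The scale and translation steps in (ii) and (iii) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the centroid is the intensity-weighted centre of mass and that scale is the per-axis ratio of image sizes, and the function names `intensity_centroid`, `scale_factors`, and `translation_between` are hypothetical.

```python
import numpy as np

def intensity_centroid(image):
    """Return the intensity-weighted centroid (row, col) of a 2D image."""
    image = np.asarray(image, dtype=float)
    total = image.sum()
    if total == 0:
        # Assumption: fall back to the geometric centre for an empty image.
        return ((image.shape[0] - 1) / 2.0, (image.shape[1] - 1) / 2.0)
    rows, cols = np.indices(image.shape)
    return ((rows * image).sum() / total, (cols * image).sum() / total)

def scale_factors(fixed, moving):
    """Per-axis scale that maps the moving image size onto the fixed image size."""
    return (fixed.shape[0] / moving.shape[0], fixed.shape[1] / moving.shape[1])

def translation_between(fixed, moving):
    """Shift (d_row, d_col) that moves the moving centroid onto the fixed centroid."""
    f_row, f_col = intensity_centroid(fixed)
    m_row, m_col = intensity_centroid(moving)
    return (f_row - m_row, f_col - m_col)

# Toy example: the same 2 x 2 bright square at two different positions.
fixed = np.zeros((8, 8))
fixed[2:4, 2:4] = 1.0
moving = np.zeros((8, 8))
moving[4:6, 5:7] = 1.0
shift = translation_between(fixed, moving)
```

In the toy example the square has to move up by two rows and left by three columns, which is what `shift` reports.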

2.2. Triplanar CNNs for 3D Images. Different from the 2D image preregistration, the CNNs classifier here is used to locate the slice of one voxel (in the three directions x, y, z) instead of identifying rotation. The workflow of 3D image preregistration is shown in Figure 2. The main procedure includes sampling, slice classification, transform matrix calculation, and image transformation by the matrix. Using CNNs for 3D image registration is a new attempt to resolve image registration under high deformation. The detailed method is introduced in Section 5.2.

Preregistration
Our strategy consists of three parts: first, preregistration through a CNNs classifier on both 2D and 3D images; second, adaptive use of CNNs and the demons algorithm in the following nonrigid registration; and finally, an improved similarity metric that accelerates registration convergence. In this section, we present our preregistration methodology by introducing our CNNs rotation classifier.

Why We Use CNNs
(1) The Robustness of Classification. CNNs are a data-based classification method that is trained on an appropriate amount of data. CNNs are suitable for nearly any type of data and can classify with high accuracy, especially for low-quality fMRI and CT images or images under high deformation (the experiments in Section 5 show that these two kinds of real data are suitable for CNN processing). The detailed CNNs structure and the back propagation training method are described in Section 3.5.
(2) Automatic Image Feature Perception. Nearly all preregistration methods are based on precise feature perception, so the feature perception method plays the key role in this procedure. Traditional image feature perception is usually based on expert-designated data features: experts give some fixed method to detect specific features of a limited range of images. For example, the FLIRT method uses intermodal voxel similarity measures, in which correlation ratio and mutual information are used to detect voxel relationships between different parts. This approach is strongly limited by the image source, quality, and variable settings. When an exceptional case happens, for example, when images with large deformation are input, it does not work well. In contrast, the features of the CNNs method, such as edges, brightness, high- or low-frequency features, and distribution features, are learned by the network itself from the training data. Once the training data is updated, the network automatically adapts to more features. Although the long training time and the complicated learning of network variables make the CNN method less easy to use, its high accuracy keeps it the trend and future of image processing.
(3) High Efficiency Classification. Although the training time of CNNs is long (depending on the detailed training method, the network layer structure, and hardware such as GPUs), the total time spent on testing or classification is very short. Once the network is well trained, processing costs no more than a sequence of linear operations.
Above all, even though there are some good affine transformation methods based on expert knowledge, we still need a smarter one to adapt to more complex image processing tasks in the future.

Theory of CNNs.
The concept of deep learning was raised by Hinton and Salakhutdinov [9] in 2006, and it has brought great advances to machine learning since then. Deep learning aims to construct/use brain simulations to recognize data such as image/video, audio, and text in an unsupervised way. A deep learning framework uses a multilayer "encoder" network to transform high-dimensional data into a low-dimensional code and a similar "decoder" network to recover the data from the code. The outputs of a lower layer act as the inputs of the next higher layer. The whole network aims to make inputs and outputs equal without loss of information. By using lower layer features to represent higher layer features/classifications, a distributed feature representation of the data is found. Autoencoders, sparse coding, Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), and CNNs are five kinds of deep learning framework. Convolutional neural networks are excellent deep learning architectures, which were first introduced by Fukushima [10] and applied to handwritten digit recognition. Image recognition and segmentation tasks have also successfully used CNNs since then, with an error rate as low as 0.23 percent on the MNIST database [11]. Besides, CNNs offer high speed and accuracy for image classification [12]. In facial recognition [13,14] and video quality analysis [15], CNNs also achieved large decreases in error rate and root mean square error.
A CNN is a multilayer perceptron consisting of multiple stages, each with a convolutional layer followed by a subsampling layer. Through locally connected networks, the network topology exploits the stationary features of natural images. First, images are sampled into small patches. In the convolutional layers, small feature detectors are learned from these extracted samples. Then, a feature is calculated by convolving the feature detector with the image at that point. In the subsampling layer, the number of features is reduced to lower the computational complexity and to introduce invariance properties. One significant property of features learnt by CNNs is invariance to translation, rotation, scale, and other deformations. This two-stage feature extraction structure gives CNNs a high distortion tolerance when identifying input samples.
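One convolution and subsampling stage can be sketched in plain NumPy as below. This is an illustrative sketch only: the random kernel stands in for a learned feature detector, mean pooling is assumed as the subsampling operation, and the helper names `conv2d_valid` and `subsample` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution: output size is (H - k + 1, W - k + 1)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

def subsample(feature_map, rate=2):
    """Mean-pool non-overlapping rate x rate blocks (assumed pooling type)."""
    h = feature_map.shape[0] // rate * rate
    w = feature_map.shape[1] // rate * rate
    blocks = feature_map[:h, :w].reshape(h // rate, rate, w // rate, rate)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((128, 128))               # stand-in for a normalized gray image
kernel = rng.standard_normal((9, 9)) * 0.1   # stand-in for a learned 9 x 9 detector
conv_map = sigmoid(conv2d_valid(image, kernel))  # bias omitted in this sketch
pooled = subsample(conv_map, rate=2)
```

With a 128 x 128 input and a 9 x 9 kernel, the convolution yields a 120 x 120 map and the subsampling halves it to 60 x 60.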

CNNs Methodology.
The goal of CNNs is no different from that of other classification methods: both minimize the total square error. Here we use $c$ to denote the number of classes and $N$ to denote the number of samples in the training dataset; the total square error function can be written as
$$E^{N} = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\left(t_{k}^{n} - y_{k}^{n}\right)^{2}. \tag{2}$$
Here $t_{k}^{n}$ is the $k$th dimension of the label of the $n$th sample, and $y_{k}^{n}$ stands for the corresponding output from the network; the activation function in the CNNs is the sigmoid function, chosen for its faster convergence rate. For each single sample $n$, (2) can be described as (3). The final aim of the CNNs is to achieve the smallest total square error between $t^{n}$ and $y^{n}$:
$$E^{n} = \frac{1}{2}\sum_{k=1}^{c}\left(t_{k}^{n} - y_{k}^{n}\right)^{2}. \tag{3}$$
For a traditional fully connected neural network, BP (back propagation) is used to calculate the partial derivatives to reach the minimum square error. For the current layer $l$, the output $x^{l}$ can be written as (4), where $f$ is the sigmoid function:
$$x^{l} = f\left(u^{l}\right), \quad u^{l} = W^{l}x^{l-1} + b^{l}. \tag{4}$$
Unlike (4), as (5) shows, for the convolutional layer $l$, the image features $x_{i}^{l-1}$ ($i \in M_{j}$) from the prior layer are convolved with the kernel $k_{ij}^{l}$, which differs between layers; $b_{j}^{l}$ is the offset inside the sigmoid function $f$:
$$x_{j}^{l} = f\left(\sum_{i \in M_{j}} x_{i}^{l-1} * k_{ij}^{l} + b_{j}^{l}\right). \tag{5}$$
For the sample layer, the number and style of the image features are the same as in the prior layer, except that the feature size is scaled down. Each feature carries a multiplicative offset $\beta_{j}^{l}$ and an additive offset $b_{j}^{l}$:
$$x_{j}^{l} = f\left(\beta_{j}^{l}\,\mathrm{down}\left(x_{j}^{l-1}\right) + b_{j}^{l}\right). \tag{6}$$
The down-sample size in this paper is 2, which means that the image in the next layer is shrunk by a factor of two in both width and height. Through the combination of (4) and (5), we obtain (7), in which $\alpha_{ij}$ stands for the weight of input feature $i$ in forming output feature $j$. By training the kernels with back propagation, we finally obtain the best features from the different layers with high classification accuracy:
$$x_{j}^{l} = f\left(\sum_{i} \alpha_{ij}\left(x_{i}^{l-1} * k_{i}^{l}\right) + b_{j}^{l}\right), \tag{7}$$
subject to the constraints $\sum_{i}\alpha_{ij} = 1$ and $0 \le \alpha_{ij} \le 1$. As shown in Figure 3, the input images form the input layer; a detailed introduction can be found in Sections 3.4 and 3.5. The hidden layer consists of four pairs of convolutional and subsampling layers, which are denoted as $C_{i}$ ($i = 1, 2, 3, 4$) and $S_{i}$ ($i = 1, 2, 3, 4$) and are called local connection layers.
The output layer is a combination of a fully connected layer and a softmax classifier for classification. Each layer $C_{i}$ and $S_{i}$ ($i = 1, 2, 3, 4$) is constructed from multiple maps, and each map consists of multiple independent neural cells. Let $x^{(l-1)}$ and $x^{(l)}$ be the input and output of the $l$th layer, $s_{in}^{(l)} \times s_{in}^{(l)}$ and $s_{out}^{(l)} \times s_{out}^{(l)}$ be the sizes of the input and output maps, and $n_{in}^{(l)}$ and $n_{out}^{(l)}$ be the numbers of input and output maps of that layer, respectively. According to CNNs, $n_{in}^{(l)} = n_{out}^{(l-1)}$ and $s_{in}^{(l)} = s_{out}^{(l-1)}$.
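The squared-error objective and the fully connected sigmoid layer described in this subsection can be illustrated with a short sketch. It assumes one-hot targets and the conventional factor of 1/2 in the error; the function names and the example numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(u):
    """Logistic activation used as the layer nonlinearity."""
    return 1.0 / (1.0 + np.exp(-u))

def fully_connected(x_prev, weights, bias):
    """Output of a fully connected layer: sigmoid(W x + b)."""
    return sigmoid(weights @ x_prev + bias)

def total_square_error(targets, outputs):
    """Half the sum of squared differences over all samples and classes.

    The factor 1/2 is an assumption (the usual convention that simplifies
    the gradient); targets and outputs have shape (samples, classes).
    """
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((targets - outputs) ** 2)

# Two samples, three classes, one-hot labels and made-up network outputs.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
outputs = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.6, 0.2]])
error = total_square_error(targets, outputs)

# A zero-weight layer maps any input to sigmoid(bias).
hidden = fully_connected(np.ones(4), np.zeros((3, 4)), np.zeros(3))
```

Back propagation then adjusts the weights and kernels in the direction that reduces this error.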

CNNs Structure Design.
We adopt a ten-layer CNNs perceptron network (input and output layers are included; convolutional and sample layers are counted separately). The key variable settings of the proposed CNN, including the kernel size and sample rate of the different layers, are shown in Table 1 and Figure 3. The learning rate is alpha = 1, the variable update batch size is 10, and the number of iterations is 1000; all training and test images are normalized to 128 * 128 gray images with pixel values in [0, 1].
Table 1: Kernel size and sample rate of the layers in the proposed CNN.
Layer   Type                      Kernel size / sample rate
2       1st convolutional layer   9 * 9
3       1st sample layer          2
4       2nd convolutional layer   5 * 5
5       2nd sample layer          2
6       3rd convolutional layer   5 * 5
7       3rd sample layer          2
8       4th convolutional layer   5 * 5
9       4th sample layer          2
10      Output layer              None

Training Image Rotation Classifier through CNNs.
Our input images for training are difference images between the fixed and moving image, $F - M$, where $M$ is under deformation with a different extent of rotation. Each angle within 360° is defined as one class, producing as many as 360 classes. Two distinguishing characteristics of CNNs are the perception field and shared weights. Perception field means that each neural cell in a layer is not connected with all neural cells in the adjacent layers but only with a local area of neural cells (9 * 9 as in Figure 3). Shared weights means that the connection weight parameters (9 * 9) from every neural cell to its local area are the same. As shown in Figure 3, suppose the size of the input image is 128 * 128. After convolution with filters whose kernel size is 9 * 9, the image changes into Ts1 of size 120 * 120. The image is then scaled into Tc1 of size 60 * 60 in layer S1. After four pairs of $C$ and $S$ layers, the original image is represented as Ts4, a matrix of only 4 * 4. In this hidden layer, the neural cells on a feature map are only locally connected but share the same weights. As a result, only 9 * 9 weight parameters need to be calculated, which greatly reduces the computational complexity. A full connection exists between the Ts4 matrix and the output layer, eliminating the disparity caused by the partial connection in the hidden layer. The softmax classifier then identifies the matrix and outputs the detected results. After that, the parameters are fine-tuned through 1000 rounds of back propagation until convergence. After all these steps, a final classifier is obtained.
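The feature map sizes quoted above follow from simple arithmetic, which the short sketch below walks through. It assumes "valid" convolutions (no padding) and integer down-sampling by the sample rate; the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size, kernel_sizes, sample_rate=2):
    """List (layer type, parameter, output side length) for each hidden layer."""
    sizes = []
    size = input_size
    for kernel in kernel_sizes:
        size = size - kernel + 1          # valid convolution shrinks by kernel - 1
        sizes.append(("conv", kernel, size))
        size = size // sample_rate        # subsampling divides the side length
        sizes.append(("sample", sample_rate, size))
    return sizes

# Kernel sizes of the four convolutional layers: 9, 5, 5, 5.
sizes = feature_map_sizes(128, [9, 5, 5, 5])
for layer_type, parameter, side in sizes:
    print(f"{layer_type:6s} param={parameter}  output={side} x {side}")
```

Starting from 128 x 128, the side length goes 120, 60, 56, 28, 24, 12, 8, 4, which agrees with the 120 * 120, 60 * 60, and final 4 * 4 maps described in the text.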

Diffeomorphic Log Demons Registration.
In the 19th century, Maxwell first introduced the concept of demons to illustrate a paradox of thermodynamics. In 1998, Thirion [16] proposed a registration algorithm under the demons model, which achieved high registration precision and efficiency through pixel velocities caused by edge-based forces.
Vercauteren et al. [17] proposed the nonparametric diffeomorphic demons algorithm. It considers the demons algorithm as an optimization procedure on the whole space of velocity fields and adapts that procedure to a space of diffeomorphic transformations. The resulting transformation is smoother and more accurate. Vercauteren et al. [18] then brought the process into the log-domain, that is, they use a stationary velocity field. Besides, the algorithm is symmetric with respect to the order of the input images. Lorenzi et al. [19] introduced a symmetric local correlation coefficient into the log-demons diffeomorphic algorithm. Lombaert et al. [1] proposed spectral log-demons to capture large deformations. Peyrat et al. [20] used multichannel demons to register 4D time-series cardiac images.
(ii) Diffeomorphic Log Demons Algorithm. Here, the diffeomorphic log demons algorithm is briefly recalled. A diffeomorphic transformation $\phi$ is related to the exponential map of the velocity field $V$: $\phi = \exp(V)$ (Algorithm 1) [1]. The log-demons framework alternates between the optimization of a similarity term and a regularization term, with the update derived from the Euler-Lagrange equation in (10). In general, the procedure of the diffeomorphic log demons framework is described in Algorithm 2.
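The exponential map of a stationary velocity field is commonly computed by scaling and squaring, and a minimal 2D sketch is given below. It is an illustration under stated assumptions rather than the implementation used in the paper: the field is stored as a displacement array of shape (2, H, W), composition uses bilinear interpolation with border clamping, and the names `bilinear_sample`, `compose`, and `exp_velocity` are hypothetical.

```python
import numpy as np

def bilinear_sample(field, rows, cols):
    """Sample a 2D array at fractional (rows, cols), clamping to the border."""
    h, w = field.shape
    rows = np.clip(rows, 0, h - 1)
    cols = np.clip(cols, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = rows - r0
    fc = cols - c0
    top = field[r0, c0] * (1 - fc) + field[r0, c1] * fc
    bottom = field[r1, c0] * (1 - fc) + field[r1, c1] * fc
    return top * (1 - fr) + bottom * fr

def compose(disp_a, disp_b):
    """Displacement of (id + disp_a) o (id + disp_b): b(x) + a(x + b(x))."""
    grid_r, grid_c = np.indices(disp_a.shape[1:]).astype(float)
    warped_r = grid_r + disp_b[0]
    warped_c = grid_c + disp_b[1]
    return np.stack([
        disp_b[0] + bilinear_sample(disp_a[0], warped_r, warped_c),
        disp_b[1] + bilinear_sample(disp_a[1], warped_r, warped_c),
    ])

def exp_velocity(velocity, max_step=0.5):
    """Scaling and squaring: displacement u such that id + u approximates exp(v)."""
    norm = np.sqrt((velocity ** 2).sum(axis=0)).max()
    n = 0
    while norm / (2 ** n) > max_step:   # choose n so the scaled field is small
        n += 1
    disp = velocity / (2 ** n)          # scale the velocity field
    for _ in range(n):                  # square n times: phi <- phi o phi
        disp = compose(disp, disp)
    return disp

# Constant velocity of 3 pixels along columns: exp(v) is a pure 3-pixel shift.
velocity = np.zeros((2, 16, 16))
velocity[1] = 3.0
disp = exp_velocity(velocity)
```

For a constant velocity field the exponential is simply that translation, which gives a convenient sanity check for the composition step.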

New Similarity Metric by Combination of PCA.
Input: Velocity field $V$. Output: Diffeomorphic map $\phi = \exp(V)$.
(i) Choose $N$ such that $2^{-N}V$ is close to 0, e.g., such that $\max\|2^{-N}V\| \le 0.5$ pixels.
(ii) Scale velocity field $\phi \leftarrow 2^{-N}V$.
for $N$ times do
(iii) Square $\phi \leftarrow \phi \circ \phi$.
end for
Algorithm 1: Exponential $\phi = \exp(V)$ [1].
Mathematically, PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system in order to extract the greatest variance in the data set. As a result, it is able to avoid influences caused by image biases. Traditionally, PCA is used for dimensionality reduction to facilitate classification, visualization, communication, and storage of high-dimensional data. Here, PCA is applied to both 2D and 3D medical and ordinary images, and the detected feature representations are used as inputs of the similarity metric to achieve anatomical correspondence and assist the optimization procedure in registration. There are many classical metric measures, such as SSD, mutual information (MI), cross correlation (CC), and pattern intensity, as well as their improved versions. In this paper, Pearson, Spearman, Kendall, and SSD together with the features extracted by PCA are utilized as the new similarity metrics. Pearson, Spearman, and Kendall are concepts from statistics and are frequently used in data mining. Pearson is short for the Pearson product-moment correlation coefficient (PPMCC), which was developed to measure the linear correlation between two variables. Spearman's rank correlation coefficient is a nonparametric measure of statistical dependence between two variables. Both take values between +1 and −1. Spearman places no requirement on the variables, while Pearson requires the variables to follow a normal distribution. Our use of log demons registration avoids the influence brought by this.
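The Pearson and Spearman coefficients mentioned above can be computed with a few lines of NumPy, as sketched below. The helper names `pearson`, `average_ranks`, and `spearman` are hypothetical, and ties in the Spearman ranking are resolved by average ranks, which is an assumption about the variant used.

```python
import numpy as np

def pearson(a, b):
    """Pearson product-moment correlation of two equally sized arrays."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def average_ranks(values):
    """1-based ranks where tied values share their mean rank."""
    values = np.ravel(values)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3                      # monotonic but nonlinear relation
linear_score = pearson(x, y)    # below 1 because the relation is not linear
rank_score = spearman(x, y)     # 1 because the ordering is preserved
```

The example shows the practical difference: a monotonic but nonlinear intensity relation lowers the Pearson value while leaving the Spearman value at its maximum.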
(i) For 2D images of size $m \times n$, PCA is first applied to both the fixed image $F$ and the registered moving image $M$, giving $F_{\mathrm{pca}}$ and $M_{\mathrm{pca}}$. Thus, the most important information of the images can be fully utilized by using $F_{\mathrm{pca}}$ and $M_{\mathrm{pca}}$ together as inputs of Pearson, Spearman, and so forth, forming the new similarity metric.
(ii) For 3D images of size $m \times n \times l$, PCA is first applied to every slice along the $x$ axis, giving a series $\mathrm{pca}_{i}$ ($i = 1, 2, \ldots$, one per slice). By summarizing all $\mathrm{pca}_{i}$, $\mathrm{pca}_{x}$ is obtained. The same operation is carried out on the $y$ and $z$ axis data, giving $\mathrm{pca}_{y}$ and $\mathrm{pca}_{z}$. In this way, the PCA of both the fixed image, $(F_{\mathrm{pca}_x}, F_{\mathrm{pca}_y}, F_{\mathrm{pca}_z})$, and the registered moving image, $(M_{\mathrm{pca}_x}, M_{\mathrm{pca}_y}, M_{\mathrm{pca}_z})$, is calculated. Thus, the information of the images can be fully utilized by using both triples as inputs of PPMCC, Spearman, and so forth. The workflow of this part is shown in Figure 4.
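A possible reading of the PCA-based similarity in (i) and (ii) is sketched below. Several choices here are assumptions that the text does not fix: image rows are treated as observations, the top few principal components are kept, each component's sign is normalized so that features of two images are comparable, and the per-slice features of a volume are summarized by their mean. All function names are hypothetical.

```python
import numpy as np

def pca_features(image, n_components=4):
    """Project the centred rows of a 2D image onto its top principal axes."""
    data = np.asarray(image, dtype=float)
    centered = data - data.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    # Fix the sign ambiguity: make the largest-magnitude entry of each axis positive.
    idx = np.argmax(np.abs(axes), axis=1)
    signs = np.sign(axes[np.arange(len(axes)), idx])
    signs[signs == 0] = 1.0
    return centered @ (axes * signs[:, None]).T

def pearson(a, b):
    """Pearson correlation of two equally sized arrays."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def pca_pearson(fixed, moving, n_components=4):
    """PCA-Pearson similarity between two 2D images of the same size."""
    return pearson(pca_features(fixed, n_components),
                   pca_features(moving, n_components))

def triplanar_pca_features(volume, n_components=4):
    """Mean per-slice PCA features along each of the three axes, concatenated."""
    volume = np.asarray(volume, dtype=float)
    features = []
    for axis in range(3):
        slices = np.moveaxis(volume, axis, 0)
        per_slice = [pca_features(s, n_components) for s in slices]
        features.append(np.mean(per_slice, axis=0).ravel())
    return np.concatenate(features)

rng = np.random.default_rng(1)
image = rng.random((32, 32))
same_score = pca_pearson(image, image)
other_score = pca_pearson(image, rng.random((32, 32)))
volume_features = triplanar_pca_features(rng.random((6, 7, 8)))
```

Two volumes can then be compared by passing their `triplanar_pca_features` vectors to `pearson`, in the same way as the 2D case.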

Two-Layer Iterative Registration Framework.
Traditionally, two-step registration means an initial affine registration at the very beginning to coarsely rectify deformation, followed by an iterative registration that optimizes a similarity metric to achieve fine registration. We also adopt the two-step strategy. But before the two-step registration, we build a classifier offline through CNNs training to identify the rotation between the fixed and moving image under very large distortions, and then the scale and translation. Also, in each iteration, the initial and the following registration are carried out alternately. This feedback procedure helps achieve higher registration accuracy than the traditional SURF and affine methods. Besides, at the end of each iteration, we utilize a new similarity metric that combines PCA with the traditional SSD, Pearson, and so forth, retaining the most important features of the image. As a result, convergence is considerably faster than with the traditional SSD without PCA, while the same registration accuracy is maintained. Algorithm 3 shows the overall flow of the framework.
Registration: Input: images $F$, $M$ and initial velocity field $V$. Output: transformation $\phi = \exp(V)$ from $F$ to $M$. Preregistration: through SURF-related affine transformation or ELASTIX to globally register $F$ and $M$. Repeat: demons registration.
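The alternation between preregistration and demons registration can be sketched as a generic loop. This is only a schematic under stated assumptions: the three callables `preregister`, `demons_step`, and `similarity` are hypothetical placeholders for the CNN-based preregistration, one log-demons update, and the PCA-related metric, and the stopping rule (change in similarity below a tolerance) is an assumption.

```python
def two_layer_registration(fixed, moving, preregister, demons_step, similarity,
                           max_iterations=10, tolerance=1e-4):
    """Alternate preregistration and nonrigid refinement until the metric settles.

    Returns the registered moving image and the similarity history.
    """
    current = moving
    history = [similarity(fixed, current)]
    for _ in range(max_iterations):
        current = preregister(fixed, current)   # layer 1: rotation, scale, translation
        current = demons_step(fixed, current)   # layer 2: nonrigid demons update
        history.append(similarity(fixed, current))
        if abs(history[-1] - history[-2]) < tolerance:
            break
    return current, history

# Toy stand-ins: "images" are scalars and each step closes half of the remaining gap.
registered, history = two_layer_registration(
    fixed=10.0,
    moving=0.0,
    preregister=lambda f, m: m + 0.5 * (f - m),
    demons_step=lambda f, m: m + 0.5 * (f - m),
    similarity=lambda f, m: -abs(f - m),
)
```

The toy callables only demonstrate the control flow; in practice each would operate on image arrays and deformation fields.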

Experiment Results
In this section, the performance of the whole two-layer registration method is evaluated on both 2D and 3D images and on both synthetic and natural datasets. For comparison, the traditional two-step methods, that is, the ELASTIX and SURF-related algorithms, are used to preregister the moving and fixed images. Then demons nonrigid registration is conducted. These methods are set as the baseline methods and are denoted as ELASTIX+demons and SURF+demons. They both first use detected features to initially register the images through an affine transformation and then use the original SSD as the similarity metric under the diffeomorphic log demons framework. Our method differs from their framework both in the preregistration and in the following nonrigid registration. For 2D images, a rotation classifier is first trained through CNNs and used to preregister the moving image under large distortion and rotation; together with the scale and translation transformations, this completes the preregistration. For 3D images, pretrained triplanar 2D CNNs are utilized to locate voxels, establishing feature correspondences. Finally, the PCA-related similarity metric is used while iteratively registering the images under the diffeomorphic log demons framework.
The improvements of our two-layer method in registration accuracy, robustness to large deformation and rotation, and convergence speed are all assessed with ground truth data. We also tested CNNs with other numbers of layers; the results showed that the ten-layer CNNs achieved the highest score when classifying the rotation of deformed images. Four kinds of 2D source images [24][25][26][27] served as samples. An example of a sample image is shown in Figure 5. Taking image T1 as an example, a linear transformation such as rotation or translation is added to T1 by multiplying it by a rotation matrix coded in MATLAB; then four kinds of large and complex nonlinear transformations are added to T1 through special processing in Photoshop. The versions of T1 with rotation only, with deformation only, and with both rotation and deformation each receive their own notation, and the same notation is used for F1 and the Lena image. Figure 6 illustrates the images after all of this processing. For accurate identification of rotation, the difference image between the original image and its rotation-only version is the input of the CNNs during training. Each angle within 360° is defined as one class, giving 360 classes of distortion. For the other CNNs, the number of classes is 180, 90, or 36.
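How rotation angles map to class labels for the 360, 180, 90, and 36 class classifiers can be sketched as follows. The equal-width binning used here is an assumption, since the text does not state how angles are grouped when fewer than 360 classes are used, and the function names are hypothetical.

```python
def angle_to_class(angle_degrees, n_classes=360):
    """Map a rotation angle to a class index, assuming equal-width angle bins."""
    bin_width = 360.0 / n_classes
    return int((angle_degrees % 360.0) // bin_width)

def class_to_angle(class_index, n_classes=360):
    """Lower edge (in degrees) of the angle bin represented by a class index."""
    return class_index * (360.0 / n_classes)

# With 360 classes every integer degree is its own class.
label_360 = angle_to_class(90, n_classes=360)
# With 36 classes each class spans 10 degrees.
label_36 = angle_to_class(95, n_classes=36)
recovered = class_to_angle(label_36, n_classes=36)
```

Under this assumption, a coarser classifier recovers the rotation only up to its bin width, for example 10 degrees for the 36 class classifier.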
Our tests are carried out on a computer running Windows 7 with 8 GB RAM and an i7-4770 CPU @ 3.4 GHz. Taking the BrainWeb data [23,24] as an example, Table 3 shows the test results of the classifier on these data.
As we can see from Table 3, when input images are resized to 64×64 pixels, the rotation identification accuracy reaches as much as 99.86% for classifier 90, while when images are resized to 28×28 pixels, the identification accuracy for classifier 36 is 99.97%. All of these results are obtained when the training data are also used for testing. When the testing data of BrainWeb are put into the trained classifier, accuracy reaches 99.56%, lower than on the training data itself but still very high compared with many common classifiers. For the Lena, ITK, and T1 training data, classifier 36 reaches an accuracy of 99.94%. The number of iterations is set to 1000 for every training.
(2) CNNs Preregistration Test. The SURF-related method, ELASTIX, and our CNNs method are tested. Here, the SURF-related method means first using the SURF algorithm to detect features and then an affine transformation to initially register the images.
Figure 5: An example of original sample image.
(ii) However, when rotation and large deformation appear simultaneously in the moving image, as for the rotated and deformed Lena image in Figure 8, both the ELASTIX and SURF methods fail. Under such circumstances, SURF found only one pair of corresponding points in our tests. As there are not enough feature correspondences, the initial registration fails.
(iii) On the contrary, our trained CNNs classifier, followed by the scale and translation operations, directly identified the rotation angle of the Lena image accurately (90° rotation) and turned it back, as in Figure 9. For better comparison, we used software to display the rotation processing in CNNs in the manner of SURF, that is, as feature detection and matching (the CNN analogy of Lena). As a sufficient number of these so-called features are detected, the CNNs are able to recover the rotation added to Lena.
(3) Accuracy Evaluation of Registration. Mathematically, the Dice ratio is used to evaluate the overlap between two datasets $A$ and $B$. It is defined in (11):
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}. \tag{11}$$
In this section, both the Dice ratio and subjective human evaluation are used to assess the accuracy of the ELASTIX and SURF-related registrations and of our method. After the preregistration in Section 5.1.1, the ELASTIX and SURF-related methods perform the diffeomorphic log demons algorithm iteratively to achieve the best registration, while our method iteratively carries out the CNNs classifier and the diffeomorphic log demons algorithm to optimize registration. This new two-layer registration framework makes full use of both the preregistration and the following demons method, and the registration results show that it indeed improves accuracy. Figure 10 shows the registration procedure and result of the ELASTIX (Figure 10(c)) and SURF (Figure 10(b)) related methods, while Figure 11 shows those of our method. When both rotation and deformation exist in the moving image, our CNNs+demons registration result is visibly much better than the ELASTIX+demons and SURF+demons results. Besides, to test the Dice ratio of registration, the original fixed image and the registered moving image of each method are put into (11) separately. The Dice ratios of the ELASTIX-demons and SURF-demons methods are 0.889 and 0.88, while our iterative CNNs-demons method achieves as much as 0.8964.
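The Dice ratio can be computed on binary masks as in the following sketch. It uses the standard definition, twice the overlap divided by the total size of both sets; the convention of returning 1 for two empty masks and the function name `dice_ratio` are assumptions made for this illustration.

```python
import numpy as np

def dice_ratio(mask_a, mask_b):
    """Dice overlap of two binary masks: 2 |A and B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * np.logical_and(a, b).sum() / total

# Two 4-pixel masks that share one of their two foreground pixels.
mask_fixed = np.array([[1, 1, 0, 0]])
mask_moving = np.array([[0, 1, 1, 0]])
overlap = dice_ratio(mask_fixed, mask_moving)
```

For gray-level images, masks would first have to be obtained by thresholding or segmentation, a step the text does not specify.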

Lung Atlases.
A description of the lung dataset can be found in Table 2 [28]. The EMPIRE10 lung datasets were first used for the MICCAI 2010 conference. They contain 20 intra-patient thoracic CT image pairs. Figures 12 and 13 show our [...] Figure 14. The shortest time for one slice is more than 1000 ms (1 s), and the time for slice 8 is 3500 ms. Although training our CNNs classifier takes a long time, it is done offline, and our CNNs rotation, scale, and translation operations cost a total of only 39 ms. As a result, the method is quite attractive for real-time clinical applications.

Brain Atlases.
We select cross-sectional 2D images of the 20 BrainWeb MRI objects, 10 for training and the other 10 for testing. From Figure 15, we can see that our proposed preregistration rectifies both rotation and translation more successfully than the traditional ELASTIX affine registration.

An Attempt on 3D Image Registration by Using CNNs.
For the 3D image registration part, we focus on brain atlas registration and give a CNNs-based 3D image registration method. We train the brain atlas from the 3D image data of 18 people in the BrainWeb brain database in four steps:
(1) Randomly select 10 label points in the 3D image according to a normal distribution.
(2) Adjust the 3D brain image and separate it into 2D images in the three directions (x, y, z).
(3) Test the position of each 2D slice with the trained triplanar CNNs classifier (one CNN network per dimension) and obtain the right slice position (predicted voxels).
(4) Adjust the 3D image so that the label voxels and the predicted voxels have the smallest Hamming distance.
Experiments show that highly accurate CNNs classification results greatly improve the similarity of the moving 3D image to the fixed 3D image. The detailed procedure is shown in Figure 16.

Convergence Speed
[...] Figure 17(e). The coarse-to-fine registration strategy (three levels are recommended here) is adopted. In Figure 17, the horizontal axis stands for the number of iterations and the vertical axis stands for the value of the metric. First, the mean convergence extent of the three-level registration is calculated. Then normalization is carried out on the mean value. Several conclusions can be drawn: (1) both the PCA-related and the original SSD methods converge regularly; (2) as a whole, the PCA-SSD and PCA-Pearson methods perform best and converge faster than the original SSD metric; (3) the PCA-Spearman metric converges fastest at first but slows down later; (4) the Kendall metric performs worst compared with the other metrics.

Conclusion
In this paper, a comprehensive method of constructing a rotation classifier for images under severe deformation and rotation was proposed through CNNs. The classifier is able to distinguish as many as 360 classes of distortion according to the analysis of rotation angles. The classifier is utilized to assist our proposed two-layer deep adaptive registration framework. In each registration iteration, the preregistration (identification by the trained classifier, followed by the scale and translation operators) and the following diffeomorphic log demons registration facilitate each other in turn. Besides, the proposed PCA-related similarity metric helps achieve faster convergence.
The new two-layer registration framework is compared with traditional diffeomorphic log demons registration in combination with state-of-the-art ELASTIX and SURF preregistration. As the baseline methods carry out preregistration only once, large deformations cannot be fully corrected. In tests on different image resources containing various kinds of 2D and 3D, MRI, and CT datasets, our framework outperforms the baseline methods in both registration quality and convergence speed. In future work, we will combine other kinds of deep learning frameworks, such as independent subspace analysis (ISA) [30] and sparse coding [31], to improve the current registration. Also, more performance tests of the proposed two-layer registration framework should be carried out on more data resources, and the performance of the proposed method should be compared with other deep learning models.