This paper proposes a new framework for capturing large and complex deformations in image registration. Traditionally, this challenging problem is handled first by a preregistration, usually an affine matrix containing rotation, scale, and translation, and afterwards by a nonrigid transformation. In preregistration, the directly calculated affine matrix, which is obtained from limited pixel information, may misregister the images when large biases exist, thus severely misleading the subsequent registration. To address this problem for two-dimensional (2D) images, the two-layer deep adaptive registration framework proposed in this paper first classifies the rotation parameter through multilayer convolutional neural networks (CNNs) and then identifies the scale and translation parameters separately. For three-dimensional (3D) images, the affine matrix is located through feature correspondences by triplanar 2D CNNs. Deformation removal is then done iteratively through preregistration and demons registration. Compared with state-of-the-art registration frameworks, our method obtains more accurate registration results on both synthetic and real datasets. In addition, principal component analysis (PCA) is combined with correlation measures such as Pearson and Spearman to form new similarity standards for 2D and 3D registration. Experimental results also show faster convergence.
The aim of image registration is to establish spatial correspondences between two or more images of the same or different scenes acquired at different times, from different viewpoints, and/or by different sensors. The ability to capture complex and large image deformations is vital to many computer vision applications, including image registration and atlas construction. The problem becomes more challenging when the object in the image or the edge of the image undergoes severe deformation [
Take medical image registration as an example: tissues, organs, and the body itself are prone to deform, move, and rotate under most circumstances. Most methods iteratively reach a satisfactory overlap under specific mathematical criteria, maximizing or minimizing a deformation energy as described in
In the two-step strategy, registration begins with a global affine transformation for initial global alignment; take the state-of-the-art method FLIRT [
To address the above limitations and capture very complex and large deformations, we propose a new approach to image registration based on a two-layer deep adaptive registration framework. First, in the preregistration procedure, the rotation, scale, and translation between two images are obtained separately to achieve initial registration. This is quite different from the traditional “one time calculated” affine matrix. For the rotation parameter, a CNN classifier is trained offline to identify the level of image rotation under severe distortion. Then the scale and translation parameters are obtained. An optimum preregistration is calculated from the parameters obtained above. As for 3D images, a triplanar 2D CNNs [
The work flow of 2D image registration.
3D image preregistration structure.
The work introduced in this paper contributes in the following aspects:
Preregistration is improved by estimating rotation, scale, and translation separately. Multisource CNNs are developed to classify various levels of rotation under severe distortion and to identify the rotation extent with high accuracy. For 3D images, triplanar 2D CNNs are constructed to estimate the parameters of the affine matrix. This new preregistration performs better than the state-of-the-art ELASTIX and SURF-based methods.
A two-layer adaptive registration framework is constructed, and it performs better than other so-called two-step strategies.
PCA is used to extract valuable features and is introduced into traditional similarity metrics such as SSD and Pearson. For 3D images, triplanar 2D PCA is proposed to handle the 3D registration problem. Experimental results show that convergence is accelerated with the new similarity standard.
The proposed framework is tested on both synthetic and natural 2D and 3D images under deformations of various extents. Experimental results show that our two-layer deep adaptive registration framework identifies the extent of rotation under severe deformation more precisely and corrects large and complex distortions with a higher Dice ratio than the comparative methods, because it adaptively corrects differences between images, whereas the other methods have no deep insight into the deformation between images.
The rest of the paper is organized as follows. The whole architecture of the proposed twolayer adaptive registration framework for 2D and 3D images is illustrated in Section
The whole workflow of our 2D image preregistration compared with traditional method is illustrated in Figure
For rotation, the CNNs classifier is first trained offline in order to rectify the rotation of an image under severe deformation. The trained CNNs classifier can distinguish as many as 360 classes of rotation.
For scale, image size information is utilized to achieve consistency between the fixed and moving images.
For translation, the centroid of each image is calculated through a statistical algorithm, and translation is achieved by utilizing the position information of the centroids.
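As a rough illustration of how separately estimated parameters could be combined into one preregistration transform, the sketch below computes intensity-weighted centroids and composes a homogeneous 2D matrix. The function names, the use of the intensity-weighted mean as the centroid, and the convention of rotating and scaling about the moving centroid are assumptions made for illustration; the rotation angle and scale are taken as already estimated.

```python
import numpy as np

def centroid(image):
    """Intensity-weighted centroid (row, col) of a 2D image."""
    image = np.asarray(image, dtype=float)
    total = image.sum()
    rows, cols = np.indices(image.shape)
    return np.array([(rows * image).sum() / total,
                     (cols * image).sum() / total])

def preregistration_matrix(fixed, moving, angle_deg, scale):
    """Compose a 3x3 homogeneous matrix that rotates and scales points
    about the moving centroid and then moves that centroid onto the
    fixed centroid (an assumed convention, not the paper's)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    linear = scale * np.array([[c, -s], [s, c]])
    c_fixed, c_moving = centroid(fixed), centroid(moving)
    matrix = np.eye(3)
    matrix[:2, :2] = linear
    # p -> linear @ (p - c_moving) + c_fixed
    matrix[:2, 2] = c_fixed - linear @ c_moving
    return matrix

# Toy example: the same square blob at two different positions.
fixed = np.zeros((32, 32))
fixed[10:14, 12:16] = 1.0
moving = np.zeros((32, 32))
moving[5:9, 7:11] = 1.0
T = preregistration_matrix(fixed, moving, angle_deg=0.0, scale=1.0)
```

With no rotation and unit scale, the translation column of `T` is simply the difference between the two centroids; for any angle and scale, the moving centroid is mapped exactly onto the fixed centroid.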
Unlike in 2D image preregistration, the CNNs classifier here is used for the slice location of one voxel (
Our strategy consists of first preregistering both 2D and 3D images through a CNNs classifier, then applying CNNs and the demons algorithm adaptively in the subsequent nonrigid registration, and finally improving the similarity metric to accelerate registration convergence. In this section, we present our preregistration methodology by introducing our CNNs rotation classifier.
Above all, even though there are some good affine transformation methods based on expert knowledge, we still need a smarter one to adapt to more complex image processing tasks in the future.
The concept of deep learning was raised by Hinton and Salakhutdinov [
A CNN is a multilayer perceptron composed of several stages, each consisting of a convolutional layer followed by a subsampling layer. Through locally connected networks, stationary features of natural images are exploited by the network topology. First, images are sampled into small patches. In the convolutional layers, small feature detectors are learned from these extracted samples. Then, a feature is calculated by convolving the feature detector with the image at that point. In the sampling layer, the number of features is reduced to lower the computational complexity and introduce invariance properties. One significant property of features learnt by CNNs is a degree of invariance to translation, rotation, scale, and other deformations. This two-stage feature extraction structure gives CNNs a high tolerance to distortion when identifying input samples.
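The convolution step described above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function names, the "valid" boundary handling, and the sigmoid nonlinearity are assumptions, not details taken from the paper.

```python
import numpy as np

def convolve_valid(image, kernel):
    """'Valid' 2D convolution: the flipped kernel is evaluated only at
    positions where it fits entirely inside the image."""
    image = np.asarray(image, dtype=float)
    flipped = np.asarray(kernel, dtype=float)[::-1, ::-1]
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def feature_map(image, kernel, bias=0.0):
    """Convolution plus bias, passed through a sigmoid (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-(convolve_valid(image, kernel) + bias)))
```

For a 28 x 28 input and a 5 x 5 feature detector (a hypothetical size), the resulting feature map is 24 x 24.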
The goal of CNNs is no different from that of other classification methods: both minimize the total squared error. Here we use
Here
For a traditional fully connected neural network, back propagation (BP) is used to calculate the partial derivatives needed to minimize the squared error, usually
Unlike
For the sample layer, the number and type of image features are the same as in the prior layer, except that the feature size is scaled down. Each feature has a multiplicative and an additive bias. The down-sampling rate in this paper is 2, which means that the image in the next layer is halved in both width and height. So through combination of
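A sample layer with rate 2 can be sketched as follows. Mean pooling over non-overlapping blocks is an assumption here (the text only states that the feature size is scaled down), and `beta` and `bias` are illustrative names for the multiplicative and additive biases.

```python
import numpy as np

def subsample(feature, rate=2, beta=1.0, bias=0.0):
    """Average non-overlapping rate x rate blocks, then apply a
    multiplicative bias (beta) and an additive bias."""
    feature = np.asarray(feature, dtype=float)
    h, w = feature.shape[0] // rate, feature.shape[1] // rate
    blocks = feature[:h * rate, :w * rate].reshape(h, rate, w, rate)
    return beta * blocks.mean(axis=(1, 3)) + bias
```

With `rate=2`, a 28 x 28 feature map becomes 14 x 14, which matches the halving in width and height described above.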
As shown in Figure
An illustration of CNNs.
We adopt a ten-layer CNNs perceptron network (the input and output layers are included; convolutional and sample layers are counted separately). The key variable settings, including the kernel size and sample rate of the different layers in the proposed CNN, are shown in Table
Key setting variables in CNN network.
Layer  Name  Kernel size or sample rate
1  Input layer  None
2  1st convolutional layer
3  1st sample layer  2
4  2nd convolutional layer
5  2nd sample layer  2
6  3rd convolutional layer
7  3rd sample layer  2
8  4th convolutional layer
9  4th sample layer  2
10  Output layer  None
Our input images for training are difference images between fixed and moving image:
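As a rough illustration of how such training inputs could be assembled, the sketch below rotates a fixed image by a known angle, forms the difference image, and assigns a rotation class. The nearest-neighbour rotation, the sign of the difference (fixed minus moving), the rotation direction, and the equal-width class bins are all assumptions made for this example.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2D image about its centre using nearest-neighbour
    sampling; output pixels whose source falls outside are set to 0."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    rows, cols = np.indices((h, w))
    y, x = rows - cy, cols - cx
    # Inverse mapping: locate the source pixel for every output pixel.
    src_r = np.rint(np.cos(theta) * y - np.sin(theta) * x + cy).astype(int)
    src_c = np.rint(np.sin(theta) * y + np.cos(theta) * x + cx).astype(int)
    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(image)
    out[inside] = image[src_r[inside], src_c[inside]]
    return out

def training_sample(fixed, angle_deg, n_classes=360):
    """Return (difference image, class label) for one synthetic rotation."""
    moving = rotate_nearest(fixed, angle_deg)
    label = int((angle_deg % 360.0) // (360.0 / n_classes))
    return fixed - moving, label
```

With `n_classes=360` each class spans one degree; with `n_classes=36` each class spans ten degrees, so a rotation of 95 degrees falls into class 9.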
In the 19th century, Maxwell first introduced the concept of demons to illustrate a paradox of thermodynamics. In 1998, Thirion [
Vercauteren et al. [
(i) Choose
e.g., such that
(ii) Scale velocity field
(iii) Square
end for
ELASTIX to globally register
(i) Find updates
(ii) Smooth updates:
(iii) Update velocity field:
(approximated with
(iv) Smooth velocity field:
(v)
(i) Find updates
(ii) Smooth updates:
(iii) Update velocity field:
(approximated with
(iv) Smooth velocity field:
(v)
PCA-Spearman, and Kendall.
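The step names in the listings above (scale and square the velocity field; find, smooth, and accumulate updates; smooth the velocity field; warp) follow the general shape of a log-demons iteration. The sketch below is a simplified one-dimensional version intended only to make those steps concrete. It is not the authors' implementation: the demons force formula, the Gaussian smoothing widths, the 0.5 threshold for choosing the number of squarings, and all function names are assumptions.

```python
import numpy as np

def warp(signal, displacement):
    """Sample signal at x + displacement(x) with linear interpolation."""
    x = np.arange(signal.size, dtype=float)
    return np.interp(x + displacement, x, signal)

def exp_velocity(velocity, max_step=0.5):
    """Scaling and squaring: displacement of exp(v) for a stationary
    velocity field v."""
    peak = np.abs(velocity).max()
    n = 0 if peak <= max_step else int(np.ceil(np.log2(peak / max_step)))
    disp = velocity / 2.0 ** n            # scale the velocity field
    for _ in range(n):                    # square: phi <- phi o phi
        disp = disp + warp(disp, disp)
    return disp

def gaussian_smooth(values, sigma):
    """Convolve with a normalised Gaussian kernel (zero padded)."""
    radius = int(3 * sigma) + 1
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return np.convolve(values, kernel / kernel.sum(), mode="same")

def log_demons_1d(fixed, moving, iterations=50, sigma_fluid=1.0, sigma_diff=1.0):
    velocity = np.zeros_like(fixed, dtype=float)
    for _ in range(iterations):
        warped = warp(moving, exp_velocity(velocity))
        diff = fixed - warped
        grad = np.gradient(warped)
        # (i) find updates (demons force with a small stabilising term)
        update = diff * grad / (grad ** 2 + diff ** 2 + 1e-12)
        # (ii) smooth updates
        update = gaussian_smooth(update, sigma_fluid)
        # (iii) update velocity field, approximated by v <- v + u
        velocity = velocity + update
        # (iv) smooth velocity field
        velocity = gaussian_smooth(velocity, sigma_diff)
    # (v) warp the moving signal with exp(v)
    return warp(moving, exp_velocity(velocity))

# Toy example: a Gaussian bump shifted by four samples.
x = np.arange(100, dtype=float)
fixed = np.exp(-(x - 50.0) ** 2 / 128.0)
moving = np.exp(-(x - 54.0) ** 2 / 128.0)
registered = log_demons_1d(fixed, moving)
```

In this toy case the sum of squared differences between `fixed` and `registered` is lower than between `fixed` and `moving`; a 2D or 3D version would replace `np.interp` and the 1D Gaussian with their multidimensional counterparts.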
Mathematically, PCA is defined as an orthogonal linear transformation that maps the data to a new coordinate system in which the first coordinates capture the greatest variance in the data set. As a result, it is able to avoid influences caused by image biases. Traditionally, PCA is used for dimensionality reduction to facilitate classification, visualization, communication, and storage of high-dimensional data. Here, PCA is applied to both 2D and 3D medical and natural images, and the detected feature representations are used as inputs to the similarity metric to achieve anatomical correspondence and assist the optimization procedure in registration.
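A minimal sketch of PCA feature extraction via the singular value decomposition is given below. Treating image rows as observations and columns as variables is an assumption made for illustration, as are the function name and the choice of returning the explained-variance fractions.

```python
import numpy as np

def pca_features(image, n_components):
    """Project the rows of an image onto its leading principal axes.

    Returns the projected features and the fraction of total variance
    captured by each retained component."""
    data = np.asarray(image, dtype=float)
    centered = data - data.mean(axis=0)      # remove the per-column mean
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]           # orthonormal principal axes
    explained = singular[:n_components] ** 2 / np.sum(singular ** 2)
    return centered @ components.T, explained
```

Because the data are mean-centred before the decomposition, adding a constant intensity offset to the image leaves the explained-variance fractions unchanged, which is one way to read the remark about robustness to image biases.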
There are many classical similarity measures, such as SSD, mutual information (MI), cross correlation (CC), and pattern intensity, as well as their improved variants. In this paper, Pearson, Spearman, Kendall, and SSD, together with features extracted by PCA, are utilized as the new similarity metrics. Pearson, Spearman, and Kendall are concepts from statistics and are frequently used in data mining. Pearson is short for the Pearson product-moment correlation coefficient (PPMCC), which was developed to measure the linear correlation between two variables. Spearman’s rank correlation coefficient is a nonparametric measure of statistical dependence between two variables. Both take values between −1 and +1. Spearman makes no distributional assumption about the variables, whereas Pearson is usually applied under the assumption that the variables are normally distributed. Our use of log demons registration avoids the influence of this requirement.
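For reference, the two correlation coefficients can be computed as in the sketch below. The function names are illustrative, and how the coefficients are combined with PCA features inside the registration loop is not shown here.

```python
import numpy as np

def pearson(a, b):
    """Pearson product-moment correlation of two equally sized arrays."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    values = np.ravel(values).astype(float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.size)
    ranks[order] = np.arange(1, values.size + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))
```

For a monotonic but nonlinear relation such as `b = a ** 3`, `spearman(a, b)` equals 1 while `pearson(a, b)` is below 1, which illustrates the difference between rank and linear correlation.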
For 2D images of size
For 3D images of size
Calculation procedure of 3D PCA-related similarity metric.
Traditionally, two-step registration means an initial affine registration at the very beginning to coarsely rectify deformation, followed by an iterative registration that optimizes a similarity metric to achieve fine registration. We also adopt the two-step strategy. Before the two-step registration, however, we build a classifier offline through CNNs training to identify rotation between fixed
In addition, at the end of each iteration, we use a new similarity metric that combines PCA with traditional SSD, Pearson, and so forth, and thus contains the most important features of the image. As a result, convergence is considerably faster than with traditional SSD without PCA, while the same registration accuracy is maintained. Algorithm
In this section, the performance of the whole two-layer registration method is evaluated on both 2D and 3D images from synthetic and natural datasets. For comparison, the traditional two-step methods ELASTIX and a SURF-related algorithm are used to preregister the moving and fixed images, and demons nonrigid registration is then conducted. These are set as the baseline methods and are denoted ELASTIX+demons and SURF+demons. Both first use detected features to register the images through an affine transformation and then use the original SSD as the similarity metric under the diffeomorphic log demons framework. Our method differs from them in both the preregistration and the subsequent nonrigid registration. For 2D images, we first train a rotation classifier through CNNs and preregister the moving image under large distortion and rotation; together with the scale and translation transformations, this completes the preregistration. For 3D images, pretrained triplanar 2D CNNs are utilized to locate voxels and establish feature correspondences. Finally, the PCA-related similarity metric is used to register the images iteratively under the diffeomorphic log demons framework.
The improvements of our two-layer method in registration accuracy, robustness to large deformation and rotation, and convergence speed are all assessed with ground truth data. Our MATLAB code is based on Lombaert’s work [
Specifically, we downloaded brain and lung datasets from the BrainWeb MRI Simulated Normal Brain Database [
Listing of data used in lung registration [
Pair  Data category  Pair  Data category  Pair  Data category
1  Insp-Exp  11  Insp-Insp  21  Insp-Exp
2  Insp-Insp  12  Warped  22  Insp-Insp
3  Insp-Insp  13  4D  23  4D
4  Ovine  14  Insp-Exp  24  Ovine
5  Warped  15  Insp-Insp  25  Warped
6  Contrast  16  4D  26  Contrast
7  Insp-Exp  17  4D  27  Insp-Insp
8  Insp-Exp  18  Insp-Exp  28  Insp-Exp
9  Insp-Insp  19  Insp-Insp  29  Ovine
10  Ovine  20  Insp-Exp  30  Warped
Many registration algorithms have been evaluated on synthetically deformed images, according to previous work [
An example of original sample image.
An illustration of sample image after severe distortion and large rotation.
Our test is carried out on a computer running Windows 7, with 8 GB RAM and an i7-4770 CPU @ 3.4 GHz. Take BrainWeb data [
Performance of classifier.
Image size  Accuracy of classifier 36  Accuracy of classifier 90  Time (s each iteration)  

BrainWeb 
64 
99.86%  41.2  
28 
99.97%  6.17  
BrainWeb 
28 
99.56%  
Lena, 
28 
99.94%  2.4 
As we can see from Table
When only rotation exists as
However, when rotation and large deformation simultaneously appear in the moving image as
On the contrary, our trained CNNs classifier and the following scale and translation operations identified the Lena image’s rotation angle accurately (
Preregistration result of ELASTIX and SURF method with only rotation on image.
Preregistration result of ELASTIX and SURF method with both rotation and large deformation on image.
Preregistration result of CNNs method with both rotation and large deformation on image.
After preregistration in Section
Figure
Preregistration result of SURF related method (b) and ELASTIX method (c) with both rotation and large deformation on image.
Preregistration result of CNN method with both rotation and large deformation on image.
Description of lung dataset can be found in Table
Lung slice 8 registration.
Lung slice 6 registration.
The time consumed by ELASTIX preregistration for each slice is shown in Figure
Time consumed by ELASTIX preregistration for the 30 slices of the 4D lung dataset.
We select cross-sectional 2D images of the 20 BrainWeb MRI subjects, 10 for training and the other 10 for testing. From Figure
Brain slice registration.
For the 3D image registration part, we focus on brain atlas registration and give a CNNs-based 3D image registration method. We train on brain atlases from 18 people’s 3D image data in the BrainWeb brain database in four steps: (1) Randomly select 10 label points in the 3D image according to a normal distribution. (2) Adjust the 3D brain image and separate it into 2D images along three directions (
3D Sample voxel slice images (three slices,
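The idea of describing one voxel by three orthogonal 2D slices can be sketched as follows. The axis ordering and the function name are assumptions; the paper's own slice orientation and any cropping to patches are not reproduced here.

```python
import numpy as np

def triplanar_slices(volume, voxel):
    """Return the three orthogonal 2D slices of a 3D array that pass
    through the voxel (i, j, k), one per axis."""
    i, j, k = voxel
    return volume[i, :, :], volume[:, j, :], volume[:, :, k]

# Toy volume: every slice contains the selected voxel's value.
volume = np.arange(24).reshape(2, 3, 4)
s0, s1, s2 = triplanar_slices(volume, (1, 2, 3))
```

Each of the three slices can then be fed to a 2D network, so that a 3D location is classified from three 2D views rather than from a full 3D patch.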
Sections
both PCA-related and original SSD methods converge regularly;
as a whole, PCA-SSD and PCA-Pearson methods perform best and converge faster than the original SSD metric;
the PCA-Spearman metric converges fastest at first, but it slows down later;
the Kendall metric performs worst compared with the other metrics.
T2 Data: (a) fixed image, (b) moving image, (c) registered moving image, (d) difference between (a) and (c), (e) convergence extent of the first ten iterations.
Convergence extent
In this paper, a comprehensive method for constructing a rotation classifier for images under severe deformation and rotation was proposed through CNNs. The classifier is able to distinguish as many as 360 classes of rotation angle. It is used to assist our proposed two-layer deep adaptive registration framework. In each registration iteration, preregistration (using the trained classifier together with the scale and translation operators) and the subsequent diffeomorphic log demons registration reinforce each other in turn. In addition, the proposed PCA-related similarity metric helps achieve faster convergence. The new two-layer registration framework is compared with traditional diffeomorphic log demons registration in combination with state-of-the-art ELASTIX and SURF preregistration. As the baseline methods carry out preregistration only once, large deformations cannot be fully corrected. In tests on different image sources, including 2D and 3D MRI and CT datasets, our framework outperforms the baseline methods in both registration quality and convergence speed.
In future work, we will combine other kinds of deep learning frameworks, such as independent subspace analysis (ISA) [
The authors declare that there is no conflict of interests regarding the publication of this paper.
This paper is supported by The Project for the National Key Technology R&D Program under Grant no. 2011BAC12B03, The Innovation Team of Beijing, The National Natural Science Foundation of China under Grant nos. 81370038 and 61100131, The Beijing Natural Science Foundation under Grant no. 7142012, The Beijing Nova Program under Grant no. Z141101001814107, The Science and Technology Project of Beijing Municipal Education Commission under Grant nos. KZ201310005004 and km201410005003, The Rixin Fund of Beijing University of Technology under Grant nos. 2013RXL04 and 2012RX03, and The Basic Research Fund of Beijing University of Technology under Grant no. 002000514312015.