Features Conduction Neural Response and Its Application in Content-Based Image Retrieval

A novel image representation is proposed for content-based image retrieval (CBIR). The core idea of the proposed method is to apply deep learning to the local features of an image and to fuse a semantic component into the representation through a hierarchical architecture built to simulate the human visual perception system; the result is a new image descriptor called the features conduction neural response (FCNR). Compared with the classical neural response (NR), FCNR has lower computational complexity and is more suitable for CBIR tasks. Experiments on a commonly used image database show that the proposed method outperforms NR-related methods and several other image descriptors originally developed for CBIR in both retrieval efficiency and retrieval effectiveness.


Introduction
Driven by the demand of the search service market, content-based image retrieval (CBIR) has been an active topic in pattern recognition and artificial intelligence research for many years. The common ground for CBIR systems is to extract a signature for every image based on its pixel values and to define a rule for comparing images. The components of the signature are called features. An obvious advantage of a signature over the original pixel values is the significant compression of the image representation. However, a more important reason for using the signature is to gain an improved correlation between image representation and image semantics. Indeed, the main task of designing a signature is to bridge the gap between image semantics and the pixel representation, that is, to create a better correlation with image semantics [1].
Researchers have tried to use machine learning techniques either to derive a similarity measure for the high-level semantics of an image from existing image representations [2] or to first cluster the images by self-organizing maps and then perform retrieval [3]. Examples of the former include bandletized regions through support vector machines (BRSVM) learning and online multiple kernel similarity (OMKS) learning [4][5][6]; examples of the latter include tree structured self-organizing maps (TS-SOM) and the growing hierarchical quadtree self-organizing map (GHSOQM) [7,8]. These methods are often used in combination with relevance feedback, which can enhance retrieval effectiveness to a certain extent [8,9]. However, they are technically involved and often need a long training time, which makes them difficult to apply in practice.
On the other hand, research on image representation for CBIR is constantly advancing, and many creative image representation methods have been proposed. These methods can be broadly divided into two categories: the global feature based approach and the local feature based approach. For example, the edge histogram descriptor (EHD) [10], multiple texture histogram (MTH) [11], and color difference histogram (CDH) [12] are all based on global characteristics of the image. The features extracted by these algorithms have good discriminative ability and robustness. However, an overly complex feature representation is not always applicable to CBIR [1,13]. At the same time, local feature extraction methods have also received great attention [14,15]. These methods build the feature representation of the image from key points [16] or salient blocks in the image [17,18]. Determining the key points and the salient regions of an image often depends on complex image segmentation technology. So far, however, image segmentation remains one of the difficult problems in image processing, which limits the application of these methods in CBIR.
In recent years, the neuroscience of the human visual cortex and the related hierarchical learning methods have provided a new direction for studying this problem. Research has shown that the human visual perception system has very good abilities of learning and generalization from a few examples, and that these abilities are given by the hierarchical structure of the visual cortex [19][20][21]. Based on the hierarchical structure of the visual cortex, Smale et al. proposed the concept of the derived kernel and the related theory of neural responses (NR) [22]. They established a mathematical model to simulate the hierarchical information processing of the human visual system. In the NR model, the inner product defined by the neural response leads to a similarity measure between images which is called the derived kernel. Based on a hierarchical architecture, a recursive definition of the neural response and the associated derived kernel is given. The derived kernel can be used in a variety of application domains such as classification of images, strings of text, and genomics data. Theoretical analysis and experimental results show that the NR model is an effective feature extraction method. It has the potential to be further improved and enhanced in many applications [17,23,24,25]. Most important of all, the NR model has a key semantic component: a system of templates which can fuse the visual features and the semantic features of an image, which is very important in CBIR.
However, because the underlying neural response uses the pixel values of the bottom subblocks of the image, which are then passed to the upper-level subblocks, this algorithm is not well suited to CBIR. In CBIR tasks, the image databases are usually very large and the image resolution is usually very high; the exhaustive pixel-to-pixel algorithm has difficulty bridging the "semantic gap" in complex scene images, and its huge amount of computation also limits its application in practice. In order to capture the high-level semantic features of the image and at the same time improve the efficiency of retrieval, we propose the concept and the corresponding algorithm of the features conduction neural response (FCNR) on the basis of the theory of NR.
In the proposed method, we first divide the spatial domain of an image in a simple way and then obtain the local feature representation of the image by extracting basic characteristics such as color, texture, and shape features from each local area. Next, we establish a hierarchical structure for the local feature representation; at the same time, a local feature template set is constructed for each layer of the structure. In the first layer, local features are used to construct the initial neural response, and these features are then conducted to the higher-level subblocks layer by layer through the normalized inner product of the neural responses. Finally, the image is expressed as a vector, called the FCNR, which can be used as an image representation for CBIR. The major advantages of the proposed method can be summarized as follows.
(i) The FCNR is derived from a local feature array rather than from the raw pixel values, which overcomes the overlearning problem of the classical NR method.
(ii) The high-level semantic component of the image is introduced into the feature representation by the interaction between the subblocks of the image and the templates in every layer of the constructed hierarchical structure.
(iii) Without losing the discriminative ability and the invariance of NR, the FCNR avoids the exhaustive pixel-to-pixel algorithm of NR and reduces the computational complexity significantly, which is essential for CBIR.
The rest of this paper is organized as follows. In Section 2 the FCNR model is constructed. In Section 3 the image retrieval method based on FCNR is introduced. Then, in Section 4, we verify the effectiveness of the proposed method with extensive experiments on popular data sets and compare it with other CBIR methods. Finally, conclusions are drawn and some future research issues are discussed in Section 5.

Feature Conduction Neural Response (FCNR)
The starting point of NR was to establish a mathematical model for the visual mechanism of the primate visual cortex [19,26,27]. In order to simulate the hierarchical information processing of the visual cortex, Smale et al. [22] divided the image domain into nested blocks as shown in Figure 1.
The NR of an image is defined in a bottom-up fashion based on the hierarchical architecture. As a feature vector of an image, NR can be used to define the similarity between images. Theoretical analysis and experimental results show that the NR model has good discriminative performance and is robust to transformations, which suggests that the learning process of the NR model possesses, to a certain degree, the characteristics of the human visual system.

Notation and Preliminaries.
In this paper, we consider the case of a three-layer hierarchical architecture.
As shown in Figure 2, let the regions $u$, $v$, and $Sq$ in $\mathbb{R}^2$ ($u \subset v \subset Sq$) be pieces of the domain on which the patches or subpatches of images are defined. In the vision interpretation, these regions can be considered as receptive fields of different sizes. When we are working with gray scale images, an image or an image patch can be seen as a discrete function of two variables that takes the corresponding gray values as its functional values. That is to say, an image of size $Sq$ can be seen as a function defined on the domain $Sq$. In this case, the image set consisting of the images defined on the domain $Sq$ can be denoted by $I_{Sq}$. For convenience, we denote the cardinality of a set $A$ by $|A|$ in the rest of this paper. Accordingly, the images in $I_{Sq}$ can be denoted by $f_i$ ($i = 1, 2, \ldots, |I_{Sq}|$). Similarly, the sets of image patches of size $u$ and size $v$ can be denoted by $I_u$ and $I_v$, respectively. As an example, Figure 3 shows the nested architecture and the relationship between the image and the image patches. The transformation sets
$$H_u = \{h_u^i \mid i = 1, 2, \ldots, |H_u|\} \ \text{with} \ h_u : u \to v, \qquad H_v = \{h_v^j \mid j = 1, 2, \ldots, |H_v|\} \ \text{with} \ h_v : v \to Sq$$
can then be defined. In this paper, the transformations are limited to translations and take the form $h(x) = x + a$. Consequently, we can consider $H_u$ as a set of translations corresponding to moving a sliding window of size $u$ in the patch $v$ and, similarly, $H_v$ as a set of translations corresponding to moving a sliding window of size $v$ in the patch $Sq$. For example, given an image of size $N \times N$, if the step length equals one pixel, $(N - M + 1) \times (N - M + 1)$ image patches can be obtained by restricting the image to subpatches of size $M \times M$.
The following fundamental assumption relating the image sets and the transformation sets is supposed to hold throughout this paper [22].
Axiom 1. $f \circ h \in I_u$ for $f \in I_v$ and $h \in H_u$; similarly, $f \circ h \in I_v$ for $f \in I_{Sq}$ and $h \in H_v$.
The last essential ingredient of the NR model is a series of template sets. A finite number of elements $t_u^i \in I_u$ are selected as the first-layer templates, which form the first-layer template set $T_u = \{t_u^i \mid i = 1, 2, \ldots, |T_u|\}$. In the same way, the second-layer template set $T_v = \{t_v^j \mid j = 1, 2, \ldots, |T_v|\}$ with $t_v^j \in I_v$ can be obtained. Templates are image patches that can be seen as frequently encountered image elements and that serve as building blocks to represent other images. They carry abundant higher-level semantic information about the images, which can be used to improve discriminative ability in image retrieval.

The Construction of FCNR.
The first step of constructing the FCNR is to segment the whole image in a simple way. Unlike other feature extraction methods based on region segmentation technology [8,14], we just use a network of perpendicular lines to segment the image into small rectangular areas of the same size. Then, in each small region, we extract features such as color, texture, and shape, and all these characteristics are represented by a vector. An image can thus be represented as a three-dimensional feature array. On the basis of this three-dimensional array, the local characteristics are conducted step by step to the higher layers in the same manner as in NR, and finally the FCNR is obtained. The specific process is given below.
For any image $f \in I_{Sq}$, we divide it into $p \times q$ rectangular blocks $f_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) of the same size using a network of perpendicular lines; that is,
$$f = (f_{ij})_{p \times q}.$$
We extract some visual characteristics from each rectangular block in the same way. The details of the feature extraction methods are presented in Section 3. Normalizing the vectors that have these characteristics as components and denoting them by $w_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$), we get an array
$$w_f = (w_{ij})_{p \times q}, \tag{2}$$
which is the local feature representation of the image $f$. It should be emphasized that the $w_{ij}$ are all normalized vectors of the same dimension and that each component of these vectors represents one feature of an image block. An obvious advantage of normalization is invariance to changes in the brightness of the image. If $d$ characteristics are extracted from each rectangular image block $f_{ij}$, then $w_f$ is a three-dimensional array and can be simply represented as
$$w_f = (w_{ijk})_{p \times q \times d},$$
in which $w_{ijk}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$; $k = 1, 2, \ldots, d$) denotes the $k$th feature of the image block in the $i$th row and $j$th column of the image $f$. In this circumstance, the notations introduced in the previous section and their meanings should be adjusted accordingly. The set of local feature representations of the images in the set $I_{Sq}$ is denoted by $W_{Sq^*}$; that is,
$$W_{Sq^*} = \{w_f \mid f \in I_{Sq}\},$$
where $Sq^*$ denotes the area of size $p \times q$. Accordingly, we use $W_u$ and $W_v$ to denote the sets of patches of size $u$ and size $v$, respectively. It should be emphasized that the elements of $W_u$ and $W_v$ are obtained by sampling from the rows and columns of the array $w_f$ with moving windows rather than by sampling directly from the image $f$.
For example, assume that $f$ is an image of size 256 × 384. It is divided into square subblocks of 8 × 8 pixels, and we extract 6 characteristics from each subblock. The local feature representation $w_f$ is then a three-dimensional array of size 32 × 48 × 6, and the size of $Sq^*$ is 32 × 48. If the $u$-size is 15 × 15 and the $v$-size is 21 × 21, then $w_u \in W_u$ and $w_v \in W_v$ are three-dimensional arrays of size 15 × 15 × 6 and 21 × 21 × 6, respectively.
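The array sizes in this example, and the number of sliding-window positions they imply, can be checked with a few lines of Python. This is only a sketch: the helper names `grid_shape` and `num_windows` are hypothetical, and a step length of one array cell is assumed.

```python
def grid_shape(image_hw, block):
    """Rows and columns of the local feature grid for square blocks of side `block`."""
    height, width = image_hw
    return height // block, width // block


def num_windows(domain_hw, window_hw, step=1):
    """Number of translations of a window inside a larger rectangular domain."""
    (dom_h, dom_w), (win_h, win_w) = domain_hw, window_hw
    return ((dom_h - win_h) // step + 1) * ((dom_w - win_w) // step + 1)


rows, cols = grid_shape((256, 384), 8)                  # size of Sq*
v_positions = num_windows((rows, cols), (21, 21))       # v-windows inside Sq*
u_positions = num_windows((21, 21), (15, 15))           # u-windows inside one v-patch
print(rows, cols, v_positions, u_positions)
```

With these sizes, each image contributes a few hundred window positions at the array level, compared with tens of thousands when windows slide pixel by pixel over the original image.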
The notations $H_u$ and $H_v$ still denote the sets of transformations from $u$ to $v$ and from $v$ to $Sq^*$, respectively. The template sets $T_u \subset W_u$ and $T_v \subset W_v$ are obtained from $w_f$ in a similar way by moving windows over $v$ and $Sq^*$, respectively. The elements of these template sets, denoted by $t_u^i$ ($i = 1, 2, \ldots, |T_u|$) and $t_v^j$ ($j = 1, 2, \ldots, |T_v|$), are also three-dimensional arrays.
Now we can define the feature conduction neural response. First, assume $w_v \in W_v$; for any $h_u \in H_u$, we have $w_v \circ h_u \in W_u$ according to Axiom 1. Taking a template $t_u \in T_u$, we call
$$N_v(w_v)(t_u) = \max_{h_u \in H_u} \sum_{i,j,k} \big((w_v \circ h_u) \cdot t_u\big)_{ijk}$$
the neural response of $w_v$ to the template $t_u$, where $(w_v \circ h_u) \cdot t_u$ denotes the three-dimensional array obtained by multiplying the corresponding elements of the two three-dimensional arrays $w_v \circ h_u$ and $t_u$, and $((w_v \circ h_u) \cdot t_u)_{ijk}$ represents the element of $(w_v \circ h_u) \cdot t_u$ in the $i$th row and $j$th column of the $k$th page. When $t_u$ ranges over the template set $T_u$, we get a $|T_u|$-dimensional vector
$$N_v(w_v) = \big(N_v(w_v)(t_u^1), N_v(w_v)(t_u^2), \ldots, N_v(w_v)(t_u^{|T_u|})\big),$$
which is called the first-layer neural response of $w_v$ to the template set $T_u$. After normalization, it is denoted by $\hat{N}_v(w_v)$; that is,
$$\hat{N}_v(w_v) = \frac{N_v(w_v)}{\sqrt{\langle N_v(w_v), N_v(w_v) \rangle}},$$
where $\langle \cdot, \cdot \rangle$ is the usual inner product of two vectors. Next, let $w_f \in W_{Sq^*}$; according to Axiom 1, $w_f \circ h_v \in W_v$ for any $h_v \in H_v$. For any template $t_v \in T_v$, we call
$$N_{Sq^*}(w_f)(t_v) = \max_{h_v \in H_v} \big\langle \hat{N}_v(w_f \circ h_v), \hat{N}_v(t_v) \big\rangle$$
the neural response of $w_f$ to the template $t_v$. When $t_v$ ranges over the template set $T_v$, we get a $|T_v|$-dimensional vector
$$N_{Sq^*}(w_f) = \big(N_{Sq^*}(w_f)(t_v^1), N_{Sq^*}(w_f)(t_v^2), \ldots, N_{Sq^*}(w_f)(t_v^{|T_v|})\big),$$
which is called the second-layer neural response of $w_f$ to the template set $T_v$. Finally, for any image $f \in I_{Sq}$, we define $N_{Sq^*}(w_f)$ as the features conduction neural response (FCNR) of the image $f$ and denote it by $N(f)$; that is,
$$N(f) = N_{Sq^*}(w_f). \tag{10}$$
We add some remarks.
(i) The FCNR of an image is a vector whose dimension equals the number of templates in the second layer and is independent of the dimension of the image itself. Therefore, all images can be transformed into vectors of the same dimension, regardless of whether the images have the same size.
(ii) Because low-level visual features of the image are used in the bottom layer, the FCNR model effectively overcomes the shortcomings of the exhaustive pixel-to-pixel algorithm of the NR model. At the same time, the low-level visual features are conducted to the upper layers through the interaction between the subblocks of the image and the templates in every layer of the hierarchical structure, so that the FCNR contains high-level semantic elements of the image. This is very important for CBIR.
(iii) From the perspective of learning theory, the feature extraction method of the FCNR belongs to the category of unsupervised learning [2,19,22], and the hierarchical structure is introduced to do deep learning for the low-level visual features.
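The two-layer construction described in this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the first-layer response is taken to be the maximum, over all translations, of the sum of the elementwise product between a window and a template; windows move with a step of one array cell; and the function names (`windows`, `unit`, `first_layer`, `fcnr`) and the toy array sizes are hypothetical.

```python
import numpy as np


def windows(arr, size):
    """Yield every size x size window (all feature pages) of a 3-D array, step 1."""
    for r in range(arr.shape[0] - size + 1):
        for c in range(arr.shape[1] - size + 1):
            yield arr[r:r + size, c:c + size, :]


def unit(vec):
    """Scale a vector to unit Euclidean length (returned unchanged if it is zero)."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def first_layer(w_v, templates_u):
    """Normalized first-layer response of a v-size patch to all u-size templates."""
    u = templates_u[0].shape[0]
    resp = [max(float(np.sum(win * t)) for win in windows(w_v, u))
            for t in templates_u]
    return unit(np.array(resp))


def fcnr(w_f, templates_u, templates_v):
    """Second-layer response of a whole local feature array to all v-size templates."""
    v = templates_v[0].shape[0]
    # First-layer responses of every v-size window of the array and of every v-template.
    win_resp = [first_layer(win, templates_u) for win in windows(w_f, v)]
    tmpl_resp = [first_layer(t, templates_u) for t in templates_v]
    return np.array([max(float(np.dot(wr, tr)) for wr in win_resp)
                     for tr in tmpl_resp])


rng = np.random.default_rng(0)
w_f = rng.random((8, 10, 3))                      # toy local feature array
T_u = [rng.random((3, 3, 3)) for _ in range(4)]   # toy first-layer templates
T_v = [rng.random((5, 5, 3)) for _ in range(6)]   # toy second-layer templates
print(fcnr(w_f, T_u, T_v).shape)
```

The output has one entry per second-layer template regardless of the size of `w_f`, which is the property noted in remark (i).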

Computational Complexity Analysis.
Image retrieval tasks often have real-time requirements. As a result, the complexity of the algorithm is very important when constructing a feature representation for CBIR. Here we analyze the computational complexity of the proposed method.
Consider the case of an $n$-layer hierarchical architecture as shown in Figure 1. We define a set of global transformations, where the range is always the entire image domain $Sq$ rather than the next larger patch, recursively setting
$$H_m^g = \{h : v_m \to Sq \mid h = h' \circ h'', \ h' \in H_{m+1}^g, \ h'' \in H_m\}$$
for any $1 \le m \le n - 1$, where $H_n^g$ contains only the identity $\{I : Sq \to Sq\}$. In the above formula, $v_m$ denotes the patch of the $m$th layer and $H_m$ denotes the transformation set from the patch of the $m$th layer to the next larger patch.
We denote the template set in the $m$th layer by $T_m$. Ignoring the cost of normalization and of precomputing the neural responses of the templates, the number of operations required to compute the NR is given by
$$\sum_{m=1}^{n-1} |H_m^g| \, |T_m| \, |T_{m-1}|, \tag{12}$$
where, for notational convenience, we denote the cost of computing the initial kernel by $|T_0|$ [22].
Because the image is preprocessed, $|H_m^g|$ in (12) is smaller in the calculation of the FCNR than in the calculation of the NR. As a result, the number of operations for FCNR is far smaller than that for NR. To illustrate this point intuitively, we give a specific example.
Suppose $f$ is an original image of size 256 × 384. In the calculation of the NR of $f$, we take the $u$-size as 112 × 112 pixels and the $v$-size as 172 × 172 pixels. For the FCNR, on the other hand, the image $f$ is first divided into square subblocks of size 16 × 16 and 14 features are extracted from each block. In this case, we take the $u$-size as 7 × 7 and the $v$-size as 11 × 11, which correspond to the 112 × 112 pixels and the 172 × 172 pixels in the original image. We also assume that the number of templates selected in each layer is the same for the two methods. Since $n = 3$ in this paper, we can calculate the number of operations of NR and of FCNR using (12). The values of the parameters in (12) and the resulting operation counts of NR and FCNR are listed in Table 1. From Table 1, we can see that the number of operations of FCNR is less than one five-thousandth of the number of operations of NR. This means that the computational complexity of NR is much higher than that of FCNR. In fact, for high-resolution images it is not practical to calculate the NR directly. Usually, some simple preprocessing is applied to the image before calculating its NR. Therefore, the difference in computation is not so great in practice (see Section 4).
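The operation count in (12) can be evaluated for this example with a short Python sketch. Several inputs are assumptions made only for illustration: the template counts (500 and 300), the cost of the initial kernel (one multiplication per window element), and a translation step of one cell. The printed ratio is therefore indicative only and is not expected to reproduce Table 1.

```python
def translations(domain_hw, window_hw):
    """Number of one-step translations of a window inside a rectangular domain."""
    (dom_h, dom_w), (win_h, win_w) = domain_hw, window_hw
    return (dom_h - win_h + 1) * (dom_w - win_w + 1)


def operation_count(domain, u, v, t1, t2, t0):
    """Sum over m = 1, 2 of |H_m^g| |T_m| |T_{m-1}| for a three-layer architecture."""
    h1_global = translations(domain, u)   # u-windows placed anywhere in the domain
    h2_global = translations(domain, v)   # v-windows placed anywhere in the domain
    return h1_global * t1 * t0 + h2_global * t2 * t1


T1, T2 = 500, 300  # assumed numbers of first- and second-layer templates
nr_ops = operation_count((256, 384), (112, 112), (172, 172), T1, T2, t0=112 * 112)
fcnr_ops = operation_count((16, 24), (7, 7), (11, 11), T1, T2, t0=7 * 7 * 14)
print(nr_ops, fcnr_ops, nr_ops / fcnr_ops)
```

Under these assumptions the gap is already three orders of magnitude, driven mainly by the number of global translations of the smallest window and by the cost of the initial kernel.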

CBIR System Based on FCNR
For a given image library, we divide all the images in the library into rectangular blocks of appropriate size using mutually perpendicular lines. In each rectangular block, the low-level features are extracted in the same way, and thus the local feature representation of each original image is obtained. The local feature representations of all images in the library constitute a local feature database. We can then construct a hierarchical architecture for the local feature representation, and the template sets of every layer of the architecture can be obtained from the local feature database. On this basis, using the algorithm described in Section 2.2 to compute the FCNR of all images in the library, we can establish an FCNR library associated with the original image library. Once a proper similarity measure is defined on the FCNR feature space, image retrieval can be carried out.
When the user submits a query image, the system first calculates the FCNR of the query image according to the above steps and then calculates the similarity between the query image and all images in the image database according to the defined similarity measure. Finally, the system sorts the images in the library in decreasing order of similarity and returns the top-ranked images to the user. The flow diagram of CBIR based on FCNR is shown in Figure 4.

Local Low-Level Feature Extraction.
In this paper, some simple and robust methods are used to extract fourteen basic low-level features, including color feature, texture feature, and shape feature, for the image block.
As in some of the CBIR literature, we use the well-known YCbCr color space to extract color features [1,8]. In this color space, the luminance information is stored in a single component $Y$, and the color information is stored in two color difference components $Cb$ and $Cr$. We calculate the mean and standard deviation of $Y$, $Cb$, and $Cr$ for each subblock; the mean values are denoted by $g_1$, $g_2$, and $g_3$ and the standard deviations by $g_4$, $g_5$, and $g_6$. In this way we get six color features (for monochrome images, only the two brightness features can be extracted).
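The six color features of one block can be sketched in plain Python as follows. The conversion constants are those of the full-range BT.601 (JFIF) RGB-to-YCbCr transform, which is an assumption here: the text does not say which YCbCr variant is used, and MATLAB's `rgb2ycbcr`, for instance, uses a limited-range scaling. The population standard deviation is likewise an assumed choice, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion of one 8-bit RGB pixel (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def color_features(block_pixels):
    """Means of Y, Cb, Cr followed by their standard deviations for one block."""
    channels = list(zip(*(rgb_to_ycbcr(*pixel) for pixel in block_pixels)))
    means = [fmean(channel) for channel in channels]
    stds = [pstdev(channel) for channel in channels]
    return means + stds


# A 16 x 16 block given as a flat list of (R, G, B) tuples.
block = [(120, 80, 40)] * 128 + [(130, 90, 50)] * 128
print(color_features(block))
```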
Next, we use the Haar wavelet transform to extract texture features from the $Y$ component of the rectangular image block. First, we apply the Haar wavelet transform to each 4 × 4 subblock in the rectangular image block, which yields four 2 × 2 matrices: a sampling approximation and three detail matrices in three directions (horizontal, vertical, and diagonal). From the three detail matrices, three scalar variables are computed, one for each direction. After the wavelet transformation, we assign these three variables to each pixel of the rectangular image block. Then we compute the averages and standard deviations of the three variables over each rectangular image block and denote the averages by $g_7$, $g_8$, and $g_9$ and the standard deviations by $g_{10}$, $g_{11}$, and $g_{12}$, respectively. Since the standard deviation $g_4$ of the $Y$ component of the rectangular image block has already been obtained, we take the thirteenth feature as
$$g_{13} = 1 - \frac{1}{1 + g_4^2},$$
which is the smoothness of the image block and reflects the relative smoothness of brightness in the corresponding region. The last feature $g_{14}$ is the entropy of the $Y$ component of the rectangular image block; that is,
$$g_{14} = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i),$$
where $p(z_i)$ is the gray level histogram of the $Y$ component of the rectangular image block and $L$ is the number of possible gray levels. Entropy is a measure of the randomness of the image elements [13].
Combining the 14 features mentioned above, we get the low-level visual feature representation of the rectangular image block, which we denote by $g$; that is,
$$g = (g_1, g_2, \ldots, g_{14}).$$
After obtaining the low-level visual feature representations of all rectangular blocks of an image, we get the local feature representation of the whole image as shown in (2).
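Two of the scalar features above, smoothness and entropy, are simple enough to sketch directly. The snippet assumes the common textbook definitions (smoothness as $1 - 1/(1 + \sigma^2)$ with $\sigma$ the standard deviation, and Shannon entropy in bits of the empirical gray-level histogram); the function names and the choice of the population standard deviation are illustrative assumptions.

```python
from collections import Counter
from math import log2
from statistics import pstdev


def smoothness(values):
    """1 - 1/(1 + sigma^2), where sigma is the standard deviation of the values."""
    sigma = pstdev(values)
    return 1.0 - 1.0 / (1.0 + sigma ** 2)


def entropy(values):
    """Shannon entropy in bits of the empirical gray-level histogram."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


flat_block = [90] * 256              # constant luminance
mixed_block = [0, 255] * 128         # two equally frequent gray levels
print(smoothness(flat_block), entropy(flat_block))
print(smoothness(mixed_block), entropy(mixed_block))
```

A constant block gives zero for both features, while a block with strong luminance variation pushes the smoothness value toward 1.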

The Similarity Measure.
Retrieval accuracy depends not only on a robust feature representation but also on a good similarity measure. In order to highlight the advantages of FCNR as an image feature representation, we adopt a very basic and natural approach in this paper; that is, the similarity between two images is defined as the normalized inner product of their FCNRs. Specifically, for any $f \in I_{Sq}$, its FCNR $N(f)$ is a vector of dimension $|T_v|$, where $|T_v|$ is the number of templates in the second layer. Normalizing $N(f)$, we obtain
$$\hat{N}(f) = \frac{N(f)}{\sqrt{\langle N(f), N(f) \rangle}}, \tag{18}$$
and the similarity of two images $f, f^* \in I_{Sq}$ can be defined as
$$S(f, f^*) = \langle \hat{N}(f), \hat{N}(f^*) \rangle. \tag{19}$$
This definition of image similarity is consistent with the definition of the similarity of image patches at every layer in the construction of the FCNR. Thus, when the user inputs the query image $q$, the system first computes $N(q)$ according to (10) and $\hat{N}(q)$ according to (18) and then calculates the similarities $S(q, f_i)$ ($i = 1, 2, \ldots, |I_{Sq}|$) between the query image and all images in the database according to (18) and (19). Finally, the system sorts the images $f_i$ in descending order of the similarity $S(q, f_i)$ and outputs the top $N$ images to the user as the query result, where the parameter $N$ is specified by the user according to the query requirements.
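The retrieval step described above amounts to ranking the library by the normalized inner product of the descriptor vectors. A minimal sketch in plain Python, with hypothetical function names and toy two-dimensional vectors standing in for real FCNRs, could look like this:

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit Euclidean length (returned unchanged if it is zero)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity(a, b):
    """Normalized inner product of two descriptor vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def retrieve(query, library, top_n):
    """Indices of the top_n library vectors, most similar to the query first."""
    scores = [similarity(query, vec) for vec in library]
    order = sorted(range(len(library)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy descriptors
print(retrieve([2.0, 0.1], library, top_n=2))
```

In practice the library vectors would be normalized once and stored, so that a query costs one inner product per image.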

Experiments
In this section, we present simulation experiments to demonstrate the performance of the proposed method in image retrieval. First, the evaluation standards for the performance of a CBIR system are given and appropriate parameters of the FCNR method are selected. Then, we compare the performance of the FCNR with the classical NR and the local neural response (LNR) [17] in image retrieval. Finally, we also compare the proposed method with several feature extraction methods that were originally designed for image retrieval, including a benchmark method and some relatively new methods.
The image library used in the experiments contains 1000 images of size 256 × 384 or 384 × 256 selected from the COREL database, a general-purpose image database of about 60,000 pictures [1]. The selected images fall into ten classes, each of which has a semantic name and contains 100 pictures. For the sake of clarity, these 1000 images are numbered from 0 to 999. The semantic name and the corresponding number range of each class are listed in Table 2 [8]. We randomly selected four pictures from each class and show them in Figure 5.
It should be emphasized that the templates used in the experiments for calculating the FCNR are randomly cropped from the local feature arrays of the images (not from the original images) by moving or rotating a window of a specific size. The experiments were conducted on a computer with 4 GB of random access memory and a 2.60 GHz Intel(R) Core(TM) i5-3230M processor, and the code was implemented in MATLAB, calling functions of the image processing toolbox [13].

Evaluation Standards and Parameter Determinations.
There are a variety of ways to evaluate retrieval performance. In this paper, we mainly use the recall-precision graph, which is the most commonly used tool in the image retrieval community, to evaluate the performance of FCNR. Precision $P$ is defined as
$$P = \frac{I_N}{N},$$
where $N$ is the number of retrieved images and $I_N$ is the number of relevant images among the retrieved images. Recall $R$ is defined as
$$R = \frac{I_N}{M},$$
where $M$ is the number of all relevant images in the library. An optimal recall-precision graph would be a straight line; that is, precision is always 1. Typically, when recall increases, precision decreases accordingly. However, the results of one or two retrievals cannot fully exhibit the advantages and disadvantages of an algorithm, and they are not convenient for comparison with other methods. Therefore, we randomly selected 50 images from the image database to form a set of query images. For a fixed recall, averaging the precision over the 50 queries gives the recall-average precision graph, which is a relatively reliable evaluation standard. In general, high average precision together with high recall means that the algorithm performs well; that is, an algorithm whose recall-average precision curve lies closer to the upper right is better. In addition, because of the real-time requirements of CBIR, the shorter the query time, the better the performance of the algorithm.
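The precision and recall defined above, and the points of a recall-precision curve for a single query, can be computed as in the following sketch. The function names and the toy image identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieved list against the set of relevant ids."""
    hits = sum(1 for image_id in retrieved if image_id in relevant)
    return hits / len(retrieved), hits / len(relevant)


def pr_curve(ranked, relevant):
    """(recall, precision) after each cutoff 1..len(ranked) of a ranked result list."""
    points = []
    for cutoff in range(1, len(ranked) + 1):
        precision, recall = precision_recall(ranked[:cutoff], relevant)
        points.append((recall, precision))
    return points


ranked = ["a", "b", "c", "d"]        # system output, best match first
relevant = {"a", "c", "x"}           # ground-truth relevant images
print(precision_recall(ranked, relevant))
print(pr_curve(ranked, relevant))
```

Averaging the precision values of many queries at matching recall levels then gives the recall-average precision graph used in the experiments.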
In our experiments, the images of size 384 × 256 are first rotated into images of size 256 × 384, and then all the images are divided into square subblocks of size 16 × 16, which gives a total of 16 × 24 blocks for each image. For each image block, we extract local features by the methods described in Section 3.1, so that we get a three-dimensional array of size 16 × 24 × 14 for every image. The template sets are constructed by randomly extracting 500 patches of $u$-size and 300 patches of $v$-size, respectively, from the local feature arrays of 10 images per class. In the construction of the FCNR, two very important parameters are the sizes of $u$ and $v$. In order to select proper sizes, we carried out a series of experiments with different sizes of $u$ and $v$.
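Template construction by random cropping can be sketched with NumPy as follows. Random arrays stand in for real local feature arrays, only translated windows are sampled (rotated windows are not covered), and the function name and the seed are hypothetical choices made for the illustration.

```python
import numpy as np


def sample_templates(feature_arrays, size, count, rng):
    """Crop `count` random size x size windows from randomly chosen feature arrays."""
    templates = []
    for _ in range(count):
        arr = feature_arrays[rng.integers(len(feature_arrays))]
        r = rng.integers(arr.shape[0] - size + 1)   # top row of the window
        c = rng.integers(arr.shape[1] - size + 1)   # left column of the window
        templates.append(arr[r:r + size, c:c + size, :].copy())
    return templates


rng = np.random.default_rng(42)
# Stand-ins for the 16 x 24 x 14 local feature arrays of 10 images from each of 10 classes.
arrays = [rng.random((16, 24, 14)) for _ in range(100)]
T_u = sample_templates(arrays, size=7, count=500, rng=rng)
T_v = sample_templates(arrays, size=11, count=300, rng=rng)
print(len(T_u), T_u[0].shape, len(T_v), T_v[0].shape)
```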
Figure 6 shows four recall-precision graphs corresponding to four different patch sizes. In these experiments, the number of retrieved images $N$ is taken as 30. It is not difficult to see from Figure 6 that good results are not obtained when the sizes of $u$ and $v$ are too small or too large. In contrast, when the $u$-size is 7 × 7 and the $v$-size is 11 × 11, the retrieval results are the best. Therefore, we use these sizes for FCNR in the subsequent experiments.
Figure 7 shows the top 20 images of two queries. The image at the front of each list is the query image, and the number at the bottom of each image is the number of the image in the library. As seen from Figure 7, the proposed CBIR system based on FCNR performs well on the COREL image library. For the query semantics "flower," the top 20 output images all have flowers as their theme, and these flowers differ in color, size, background, and form. This suggests that the high-level semantics of "flower" is correctly identified by the system. It is worth mentioning that the two images numbered 674 and 677 were rotated before the local feature extraction and can still be retrieved, indicating that the FCNR algorithm preserves the rotation invariance of NR [24,27,28]. For the query semantics "elephant," the top 13 output images are relevant to the query image, and among the top 20 images, only four are inconsistent with the query image (the images with added borders in Figure 7). Next, we compare the performance of FCNR with the classical NR method and the LNR method in CBIR [17].
The local neural response is an improved version of the neural response that uses sparse techniques in the representation of the image and its subblocks. Before calculating NR and LNR, it is necessary to preprocess the images as mentioned in Section 2.3. In order to be relatively fair, we adopt the settings that make these algorithms perform best as reported in the relevant literature: we convert the images into 60 × 90 gray images, and the $u$-size is 15 × 15 pixels and the $v$-size is 21 × 21 pixels. We select the templates in a similar manner for the three methods, that is, we randomly crop 500 templates of $u$-size and 300 templates of $v$-size from the gray images or from the local feature arrays of the images. Table 3 shows the time consumption of the three methods at different stages, and the retrieval performance is shown in Figure 8. In these experiments, the number of retrieved images $N$ is still taken as 30.
It can be seen from Table 3 that both the hierarchical feature learning time and the query time of the FCNR-based method are significantly shorter than those of the other two methods. This is mainly because the latter two methods use exhaustive pixel-to-pixel translations. In particular, the LNR method, which requires solving quadratic optimization problems, is the most time-consuming. Therefore, although the retrieval method based on FCNR spends some time on the extraction of local features, the time for learning the FCNR and the query time are greatly reduced. This point is very important for image retrieval, because real-time performance is a basic requirement of the task [29].
In addition, Figure 8 shows that the retrieval precision of the method based on FCNR is better than that of the methods based on NR or LNR. The main reason is that the FCNR-based method avoids the direct comparison of pixel values in the bottom-layer image blocks on which the NR and LNR methods rely. At the same time, the loss of color information also affects the performance of LNR and NR to a certain extent. Note also that the results based on LNR are better than those based on NR. This is mainly because the localization and the sparse encoding in the LNR method give the target location in the image a high neural response.

Comparison with Some Other Methods Proposed for Image Retrieval. Finally, we compare FCNR with some other methods originally proposed for image retrieval, which include the edge histogram descriptor (EHD) method [10], the color difference histogram descriptor (CDH) [12], and more recent methods such as error diffusion block truncation coding (EDBTC) [18] and bandletized regions through support vector machines (BRSVM) [6].
As a benchmark, the EHD was used for texture image retrieval. In order to be fair, we extract features from the R, G, and B components of the image, and image blocks whose edge intensity exceeds 11 are used for the calculation of the histogram. Each component corresponds to an 80-dimensional feature vector, so the resulting EHD feature vector has 240 dimensions. In the CDH representation we use the YCbCr color space, and the color and orientation parameters are taken as 90 and 18, which is the parameter configuration reported in the literature [12] as performing relatively well. Thus the CDH representation of an image is a 108-dimensional vector. The EDBTC produces two color quantizers and a bitmap image, which are further processed using vector quantization to generate the image feature descriptor. Two features are introduced in EDBTC, namely, the color histogram feature and the bit pattern histogram feature, to measure the similarity between a query image and a target image in the database. The BRSVM method was proposed to overcome the drawbacks and limitations of traditional image segmentation technology. In the BRSVM method, a bandelet transform based image representation technique is presented, which reliably returns information about the major objects found in an image, and a support vector machine is applied for image retrieval purposes. Similarly, we randomly select 50 images from the image database to form a query image set, and the number of output images is set to 10, 20, and 50, respectively. Table 4 lists the average precision and recall in the three different cases corresponding to the five methods, and Figure 9 shows the PR curve when the number of output images is set to 50.
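For reference, precision and recall for a single query at a given number of output images can be computed as in the sketch below. The function name, the class labels, and the ranked list are hypothetical illustrations, and the class size of 100 assumes that the 1000 images are spread evenly over the ten classes.

```python
def precision_recall_at_n(ranked_labels, query_label, n, class_size):
    """Precision and recall over the top-n retrieved images for one query.

    ranked_labels: class labels of the retrieved images, best match first.
    class_size: number of database images that share the query's class.
    """
    hits = sum(1 for label in ranked_labels[:n] if label == query_label)
    return hits / n, hits / class_size

# Hypothetical ranking returned for a query image of class "bus".
ranked = ["bus", "bus", "horse", "bus", "beach",
          "bus", "bus", "food", "bus", "bus"]
precision, recall = precision_recall_at_n(ranked, "bus", n=10, class_size=100)
print(precision, recall)  # prints: 0.7 0.07
```

Averaging these two values over all images in the query set gives the kind of average precision and recall figures reported per method.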
It is not difficult to see from Table 4 and Figure 9 that, on the COREL-1000 image database, the proposed method not only performs significantly better than MPEG-7 standard feature extraction methods such as EHD but also outperforms the latest methods, such as EDBTC and BRSVM. From our point of view, this is mainly due to two reasons: first, based on the hierarchical structure, FCNR is the result of deep learning on the low-level features of the image; second, the high-level semantic elements of images are integrated into the feature representation of FCNR through the template sets.

Conclusions
This research has been devoted to constructing a new image feature representation, FCNR, for use in CBIR. It preserves the excellent characteristics of NR and LNR, such as being invariant to translation and illumination and robust to local distortion and clutter. More importantly, it also takes both visual features and semantic features into account in image recognition. The experimental results on the COREL-1000 image database have shown that, compared with NR and LNR, FCNR is more suitable for image retrieval tasks. In addition, the proposed method achieves a higher retrieval accuracy on the COREL database than other methods originally proposed for image retrieval. We attribute the effectiveness of the proposed method to both the local feature extraction and the hierarchical architecture, which performs deep learning on low-level visual features and incorporates the high-level semantics of the image into the feature representation.
Although both theoretical analysis and experimental results show that FCNR is an applicable image representation for CBIR, some problems remain to be studied further.
(i) The template sets and the number of selected templates in this paper are determined according to experience; no qualitative or quantitative analysis has been carried out. How to select more representative templates and how many templates are optimal for the construction of FCNR are still important issues for future work [30].
(ii) In this paper, we used simple inner product kernels as the similarity measure. How to combine FCNR with the mainstream similarity learning techniques of recent years, in order to obtain better retrieval performance, is a problem worth studying.
(iii) As is well known, the relevance feedback technique plays an important role in image retrieval [9], and how to introduce relevance feedback into the algorithm proposed in this paper is another interesting subject.

Figure 1: The layers of nested architectures.

Figure 2: The three layers of nested architectures.

Figure 3: The hierarchical relationship of image and image patches.

Figure 4: The flow diagram of CBIR based on FCNR.

Figure 5: Example of images in COREL database.

Figure 6: Recall-precision graphs of different patch sizes.

Table 2: Ten classes of 1000 experimental images.

Table 3: Time consumption in three different methods.

Table 4: Average precision and recall of four different methods in different situations.