Performance Evaluation of Color Descriptors under Illumination Variation

Color descriptors, which involve the extraction of color information that is robust to illumination variation, are indispensable for accessing reliable visual information, as illumination variation is inevitable in many practical cases. Many color descriptors have been proposed in the literature, but the performance of different color descriptors in different scenes under illumination variation, and the influence of surface characteristics, have not been investigated. In this paper, we first systematically introduce the theoretical basis of color descriptors, categorize the existing color descriptors according to that basis, and then compare the performance of different color descriptors on image recognition and image retrieval tasks on both indoor and outdoor image datasets. We adopt the recognition rate and the normalized average rank as the evaluation criteria for measuring the performance of the color descriptors. Experimental results show that the color moment invariants (CMI) provide the best balance between performance and dimensionality in most tests and that color descriptors derived from physical reflectance models are more suitable for object recognition and image retrieval. We also identify the best color descriptors for each kind of scene and surface characteristic.


Introduction
Color information is an important and efficient cue obtained by optical image sensors. However, in the computer vision field, illumination variation occurs in most real-world scenes, so color-based information changes and may easily cause errors in follow-up applications. Consequently, to obtain satisfactory performance in digital image processing, achieving color constancy is of great importance. Color constancy is usually considered the effect whereby the perceived or apparent color of a surface remains constant despite changes in the intensity and spectral composition of the illumination [1]. To eliminate the influence of illumination, illumination estimation [1][2][3] and illumination-independent color descriptors are the two major approaches to achieving color constancy.
Color descriptors, which can represent the color features of an image, have proved successful in applications such as image registration [4, 5], object recognition [6][7][8], face recognition [9][10][11][12][13], human detection [14], image retrieval [15][16][17][18], video retrieval [19, 20], and image classification [21, 22]. Hor [23] proposed an image retrieval method based on the combination of the local texture information of two different rotation-robust texture descriptors, constructed to improve computational efficiency. Khwildi and Ouled Zaid [24] proposed a method to improve the accuracy of high dynamic range image retrieval, whose feature extraction is based on a combination of the HSV histogram and color moments. Jacob et al. [25] proposed an interchannel color texture model based on a deep learning algorithm, which gives the interchannel color texture information of an image; it considers the unique channel information and its relationship with the adjacent pixel information in the opponent space. Experiments show that this descriptor performs much better than previous work in the context of image retrieval and face recognition. Kumar et al. [26] focused on 3D sign language recognition within human action recognition and proposed a new color-coded topographical descriptor, which combines joint distances and angles computed from joint locations. Based on an improved Otsu threshold algorithm, Wei et al. [27] proposed a new automatic fruit extraction method for a fruit-picking robot, which uses a new feature in the OHTA color space. In order to improve the picking accuracy of fruit-picking robots in three-dimensional space, Wu et al. [28] proposed a new improved descriptor combining color features and three-dimensional geometric features.
Color descriptors are diverse and can be applied in different scenes, but their robustness to illumination variation differs. Some color descriptors are derived by mapping images to a different color space or via a physical reflectance model. In addition, moment functions such as the Hu moments [29] and Zernike moments [30] are also used as a theoretical basis for obtaining color descriptors. Recent work has concentrated on divisive information-theoretic feature clustering (DITC) [31] to derive color descriptors. In addition, a color-texture descriptor robust to impulse noise [32] has been proposed; the descriptor is robust to rotation and impulse noise, and its calculation is simple. Accordingly, it is necessary to analyze the illumination robustness of the existing color descriptors effectively.
In this paper, we evaluated the performance of different color descriptors for image recognition and image retrieval on both indoor and outdoor image scenes under different illumination conditions. We compared a series of color descriptors that are influential in the field of pattern recognition, using the same evaluation scenario and the same image data. The evaluation criteria are the recognition rate for object recognition and the normalized average rank for image retrieval.

1.1. Related Work.
Several studies have conducted comprehensive evaluations of the strengths and weaknesses of different descriptor approaches, which help in identifying future research directions [33] in computer vision. Mikolajczyk and Schmid [34] compared descriptors computed on interest regions extracted with scale- and affine-invariant detectors for image matching and object recognition; the performance is comprehensively compared under affine transformations, scale changes, rotation, blur, JPEG compression, and illumination changes. They showed that the SIFT-style descriptor [35] outperforms the other methods. In the context of image classification, van de Sande et al. [36] evaluated the invariance properties and distinctiveness of several color descriptors. The performance is measured by mean average precision using the SVM algorithm [37] and by the position in the rankings for image benchmarks. Burghouts and Geusebroek [38] compared the discriminative power and invariance of gray value invariants to those of local color invariants in the context of image matching, where the discriminative power is measured by determining the recall of the regions to be matched and the precision of the matches. Setkov et al. [39] discussed the invariance properties of some existing color descriptors in the application of nonintrusive geometric compensation. The performance is mainly measured by two indicators: the detection rate, which is the ratio between the number of detected matches and the number of feature points in the image, and the FM precision, which indicates how matching quality deteriorates when matching images. Bianconi [40] proposed a general and extensible framework for classifying color texture, and the existing methods for color texture representation are compared both theoretically and experimentally. They pointed out that separate color and texture processing can achieve a balance between performance and limited dimensionality.
In order to investigate the effects of illumination variation on color texture features, Cusano et al. [41] evaluated and compared several color texture descriptors using a new texture database, RawFooT. They pointed out that traditional texture descriptors have limited robustness to illumination variation and that CNN-based descriptors [42] outperform handcrafted traditional and object-oriented features in the context of texture classification. Ma et al. [43] surveyed the classic image matching methods of the past two decades, as well as recent deep-learning-based methods, covering feature detection, feature description, feature matching, image registration, stereo matching, point set or point cloud registration, and other related subfields. For traditional feature description algorithms, they pointed out that extracting more accurate and repeatable features and more significant and distinguishable feature descriptors is the future development trend. In order to remove mismatches from image features, Ma's team has done a great deal of research; they proposed a locality preserving matching method, which preserves the potential true matches in a local neighborhood [44]. They also transformed feature matching into a spatial clustering problem, which achieves promising performance [45].
However, very little work focuses on how color descriptors are affected by different kinds of surface characteristics under illumination variation. In addition, there is little introduction to, or classification of, the theoretical bases for deriving color descriptors.
In this paper, we aim to analyze the robustness of various descriptors to illumination variation and the influence of object surface characteristics on image recognition and retrieval. The comparison of the practicability of different color descriptors is carried out in the context of image recognition and retrieval. We believe that this manuscript may serve as a handbook for those who need to select an appropriate method for a specific scenario (indoor or outdoor scenes, or different surface characteristics), as well as a basis for the development of new color descriptors.

1.2. Overview. This paper is organized as follows: In Section 2, the taxonomy of color descriptors based on their theoretical basis and a brief introduction of the selected color descriptors are presented. Section 3 describes the experimental setup as well as our evaluation criteria and image data. Finally, a discussion of the results is given.

Selection of the Color Space. Color spaces can be grouped as follows:

(i) Color spaces based on physics and technology, which include the color spaces based on the trichromatic theory (RGB, CMY(K), etc.), the luminance-chrominance color spaces (YIQ, YUV, etc.), and the independent-axis color models. The RGB color space is the most common one and is mainly used in TV and computer color display systems; it uses red, green, and blue components to represent the image. The CMY(K) color space is often used in printing and publishing. The YUV and YIQ color spaces, which use luminance and chrominance components, are commonly used to represent color images in TV systems.

(ii) Uniform color spaces (Lab, Luv, etc.), whose important application is the comparison of similar colors. The Lab and Luv color spaces, which describe human visual perception numerically, are color systems based on physiological characteristics.

(iii) Color spaces based on perception (HSI, HSV, opponent, etc.), which quantify human color perception using intensity, hue, and saturation. The HSV color space is created according to the intuitive characteristics of color and divides the color signal into three attributes: hue, saturation, and brightness. The principle of the HSI color space is similar to that of HSV.

Several color descriptors are derived from the selection or establishment of different color spaces. Li et al. [8] proposed the central color coordinate system and the edge-based color coordinate system based on the diagonal-offset model.
Selecting a suitable color space, or proposing a new color space with color invariance, is of great importance for improving the robustness of color descriptors to illumination variation.

Physical Reflectance Model.
Image information is modeled by means of the Kubelka-Munk theory [47] for colorant layers. The physical reflectance model resulting from the Kubelka-Munk theory gives the reflected spectrum in the viewing direction as [48]

E(λ, x⃗) = e(λ, x⃗)(1 − ρf(x⃗))²R∞(λ, x⃗) + e(λ, x⃗)ρf(x⃗),   (1)

where x⃗ denotes the position at the imaging plane and λ the wavelength. Further, e(λ, x⃗) denotes the illumination spectrum and ρf(x⃗) the Fresnel reflectance at x⃗, and the material reflectivity is denoted by R∞(λ, x⃗).
The other common physical illumination model is the Lambertian model:

f(x⃗) = ∫ω e(λ)S(x⃗, λ)c(λ) dλ,   (2)

where x⃗ denotes the position at the imaging plane, λ the wavelength, and ω the range of visible light; e(λ) represents the spectral distribution of the light source; S(x⃗, λ) denotes the physical reflectivity of the object surface at the point x⃗ for light of wavelength λ; and c(λ) = (R(λ), G(λ), B(λ))ᵀ is the photometric function of the imaging device. Light scattering refers to the phenomenon in which part of the light deviates from its original direction when passing through an inhomogeneous medium, and natural illumination is mostly a mixture of direct and scattered light. Shafer [49] added scattered lighting to the Lambertian model to simulate natural light conditions and make the model more universal:

f(x⃗) = ∫ω (e(λ)S(x⃗, λ) + ϕ(λ))c(λ) dλ,   (3)

where ϕ(λ) indicates the scattered lighting. This improvement is more suitable for outdoor scenes. Similarly, Shafer obtained a dichromatic reflection model [49] based on the Kubelka-Munk (KM) model.
Many new color descriptors are derived from these physical reflectance models by adding other influence indicators. Algorithms derived from a physical model possess photometric invariance, but they sacrifice some of the discriminative power of the descriptors.

Mathematical Model.
Moments are widely used in image processing; the moment sets calculated from an image describe the global shape features of the image and provide much information about its different types of geometric features. Hu [29] proposed seven moment invariants from the first to the third order, and a series of improved algorithms for the Hu moments have been applied to object recognition and image retrieval [50, 51]. The Hu moments have been proved robust to translation, rotation, and scaling. From different color channels and the Hu moments, color moment invariants can be deduced, which have good robustness to illumination change. In addition, color moment invariants can be obtained by the Lie group approach [52]. Note that the dimensionality of the color moment invariants is small, so they are suitable for image recognition and retrieval.
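As a minimal sketch of the underlying moment machinery (plain NumPy; the grayscale test pattern and function name are illustrative, not the authors' implementation), the first two Hu invariants φ1 and φ2 can be computed from normalized central moments, and their translation invariance checked directly:

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants of a 2D grayscale image."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):
        # normalized central moment eta_pq
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# The same small pattern placed at two positions yields identical invariants.
patch = np.arange(16, dtype=np.float64).reshape(4, 4) + 1.0
a = np.zeros((12, 12)); a[1:5, 2:6] = patch
b = np.zeros((12, 12)); b[6:10, 5:9] = patch
inv_a, inv_b = hu_first_two(a), hu_first_two(b)
```

Because central moments are computed about the centroid, translating the pattern leaves φ1 and φ2 unchanged; the normalization by powers of m00 additionally provides scale invariance.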

Information Theory.
Different pixel values captured in a color space can be mapped to the same value in a photometrically invariant color space, which sacrifices the discriminative power of the color descriptors derived from the physical reflectance model. In order to find a balance between illumination invariance and discriminative power, the divisive information-theoretic feature clustering (DITC) algorithm proposed by Dhillon [31] is used to construct color descriptors with high discriminative power. The DITC algorithm introduces the concepts of information entropy and Kullback-Leibler (KL) divergence from information theory. The color descriptors based on information theory are employed to find an optimal clustering of the color space with higher discriminative power.

Brief Introduction of Current Color Descriptors. In the following, color descriptors are presented and categorized according to the four theoretical foundations mentioned above.

Color Descriptors Based on the Selection of the Color Space
(1) Histogram.

(i) RGB histogram
The RGB histogram is a combination of three 1D histograms based on the channel of R, G, and B in the RGB color space. The spatial structure of the RGB color space is not consistent with people's subjective judgment of color similarity, and it possesses no invariance to illumination changes.
(ii) rg histogram

The rg histogram is obtained by normalizing the RGB color space; the chromaticity components r and g represent the color information in images:

r = R/(R + G + B), g = G/(R + G + B).   (4)

Due to the normalization, the components r and g have been proved to be invariant to light intensity scaling, shadows, and shading.
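The normalization and its intensity-scale invariance can be sketched as follows (plain NumPy; the function name and bin count are illustrative assumptions):

```python
import numpy as np

def rg_histogram(image, bins=16):
    """rg (normalized chromaticity) histogram of an H x W x 3 RGB image:
    the concatenation of the 1D histograms of r and g."""
    rgb = image.astype(np.float64)
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    r = rgb[..., 0] / total          # r = R / (R + G + B)
    g = rgb[..., 1] / total          # g = G / (R + G + B)
    hr, _ = np.histogram(r, bins=bins, range=(0, 1))
    hg, _ = np.histogram(g, bins=bins, range=(0, 1))
    return np.concatenate([hr, hg]).astype(np.float64)

# Halving the light intensity leaves the chromaticities, and hence the
# histogram, unchanged.
img = np.random.rand(8, 8, 3)
h1 = rg_histogram(img)
h2 = rg_histogram(0.5 * img)         # same scene, half the intensity
```

Scaling every channel by the same factor cancels in the ratio, which is exactly the invariance claimed above.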

(iii) Hue histogram
When mapping images to the HSV color space, the hue becomes unstable near the gray axis. van de Weijer et al. [53] discovered that the certainty of the hue is inversely proportional to the saturation, so the hue histogram is made more robust by weighting each sample of the hue by its saturation. The color channel H [53] is scale-invariant and shift-invariant with respect to light intensity.
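This saturation weighting can be sketched as follows (plain NumPy; the channel layout, value ranges, and bin count are illustrative assumptions):

```python
import numpy as np

def weighted_hue_histogram(hsv, bins=16):
    """Hue histogram in which each pixel's vote is weighted by its
    saturation, damping the unstable hues near the gray axis.
    hsv: H x W x 3 array with hue in [0, 1) and saturation in [0, 1]."""
    h = hsv[..., 0].ravel()
    s = hsv[..., 1].ravel()
    hist, _ = np.histogram(h, bins=bins, range=(0, 1), weights=s)
    total = hist.sum()
    return hist / total if total > 0 else hist

# A saturated red pixel votes fully; a gray pixel (s = 0) contributes nothing,
# even though its hue value is arbitrary.
hsv = np.zeros((2, 1, 3))
hsv[0, 0] = [0.0, 1.0, 1.0]   # saturated red
hsv[1, 0] = [0.5, 0.0, 0.5]   # gray: hue is meaningless, saturation zero
hist = weighted_hue_histogram(hsv)
```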

(iv) Opponent histogram
The opponent histogram, in which images are mapped to the opponent color space, combines the histograms of the three 1D channels:

O1 = (R − G)/√2, O2 = (R + G − 2B)/√6, O3 = (R + G + B)/√3,   (5)

where the components O1 and O2 represent color information and the component O3 denotes intensity information. van de Sande et al. [36] proved that the components O1 and O2 are shift-invariant with respect to light intensity, while the intensity channel O3 has no invariance properties.
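A minimal sketch of the transformation and its shift invariance (plain NumPy; variable names are illustrative):

```python
import numpy as np

def to_opponent(rgb):
    """Map an H x W x 3 RGB image into the opponent color space."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    o1 = (R - G) / np.sqrt(2)            # color channel
    o2 = (R + G - 2 * B) / np.sqrt(6)    # color channel
    o3 = (R + G + B) / np.sqrt(3)        # intensity channel
    return np.stack([o1, o2, o3], axis=-1)

# An additive intensity shift cancels in O1 and O2 but not in O3.
img = np.random.rand(4, 4, 3)
shifted = img + 0.1                      # equal additive offset on all channels
opp, opp_s = to_opponent(img), to_opponent(shifted)
```

The offset is identical in all three channels, so it cancels in the channel differences O1 and O2 and survives only in the sum O3, matching the invariance statement above.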
(i) SIFT

Lowe [35] proposed SIFT to describe local features using edge orientation histograms based on points of interest. SIFT features possess invariance to rotation, scaling, and light intensity variation. However, SIFT is not invariant to light color changes due to the combination of the R, G, and B channels in the intensity channel. Note that SIFT is extensible and can easily be combined with other forms of feature vectors.
(ii) RGB-SIFT

RGB-SIFT is the combination of SIFT computed in the R, G, and B channels separately. It has been proved that RGB-SIFT is scale-invariant, shift-invariant, and invariant to light color changes.

(iii) HSV-SIFT
The method of calculating the SIFT over three channels of the HSV color space independently has been proposed by Bosch et al. [54]. The descriptor has 128-dimensional features for each channel. It is known that the H channel is scale-invariant and shift-invariant with respect to light intensity. However, the combination of HSV channels provided low invariance.
(iv) Hue-SIFT

van de Weijer et al. [53] introduced a concatenation of the hue histogram with the SIFT descriptor. Weighting the hue histogram with saturation solves the problem of instability near the gray axis seen in HSV-SIFT. Hue-SIFT possesses invariance to intensity changes.
(v) Opponent-SIFT

Opponent-SIFT is constructed from the transformation between the RGB color space and the opponent color space (5); the O1 and O2 channels describe color information while the O3 channel represents intensity information. In [36], opponent-SIFT shows the best result in the context of image classification.
(vi) C-SIFT

C-SIFT [38, 55] is derived from a normalization of the 2D opponent color space that cancels the intensity information in the O1 and O2 channels.

(vii) rg-SIFT

According to the normalization of the RGB color space (4), rg-SIFT can be regarded as an independent computation of SIFT features in the r and g chromaticity components.
(i) Robust hue descriptor (HUE)

van de Weijer and Schmid [55] proposed to describe image patches by the histogram of the hue, which is computed from a linear transformation of the RGB color space:

hue = arctan(O1/O2) = arctan(√3(R − G)/(R + G − 2B)).

Weighting the contribution of each pixel to the histogram by its saturation makes the hue more stable; the hue descriptor is invariant with respect to lighting geometry and specularities under the assumption of white illumination.
(ii) Opponent derivative descriptor (OPP)

van de Weijer and Schmid [55] proposed to represent image patches by a histogram of the opponent derivative angle:

ang_x = arctan(O1x/O2x),

where O1x and O2x denote the spatial derivatives of the color information channels. The descriptor, weighted by the strength of the opponent derivative, has been proved to be invariant with respect to specularities and diffuse lighting.

Color Descriptors Based on the Physical Reflectance Model
(1) Color Descriptors Derived from the Lambertian Model. Funt and Finlayson [56] and Gevers and Smeulders [57] proposed photometric-invariant image indices on the basis of the Lambertian model. However, the color descriptors mentioned above are susceptible to blur. van de Weijer and Schmid [58] provided a construction scheme for color invariant descriptors to counter the impact of image blur. However, these descriptors all depend on the edge information of the image, so they lose much of the color information for images with sparse edges.
Geusebroek and Boomgaard [48] proposed a framework of five color invariants (H, C, W, N, and E) to describe object reflectance under five different imaging conditions, using the Gaussian scale-space paradigm of color images. The five imaging conditions are as follows: (a) equal energy and uneven illumination; (b) equal energy but uneven illumination, and matte, dull surfaces; (c) equal energy and uniform illumination, and matte, dull surfaces and planar objects; (d) colored but uneven illumination; and (e) arbitrary imaging conditions. The object reflectance is derived from the Kubelka-Munk theory; extensive experiments show that these color measurements obtain highly discriminative power while maintaining photometric invariance.

Color Descriptors Based on a Mathematical Model.
A color image can be described by a function I defined on every channel of the RGB color model at the spatial position (x, y): I(x, y) = (R(x, y), G(x, y), B(x, y)). Mindru et al. [59] provided a definition of the generalized color moment:

M_pq^abc = ∬ x^p y^q [R(x, y)]^a [G(x, y)]^b [B(x, y)]^c dx dy,

where M_pq^abc denotes a generalized color moment of order p + q and degree a + b + c. In particular, moments of degree 0, i.e., M_pq^000, carry no color information, while moments of order 0, i.e., M_00^abc, do not contain any spatial information. The framework can be applied to construct various moments; however, to maintain the stability of the color moments, only the generalized color moments up to the first order and second degree are adopted.
Consequently, the color moment descriptor has 30 dimensions and it is only robust to shift variation.
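The generalized color moments can be sketched as a discrete sum over pixels (plain NumPy; this is an illustrative discretization of the integral, not the authors' implementation):

```python
import numpy as np

def generalized_color_moment(img, p, q, a, b, c):
    """Discrete M_pq^abc = sum over pixels of x^p * y^q * R^a * G^b * B^c
    for an H x W x 3 color image."""
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    return float(np.sum(x ** p * y ** q * R ** a * G ** b * B ** c))

# Moments up to first order (p + q <= 1) and second degree (a + b + c <= 2)
# give 3 * 10 = 30 values, matching the 30-dimensional descriptor above.
combos = [(p, q, a, b, c)
          for p in range(2) for q in range(2)
          for a in range(3) for b in range(3) for c in range(3)
          if p + q <= 1 and a + b + c <= 2]
img = np.random.rand(4, 5, 3)
descriptor = [generalized_color_moment(img, *t) for t in combos]
```

The first combination (p = q = a = b = c = 0) reduces to the pixel count, a quick sanity check on the implementation.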

(ii) Color moment invariants
Mindru et al. [59] provided various constructions of color moment invariants (CMI) based on the framework of the generalized color moment. In this paper, we select the PSO invariants due to the better results they achieve. They are deduced under the circumstance that no geometric deformations are present and the photometric transformations are of type SO, which yields 24 dimensions.

Color Descriptors Based on Information Theory
(i) Color name Berlin and Kay [60] have defined basic color terms based on a large number of anthropological studies, and they provided eleven basic terms to describe color: CN = fblack, blue , brown, grey, green, orange, pink, purple, red, white, yellowg.

According to this theory, Benavente et al. [61] proposed the fuzzy color name (FCN) to model the category of arbitrary color terms using the concept of fuzzy sets, where the color name descriptor describes the probability of a color stimulus belonging to an arbitrary color term. van de Weijer et al. [62] proposed a novel color name (CN) descriptor to represent an image:

CN = {p(n_1 | R), ⋯, p(n_11 | R)}, with p(n_i | R) = (1/N) Σ_{x∈R} p(n_i | f(x)),

where n_i is the ith color name, x denotes the spatial coordinates of the N pixels in region R, f(x) is the corresponding pixel value in the L*a*b* color space, i.e., f = {L*, a*, b*}, and p(n_i | f(x)) represents the probability of a color name given the corresponding pixel value. It has been proved that the color name descriptor performs well in image classification.
(ii) Discriminative color descriptors

Color descriptors derived from a physics-based model can suffer a decline in discriminative power. Khan et al. [63] proposed discriminative color descriptors (DD) based on an information-theoretic approach. The DD bear some resemblance to the color name descriptor, which discretizes the color space into eleven parts. The DD are obtained by finding feature clusters that optimize a global objective function, separating the original color space, discretized into 10 × 20 × 20 = 4000 bins in the L*a*b* color space, into m color words W = {ω_1, ⋯, ω_m}.
Note that image data with l classes is given by C = {c_1, ⋯, c_l}. The mutual information describing the discriminative power of the color words W with respect to differentiating the classes C is given by

I(C; W) = Σ_i Σ_t p(c_i, ω_t) log( p(c_i, ω_t) / (p(c_i) p(ω_t)) ),

where the joint p(c_i, ω_t) and the priors p(c_i) and p(ω_t) can be measured empirically from the image data.
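A minimal sketch of this mutual-information criterion (plain NumPy; the joint probability tables are illustrative toy data):

```python
import numpy as np

def mutual_information(joint):
    """I(C; W) in bits from a joint probability table p(c_i, w_t),
    with classes along rows and color words along columns."""
    joint = joint / joint.sum()                  # ensure a valid distribution
    pc = joint.sum(axis=1, keepdims=True)        # marginal p(c_i)
    pw = joint.sum(axis=0, keepdims=True)        # marginal p(w_t)
    nz = joint > 0                               # 0 * log 0 = 0 by convention
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pc @ pw)[nz])))

# Independent classes and color words carry no information about each other;
# perfectly aligned classes and words carry one full bit.
indep = np.outer([0.5, 0.5], [0.5, 0.5])
dep = np.array([[0.5, 0.0], [0.0, 0.5]])
```

A clustering of the color space that keeps I(C; W) high while shrinking the number of words m is exactly what the DD optimization seeks.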

Experimental Setup
We first give implementation details for the evaluated color descriptors. We then describe the image datasets used for evaluation. Finally, the evaluation criteria are given.

Feature Mapping Pipelines.
To evaluate the essential characteristics of color descriptors, the bag of features (BOF) model [64] is used to obtain fixed-length feature vectors. The main algorithm used in the BOF model is k-means clustering. In essence, it is a process of clustering the extracted image features to construct a visual codebook and then mapping features against the visual codebook.
Hence, the BOF model implements a vector quantization of the color descriptors. As described in [64], the process of establishing a visual codebook is as follows:

(a) Feature extraction and description. We use a 16 × 16 grid with 50% overlap to extract patches from images, and the different color descriptors are calculated on every patch.

(b) Construction of the visual codebook. The visual codebook is constructed from the feature patches extracted before; the k-means algorithm is used to cluster all patches into k classes (in this paper, k = 300).
(c) Representation of image features. The frequency of each feature word of the k-dimensional visual codebook is counted in the test image; each test image can then be represented as a k-dimensional vector.

In addition, when calculating histogram-based color descriptors directly on the whole image, we select 16 bins for each color space channel.
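The codebook construction and mapping steps above can be sketched as follows (a toy NumPy k-means rather than a production implementation; the data, k, and feature dimensions are illustrative):

```python
import numpy as np

def build_codebook(features, k, iters=20, seed=0):
    """Cluster descriptor vectors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters unchanged
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def bof_vector(patch_features, codebook):
    """Map one image's patch descriptors to a k-dim word-frequency vector."""
    d = np.linalg.norm(patch_features[:, None] - codebook[None], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()

# Toy run: 60 four-dimensional patch descriptors, a 5-word codebook.
rng = np.random.default_rng(1)
feats = rng.random((60, 4))
codebook = build_codebook(feats, 5)
v = bof_vector(feats[:10], codebook)
```

Each image thus becomes a fixed-length word-frequency vector regardless of how many patches it contributed, which is what makes descriptors of different images directly comparable.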

Performance Evaluation
3.2.1. Dataset. We use popular color datasets aimed at studying the effects of illumination variation: SFU (Simon Fraser University), ALOI (Amsterdam Library of Object Images), Phos, RawFooT (raw food texture database), and THRI2015 (time-lapse hyperspectral radiance images).

(i) Indoor dataset
The 321 SFU image data [65] is provided by the Computational Vision Lab of Simon Fraser University. It contains 330 images of 30 scenes under 11 different artificial lights; some images were culled from each set due to deficiencies in the calibration data, so that 321 images are adopted. It is well suited to the study of computational color constancy. The image data is split into four groups: images with minimal specularities (Mondrian, 22 scenes, 223 images); images with nonnegligible dielectric specularities (specular, 9 scenes, 98 images); images with metallic specularities (metallic, 14 scenes, 149 images); and images with fluorescent surfaces (fluorescent, 6 scenes, 59 images). The 321 SFU image data is used for image recognition. In addition, another image dataset, also from Simon Fraser University, consists of 220 images of 20 scenes under 11 different illumination conditions. The ALOI image dataset [66] contains 1200 objects, each with 12 images obtained under different illumination conditions; we choose the first 30 objects with 12 images each, a total of 360 images, for our experiments. The Phos image dataset [67] contains 220 images of 15 scenes under 15 different illumination conditions, i.e., various strengths of uniform illumination and different degrees of nonuniform illumination. The RawFooT image dataset [68] includes images of texture samples acquired under 46 lighting conditions, which may differ in light direction, illuminant color, intensity, or a combination of these factors. We choose 460 images of 10 objects under the 46 different illumination conditions from RawFooT.
(ii) Outdoor dataset

The sequences of time-lapse hyperspectral radiance images of natural scenes 2015 (THRI2015) [41] contain 34 images of four natural scenes under 7-9 natural light conditions varying with the passage of time; in other words, the images were acquired at approximately 1-hour intervals for each scene. For the convenience of image retrieval, we reject redundant images from each scene so that our data includes 24 images of four scenes at 6 time points of natural light.

Evaluation Criteria.
In the image recognition experiment, we use the common k-nearest neighbor classification scheme, where the Euclidean distance is adopted to measure the feature distance between images. First, the k images in the training set with the closest feature distance to the test image are selected; if one class appears most frequently among these k images, the test image is assigned to that class. In this paper, k values of 1, 3, and 5 are used for the assessment of the color descriptors. We take the recognition rate of the objects as the evaluation criterion. Suppose that a total of N experiments is conducted, of which M are correctly identified; then, the recognition rate (RR) is defined as

RR = M/N.

We also calculate the average RR over the three different k values. For the experimental scheme, we choose the leave-one-out approach: given N images in the image data, each time only one is taken as the test image and the remaining N − 1 images are used as training images. In this way, different images are selected as test images and N experiments are conducted.
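The leave-one-out k-NN scheme and the recognition rate can be sketched as follows (plain NumPy; the toy feature vectors are illustrative):

```python
import numpy as np

def leave_one_out_rr(X, y, k=1):
    """Leave-one-out k-NN recognition rate (RR = M / N), Euclidean distance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, correct = len(X), 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the test image itself
        votes = np.bincount(y[np.argsort(d)[:k]])
        correct += int(votes.argmax() == y[i])
    return correct / n

# Toy data: two well-separated classes of feature vectors.
X = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
y = [0, 0, 1, 1]
rr = leave_one_out_rr(X, y, k=1)
```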
In the context of image retrieval, the performance of the color descriptors is assessed by the normalized average rank for a single query [55]:

NAR = (1/(N · N_R)) ( Σ_{i=1}^{N_R} R_i − N_R(N_R + 1)/2 ),

where N denotes the total number of images, N_R is the number of images relevant to the query image, and R_i represents the rank of the ith relevant image in the query results. A smaller NAR indicates a better retrieval result: the retrieval is perfect when NAR = 0, and NAR = 0.5 corresponds to random retrieval. We take the average NAR value (ANAR) over all candidates as the final result on the image data. For example, we select the first image of every scene of the ALOI image data (360 images of 30 scenes under 12 different illumination conditions) as a candidate, and the remaining images per scene form the training set, so that 11 relevant images are possessed by every candidate. Hence, N = 330 and N_R = 11. In addition, in order to measure the illumination invariance of the features, we also measure the Euclidean distance between the color descriptors of the same scene, including indoor and outdoor scenes, under different lighting conditions. The indoor image data from the Amsterdam Library of Object Images (ALOI) [66] contains 12 images of 1 scene under 12 different artificial lights, as shown in Figure 1(a), and the outdoor image data extracted from THRI2015 contains 6 images of 1 scene under 6 different natural illuminations, as shown in Figure 1(b).
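The NAR computation can be sketched directly from the formula (pure Python; the rank lists are illustrative):

```python
def normalized_average_rank(ranks, n_total, n_relevant):
    """NAR = (1 / (N * N_R)) * (sum_i R_i - N_R * (N_R + 1) / 2).

    ranks: ranks of the relevant images in the query results (1-based).
    """
    offset = n_relevant * (n_relevant + 1) / 2   # best possible rank sum
    return (sum(ranks) - offset) / (n_total * n_relevant)

# Relevant images ranked first give NAR = 0 (perfect retrieval);
# relevant images ranked last give the worst score for this configuration.
perfect = normalized_average_rank([1, 2, 3], 10, 3)
worst = normalized_average_rank([8, 9, 10], 10, 3)
```

Subtracting N_R(N_R + 1)/2, the smallest achievable rank sum, anchors a perfect retrieval at exactly 0, and dividing by N · N_R makes scores comparable across datasets of different sizes.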

Results
This section outlines the discussion of the results obtained in our experiments. The performance of the color descriptors discussed in Section 2 under illumination variation is compared in the context of image recognition and image retrieval. For the indoor image data, we divide images into four kinds of surface conditions to analyze the effect of different surface characteristics on the performance of descriptors.

Experiment on Image Recognition.
For ease of presentation in the graphs, the classes of descriptors are abbreviated as follows: color descriptors based on the selection of the color space (Colorspace), color descriptors based on physical reflectance models (Phy.), color descriptors based on mathematical models (Math.), and color descriptors based on information theory (Inf.).
(i) Indoor image data

Figure 2 shows the results of the color descriptors on the SFU 321 image data for images with different surface characteristics. From the results in Figure 2(a), the color descriptors derived from the physical reflectance model, such as W, C, N, E, and H, have a high degree of robustness to illumination color variation, among which W is the best. The property W can be interpreted as an edge detector: it represents not only object properties but also shadow edge information. At the same time, the construction of W, which assumes equal energy, uniform illumination, and matte, dull surfaces, makes W more suitable for images containing minimal specularities. For color space selection, the majority of SIFT and color SIFT descriptors perform much better than the histogram-based descriptors. From the results in Figures 2(b) and 2(c), N followed by W performs better than the other color descriptors. OPP followed by HUE also has good performance; note that OPP, which uses a histogram over the opponent angle to represent image patches, is invariant to specularities and diffuse lighting and therefore performs satisfactorily for images with metallic reflection. The majority of the histogram-based color descriptors lack illumination invariance, except the rg histogram, which improves the robustness to light change by normalizing the RGB color space. The color descriptors derived from the physical reflectance model still perform well with RR > 90%. It is remarkable that H, which is related to the hue of materials under equal energy but uneven illumination, has a reduced performance with RR = 61.52%. The RR of the color descriptors based on information theory is above 80%, of which CN performs slightly better than the 50-dimensional DD.
Regarding the results in Figure 2(d) for images with nonnegligible dielectric specularities, E performs best, closely followed by N and C, whereas HSV-SIFT performs worst owing to its lack of invariant properties. Note that, among the descriptors based on physical reflectance models, E has wider applicability than the others. Since N indicates transitions in object reflectance, it remains stable for images with specular reflection. In addition, the RRs of both CMI and the color descriptors derived from KM are higher than 90%. OPP performs better than HUE, which is followed by CN. For all surface characteristics, CMI outperforms CM, in accordance with the expected theoretical results, and DD shows an obvious improvement in performance as its dimensionality increases, with the 50-dimensional DD being the most satisfactory.
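The moments underlying CM and CMI combine powers of the pixel coordinates with powers of the intensities in each color band; CMI descriptors are built from combinations of such moments that cancel the illumination transformation. The sketch below computes only the base moment (the normalization to [0, 1] is an assumption made here for illustration, and the invariant combinations themselves are omitted):

```python
import numpy as np

def generalized_color_moment(image, p, q, a, b, c):
    """Generalized color moment: sum over pixels of
    x^p * y^q * R^a * G^b * B^c. Coordinates and intensities are
    scaled to [0, 1] (a normalization choice assumed here) so that
    moments of different orders remain comparable. CMI descriptors
    combine several such moments into illumination-invariant ratios.
    """
    img = image.astype(np.float64) / 255.0
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    x /= max(w - 1, 1)
    y /= max(h - 1, 1)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    return float(np.sum(x**p * y**q * R**a * G**b * B**c))

# for a uniform white image, the zeroth-order moment with a = 1
# simply counts the pixels (every term in the sum is 1)
white = np.full((4, 4, 3), 255, dtype=np.uint8)
```

For the 4x4 white image, `generalized_color_moment(white, 0, 0, 1, 0, 0)` sums sixteen ones.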
According to the comparison of descriptor performance in Figure 3, most color descriptors based on color space selection perform better on images with metallic specularities (metallic) or nonnegligible dielectric specularities (specular) than on images with minimal specularities (Mondrian), with the exception of opponent-SIFT. For images with specularities, which may produce highlight areas, the performance of opponent-SIFT, which has been shown not to be robust to light color changes and shifts [28], is reduced. Different surface characteristics have the least influence on the color descriptors based on physical reflectance models. The information-theory-based color descriptors all perform better on metallic than on Mondrian images, and all descriptors except DD50 and FCN perform better on specular than on Mondrian images. Most color descriptors perform better in the metallic category than in the specular category; nonnegligible dielectric specularities thus have a more negative effect on color descriptors than metallic specular reflection. Meanwhile, the performance of most descriptors, except DD25 and DD11, improves for images with fluorescent surfaces, whose key characteristic is that some of the absorbed light energy is reemitted at longer wavelengths. Table 1 shows the performance of the descriptors on the indoor image data. For the ALOI image data, C, W, N, and CMI give perfect results, and E, followed by H, obtains a slightly lower score than the best but performs better than all the other color descriptors. The descriptors based on physical reflectance models (C, W, N, E, and H) focus on the edge information of color images; hence, they are suitable for the ALOI image data with its obvious image contour information. The SIFT-based color descriptors, except for SIFT itself, give lower scores than the other descriptors.
From the results on the Phos image data, H, whose property relates to the hue of the material, gives the highest score; OPP performs well, i.e., lower than H but better than most of the other descriptors. Note that CM performs worse than many other descriptors, giving the lowest RR: CM does not work well for image data with multiple objects per image and has limited invariance to illumination changes. CMI performs best among the mathematical-model-based descriptors, and CN gives the best results among the information-theory-based ones.
For the RawFooT image data, which is based on texture information, Hue-SIFT, which is invariant to light intensity changes and shifts [36], has the best performance with RR higher than 93%, followed by OPP, W, and the other descriptors. Note that Hue-SIFT performs best among the descriptors based on color space selection, W among the physical-model-based, CMI among the mathematical-model-based, and FCN among the information-theory-based. Benavente et al. [61] pointed out that the color assignment of FCN is based on physical color samples, which may introduce errors when it is applied to other stimuli such as light.
(ii) Outdoor image data. Table 2 presents the results for the THRI2015 image data. OPP obtains the highest score, and a slightly lower score is obtained by HUE. In addition, CN, FCN, DD25, and DD50 also perform well on the outdoor data. HSV-SIFT, which already performed poorly on the indoor image data, achieves the lowest recognition rate, below 20%. This also confirms the earlier conclusion that the HSV color space offers no robustness to illumination variation. Most color descriptors derived from physical reflectance models perform relatively well. The fact that CMI performs much better than CM is consistent with the previous results. Table 3 presents the image retrieval results for all the image data.

Experiment on Image Retrieval.
(i) Indoor image data. For the ALOI image data, W gives the best score with ANAR = 0, followed by N, C, and the other color descriptors. HSV-SIFT fails on this image data, giving the highest ANAR value. We can observe that the information-theory-based descriptors perform better than the other types of color descriptors.
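For reference, the normalized average rank behind these ANAR scores is computed per query from the retrieval ranks of the relevant images; the sketch below follows the standard definition (assumed here to match the criterion used in the experiments), with ANAR being the mean of this value over all queries:

```python
def normalized_average_rank(ranks, n_total):
    """Normalized average rank (NAR) for a single query.

    ranks   -- 1-based positions at which the relevant images were
               retrieved
    n_total -- total number of images in the collection

    NAR = (sum(ranks) - n_rel*(n_rel + 1)/2) / (n_total * n_rel)

    0 means all relevant images are retrieved first; values near 1
    mean they are retrieved last. ANAR averages NAR over all queries.
    """
    n_rel = len(ranks)
    return (sum(ranks) - n_rel * (n_rel + 1) / 2) / (n_total * n_rel)

# perfect retrieval: the 3 relevant images occupy ranks 1-3 of 100
perfect = normalized_average_rank([1, 2, 3], 100)     # -> 0.0
# worst case: the 3 relevant images come last
worst = normalized_average_rank([98, 99, 100], 100)
```

Subtracting n_rel*(n_rel + 1)/2 removes the best achievable rank sum, which is why a perfect retrieval scores exactly 0.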
For the Phos image data, H performs significantly better than the other color descriptors; HUE and C obtain slightly higher ANAR values than H but perform better than all the other descriptors. The RGB histogram gives the largest ANAR value, reflecting its lack of robustness to illumination variation. Note that the Hue histogram, SIFT, HSV-SIFT, E, CM, and DD50 perform worse than many other descriptors.
For the RawFooT image data in the context of image retrieval, W obtains the best result with ANAR = 0.0661. E, CM, and DD50 fail on this data, with ANAR values higher than 0.3. Hue-SIFT also performs well with ANAR = 0.0918, i.e., not as good as W but better than all the other descriptors.
(ii) Outdoor image data. For the THRI2015 image data, Hue-SIFT performs best with the lowest ANAR = 0.0475. In addition, CMI also performs better than the other descriptors, with ANAR = 0.12. Comparing descriptors within each type, the rg histogram performs best among the histogram-based color descriptors; after Hue-SIFT, OPP obtains the second-lowest ANAR value among the descriptors based on color space selection; and among the descriptors derived from information theory, CN performs best, with a slightly larger score obtained by the fuzzy color name (FCN).
Observing the results in Figure 4(a) for the ALOI image data, CMI performs best, with the minimum Euclidean distance between features under different illumination conditions, and C and W also perform well, with Euclidean distances below 0.01. CM has the worst performance, with extremely unstable Euclidean distances across illumination conditions; DD, OPP, HUE, and the Hue histogram also perform poorly, with Euclidean distances greater than 1. From the results in Figure 4(b), however, FCN, CN, and DD perform worst, with Euclidean distances greater than 8 under different illumination conditions, while CMI still performs best, showing the lowest variation of Euclidean distances; in addition, Hue-SIFT also shows good invariance, with Euclidean distances below 0.1. All in all, CMI, which combines powers of the pixel coordinates and their intensities in each color band, uniformly describes the shape and color distribution of images and is strongly robust to illumination variation. The color descriptors derived from information theory, whose limitation is the small vocabulary of color names, obtain high discriminative power at the expense of photometric invariance.
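The stability analysis in Figure 4 can be reproduced schematically by measuring the Euclidean distance between a descriptor computed on a reference image and on the same scene under other illuminants. The helper below is a sketch; `descriptor_fn` and `mean_color` are illustrative names, not from the paper, and the mean-color descriptor is used only to show the measurement on a feature that is deliberately not illumination invariant:

```python
import numpy as np

def illumination_stability(descriptor_fn, images):
    """Euclidean distances between the descriptor of a reference image
    (images[0]) and the descriptors of the same scene rendered under
    other illuminants (images[1:]). Small, flat distance curves
    indicate good illumination invariance of the descriptor.
    """
    ref = descriptor_fn(images[0])
    return [float(np.linalg.norm(descriptor_fn(im) - ref)) for im in images[1:]]

# illustrative descriptor: the raw mean color, which is NOT
# illumination invariant -- scaling the illuminant moves the feature
def mean_color(im):
    return im.astype(np.float64).mean(axis=(0, 1))

base = np.full((4, 4, 3), 10.0)          # synthetic scene
dists = illumination_stability(mean_color, [base, 2 * base, 4 * base])
```

A descriptor such as an rg-chromaticity feature would instead yield distances near zero on the same inputs, which is exactly the behavior the figure ranks.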
For unknown image scenes, we recommend giving priority to CMI for its robustness to illumination variation in the context of both image recognition and retrieval.

T-Test Measurement.
To further analyze the performance of the color descriptors, Tables 4 and 5 show the p values of paired t-tests comparing the best color descriptor on each image data with the remaining descriptors. Reading down each column of the table, p > 0.05 indicates that the difference between a color descriptor and the best one is not significant. In Table 4, we can observe that the differences of hue-hist, OPP-hist, and RGB-hist from the best descriptor W are significant (p < 0.01) for Mondrian, indicating that the performance of the histogram-based descriptors except rg-hist is poor, whereas the difference between the best descriptor C and hue-hist is not significant (p > 0.05) for fluorescent. In addition, there is a significant difference between CM and the best descriptors for all surface characteristics, reflecting that CM has limited robustness to illumination variation.
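The paired t-test behind Tables 4 and 5 can be reproduced with `scipy.stats.ttest_rel` applied to the per-image scores of two descriptors on the same test set. The scores below are hypothetical, for illustration only:

```python
from scipy import stats

# hypothetical per-image recognition outcomes (1 = recognized,
# 0 = missed) for the best descriptor and a competitor, evaluated
# on the same ten test images (hence a paired test)
best  = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
other = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]

t_stat, p_value = stats.ttest_rel(best, other)
# p < 0.05: the performance difference is statistically significant;
# p > 0.05: the two descriptors are not significantly different
significant = p_value < 0.05
```

Because the two score lists come from the same images, the paired test evaluates the per-image differences rather than treating the samples as independent, which is what makes the comparison in the tables valid.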

Discussion and Conclusion
In this paper, to compare the robustness of color descriptors to illumination variation across different scenes and surface characteristics, we have presented a performance evaluation of color descriptors in the context of image recognition and image retrieval. The recommended color descriptors for different surface characteristics and image data are summarized as follows:

(i) Compared with objects with metallic specularities, images containing nonnegligible dielectric specularities have a more negative effect on the performance of color descriptors, whereas for objects with fluorescent surfaces, the performance of most descriptors improves significantly. Consequently, color descriptors are well suited to images with fluorescent surfaces, whereas for images with specular or metal reflections, the highlight regions degrade the performance of pure color descriptors. Fusing the color information with shape, texture, and other information, or focusing on the highlighted regions, can be considered for further research.

(ii) The color moment invariants (CMI) provide the optimal balance between performance and dimensionality in most tests. Hence, CMI can be considered an alternative when the high dimensionality of color descriptors is an issue.

(iii) Color descriptors derived from physical reflectance models are more suitable for object recognition and image retrieval.

(iv) From the theoretical and experimental results, transforming images into an illumination-invariant color space, such as the rg color space, before computing color descriptors improves their robustness to illumination variation. This also provides an effective way to improve existing methods.

Color descriptors play an important role in feature extraction; future research may aim to build more distinctive color descriptors and to consider the relationship between image pixels.
Combining color descriptors with deep learning methods is also an important trend. Feature extraction in computer vision is not limited to a single feature; properly integrating color descriptors with other features will play a greater role in different application scenarios.