Evaluating the Performance of Polynomial Regression Method with Different Parameters during Color Characterization

The polynomial regression method is employed to calculate the relationship between device color space and CIE color space for color characterization, and the performance of different expressions with specific parameters is evaluated. First, the polynomial equation for color conversion is established and the computation of polynomial coefficients is analysed. Then different forms of polynomial equations are used to calculate the CIE color values of RGB and CMYK signals, and the corresponding color errors are compared. Finally, an optimal polynomial expression is obtained by analysing several related parameters of the color conversion, including the number of polynomial terms, the degree of the polynomial terms, the selection of CIE visual spaces, and the linearization.


Introduction
As color electronic devices often have different imaging characteristics, such as the imaging mechanism, color space, apparatus capability, and material peculiarity [1], color images look different in some detail when they are output by different devices. For example, the same image displayed on a monitor looks brighter and more colorful than the one printed on paper by a printer. Even the same image displayed on two different monitors sometimes produces different visual effects. Thus, for a color signal processing system with several color devices, the precision of color signal transmission between devices must be high enough to maintain the color consistency of images. In most color signal processing systems a device-connection space is used, such as the CIE color spaces CIEXYZ and CIELAB [2]. If the CIELAB space, for example, is chosen as the standard connection space, the color transmission process can be divided into two parts, device-to-CIELAB and CIELAB-to-device. Therefore, the color signal processing precision depends heavily on the color conversion algorithms between device colors and CIELAB colors.
There are many models which can be used for converting color signals, such as the Neugebauer model, neural networks, interpolation methods, and polynomial regression. The regression model is widely used in color signal processing systems [3], since it can produce high accuracy from fewer sample data and can be used in both the device-to-CIELAB and the CIELAB-to-device directions. However, some problems remain unresolved for this model: (1) the calculation precision of the model is highly dependent on the number and the degree of polynomial terms [4], so it is important to obtain the optimal polynomial expression for a specific signal processing task; (2) the selection of the device-connection space, such as CIEXYZ or CIELAB, may influence the signal processing precision [5,6], so it still needs to be analyzed and tested for polynomial regression models; (3) in some cases the RGB signals are linearized before being related to CIE colors, but in other cases linearization is not added. Hence, for both RGB and CMYK signals, the effect of linearization should be analyzed and tested [7,8], which may reveal whether or not it should be added for specific color devices and polynomials.
In this paper, the issues above are analyzed and tested in corresponding experiments. The different polynomial expressions, the different device-connection color spaces, and the influence of linearization are all tested on RGB and CMYK devices. Finally, for the specific RGB and CMYK color signal processing systems, the optimal parameters are obtained with detailed analysis.

Polynomial Regression Model for Color Signal Processing
Polynomial regression is a form of linear regression in which the relationship between the independent variable $x$ and the dependent variable $y$ is modeled as an $n$th-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of $x$ and the corresponding conditional mean of $y$, denoted by $E(y \mid x)$, and has been used to describe nonlinear phenomena such as the growth rate of tissues [9], the distribution of carbon isotopes in lake sediments [10], and the progression of disease epidemics [11].
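Within the regression model, if the dependent variable $y$ and multiple independent variables $x_1, x_2, \ldots, x_p$ have a linear relationship and there are $N$ groups of sample data

$$(y_i;\ x_{i1}, x_{i2}, \ldots, x_{ip}), \quad i = 1, 2, \ldots, N,$$

then the relationship between them can be described as [12]

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, N,$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are the coefficients to be determined and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$ are independent random variables.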
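The system of expressions above can be represented in matrix form as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where

$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}.$$

If $b_0, b_1, \ldots, b_p$ are the least squares estimates of the parameters $\boldsymbol{\beta}$, then the regression equation is

$$\hat{y} = b_0 + b_1 x_1 + \cdots + b_p x_p.$$

From the principle of least squares [13][14][15], the coefficients $b_0, b_1, \ldots, b_p$ should minimize the residual sum of squares between all measured values $y_i$ and regression values $\hat{y}_i$:

$$Q = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \rightarrow \min.$$

Using the least squares method, the coefficient vector $\mathbf{b}$ can be solved as

$$\mathbf{b} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}.$$

In addition, the polynomial regression method can also be used to describe nonlinear problems, in which the dependent variable is modeled as an $n$th-degree polynomial of the independent variables, so this model can be used in color signal processing systems. Taking the RGB monitor as an example, with the CIEXYZ device-connection space, the relationship between RGB and CIEXYZ can be expressed as

$$X = \sum_{i+j+k \le n} a_{x,ijk} R^i G^j B^k, \quad Y = \sum_{i+j+k \le n} a_{y,ijk} R^i G^j B^k, \quad Z = \sum_{i+j+k \le n} a_{z,ijk} R^i G^j B^k,$$

where $a$ denotes the polynomial coefficients, $n$ is the degree of the polynomial, and $i + j + k \le n$ with $i, j, k \ge 0$. The expression above can also be written in matrix form as

$$(X, Y, Z)^T = A_{3 \times m} P_m,$$

where $A_{3 \times m}$ is the coefficient matrix and $P_m$ is the matrix of polynomial terms, while $m$ represents the number of polynomial terms. Thus, when $n = 1$, $m = 4$, the first-degree polynomial matrix $P_4$ is

$$P_4 = [1, R, G, B]^T;$$

when $n = 2$, $m = 10$, the second-degree polynomial matrix $P_{10}$ is

$$P_{10} = [1, R, G, B, RG, GB, BR, R^2, G^2, B^2]^T;$$

when $n = 3$, $m = 20$, the third-degree polynomial matrix $P_{20}$ is

$$P_{20} = [1, R, G, B, RG, GB, BR, R^2, G^2, B^2, RGB, R^2G, R^2B, G^2R, G^2B, B^2R, B^2G, R^3, G^3, B^3]^T;$$

and when $n = 4$, $m = 35$, the fourth-degree polynomial matrix $P_{35}$ consists of the terms of $P_{20}$ together with the 15 fourth-degree terms

$$R^4, G^4, B^4, R^3G, R^3B, G^3R, G^3B, B^3R, B^3G, R^2G^2, G^2B^2, B^2R^2, R^2GB, RG^2B, RGB^2.$$

Other forms of polynomials are also used in color signal processing; the 6-, 8-, and 14-term forms $P_6$, $P_8$, and $P_{14}$ are tested in the experiments. Once the parameters $n$ and $m$ are fixed, the coefficient matrix $A_{3 \times m}$ can be solved by the least squares method from sample color data consisting of device colors and the corresponding CIEXYZ colors.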
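The construction of the full polynomial term matrices and the least squares solution can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names `poly_terms`, `fit_coefficients`, and `predict` are hypothetical, the term ordering is an arbitrary choice, and only the full polynomials with $i + j + k \le n$ (4, 10, 20, and 35 terms) are generated.

```python
import itertools
import numpy as np

def poly_terms(rgb, degree):
    """Return all monomials R^i G^j B^k with i + j + k <= degree.

    rgb has shape (N, 3); the result has shape (N, m) with
    m = 4, 10, 20, 35 for degree = 1, 2, 3, 4.
    """
    rgb = np.asarray(rgb, dtype=float)
    exponents = [e for e in itertools.product(range(degree + 1), repeat=3)
                 if sum(e) <= degree]
    # Order by total degree so that the constant term comes first.
    exponents.sort(key=lambda e: (sum(e), e))
    columns = [np.prod(rgb ** np.array(e), axis=1) for e in exponents]
    return np.stack(columns, axis=1)

def fit_coefficients(rgb, target, degree):
    """Least squares estimate of the 3 x m matrix A with target ~= P @ A.T."""
    terms = poly_terms(rgb, degree)
    solution, *_ = np.linalg.lstsq(terms, np.asarray(target, dtype=float),
                                   rcond=None)
    return solution.T

def predict(rgb, coefficients, degree):
    """Apply a fitted coefficient matrix to new device colors."""
    return poly_terms(rgb, degree) @ coefficients.T

if __name__ == "__main__":
    for degree in (1, 2, 3, 4):
        print(degree, poly_terms(np.zeros((1, 3)), degree).shape[1])
```

Scaling the device values to the range [0, 1] before building the terms is a common precaution, since high powers of 8-bit values make the least squares problem poorly conditioned.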

Study on the Key Parameters during Color Signal Processing
To determine the key parameters within the polynomial regression model, a color signal processing system with several color devices is introduced in the experiment. As the additive primary colors are RGB and the subtractive primary colors are CMYK, an RGB monitor and a CMYK printer are chosen as the typical testing color devices. Within the color signal processing system, the device-connection space is CIEXYZ or CIELAB, so the color processing is mainly based on four color spaces.
For the purpose of obtaining the relationship between the device colors and the device-connection colors, the training sample data should be gathered in advance. For the IBM monitor, the three primary channels Red, Green, and Blue are each divided evenly into 9 levels of $R$, $G$, and $B$.

In general, the regression errors for the training sample data decrease as the number of polynomial terms or the degree of the polynomial increases. However, for color data over the entire range, the regression errors will increase when the number of polynomial terms exceeds a certain value. Therefore, it is important to find the polynomial expression that produces the least error, as determined by the number of polynomial terms and the degree of the polynomial. In the experiment, to evaluate the different polynomial expressions employed in color signal processing, the error is computed with the CIE76 color difference formula [16]:

$$\Delta E^*_{ab} = \sqrt{(L^*_1 - L^*_2)^2 + (a^*_1 - a^*_2)^2 + (b^*_1 - b^*_2)^2},$$

where $(L^*_1, a^*_1, b^*_1)$ is the regression color and $(L^*_2, a^*_2, b^*_2)$ is the measured color.
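The CIE76 color difference is the Euclidean distance between two CIELAB colors, so it takes only a few lines to compute. The sketch below uses a hypothetical function name, `delta_e76`, and made-up example colors.

```python
import math

def delta_e76(lab_regression, lab_measured):
    """CIE76 color difference between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2
                         for p, q in zip(lab_regression, lab_measured)))

if __name__ == "__main__":
    # Illustrative values only: the colors differ by 3 in a* and 4 in b*.
    print(delta_e76((50.0, 0.0, 0.0), (50.0, 3.0, 4.0)))
```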

Number and the Degree of Polynomial Terms.
To find the appropriate polynomial expressions, the RGB monitor is tested and the signal processing errors of different polynomial expressions are compared. First, the training sample data are used to obtain the polynomial coefficients between the RGB and CIELAB signals. Then, for all the RGB colors, the CIELAB values are simulated using the obtained coefficients. Next, to analyze the regression precision, the measured CIELAB values from the training data are used to compute the errors of the different polynomial expressions. Finally, the errors are expressed as color differences between the measured and simulated CIELAB values, as listed in Table 1.
From these results, it can be seen that the regression precision clearly improves as the number of polynomial terms increases, and once the number of terms reaches a certain value the mean error becomes small enough. For example, for the first-degree polynomial $P_4$, the average error is 10.9106 ΔE, which exceeds the reproduction error threshold [17,18], and the maximal error is 37.5866 ΔE, which is visually unacceptable. Additionally, its standard deviation and variance are 5.4642 ΔE and 29.8576 ΔE², respectively, which indicates that the distribution of errors is unsatisfactory.
In general, the regression precision can be evaluated mainly from the average error of the different polynomials, shown in Figure 1. The regression errors for polynomials $P_4$, $P_6$, $P_8$, and $P_{10}$ all exceed 5 ΔE units, while for the polynomials $P_{14}$, $P_{20}$, and $P_{35}$ the color differences are all acceptable. The figure shows that polynomial $P_{35}$ is the most accurate, but its precision is very close to that of polynomial $P_{20}$. In addition, too many polynomial terms may increase the difficulty of solving for the coefficients [14], so the polynomial $P_{20}$ is the most suitable for RGB color signal processing.
Using the regression coefficients from the training data, the relationship between the RGB and CIELAB signals can be described. To verify the precision of the different polynomials over the whole range of RGB signals, the testing sample data should be used. For the testing data, which lie outside the training data, the simulated CIELAB values corresponding to the RGB inputs can be computed using the fitted relationship. The corresponding results for the CMYK testing data are given in Table 3.
It can be seen that, for the CMYK devices, most of the polynomials perform well, with an average error below 3 ΔE. This is mainly because the color-rendering properties of CMYK printers surpass those of RGB monitors, for example in the regularity of the color gamut and in color consistency [1,19]. On the whole, the fourth-degree polynomial $P_{35}$, which has the largest number of terms, has the smallest color difference, and its error distribution is also ideal. The performance of the third-degree polynomial $P_{20}$ is close to that of $P_{35}$, which indicates that the preferred polynomials for CMYK and CIELAB signal processing are $P_{35}$ or $P_{20}$.
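The summary statistics used to judge each polynomial (average, maximum, standard deviation, and variance of the color differences) can be computed as in the sketch below. The function name `error_summary` is hypothetical, the input values are made up for illustration, and the use of population rather than sample statistics is an assumption, since the text does not say which was used.

```python
import statistics

def error_summary(color_differences):
    """Summarize a list of per-patch color differences."""
    return {
        "mean": statistics.fmean(color_differences),
        "max": max(color_differences),
        # Population statistics are assumed here.
        "std": statistics.pstdev(color_differences),
        "variance": statistics.pvariance(color_differences),
    }

if __name__ == "__main__":
    # Illustrative values only, not taken from the experiment.
    print(error_summary([1.0, 2.0, 3.0, 4.0]))
```

Note that the variance is the square of the standard deviation, which matches the reported pair 5.4642 and 29.8576.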
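The train-then-verify procedure can be sketched end to end as follows. Because the measured data are not reproduced here, the example substitutes a hypothetical device model, `toy_device` (a per-channel gamma curve followed by a linear mixing matrix with made-up values), and reports the mean Euclidean error in that toy output space rather than ΔE. The grid levels are those listed for the monitor in the text; all function names are illustrative.

```python
import itertools
import numpy as np

def design_matrix(x, degree):
    """All monomials of the three channels with total degree <= degree."""
    exps = [e for e in itertools.product(range(degree + 1), repeat=3)
            if sum(e) <= degree]
    return np.stack([np.prod(x ** np.array(e), axis=1) for e in exps], axis=1)

def grid(levels):
    """Full three-channel grid of the given levels, scaled to [0, 1]."""
    points = list(itertools.product(levels, repeat=3))
    return np.array(points, dtype=float) / 255.0

def toy_device(rgb):
    """Hypothetical device response standing in for measured colors."""
    mix = np.array([[0.41, 0.36, 0.18],
                    [0.21, 0.72, 0.07],
                    [0.02, 0.12, 0.95]])
    return (rgb ** 2.2) @ mix.T

TRAIN = grid([0, 32, 64, 96, 128, 160, 192, 224, 255])
TEST = grid([16, 48, 80, 112, 144, 176, 208, 240])

def train_test_error(degree):
    """Fit on the training grid, then report mean error on both grids."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(TRAIN, degree),
                                 toy_device(TRAIN), rcond=None)

    def mean_error(x):
        residual = design_matrix(x, degree) @ coeffs - toy_device(x)
        return float(np.linalg.norm(residual, axis=1).mean())

    return mean_error(TRAIN), mean_error(TEST)

if __name__ == "__main__":
    for degree in (1, 2, 3, 4):
        print(degree, train_test_error(degree))
```

Evaluating on patches that were not used for fitting is what exposes a polynomial with too many terms: its training error keeps falling while its testing error stops improving.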

Selection of CIE Color Spaces during Color Signal Processing.
During signal processing for different color devices, the choice between the two CIE color spaces, CIEXYZ and CIELAB, often leads to different color errors. Hence, it is important to find the appropriate CIE color space for the specified color devices and polynomials.
In the experiment, the CIEXYZ and CIELAB color spaces are each tested on RGB monitors and on CMYK printers to compare their color errors. For the IBM monitor, when CIEXYZ is selected as the device-connection space, the errors of the different polynomials are shown in Table 4. Together with the color errors listed in Table 2, where CIELAB is the device-connection space, the mean color differences for the two CIE spaces are compared in Figure 2.
As the figure shows, for the RGB monitors, the polynomials perform somewhat differently with the CIEXYZ and CIELAB color spaces. For the low-degree polynomials, the color error of signal processing with the CIEXYZ space is greater than with the CIELAB space, and as the number of polynomial terms increases, the influence of the device-connection color space becomes very small. So when polynomials with fewer than 10 terms are used in RGB signal processing, the CIELAB color space is recommended as the device-connection space, while for polynomials with between 10 and 20 terms, the CIEXYZ space is suggested.
To test the influence of the CIE color space for CMYK devices, the errors with the CIEXYZ and CIELAB color spaces are also compared on the Epson 9880 printer. The CMYK signal processing errors with the CIELAB space are listed in Table 3; for the polynomials of 8 to 35 terms, the color errors with the CIEXYZ space are recorded in Table 5.
It can be seen that, for the second-degree polynomials $P_{10}$ and $P_{14}$, the third-degree polynomial $P_{20}$, and the fourth-degree polynomial $P_{35}$, all the color errors are acceptable. Taking account of precision and computing efficiency, the third-degree polynomial $P_{20}$ is the most suitable model for CMYK devices using the CIEXYZ color space. In addition, as for RGB signal processing with different device-connection spaces, the CMYK signal processing results using the CIELAB and CIEXYZ spaces are compared in Figure 3.
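When CIEXYZ is the device-connection space, the regression predicts tristimulus values, and one plausible way to express the error as a color difference is to convert both the predicted and the measured XYZ values to CIELAB first. The sketch below implements the standard CIE XYZ-to-CIELAB conversion; the function name `xyz_to_lab` and the D65 reference white are assumptions, since the text does not state which white point was used.

```python
def xyz_to_lab(xyz, white=(95.047, 100.0, 108.883)):
    """Convert CIEXYZ to CIELAB relative to a reference white.

    The default white is D65 for the 2-degree observer, an assumption.
    """
    delta = 6.0 / 29.0

    def f(t):
        # Cube root above the threshold, linear segment below it.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = (f(value / ref) for value, ref in zip(xyz, white))
    lightness = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return lightness, a_star, b_star

if __name__ == "__main__":
    print(xyz_to_lab((95.047, 100.0, 108.883)))  # the reference white itself
```

Because this conversion is nonlinear, a fit that minimizes squared error in XYZ does not necessarily minimize the color difference in CIELAB, which is consistent with the two spaces giving different results.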
It can be seen that, within CMYK signal processing based on the polynomial regression method, the precision of color conversion using the CIELAB color space is higher than that using the CIEXYZ space for a majority of the second-, third-, and fourth-degree polynomials. Therefore, in CMYK signal processing, the CIELAB color space is preferred as the device-connection space for the polynomial regression model.

Linearization.
In some color signal processing systems a linearization step is applied [20], in which the RGB signals are first converted into lightness signals before the color conversion process. In the experiment, the linearization is described as

$$R_L = f_R(R), \quad G_L = f_G(G), \quad B_L = f_B(B),$$

where $R_L$, $G_L$, and $B_L$ are the lightness signals and $f_{R/G/B}$ are the linearization functions of the channel value $C$, in which $C$ stands for one of the colors $R$, $G$, or $B$ and $a_i$ ($i = 0, 1, 2, 3$) are the coefficients. Within RGB color signal processing, the device colors RGB are first linearized into the lightness signals $(R_L, G_L, B_L)$, then the relationship between $(R_L, G_L, B_L)$ and CIELAB is obtained by the polynomial regression model, and finally the color error is calculated using the measured colors in the testing data. For the IBM monitor in the experiment, the color differences of the different linearized polynomials are shown in Table 6. To test the influence of linearization on the color signal processing precision, the two groups of color errors in Tables 2 and 6 are compared. As shown in Figure 4, the errors are very close for the two processes, so it is concluded that for RGB devices linearization has little impact on the signal processing precision, especially for polynomials of degree three or higher.
Similarly, to test linearization for CMYK signal processing, the color errors for the testing samples of the Epson printer are recorded in Table 7, and the comparison with the results without linearization in Table 3 is shown in Figure 5.
It can be seen that, for the CMYK printers, the precision of signal processing with linearization is lower than that without linearization for most polynomial models. Since the linearization process does not improve the CMYK signal processing precision, it is not advisable for CMYK device calibration.
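The linearization step can be illustrated with a short sketch. The exact form of the linearization functions is not recoverable from the text; the code below assumes a cubic polynomial in the channel value, $a_0 + a_1 C + a_2 C^2 + a_3 C^3$, simply because four coefficients $a_0, \ldots, a_3$ are mentioned. The function names and the coefficient values are hypothetical.

```python
def linearize(value, coeffs):
    """Evaluate a0 + a1*C + a2*C^2 + a3*C^3 with Horner's rule.

    The cubic form is an assumption made for illustration.
    """
    a0, a1, a2, a3 = coeffs
    return a0 + value * (a1 + value * (a2 + value * a3))

def linearize_rgb(rgb, coeffs_per_channel):
    """Apply a separate linearization function to each of R, G, and B."""
    return tuple(linearize(channel, coeffs)
                 for channel, coeffs in zip(rgb, coeffs_per_channel))

if __name__ == "__main__":
    # Made-up coefficients: identity for R and G, a pure square law for B.
    coeffs = [(0.0, 1.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
    print(linearize_rgb((0.5, 0.25, 0.5), coeffs))
```

The linearized triple would then replace the raw device values as the input to the polynomial regression.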

Conclusions
In this paper, the polynomial regression model is used for RGB and CMYK color signal processing. To improve the color signal processing precision, the number of terms and the degree of the polynomials are tested. By comparing the color errors of the color signal processing, the appropriate polynomial expressions for RGB and CMYK color devices are obtained. In addition, the choice of device-connection color space and the use of linearization are tested.
In general, the findings on RGB and CMYK color signal processing with the polynomial regression method can be summarized as follows.
(1) For RGB and CMYK color signal processing with polynomial regression, it is advised to use the third-degree polynomial $P_{20}$ or the fourth-degree polynomial $P_{35}$. Taking into account the coefficient solving process and the color errors in the experiment, the third-degree $P_{20}$ is the most appropriate model.
(2) When a CIE color space is used as the device-connection space, for RGB devices the signal processing precision is higher with the CIEXYZ space than with the CIELAB space for polynomials of 10 to 20 terms, while in other cases CIELAB is more precise. For CMYK devices, in most cases the CIELAB color space performs better than the CIEXYZ space.
(3) When linearization is added to the color signal processing, the precision improves somewhat for some of the polynomials on RGB devices, while for CMYK devices the addition of linearization reduces the signal processing accuracy, so linearization is not recommended for CMYK signal processing.
For the IBM monitor, each of the $R$, $G$, and $B$ values ranges within [0 32 64 96 128 160 192 224 255]. When all these $9^3 = 729$ patches are displayed on the monitor, the corresponding CIEXYZ and CIELAB colors are measured with the X-Rite DTP94 spectrophotometer. These RGB values and the corresponding CIEXYZ or CIELAB colors form the training sample data. To verify the accuracy of the polynomial regression model, testing sample data should also be collected. Similar to the training sample data, the testing data consist of $7^3 = 343$ color patches with each channel ranging within [16 48 80 112 144 176 208 240]. For the CMYK Epson 9880 printer, each channel is divided into 11 levels with an interval of 10, so every color channel ranges within [0 10 20 30 40 50 60 70 80 90 100]. Because the subtractive primary colors are Cyan, Magenta, and Yellow (CMY) and Black can be seen as the replacement of a certain amount of CMY, in the experiment the device color CMYK is treated as CMY. Thus, when all the $11^3 = 1331$ CMY color patches are printed, the corresponding CIEXYZ and CIELAB colors are measured with the X-Rite 528 spectrophotometer, and these CMY values and the corresponding CIE colors form the training sample data. In addition, the testing data consist of $6^3 = 216$ patches with each channel ranging within [5 25 45 65 85 95].
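The patch sets described above are full three-channel grids, so their sizes follow directly from the number of levels per channel. A minimal sketch, with hypothetical names, that generates the grids whose level lists and patch counts are both stated in the text:

```python
from itertools import product

RGB_TRAIN_LEVELS = [0, 32, 64, 96, 128, 160, 192, 224, 255]
CMY_TRAIN_LEVELS = list(range(0, 101, 10))  # 0, 10, ..., 100
CMY_TEST_LEVELS = [5, 25, 45, 65, 85, 95]

def patch_grid(levels):
    """Every combination of the given levels over three channels."""
    return list(product(levels, repeat=3))

if __name__ == "__main__":
    print(len(patch_grid(RGB_TRAIN_LEVELS)))  # 9^3 patches
    print(len(patch_grid(CMY_TRAIN_LEVELS)))  # 11^3 patches
    print(len(patch_grid(CMY_TEST_LEVELS)))   # 6^3 patches
```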

Figure 1: Different polynomials' average error of the RGB training samples.

Figure 2: Comparison of the RGB monitor's color errors between CIEXYZ and CIELAB spaces.

Figure 3: Comparison of the CMYK printer's color errors between CIEXYZ and CIELAB spaces.

Figure 4: The RGB monitor's color error comparison with linearization.

Figure 5: The CMYK printer's color error comparison with linearization.

Table 1: The color differences of RGB training sample data for different polynomials.

Table 2: The color difference of RGB testing data for different polynomials.

As shown in Table 2, for the RGB and CIELAB color signals, the signal processing precision of the different polynomials for the testing data is similar to that for the training data shown in Table 1, and the acceptable forms of polynomials are $P_{14}$, $P_{20}$, and $P_{35}$. To test the performance of the different polynomials for CMYK signals, the training sample data of the Epson 9880 printer are used to obtain the relationship between CMYK and CIELAB. Because the errors are too large for polynomials with few terms such as $P_4$ and $P_6$, only $P_8$, $P_{10}$, $P_{14}$, $P_{20}$, and $P_{35}$ are tested for CMYK signals. With the polynomial coefficients solved from the training data, the color differences for the 216 testing patches are shown in Table 3.

Table 3: The color difference of CMYK testing data for different polynomials.

Table 4: The color difference of RGB monitor for different polynomials using CIEXYZ space.

Table 5: The color differences of CMYK printer for different polynomials using CIEXYZ space.

Table 6: The RGB color difference of different polynomials with linearization applied.

Table 7: The CMYK color difference of different polynomials with linearization applied.