A Systematic Approach to Optimizing h Value for Fuzzy Linear Regression with Symmetric Triangular Fuzzy Numbers

A systematic approach is proposed to optimize the h value for fuzzy linear regression (FLR) analysis using minimum fuzziness criteria with symmetric triangular fuzzy numbers (TFNs). Firstly, a new concept of credibility is defined to evaluate the performance of FLR models with different h values when a set of sample data pairs is given. Secondly, based on this concept of credibility, a programming model is formulated to optimize the value of h. Finally, both a numerical study and a real application show that the proposed approach is effective and efficient; that is, the optimal value of h can be determined definitely with respect to a given set of sample data pairs.


Introduction
Statistical regression analysis and fuzzy regression analysis are two types of methods, based on different philosophies, for assessing the functional relationship between dependent and independent variables and for determining the best-fit model to describe that relationship by exploiting the knowledge in the given input-output data pairs. In statistical regression analysis, deviations between the observed values and the estimates are assumed to be random errors following a probability distribution. In fuzzy regression analysis, by contrast, the deviations are attributed to the imprecision of the observed values and/or the indefiniteness of the model structure. Tanaka et al. [1] first proposed fuzzy linear regression (FLR) analysis using the fuzzy functions defined by Zadeh's extension principle [2], in which the observed values can differ from the estimated values to a certain degree of belief [3]. Thus, the uncertainty in this type of regression model is fuzziness, not randomness; the disturbance is incorporated into the fuzzy coefficients, and the final objective is to fit the fuzzy coefficients to the available sample data pairs.
According to [3], the existing FLR methods can be roughly classified into two categories by criterion function: FLR methods using minimum fuzziness criteria and FLR methods using fuzzy least-squares criteria. In the first category, the FLR model is built by minimizing the system vagueness. The first FLR method in [1] was extended to other types of fuzzy coefficients, including general LP-type fuzzy coefficients [4], exponential fuzzy coefficients [5], triangular fuzzy coefficients [6,7], and trapezoidal fuzzy coefficients [9]. Chen et al. used symmetric triangular fuzzy numbers to determine the parameters of rock shear strength through least absolute linear regression; their analysis of a practical engineering computation and a comparison with other methods showed that the method is reasonable [8]. Kheirfam and Verdegay extended the dual simplex method to a type of fuzzy linear programming problem involving symmetric trapezoidal fuzzy numbers; their results lead to a solution for fuzzy linear programming problems and to the optimal value function with fuzzy coefficients [10]. Interval regression, in which a model with interval coefficients is assumed, is regarded as the simplest version of fuzzy regression analysis [11][12][13]. Some fuzzy nonlinear regression approaches have also been proposed [3,7]; one study indicated that the prediction performance of a nonlinear multiple regression model is higher than that of a fuzzy inference system model [14]. In the second category, the FLR model is built by minimizing the fuzzy distance between the predicted outputs and the observed outputs [15][16][17]. Celmiņš [15] dealt with quadratic membership functions based on least squares fitting with indicators of discord, data spread dilator, and so forth, and Diamond [16] proposed models for least squares fitting with crisp input and fuzzy output and with fuzzy input and output, where the distance of fuzzy numbers is defined to measure the best fit of the models. Chang [17] formulated a fuzzy least-squares regression model by defining the weighted fuzzy arithmetic (WFA). Lv et al. proposed a novel least squares support vector machine (LSSVM) based ensemble learning paradigm to predict the NOx emission of a coal-fired boiler using real operation data; the results show that the new soft FCM-LSSVM-PLS ensemble method can predict NOx emission accurately [18].
Regarding the first category of FLR methods, the value of ℎ determines the range of the possibility distribution of the fuzzy parameters [1,[3][4][5][6][7][8]19], so it is important to select a suitable value for ℎ in FLR analysis. Moskowitz and Kim [20] studied the relationship among the ℎ value, the shape of the membership function, and the spreads of the fuzzy parameters in FLR with symmetric fuzzy numbers; they also developed a general approach to assess proper ℎ parameter values. Their studies showed that the system fuzziness increases as the ℎ value increases. Tanaka and Watada [19] suggested that the selection of the ℎ value should be based on the sufficiency of the collected dataset: when the dataset is sufficiently large, ℎ = 0 should be used, and ℎ should be increased as the volume of collected data decreases. However, neither study suggested how to obtain an optimal ℎ value for the FLR model when a sample dataset is given. In practical situations, the value of ℎ is usually preselected subjectively by the decision makers (DMs) [17,18].
In fact, if a larger value is given to ℎ, the FLR using triangular fuzzy coefficients tends to yield unnecessarily large fuzziness and overly large spreads in the estimated parameters, which makes the fuzzy predictive interval too fuzzy to have an operational definition or interpretation [8]. On the other hand, if a smaller ℎ value is used, the FLR tends to yield a very narrow fuzzy predictive interval and very low membership degrees, so the reliability of the FLR model is doubtful. Therefore, it is necessary to develop a systematic approach to help DMs determine the optimal ℎ value for the FLR using minimum fuzziness criteria. To tackle this problem, following the suggestion of Tanaka and Watada [19], this paper introduces a new concept of credibility to measure the performance of FLR models with different ℎ values, based on which a systematic approach is formulated to optimize the ℎ value for FLR using minimum fuzziness criteria.
The rest of the paper is organized as follows. In Section 2, Tanaka's FLR method is described. In Section 3, the concept of credibility is proposed to measure the performance of FLR models with different ℎ values. In Section 4, a systematic approach is formulated to optimize the ℎ value for FLR with symmetric triangular fuzzy numbers (TFNs). In Section 5, a numerical example is used to show how the optimal value can be determined using the proposed approach. In Section 6, a real application is conducted to illustrate the effectiveness of the proposed approach. The conclusions are drawn in Section 7.

FLR with Symmetric TFNs
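In an FLR analysis, the explained variable is assumed to be a linear combination of the explanatory variables. This relationship is obtained from a sample of $M$ observations $\{(y_1, \mathbf{x}_1), \ldots, (y_i, \mathbf{x}_i), \ldots, (y_M, \mathbf{x}_M)\}$, where $y_i$ is the $i$th crisp observation and $\mathbf{x}_i = \{x_{i0}, x_{i1}, \ldots, x_{ij}, \ldots, x_{iN}\}$ is the $i$th crisp input vector. Moreover, $x_{i0} = 1$ for all $i$, and $x_{ij}$ is the observed value of the $j$th variable in the $i$th case of the sample. In particular, the fuzzy linear function to be estimated is

$$\tilde{y}_i = f(\mathbf{x}_i) = \tilde{A}_0 x_{i0} + \tilde{A}_1 x_{i1} + \cdots + \tilde{A}_N x_{iN}, \tag{1}$$

where $\tilde{y}_i$ is the fuzzy estimation of $y_i$, and $\tilde{A}_j$, $j = 0, 1, \ldots, N$, are fuzzy coefficients in the form of symmetric TFNs, each uniquely defined by $\tilde{A}_j = (a_j^c, a_j^s)$. Here, $a_j^s$ is the spread value and $a_j^c$ is the centre value of $\tilde{A}_j$ (see Figure 1).

The goal in fuzzy linear regression is to determine $f(\mathbf{x}_i)$ by minimizing the system vagueness subject to the following inclusion condition [1,19]:

$$y_i \in [f(\mathbf{x}_i)]_h, \quad i = 1, \ldots, M, \tag{2}$$

where $[f(\mathbf{x}_i)]_h$ is the $h$-level set of the fuzzy output of the linear fuzzy model $f(\mathbf{x}_i)$ corresponding to the input vector $\mathbf{x}_i$. Since the $h$-level sets of fuzzy numbers are intervals, (2) can be written as follows by using interval arithmetic:

$$\sum_{j=0}^{N} a_j^c x_{ij} - (1-h)\sum_{j=0}^{N} a_j^s |x_{ij}| \le y_i \le \sum_{j=0}^{N} a_j^c x_{ij} + (1-h)\sum_{j=0}^{N} a_j^s |x_{ij}|. \tag{3}$$

The system fuzziness in (1), $\Delta$, can be given as

$$\Delta = \sum_{i=1}^{M} \Delta\tilde{y}_i, \tag{4}$$

in which $\Delta\tilde{y}_i$ is the fuzziness associated with $\tilde{y}_i$ and can be given as

$$\Delta\tilde{y}_i = \sum_{j=0}^{N} a_j^s |x_{ij}|. \tag{5}$$

Hence, according to [1,19], $\tilde{A}_j$, $j = 0, 1, \ldots, N$, in the form of symmetric TFNs, can be determined by solving the following programming model:

$$\min \; \Delta = \sum_{i=1}^{M} \sum_{j=0}^{N} a_j^s |x_{ij}| \tag{6a}$$

subject to

$$\sum_{j=0}^{N} a_j^c x_{ij} + (1-h)\sum_{j=0}^{N} a_j^s |x_{ij}| \ge y_i, \quad i = 1, \ldots, M, \tag{6b}$$

$$\sum_{j=0}^{N} a_j^c x_{ij} - (1-h)\sum_{j=0}^{N} a_j^s |x_{ij}| \le y_i, \quad i = 1, \ldots, M, \tag{6c}$$

$$a_j^s \ge 0, \quad j = 0, 1, \ldots, N, \tag{6d}$$

in which constraints (6b) and (6c) correspond to the inclusion condition in (2), and constraint (6d) guarantees that the spread values of $\tilde{A}_j$, $j = 0, 1, \ldots, N$, are nonnegative.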
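As a minimal sketch of the quantities used in this section, the following Python code evaluates the membership function of a symmetric TFN, the ℎ-level interval of the fuzzy output, the inclusion condition in (2), and the system fuzziness for a given set of candidate coefficients. It does not solve the programming model in (6a) to (6d) itself; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def membership(y: float, centre: float, spread: float) -> float:
    """Membership degree of a crisp value y in the symmetric TFN (centre, spread)."""
    if spread <= 0.0:
        # Degenerate (crisp) number: full membership only at the centre.
        return 1.0 if y == centre else 0.0
    return max(0.0, 1.0 - abs(y - centre) / spread)


def fuzzy_output(x: Sequence[float], centres: Sequence[float],
                 spreads: Sequence[float]) -> Tuple[float, float]:
    """Centre and spread of the fuzzy output for one input vector x (x[0] is 1)."""
    centre = sum(a * v for a, v in zip(centres, x))
    spread = sum(a * abs(v) for a, v in zip(spreads, x))
    return centre, spread


def h_level_interval(centre: float, spread: float, h: float) -> Tuple[float, float]:
    """h-level set of a symmetric TFN, which is a closed interval."""
    half_width = (1.0 - h) * spread
    return centre - half_width, centre + half_width


def satisfies_inclusion(X, y, centres, spreads, h: float, tol: float = 1e-9) -> bool:
    """Check that every observation lies inside the h-level set of its estimate."""
    for x_i, y_i in zip(X, y):
        low, high = h_level_interval(*fuzzy_output(x_i, centres, spreads), h)
        if not (low - tol <= y_i <= high + tol):
            return False
    return True


def system_fuzziness(X, spreads) -> float:
    """Sum of the output spreads over all sample inputs (the minimized quantity)."""
    return sum(sum(a * abs(v) for a, v in zip(spreads, x_i)) for x_i in X)


# Hypothetical toy data: one explanatory variable plus the constant term.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.1, 3.9, 6.2]
centres = [0.0, 2.0]   # assumed candidate centre values
spreads = [0.5, 0.0]   # assumed candidate spread values

for h in (0.0, 0.5, 0.7):
    print(h, satisfies_inclusion(X, y, centres, spreads, h))
print("system fuzziness:", system_fuzziness(X, spreads))
```

With these toy coefficients the inclusion condition holds at ℎ = 0 and ℎ = 0.5 but fails at ℎ = 0.7, which illustrates why larger ℎ values force larger spreads.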

Credibility Measure for the FLR Model
As we can see from (6a) to (6d), the value of ℎ determines the range of the possibility distribution of the fuzzy parameters, so it is important to select a suitable ℎ value for FLR. To do this, the first problem to be solved is how to evaluate the FLR models with different ℎ values in (1). Therefore, a new concept of credibility measurement is introduced in this section, based on which the FLR models associated with different ℎ values can be assessed. Assume that $h_1$ and $h_2$ are any two values for ℎ with $0 \le h_1 < 1$ and $0 \le h_2 < 1$. We denote by $\tilde{A}_j^1$ and $\tilde{A}_j^2$, $j = 0, 1, \ldots, N$, the two sets of symmetric TFNs obtained with respect to $h_1$ and $h_2$, respectively, and by $\tilde{y}_i^1$ and $\tilde{y}_i^2$ the corresponding estimations of $y_i$ from (1). From (1), $\tilde{y}_i^1$ and $\tilde{y}_i^2$ are calculated as

$$\tilde{y}_i^1 = \sum_{j=0}^{N} \tilde{A}_j^1 x_{ij}, \qquad \tilde{y}_i^2 = \sum_{j=0}^{N} \tilde{A}_j^2 x_{ij}. \tag{7}$$

Now, we are interested in which of $\tilde{y}_i^1$ and $\tilde{y}_i^2$ is the better estimation of $y_i$; that is, which of $h_1$ and $h_2$ is better for building an FLR model when the sample set of data pairs is given. To deal with this problem, in our opinion, two factors should be taken into account: the estimating fuzziness $\Delta\tilde{y}_i$ and the membership degree $\mu_{\tilde{y}_i}(y_i)$. In general, the smaller $\Delta\tilde{y}_i$ is and the higher $\mu_{\tilde{y}_i}(y_i)$ is, the better $\tilde{y}_i$ performs in representing $y_i$. To illustrate this point of view, two special cases are considered first.
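(1) If $\mu_{\tilde{y}_i^1}(y_i)$ is equal to $\mu_{\tilde{y}_i^2}(y_i)$, it is clear that, of $\tilde{y}_i^1$ and $\tilde{y}_i^2$, the one with the smaller fuzziness performs better than the one with the larger fuzziness in representing $y_i$. As shown in Figure 2, $\mu_{\tilde{y}_i^1}(y_i) = \mu_{\tilde{y}_i^2}(y_i)$ and $\Delta\tilde{y}_i^1 < \Delta\tilde{y}_i^2$; therefore, $\tilde{y}_i^1$ is better than $\tilde{y}_i^2$ as the estimation of $y_i$.

(2) If $\Delta\tilde{y}_i^1$ is equal to $\Delta\tilde{y}_i^2$, it is doubtless that, of $\tilde{y}_i^1$ and $\tilde{y}_i^2$, the one with the higher membership degree performs better than the one with the lower membership degree in representing $y_i$. As shown in Figure 3, $\Delta\tilde{y}_i^1 = \Delta\tilde{y}_i^2$ and $\mu_{\tilde{y}_i^1}(y_i) > \mu_{\tilde{y}_i^2}(y_i)$; therefore, $\tilde{y}_i^1$ is better than $\tilde{y}_i^2$ as the estimation of $y_i$.

In the general case, where the two estimates differ in both fuzziness and membership degree, the comparison is less direct (see Figure 4). To deal with this problem, a new concept of credibility measure is introduced in this paper. The credibility of $\tilde{y}_i$ in representing $y_i$ is defined in (9). The higher the credibility is, the better $\tilde{y}_i$ performs in representing $y_i$. Obviously, the scenarios of cases (1) and (2) are two special cases of (9). As a result, based on (9), the total credibility of all sample data can be obtained to assess the performance of the FLR model in (1). The higher the total credibility is, the better the performance of the FLR model will be. This helps us select the optimal value of ℎ for an FLR model.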
With respect to $h_1$, the membership degrees $(\mu_{\tilde{y}_i}(y_i))_{h_1}$ of all sample data are calculated as in (11), in which the auxiliary term is given by (12). According to (8), the estimating credibility for all sample data with regard to $h_1$ can then be expressed, and, from (9), the total credibility of the FLR model with respect to $h_1$ can be obtained. From (12), the corresponding relationship with regard to $h_2$ follows, so that, according to (11), $(\mu_{\tilde{y}_i}(y_i))_{h_2}$ can be calculated. Hence, according to (8), (10), and (16), the estimating credibility for all sample data pairs with regard to $h_2$ can be expressed, and consequently the estimating credibility of the FLR model in (1) with regard to $h_2$ can be obtained.

For simplicity, we set $h_1 = 0$ and $h_2 = h$, so that (11), (16), (17), (18), and (20) can be rewritten in terms of ℎ alone. Based on (26), the optimal ℎ value for the FLR model in (1) can be obtained by maximizing the estimating credibility; that is, the optimal value of ℎ can be obtained by solving the programming model in (27a) and (27b). The programming model in (27a) to (27b) is obviously quadratic, and many kinds of optimization algorithms, such as the gradient descent method, can be used to solve this nonlinear programming problem.
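The dependence on ℎ can be sketched numerically. In the programming model (6a) to (6d), taking (1 − ℎ) times the spreads as new variables leaves the constraints free of ℎ, so the centre values obtained at ℎ = 0 remain optimal and the spreads scale by 1/(1 − ℎ). The Python sketch below uses that scaling to express the fuzziness and the membership degree of each sample as functions of ℎ and then maximizes a total credibility by a plain grid search (not the gradient descent method mentioned above). The per-sample credibility used here, membership degree divided by fuzziness, is only an illustrative assumption chosen because it rewards high membership and low fuzziness; it is a stand-in and should not be read as the definition in (9). All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (membership degree at h = 0, fuzziness at h = 0)


def fuzziness_at(h: float, fuzziness0: float) -> float:
    """Output spread at level h: the spread at h = 0 scaled by 1 / (1 - h)."""
    return fuzziness0 / (1.0 - h)


def membership_at(h: float, membership0: float) -> float:
    """Membership degree at level h, given the membership degree at h = 0.

    With the centre fixed and the spread scaled by 1 / (1 - h),
    mu_h = 1 - |y - centre| / spread_h = 1 - (1 - h) * (1 - mu_0).
    """
    return 1.0 - (1.0 - h) * (1.0 - membership0)


def ratio_credibility(membership: float, fuzziness: float) -> float:
    # ASSUMPTION: illustrative credibility only, not the paper's equation (9).
    return membership / fuzziness


def total_credibility(h: float, samples: List[Sample],
                      credibility: Callable[[float, float], float] = ratio_credibility) -> float:
    """Sum of per-sample credibilities at level h."""
    return sum(credibility(membership_at(h, m0), fuzziness_at(h, f0))
               for m0, f0 in samples)


def best_h(samples: List[Sample], steps: int = 1000,
           credibility: Callable[[float, float], float] = ratio_credibility) -> float:
    """Grid search over h in [0, 1); h = 1 is excluded because spreads diverge."""
    candidates = [k / steps for k in range(steps)]
    return max(candidates, key=lambda h: total_credibility(h, samples, credibility))


# Hypothetical values of (membership at h = 0, fuzziness at h = 0) per sample.
samples = [(0.0, 1.0), (0.5, 2.0), (0.2, 1.5)]
h_star = best_h(samples)
print("grid-search optimum:", h_star)
print("total credibility:", total_credibility(h_star, samples))
```

Under the assumed ratio form, each per-sample term is quadratic in ℎ, which is consistent with the remark that the programming model is quadratic; a different credibility function can be passed through the `credibility` argument.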
From Figure 6, we can see that the fuzziness of all sample data becomes larger as the ℎ value increases; in particular, when the ℎ value is close to 1, the fuzziness becomes extremely large. From Figure 7, it can be found that the membership degree of all sample data becomes higher as the ℎ value increases and approaches 1 when the ℎ value is close to 1. Figure 8 clearly shows that when the ℎ value is close to 1, the credibility measures for all sample data are close to zero because of the extremely large fuzziness (see Figure 6). Figure 9 shows that the total credibility of the FLR model reaches its maximum when the ℎ value is 0.2666, from which the best FLR model can be obtained.
To further demonstrate the effectiveness of the approach proposed in this paper, another numerical example with two independent variables is given as follows. The twenty testing datasets are listed in Table 3.
According to (27a) and (27b), the optimal value ℎ* is given as 0.3184. When ℎ is specified as 0, 0.3184, and 0.5, respectively, the fuzzy coefficients in terms of symmetric TFNs and the corresponding total credibility can be obtained by solving the programming model in (6a), (6b), (6c), and (6d). The results are summarized in Table 4.
From Table 4, we can see that when the ℎ value is set to 0.3184, the total credibility of the FLR model reaches its maximum, that is, 0.1371, which indicates that the approach proposed in this paper is effective and reasonable.

Real Application
To investigate the effectiveness of the approach proposed in this paper, the welding process in electronic manufacturing was modeled using fuzzy linear regression. The company studied is a well-known OEM manufacturer of printed circuit boards (PCBs) in PR China. The engineers in this company wanted to improve the welding quality by investigating the relationship between the pull strength (y) of the welding line and the proportion of colophony (x) in the welding fluid. An engineering experiment was conducted, and the experiment results are shown in Table 5.
Using the 10 experimental data pairs, the optimal ℎ value is calculated as 0.2057. When the ℎ value is set to 0.2057 (the optimal value), 0, and 0.5, respectively, the FLR models and the corresponding total credibility are as summarized in Table 6.
It is not surprising that the total credibility of the FLR model with the optimal ℎ value is the highest among the three FLR models (see Table 6). In fact, when the optimal ℎ value is used, the total credibility of the FLR model improves by 7.19% compared with ℎ = 0 and by 15.91% compared with ℎ = 0.5.
To further investigate the modeling performance of the approach proposed in this paper, we divided the sample data pairs into two groups: a modeling group, in which 80% of the sample data pairs were used to build the FLR model, and a testing group, in which the remaining 20% were used to test the performance of the FLR model. Thus, each time, two data pairs were randomly selected from the ten as testing data, while the remaining eight were used to develop FLR models with ℎ specified as the optimal value, 0, and 0.5, respectively. Their corresponding total credibility was calculated. This procedure was repeated ten times. Table 7 summarizes the testing results.
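The repeated hold-out procedure described above can be sketched as follows. The sketch only generates the random splits and averages a score over them; `fit_and_score` is a hypothetical callback standing in for building the FLR model on the modeling group and computing the total credibility on the testing group, and the fixed seed is an assumption added for reproducibility.

```python
import random
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]  # (modeling indices, testing indices)


def holdout_splits(n_samples: int, n_test: int, n_repeats: int, seed: int = 0) -> List[Split]:
    """Draw n_repeats random splits with n_test testing indices each."""
    rng = random.Random(seed)  # assumed fixed seed, for reproducible splits
    splits = []
    for _ in range(n_repeats):
        test_idx = sorted(rng.sample(range(n_samples), n_test))
        train_idx = [i for i in range(n_samples) if i not in test_idx]
        splits.append((train_idx, test_idx))
    return splits


def average_score(splits: List[Split],
                  fit_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Average a user-supplied score (e.g. total credibility) over all splits."""
    scores = [fit_and_score(train_idx, test_idx) for train_idx, test_idx in splits]
    return sum(scores) / len(scores)


# Ten data pairs, two held out for testing, repeated ten times.
splits = holdout_splits(n_samples=10, n_test=2, n_repeats=10)
for train_idx, test_idx in splits:
    print("modeling:", train_idx, "testing:", test_idx)
```

Running the same splits for each candidate ℎ value (the optimal value, 0, and 0.5) keeps the comparison between models on identical testing data.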
From Table 7, it can be seen that the predictive performance of the FLR model with the optimal ℎ value is the best among the three models. In general, when the optimal ℎ value is used, the total credibility of the FLR model improves by 9.73% compared with ℎ = 0 and by 22.61% compared with ℎ = 0.5. Therefore, the approach proposed in this paper is effective.

Conclusions
In this paper, a systematic approach is proposed to select the optimal value of ℎ for FLR analysis with symmetric TFNs. Firstly, a new concept of credibility is introduced by considering both system fuzziness and membership degree; it can be used to assess the performance of FLR analysis with different ℎ values when a set of sample data pairs is given. Secondly, a procedure to obtain the optimal ℎ value is formulated for FLR with symmetric TFNs by maximizing the total credibility of the FLR models. By using the proposed approach, the optimal value of ℎ can be determined definitely with respect to a set of sample data pairs. Both the numerical example and the real application demonstrate that the proposed approach is effective and efficient. Further work in this direction involves extending the approach to FLR analysis with fuzzy outputs, that is, developing a procedure to select the optimal ℎ value for FLR with fuzzy observations, and applying the proposed approach to practical problems.

Figure 6: Fuzziness of the eight sample data.

Figure 8: Credibility of the eight sample data.
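(1) If $\mu_{\tilde{y}_i^1}(y_i)$ is equal to $\mu_{\tilde{y}_i^2}(y_i)$, the estimate with the smaller fuzziness performs better in representing $y_i$. As shown in Figure 2, $\mu_{\tilde{y}_i^1}(y_i) = \mu_{\tilde{y}_i^2}(y_i)$ and $\Delta\tilde{y}_i^1 < \Delta\tilde{y}_i^2$; therefore, $\tilde{y}_i^1$ is better than $\tilde{y}_i^2$ as the estimation of $y_i$. (2) If $\Delta\tilde{y}_i^1$ is equal to $\Delta\tilde{y}_i^2$, the estimate with the higher membership degree performs better in representing $y_i$. As shown in Figure 3, $\Delta\tilde{y}_i^1 = \Delta\tilde{y}_i^2$ and $\mu_{\tilde{y}_i^1}(y_i) > \mu_{\tilde{y}_i^2}(y_i)$; therefore, $\tilde{y}_i^1$ is better than $\tilde{y}_i^2$ as the estimation of $y_i$.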

Table 3: Testing data set.

Table 4: The corresponding results.

Table 5: The experiment results.

Table 6: The FLR models with the corresponding total credibility.