Innovative Correlation Coefficient Measurement with Fuzzy Data

Correlation coefficients are commonly computed from crisp data. In this paper, we use Pearson’s correlation coefficient and propose a method for evaluating correlation coefficients for fuzzy interval data. Our empirical studies examine the relationship between mathematics achievement and other variables.


Introduction
Human thought processes are mainly based on cognitive awareness of the environment and social phenomena. Human knowledge is fuzzy because of humans' subjective awareness of time and space. Therefore, Wu [1] proposed fuzzy theory in reference to how humans perceive complex and uncertain environmental phenomena.
To determine the correlation between phenomena $x$ and $y$, a scatter plot is often used. Using a scatter plot, the correlation between phenomena $x$ and $y$ can be determined to be positive, negative, or statistically independent.
In traditional statistical analysis, correlation coefficients are often found using crisp data. In this paper, we use Pearson's correlation coefficient to calculate correlation coefficients for fuzzy interval data. Fuzzy correlation coefficients are often applied in the fields of engineering or economics but have also been increasingly emphasized in the social sciences.
Fuzzy correlations are referenced in the literature. For instance, Nguyen et al. [2,3] provided the fundamentals of statistics with fuzzy data. Hong and Hwang [4] established the correlation coefficient of intuitionistic fuzzy sets in probability space by using the generalization of fuzzy sets by Zadeh [5]. Chiang and Lin [6] argued that membership degrees are concrete observational values based on the membership functions of fuzzy sets and used them to define fuzzy correlation coefficients. Chaudhuri and Bhattacharya [7] investigated the correlation of two fuzzy sets that were defined by the members of the supports, which were ranked to evaluate the correlation coefficients of two fuzzy sets. Hong [8] and Ni and Cheung [9] also suggested some methods for calculating fuzzy correlations. Based on correlation coefficients developed by Liu and Kao [10], Xie and Wu [11] and Yang [12] established fuzzy correlation coefficients and obtained fuzzy correlation intervals based on fuzzy interval sample data. R. Saneifard and R. Saneifard [13] calculated the correlation coefficient for fuzzy data by adopting the central interval method. Cheng and Yang [14] proposed a method for determining fuzzy correlation coefficients and explained the application of fuzzy correlation. Hanafy et al. [15] evaluated the correlation coefficients of neutrosophic sets by the centroid method. Wu et al. [16] developed a new approach for determining fuzzy correlation and applied this approach to 12-year compulsory education in Taiwan. Lin et al. [17] investigated some problems in marketing research by using a soft computing technique and a new statistical tool.
The main purpose of this paper is to develop fuzzy correlation coefficients for fuzzy interval data. We propose a functional formula for determining fuzzy correlation coefficients of two variables. We can find the maximum and minimum values by differentiating our proposed functional formula. Moreover, the formula can be applied not only when one of the two data sets consists of real numbers but also when both data sets consist of real numbers. Using this method of research, we can provide information for researchers to explain related phenomena in practice.

Research Approaches
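Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$, be a fuzzy sample set; then, the correlation coefficient between $x$ and $y$ is defined as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of $x_i$ and $y_i$, respectively.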
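As a point of reference for formula (1), the following is a minimal Python sketch of Pearson's correlation coefficient for crisp data. The function name `pearson_r` and the sample values are illustrative assumptions and do not come from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length crisp samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross products and sums of squares about the sample means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical crisp data, chosen only for illustration.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```

The explicit checks reflect that the coefficient is undefined when either sample has zero variance.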
The algorithm of the correlation coefficient between $x$ and $y$ consists of the following five steps.
Step 4. Calculate the correlation coefficient function $r$ between $x$ and $y$ by using formula (1). In this case, the correlation coefficient function $r$ is a function of two variables $s$ and $t$ for the closed region bounded by $\Omega$ and is expressed as $r = f(s, t)$.
Step 5. By the differentiation method, we can find the maximum and minimum values of the correlation coefficient function $r$.
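Steps 4 and 5 can be illustrated numerically. The sketch below assumes that the corresponding point of a rectangle $[a_i, b_i] \otimes [c_i, d_i]$ is $(a_i + s(b_i - a_i),\ c_i + t(d_i - c_i))$ with both parameters in $[0, 1]$, which agrees with the centroid at $(1/2, 1/2)$ and the upper-right corner at $(1, 1)$ used in the figures. This parametrization, the names `s`, `t`, `corr_at`, and `corr_range`, and the rectangle data are assumptions made for illustration. The sketch also replaces the differentiation method of Step 5 with a plain grid search, so it only approximates the extrema.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for crisp samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(rects, s, t):
    """Correlation of the corresponding points selected by (s, t)."""
    xs = [a + s * (b - a) for (a, b), _ in rects]
    ys = [c + t * (d - c) for _, (c, d) in rects]
    return pearson_r(xs, ys)


def corr_range(rects, steps=50):
    """Approximate min and max of the correlation over the unit square."""
    values = [
        corr_at(rects, i / steps, j / steps)
        for i in range(steps + 1)
        for j in range(steps + 1)
    ]
    return min(values), max(values)


# Hypothetical rectangles ([a, b], [c, d]); not the data of any example here.
rects = [((1, 2), (1, 3)), ((3, 5), (2, 4)), ((6, 7), (5, 8))]
low, high = corr_range(rects)
print(f"approximate fuzzy correlation interval: [{low:.3f}, {high:.3f}]")
```

A grid search is used here only because it is short and easy to check; the analytic treatment with first-order derivatives and boundary functions gives exact extrema.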

The Assumption of Corresponding Points of Each Rectangle.
Our initial idea is to find the correlation coefficient for corresponding points of each rectangle.
Figure 3: The graph of a rectangle $x_i \otimes y_i$.

Example 3. Consider the rectangle sample data 𝑥
For Example 3, we can also find the correlation coefficient $r = 0.918$ for the upper-right point of each rectangle, $r = -0.397$ for the lower-right point, $r = -0.596$ for the lower-left point, and $r = 0.803$ for the upper-left point, as shown in Figures 6, 7, 8, and 9, respectively.
We can also find the correlation coefficient $r = 0.466$ for an interior corresponding point of each rectangle, for example, the coordinate $(s, t) = (1/3, 3/4)$, as shown in Figure 10.
We assume that the coordinate $(s, t)$ of the corresponding point is the same for every rectangle; for example, $(s, t) = (1/2, 1/2)$ for Figure 5 and $(s, t) = (1, 1)$ for Figure 6. We do not consider the case in which each rectangle has its own coordinate $(s, t)$, as shown, for example, in Figure 11.
According to Figures 6-9, we cannot conclude that the maximal value of the correlation coefficient is $r = 0.918$ and the minimal value is $r = -0.596$, because each rectangle has infinitely many corresponding points, both on its boundary and in its interior, and the correlation coefficient must be evaluated for all of them. Therefore, we use Steps 1 to 5 and the differentiation rules for functions of two variables to evaluate the maximal and the minimal values of the correlation coefficient.
The first-order derivatives for $s$ and $t$ are, respectively, Let $f_s(s, t) = f_t(s, t) = 0$; it follows that ( + 2)(2 − 1) = 5. There is no critical point for the equation ( + 2)(2 − 1) = 5 bounded by $\Omega$. The reason is as follows: If 0 ≤  ≤ 1, then we obtain −7 ≤  ≤ −1/3, and if 0 ≤  ≤ 1, then we obtain
The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.
Consideration of extrema on the boundary of the region along $s = 0$ leads to the function $f(0, t)$, $0 \le t \le 1$. There is no critical point when setting the derivative with respect to $t$ equal to zero. The endpoints are $(0, 0)$ and $(0, 1)$.
Consideration of extrema on the boundary of the region along $s = 1$ leads to the function $f(1, t)$, $0 \le t \le 1$. There is no critical point when setting the derivative with respect to $t$ equal to zero. The endpoints are $(1, 0)$ and $(1, 1)$.
Consideration of extrema on the boundary of the region along $t = 0$ leads to the function $f(s, 0) = (-9 + 3s)/(2\sqrt{3}\sqrt{19 - s + s^2})$, $0 \le s \le 1$. There is no critical point when setting the derivative with respect to $s$ equal to zero. The endpoints are $(0, 0)$ and $(1, 0)$.
Consideration of extrema on the boundary of the region along $t = 1$ leads to the function $f(s, 1) = (7 + s)/(2\sqrt{19 - s + s^2})$, $0 \le s \le 1$. There is no critical point when setting the derivative with respect to $s$ equal to zero. The endpoints are $(0, 1)$ and $(1, 1)$.
All candidates for the maximum and minimum values are listed in Table 1. We see that the minimum value is $f(0, 0) \approx -0.596$; the maximum value is $f(1, 1) \approx 0.918$.
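The two boundary functions of Example 3 that survive in the text can be checked numerically. The sketch below reads them as $f(s, 0) = (-9 + 3s)/(2\sqrt{3}\sqrt{19 - s + s^2})$ and $f(s, 1) = (7 + s)/(2\sqrt{19 - s + s^2})$; the variable name `s` and the function names `f_bottom` and `f_top` are assumptions introduced here.

```python
import math


def f_bottom(s):
    """Edge function f(s, 0) of Example 3, as read from the text."""
    return (-9 + 3 * s) / (2 * math.sqrt(3) * math.sqrt(19 - s + s * s))


def f_top(s):
    """Edge function f(s, 1) of Example 3, as read from the text."""
    return (7 + s) / (2 * math.sqrt(19 - s + s * s))


corners = {
    "lower-left (0, 0)": f_bottom(0.0),
    "lower-right (1, 0)": f_bottom(1.0),
    "upper-left (0, 1)": f_top(0.0),
    "upper-right (1, 1)": f_top(1.0),
}
for name, value in corners.items():
    print(f"{name}: {value:.3f}")

# Check on a fine grid that each edge function increases with s,
# which is consistent with the absence of critical points on these edges.
grid = [k / 1000 for k in range(1001)]
bottom_increasing = all(f_bottom(a) < f_bottom(b) for a, b in zip(grid, grid[1:]))
top_increasing = all(f_top(a) < f_top(b) for a, b in zip(grid, grid[1:]))
print(bottom_increasing, top_increasing)
```

The corner values agree with the coefficients −0.596, −0.397, 0.803, and 0.918 reported for the lower-left, lower-right, upper-left, and upper-right points.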
In this case, the center points of these three rectangles are positively correlated, with a correlation coefficient of 1. Moreover, these three rectangles are approximately symmetric about the straight line $y = x$. Hence, the tendency of positive correlation of these three rectangles is high. In other words, the fuzzy correlation coefficient may have a smaller range.
Example 6. Consider the rectangle sample data and $x_3 \otimes y_3 = [3, 9] \otimes [3, 4]$, as shown in Figure 13.
In this case, the center points of these three rectangles are positively correlated, but their correlation coefficient lies strictly between 0 and 1. Moreover, these three rectangles are not symmetric about any straight line. Hence, the tendency of positive correlation of these three rectangles is not evident. In other words, the fuzzy correlation coefficient may have a large range.
$[2, 4]$, and $x_3 \otimes y_3 = [7, 9] \otimes [0, 4]$, as shown in Figure 14.
The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.
Consideration of extrema on the boundary of the region along $s = 0$ leads to the function $f(0, t) = (-3 + 12t)/(2\sqrt{39}\sqrt{1 - t + t^2})$, $0 \le t \le 1$. There is no critical point when setting the derivative with respect to $t$ equal to zero. The endpoints are $(0, 0)$ and $(0, 1)$.
Consideration of extrema on the boundary of the region along $s = 1$ leads to the function $f(1, t) = (1 + 10t)/(2\sqrt{37}\sqrt{1 - t + t^2})$, $0 \le t \le 1$. There is no critical point when setting the derivative with respect to $t$ equal to zero. The endpoints are $(1, 0)$ and $(1, 1)$.
Consideration of extrema on the boundary of the region along $t = 0$ leads to the function $f(s, 0) = (-3 + 4s)/(2\sqrt{39 - 6s + 4s^2})$, $0 \le s \le 1$. There is no critical point when setting the derivative with respect to $s$ equal to zero. The endpoints are $(0, 0)$ and $(1, 0)$.
Consideration of extrema on the boundary of the region along $t = 1$ leads to the function $f(s, 1) = (9 + 2s)/(2\sqrt{39 - 6s + 4s^2})$, $0 \le s \le 1$. There is no critical point when setting the derivative with respect to $s$ equal to zero. The endpoints are $(0, 1)$ and $(1, 1)$.
All candidates for the maximum and minimum values are listed in Table 4. We see that the minimum value is $f(0, 0) \approx -0.240$; the maximum value is $f(1, 1) \approx 0.904$. Therefore, the fuzzy interval number of the correlation coefficient is $[-0.240, 0.904]$.
Based on the scatter plots of Examples 6 and 7, we intuitively think that the scatter plot of Example 7 is more dispersed than the scatter plot of Example 6. Therefore, the fuzzy correlation coefficient of Example 7 will have a larger range.
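The four boundary functions of Example 7 are consistent with the single function $f(s, t) = (-3 + 4s + 12t - 2st)/(2\sqrt{39 - 6s + 4s^2}\sqrt{1 - t + t^2})$. This closed form does not appear in the text; it is inferred here on the assumption that the numerator is bilinear in the two parameters, and the names `s`, `t`, and `f` are likewise assumptions. The sketch below checks the inferred function against two of the edges and uses a grid search, in place of the differentiation method, to locate the extrema.

```python
import math


def f(s, t):
    """Inferred correlation coefficient function of Example 7 (an assumption)."""
    numerator = -3 + 4 * s + 12 * t - 2 * s * t
    return numerator / (
        2 * math.sqrt(39 - 6 * s + 4 * s * s) * math.sqrt(1 - t + t * t)
    )


def edge_s0(t):
    """Boundary function f(0, t), as read from the text."""
    return (-3 + 12 * t) / (2 * math.sqrt(39) * math.sqrt(1 - t + t * t))


def edge_t1(s):
    """Boundary function f(s, 1), as read from the text."""
    return (9 + 2 * s) / (2 * math.sqrt(39 - 6 * s + 4 * s * s))


# Evaluate f on a grid over the unit square and locate its extrema.
steps = 200
values = {
    (i / steps, j / steps): f(i / steps, j / steps)
    for i in range(steps + 1)
    for j in range(steps + 1)
}
arg_min = min(values, key=values.get)
arg_max = max(values, key=values.get)
print(arg_min, round(values[arg_min], 3))
print(arg_max, round(values[arg_max], 3))
```

On this grid the minimum falls at the corner $(0, 0)$ and the maximum at the corner $(1, 1)$, in agreement with the values −0.240 and 0.904 stated for Example 7.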
Comparing the scatter plots of Examples 7 and 8, we intuitively think that the scatter plot of Example 8 is more concentrated and has a greater tendency of positive correlation than the scatter plot of Example 7. Moreover, when fuzzy interval data of the scatter plot increase, the fuzzy correlation coefficient will have a smaller range.
Lin et al. [17] proposed a formula for the fuzzy correlation coefficient that distinguishes four situations. The formula is built from the correlation coefficient of the center point of each rectangle and the correlation coefficient of the interval lengths of the fuzzy interval numbers $x_i$ and $y_i$.
Next, the four scatter plots are observed as follows. Intuitively, the degrees of spread of the four scatter plots (refer to Figures 16, 17, 18, and 19) do not seem to be the same. Hence, the fuzzy correlation coefficients should not be equal. But formula (12) of Wu et al. [16] shows that the four fuzzy correlation coefficients are equal. However, our proposed method obtains different results. The four fuzzy correlation coefficients (refer to Figures 16, 17, 18, and 19) obtained through our proposed method are $[1, 1]$, $[0.976, 1]$, $[0.922, 1]$, and $[0.857, 1]$, respectively. Therefore, our proposed method produces results that are more consistent with our intuition.

Empirical Studies
In this section, we discuss some applications of fuzzy correlation coefficients. First, we analyze a case in which two data sets are fuzzy interval numbers. Second, we change the case to one in which one data set is a fuzzy interval number, and the other is a real number. Finally, we analyze a case in which both data sets are real numbers.
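The three cases can be handled by one routine if a real number $a$ is treated as the degenerate interval $[a, a]$. The sketch below illustrates this under stated assumptions: the linear parametrization of each interval by a parameter in $[0, 1]$, the helper names `as_interval` and `corr_range`, and all data values are hypothetical, and a grid search stands in for the differentiation method.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for crisp samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def as_interval(value):
    """Treat a real number a as the degenerate interval [a, a]."""
    return value if isinstance(value, tuple) else (value, value)


def corr_range(x_data, y_data, steps=40):
    """Approximate min and max correlation over the unit square by grid search."""
    xi = [as_interval(v) for v in x_data]
    yi = [as_interval(v) for v in y_data]
    values = []
    for i in range(steps + 1):
        for j in range(steps + 1):
            s, t = i / steps, j / steps
            xs = [a + s * (b - a) for a, b in xi]
            ys = [c + t * (d - c) for c, d in yi]
            values.append(pearson_r(xs, ys))
    return min(values), max(values)


# Hypothetical data: interval-valued and real-valued versions of two variables.
fuzzy_scores = [(60, 70), (75, 85), (50, 65), (88, 95)]
fuzzy_hours = [(10, 14), (5, 8), (12, 20), (2, 4)]
crisp_scores = [65, 80, 58, 92]
crisp_hours = [12, 6, 16, 3]

both_fuzzy = corr_range(fuzzy_scores, fuzzy_hours)
one_fuzzy = corr_range(fuzzy_scores, crisp_hours)
both_crisp = corr_range(crisp_scores, crisp_hours)
print(both_fuzzy, one_fuzzy, both_crisp)
```

When both data sets are real numbers, every grid point yields the same sample, so the interval collapses to the ordinary Pearson coefficient.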
To understand the factors influencing mathematics achievement at a school, we investigate data from 10 students, where $x_i$ and $y_i$ denote the mathematics score and weekly online time, respectively, of student $i$, $i = 1, 2, \ldots, 10$, as shown in Figure 20.
Based on the previous discussion, the correlation coefficient function between $x$ and $y$ is
Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is $[-0.804, -0.748]$.
Clearly, $x$ and $y$ have a highly negative correlation. In other words, a higher mathematics score correlates with a lower weekly online time. The weekly online time of a student negatively influences the student's mathematics score.
Based on the previous discussion, the correlation coefficient function between $x$ and $y$ is defined for the closed region bounded by $\Omega$.
Based on the previous discussion, there is a minor positive correlation between mathematics score and weekly sleeping time. Therefore, a higher mathematics score correlates with a slightly higher weekly sleeping time. The influence of weekly sleeping time on mathematics score is minor.
where $x_i$ and $y_i$ denote the mathematics and Chinese scores, respectively, of student $i$, $i = 1, 2, \ldots, 10$, as shown in Figure 22.
First, let $f_s(s, t) = f_t(s, t) = 0$; it follows that $(s, t) \approx (8.324, 4.416) \notin \Omega$ or $(s, t) \approx (1.596, -0.534) \notin \Omega$. Therefore, no local maximum or minimum is in $\Omega$.
Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is [0.553, 0.717].
Based on the previous discussion, $x$ and $y$ have a highly positive correlation. Therefore, a higher mathematics score correlates with a higher Chinese score. Students' Chinese scores positively influence their mathematics scores.

Conclusion
Scientists are accustomed to using binary logic to analyze information. Human logic is fuzzy and complex, and applying binary logic to analyze human thought processes causes some distortion. Fuzzy logic is based on human thought processes and has therefore been increasingly applied to social science.
Possible methods of calculating fuzzy correlation coefficients are proposed in the literature, but understanding most formulas used in the literature requires a strong mathematical background. In this paper, we use Pearson's correlation coefficient and the differentiation method to evaluate fuzzy correlation coefficients. The method can be applied when both data sets are fuzzy interval numbers, when one data set is a fuzzy interval number and the other is a real number, and when both data sets are real numbers.
This paper discusses only fuzzy correlation coefficients of fuzzy interval numbers. In the future, we will extend our research method to triangular and trapezoidal fuzzy numbers.

Figure 1: The scatter plot with fuzzy interval data.

Figure 5: The graph of Example 3 for the centroid of each rectangle.

Figure 6: The graph of Example 3 for the upper-right point coordinate of each rectangle.

Figure 7: The graph of Example 3 for the lower-right point coordinate of each rectangle.

Figure 8: The graph of Example 3 for the lower-left point coordinate of each rectangle.

Case 2. The maximal value 1 and the minimal value −1 of the correlation coefficient do not occur in the closed region bounded by $\Omega$.
Example 7. Consider the rectangle sample data $x_1$
The first-order derivatives for $s$ and $t$ are, respectively,