Mathematical Problems in Engineering (MPE), ISSN 1563-5147, 1024-123X. Hindawi Publishing Corporation. Volume 2016, Article ID 9094832. DOI: 10.1155/2016/9094832

Research Article

Innovative Correlation Coefficient Measurement with Fuzzy Data

Berlin Wu and Chin Feng Hung

Department of Mathematical Sciences, National Chengchi University, Taipei 116, Taiwan (nccu.edu.tw)

Academic Editor: László T. Kóczy

Received 17 December 2015; Revised 21 April 2016; Accepted 28 April 2016; Published 30 May 2016

Copyright © 2016 Berlin Wu and Chin Feng Hung. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Correlation coefficients are commonly found with crisp data. In this paper, we use Pearson’s correlation coefficient and propose a method for evaluating correlation coefficients for fuzzy interval data. Our empirical studies involve the relationship between mathematics achievement and other projects.

1. Introduction

Human thought processes are mainly based on cognitive awareness of the environment and social phenomena. Human knowledge is fuzzy because of humans' subjective awareness of time and space. Therefore, Wu [1] proposed fuzzy theory in reference to how humans perceive complex and uncertain environmental phenomena.

To determine the correlation between phenomena X and Y , a scatter plot is often used. Using a scatter plot, the correlation between phenomena X and Y can be determined to be positive, negative, or statistically independent.

In traditional statistical analysis, correlation coefficients are often found using crisp data. In this paper, we use Pearson’s correlation coefficient to calculate correlation coefficients for fuzzy interval data. Fuzzy correlation coefficients are often applied in the fields of engineering or economics but have also been increasingly emphasized in social sciences.

Fuzzy correlations are referenced in the literature. For instance, Nguyen et al. [2, 3] provided the fundamentals of statistics with fuzzy data. Hong and Hwang [4] established the correlation coefficient of intuitionistic fuzzy sets in probability space by using the generalization of fuzzy sets by Zadeh [5]. Chiang and Lin [6] treated membership degrees as concrete observational values, based on the membership functions of fuzzy sets, in order to define fuzzy correlation coefficients. Chaudhuri and Bhattacharya [7] investigated the correlation of two fuzzy sets defined by the members of their supports, which were ranked to evaluate the correlation coefficient of the two sets. Hong [8] and Ni and Cheung [9] also suggested some methods for calculating fuzzy correlations. Based on correlation coefficients developed by Liu and Kao [10], Xie and Wu [11] and Yang [12] established fuzzy correlation coefficients and obtained fuzzy correlation intervals based on fuzzy interval sample data. R. Saneifard and R. Saneifard [13] calculated the correlation coefficient for fuzzy data by adopting the central interval method. Cheng and Yang proposed a method for determining fuzzy correlation coefficients and explained the application of fuzzy correlation. Hanafy et al. evaluated the correlation coefficients of neutrosophic sets by the centroid method. Wu et al. developed a new approach for determining fuzzy correlation and applied this approach to 12-year compulsory education in Taiwan. Lin et al. investigated some problems in marketing research by using a soft computing technique and a new statistics tool.

The main purpose of this paper is to develop fuzzy correlation coefficients for fuzzy interval data. We propose a functional formula for determining the fuzzy correlation coefficient of two variables, and we find its maximum and minimum values by differentiating this formula. Moreover, the formula can be applied not only when both data sets are fuzzy interval numbers but also when one of the two data sets consists of real numbers or when both data sets consist of real numbers. Using this method, we can provide information for researchers to explain related phenomena in practice.

2. Research Approaches

Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$, be a fuzzy sample set; then, the correlation coefficient between $x$ and $y$ is defined as

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of $x_i$ and $y_i$, respectively.

Definition 1 (fuzzy interval number).

Let a fuzzy number $A = [a, b]$ be an interval over the real numbers $\mathbb{R}$, let $c = (a + b)/2$ be the center of interval $A$, and let $r = (b - a)/2$ be the radius of interval $A$; then, interval $A$ can be expressed as $A = [a, b]$ or $A = (c; r)$. Such an interval $A$ is called a fuzzy interval number.
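As a minimal sketch of Definition 1, the two representations $[a, b]$ and $(c; r)$ can be converted into each other as follows. The class name `Interval` and its method names are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [a, b] with a <= b, used as a fuzzy interval number."""
    a: float
    b: float

    def __post_init__(self):
        if self.a > self.b:
            raise ValueError("an interval requires a <= b")

    @property
    def center(self) -> float:
        # c = (a + b) / 2
        return (self.a + self.b) / 2

    @property
    def radius(self) -> float:
        # r = (b - a) / 2
        return (self.b - self.a) / 2

    @classmethod
    def from_center_radius(cls, c: float, r: float) -> "Interval":
        # Inverse map: (c; r) -> [c - r, c + r]
        return cls(c - r, c + r)


A = Interval(4, 8)
print(A.center, A.radius)
print(Interval.from_center_radius(A.center, A.radius))
```

A real number $a$ is the degenerate case $[a, a]$ with radius 0.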

Consider the fuzzy sample set $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i = [x_{i1}, x_{i2}]$ and $y_i = [y_{i1}, y_{i2}]$ are fuzzy interval numbers, as shown in Figure 1.

The algorithm of the correlation coefficient between x and y consists of the following five steps.

Figure 1: The scatter plot with fuzzy interval data.

Step 1.

For any fuzzy interval numbers $x_i = [x_{i1}, x_{i2}]$ and $y_i = [y_{i1}, y_{i2}]$, the product $x_i y_i$ is defined as a rectangle. The rectangle $x_i y_i$ has four vertices, $A_i$, $B_i$, $C_i$, and $D_i$, with coordinates $(x_{i1}, y_{i1})$, $(x_{i2}, y_{i1})$, $(x_{i2}, y_{i2})$, and $(x_{i1}, y_{i2})$, respectively, as shown in Figure 2.

Figure 2: The graph of a rectangle $x_i y_i$.

Step 2.

Choose a point $E_i$ on the line segment $\overline{A_i B_i}$ such that $\overline{A_i E_i} : \overline{E_i B_i} = s : (1 - s)$, where $0 \le s \le 1$. This gives the point $E_i = (x_{i1} + s(x_{i2} - x_{i1}),\ y_{i1})$, as shown in Figure 3.

Figure 3: The graph of a rectangle $x_i y_i$.

Step 3.

Choose a point $G_i$ on the line segment $\overline{C_i D_i}$ such that $\overline{E_i G_i}$ is parallel to $\overline{A_i D_i}$. Next, choose a point $F_i$ on the line segment $\overline{E_i G_i}$ such that $\overline{E_i F_i} : \overline{F_i G_i} = t : (1 - t)$, where $0 \le t \le 1$. This gives the point $F_i = (x_{i1} + s(x_{i2} - x_{i1}),\ y_{i1} + t(y_{i2} - y_{i1}))$, as shown in Figure 4.

Figure 4: The graph of a rectangle $x_i y_i$.

Definition 2.

The domain set is $\Omega = \{(s, t) \mid 0 \le s \le 1,\ 0 \le t \le 1\}$.

Step 4.

Calculate the correlation coefficient function $r_{xy}$ between $x$ and $y$ by applying formula (1) to the points $F_i$. In this case, $r_{xy}$ is a function of the two variables $s$ and $t$ on the closed region $\Omega$ and is expressed as $r_{xy} = f(s, t)$.

Step 5.

By the differentiation method, we can find the maximum and minimum values of the correlation coefficient function $r_{xy} = f(s, t)$ on $\Omega$.
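Steps 1 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the three rectangles are made-up toy data, and a dense grid search over $\Omega$ is swapped in for the differentiation method of Step 5, so the returned bounds are only approximate.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(rects, s, t):
    """Steps 1-4: pick the point F_i at relative position (s, t) in every
    rectangle ((x1, x2), (y1, y2)) and return r_xy = f(s, t)."""
    xs = [x1 + s * (x2 - x1) for (x1, x2), _ in rects]
    ys = [y1 + t * (y2 - y1) for _, (y1, y2) in rects]
    return pearson(xs, ys)


def fuzzy_corr_interval(rects, steps=200):
    """Step 5, approximated: scan a (steps+1) x (steps+1) grid over Omega
    instead of solving f_s = f_t = 0 analytically."""
    values = [corr_at(rects, i / steps, j / steps)
              for i in range(steps + 1) for j in range(steps + 1)]
    return min(values), max(values)


# Hypothetical toy data (not from the paper): three rectangles x_i y_i.
TOY_RECTS = [((0, 1), (0, 2)), ((2, 3), (1, 4)), ((4, 6), (3, 5))]

low, high = fuzzy_corr_interval(TOY_RECTS)
print(f"approximate fuzzy correlation interval: [{low:.3f}, {high:.3f}]")
```

The grid search trades exactness for simplicity: it needs no derivatives, but an extremum lying between grid nodes is only approached, not hit.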

2.1. The Assumption of Corresponding Points of Each Rectangle

Our initial idea is to find the correlation coefficient for corresponding points of each rectangle. In Example 3, the correlation coefficient is $r_{xy} = 0$ for the centroids of the rectangles.

Example 3.

Consider the rectangle sample data $x_1 y_1 = [0, 2] \times [4, 6]$, $x_2 y_2 = [4, 8] \times [8, 10]$, and $x_3 y_3 = [10, 12] \times [0, 10]$, as shown in Figure 5.

For Example 3, we also find the correlation coefficient $r_{xy} = 0.918$ for the upper-right vertices, $r_{xy} = -0.397$ for the lower-right vertices, $r_{xy} = -0.596$ for the lower-left vertices, and $r_{xy} = 0.803$ for the upper-left vertices of the rectangles, as shown in Figures 6, 7, 8, and 9.

We can also find the correlation coefficient $r_{xy} = 0.466$ for an interior corresponding point of each rectangle, for example, the point with coordinate $(s, t) = (1/3, 3/4)$, as shown in Figure 10.

We assume that the coordinate $(s, t)$ of the corresponding point is the same for every rectangle; for example, $(s, t) = (1/2, 1/2)$ in Figure 5 and $(s, t) = (1, 1)$ in Figure 6. We do not consider the case in which the rectangles have different coordinates $(s, t)$, as illustrated in Figure 11.

In Figure 11, rectangle $A$ uses $(s, t) = (0, 1)$, rectangle $B$ uses $(s, t) = (1, 0)$, and rectangle $C$ uses $(s, t) = (0, 0)$; that is, the coordinates $(s, t)$ differ from rectangle to rectangle.

From Figures 6 to 9 alone, we cannot conclude that the maximal value of the correlation coefficient is $r_{xy} = 0.918$ and the minimal value is $r_{xy} = -0.596$, because each rectangle has infinitely many boundary and interior points, and the correlation coefficient must be evaluated for all corresponding points. Therefore, we use Steps 1 to 5 and the differentiation rules for functions of two variables to evaluate the maximal and minimal values of the correlation coefficient.

Based on Example 3, three point coordinates, $F_1(2s,\ 4 + 2t)$, $F_2(4 + 4s,\ 8 + 2t)$, and $F_3(10 + 2s,\ 10t)$, are obtained. The sample means are $\bar{x} = (14 + 8s)/3$ and $\bar{y} = (12 + 14t)/3$, respectively. Therefore, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = f(s, t) = \frac{-9 + 3s + (16 - 2s)t}{2\sqrt{19 - s + s^2}\,\sqrt{3 - 6t + 4t^2}}, \tag{2}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{105 - 60t + 15s - 30st}{4\sqrt{3 - 6t + 4t^2}\,(19 - s + s^2)^{3/2}}, \qquad f_t(s, t) = \frac{42 + 6s - 24t - 12st}{4\sqrt{19 - s + s^2}\,(3 - 6t + 4t^2)^{3/2}}. \tag{3}$$

Setting $f_s(s, t) = f_t(s, t) = 0$ gives $(s + 2)(2t - 1) = 5$. No point of this curve lies in $\Omega$, for the following reason: if $0 \le t \le 1$, then $-1 \le 2t - 1 \le 1$, so that $s \le -7$ or $s \ge 3$; and if $0 \le s \le 1$, then $4/3 \le t \le 7/4$. Hence, the critical points on the curve $(s + 2)(2t - 1) = 5$ do not belong to the set $\Omega$.

The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.

Along $s = 0$, the function is $f(0, t) = \dfrac{-9 + 16t}{2\sqrt{19}\,\sqrt{3 - 6t + 4t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(0, 1)$.

Along $s = 1$, the function is $f(1, t) = \dfrac{-6 + 14t}{2\sqrt{19}\,\sqrt{3 - 6t + 4t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(1, 0)$ and $(1, 1)$.

Along $t = 0$, the function is $f(s, 0) = \dfrac{-9 + 3s}{2\sqrt{3}\,\sqrt{19 - s + s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(1, 0)$.

Along $t = 1$, the function is $f(s, 1) = \dfrac{7 + s}{2\sqrt{19 - s + s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 1)$ and $(1, 1)$.

All candidates for the maximum and minimum values are listed in Table 1. The minimum value is $f(0, 0) \approx -0.596$, and the maximum value is $f(1, 1) \approx 0.918$. Therefore, the fuzzy interval number of the correlation coefficient is $[-0.596, 0.918]$.

Table 1: Candidates for the maximum and minimum values of $f(s, t)$ in Example 3.

$(s, t)$      $f(s, t)$
$(0, 0)$      $-0.596$
$(0, 1)$      $0.803$
$(1, 0)$      $-0.397$
$(1, 1)$      $0.918$
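The values quoted for Example 3 can be cross-checked numerically. The sketch below compares formula (1), applied directly to the points $F_1$, $F_2$, $F_3$, with the closed form (2); the helper names `r_direct` and `r_closed` are assumptions introduced only for this illustration.

```python
import math

# Rectangles of Example 3 as ((x1, x2), (y1, y2)).
RECTS = [((0, 2), (4, 6)), ((4, 8), (8, 10)), ((10, 12), (0, 10))]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_direct(s, t):
    """Formula (1) applied to the points F_i at relative position (s, t)."""
    xs = [x1 + s * (x2 - x1) for (x1, x2), _ in RECTS]
    ys = [y1 + t * (y2 - y1) for _, (y1, y2) in RECTS]
    return pearson(xs, ys)


def r_closed(s, t):
    """Closed form (2) for Example 3."""
    num = -9 + 3 * s + (16 - 2 * s) * t
    den = 2 * math.sqrt(19 - s + s * s) * math.sqrt(3 - 6 * t + 4 * t * t)
    return num / den


for s, t in [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1 / 3, 3 / 4)]:
    print(f"(s, t) = ({s:.3f}, {t:.3f}): "
          f"direct = {r_direct(s, t):+.3f}, closed form = {r_closed(s, t):+.3f}")
```

The four corner values should agree with Table 1, and the last two rows correspond to the centroid and to the interior point $(1/3, 3/4)$ discussed above.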

Figure 5: The graph of Example 3 for the centroid of each rectangle.

Figure 6: The graph of Example 3 for the upper-right point coordinate of each rectangle.

Figure 7: The graph of Example 3 for the lower-right point coordinate of each rectangle.

Figure 8: The graph of Example 3 for the lower-left point coordinate of each rectangle.

Figure 9: The graph of Example 3 for the upper-left point coordinate of each rectangle.

Figure 10: The graph of Example 3 for the coordinate $(s, t) = (1/3, 3/4)$ of the interior corresponding point of each rectangle.

Figure 11: There are different coordinates $(s, t)$ in rectangles $A$, $B$, and $C$.

Theorem 4.

If f is continuous on a closed, bounded region, then f has a maximum value and a minimum value on the region. These extrema occur either (1) where all first partial derivatives of f are zero, (2) where some first partial derivative of f does not exist, or (3) on the boundary of the region.

Proof.

See .

3. Case Studies

In this section, we discuss some cases of fuzzy correlation coefficients. First, we analyze a case in which the maximal value 1 or the minimal value −1 of the correlation coefficient occurs on the closed region $\Omega$. Second, we analyze a case in which neither the maximal value 1 nor the minimal value −1 occurs on $\Omega$.

Case 1.

The maximal value 1 or the minimal value −1 of the correlation coefficient occurs on the closed region $\Omega$.

Example 5.

Consider the rectangle sample data $x_1 y_1 = [0, 2] \times [0, 2]$, $x_2 y_2 = [2, 6] \times [2, 6]$, and $x_3 y_3 = [6, 8] \times [6, 8]$, as shown in Figure 12.

Based on the previous discussion, three point coordinates, $F_1(2s,\ 2t)$, $F_2(2 + 4s,\ 2 + 4t)$, and $F_3(6 + 2s,\ 6 + 2t)$, are obtained. The sample means are $\bar{x} = (8 + 8s)/3$ and $\bar{y} = (8 + 8t)/3$, respectively. Therefore, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = f(s, t) = \frac{14 - s + (-1 + 2s)t}{2\sqrt{7 - s + s^2}\,\sqrt{7 - t + t^2}}, \tag{4}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{27t - 27s}{4\sqrt{7 - t + t^2}\,(7 - s + s^2)^{3/2}}, \qquad f_t(s, t) = \frac{27s - 27t}{4\sqrt{7 - s + s^2}\,(7 - t + t^2)^{3/2}}. \tag{5}$$

Setting $f_s(s, t) = f_t(s, t) = 0$ gives $s = t$. Infinitely many critical points therefore lie on the line $s = t$ within $\Omega$, for example, the two points $(0, 0)$ and $(1, 1)$.

The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.

Along $s = 0$, the function is $f(0, t) = \dfrac{14 - t}{2\sqrt{7}\,\sqrt{7 - t + t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives the point $(0, 0)$. The endpoints are $(0, 0)$ and $(0, 1)$.

Along $s = 1$, the function is $f(1, t) = \dfrac{13 + t}{2\sqrt{7}\,\sqrt{7 - t + t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives the point $(1, 1)$. The endpoints are $(1, 0)$ and $(1, 1)$.

Along $t = 0$, the function is $f(s, 0) = \dfrac{14 - s}{2\sqrt{7}\,\sqrt{7 - s + s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives the point $(0, 0)$. The endpoints are $(0, 0)$ and $(1, 0)$.

Along $t = 1$, the function is $f(s, 1) = \dfrac{13 + s}{2\sqrt{7}\,\sqrt{7 - s + s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives the point $(1, 1)$. The endpoints are $(0, 1)$ and $(1, 1)$.

All candidates for the maximum and minimum values are listed in Table 2. The minimum value is $f(0, 1) = f(1, 0) \approx 0.929$, and the maximum value is $f(0, 0) = f(1, 1) = 1$. Therefore, the fuzzy interval number of the correlation coefficient is $[0.929, 1]$.

In this case, the center points of these three rectangles are positively correlated, and $r_{xy} = 1$. Moreover, these three rectangles are approximately symmetric about the straight line $y = x$. Hence, the tendency of positive correlation of these three rectangles is high. In other words, the fuzzy correlation coefficient $r_{xy}$ may have a smaller range.
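A short numerical check of Example 5, given here as a sketch with hypothetical helper names: whenever $s = t$, the three points $F_i$ lie on the line $y = x$, so the correlation coefficient equals 1, while the off-diagonal corners give $13/14 \approx 0.929$.

```python
import math

# Rectangles of Example 5 as ((x1, x2), (y1, y2)).
RECTS = [((0, 2), (0, 2)), ((2, 6), (2, 6)), ((6, 8), (6, 8))]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(s, t):
    """r_xy = f(s, t) for the points F_i of Example 5."""
    xs = [x1 + s * (x2 - x1) for (x1, x2), _ in RECTS]
    ys = [y1 + t * (y2 - y1) for _, (y1, y2) in RECTS]
    return pearson(xs, ys)


# Sample the critical line s = t at eleven equally spaced points.
diagonal = [corr_at(k / 10, k / 10) for k in range(11)]
print("on s = t:", min(diagonal), max(diagonal))
print("corners (0, 1) and (1, 0):", corr_at(0, 1), corr_at(1, 0))
```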

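Table 2: Candidates for the maximum and minimum values of $f(s, t)$ in Example 5.

$(s, t)$      $f(s, t)$
$(0, 0)$      $1$
$(0, 1)$      $0.929$
$(1, 0)$      $0.929$
$(1, 1)$      $1$

Figure 12: The graph of Example 5.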

Example 6.

Consider the rectangle sample data $x_1 y_1 = [1, 3] \times [1, 2]$, $x_2 y_2 = [3, 5] \times [0, 3]$, and $x_3 y_3 = [3, 9] \times [3, 4]$, as shown in Figure 13.

Based on the previous discussion, three point coordinates, $F_1(1 + 2s,\ 1 + t)$, $F_2(3 + 2s,\ 3t)$, and $F_3(3 + 6s,\ 3 + t)$, are obtained. The sample means are $\bar{x} = (7 + 10s)/3$ and $\bar{y} = (4 + 5t)/3$, respectively. Therefore, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s, t) = \frac{1 + 10s + (2 - 4s)t}{2\sqrt{1 + 2s + 4s^2}\,\sqrt{7 - 8t + 4t^2}}, \tag{6}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{9 - 6t + (6 - 12t)s}{2\sqrt{7 - 8t + 4t^2}\,(1 + 2s + 4s^2)^{3/2}}, \qquad f_t(s, t) = \frac{9 + 6s + (-6 - 12s)t}{\sqrt{1 + 2s + 4s^2}\,(7 - 8t + 4t^2)^{3/2}}. \tag{7}$$

Setting $f_s(s, t) = f_t(s, t) = 0$ gives $(2t - 1)(2s + 1) = 2$. Infinitely many critical points lie on the curve $(2t - 1)(2s + 1) = 2$ within $\Omega$, for example, the two points $(1/2, 1)$ and $(1, 5/6)$.

The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.

Along $s = 0$, the function is $f(0, t) = \dfrac{1 + 2t}{2\sqrt{7 - 8t + 4t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(0, 1)$.

Along $s = 1$, the function is $f(1, t) = \dfrac{11 - 2t}{2\sqrt{7}\,\sqrt{7 - 8t + 4t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives the point $(1, 5/6)$. The endpoints are $(1, 0)$ and $(1, 1)$.

Along $t = 0$, the function is $f(s, 0) = \dfrac{1 + 10s}{2\sqrt{7}\,\sqrt{1 + 2s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(1, 0)$.

Along $t = 1$, the function is $f(s, 1) = \dfrac{3 + 6s}{2\sqrt{3}\,\sqrt{1 + 2s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives the point $(1/2, 1)$. The endpoints are $(0, 1)$ and $(1, 1)$.

All candidates for the maximum and minimum values are listed in Table 3. The minimum value is $f(0, 0) \approx 0.189$, and the maximum value is $f(1/2, 1) = f(1, 5/6) = 1$. Therefore, the fuzzy interval number of the correlation coefficient is $[0.189, 1]$.

In this case, the center points of these three rectangles are positively correlated, but $0 < r_{xy} < 1$. Moreover, these three rectangles are not symmetric about any straight line. Hence, the tendency of positive correlation of these three rectangles is not evident. In other words, the fuzzy correlation coefficient $r_{xy}$ may have a large range.

Table 3: Candidates for the maximum and minimum values of $f(s, t)$ in Example 6.

$(s, t)$        $f(s, t)$
$(1, 5/6)$      $1$
$(1/2, 1)$      $1$
$(0, 0)$        $0.189$
$(0, 1)$        $0.866$
$(1, 0)$        $0.786$
$(1, 1)$        $0.982$
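The value 1 at the two critical points of Example 6 has a geometric reading: at those positions the three points $F_i$ are collinear with positive slope. The sketch below checks this with exact rational arithmetic; the function names are hypothetical and only the two tabulated critical points are tested.

```python
from fractions import Fraction

# Rectangles of Example 6 as ((x1, x2), (y1, y2)).
RECTS = [((1, 3), (1, 2)), ((3, 5), (0, 3)), ((3, 9), (3, 4))]


def points(s, t):
    """Points F_i at relative position (s, t) inside each rectangle."""
    return [(x1 + s * (x2 - x1), y1 + t * (y2 - y1))
            for (x1, x2), (y1, y2) in RECTS]


def collinear_rising(pts):
    """True if all points lie on one straight line with positive slope."""
    (x0, y0), (x1, y1) = pts[0], pts[1]
    on_line = all((x1 - x0) * (y - y0) == (y1 - y0) * (x - x0)
                  for x, y in pts[2:])
    return on_line and (x1 - x0) * (y1 - y0) > 0


def on_critical_curve(s, t):
    """Curve (2t - 1)(2s + 1) = 2 obtained from f_s = f_t = 0."""
    return (2 * t - 1) * (2 * s + 1) == 2


for s, t in [(Fraction(1, 2), Fraction(1)), (Fraction(1), Fraction(5, 6))]:
    print(f"(s, t) = ({s}, {t}): on curve = {on_critical_curve(s, t)}, "
          f"collinear with positive slope = {collinear_rising(points(s, t))}")
```

Using `Fraction` avoids floating-point equality tests, so the collinearity check is exact.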

Figure 13: The graph of Example 6.

Case 2.

Neither the maximal value 1 nor the minimal value −1 of the correlation coefficient occurs on the closed region $\Omega$.

Example 7.

Consider the rectangle sample data $x_1 y_1 = [0, 2] \times [0, 2]$, $x_2 y_2 = [2, 6] \times [2, 4]$, and $x_3 y_3 = [7, 9] \times [0, 4]$, as shown in Figure 14.

Based on the previous discussion, three point coordinates, $F_1(2s,\ 2t)$, $F_2(2 + 4s,\ 2 + 2t)$, and $F_3(7 + 2s,\ 4t)$, are obtained. The sample means are $\bar{x} = (9 + 8s)/3$ and $\bar{y} = (2 + 8t)/3$, respectively. Therefore, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s, t) = \frac{-3 + 4s + (12 - 2s)t}{2\sqrt{39 - 6s + 4s^2}\,\sqrt{1 - t + t^2}}, \tag{8}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{147 - 42t - 42ts}{2\sqrt{1 - t + t^2}\,(39 - 6s + 4s^2)^{3/2}}, \qquad f_t(s, t) = \frac{21 + (-6 - 6s)t}{4\sqrt{39 - 6s + 4s^2}\,(1 - t + t^2)^{3/2}}. \tag{9}$$

Setting $f_s(s, t) = f_t(s, t) = 0$ gives $t(2s + 2) = 7$. No point of this curve lies in $\Omega$, for the following reason: if $0 \le t \le 1$, then $s \ge 5/2$; and if $0 \le s \le 1$, then $7/4 \le t \le 7/2$. Hence, the critical points of the equation $t(2s + 2) = 7$ do not belong to the set $\Omega$.

The boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.

Along $s = 0$, the function is $f(0, t) = \dfrac{-3 + 12t}{2\sqrt{39}\,\sqrt{1 - t + t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(0, 1)$.

Along $s = 1$, the function is $f(1, t) = \dfrac{1 + 10t}{2\sqrt{37}\,\sqrt{1 - t + t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(1, 0)$ and $(1, 1)$.

Along $t = 0$, the function is $f(s, 0) = \dfrac{-3 + 4s}{2\sqrt{39 - 6s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(1, 0)$.

Along $t = 1$, the function is $f(s, 1) = \dfrac{9 + 2s}{2\sqrt{39 - 6s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 1)$ and $(1, 1)$.

All candidates for the maximum and minimum values are listed in Table 4. The minimum value is $f(0, 0) \approx -0.240$, and the maximum value is $f(1, 1) \approx 0.904$. Therefore, the fuzzy interval number of the correlation coefficient is $[-0.240, 0.904]$.

Based on the scatter plots of Examples 6 and 7, we intuitively think that the scatter plot of Example 7 is more dispersed than the scatter plot of Example 6. Therefore, the fuzzy correlation coefficient r x y of Example 7 will have a larger range.

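Table 4: Candidates for the maximum and minimum values of $f(s, t)$ in Example 7.

$(s, t)$      $f(s, t)$
$(0, 0)$      $-0.240$
$(0, 1)$      $0.721$
$(1, 0)$      $0.082$
$(1, 1)$      $0.904$

Figure 14: The graph of Example 7.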

Example 8.

Consider the rectangle sample data $x_1 y_1 = [0, 2] \times [0, 2]$, $x_2 y_2 = [2, 6] \times [2, 4]$, $x_3 y_3 = [7, 9] \times [0, 4]$, and $x_4 y_4 = [9, 13] \times [4, 6]$, as shown in Figure 15.

Based on the previous discussion, four point coordinates, $F_1(2s,\ 2t)$, $F_2(2 + 4s,\ 2 + 2t)$, $F_3(7 + 2s,\ 4t)$, and $F_4(9 + 4s,\ 4 + 2t)$, are obtained. The sample means are $\bar{x} = (9 + 6s)/2$ and $\bar{y} = (3 + 5t)/2$, respectively. Therefore, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s, t) = \frac{13 + 6s + (5 - 2s)t}{\sqrt{53 + 8s + 4s^2}\,\sqrt{11 - 6t + 3t^2}}, \tag{10}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{266 - 126t + (-28 - 28t)s}{\sqrt{11 - 6t + 3t^2}\,(53 + 8s + 4s^2)^{3/2}}, \qquad f_t(s, t) = \frac{94 - 4s + (-54 - 12s)t}{\sqrt{53 + 8s + 4s^2}\,(11 - 6t + 3t^2)^{3/2}}. \tag{11}$$

First, setting $f_s(s, t) = f_t(s, t) = 0$ gives $t = \dfrac{47 - 2s}{27 + 6s} = \dfrac{133 - 14s}{63 + 14s}$, from which we obtain $(s, t) = (5/2, 1)$. This critical point does not belong to the set $\Omega$. Therefore, no local maximum or minimum lies in the interior of $\Omega$.

Second, the boundary of the region consists of the lines $s = 0$, $s = 1$, $t = 0$, and $t = 1$.

Along $s = 0$, the function is $f(0, t) = \dfrac{13 + 5t}{\sqrt{53}\,\sqrt{11 - 6t + 3t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(0, 1)$.

Along $s = 1$, the function is $f(1, t) = \dfrac{19 + 3t}{\sqrt{65}\,\sqrt{11 - 6t + 3t^2}}$, $0 \le t \le 1$. Setting the derivative with respect to $t$ equal to zero gives no critical point. The endpoints are $(1, 0)$ and $(1, 1)$.

Along $t = 0$, the function is $f(s, 0) = \dfrac{13 + 6s}{\sqrt{11}\,\sqrt{53 + 8s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 0)$ and $(1, 0)$.

Along $t = 1$, the function is $f(s, 1) = \dfrac{18 + 4s}{2\sqrt{2}\,\sqrt{53 + 8s + 4s^2}}$, $0 \le s \le 1$. Setting the derivative with respect to $s$ equal to zero gives no critical point. The endpoints are $(0, 1)$ and $(1, 1)$.

All candidates for the maximum and minimum values are listed in Table 5. The minimum value is $f(0, 0) \approx 0.538$, and the maximum value is $f(1, 1) \approx 0.965$. Therefore, the fuzzy interval number of the correlation coefficient is $[0.538, 0.965]$.
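The boundary analysis of Example 8 can be illustrated numerically. The following sketch, with hypothetical helper names, samples each of the four edges of $\Omega$ and tests whether the sampled values are monotone, which is what the absence of critical points on an edge implies; it then evaluates the four corners. Sampling is an approximation and is not a proof of monotonicity.

```python
import math

# Rectangles of Example 8 as ((x1, x2), (y1, y2)).
RECTS = [((0, 2), (0, 2)), ((2, 6), (2, 4)), ((7, 9), (0, 4)), ((9, 13), (4, 6))]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(s, t):
    """r_xy = f(s, t) for the points F_i of Example 8."""
    xs = [x1 + s * (x2 - x1) for (x1, x2), _ in RECTS]
    ys = [y1 + t * (y2 - y1) for _, (y1, y2) in RECTS]
    return pearson(xs, ys)


def is_monotone(values):
    """True if the sampled values never change direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


# Each edge of Omega as a one-variable function of u in [0, 1].
EDGES = {
    "s = 0": lambda u: corr_at(0, u),
    "s = 1": lambda u: corr_at(1, u),
    "t = 0": lambda u: corr_at(u, 0),
    "t = 1": lambda u: corr_at(u, 1),
}

STEPS = 1000
edge_monotone = {}
for name, g in EDGES.items():
    samples = [g(k / STEPS) for k in range(STEPS + 1)]
    edge_monotone[name] = is_monotone(samples)
    print(f"edge {name}: monotone = {edge_monotone[name]}")

corners = {(s, t): corr_at(s, t) for s in (0, 1) for t in (0, 1)}
print("corner values:", {k: round(v, 3) for k, v in corners.items()})
```

When every edge is monotone and no critical point lies inside $\Omega$, the extrema must sit at corners, which is how the interval $[0.538, 0.965]$ arises.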

Comparing the scatter plots of Examples 7 and 8, we intuitively think that the scatter plot of Example 8 is more concentrated and has a greater tendency of positive correlation than that of the scatter plot of Example 7. Moreover, when fuzzy interval data of the scatter plot increase, the fuzzy correlation coefficient will have a smaller range.

Lin et al. proposed the following formula for the fuzzy correlation coefficient $r_{xy}$, which distinguishes four situations:

$$r_{xy} = \begin{cases} [\,r_c,\ \min(1,\ r_c + \delta)\,] & \text{if } r_c \ge 0,\ r_l \ge 0, \\ [\,r_c - \delta,\ r_c\,] & \text{if } r_c \ge 0,\ r_l < 0, \\ [\,r_c,\ r_c + \delta\,] & \text{if } r_c < 0,\ r_l \ge 0, \\ [\,\max(-1,\ r_c - \delta),\ r_c\,] & \text{if } r_c < 0,\ r_l < 0, \end{cases} \tag{12}$$

where $r_c$ is the correlation coefficient of the center points of the rectangles, $r_l$ is the correlation coefficient of the interval lengths $l_{x_i}$ and $l_{y_i}$ of the fuzzy interval numbers $x_i$ and $y_i$, and $\delta = 1 - \ln(1 + r_l)/r_l$.

Next, the four scatter plots are observed as follows.

Intuitively, the degrees of spread of the four scatter plots (refer to Figures 16, 17, 18, and 19) do not seem to be the same. Hence, the fuzzy correlation coefficients should not be equal. However, formula (12) of Wu et al. gives four equal fuzzy correlation coefficients, namely $r_{xy} = [1, 1]$.

Our proposed method obtains different results. The four fuzzy correlation coefficients (refer to Figures 16, 17, 18, and 19) obtained through our proposed method are $[1, 1]$, $[0.976, 1]$, $[0.922, 1]$, and $[0.857, 1]$, respectively. Therefore, our proposed method produces results that are more consistent with intuition.

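Table 5: Candidates for the maximum and minimum values of $f(s, t)$ in Example 8.

$(s, t)$      $f(s, t)$
$(0, 0)$      $0.538$
$(0, 1)$      $0.874$
$(1, 0)$      $0.711$
$(1, 1)$      $0.965$

Figure 15: The graph of Example 8.

Figure 16: Scatter plot 1.

Figure 17: Scatter plot 2.

Figure 18: Scatter plot 3.

Figure 19: Scatter plot 4.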

4. Empirical Studies

In this section, we discuss some applications of fuzzy correlation coefficients. First, we analyze a case in which two data sets are fuzzy interval numbers. Second, we change the case to one in which one data set is a fuzzy interval number, and the other is a real number. Finally, we analyze a case in which both data sets are real numbers.

To understand the factors influencing mathematics achievement at a school, we investigate 10 students’ data.

Example 9.

Consider the rectangle sample data for 10 students: $x_1 y_1 = [80, 90] \times [1, 1.5]$, $x_2 y_2 = [80, 90] \times [6.5, 7]$, $x_3 y_3 = [60, 80] \times [4, 5]$, $x_4 y_4 = [90, 100] \times [1.5, 2.5]$, $x_5 y_5 = [40, 70] \times [16, 17]$, $x_6 y_6 = [70, 80] \times [15, 16]$, $x_7 y_7 = [60, 80] \times [15, 17]$, $x_8 y_8 = [80, 100] \times [1, 3]$, $x_9 y_9 = [80, 90] \times [0, 3]$, and $x_{10} y_{10} = [80, 90] \times [4.5, 5.5]$, where $x_i$ and $y_i$ denote the mathematics score and weekly online time, respectively, of student $i$, $i = 1, 2, \ldots, 10$, as shown in Figure 20.

Based on the previous discussion, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s, t) = \frac{-639 + 197.5s + (4 + 5s)t}{\sqrt{10}\,\sqrt{196 - 160s + 45s^2}\,\sqrt{372.7 - 14.2t + 5.54t^2}}, \tag{13}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{-12410 + 1300t + (12955 - 580t)s}{\sqrt{10}\,\sqrt{372.7 - 14.2t + 5.54t^2}\,(196 - 160s + 45s^2)^{3/2}}, \qquad f_t(s, t) = \frac{-3046.1 + 3265.75s + (3511.66 - 1129.65s)t}{\sqrt{10}\,\sqrt{196 - 160s + 45s^2}\,(372.7 - 14.2t + 5.54t^2)^{3/2}}. \tag{14}$$

First, setting $f_s(s, t) = f_t(s, t) = 0$ gives $(s, t) \approx (3.239, 51.066) \notin \Omega$. Therefore, no local maximum or minimum lies in the interior of $\Omega$.

Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is $[-0.804, -0.748]$.
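The interval reported for Example 9 can be approximated directly from the ten rectangles. The sketch below is an illustration only: the names are hypothetical, and a grid search over $\Omega$ replaces the derivative-based analysis, so the result matches the reported interval only up to rounding.

```python
import math

# Example 9: (mathematics score interval, weekly online time interval).
RECTS = [
    ((80, 90), (1, 1.5)), ((80, 90), (6.5, 7)), ((60, 80), (4, 5)),
    ((90, 100), (1.5, 2.5)), ((40, 70), (16, 17)), ((70, 80), (15, 16)),
    ((60, 80), (15, 17)), ((80, 100), (1, 3)), ((80, 90), (0, 3)),
    ((80, 90), (4.5, 5.5)),
]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(s, t):
    """r_xy = f(s, t) for the ten points F_i."""
    xs = [x1 + s * (x2 - x1) for (x1, x2), _ in RECTS]
    ys = [y1 + t * (y2 - y1) for _, (y1, y2) in RECTS]
    return pearson(xs, ys)


STEPS = 100  # grid resolution over Omega; an arbitrary choice
grid = [(corr_at(i / STEPS, j / STEPS), i / STEPS, j / STEPS)
        for i in range(STEPS + 1) for j in range(STEPS + 1)]
r_min, s_min, t_min = min(grid)
r_max, s_max, t_max = max(grid)
print(f"minimum {r_min:.3f} near (s, t) = ({s_min}, {t_min})")
print(f"maximum {r_max:.3f} near (s, t) = ({s_max}, {t_max})")
```

Printing the location of each extremum shows whether it falls on a corner or elsewhere on the boundary of $\Omega$.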

Clearly, x and y have a highly negative correlation. In other words, a higher mathematics score correlates with a lower weekly online time. The weekly online time of a student negatively influences the student’s mathematics score.

Figure 20: The graph of Example 9.

Example 10.

If Example 9 is adjusted to $x_1 y_1 = [85, 85] \times [1, 1.5]$, $x_2 y_2 = [85, 85] \times [6.5, 7]$, $x_3 y_3 = [70, 70] \times [4, 5]$, $x_4 y_4 = [95, 95] \times [1.5, 2.5]$, $x_5 y_5 = [55, 55] \times [16, 17]$, $x_6 y_6 = [75, 75] \times [15, 16]$, $x_7 y_7 = [70, 70] \times [15, 17]$, $x_8 y_8 = [90, 90] \times [1, 3]$, $x_9 y_9 = [85, 85] \times [0, 3]$, and $x_{10} y_{10} = [85, 85] \times [4.5, 5.5]$ (i.e., $x_i$ is a real number, $i = 1, 2, \ldots, 10$), then the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s = 0.5,\ t) = \frac{-540.25 + 6.5t}{11.28\sqrt{10}\,\sqrt{372.7 - 14.2t + 5.54t^2}}, \tag{15}$$

on the set $\{t \mid 0 \le t \le 1\}$.

The first-order derivative with respect to $t$ is

$$f_t(s = 0.5,\ t) = \frac{-1413.23 + 2946.84t}{11.28\sqrt{10}\,(372.7 - 14.2t + 5.54t^2)^{3/2}}. \tag{16}$$

Setting $f_t(s = 0.5,\ t) = 0$ gives $t \approx 0.48$, which is a critical point in the set $\{t \mid 0 \le t \le 1\}$. Based on the previous discussion, the fuzzy interval number of the correlation coefficient is $[-0.786, -0.784]$.
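When one variable is crisp, as in Example 10, the search reduces to the single parameter $t$. The sketch below, with hypothetical names and a plain scan of $[0, 1]$ in place of solving $f_t = 0$, locates the minimizing $t$ and the resulting interval; because it works from the raw data rather than the rounded coefficients in (15) and (16), small differences in the last digit are to be expected.

```python
import math

# Example 10: crisp mathematics scores and weekly online time intervals.
XS = [85, 85, 70, 95, 55, 75, 70, 90, 85, 85]
Y_INTERVALS = [(1, 1.5), (6.5, 7), (4, 5), (1.5, 2.5), (16, 17),
               (15, 16), (15, 17), (1, 3), (0, 3), (4.5, 5.5)]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at_t(t):
    """r_xy as a function of t only, since every x_i is a real number."""
    ys = [y1 + t * (y2 - y1) for y1, y2 in Y_INTERVALS]
    return pearson(XS, ys)


STEPS = 1000  # scan resolution on [0, 1]; an arbitrary choice
samples = [(corr_at_t(k / STEPS), k / STEPS) for k in range(STEPS + 1)]
r_min, t_min = min(samples)
r_max, t_max = max(samples)
print(f"minimum {r_min:.3f} at t = {t_min}")
print(f"maximum {r_max:.3f} at t = {t_max}")
```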

Example 11.

If Example 9 is adjusted to $x_1 y_1 = [85, 85] \times [1.25, 1.25]$, $x_2 y_2 = [85, 85] \times [6.75, 6.75]$, $x_3 y_3 = [70, 70] \times [4.5, 4.5]$, $x_4 y_4 = [95, 95] \times [2, 2]$, $x_5 y_5 = [55, 55] \times [16.5, 16.5]$, $x_6 y_6 = [75, 75] \times [15.5, 15.5]$, $x_7 y_7 = [70, 70] \times [16, 16]$, $x_8 y_8 = [90, 90] \times [2, 2]$, $x_9 y_9 = [85, 85] \times [1.5, 1.5]$, and $x_{10} y_{10} = [85, 85] \times [5, 5]$ (i.e., both $x_i$ and $y_i$ are real numbers, $i = 1, 2, \ldots, 10$), then the correlation coefficient between $x$ and $y$ is $r_{xy} = f(s = 0.5,\ t = 0.5) = -0.786$.
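In the fully crisp case of Example 11, every rectangle degenerates to a single point, so $f(s, t)$ no longer depends on $(s, t)$ and reduces to the ordinary Pearson coefficient. A minimal sketch with hypothetical helper names:

```python
import math

# Example 11: crisp mathematics scores and crisp weekly online times.
XS = [85, 85, 70, 95, 55, 75, 70, 90, 85, 85]
YS = [1.25, 6.75, 4.5, 2, 16.5, 15.5, 16, 2, 1.5, 5]


def pearson(xs, ys):
    """Pearson correlation coefficient, formula (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_at(s, t):
    """Treat each real number a as the degenerate interval [a, a]."""
    xs = [x + s * (x - x) for x in XS]
    ys = [y + t * (y - y) for y in YS]
    return pearson(xs, ys)


r = pearson(XS, YS)
print(round(r, 3))
print(corr_at(0, 0) == corr_at(1, 1))
```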

According to the results of Examples 9, 10, and 11, we find that Example 9 is the generalized situation of Examples 10 and 11.

Example 12.

Consider the rectangle sample data of 10 students: $x_1 y_1 = [80, 90] \times [8, 8.5]$, $x_2 y_2 = [80, 90] \times [7, 7.5]$, $x_3 y_3 = [60, 80] \times [9, 10.5]$, $x_4 y_4 = [90, 100] \times [8, 8.5]$, $x_5 y_5 = [40, 70] \times [6, 7.5]$, $x_6 y_6 = [70, 80] \times [10, 11]$, $x_7 y_7 = [60, 80] \times [7, 8]$, $x_8 y_8 = [80, 100] \times [8, 10]$, $x_9 y_9 = [80, 90] \times [6.5, 8]$, and $x_{10} y_{10} = [80, 90] \times [7.5, 8.5]$, where $x_i$ and $y_i$ denote the mathematics score and weekly sleeping time, respectively, of student $i$, $i = 1, 2, \ldots, 10$, as shown in Figure 21.

Based on the previous discussion, the correlation coefficient function between $x$ and $y$ is

$$r_{xy} = f(s, t) = \frac{36 - 25s + (-27 + 20s)t}{\sqrt{10}\,\sqrt{196 - 160s + 45s^2}\,\sqrt{12.6 - 0.9t + 2.4t^2}}, \tag{17}$$

on the closed region $\Omega$.

The first-order partial derivatives with respect to $s$ and $t$ are, respectively,

$$f_s(s, t) = \frac{-2020 + 1760t + (380 - 385t)s}{\sqrt{10}\,\sqrt{12.6 - 0.9t + 2.4t^2}\,(196 - 160s + 45s^2)^{3/2}}, \qquad f_t(s, t) = \frac{-648 + 481.5s + (-148.5 + 102s)t}{2\sqrt{10}\,\sqrt{196 - 160s + 45s^2}\,(12.6 - 0.9t + 2.4t^2)^{3/2}}. \tag{18}$$

First, setting $f_s(s, t) = f_t(s, t) = 0$ gives $(s, t) \approx (4.697, -4.881) \notin \Omega$ or $(s, t) \approx (1.368, 1.216) \notin \Omega$. Therefore, no local maximum or minimum lies in the interior of $\Omega$.

Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is $[0.037, 0.229]$.

Based on the previous discussion, there is a minor positive correlation between mathematics score and weekly sleeping time. Therefore, a higher mathematics score correlates only weakly with a higher weekly sleeping time. The influence of weekly sleeping time on mathematics score is minor.

Figure 21: The graph of Example 12.

Example 13.

Consider the rectangle sample data of 10 students: x 1 y 1 = [ 80,90 ] [ 70,80 ] , x 2 y 2 = [ 80,90 ] [ 80,90 ] , x 3 y 3 = [ 60,80 ] [ 70,80 ] , x 4 y 4 = [ 90,100 ] [ 80,90 ] , x 5 y 5 = [ 40,70 ] [ 60,70 ] , x 6 y 6 = [ 70,80 ] [ 60,80 ] , x 7 y 7 = [ 60,80 ] [ 70,80 ] , x 8 y 8 = [ 80,100 ] [ 80,90 ] , x 9 y 9 = [ 80,90 ] [ 60,70 ] , and x 10 y 10 = [ 80,90 ] [ 70,80 ] , where x i and y i denote the mathematics and Chinese scores, respectively, of a student i , i = 1,2 , , 10 , as shown in Figure 22.

Based on the previous discussion, the correlation coefficient function between x and y is (19) r x y = f s , t = 60 - 10 s + - 2 - 5 s t 196 - 160 s + 45 s 2 60 - 20 t + 9 t 2 , for the closed region bounded by Ω .

The first-order derivatives with for s and t are, respectively, (20) f s s , t = 2840 - 1140 t + - 1900 + 490 t s 60 - 20 t + 9 t 2 · 196 - 160 s + 45 s 2 3 / 2 , f t s , t = 480 - 400 s + - 520 + 140 s t 196 - 160 s + 45 s 2 · 60 - 20 t + 9 t 2 3 / 2 .

First, let $f_s(s,t) = f_t(s,t) = 0$; it follows that $(s,t) \approx (8.324, 4.416) \notin \Omega$ or $(s,t) \approx (1.596, -0.534) \notin \Omega$. Therefore, f has no local maximum or minimum in Ω, and its extreme values over Ω are attained on the boundary.

Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is $[0.553, 0.717]$.
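The reported endpoints can be compared with Pearson's coefficient computed directly from crisp samples drawn from the intervals. The sketch below is an illustration only: it assumes that Ω is the unit square [0, 1] × [0, 1], that s and t select the point at that fraction of each interval's width, and it evaluates just the four vertices. The function names `pearson` and `r_at` are hypothetical.

```python
import math

# Interval data of Example 13 as (lower, upper) endpoints.
x = [(80, 90), (80, 90), (60, 80), (90, 100), (40, 70),
     (70, 80), (60, 80), (80, 100), (80, 90), (80, 90)]
y = [(70, 80), (80, 90), (70, 80), (80, 90), (60, 70),
     (60, 80), (70, 80), (80, 90), (60, 70), (70, 80)]

def pearson(u, v):
    """Pearson correlation coefficient of two equally long crisp samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

def r_at(s, t):
    """Coefficient for the crisp samples selected by (s, t) (assumed parametrization)."""
    xs = [lo + s * (hi - lo) for lo, hi in x]
    ys = [lo + t * (hi - lo) for lo, hi in y]
    return pearson(xs, ys)

vertices = {(s, t): round(r_at(s, t), 3) for s in (0, 1) for t in (0, 1)}
print(vertices)
# {(0, 0): 0.553, (0, 1): 0.592, (1, 0): 0.717, (1, 1): 0.683}
```

The smallest and largest vertex values, at (0, 0) and (1, 0), agree with the reported interval. The sketch evaluates the vertices only and does not search along the edges of Ω.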

Based on the previous discussion, x and y have a high positive correlation. Therefore, a higher mathematics score correlates with a higher Chinese score, and students' Chinese and mathematics scores tend to rise together.

The graph of Example 13.

5. Conclusion

Scientists are accustomed to using binary logic to analyze information. Human logic, however, is fuzzy and complex, and applying binary logic to analyze human thought processes causes some distortion. Because fuzzy logic is based on human thought processes, it has been increasingly applied to social science.

Possible methods of calculating fuzzy correlation coefficients are proposed in the literature, but understanding most of the formulas used there requires a strong mathematical background. In this paper, we use Pearson's correlation coefficient and the differentiation method to evaluate fuzzy correlation coefficients. The approach can be applied in three cases: both data sets are fuzzy interval numbers; one data set is a fuzzy interval number and the other is a real number; or both data sets are real numbers.

This paper discusses only fuzzy correlation coefficients of fuzzy interval numbers. In future work, we will extend the method to triangular and trapezoidal fuzzy numbers.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work of Berlin Wu was partially supported by the National Science Council of the Republic of China under Contract 102-2410-H-004-182.

References

[1] B. Wu, Introduction of Fuzzy Statistics, Wu-Naan Book, Taipei, Taiwan, 2005.
[2] H. T. Nguyen, V. Kreinovich, B. Wu, and G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty, Springer, Heidelberg, Germany, 2011. doi:10.1007/978-3-642-24905-1
[3] H. Nguyen and B. Wu, Fundamentals of Statistics with Fuzzy Data, Springer, Heidelberg, Germany, 2006.
[4] D. H. Hong and S. Y. Hwang, "Correlation of intuitionistic fuzzy sets in probability spaces," Fuzzy Sets and Systems, vol. 75, no. 1, pp. 77-81, 1995. doi:10.1016/0165-0114(94)00330-a
[5] L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Systems, vol. 1, no. 1, pp. 3-28, 1978. doi:10.1016/0165-0114(78)90029-5
[6] D.-A. Chiang and N. P. Lin, "Correlation of fuzzy sets," Fuzzy Sets and Systems, vol. 102, no. 2, pp. 221-226, 1999. doi:10.1016/s0165-0114(97)00127-9
[7] B. B. Chaudhuri and A. Bhattacharya, "On correlation between two fuzzy sets," Fuzzy Sets and Systems, vol. 118, no. 3, pp. 447-456, 2001. doi:10.1016/s0165-0114(98)00347-9
[8] D. Hong, "Fuzzy measures for a correlation coefficient of fuzzy numbers under TW (the weakest t-norm)-based fuzzy arithmetic operations," Information Sciences, vol. 176, no. 2, pp. 150-160, 2006. doi:10.1016/j.ins.2004.11.005
[9] Y. Ni and J. Y. Cheung, "Correlation coefficient estimate for fuzzy data," in Intelligent Systems Design and Applications, vol. 23 of Advances in Soft Computing, pp. 2138-2144, Springer, Berlin, Germany, 2003.
[10] S.-T. Liu and C. Kao, "Fuzzy measures for correlation coefficient of fuzzy numbers," Fuzzy Sets and Systems, vol. 128, no. 2, pp. 267-275, 2002. doi:10.1016/s0165-0114(01)00199-3
[11] M. C. Xie and B. Wu, "The relationship between high schools students time management and academic performance: an application of fuzzy correlation," Educational Policy Forum, vol. 15, no. 1, pp. 157-176, 2012.
[12] C. C. Yang, "Correlation coefficient evaluation for the fuzzy interval data," Journal of Business Research, vol. 69, no. 6, pp. 2138-2144, 2016. doi:10.1016/j.jbusres.2015.12.021
[13] R. Saneifard and R. Saneifard, "Correlation coefficient between fuzzy numbers based on central interval," Journal of Fuzzy Set Valued Analysis, vol. 2012, Article ID jfsva-00108, 9 pages, 2012. doi:10.5899/2012/jfsva-00108
[14] Y. T. Cheng and C. C. Yang, "The application of fuzzy correlation coefficient with fuzzy interval data," International Journal of Innovative Management, Information & Production, vol. 4, no. 1, pp. 65-71, 2013.
[15] I. M. Hanafy, A. A. Salama, and K. M. Mahfouz, "Correlation coefficients of neutrosophic sets by centroid method," International Journal of Probability and Statistics, vol. 2, no. 1, pp. 9-12, 2013. doi:10.5923/j.ijps.20130201.02
[16] B. Wu, W. Lai, C. L. Wu, and T. K. Tienliu, "Correlation with fuzzy data and its applications in the 12-year compulsory education in Taiwan," Communications in Statistics - Simulation and Computation, vol. 45, no. 4, pp. 1337-1354, 2016. doi:10.1080/03610918.2013.827712
[17] H. Lin, C. Wang, J. C. Chen, and B. Wu, "New statistical analysis in marketing research with fuzzy data," Journal of Business Research, vol. 69, no. 6, pp. 2176-2181, 2016. doi:10.1016/j.jbusres.2015.12.026
[18] R. A. Hunt, Calculus with Analytic Geometry, Harper & Row, New York, NY, USA, 1998.