ON THE SIGNIFICANCE LEVEL OF THE MULTIRELATION COEFFICIENT

The concept of the multirelation coefficient is defined to describe the closeness of a set of variables to a linear relation. This concept extends the linear correlation between two variables to two or more variables. Parameters of a beta distribution are determined that are utilized to approximate significance levels of the multirelation coefficient for any given number of observations and variables. A generalized Student t distribution is defined. This distribution, which is termed the multirelated t distribution, reduces to the Student t distribution for two variables. It is useful in the determination of the significance level of the multirelation coefficient.


Introduction
The concept of the multirelation coefficient is described in Drezner (1995). It gives a measure of closeness of a set of variables to a linear relationship. In order to be able to compare the significance level of two multirelation coefficients with different dimensionality, the significance level of the multirelation coefficient is needed. We evaluate the distribution of the multirelation coefficient and its fractiles for a finite number of observations. There might be ways to evaluate the limit of the distribution as the number of observations increases to infinity. Such a result might have some practical interest but is not investigated here.
First, the concept of the multirelation coefficient is introduced and then some of its properties are outlined (for a detailed discussion see Drezner (1995)). Let $\lambda(R)$ be the least eigenvalue of the correlation matrix $R$ between a given set of $k$ variables.
The multirelation coefficient, $r(Y_1, \dots, Y_k) = 1 - \lambda(R)$, is a measure of the linear relation among all the $Y_i$ for $i = 1, \dots, k$.
The following properties are proven in Drezner (1995) and help explain the role and the properties of the multirelation coefficient.
Property 3: $r(Y_1, \dots, Y_k) = 0$ iff $r_{ij} = 0$ for every $1 \le i < j \le k$.
Property 4: $r(Y_1, \dots, Y_k) = 1$ iff some vector is a linear combination of the other vectors (in other words, the vectors are linearly dependent).
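As an illustration of the definition and the properties above, the following is a minimal sketch, assuming the coefficient is one minus the least eigenvalue of the sample correlation matrix. The function name `multirelation` and the random test data are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def multirelation(data):
    """Multirelation coefficient of the columns of `data` (n observations x k variables).

    Assumption: r = 1 - (least eigenvalue of the correlation matrix).
    """
    corr = np.corrcoef(data, rowvar=False)   # k x k sample correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)   # returned in ascending order
    return 1.0 - eigenvalues[0]

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3))             # hypothetical data: 50 observations, 3 variables

# k = 2: the coefficient reduces to the absolute value of the ordinary correlation
r12 = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
two_var = multirelation(x[:, :2])

# Linear dependence: the third column is a linear combination of the first two
dependent = np.column_stack([x[:, 0], x[:, 1], 2 * x[:, 0] - x[:, 1]])
dep_r = multirelation(dependent)

print(two_var, abs(r12), dep_r)
```

For two variables the printed coefficient should match $|r_{12}|$, and for the linearly dependent columns it should equal 1 up to rounding error.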

On the Distribution of the Multirelation Coefficient
In order to be able to compare multirelation coefficients with different numbers of variables and observations, the fractiles of the multirelation coefficient are helpful. In this section we approximately calculate these fractiles for a given number of observations and variables.
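First, some properties of the eigenvalues of the correlation matrix are found. Let, for a given correlation matrix $R = \{r_{ij}\}$,
$$\overline{r^2} = \frac{1}{k(k-1)} \sum_{i \ne j} r_{ij}^2 .$$
The eigenvalues of $R$ are $\lambda_1, \dots, \lambda_k$. By matrix theory, $\sum_{i} \lambda_i = k$ and $\sum_{i \ne j} \lambda_i \lambda_j = k(k-1)(1 - \overline{r^2})$, so that
$$\sum_{i=1}^{k} \lambda_i^2 = k^2 - \sum_{i \ne j} \lambda_i \lambda_j = k^2 - k(k-1)(1 - \overline{r^2}) = k + k(k-1)\,\overline{r^2} .$$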
Therefore, we have the identities:
$$\frac{1}{k} \sum_{i=1}^{k} \lambda_i = 1, \qquad (3)$$
$$\frac{1}{k} \sum_{i=1}^{k} (\lambda_i - 1)^2 = (k-1)\,\overline{r^2}. \qquad (4)$$
Assume that the elements of $k$ vectors $Y_i$, for $i = 1, \dots, k$, of length $n$ each have a given random distribution. By (6) the eigenvalues are a sample from a multivariate distribution. The correlation matrix of the sample has off-diagonal elements equal to $-\frac{1}{k-1}$. This correlation matrix is singular. By (3) the means of the multivariate distribution are all equal to 1. It is known (Kendall, 1980) that $E(r_{ij}^2) = \frac{1}{n-1}$ for any distribution of independent $Y_i$'s. Therefore, $E(\overline{r^2}) = \frac{1}{n-1}$. By (4) the variances of the eigenvalues are all equal to $\frac{k-1}{n-1}$.

In trying to determine the type of this multivariate distribution when the $Y_i$'s are drawn from i.i.d. normal distributions, we first checked whether it can be approximated by a multivariate normal distribution (see the Appendix for computational details). We observed that the multivariate normal distribution is not a good approximation for the distribution of the eigenvalues and thus cannot be used to accurately derive the distribution of the multirelation coefficient.

In order to find a better approximation we plotted the simulation results. In Figure 1 we present the distribution of the least eigenvalue, obtained by calculating the eigenvalues of 5 by 5 correlation matrices, each computed from randomly generated vectors of 100 elements. The figure shows the frequency of eigenvalues in segments of size 0.01 (i.e., between 0 and 0.01, 0.01 and 0.02, and so on) based on 100,000 correlation matrices. This distribution is not a normal distribution, nor is it symmetric.

A discussion of the distribution of the eigenvalues of the correlation matrix can be found in Kendall and Stuart (1966). However, it does not address our particular issue of the distribution of the smallest eigenvalue. Moreover, since the mean of all eigenvalues is 1 by (3), the least eigenvalue cannot exceed 1. The distribution of the least eigenvalue therefore lies between 0 and 1. We therefore attempted to estimate the probability density function of the least eigenvalue in order to be able to calculate the significance level of the multirelation coefficient.
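The simulation behind Figure 1 and the moment results above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `simulate_eigenvalues`, the seed, and the reduced number of replications (2,000 instead of the 100,000 reported) are illustrative choices, not the authors' code.

```python
import numpy as np

def simulate_eigenvalues(k, n, reps, seed=0):
    """Eigenvalues (ascending) of `reps` sample correlation matrices,
    each built from k i.i.d. standard normal vectors of length n."""
    rng = np.random.default_rng(seed)
    eig = np.empty((reps, k))
    for i in range(reps):
        y = rng.standard_normal((n, k))
        eig[i] = np.linalg.eigvalsh(np.corrcoef(y, rowvar=False))
    return eig

k, n, reps = 5, 100, 2000    # fewer replications than in the paper, for speed
eig = simulate_eigenvalues(k, n, reps)

# The eigenvalues of a correlation matrix sum to k, so their mean is 1
max_sum_error = np.abs(eig.sum(axis=1) - k).max()

# Mean squared deviation from 1, pooled over all eigenvalues;
# its expected value is (k - 1) / (n - 1)
pooled_variance = np.mean((eig - 1.0) ** 2)
expected_variance = (k - 1) / (n - 1)

# Frequency of the least eigenvalue in segments of width 0.01 on [0, 1]
least = eig[:, 0]
counts, edges = np.histogram(least, bins=100, range=(0.0, 1.0))

print(pooled_variance, expected_variance)
```

Plotting `counts` against the left bin edges gives a histogram of the kind described for Figure 1; the two printed variances should agree to within sampling error.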
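The case $k = 2$ can be explicitly solved. For $k = 2$ the multirelation coefficient is the absolute value of the correlation coefficient. The correlation coefficient is related to the Student t distribution by the relationship:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}. \qquad (7)$$
The Student t distribution is related to the beta distribution by the following formula (Abramowitz and Stegun, 1972):
$$\Pr(|T_\nu| \le t) = 1 - I_x\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right), \qquad x = \frac{\nu}{\nu + t^2}. \qquad (8)$$
Since $\nu = n - 2$, equation (7) yields $x = 1 - r^2$, and therefore:
$$\Pr(|T_\nu| \le t) = 1 - I_{1-r^2}\!\left(\tfrac{n-2}{2}, \tfrac{1}{2}\right) = I_{r^2}\!\left(\tfrac{1}{2}, \tfrac{n-2}{2}\right). \qquad (9)$$
Comparing (9) to (8), $r^2$ is distributed according to a beta distribution with parameters $a = \frac{1}{2}$ and $b = \frac{n-2}{2}$. In conclusion, for the case $k = 2$ the square of the multirelation coefficient is distributed according to a beta distribution.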
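The equivalence for $k = 2$ can be checked numerically. The sketch below assumes SciPy is available and computes the two-sided significance of a correlation coefficient once through the Student t distribution and once through the beta distribution with $a = 1/2$ and $b = (n-2)/2$; the function name and the sample values $r = 0.4$, $n = 30$ are illustrative.

```python
import math
from scipy import stats

def significance_two_variables(r, n):
    """Two-sided significance of a sample correlation r from n observations,
    computed via the Student t and via the beta distribution of r^2."""
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    via_t = 2.0 * stats.t.sf(t, df=n - 2)                 # P(|T| > t)
    via_beta = stats.beta.sf(r * r, 0.5, (n - 2) / 2.0)   # P(beta variate > r^2)
    return via_t, via_beta

via_t, via_beta = significance_two_variables(r=0.4, n=30)
print(via_t, via_beta)
```

The two printed values should coincide up to floating-point error.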
Examination of many graphs of the distribution of the least eigenvalue led us to conclude that a beta distribution may be used to estimate the distribution of the multirelation coefficient or of its square. In Figure 2 the frequency of the least eigenvalue is compared with the beta distribution having the mean and the variance of the simulated values. The fit justifies the exploration of the beta distribution as an approximation to the distribution of the least eigenvalue. However, since $r^2$ actually follows a beta distribution for $k = 2$, and the square of the multiple correlation coefficient also follows a beta distribution with $a = \frac{k-1}{2}$ and $b = \frac{n-k}{2}$ (Stuart and Ord, 1991; Kendall and Stuart, 1966), we investigated a possible fit of a beta distribution to the distribution of $r^2$ rather than $r$.

In order to be able to apply the beta distribution for the calculation of probabilities involving the multirelation coefficient, estimates for the parameters $a$ and $b$ of the distribution are required for given $k$ and $n$ (rather than being estimated by simulation each time). We calculated these parameters for $k = 2, 3, \dots, 10$ and $n = 10, 20, \dots, 100$, using only pairs $(k, n)$ for which $n > 3k$. The simulation was performed as follows.

For a given $k$ and $n$, a matrix of size $k$ by $n$ is generated using standard generation techniques (Law and Kelton, 1991; Marse and Roberts, 1983). The elements of this matrix are drawn from a standard normal distribution. The correlation matrix is calculated and the multirelation coefficient found.
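The simulation step just described can be sketched as follows. Two points are assumptions rather than facts from the paper: the coefficient is taken as one minus the least eigenvalue of the correlation matrix, and the beta parameters are obtained by matching the sample mean and variance of $r^2$ (the method of moments), which is how the comparison in Figure 2 is described but is not confirmed as the procedure behind Table 1. Function names and the seed are illustrative.

```python
import numpy as np

def simulate_r_squared(k, n, reps, seed=0):
    """Squared multirelation coefficients of `reps` random k-by-n normal matrices."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for i in range(reps):
        y = rng.standard_normal((k, n))   # k by n matrix of standard normal draws
        corr = np.corrcoef(y)             # rows are the variables
        out[i] = (1.0 - np.linalg.eigvalsh(corr)[0]) ** 2
    return out

def beta_moment_fit(samples):
    """Beta parameters (a, b) whose mean and variance match the sample (assumed method)."""
    m = samples.mean()
    v = samples.var(ddof=1)
    common = m * (1.0 - m) / v - 1.0      # estimate of a + b
    return m * common, (1.0 - m) * common

# For k = 2 the exact parameters are a = 1/2 and b = (n - 2)/2 = 9
a2, b2 = beta_moment_fit(simulate_r_squared(k=2, n=20, reps=20000))
print(a2, b2)
```

Running the fit for $k = 2$ gives a simple sanity check, since the exact parameters are known in that case.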
For each case we simulated 20 sets of 50,000 matrices each, for a total of one million matrices for each result in Table 1. In the table we report the mean and standard error (standard deviation of the 20 sets divided by $\sqrt{20}$) for $a$ and $b$ calculated for each pair of $k$ and $n$. A curve fitting using multiple regression was performed on these means. Since regression analysis assumes uniform variance for all points, we regressed on $\frac{a - 0.5}{k - 2}$ for $a$, and on $\frac{b}{n-2} - 0.5$ for $b$, using only the points for $k \ge 3$. These

Calculating Approximate Fractiles
Calculating the significance level of a certain value of the multirelation coefficient can be done by the following algorithm:
1. A multirelation coefficient of $r$ was obtained for given values of $k$ and $n$.
2. Estimate the values of $a$ and $b$ using equation (9).
3. Estimate the significance as $1 - I_{r^2}(a, b)$, where $I_x(a, b)$ is the incomplete beta function (Abramowitz and Stegun, 1972).

Calculating the critical value of $r$ for a given significance can be done as a binary search on the segment $[0, 1]$ using the algorithm.
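The algorithm and the binary search can be sketched as follows, assuming SciPy's `betainc` (the regularized incomplete beta function $I_x(a, b)$). The parameters $a$ and $b$ are taken as inputs, because the fitted regression formula is not reproduced here; the function names and the tolerance are illustrative. The example uses the exact $k = 2$ parameters, $a = 1/2$ and $b = (n-2)/2$.

```python
from scipy.special import betainc  # regularized incomplete beta function I_x(a, b)

def significance(r, a, b):
    """Significance of a multirelation coefficient r: 1 - I_{r^2}(a, b)."""
    return 1.0 - betainc(a, b, r * r)

def critical_value(alpha, a, b, tol=1e-10):
    """Binary search on [0, 1] for the r whose significance equals alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if significance(mid, a, b) > alpha:
            lo = mid   # mid is not yet significant: the critical value is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 30
a, b = 0.5, (n - 2) / 2.0          # exact parameters for k = 2
r_crit = critical_value(0.05, a, b)
print(r_crit)
```

The search relies on the significance being a decreasing function of $r$ on $[0, 1]$, which holds because $I_x(a, b)$ is increasing in $x$.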
We tested this procedure on various values of $n$, $k$ and $\alpha$ and compared the critical value of $r$ obtained by the algorithm with a simulation of 10,000 matrices. The comparison is reported in Table 2. The simulated fractiles are given in parentheses next to the calculated fractiles.
We know that for $k = 2$ the quantity $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ follows a Student t distribution. We have found that the same quantity is well behaved for the multirelation coefficient fractiles. It can be used to estimate fractiles for values of $n$ which are not reported in Tables 2 and 3. We define these values as the Multirelated t Fractiles. In Table 3 we give the calculated values for these Multirelated t fractiles.
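The transformation between a fractile of $r$ and the corresponding t-type quantity, and its inverse, can be written as a short sketch. The function names are illustrative; the inverse follows by solving $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for $r$.

```python
import math

def r_to_t(r, n):
    """Transform a coefficient r (0 <= r < 1) to t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def t_to_r(t, n):
    """Inverse transform: r = t / sqrt(t^2 + n - 2)."""
    return t / math.sqrt(t * t + n - 2)

t_value = r_to_t(0.5, 50)
r_back = t_to_r(t_value, 50)
print(t_value, r_back)
```

Given a tabulated t-type fractile for some $k$, `t_to_r` converts it to a fractile of $r$ for a value of $n$ that is not tabulated directly.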

An Example
In Drezner (1995) an example taken from Kendall (1980) was used to demonstrate the concept of the multirelation coefficient. Fifteen traits of applicants were tested for relationships. The traits were:

The data consisted of 48 applicants and the correlation matrix between these traits is given (Drezner, 1995; Kendall, 1980). Note that two entries should be corrected in Drezner (1995): $r_{5,14} = 0.48$, $r_{11,15} = 0.43$. A procedure similar to backward step-wise regression was presented in Drezner (1995). Variables which are not associated with the rest were dropped one by one according to a certain rule. The issue of which of the subsets is best remained unresolved. In order to determine which of the subsets is the most significant, we need to calculate the significance level of the multirelation coefficient for each subset. The present paper provides us with the tools necessary to make such a determination. In Table 4 we give the original subsets presented in Drezner (1995) with their calculated multirelation coefficients, as well as the significance level of each multirelation coefficient calculated by the method presented in this paper. Other methods for subset selection may be considered as well, possibly yielding better results.
By examining Table 4 it is clear that the best subset is the subset of 6 variables (Likability, Self-confidence, Lucidity, Honesty, Ambition, Grasp) because it yields the best significance level.
Since this particular problem with $k = 15$ has only $2^{15} - 16 = 32{,}752$ possible subsets (excluding subsets of fewer than two members), it is feasible to calculate the multirelation coefficient for each subset and select the best one. In Table 5 we report the best multirelation coefficient for subsets of 2, 3, ..., 15 elements, a list of the members of that subset, and the corresponding significance. Note that the significance levels are quite small. However, these values should be quite accurate because both the beta distribution and the theoretical multirelation distribution are anchored to zero at both ends of the segment $[0, 1]$. Such small values of significance cannot be verified by simulation. Some of the groups have an improved significance level. However, the best group obtained by this analysis is still the same group of 6 variables. We conclude that the step-wise backward procedure is effective.
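The exhaustive search over subsets can be sketched as follows. This is an illustration only: the data are randomly generated (hypothetical, 6 variables rather than the 15 traits of the example), the coefficient is assumed to be one minus the least eigenvalue of the sub-matrix, and `best_subsets` is an illustrative name. The significance of each reported subset would then be computed separately from the beta parameters for its size.

```python
from itertools import combinations
import numpy as np

def best_subsets(corr):
    """For each subset size m >= 2, find the subset of variables with the
    largest multirelation coefficient. Returns (best, number examined)."""
    k = corr.shape[0]
    best = {}
    examined = 0
    for m in range(2, k + 1):
        for subset in combinations(range(k), m):
            idx = np.array(subset)
            sub = corr[np.ix_(idx, idx)]              # correlation sub-matrix
            r = 1.0 - np.linalg.eigvalsh(sub)[0]      # assumed definition of r
            examined += 1
            if m not in best or r > best[m][0]:
                best[m] = (r, subset)
    return best, examined

rng = np.random.default_rng(1)
data = rng.standard_normal((48, 6))                       # hypothetical data
data[:, 3] = data[:, 0] + 0.3 * rng.standard_normal(48)   # relate variables 0 and 3
corr = np.corrcoef(data, rowvar=False)

best, examined = best_subsets(corr)
for m in sorted(best):
    print(m, round(best[m][0], 4), best[m][1])
```

For $k$ variables the loop examines $2^k - k - 1$ subsets, which is 32,752 for $k = 15$.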

Appendix
In Johnson and Kotz (1972) there are some simplified formulas when all the correlation coefficients are equal to each other (and in our case they are all equal to $-\frac{1}{k-1}$). The case when all the $\rho$'s are positive is relatively simple. Define $\Phi(h, k, \rho) = \Pr(X_i \le h, \text{ for } i = 1, \dots, k)$ when the correlation coefficient between $X_i$ and $X_j$ is equal to $\rho$ for all $i \ne j$. The case $k = 1$ represents the univariate normal distribution: $\Phi(h) = \Phi(h, 1, \rho)$ for any $\rho$. For $\rho \ge 0$ (Johnson and Kotz, 1972):
$$\Phi(h, k, \rho) = \int_{-\infty}^{\infty} Z(x) \left[ \Phi\!\left( \frac{h + \sqrt{\rho}\,x}{\sqrt{1-\rho}} \right) \right]^{k} dx, \qquad (11)$$
where $Z(X)$ is the standard normal density function. The integral (11) can be calculated using Gaussian quadrature formulas based on Hermite polynomials (Abramowitz and Stegun, 1972). For a negative $\rho$ a recursion formula by $k$ is given:
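For the non-negative case, the Gauss-Hermite evaluation mentioned above can be sketched as follows. The sketch assumes the standard single-integral form for equicorrelated normal variables, $\int Z(x)\,[\Phi((h + \sqrt{\rho}\,x)/\sqrt{1-\rho})]^k\,dx$, and uses NumPy's `hermgauss` nodes and weights; the function names and the choice of 64 nodes are illustrative. The negative-correlation recursion is not covered.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def equicorrelated_probability(h, k, rho, nodes=64):
    """Pr(X_i <= h, i = 1..k) for standard normals with common correlation rho,
    valid for 0 <= rho < 1, by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(nodes)   # rule for the weight exp(-x^2)
    total = 0.0
    for xi, wi in zip(x, w):
        u = math.sqrt(2.0) * xi                     # rescale to the normal density
        arg = (h + math.sqrt(rho) * u) / math.sqrt(1.0 - rho)
        total += wi * normal_cdf(arg) ** k
    return total / math.sqrt(math.pi)

p_two = equicorrelated_probability(0.0, 2, 0.5)     # exact value is 1/3
p_indep = equicorrelated_probability(1.0, 3, 0.0)   # equals normal_cdf(1.0) ** 3
print(p_two, p_indep)
```

The integrand becomes steeper as $\rho$ approaches 1, so more nodes may be needed for strongly correlated variables.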

Property 1: $0 \le r(Y_1, \dots, Y_k) \le 1$.
Property 2: $r(Y_1, \dots, Y_{k-1}) \le r(Y_1, \dots, Y_k)$.

† Part of this research was done while the second author was on sabbatical leave at the Hong Kong University of Science and Technology, Kowloon, Hong Kong.

Figure 2. Distribution of the smallest eigenvalue and its beta approximation

Table 1. Means and standard errors of the beta parameters

Table 2. Fractiles of the multirelation coefficient and simulation results

Table 3. Fractiles of the Multirelated t distribution

Table 4. Significance levels for the example problem

Table 5. Best significance levels for the example problem