TESTING THE STABILITY OF REGRESSION PARAMETERS WHEN SOME ADDITIONAL DATA SETS ARE AVAILABLE

We consider the problem of testing the stability of regression parameters in regression lines of different populations when some additional, but unidentified, data sets from those populations are available. The standard test (T0) discards the additional data and tests the stability of the regression parameters using only the data sets from identified populations. We propose two test procedures (T1 and T2) utilizing all the available data, because the additional data may contain information about the parameters of the regression lines which are tested for stability. A power comparison among the tests is also presented. It is shown that T1 always has larger power than T0. In certain situations T2 has the largest power.


1. INTRODUCTION. Consider the regression model

$$Y_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_i) + \epsilon_{ij}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, n_i, \qquad (1.1)$$

where the $Y_{ij}$ are observations on the response variable, the $x_{ij}$ are observations on the predictor variable, $\alpha_i$ and $\beta_i$ are the regression parameters, and the $\epsilon_{ij}$ are the error terms, which are unobserved random variables. It is assumed that the errors are independent, normally distributed random variables with mean 0 and common unknown variance $\sigma^2$. For the model, $\alpha_i + \beta_i (x_{ij} - \bar{x}_i)$ is the regression line of the variable $y$ on the predictor variable $x$ for the $i$th group, $\alpha_i$ is the $y$-intercept when $x = \bar{x}_i$, and $\beta_i$ is the slope.

Suppose we have $m$ ($m \le k$) additional data sets corresponding to $m$ regression lines whose model is given by

$$Y_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_i) + \epsilon_{ij}, \quad i = k+1, \ldots, k+m, \; j = 1, 2, \ldots, n_i. \qquad (1.2)$$

We assume that the error terms $\epsilon_{ij}$ in model (1.2) are independent, normally distributed random variables with mean 0 and common unknown variance $\sigma^2$. It is further assumed that the $m$ regression lines in model (1.2) are an unknown subset of the $k$ regression lines in model (1.1). However, we cannot identify the $m$ regression lines associated with the additional data sets ($Y_{ij}, x_{ij}$, $i = k+1, \ldots, k+m$, $j = 1, 2, \ldots, n_i$). We are interested in testing the null hypothesis

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k; \; \beta_1 = \beta_2 = \cdots = \beta_k$$

against $H_a$: either $\alpha_i \ne \alpha_{i'}$ or $\beta_i \ne \beta_{i'}$ for at least one pair $(i, i')$, where $i \ne i'$, $i, i' = 1, 2, \ldots, k$, utilizing all the available data. The null hypothesis implies that all the $k$ regression lines in model (1.1) are coincident, whereas the alternative hypothesis is that at least two of the regression lines are different.

The standard test (T0) of $H_0$ against $H_a$ using the $k$ data sets ($Y_{ij}, x_{ij}$, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n_i$) is well known in the literature and has diverse applications. A biostatistician may be interested in testing the equivalence of regression lines for predicting the systolic blood pressure using age as the predictor variable for four social groups. The null hypothesis for a test of the stability of the regression parameters that generated the data sets is $H_0$: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$; $\beta_1 = \beta_2 = \beta_3 = \beta_4$. If $H_0$ is true, we use a single regression line based upon the four data sets for predicting systolic blood pressure using age as the predictor variable, Kleinbaum and Kupper [1].
An economist might be interested in testing the equivalence of multiple regression models for predicting the gross domestic product using labor and capital as predictor variables for different time periods, Maddala [2].
In this paper we consider two tests (T1 and T2) utilizing all the available data and make a power comparison between these two tests and the standard test, which is based solely on the $k$ data sets relating to the regression lines whose parameters are tested for stability. In Section 2 we determine least squares estimates of the regression parameters to obtain the test statistics for the problem. The noncentrality parameter of the tests is derived in Section 3. In Section 4 we derive our proposed tests, T1 and T2. We illustrate and compare the power of all three tests in Section 5.
The second sum on the right-hand side of (2.6) is minimized with respect to $\alpha_i$ and $\beta_i$ ($i = k+1, \ldots, k+m$) at $\hat{\alpha}_i$ and $\hat{\beta}_i$, which are defined, respectively, as in equations (2.2) and (2.3). The least squares estimates of the regression parameters $\alpha$ and $\beta$ are given by [...] where

$$n = \sum_{i=1}^{k} n_i. \qquad (2.9)$$

The conditional error sum of squares under $H_0$ is [...] (2.10). The sum of squares for testing the null hypothesis $H_0$ is [...] (2.11), where [...] (2.12). It is well known in the literature that $R_0^2/\sigma^2$ is distributed as chi-square with

$$n' = n + \sum_{i=k+1}^{k+m} n_i - 2(k+m) \qquad (2.13)$$

degrees of freedom and $SSH_0/\sigma^2$ is distributed as noncentral chi-square with $2(k-1)$ degrees of freedom. When $H_0$ is true, $SSH_0/\sigma^2$ is distributed as chi-square with $2(k-1)$ degrees of freedom. Further, $R_0^2$ and $SSH_0$ are independent; for example, see Kshirsagar [3].
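The quantities of this section can be sketched numerically. The sketch below is illustrative only: it assumes each line is fitted with the predictor centered at its own group mean, so that the intercept estimate is the group mean of the responses and the slope estimate is the centered cross-product divided by $S_i^2 = \sum_j (x_{ij} - \bar{x}_i)^2$. It also assumes that $SSH_0$ has the form the paper states for the $k+m$-line case, restricted to $k$ lines. The function names `fit_line` and `ssh0` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, int]:
    """Fit y = alpha + beta * (x - x_bar) to one data set by least squares.

    Returns (alpha_hat, beta_hat, S2, residual sum of squares, n).
    Assumes the data set contains at least two distinct x values.
    """
    n = len(x)
    x_bar = sum(x) / n
    s2 = sum((xi - x_bar) ** 2 for xi in x)  # S_i^2 (assumed definition)
    alpha_hat = sum(y) / n                   # intercept at x = x_bar
    beta_hat = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s2
    rss = sum((yi - alpha_hat - beta_hat * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))
    return alpha_hat, beta_hat, s2, rss, n


def ssh0(groups: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Sum of squares for testing that all lines in `groups` coincide (assumed form)."""
    fits = [fit_line(x, y) for x, y in groups]
    n_total = sum(f[4] for f in fits)
    s2_total = sum(f[2] for f in fits)
    # Pooled estimates: weighted means of the per-group estimates (assumption).
    alpha_pooled = sum(f[4] * f[0] for f in fits) / n_total
    beta_pooled = sum(f[2] * f[1] for f in fits) / s2_total
    return (sum(f[4] * (f[0] - alpha_pooled) ** 2 for f in fits)
            + sum(f[2] * (f[1] - beta_pooled) ** 2 for f in fits))
```

Summing the fourth element of `fit_line` over the data sets gives the pooled residual sum of squares used in the denominator of the F statistics.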
3. NONCENTRALITY PARAMETER. Here we derive the expected value of $SSH_0$ under the non-null case. It can be shown that [...] Since $SSH_0/\sigma^2$ is distributed as noncentral chi-square with $2(k-1)$ degrees of freedom, it follows from (3.9) that the noncentrality parameter is given by [...] (3.10), where [...]. See, for example, Kshirsagar [3].

The above test rejects the null hypothesis $H_0$ if $F_0 > F_{\alpha, 2(k-1), n-2k}$ and accepts $H_0$ otherwise, where $F_{\alpha, f_1, f_2}$ is the upper $100\alpha$ percentile point of the F-distribution with $f_1$ numerator degrees of freedom (ndf) and $f_2$ denominator degrees of freedom (ddf). We note that the standard test is based upon the $k$ data sets ($Y_{ij}, x_{ij}$, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n_i$) and discards the additional data ($Y_{ij}, x_{ij}$, $i = k+1, \ldots, k+m$, $j = 1, 2, \ldots, n_i$).
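For orientation, the noncentrality parameter of Section 3 can be sketched by restricting the expression the paper states in (4.7) for the $k+m$-line test to the $k$ identified lines. The weighted means $\bar{\alpha}$ and $\bar{\beta}$ and the definition of $S_i^2$ below are assumptions, chosen to match a fit centered at each group mean; they are not quoted from the paper.

```latex
\lambda = \frac{1}{\sigma^2}\left[\sum_{i=1}^{k} n_i(\alpha_i - \bar{\alpha})^2
        + \sum_{i=1}^{k} S_i^2(\beta_i - \bar{\beta})^2\right],
\qquad
\bar{\alpha} = \frac{1}{n}\sum_{i=1}^{k} n_i\alpha_i, \quad
\bar{\beta} = \frac{\sum_{i=1}^{k} S_i^2\beta_i}{\sum_{i=1}^{k} S_i^2}, \quad
S_i^2 = \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 .
```

Under this form $\lambda = 0$ exactly when all $\alpha_i$ are equal and all $\beta_i$ are equal, which is the null hypothesis.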
Consider the following test procedure (T1). Reject $H_0$ if

$$F_1 = \frac{SSH_0 / 2(k-1)}{R_0^2 / n'} > F_{\alpha, 2(k-1), n'} \qquad (4.3)$$

and accept $H_0$ otherwise. A comparison between T0 and T1 shows that both have the same ndf but that the latter has larger ddf than T0. We further note that T1 is based upon all the available data. Under the non-null case, both test statistics have noncentral F-distributions with the same noncentrality parameter, as in (3.10). Therefore the test based on $F_1$ will have larger power than the test based on $F_0$, Graybill [4].
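To make the comparison of T0 and T1 concrete, the sketch below forms both statistics from the same numerator. It assumes that the standard statistic divides by the residual sum of squares of the $k$ identified data sets alone, with $n - 2k$ ddf, while T1 divides by the residual sum of squares pooled over all $k + m$ fits, with $n'$ ddf. The form used for $F_0$ and the function name `f_statistics` are assumptions made for illustration.

```python
from typing import Tuple


def f_statistics(ssh0: float, rss_identified: float, rss_all: float,
                 k: int, n: int, n_prime: int) -> Tuple[float, float]:
    """Return (F0, F1) for testing coincidence of k regression lines.

    ssh0           -- sum of squares for testing H0 (same numerator in both tests)
    rss_identified -- residual SS from the k identified data sets (assumed F0 denominator)
    rss_all        -- residual SS pooled over all k + m data sets
    n              -- total sample size of the k identified data sets
    n_prime        -- denominator degrees of freedom n' of T1
    """
    ndf = 2 * (k - 1)
    numerator = ssh0 / ndf
    f0 = numerator / (rss_identified / (n - 2 * k))
    f1 = numerator / (rss_all / n_prime)
    return f0, f1
```

Each statistic is then compared with the upper $100\alpha$ percentile point of the F-distribution with its own ddf.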
When the $m$ regression lines in (1.2) are an unidentified subset of the $k$ regression lines in the model (1.1), testing $H_0$ against $H_a$ is equivalent to testing $H_0'$: the $k + m$ regression lines are identical against $H_a'$: at least two of them are different.

Following the procedure outlined in Section 2, it can be shown that the sum of squares for testing $H_0'$ is

$$SSH_0' = \sum_{i=1}^{k+m} n_i (\hat{\alpha}_i - \hat{\alpha})^2 + \sum_{i=1}^{k+m} S_i^2 (\hat{\beta}_i - \hat{\beta})^2,$$

where [...]. The sampling distribution of $SSH_0'/\sigma^2$, when $H_a'$ is true, is noncentral chi-square with $2(k+m-1)$ degrees of freedom and noncentrality parameter

$$\lambda' = \frac{1}{\sigma^2}\left[\sum_{i=1}^{k+m} n_i (\alpha_i - \bar{\alpha})^2 + \sum_{i=1}^{k+m} S_i^2 (\beta_i - \bar{\beta})^2\right], \qquad (4.7)$$

where [...] and [...]. We note that $SSH_0'$ can be obtained from $SSH_0$ by replacing $k$ with $k + m$. When $H_0'$ is true, the sampling distribution of $SSH_0'/\sigma^2$ is chi-square with $2(k+m-1)$ degrees of freedom. Further, $SSH_0'$ and $R_0^2$ are independent.

We use an F-test (T2) to test $H_0'$ against $H_a'$ based upon the test statistic

$$F_2 = \frac{SSH_0' / f_1'}{R_0^2 / n'}, \qquad (4.10)$$

where $f_1' = 2(k+m-1)$. We reject $H_0'$ if $F_2 > F_{\alpha, f_1', n'}$ and accept $H_0'$ otherwise. When $H_a'$ is true, the sampling distribution of $F_2$ is noncentral F with $f_1'$ ndf and $n'$ ddf and noncentrality parameter $\lambda'$. We use noncentral F-distribution tables to compute the power of the tests. The next section illustrates and compares the power of these three tests.
5. POWER COMPARISONS OF THE TESTS. When $H_0$ and $H_0'$ are not true, the test statistics (4.1), (4.3), and (4.10) follow noncentral F-distributions. The non-null distributions of the test statistics $F_0$ and $F_1$ have the same noncentrality parameter, $\lambda$, defined in (3.10). The noncentrality parameter for the non-null distribution of $F_2$ is $\lambda'$ as defined in (4.7). The ndf for both T0 and T1 is $f_1 = 2(k-1)$. For T2 the ndf is $2(k+m-1)$. T0 has ddf $f_2 = n - 2k$, while the ddf for T1 and T2 is $n'$ as defined in (2.13).
Tables 1, 2, and 3 illustrate the powers of T0 and our proposed tests, T1 and T2. We chose $\alpha = 0.05$ and situations involving $k = 4$ regression lines. The number of data sets considered from unidentified populations is $m$, where $1 \le m \le k$. For simplicity we use equal sample sizes ($n_i = 10$) for the $k$ identified populations and equal sample sizes ($n^*$) for the $m$ unidentified populations. From our earlier notation $n^* = n_{k+i}$ ($i = 1, \ldots, m$). Tables 1, 2, and 3 differ in the magnitude of $n^*$.
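The degrees of freedom for this equal-sample-size setting follow directly from $f_1 = 2(k-1)$, $f_2 = n - 2k$, the ndf $2(k+m-1)$ of T2, and $n'$ in (2.13). The short sketch below tabulates them; the function name `degrees_of_freedom` and the argument name `n_star` (the common size of the unidentified data sets) are hypothetical.

```python
from typing import Dict, Tuple


def degrees_of_freedom(k: int, m: int, n_i: int, n_star: int) -> Dict[str, Tuple[int, int]]:
    """Return {test: (ndf, ddf)} for T0, T1, T2 with equal sample sizes.

    k      -- number of identified regression lines
    m      -- number of unidentified data sets (1 <= m <= k)
    n_i    -- common sample size of the identified data sets
    n_star -- common sample size of the unidentified data sets
    """
    n = k * n_i                                   # total identified sample size
    n_prime = n + m * n_star - 2 * (k + m)        # ddf of T1 and T2, as in (2.13)
    return {
        "T0": (2 * (k - 1), n - 2 * k),
        "T1": (2 * (k - 1), n_prime),
        "T2": (2 * (k + m - 1), n_prime),
    }
```

For fixed `k`, `n_i`, and `n_star`, each additional unidentified data set raises `n_prime` by `n_star - 2` and raises the ndf of T2 by 2.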
In the tables we denote the noncentrality parameter for the power of test $T_i$ as $\lambda_i$. The power of each test is a function of $\lambda_i$ and the relevant degrees of freedom. As indicated above, $\lambda_0 = \lambda_1$. For $m = k$, each $\lambda_i$ is a specific value. For $m < k$, $\lambda_0$ and $\lambda_1$ are (the same) specific values, but $\lambda_2$ varies depending upon which unidentified populations produce the $m$ data sets. For this reason we calculate the tests' powers for selected sets of $k$ regression lines and values of $S_i^2$. The differences between the parameters of these lines together with $S_i^2$ and $\sigma^2$ affect $\lambda_i$. The parameters of the $k$ lines, $S_i^2$, and $\sigma^2$ were chosen to produce the three values indicated for $\lambda_0$, so that the power of T0 is about .25, .5, and .75. If T0 has very small power, then additional data provide very little improvement. Conversely, when the power of T0 is very large, there is little need for improvement with additional data.
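The dependence of power on the noncentrality parameter and the degrees of freedom can be explored without tables. The sketch below is a Monte Carlo substitute for a noncentral F table lookup, not the method used for the tables in the paper. It assumes the noncentrality parameter is scaled so that a noncentral chi-square variate with $\nu$ degrees of freedom has mean $\nu + \lambda$. The function names `simulate_f` and `mc_power` are hypothetical.

```python
import random


def simulate_f(ndf: int, ddf: int, lam: float, rng: random.Random) -> float:
    """Draw one F variate with noncentrality lam (lam = 0 gives a central F)."""
    # Noncentral chi-square: one normal shifted by sqrt(lam), plus a central
    # chi-square with ndf - 1 degrees of freedom.
    num = (rng.gauss(0.0, 1.0) + lam ** 0.5) ** 2
    if ndf > 1:
        num += rng.gammavariate((ndf - 1) / 2.0, 2.0)
    # Central chi-square with ddf degrees of freedom is Gamma(ddf / 2, scale 2).
    den = rng.gammavariate(ddf / 2.0, 2.0)
    return (num / ndf) / (den / ddf)


def mc_power(ndf: int, ddf: int, lam: float, alpha: float = 0.05,
             reps: int = 20000, seed: int = 0) -> float:
    """Estimate the power of a level-alpha F-test by simulation."""
    rng = random.Random(seed)
    # Estimate the critical value as an empirical quantile of the central F.
    null_draws = sorted(simulate_f(ndf, ddf, 0.0, rng) for _ in range(reps))
    index = min(int((1.0 - alpha) * reps), reps - 1)
    critical_value = null_draws[index]
    # Power: fraction of noncentral draws that exceed the critical value.
    rejections = sum(simulate_f(ndf, ddf, lam, rng) > critical_value
                     for _ in range(reps))
    return rejections / reps
```

For example, with $k = 4$ and $n_i = 10$, calling `mc_power(6, 32, lam)` approximates the power of T0, and `mc_power(6, n_prime, lam)` with the larger ddf approximates the power of T1 at the same noncentrality. The estimates carry simulation error, so small differences between tests need a large `reps` to resolve.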
Examination of the tables produces the following observations. The powers of T0 are the same in all three tables because this test ignores the additional data sets. For a given $\lambda_0$ (and $\lambda_1$) the power of T1 is always greater than the power of T0, consistent with Graybill's conclusion [4] that for a given ndf the power of the test increases as the ddf increases. Also, in each table the power of T1 increases as $m$ increases, because the ndf remains at $2(k-1)$ while the ddf increases by $n^* - 2$ with each additional data set. Likewise, for each value of $m$ the power of T1 increases from Table 1 through Table 3 because the ddf increases as a result of the $n^*$ increasing from 5 to 7 and finally to 10.
The power of T2 is heavily influenced by the choice of regression lines for the additional data when $m < k$. In each table the power of T2 does not consistently increase as $m$ increases. The increases in $\lambda_2$ and ddf are sometimes offset by the increase in ndf of $2m$. For each value of $m$ the power of T2 generally increases from Table 1 through Table 3 because of the same increase in ddf as for T1. But as Table 1 indicates, for small $n^*$ relative to $n_i$ the power of T2 may be lower than the power of T0, and is seldom much better than the power of T1. In Table 2, when the $n^*$ approaches $n_i$ in size, improvements in the power of T2 over the power of T1 are noticeable.
Table 3 indicates that when the $n^*$ equal $n_i$, the power of T2 is superior to the power of T1 except occasionally for small $m$.

6. APPLICATIONS. Using additional data from unidentified populations improves the power of the test for stability of the parameters in $k$ regression lines. The only requirement is that the error terms of the regression lines from all populations have a common variance. The power of our proposed test, T1, is always greater than the power of the standard test, T0. If $m$, the number of data sets from unidentified populations, is close to $k$ and if the $n^*$ are near the $n_i$, then T2 can produce a larger increase in the power than T1. If $m$ is small or if $n^*$ is small relative to $n_i$, then T1 may be a better choice than T2.


It can be shown that the least squares estimates of $\alpha_i$ and $\beta_i$ are given by [...]

Table 2
Power of the F-tests for $\alpha = .05$, $k = 4$, $n_i = 10$, $n^* = 7$

Table 3
Power of the F-tests for $\alpha = .05$, $k = 4$, $n_i = 10$, $n^* = 10$