Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies

In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods mainly focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power rates in comparison with the existing methods.


Introduction
Genome-wide association studies (GWAS) intend to find genetic variants such as single nucleotide polymorphisms (SNPs) associated with common traits or with complex diseases [1,2]. Association studies, where the correlation relationship between a genetic variant and a trait is evaluated, are helpful for mapping genes influencing complex diseases [3]. In the study of complex diseases, data on several correlated phenotypes or a multivariate phenotype with several components are often collected to get a better understanding of the disease [1,3,4]. Multivariate correlated traits are influenced through multiple variants simultaneously. Therefore, by a suitable joint or multivariate analysis framework of multivariate traits, we can not only gain more statistical power to identify pleiotropic effects of genetic variants on multivariate traits [3,[5][6][7][8][9][10][11][12] but also can further understand the genetic architecture of the disease of interest [5,13]. Thus, recently, the joint analysis of multivariate traits has become popular because it can increase statistical power over analyzing only one trait at a time [1,4].
Although these new developments keep many benefits, existing methods have some potential limitations [39]. Most current methods are constructed under some specific assumptions about the effects of genetic variants on multivariate traits [39]. These current approaches suffer a severe loss in power once the model assumptions are violated [26,39].
In this investigation, we develop the statistical methods for identifying pleiotropic effects of genetic variants on multivariate traits using collapsing and kernel methods with pedigree-or population-structured data. The proposed multivariate trait association method is able to handle binary phenotypes or continuous phenotypes and further can adjust for covariates. Moreover, the proposed multivariate trait association method not only can leverage the dependence on the phenotypes but also can account for the sample relatedness in the pedigree-based or population-based structured data.
The rest of the article is organized as follows. In the materials and methods section, we construct the multivariate effect model using the joint GEE model formulation (JGEE) [40]. We apply the JGEE to pedigree-or populationstructured data and introduce a retrospective framework for analyzing multivariate traits in genetic association studies. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. In the simulation studies, we examine the finite sample size performance of the proposed multivariate association methods and evaluate the comparison results with the existing method, Multi-SKAT [39]. Concluding remarks and future possibilities for continuity are given in the conclusion section and the limitation section.

Notations.
To describe the proposed multivariate trait association method based on the pedigree-or populationbased structured data, we suppose that there are N independent pedigrees and each pedigree has n i subjects. We assume that the n i subjects have been sequenced in a genetic region of interest (e.g., a gene) that contains p variants. Let y ik = ðy i1k , y i2k ,⋯,y in i k Þ T be the n i × 1 phenotype vector for the k th phenotype of the i th pedigree. Let y i = ðy i1 , y i2 ,⋯,y iK Þ be the ðn i × KÞ × 1 response vector for the K phenotypes that we are interested in. Let x im = ðx i1m , x i2m ,⋯,x in i m Þ T be the n i × 1 vector for the m th covariate of the i th pedigree. Let x i = ðx i0 , x i1 ,⋯,x iq Þ be the n i × ðq + 1Þ covariate matrix for the ðq + 1Þ nongenetic covariates that we want to adjust for. Let α k = ðα 0k , α 1k ,⋯,α qk Þ T be the ðq + 1Þ × 1 vector of regression coefficients of the ðq + 1Þ nongenetic covariates with the element α mk being the effect of the m th covariate on the k th trait. Let g i = ðg i1 , g i2 ,⋯,g ip Þ be the n i × p genetic matrix for p genetic variants in a target region of interest where g il = ðg i1l , g i2l ,⋯,g in i l Þ T is the n i × 1 vector for a genetic variant l (g ijl = 0, 1, or 2 for 0, 1, or 2 copies of the minor allele, respectively). Let β k = ðβ 1k , β 2k ,⋯,β pk Þ T be the p × 1 vector of regression coefficients of the p genetic variants with the element β lk being the effect of the l th genetic variant on the k th trait.

Multitrait Regression-Based Tests for Pedigree Data.
We let X i = I K ⊗ x i be the ðn i × KÞ × ððq + 1Þ × KÞ covariate matrix and G i = I K ⊗ g i be the ðn i × KÞ × ðp × KÞ genotype matrix for the i th pedigree where I K is an identity matrix of dimension K × K and ⊗ stands for the Kronecker product. According to the generalized linear model [41], we assume that the marginal density of y ijk is f ðy ijk Þ = exp ½fy ijk θ ijk − a ðθ ijk Þ + bðy ijk Þg/ϕ with two moments, μ ijk = Eðy ijk Þ = ∂að θ ijk Þ/∂θ ijk and Based on the joint GEE model formulation [40], we construct the multivariate linear model for describing the association relationship between K correlated traits and genetic variants, which is given as follows: where g −1 ð•Þ is the inverse function of gð•Þ and is a response-specific link function [40], the ðn i × KÞ × 1 vector of the expected mean of the multivariate traits y i = ðy T i1 , y T i2 ,⋯,y T iK Þ T , α = ðα T 1 , α T 2 ,⋯,α T K Þ T is the ððq + 1Þ × KÞ × 1 vector of regression coefficients of the ðq + 1Þ nongenetic covariates for the K correlated traits, and β = ðβ T 1 , β T 2 ,⋯,β T K Þ T is the ðp × KÞ × 1 vector of regression coefficients of the p genetic variants for the K correlated traits. Let R n i ðφÞ and R K ðγÞ be the n i × n i within-in cluster correlation matrix and the K × K multivariate-response cluster correlation matrix, which depend on a vector of parameters φ and γ, respectively. The ðn i × KÞ × ðn i × KÞ working (or approximate) covariance matrix of y i is given by [40]. where T be the ððn i × KÞ × 1Þ vector of the standard residuals with components S ik = ðS i1k , S i2k ,⋯,S in i k Þ T , k = 1, 2, ⋯, K, whereV −1 i is the inverse matrix ofV i . Here,V i and b μ i are the estimates of V i and μ i . Here and henceforth, all estimates are calculated based on the null hypothesis of the genetic effects β equal to zero. All unknown parameters and the working within-in and multivariate-response cluster correlation matrices are estimated by the R package JGEE [42].
(1) Homogeneous Kernel Statistic. We suppose that w l is a marker-specific weight of the l th variant and assume that the genetic effects on the K different phenotypes are homogeneous (i.e., β 1 = β 2 = ⋯ = β K Þ. Based on the JGEE model with the genotype as random variables considered, we propose the homogeneous quadratic (kernel) association statistic (HoK) as follows: Δ ikÂik S ik,Âik is the estimate of A ik , and b Δ ik is the estimate of Δ ik = diag ð∂ θ i1k /∂η i1k , ∂θ i2k /∂η i2k ,⋯,∂θ in i k /∂η in i k Þ that is a n i × n i diagonal matrix for the k th phenotype of the i th pedigree. The null distribution of κ Ho asymptotically follows a mixture chisquare distribution ∑ p l=1 λ l χ 2 l,1 , where χ 2 l,1 s are independent random variables following a chi-square distribution with one degree of freedom and ðλ 1 , λ 2 , ⋯, λ p Þ are nonzero eigenvalues of the null covariate matrix of Cov 0 ðZ l ,Z l ′ Þ = 2C Ho and Ω i is a n i × n i matrix of genetic correlations for all n i individuals in the i th pedigree, which has the same definition given by Schaid et al. [43] and can be calculated by the R package kinship2 [44]. When the genetic relationship between subjects j and j ′ in the i th pedigree is unknown, the elements of the genetic correlation Ω i can be estimated through genomic data [43,45], and its estimate is given by [43] b (2) Heterogeneous Kernel Statistic. We assume that the genetic effects on the K different phenotypes are heteroge-neous (i.e., β 1 ≠ β 2 ≠ ⋯ ≠ β K Þ. The heterogeneous quadratic (kernel) association statistic (HeK) is defined by whereZ lk = w lk Z lk and w lk is a marker-specific weight of the l th variant of the k th trait. The null distribution of κ He asymptotically follows a mixture chi-square distribution ∑ ðp×KÞ l=1 λ l χ 2 l,1 , where χ 2 l,1 s are independent random variables following a chi-square distribution with one degree of freedom, and ðλ 1 , λ 2 , ⋯, λ ðp×KÞ Þ are nonzero eigenvalues of the null covariate matrix of Theoretical p values of κ Ho and κ He are approximately calculated by Kuonen's saddlepoint method [46] and obtained by the R package pchisqsum. A theory for the derivation of the HoK test ðκ Ho Þ and the HeK test ðκ He Þ is shown in Appendix S1.

Burden Test.
We letg T i = ∑ p l=1 w l g T il be a weighted average of genotype scores for the i th pedigree. On the basis of the HoK test ðκ Ho Þ and the HeK test ðκ He Þ in equations (3) and (5) with the same marker-specific weight of the l th variant for each trait k (i.e., w l = w lk , k = 1, 2, ⋯, K), we propose the burden test (BT) as follows: where the null covariance matrix ofg i is given by

Computational and Mathematical Methods in Medicine
Then, The null distribution of BT asymptotically follows a chisquare distribution with one degree of freedom.

Omnibus Test.
Let p Ho , p He , and p BT denote the p values obtained by the HoK, HeK, and BT statistics. Based on the idea of the p value combination method through the Cauchy distribution [47][48][49], we propose the homogeneous omnibus test (HoO) and heterogeneous omnibus test (HeO).
(1) Homogeneous Omnibus Test. Combining the p Ho with the p BT , we construct the homogeneous omnibus test (HoO) as follows: where F −1 C stands for the inverse cumulative distribution function of the standard Cauchy distribution.
(2) Heterogeneous Omnibus Test. Combining the p He with the p BT , we construct the heterogeneous omnibus test (HeO) as follows: The null distributions of the O Ho test and the O He test asymptotically follow a standard Cauchy distribution [47][48][49]. The p values of the O Ho test and the O He test are calculated by the R package RNOmni [50].
The kernel statistic, the burden test, and the omnibus test are also applicable to the X chromosome. Additional technical information for extensions to the X chromosome is shown in Appendix S2.

Simulation Studies
We conduct the numerical simulation studies to assess the finite sample performance of the proposed methods and evaluate the comparison results with two existing methods, the minimum p value SKAT statistic (mPK), and the minimum p value burden statistic (mPB) [39]. The two existing methods are implemented by the R package Multi-SKAT [39]. Based on the similar simulation set-up as those usually considered from existing genetic association tests [39,43,51], we investigate the effect of the proposed methods, HoK, HeK, BT, HoO, and HeO, for identifying genetic variants that are associated with multiple traits. We simultaneously generate 10,000 European-like (EUR) and 10,000 admixed African American-like (AA) haplotypes of length 200 kb using a calibrated human demographic model through the COSI soft-ware [51,52]. A 3 kb region is randomly selected in our numerical simulations. We generate a total of 10,000 databases for each simulation scenario in our studies.

Type I Error Rate and Power Simulations.
In the heterogeneous population with nuclear family data considered, continuous and binary phenotypes for trait k for individual j in the i th family are generated from the multivariate linear model in equation (1) with K = 2 and n i = 3. More precisely, continuous and binary phenotypes are generated by the following linear and logit models, respectively: where For continuous traits, the error terms ε i = ðε i11 , ε i21 , ε i31 , ε i12 , ε i22 , ε i32 Þ T in equation (11) follow a multivariate normal distribution having a mean of zero, a withinin cluster correlation matrix (i.e., Corðε ijk , ε ij ′ k Þ) with diagonal entries of 1 and all off-diagonal entries of 0.2 and a subject-across-response correlation matrix (i.e., Corðε ijk , ε ij ′ k ′ Þ) with diagonal entries of 0.3 and all off-diagonal entries of 0.1. Similarly, binary traits y i in equation (12) are generated with the same within-in cluster correlation matrix (i.e., Corðy ijk , y ij ′ k Þ) and the same subject-across-response correlation matrix (i.e., Corðy ijk , y ij′k′ Þ) as the continuous traits y i in equation (11). These correlated phenotypes are generated by the R package BinNor [53].
For type I error simulations, the regression coefficients of genetic variants, β = ðβ T 1 , β T 2 Þ T , in equations (11) and (12)  corresponding to the risk or protective variant l, l = 1, 2, ⋯, p [51]. Under the assumption that the genetic effects on the two different phenotypes are heterogeneous (i.e., β 1 ≠ β 2 Þ, the genetic effects β 1 for the first traits y i1 are set as described 4 Computational and Mathematical Methods in Medicine above, while the genetic effects β 2 for the second traits y i2 are set by zero. On the other hand, under the assumption that the genetic effects on the two different phenotypes are homogeneous (i.e., β 1 = β 2 Þ, the genetic effects β 1 and β 2 for the first and second traits have the same settings as described above. We simulate 1,400 nuclear families with 800 nuclear families from the European samples and 600 nuclear families from African-American samples. The marker-specific weight w l for variant l is considered as the beta density function w l = Betaðm l , λ 1 , λ 2 Þ with shape parameters λ 1 > 0 and λ 2 > 0 [51]. To study the effect of the marker-specific weight w l of variant l on the phenotypes, we consider the unweighted marker-specific weight with w l = Betaðm l , 1, 1Þ = 1 and the weighted marker-specific weight with w l = Betaðm l , 1, 25Þ [51]. The empirical type I error rates based on fifty thousand replicates and the empirical power rates based on two thousand replicates are reported for all simulation results. The "exchangeable" and "unstructured" structures are considered for the working within-cluster and multivariate-response correlation matrices for the proposed methods, HoK, HeK, and BT, respectively. Table 1 reports the results of a simulation comparison on empirical type I error rates when the phenotypes are considered to be continuous. Table 1 displays that the proposed methods, HoK, HoO, HeK, HeO, and BT, well control the empirical type I error rates regardless of the weight of the marker-specific weight. Similarly, the existing methods, mPK and mPB, have good performance on controlling the empirical type I error rates. Our simulation results show that the seven competing methods, HoK, HoO, HeK, HeO, BT, mPK, and mPB, reasonably control the empirical type I error rates for autosome analyses with continuous traits. The seven competing approaches display similar performance in terms of the empirical type I error rates for the X chromosome analyses with continuous traits (Appendix S3: Table S1). Table 2 reports the empirical type I error rates based on the proposed methods, HoK, HeK, BT, HoO, and HeO, for the binary data. The two existing methods, mPK and mPB, aren't included for comparison. This reason is that implementing the two existing methods, mPK and mPB, via the R package Multi-SKAT [39], the MPMM (multiple phenotype mixed model) function in the R package PHENIX [54][55][56] is a necessary tool for this process. However, the MPMM function is suitable for the continuous phenotypes [56] or is suitable for the binary phenotypes with the condition that the number of cases is sufficiently large [39]. In other words, in some sense, the two existing methods, mPK and mPB, are limited to continuous phenotypes [39]. Table 2 shows that the proposed methods appropriately control the type I error rates when the marker-specific weight is considered for w l = Betaðm l , 1, 1Þ or w l = Betaðm l , 1, 25Þ for variant l for binary traits. On the other hand, the empirical type I error rates of the proposed methods for X chromosome analyses with binary traits are depicted in Table S2 in Appendix S3. These empirical type I error rates show similar results as that for autosome analyses.

Empirical Type I Error Rates.
In summary, our simulation results show that the proposed multivariate trait association methods, HoK, HoO, HeK, HeO, and BT, have reasonable control of type I error rates for continuous traits or binary traits whether the marker is X chromosomal or autosomal. On the other hand, the existing methods, mPK and mPB, yield well-controlled type I error rates for the autosome analyses or the X chromosome analyses with continuous traits (Table 1 or Table S1), regardless of the weight of the marker-specific weight. Figure 1 exhibits the comparison results of the empirical power rates for the autosome analyses with continuous traits, when the working within-cluster and multivariate-response correlation matrices of the proposed methods, HoK, HeK, and BT, are considered to be exchangeable. As expected, the empirical power rates of the seven competing methods with a weighted marker-specific weight of w l = Betaðm l , 1, 25Þ are higher than that with an unweighted marker-specific weight of w l = Betaðm l , 1, 1Þ = 1 . The heterogeneous kernel statistic (HeK) has slightly greater empirical power rates than other methods, when the genetic effects on the different phenotypes are heterogeneous (i.e., β 1 ≠ β 2 ), and causal SNPs have positive effects or negative effects on phenotypes. On the other hand, the existing method, mPB, has bigger empirical power rates, when the genetic effects on the different phenotypes are heterogeneous (i.e., β 1 ≠ β 2 ), and all causal SNPs have a positive association on phenotypes. Moreover, the empirical power rates of the homogeneous omnibus test (HoO) are larger than that of the other six competing methods, when the genetic effects on the different phenotypes are homogeneous (i.e., β 1 = β 2 ). Evidently, the seven competing methods have their respective advantages in identifying the association between genetic effects and multiple continuous traits for autosome analyses.

Empirical Power.
Similar empirical power rates are obtained from the working within-cluster and multivariate-response correlation matrices of the proposed methods, HoK, HeK, and BT, considered to be unstructured. Hence, these empirical power rates are not shown in order to save space. On the other hand, the seven competing approaches display a similar performance in testing for the X chromosome analyses with continuous traits (Appendix S3: Figure S1). Figure 2 exhibits the comparison results of empirical power rates for the autosome analyses with binary traits when the working within-cluster and multivariate-response correlation matrices of the proposed methods, HoK, HeK, and BT, are considered to be exchangeable. As a similar reason for investigating the empirical type I error rates with binary traits, the two existing methods, mPK and mPB, aren't included for power comparison. Figure 2 shows that the heterogeneous kernel statistic (HeK) and the heterogeneous omnibus test (HeO) outperform over other methods in terms of the empirical power rates, when the genetic effects on the different phenotypes are heterogeneous (i.e., β 1 ≠ β 2 ). On the other hand, the empirical power rates of the homogeneous omnibus test (HoO) are bigger than that of the other competing methods, 5 Computational and Mathematical Methods in Medicine when the genetic effects on the different phenotypes are homogeneous (i.e., β 1 = β 2 ). As expected, in general, the heterogeneous kernel statistic (HeK) is more powerful than the homogeneous kernel statistic (HoK), when the genetic effects on the different phenotypes are heterogeneous (i.e., β 1 ≠ β 2 ). On the other hand, the homogeneous kernel statistic (HoK) is more powerful than the heterogeneous kernel statistic (HeK), when the genetic effects on the different phenotypes  are homogeneous (i.e., β 1 = β 2 ). In a word, the proposed methods, HoK, HoO, HeK, HeO, and BT, have their respective merits in examining the association between genetic effects and multiple binary traits for autosome analyses.
Similarly, when the working within-cluster and multivariate-response correlation matrices of the proposed methods, HoK, HeK, and BT, are considered to be unstructured, the empirical power rates have similar results and thus  Computational and Mathematical Methods in Medicine they are omitted. On the other hand, the empirical power rates of the proposed methods for X chromosome analyses with binary traits are presented in Figure S2 in Appendix S3. These empirical power rates show similar results as that discussed in Figure 2.
In summary, the seven competing methods, HoK, HoO, HeK, HeO, BT, mPK, and mPB, have their respective merits in diagnosing whether genetic effects are associated with multiple continuous traits for autosome analyses or the X chromosome analyses. Similarly, the proposed methods, HoK, HoO, HeK, HeO, and BT, have their respective advantages in examining whether there are associations between genetic effects and multiple binary traits for autosome analyses or the X chromosome analyses.
To furthermore examine the performance of the proposed methods, additional simulation studies for continuous traits and binary traits are presented in Appendix S4 and Appendix S5 with higher correlations of phenotypes and higher dimensions of phenotypes considered, respectively. In general, these competing methods based on higher correlations of phenotypes or higher dimensions of phenotypes can provide a bigger empirical power rate for the analysis of continuous traits or binary traits. However, we note that these competing methods based on higher correlations of phenotypes or higher dimensions of phenotypes more easily have empirical type I error rate inflation at a smaller nominal level, especially for binary data analysis (Appendix S5:  Tables S5-S6 and Appendix S6: Table S7), in comparison with these methods based on lower correlations of phenotypes or lower dimensions of phenotypes. A detailed discussion of these additional simulation results is given in Appendixes S4 and S5.
However, we note that the proposed methods have a high computational cost, especially for binary data. Under our simulation setting and framework, we carry out a single simulated data set by using a computer based on one CPU core at 2.1 GHz. The average computational times of the homogeneous and heterogeneous tests with a weighted markerspecific weight w l = Betaðm l , 1, 25Þ under the alternative hypothesis for continuous data are 0.83 and 0.91 minutes, respectively, while that for binary data are 4.77 and 4.80 minutes, respectively. Therefore, in the current version, such a framework algorithm implementation is unsatisfactory for analyzing a large-scale high-dimensional data set in practice.

Conclusion
In this investigation, we develop a retrospective framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree-or population-structured data. The proposed framework, corresponding to the burden test, the kernel test, and the omnibus test, provides a sound basis for genetic association analyses for autosomes and the X chromosome. The proposed multivariate trait association methods based on the JGEE model can flexibly accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates.
One critical advantage of the proposed methods is that the homogeneous kernel statistic (HoK), the heterogeneous kernel statistic (HeK), and the burden test (BT) retain all of the benefits of the retrospective tests proposed by Schaid et al. [43] who treated the genotype data as random variables by conditioning the phenotypes as constants. On the other hand, the homogeneous omnibus test (HoO) and the heterogeneous kernel statistic (HeO) keep the advantages of the Cauchy combination tests proposed by Liu and Xie [48] who showed that the Cauchy combination tests are robust to model misspecification and robustly protect the type I error rates [49].
Another important benefit of the proposed method is that the HoK test, the HeK test, and the BT test keep the benefits of the JGEE model that validly account for complex correlations between subjects within the cluster (within-cluster correlations) and between different phenotypes from the same subjects (multivariate-response correlations). Moreover, the proposed test statistics, HoK, HeK, and BT, based on the JGEE model can efficaciously account for covariate adjustment whether the phenotypes are continuous or binary.
Our simulation studies show that an unweighted markerspecific weight w l = Betaðm l , 1, 1Þ = 1 and an exchangeable structure of the working within-cluster and multivariateresponse correlations are recommended for the practical data analysis if the data cannot sufficiently provide valid information for estimating the structures of the working withincluster and multivariate-response correlations before the start of the data analysis. Moreover, the homogeneous kernel statistic (HoK) is more robust than the heterogeneous statistic (HeK) in controlling the empirical type I errors, because the null distribution of the HeK statistic asymptotically follows a mixture chi-square distribution with a larger degree of freedom, in comparison with the null distribution of the HoK statistic. However, the HeK statistic is more powerful than the HoK statistic when the genetic effects on the different phenotypes are heterogeneous.
On the other hand, our simulation results show that for the autosome analyses or the X chromosome analyses with continuous traits, the seven competing methods, HoK, HoO, HeK, HeO, BT, mPK, and mPB, show good performance with wellcontrolled type I errors, while the seven competing methods have their respective merits for identifying the association between the genetic effects and multiple continuous traits. In addition, our simulation results show that for the autosome analyses or the X chromosome analyses with binary traits, the proposed methods, HoK, HoO, HeK, HeO, and BT, can control empirical type I errors with lower correlations of phenotypes or with lower dimensions of phenotypes ( Table 2 and Table S2), while these proposed methods have their respective advantages for identifying the genetic variants associated with multiple binary traits. However, we observe that the proposed methods, HoK, HoO, HeK, HeO, and BT, with higher correlations of phenotypes or with higher dimensions of phenotypes, more easily have the infection of empirical type I errors at a smaller nominal level (Appendix S5: Tables S5-S6  and Appendix S6: Table S7), although these method under such situations have higher empirical power rates. 8 Computational and Mathematical Methods in Medicine

Limitation
The proposed multivariate trait association methods have their limitations. First, these proposed methods cannot simultaneously include the continuous traits and binary traits in analysis. Thus, future studies are needed to extend the idea of the proposed multivariate trait association methods for simultaneously considering continuous traits and binary traits in analysis. Second, the multivariate trait association methods, based on higher correlations of phenotypes or higher dimensions of phenotypes, easily suffer from the problem of the inflated type I errors, especially when the binary traits are considered (Appendix S5: Tables S5-S6 and Appendix S6: Table S7). Although the JGEE model provides an efficient algorithm for estimating the structure of the working within-cluster and multivariate-response correlations, a large-scale pedigree study always suffers from a more complex and highdimensional structure of the within-cluster and multivariateresponse correlations in pedigree database analysis. Hence, in the future, a more effective algorithm for estimating the complicated and high-dimensional (or higher correlational) structure of the working within-cluster and multivariateresponse correlations is necessary to be proposed, especially when the analysis focuses on the binary traits. Third, in comparison with the null distribution of the homogeneous kernel statistic, the null distribution of the heterogeneous kernel statistic follows a larger degree of freedom test, which easily causes such a heterogeneous test to suffer from the problem of the type I error inflation. Therefore, overcoming the problem of the type I error inflation from the heterogeneous test is an essential part of the future work. Fourth, the proposed methods, which have a high computational cost especially for binary data, are inappropriate for analyzing large-scale high-dimensional data in practice. Thus, a more effective algorithm for reducing computational cost is needed to be proposed in further research. Moreover, the software of the proposed methods is computationally inconvenient and particularly inadequate for the mass GWAS data in practice. Therefore, the software of the proposed methods, which is convenient to be used, is a further work in the future. Fifth, our current work focuses mainly on the lowand common-frequency variants. Extension of the proposed methods to the rare variants deserves further works.

Data Availability
The data supporting the findings of this study are available within the article and its supplementary materials.

Conflicts of Interest
The authors declare that they have no conflicts of interest.