Development of Integrated Choice and Latent Variable (ICLV) Models Using Matrix-Based Analytic Approximation and Automatic Differentiation Methods on TensorFlow Platform

This study further explores the multinomial probit-based integrated choice and latent variable (ICLV) models. The LDLT matrix-based analytic approximation methods, including the Mendell and Elston (ME) method, the bivariate ME (BME) method, and the two-variate bivariate screening (TVBS) method, were adapted to calculate the multivariate cumulative normal distribution (MVNCD) function in the ICLV model because of their better performance in accuracy and computational time. Integrated with the composite marginal likelihood (CML) estimation approach, the ICLV model based on high-dimensional integration can be estimated accurately within a reasonable time. In this study, several three-alternative and four-alternative ICLV models are simulated to examine their ability to recover model parameters. It is found that the parameter estimates and standard error estimates are acceptable for both models, and the computational time is expected to decrease using tensor data structures on the TensorFlow platform. For the four-alternative ICLV models, the TVBS method has the highest level of accuracy. The BME method is also a good alternative to TVBS if computational time is of great concern. The application of the automatic differentiation (AD) technique in the model can free researchers from coding analytical gradients of log-likelihood functions and thereby greatly reduce their workload.


Introduction
In the discrete choice model (DCM), the choice depends not only on the transport characteristics and the observable characteristics of decision-makers but also on unobservable characteristics, namely latent variables. Among them, psychological and attitudinal factors, as parts of the unobservable latent variables, have been proven significant when integrated into DCMs [1].
One alternative for incorporating latent variables into discrete choice models is the integrated choice and latent variable (ICLV) model [2,3], which can be viewed as a variation of the traditional structural equation model (SEM) with the advantage of involving the underlying choice process based on psychological concepts. The ICLV model has been a popular research topic in the field of transport choice studies since the early 2000s, and the application of this model has been increasing in recent years [3][4][5][6][7]. The earlier ICLV models are generally developed on the basis of the mixed logit model, of which the error terms follow independent and identical Gumbel distributions. These logit kernel-based ICLV models ignore the correlation among the utilities of alternatives and the correlation among the error terms of latent variables. Meanwhile, the models have several deficiencies in the aspects of the estimation method (i.e., the maximum simulated likelihood (MSL) approach [8,9]), the convergence of model estimation with multiple latent variables [7], and the computational time of multidimensional integration. Regarding the limitations of the above models, Bhat and Dubey [10] first proposed a multinomial probit (MNP) kernel-based ICLV formulation, which can accommodate a large number of latent variables and improves convergence and computational efficiency. In particular, the maximum approximate composite marginal likelihood (MACML) inference approach is integrated to estimate the model parameters, which further simplifies the estimation process. As the integral dimension of the log-likelihood function is independent of the number of latent variables and relates only to the number of alternatives of the choice model, the computational time of this method is relatively short. Besides, ordered and continuous response indicators for the latent variables can be incorporated into the model easily.
Up to now, the new MNP-based ICLV model has not been extensively applied [11,12], and a broad prospect remains for its use in practice.
At present, it is critical to compute the multivariate cumulative normal distribution (MVNCD) function of the proposed ICLV model, and this computation can be further improved. In some earlier studies, simulation techniques were used for evaluating the MVNCD function, the most common being the Geweke-Hajivassiliou-Keane (GHK) simulator [13]. However, since the simulation method carries a large time cost at the same level of accuracy, it may be replaced by analytic approximation methods. Mendell and Elston (ME) proposed a univariate conditioning approach early on [14], and researchers have since made a series of improvements. In the above probit-based ICLV model, the MACML approach of Bhat [15] is adopted, which combines the Switzer-Solow-Joe (SSJ) approximation for the MVNCD function with the CML inference approach for MNP models. Here, we intend to apply a new LDLT matrix-based analytic approximation method proposed by Bhat in 2018 [16] to evaluate the MVNCD function, which has been demonstrated to outperform the SSJ approximation. The method can effectively reduce the difficulties in convergence and covariance matrix computation that occur routinely in the maximum-likelihood estimation of choice models with analytically intractable likelihood functions. Meanwhile, applying the LDLT decomposition method within the algorithms also helps ensure no substantial speed reduction in each iteration.
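As background, the LDLT factorization used by the approximation method decomposes a symmetric positive-definite matrix A into L·diag(d)·L′, with L unit lower triangular. The following pure-Python sketch (an illustrative reimplementation, not the paper's tensor-based code) shows the factorization on a small correlation matrix:

```python
def ldlt(A):
    """LDL' factorization of a symmetric positive-definite matrix.

    Returns (L, d) with A = L @ diag(d) @ L', where L is unit lower
    triangular. Illustrative pure-Python sketch; a real implementation
    would use a linear algebra library (e.g. tf.linalg).
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        # pivot: remaining variance after removing earlier factors
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return L, d

# Example: a 3 x 3 correlation matrix
A = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
L, d = ldlt(A)
```

For a valid correlation matrix all pivots d[j] are positive, and multiplying L·diag(d)·L′ back recovers A.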
In the process of maximum-likelihood estimation (MLE), programming analytical gradients is generally needed to improve the speed and accuracy at convergence. However, since the derivation and coding of analytical gradients are usually complicated and time-consuming for complex models, introducing automatic differentiation (AD) into the model estimation process enables the acquisition of gradients without coding effort, which significantly reduces the workload of researchers. The AD method exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations and basic functions. The MLE procedure usually adopts the reverse mode of AD, first published in 1976 by Linnainmaa [17]. Compared with numerical differentiation, it has obvious advantages in calculating higher-order derivatives and partial derivatives with respect to multiple inputs. In the past decades, the theory of AD has been considerably developed. Nevertheless, it has had limited applications in the fields of econometrics and statistics up to now [18,19], which provides opportunities for further investigation in this respect.
Furthermore, the graph computation technique based on tensors represented as multidimensional arrays in the TensorFlow platform can avoid loop operations based on the two-dimensional matrix data structure used in conventional computational platforms. TensorFlow should have an advantage in computational speed over conventional platforms when the LDLT-based analytic approximation method is applied within a sample. On the one hand, the graph computation technique based on tensors can ensure that the entire calculation process in the LDLT-based analytic approximation is realized at the speed of C language code, rather than at the speed of scripting language code that operates on matrices repeatedly in multiple loops. On the other hand, in MLE procedures using matrix-based analytic approximations, a tensor with more dimensions is desirable because each individual observation in the sample needs a matrix-based operation for its log-likelihood computation, and such computation must be repeated as many times as the sample size. Thus, a computational platform with a tensor data structure can avoid repeated matrix operations in a large loop and further contribute to computational speed improvement. In this study, efforts will be made to explore the feasibility of MNP-based ICLV model estimation using the LDLT-based analytic approximation and tensor data structures on the TensorFlow platform. The rest of this study is organized as follows. The second section elaborates on the MNP-based ICLV model and the LDLT-based analytic approximation method and then introduces the AD mechanism. In the third section, the simulation experiments are designed for validating the three-alternative and four-alternative ICLV models. The next section presents the evaluation results of the MVNCD function and the estimation results for the ICLV model with different analytic approximation methods (ME, BME, and TVBS). Conclusions and discussions are summarized in the last section.
multivariate normally distributed: η ∼ MVN_L(0_L, Γ), where Γ is a correlation matrix. Each element η_l of η is assumed to follow the standard normal distribution. For the structural equation model, the same exogenous variables w can be used for different latent variables z*_l.

Latent Variable Measurement Equation Model
where y, δ, and ξ are (G × 1) vectors that represent the observed ordered indicator-dependent variables, the intercepts in the measurement equation, and the error terms in the measurement equation, respectively. d is a (G × L) matrix of coefficients representing the effect of the latent variables z* on the observed indicators y. Also, let Σ be the correlation matrix of ξ = (ξ_1, ξ_2, …, ξ_G), whose diagonal elements are not separately identified. Here, normalization on the error terms ξ is needed for identification, as in the usual ordered-response model. The ordered indicator y can be partitioned into the corresponding ordinal categories by setting thresholds. As per Bhat in 2014, ψ_low is a (G × 1) vector that stacks the lower thresholds ψ_{g,n_g−1} (g = 1, 2, …, G) and ψ_up is another vector that stacks the upper thresholds ψ_{g,n_g} (g = 1, 2, …, G).
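As a small illustration of the threshold mechanism, the sketch below maps a continuous measurement index into the ordinal category whose bounds are the stacked thresholds; the threshold values and the helper `ordinal_category` are our own hypothetical examples, not part of the model code:

```python
import bisect

def ordinal_category(y_star, thresholds):
    """Map a continuous measurement index y* into an ordinal category.

    thresholds must be sorted ascending; category n is assigned when
    psi_{n-1} < y* <= psi_n, with psi_0 = -inf and psi_N = +inf.
    """
    return bisect.bisect_left(thresholds, y_star) + 1

# Hypothetical thresholds for a 4-category ordered indicator
psi = [-1.0, 0.0, 1.5]
```

For instance, an index of 0.5 falls between ψ_2 = 0.0 and ψ_3 = 1.5 and is assigned category 3.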

Choice Model (Multinomial Probit Model)
where U is the (I × 1) vector of alternative utilities, x is the (I × D) matrix of exogenous variables in the choice model, and ε is the (I × 1) utility error vector following a multivariate normal distribution with variance-covariance matrix Λ. In the equation, φ is a (∑_{i=1}^{I} N_i) × L matrix of variables interacting with the latent variables, and γ is an I × (∑_{i=1}^{I} N_i) matrix of coefficients capturing the effects of the latent variables and their interactions with the exogenous variables. In a case without interactions, an (I × L) matrix of coefficients λ can be specified as λ = γφ, and the matrix φ reduces to an identity matrix of size L. Since only the covariance matrix of error differences is estimable, I(I − 1)/2 − 1 elements can be identified in Λ after normalizing the covariance matrix. The MNP-based ICLV model framework is presented in Figure 1.
In the process of model estimation, z* in equations (2) and (3) can be substituted into the right side of equation (1) to further simplify expressions. We assume that the error vectors ξ, ε, and η are independent of each other. With the resulting mean vector B and covariance matrix Ω defined accordingly, YU ∼ MVN_{G+I}(B, Ω). To estimate the model, we first need to define a matrix M of size (G + I − 1) × (G + I) to transform the choice model into the utility-difference form. When alternative i is chosen, the (G × G) submatrix of M_i formed by its first G rows and first G columns is an identity matrix, and the remaining (I − 1) × I submatrix of M_i is an identity matrix of size (I − 1) with an extra column of −1 inserted as the i-th column. After this transformation, the matrix dimension is reduced to (G + I − 1).
To integrate the multivariate OP model and the MNP model for parameter estimation, the upper and lower threshold vectors are defined, where −∞_{I−1} and 0_{I−1} are (I − 1) × 1 column vectors of negative infinities and zeros, respectively. Thus, the likelihood function of the ICLV model may be written as follows, where D_u = {u: ψ_low ≤ u ≤ ψ_up} is the integration domain and f_{G+I−1}(·) is the multivariate normal density function of dimension (G + I − 1). The integral dimension of the above likelihood function does not increase with the number of latent variables z*. To further proceed, the composite marginal likelihood (CML) approach can be used to simplify the model estimation process when the full likelihood function is infeasible to evaluate. The CML estimator is the one that maximizes the compounded probability of all pairwise events, which yields much easier-to-compute, lower-dimensional marginal likelihoods. In the above case, the pairwise marginal likelihood function for the ICLV model is written in terms of the pairwise probabilities Pr(j_g = n_g, j_{g′} = n_{g′}), where j_g represents the index of the ordinal outcome category for the g-th ordinal variable. Then, consider S_g (similar to M), an I × (G + I − 1) selection matrix that selects the g-th ordered variable and all alternatives of the choice model. Let Φ_E(·; Δ*) be the multivariate standard normal cumulative distribution function of dimension E and correlation matrix Δ* (Δ* = ω_Δ^{−1} Δ ω_Δ^{−1}, where ω_Δ is the diagonal matrix of standard deviations of the covariance matrix Δ). The L_CML(θ) function is then expanded in detail with ψ_{g,up} = S_g ψ_up, B_g = S_g B, and Ω_g = S_g Ω S_g′.
In equations (7) and (8), the CML function can be divided into two components: the first component corresponds to each pair of ordinal indicators (for which the integral dimension of the MVNCD is 2), while the second component corresponds to each pair of a choice outcome and an ordinal indicator outcome (for which the integral dimension is equal to I). Accordingly, the maximum integral dimension of the MVNCD function in equation (8) does not exceed I. It is pivotal to solve this multidimensional integral in the process of model estimation.
The matrix-based analytic approximation method proposed by Bhat [16] can be applied to evaluate the MVNCD function, as detailed in a later section. Finally, accounting for the index q for individuals, the pairwise marginal log-likelihood function is defined as log L_CML(θ) = ∑_{q=1}^{Q} log L_CML,q(θ). The pairwise estimator θ_CML obtained by maximizing the logarithm of the pairwise marginal likelihood function with respect to the vector θ is consistent and asymptotically normally distributed, with asymptotic mean θ and covariance matrix given by the inverse of Godambe's [20] sandwich information matrix G(θ) = H(θ)[J(θ)]^{−1}H(θ) [21], where H(θ) (the expected negative Hessian of the log composite likelihood) and J(θ) (the covariance of its score) can be estimated straightforwardly at the CML estimate θ_CML.
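As a small numerical illustration of the sandwich formula, the asymptotic covariance [G(θ)]⁻¹ = H⁻¹ J H⁻¹ can be computed for a hypothetical two-parameter model; the H and J values below are made up for demonstration:

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def sandwich_cov(H, J):
    """Godambe sandwich: Cov(theta_CML) ~ H^{-1} J H^{-1}."""
    Hinv = inv2(H)
    return matmul(matmul(Hinv, J), Hinv)

# Hypothetical H (negative expected Hessian) and J (score covariance)
H = [[4.0, 1.0], [1.0, 3.0]]
J = [[5.0, 0.5], [0.5, 2.0]]
V = sandwich_cov(H, J)
```

The resulting matrix is symmetric with positive diagonal entries, as a covariance estimate must be.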

Matrix-Based Analytic Approximation Method
The computation of the MVNCD function arises in the multidimensional integral of the ICLV model above. In the previous ICLV model of Bhat, the MACML approach of 2011 [15] is adopted. In this section, a new matrix-based analytic approximation method proposed by Bhat in 2018 [16] replaces the MACML approach to evaluate the MVNCD function for acquiring better estimation results [22,23].
There are five methods for approximating the MVNCD function mentioned and summarized in Bhat's 2018 paper. As per Bhat, the TVBS method is recommended as the evaluation approach for the MVNCD function with the highest accuracy, and the BME method has a significant speed advantage over the other evaluation approaches. Besides, as the foundation of this series of analytic approximation methods, the ME method can also be included in the evaluation system of this study. Thus, these three typical methods (ME, BME, and TVBS) are selected for elaboration below. Conceptually, the ME method is based on univariate conditioning, while BME and TVBS are based on a bivariate conditioning mechanism.
Let e = (e_1, e_2, …, e_K) be a multivariate normally distributed random vector with zero means, unit variances, and a correlation matrix Σ with off-diagonal elements ρ_jk; K is the dimensionality of the MVNCD function. The K-dimensional MVNCD function can be expressed as Φ_K(V_1, V_2, …, V_K; Σ) = P(e < V) = P(e_1 < V_1, e_2 < V_2, …, e_K < V_K). To facilitate understanding, let t represent the event e < V and t_k the event e_k < V_k. The joint probability of the MVNCD function for the three methods (ME, BME, and TVBS) may then be expressed as follows. In each case, the joint probability is written as the product of a marginal probability and conditional probabilities, and the normal distribution is used to approximate the skew normal distribution of the conditional probabilities. In the equations below, the superscript n of t^{(n)} is the number of truncations; n_max is the maximum number of truncations, and a lower n_max means a higher accuracy of the analytic approximation method. Let N be an intermediate parameter used to measure K: K is equal to N for the ME method, and K is treated as odd (K = 2N − 1) or even (K = 2N) for the BME (or TVBS) method.
The expected values μ_k and variances σ²_k of the single-sided truncations in the following equations can be obtained from the properties of the truncated multivariate normal distribution, which are not detailed here (refer to Section 2.1 of Bhat in 2018).
The ME method relies on a single-sided truncation of a multivariate normal distribution in which some variables are truncated while others are not. After a single variable is truncated, the residual variables follow a skew normal distribution. P(t^{(n)}_{n+1}) is defined as the univariate conditional probability of t_{n+1} after n truncations, which is taken to follow a univariate normal distribution (used to approximate the skew normal distribution). Thus, the above P(t) can be transformed into a product of recursive univariate normal cumulative distribution functions (CDFs) (represented by Φ(·)). In equation (11), K = N and n_max = N − 1.
For instance, when K = 6, the probability of the multivariate normal distribution can be written as P(t_1)P(t^{(1)}_2)P(t^{(2)}_3)P(t^{(3)}_4)P(t^{(4)}_5)P(t^{(5)}_6), where e*_k follows the skew normal distribution after (k − 1) truncations. Therein, t_2 is truncated once and t_6 five times, and equation (11) can be calculated by iteratively updating the expected value and covariance matrix. In the expression above, the maximum number of truncations n_max is 5.
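The ME recursion can be sketched in pure Python using the standard single-sided truncated-normal moment formulas; this is an illustrative reimplementation under those assumptions, not the authors' code:

```python
import math

def phi(x):  # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mvncd_me(V, R):
    """Mendell-Elston approximation of P(e_1 < V_1, ..., e_K < V_K)
    for e ~ N(0, R), R a correlation matrix.

    After each univariate truncation, the remaining (skew normal)
    variables are re-approximated as normal: limits and correlations
    are updated from the truncated-normal mean and variance.
    """
    V, R, K = list(V), [row[:] for row in R], len(V)
    p = 1.0
    for n in range(K):
        a = V[n]
        p *= Phi(a)
        if n == K - 1:
            break
        lam = phi(a) / Phi(a)
        mu = -lam                           # E[e_n | e_n < a]
        var = 1.0 - a * lam - lam * lam     # Var[e_n | e_n < a]
        s = [math.sqrt(1.0 - R[n][k] ** 2 * (1.0 - var)) for k in range(K)]
        for k in range(n + 1, K):           # re-standardized limits
            V[k] = (V[k] - R[n][k] * mu) / s[k]
        for j in range(n + 1, K):           # updated correlations
            for k in range(n + 1, K):
                if j != k:
                    R[j][k] = (R[j][k] - R[n][j] * R[n][k] * (1.0 - var)) / (s[j] * s[k])
    return p
```

With an identity correlation matrix the recursion is exact (the product of univariate CDFs); for correlated cases it is an approximation, e.g. for V = (0, 0, 0) and all correlations 0.5 the exact orthant probability is 0.25 and the ME value is close to it.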

The Bivariate ME (BME) Method
The BME method follows a bivariate truncation scheme, assuming that the marginal distribution of the third and fourth untruncated variables, given that the first two are truncated, remains bivariate normal, and so on. Similar to the ME method, P(t^{(2n)}_{2n+1}, t^{(2n)}_{2n+2}) uses the bivariate normal distribution to approximate the bivariate skew normal distribution. The probability P(t) of this method for the MVNCD function is a product of bivariate normal CDFs Φ_2(·) when the dimension K = 2N is even, and a product of Φ_2(·) terms and one Φ(·) when K = 2N − 1 is odd. Here, ρ*_{2n+1,2n+2} is the updated correlation coefficient between the error terms after bivariate truncation, and the calculation principle for the expected values μ_k and variances σ²_k is similar to that of the ME method. As an extension of the ME method, the BME method truncates two dimensions at a time, n_max times in total (as does the TVBS method), where n_max = N − 1 in equations (12) and (13).
Similarly, when K = 6, P(t) using the BME method can be expressed as P(t_1, t_2)P(t^{(2)}_3, t^{(2)}_4)P(t^{(4)}_5, t^{(4)}_6). Therein, t_3 and t_4 are truncated once (n = 1), and t_5 and t_6 are truncated twice (n = 2). The maximum number of truncations n_max is 2, which is smaller than the n_max of the ME method.
This indicates that the accuracy of the bivariate truncation scheme is better than that of the univariate truncation scheme.

The Two-Variate Bivariate Screening (TVBS) Method.
Similarly, the TVBS method uses a bivariate truncation scheme in which the bivariate distribution is assumed to be a rectangle-screened normal (RSN) distribution [24]. The corresponding conditional CDF at each step can be expressed as a ratio of a four-variate normal CDF Φ_4(·) and a two-variate normal CDF Φ_2(·). Since the calculation of the four-variate normal CDF is time-consuming, the TVBS method adopts an approximation by taking the trivariate CDF of the first three variates (P_123). Given the first three variables, the CDF of the skew normal distribution of the fourth variable is denoted as P_4|123. Accordingly, a four-variate normal CDF is obtained: P_1234 = P_123 × P_4|123 ≈ P_123 × P_34|12 / P_3|12. Further, the MVNCD function is composed of products of Φ_3(·) (the trivariate normal CDF) and ratios Φ_3(·)/Φ(·) when K is odd, with an analogous product form when K is even. In equations (14) and (15), μ and Ω represent the expected value and covariance matrix updated after bivariate truncation, respectively. Additionally, n_max = N − 1 (K even) and n_max = N − 2 (K odd) for the TVBS method.
As seen from the above discussion, the multidimensional integral in the analytic approximation of the MVNCD function can be completely converted into integrals of no more than three dimensions. At present, the CDFs of the bivariate normal (BVN) and trivariate normal (TVN) distributions are generally computed by numerical methods based on Legendre-Gauss quadrature. The numerical integration method adopted in most existing packages is relatively old, with low levels of accuracy and speed. In this study, referring to the study of Genz in 2004 [25], a new numerical integration method is used to replace the original one. For BVN probabilities, we adopted the algorithm proposed by Drezner and Wesolowsky [26] with a higher-order O(x⁴) Taylor expansion; for TVN probabilities, the second Plackett formula method studied by Gassmann [27] is applied (not detailed here). Accordingly, after the above algorithms are applied along with matrix operations, the performance of both functions is improved in terms of accuracy and speed.
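To give a feel for quadrature-based BVN evaluation, the sketch below integrates Plackett's identity dΦ_2/dρ = φ_2(a, b; ρ) with a fixed 10-point Gauss-Legendre rule. It illustrates the numerical-integration idea only; it is not the production Drezner-Wesolowsky algorithm used in the study:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# 10-point Gauss-Legendre nodes and weights on [-1, 1]
_GL = [(-0.9739065285, 0.0666713443), (-0.8650633667, 0.1494513492),
       (-0.6794095683, 0.2190863625), (-0.4333953941, 0.2692667193),
       (-0.1488743390, 0.2955242247), ( 0.1488743390, 0.2955242247),
       ( 0.4333953941, 0.2692667193), ( 0.6794095683, 0.2190863625),
       ( 0.8650633667, 0.1494513492), ( 0.9739065285, 0.0666713443)]

def bvn_cdf(a, b, rho):
    """P(e1 < a, e2 < b) for standard bivariate normal with correlation rho,
    via Phi_2(a,b;rho) = Phi(a)Phi(b) + integral_0^rho phi_2(a,b;r) dr."""
    total = 0.0
    for x, w in _GL:
        r = 0.5 * rho * (x + 1.0)      # map node from [-1, 1] to [0, rho]
        om = 1.0 - r * r
        total += w * math.exp(-(a * a - 2.0 * r * a * b + b * b)
                              / (2.0 * om)) / math.sqrt(om)
    total *= 0.5 * rho                 # Jacobian of the mapping
    return Phi(a) * Phi(b) + total / (2.0 * math.pi)
```

A quick sanity check: Φ_2(0, 0; ρ) has the closed form 1/4 + arcsin(ρ)/(2π), and ρ = 0 must reduce to the product of marginals.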

Automatic Differentiation (AD).
Automatic differentiation (AD) is a chain rule-based technique for evaluating the derivatives of functions defined by a computer program with respect to their input variables [28]. There are two different AD strategies: the forward mode and the reverse mode. The advantage of the reverse mode over the forward mode becomes apparent when the number of parameters is large, making it more suitable for the MLE procedure.
For a simple example, consider a binary probit model with choice probability Φ(x′β). The log-likelihood function of the model can be written as LL = ∑_i [y_i log Φ(x_i′β) + (1 − y_i) log(1 − Φ(x_i′β))]. In the reverse mode of AD, the derivative is computed with respect to each subexpression recursively via the chain rule. The adjoint of an intermediate variable v_i is defined as v̄_i = ∂LL/∂v_i, and the reverse accumulation over the computation graph is illustrated in Figure 2. In this figure, the partial derivative of Φ(·), namely φ(·), has a closed form, while Φ(·) itself is approximated by the numerical integration method.
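The reverse sweep of Figure 2 can be mimicked by hand for a single observation: record the forward values, then propagate adjoints backward through the graph. The sketch below is our own illustration (not TensorFlow code), and the gradient it produces can be checked against finite differences:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def probit_ll_and_grad(beta, x, y):
    """One observation of a binary probit log-likelihood, with its gradient
    obtained by reverse accumulation over the chain
      v1 = x'beta -> v2 = Phi(v1) -> LL = y*log(v2) + (1-y)*log(1-v2)."""
    # forward sweep: record intermediate values
    v1 = sum(b * xi for b, xi in zip(beta, x))
    v2 = Phi(v1)
    ll = y * math.log(v2) + (1.0 - y) * math.log(1.0 - v2)
    # reverse sweep: propagate adjoints v_bar = dLL/dv
    v2_bar = y / v2 - (1.0 - y) / (1.0 - v2)
    v1_bar = v2_bar * phi(v1)            # dPhi/dv1 = phi(v1), closed form
    grad = [v1_bar * xi for xi in x]     # dv1/dbeta_k = x_k
    return ll, grad
```

The same pattern, recorded automatically on a tape, is what a reverse-mode AD framework performs for arbitrarily complicated likelihoods.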
TensorFlow (https://www.tensorflow.org/), an open-source framework for large-scale distributed numerical computation, is widely applied in the areas of machine learning and deep neural networks. The main feature of the TensorFlow framework is that it is based on a data flow graph and uses the tensor as the core component to represent data. The framework builds a gradient graph corresponding to the original computation graph through backward propagation and the chain rule of derivative values, which is more mature and flexible than other programming modes in the application of automatic gradients. Due to these advantages of TensorFlow, we applied it to developing the econometric models in this study.
AD has obvious advantages when applied to the MLE of parameters in econometric models. Using this technique, modelers do not need to derive and code the complex analytic gradient expression of the log-likelihood function, which greatly reduces the workload. When the CML approach is used to estimate model parameters, the Hessian matrix and Jacobian matrix of the estimated parameters need to be calculated. The errors and complexity of the numerical differentiation method increase when second-order derivatives are computed, and its speed is low when calculating partial derivatives of a function with respect to many inputs. Therefore, the use of automatic differentiation instead of numerical differentiation can help attain more accurate estimation results with less computational time.

Simulation Experiments
To evaluate the ability of the proposed ICLV model to recover parameters from finite samples, it is necessary to conduct Monte Carlo simulation experiments. In the study of Bhat [10], four ordered indicator variables and three alternatives are considered. In this section, referring to Bhat's experiment and further increasing the number of alternatives, we applied the LDLT matrix-based analytic approximation method for a higher integral dimension in the simulation experiments.
Here, we design the simulation experiments and discuss the identifiability of the model system based on two assumptions (excluding those mentioned in the last section): (1) the number of indicators G will not be less than the number of latent variables L (i.e., each latent variable has at least one indicator that loads only on that variable and on no other latent variable); (2) there is no common variable between any row of the x matrix and the w vector. To explain the model more clearly and concisely, the following ICLV model is designed as an example. Consider an ICLV model composed of four latent variables z*_l, four ordered indicator variables y_g, and three alternative utilities U_i, where the structural equation model of the latent variables in equation (1) can be represented as follows. Assume that each ordered variable has only one threshold ψ_g (i.e., ψ_g = ψ_{g,2}), and the four-variate ordered probit model for the latent variable measurement model of equation (2) follows. Then, the choice model with three alternatives in equation (3) can be specified, where tt_i and tc_i are the exogenous explanatory variables in the choice model.
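The data generation for the structural equation can be sketched as below; the coefficient values are hypothetical, and the error terms are drawn independently here (a correlated Γ would require a joint multivariate draw), so this is an illustration of the generation step rather than the full experimental setup:

```python
import random

random.seed(42)

def simulate_latent(alpha, w, L):
    """One observation of the structural equation z*_l = alpha_l * w + eta_l,
    with independent standard normal eta (Gamma fixed to identity here).
    The alpha values are hypothetical, for illustration only."""
    return [alpha[l] * w + random.gauss(0.0, 1.0) for l in range(L)]

# Hypothetical setup: one exogenous variable w, four latent variables
alpha = [0.5, -0.3, 0.8, 0.2]
w = random.gauss(0.0, 1.0)
z_star = simulate_latent(alpha, w, 4)
```

Repeating this draw for each individual, then generating the ordered indicators and utilities from z*, yields one simulated dataset.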
Figure 2: Example of reverse-mode AD in the computation graph.

In the above model, the correlation matrix Σ of the multivariate ordered probit model is fixed as an identity matrix, and the correlation matrix Γ of the latent variables and the covariance matrix Λ of the choice model can be defined as follows. There are L(L − 1)/2 parameters in Γ and I(I − 1)/2 − 1 parameters in Λ that need to be estimated. The covariance matrix Ω of the whole model after substituting z* can be expressed as follows. Applying the CML approach in equations (7) and (8), the number of identifiable parameters in Ω is 16 (6 + 8 + 2) in the above example. Specifically, G(G − 1)/2 parameters of the matrix dΓd′ + Σ can be identified (the diagonal elements of the covariance matrix need to be normalized to 1 in advance), G(I − 1) parameters of dΓλ′ can be identified, where at most (I − 1) parameters can be identified in each row, and I(I − 1)/2 − 1 parameters can be identified in the matrix λΓλ′ + Λ, as mentioned in the methodology section.
As shown in Figure 3, we conduct the simulation experiments for two models: one contains three alternatives in the choice model and the other contains four, with the same settings in the latent variable structural equation model and measurement equation model, in which r_12, r_14, r_23, and r_34 are fixed at zero. For model 1 with three alternatives, the simulation experiment is undertaken 50 times for a sample size of 10,000 with different realizations of the YU vector to generate 50 different datasets. The estimator is then applied to each dataset to estimate data-specific values for the 31 parameters, and the numerical integration method for calculating the CDF of the TVN distribution is used for the MVNCD function. For model 2 with four alternatives, we perform the above data generation process once with a large sample size of 100,000 and then carry out the Monte Carlo simulation experiments on the same dataset using the ME, BME, and TVBS methods, respectively. The analytic approximation of the four-variate normal CDF P(t_1, t_2, t_3, t_4) can be written as P(t_1)P(t^{(1)}_2)P(t^{(2)}_3)P(t^{(3)}_4) for the ME method, P(t_1, t_2) × P(t^{(2)}_3, t^{(2)}_4) for the BME method, and P(t_1, t_2, t_3) × P(t^{(2)}_3, t^{(2)}_4)/P(t^{(2)}_3) for the TVBS method. In total, 37 parameters can be estimated in this ICLV model. On the other hand, to generate the datasets with low and high correlations, the positive-definite covariance matrices C can be calculated as C = RR′ + δ × diag(ru), where R represents a K × K matrix of random univariate standard normal variates and diag(ru) is a diagonal matrix with the K × 1 vector ru of standard uniform random variates on the diagonal. δ is a scalar that determines the magnitude of the diagonal elements relative to the off-diagonal elements. We use two different values of δ to distinguish matrices with low correlations (δ = 10) and high correlations (δ = 0).
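The covariance-generation step C = RR′ + δ·diag(ru) can be sketched directly; this is an illustrative pure-Python version of the construction described in the text:

```python
import random

random.seed(7)

def random_pd_cov(K, delta):
    """Positive-definite covariance C = R R' + delta * diag(ru), where R is
    a K x K matrix of standard normal draws and ru holds K standard
    uniforms. Larger delta inflates the diagonal, pushing the implied
    correlations toward zero (delta = 10 -> low, delta = 0 -> high)."""
    R = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(K)]
    ru = [random.random() for _ in range(K)]
    C = [[sum(R[i][k] * R[j][k] for k in range(K)) for j in range(K)]
         for i in range(K)]
    for i in range(K):
        C[i][i] += delta * ru[i]
    return C
```

By construction C is symmetric, and its quadratic form x′Cx = |R′x|² + δ∑ru_i x_i² is positive for any nonzero x.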
The MVNCD evaluation results from the different methods are presented in Tables 1 and 2. Here, we use the Gaussian quadrature integral based on 400 nodes as the exact value when K ≤ 3 and use the value calculated by the CDFMVNe function of Gauss (up to an accuracy of 1e−6) as the exact value when K > 3. Meanwhile, the mean absolute error (MAE) and the mean absolute percentage error (MAPE) are chosen to evaluate the accuracy of the MVNCD evaluation results. "%MAE > 0.005" represents the percentage of cases in which the absolute error exceeded 0.005, and "%MAPE > 2" represents the percentage of cases in which the absolute percentage error exceeded 2%.
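The four accuracy indices can be computed as below; the numeric values in the example are toy illustrations, not entries from Tables 1 and 2:

```python
def accuracy_metrics(approx, exact):
    """MAE, MAPE (in %), and the shares of cases with absolute error
    > 0.005 and absolute percentage error > 2%."""
    n = len(approx)
    abs_err = [abs(a - e) for a, e in zip(approx, exact)]
    pct_err = [100.0 * ae / e for ae, e in zip(abs_err, exact)]
    return {
        "MAE": sum(abs_err) / n,
        "MAPE": sum(pct_err) / n,
        "%MAE>0.005": 100.0 * sum(ae > 0.005 for ae in abs_err) / n,
        "%MAPE>2": 100.0 * sum(pe > 2.0 for pe in pct_err) / n,
    }

# toy approximate vs. exact MVNCD values, for illustration only
m = accuracy_metrics([0.101, 0.052, 0.330], [0.100, 0.050, 0.333])
```

Note that a small absolute error can still be a large percentage error when the exact MVNCD value is small, which is the pattern discussed below for high dimensions.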

The MVNCD Function Evaluation
From Table 1, the following findings can be obtained. Compared with the results calculated by the cdfBvn and cdfTvn functions in GAUSS (https://www.aptech.com/wp-content/uploads/2016/05/GAUSS_16_PDF_UG.pdf), the accuracy levels (shown as the MAE) of the BVN and TVN probabilities (based on the improvement in Section 2.3) calculated with the assistance of TensorFlow have been improved from 1e−10 to 1e−15 and from 1e−10 to 1e−11, respectively, with a slight loss in time. When the integration dimension K is 2 or 3, as shown in Table 1, the MAE accuracy level of the ME and BME methods is only 1e−3, far below that of the bivariate and trivariate normal distribution functions calculated by numerical integration. A similar conclusion can be obtained by choosing the MAPE as the index. Therefore, numerical integration should be adopted to calculate the BVN and TVN values.
When K ≥ 4, we employed the LDLT matrix-based analytic approximation method for the evaluation on the Gauss and TensorFlow platforms. From Table 2, as the number of dimensions K increases, the performance of the three methods can be tracked through the "MAE," "MAPE," "%MAE > 0.005," and "%MAPE > 2" indices. Taking the TVBS method as an example, as the dimension K goes from 4 to 15, the "MAE" decreases from 0.00117 to 0.00037, "%MAE > 0.005" decreases from 5.8% to 0.1%, and "%MAPE > 2" increases from 8.4% to 32.0%. As shown in Figure 4(a), "%MAE > 0.005" gradually decreases and "%MAPE > 2" gradually increases with increasing dimension for all three methods. To explain the intrinsic reason, we checked the data and found that the overall average of the 1000 MVNCD values calculated by the random-number generation method decreases as the dimension increases. Because of the lower MVNCD values in higher dimensions, the relative error (MAPE) tends to become larger as the dimension grows. Meanwhile, as expected, the time required by the analytic approximation gradually increases with K for all the methods. In addition, when the three analytic approximation methods are compared at the same dimension, the time consumed by the BME method is the shortest. In general, the time consumed by BME is less than half of that of the ME method, and that of the TVBS method falls in the middle. Compared with the ME method, the errors of the BME and TVBS methods are relatively low, as both are based on the bivariate truncation scheme. Overall, TVBS is the best in terms of accuracy in evaluating individual MVNCD functions. These findings are consistent with Bhat's study (2018), and the two sets of results mutually corroborate each other. Furthermore, there are some findings from comparing the calculation error and calculation time of the different analytic approximation methods (i.e., ME, BME, and TVBS) in Gauss and TensorFlow.
In terms of calculation accuracy, Table 2 shows that "MAE," "MAPE," "%MAE > 0.005," and "%MAPE > 2" of the three analytic approximation methods are basically the same on the two platforms. Therefore, although the new method improves the accuracy of the BVN and TVN probabilities in TensorFlow, its contribution to the accuracy of high-dimensional analytic approximation is limited. In terms of calculation time, Table 2 and Figure 4(b) show that, for the same dimension and algorithm, the TensorFlow platform is much faster than the Gauss platform, the average running time of Gauss being about 5-6 times that of TensorFlow. As shown in Table 1, there is no significant difference between the two platforms in the calculation time of BVN and TVN values.
Thus, the reduction in calculation time is mainly due to the clear time advantage of TensorFlow's tensor data structure and graph computation technique.
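As a rough illustration of why tensorized evaluation pays off (a NumPy stand-in, since the same principle underlies TensorFlow's tensor operations), batching many CDF evaluations into one array call replaces an interpreted per-element loop:

```python
import numpy as np
from scipy.stats import norm

# 100,000 standardized limits whose CDFs are all needed at once,
# as happens when evaluating a composite likelihood over a sample.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Per-element Python loop (slow: one interpreted call per value).
looped = np.array([norm.cdf(v) for v in x[:1_000]])

# One batched "tensor" call over the whole array (fast: vectorized in C).
batched = norm.cdf(x)
```

Both paths give identical values; only the batched one scales to large samples, and a graph-compiled framework pushes the same idea further by fusing whole chains of such operations.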

Simulation Results for ICLV Model
Following Section 3, the two ICLV models are designed to evaluate the ability to recover the model parameters. For the three-alternative ICLV model, whose maximum integral dimension is 3, the performance of the improved trivariate normal integral approach in estimating the parameters and standard errors of the ICLV model is evaluated in Table 3. The evaluation scheme is as follows. (1) For the parameter estimates, the mean estimate for each model parameter across the 50 datasets is obtained. Then, the absolute bias of the estimator is computed as Abs. Bias = |estimate value − true value| and the absolute percentage bias as APB = |(estimate value − true value)/true value| × 100. (2) For the standard error estimates, the standard deviation of each estimated parameter across the 50 datasets is computed and labeled the finite sample standard deviation, or FSSD (essentially, the empirical standard error); we also compute the mean standard error for each model parameter across the 50 datasets and denote it the asymptotic standard error, or ASE (the standard error here is computed using the Godambe (sandwich) matrix at convergence as a theoretical approximation to the FSSD). Next, to evaluate the accuracy of the ASE formula for the finite sample size used, the relative efficiency of the estimator is computed as relative efficiency = ASE/FSSD. As per Bhat [10], the closeness between ASE and FSSD may be examined through the 0.75-1.25 range for the relative efficiency value. The following aspects can be observed in Table 3 (the experiment is repeated 50 times with a sample size of 10,000). Firstly, the overall mean APB across parameters is 3.7940%, with individual parameter APBs ranging from 0.1% to 19%, close to those reported by Bhat.
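These recovery metrics are simple to compute; the sketch below (an illustrative NumPy helper with names of our own choosing) collects them for a single parameter across the simulated datasets:

```python
import numpy as np

def recovery_metrics(estimates, true_value, ase_per_run):
    """Parameter-recovery metrics for one model parameter.
    `estimates` and `ase_per_run` hold the point estimate and asymptotic
    standard error obtained from each simulated dataset."""
    estimates = np.asarray(estimates, dtype=float)
    mean_est = estimates.mean()
    abs_bias = abs(mean_est - true_value)
    apb = abs((mean_est - true_value) / true_value) * 100.0
    fssd = estimates.std(ddof=1)          # empirical (finite-sample) std. dev.
    ase = np.asarray(ase_per_run).mean()  # mean asymptotic standard error
    rel_eff = ase / fssd                  # should fall roughly in 0.75-1.25
    return abs_bias, apb, fssd, ase, rel_eff
```

A relative efficiency near 1 indicates that the sandwich-matrix standard errors track the empirical spread of the estimates across replications.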
Besides, the simulation errors are similar and are therefore considered reasonable and acceptable. Secondly, the mean APB values for the d vector (9.0%), the λ vector (4.3%), and r in the Γ matrix (9.0%) are relatively higher than the overall mean APB. This indicates that the coefficients on the latent variable vector z* in the measurement equation and the choice model are somewhat more difficult to recover than the other parameters. This is expected, since these elements enter the covariance matrix Ω in a nonlinear fashion, as shown in equation (6), and Ω itself enters the composite likelihood function (equation (8)) in a complex manner. Lastly, as for the standard error estimates, the overall mean relative efficiency (RE) across the parameters is 1.1190 (and the RE of almost all parameters is between 0.75 and 1.25), which suggests that the FSSD and ASE values are close to each other and the asymptotic formula performs well in estimating the finite sample standard errors. The simulation results (with a sample size of 100,000) of the four-alternative ICLV model estimated by the three analytic approximation methods, namely ME, BME, and TVBS, are presented in Tables 4-6, respectively. In these tables, Abs. Bias = |estimate value − true value| and APB = |(estimate value − true value)/true value| × 100.
As shown in the experimental results, all the identifiable parameters can be significantly estimated and appear consistent with their true values. The values of the absolute percentage bias (APB) fall in the range 0-15% in the three tables, with the APB of four parameters exceeding 10% (i.e., α3, α4, d2, and σ34). This echoes the earlier finding that the parameter estimators related to z* are less easy to recover. Additionally, in these three tables (also shown in Figure 5), the overall mean absolute bias of the three methods (i.e., ME, BME, and TVBS) is 0.0266, 0.0262, and 0.0259, and the mean APB is 4.269%, 4.240%, and 4.204%, respectively; both indicators decrease slightly from ME to BME to TVBS. On the other hand, the standard errors of all methods are about 0.037, which shows that the standard errors of the four-alternative ICLV model are reasonable and acceptable. Overall, we have validated the parameter-recovery ability of these three LDLT-based analytic approximation methods for the ICLV model and can conclude that TVBS is superior to BME, and BME is superior to ME. These findings are consistent with the evaluation results for individual MVNCD functions. Although the model parameters can be estimated significantly with a large sample, there remains the issue of empirical identification, namely whether the data can support the model specification. If the empirical data do not allow all the theoretically identifiable parameters to be estimated significantly, some parameters may be fixed and only the parameters of particular interest estimated.
Furthermore, further attempts were made to assess the operational efficiency of the models using the automatic differentiation technique. The convergence time of the three-alternative ICLV model in Table 3 has a median value of about 8 minutes for the case of 10,000 observations, which is much shorter than that reported by Bhat in 2014 (about one hour for the case of 1,000 observations). It can also be inferred that the computational time of the improved four-alternative ICLV model based on TensorFlow is reduced. This can be attributed to two factors: first, the LDLT-based analytic approximation method (instead of the MACML approach) shortens the calculation time of the ICLV model, as mentioned in Bhat, and the improvement of the bivariate and trivariate normal CDFs also facilitates this process; second, TensorFlow has a time advantage in calculating tensors and multidimensional arrays. In addition, without hand-coding the analytic gradient, the convergence speed of the model can be improved by using automatic differentiation to obtain the gradient of the likelihood function. For the four-alternative ICLV model with a sample size of 100,000, the convergence times of the three analytic approximation methods are likewise shortened.
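The idea behind automatic differentiation can be shown with a toy forward-mode sketch using dual numbers in plain Python (the study itself relies on TensorFlow's autodiff; all names below are illustrative). One sweep yields an exact, non-numerical gradient of a simple log-likelihood without any hand-coded derivative:

```python
import math

class Dual:
    """Minimal forward-mode AD value: x + eps*dx, with eps^2 = 0."""
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x + o.x, self.dx + o.dx)
    __radd__ = __add__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x - o.x, self.dx - o.dx)
    def __rsub__(self, o):          # o is a plain number here
        return Dual(o) - self
    def __mul__(self, o):           # product rule on the dx part
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x * o.x, self.x * o.dx + self.dx * o.x)
    __rmul__ = __mul__

def log_pdf(mu, x):
    """Standard-normal log-density of x - mu (unit variance), Dual-aware."""
    d = x - mu
    return -0.5 * d * d - 0.5 * math.log(2.0 * math.pi)

def loglik_grad(mu0, data):
    """Value and d/dmu of sum_i log phi(x_i - mu) at mu0, in one sweep."""
    mu = Dual(mu0, 1.0)             # seed the derivative direction
    ll = Dual(0.0)
    for x in data:
        ll = ll + log_pdf(mu, x)
    return ll.x, ll.dx
```

Reverse-mode AD, as used by TensorFlow, computes the same gradients but in a single backward pass over all parameters, which is what makes gradient-based maximization of the composite likelihood practical without analytic gradient code.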

Conclusions and Discussion
On the basis of the research advances of Bhat in 2018 and the automatic differentiation (AD) technique on the TensorFlow platform, this study integrates the LDLT matrix-based analytic approximation methods into the probit kernel-based ICLV formulation, which further improves the performance of the ICLV model in terms of accuracy and computational speed. The main achievements of this study can be summarized as follows. Firstly, the individual MVNCD function was evaluated at different dimensions. By improving the numerical integration algorithm for the CDFs of the BVN and TVN, the accuracy of the BVN and TVN functions was enhanced (compared with the algorithms in GAUSS). Based on the TensorFlow platform, we also tested the errors and running time of the three matrix-based analytic approximation methods in various dimensions. We found that, at the same dimension, the running time of BME is the shortest and the accuracy of TVBS is the highest. Comparing the calculation of MVNCD functions on the Gauss and TensorFlow platforms shows that TensorFlow is much faster than Gauss.
Subsequently, we designed the simulation experiments for the three-alternative and four-alternative ICLV models. The estimates and standard errors of both models are within a reasonable range, though the parameters related to the latent variables may be more difficult to recover. For the three-alternative ICLV model, the numerical integration method for the TVN value was employed, and the simulation experiments were repeated 50 times with a sample size of 10,000. The simulation results show that the overall mean APB is 3.79% and the relative efficiency is 1.1190, exhibiting a desirable capability of parameter recovery. After that, this study applied the three analytic approximation methods (i.e., ME, BME, and TVBS) to the MVNCD calculation of the ICLV model and conducted simulation experiments with a sample size of 100,000 for the four-alternative ICLV model, obtaining significant and consistent parameter estimates.
The average values of the absolute bias and APB decreased in the order ME, BME, TVBS, indicating increasing accuracy. To sum up, when estimating an ICLV model with high-dimensional integration, the TVBS method offers the highest accuracy, while the BME method is a good alternative when computational time is a concern.
Meanwhile, with the assistance of the AD technique, researchers are freed from coding analytical gradients. Compared with the probit-based ICLV model proposed by Bhat, the computational time of the proposed model is reduced remarkably, which is mainly attributable to two reasons: first, the LDLT matrix-based analytic approximation methods are adopted to evaluate the MVNCD function; second, TensorFlow has advantages in calculating tensors (represented as multidimensional arrays). With the time advantage of the LDLT-based analytic approximation methods, the estimation of a three-alternative ICLV model with a sample size of 10,000 takes only about 8 minutes, which can facilitate the application of the proposed method in practice.
In future work, we will further explore the practical application of the improved ICLV model with multiple alternatives and incorporate more psychological attitude variables into the model, so as to explore the potential influencing mechanisms of travel behavior using real data. If possible, further comparisons of the parameter estimation results between the ICLV model and previous models will be discussed.
Data Availability
The data used to support the findings of this study are simulation data and can be obtained from the corresponding author upon request.