Latent Growth Curve Modeling for COVID-19 Cases in Presence of Time-Variant Covariate

For the past two years, the entire world has been fighting the COVID-19 pandemic. The rapid increase in COVID-19 cases can be attributed to several factors, and recent studies have shown that changes in environmental temperature are associated with the growth of cases. In this study, we modeled the monthly growth of COVID-19 cases per million people (CPM) in 126 countries using various growth curve models within the structural equation modeling framework. Environmental temperature was also introduced as a time-varying covariate to improve the performance of the models. The parameters of the growth curve models were estimated, and the results are discussed for the affected countries from August 2020 to July 2021.


Introduction
The coronavirus disease (COVID-19) was first reported in Wuhan city, China, where several individuals linked to Wuhan's seafood market were identified with viral pneumonia of unknown cause [1][2][3]. Over the next few months, the virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), spread to other cities in China and then worldwide. On 30 January 2020, the Director-General of the World Health Organization (WHO) declared the outbreak of COVID-19 a public health emergency of international concern. By March 2020, more than one hundred countries were facing challenges due to this virus, and COVID-19 cases had been identified almost all over the world. At that time, no specific vaccines against SARS-CoV-2 were available. In the absence of specific therapeutic drugs or vaccines, controlling the spread of SARS-CoV-2 was nearly impossible, as no country's health management system was sufficient to deal with this pandemic [4][5][6][7].
According to WHO reports [8], more than 214 million cases had been reported globally by the end of August 2021, with around 4.47 million deaths. The growth rates of infections and deaths differed across regions. Several factors can affect the growth rate of cases in a region or country, such as its health management system, government policy, and environmental factors. In the initial phase, partial or complete lockdowns and quarantine played an important role in controlling the spread of the virus. Bacchetti et al. [9] showed that, at the early stage of the pandemic, lockdown was highly effective in reducing mortality in more polluted areas. Moreover, Marquez et al. [10] concluded that air pollution results in a higher incidence of and mortality from COVID-19. Azuma et al. [11] studied the role of various environmental factors in the transmission of SARS-CoV-2 in indoor spaces.
A study of COVID-19 cases and related meteorological factors in 122 cities of China during the first quarter of 2020 found no evidence that case counts decline as the weather becomes warmer [12]. In contrast, an earlier laboratory study by Casanova et al. [13] showed that SARS-CoV can be inactivated rapidly as temperature increases from 4°C to 40°C. Even though the data for the second quarter of 2020 were quite limited, Mandal and Panwar [14] and Shao et al. [15] suspected that the spread of SARS-CoV-2 may also be affected by changes in temperature. Thereafter, many researchers established an association between temperature and COVID-19 cases [16][17][18], and the relationship between the two in the presence of other factors was investigated for specific geographical regions [19][20][21][22].
Modeling respiratory diseases has long been a priority for researchers, and the outbreak of COVID-19 presented a new challenge. In the past few months, various approaches have been used to fit the growth of COVID-19 cases over time. Balli [23] proposed a time-series prediction model to obtain the disease curve and predict the pandemic trend using machine learning methods, namely linear regression, multilayer perceptron, random forest, and support vector machine. The susceptible-infected-recovered (SIR) model is a well-known and widely used model for respiratory diseases. The classic SIR model was updated by incorporating four new factors that are crucial for fitting COVID-19 case data [24], and several works have modified the SIR model in a similar manner [25][26][27].
Using the generalized logistic and generalized Richards models, Wu et al. [28] fitted the COVID-19 cases in China and then performed a similar exercise for 33 other countries that were at a less advanced stage at that time. Several fractional-order dynamical models for analyzing the spread of the virus have also been proposed [29][30][31][32][33]. Few researchers have attempted to model the dynamics of COVID-19 cases in the presence of environmental temperature. Shi et al. [34] used a modified susceptible-exposed-infectious-recovered (M-SEIR) model that incorporates temperature to simulate the COVID-19 outbreak dynamics in Wuhan. In another study, the associations between temperature and epidemiological parameters of the dynamics of new cases were examined using an autoregressive integrated moving average (ARIMA) model [35]. Moreover, Shah et al. [36] proposed a compartmental mathematical model for the transmission dynamics of COVID-19 with the Caputo fractional-order derivative.
Hilbert-type inequalities play a major role in mathematics, for example in complex analysis, numerical analysis, and the qualitative theory of differential equations and their applications [37][38][39].
Generally, a time-series, cross-sectional, or longitudinal approach is used when a response variable is observed over time. Each of these approaches has its own conditions of suitability, advantages, and disadvantages. In this study, we use structural equation modeling (SEM) with longitudinal data; such models are generally known as latent curve or growth curve models (GCMs). The rest of the article is organized as follows. In Section 2, the data are explored using plots of cases per million (CPM) and temperature over the months. Section 3 builds various GCMs using the data for all countries and selects the most suitable one for further analysis. In Section 4, temperature is added to the model as a time-varying covariate to improve the performance of the selected GCM. Section 5 discusses the results and their interpretation, and Section 6 summarizes and concludes the article.

Exploratory Data Analysis
In this study, the data on global COVID-19 cases were obtained from https://ourworldindata.org. A total of 126 countries were considered, with cases recorded from August 2020 to July 2021. The CPM for a month is the number of cases per million recorded on the fifteenth day of that month. Correspondingly, the monthly temperature was collected for the capitals of all considered countries from https://www.weather-atlas.com; the temperature value for a month is the average temperature in that month.
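The paper does not describe how the daily Our World in Data series was reduced to one value per month. The following Python sketch shows one way to apply the stated rule (the value on the fifteenth day of each month), assuming the daily series for one country has already been loaded into a dictionary that maps `datetime.date` to a CPM value. The function name `mid_month_values` and this input layout are hypothetical.

```python
from datetime import date


def mid_month_values(daily, start_year=2020, start_month=8, n_months=12, day=15):
    """Return the value recorded on `day` of each month, starting at start_year/start_month.

    `daily` is a hypothetical {datetime.date: float} mapping for one country.
    Months with no record on that day yield None rather than an interpolated value.
    """
    year, month = start_year, start_month
    values = []
    for _ in range(n_months):
        values.append(daily.get(date(year, month, day)))
        month += 1
        if month > 12:  # roll over into the next calendar year
            year, month = year + 1, 1
    return values


# Toy series: only three mid-month records are present.
daily = {date(2020, 8, 15): 10.0, date(2020, 9, 15): 12.5, date(2021, 7, 15): 99.0}
print(mid_month_values(daily))
```

Returning None for a missing day keeps gaps visible instead of silently filling them.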
Before starting the analysis, we explore some features of the data that are not immediately apparent. Figure 1 shows the monthly CPM trajectories of all countries from August 2020 to July 2021. During this period, the growth of CPM is high in the first month in all countries and then stabilizes in most countries over the next few months; nevertheless, many countries experienced sudden rapid growth in CPM in the last few months of the period. Figure 2(a) shows box plots of the CPM distribution in each month. The median and mean of CPM increase over the months, and the mean is considerably larger than the median in every month, so the distribution is positively skewed in all months. Moreover, both the median and the dispersion increase substantially over the months. In the first few months, a few countries with very high CPM appear as outliers; in the last months, however, almost all of them are in line with the rest of the sample. Figure 2(b) shows the correlation matrix of CPM across months. Except for the first three months, the correlation is high for months that are close in time and tends to decrease as the separation between measurement months increases. In contrast, for the first few months, the correlation with subsequent months first decreases and then increases again. This is weak evidence, but it is a first indication that season may be associated with the growth of CPM. Table 1 gives some basic statistics describing the observed data.
The distribution of global temperature over the months is shown in Figure 3. Figure 3(a) shows that the global temperature distribution shifts upward in the first half of the year and downward in the second half. However, this does not mean that all countries follow the same pattern, as can be seen from Figure 3(b), which shows a density plot for each month. The density plots are multimodal because each country contributes to the monthly temperature distribution according to its position on the globe. In general, temperature decreases as latitude increases, and this latitude-dependent pattern of temperature may have a significant impact on the growth of CPM in the respective countries.
Computational Intelligence and Neuroscience

Model Building and Elicitation
The traditional methods for studying change in linear and nonlinear frameworks are regression and analysis of variance (ANOVA). These approaches deal mainly with mean-level differences, and changes among individuals are observed only through the residuals. To use the information in the residuals, several methods, such as random-effects ANOVA, multilevel modeling, and hierarchical linear modeling, have been proposed. These models explore differences among individuals with the help of random coefficients. However, such models are limited to a single response variable, which cannot capture all the complexities of a growth model (see [40]).
Since the objective of this study is to examine intraindividual change and interindividual differences for all countries over the study period, we adopt structural equation modeling for longitudinal data. GCMs, which are generally applied in the social and behavioral sciences, are used to study such changes. In a GCM, response variables are observed over ordered time periods, and time-invariant or time-variant covariates may also be present. The basics of GCMs and hypothesis testing for individual change and interindividual differences are discussed and derived in [41]. In [42], the authors discussed GCMs for different models and analyzed cortisol production data. For more details on longitudinal studies using growth curve models, see [43][44][45][46]; for models based on latent variables with the corresponding R code, see [47].
Since the primary objective of this study is to find a suitable model for the CPM trajectories shown in Figure 1, this section introduces several candidate GCMs (linear, exponential, latent, and multiphase) that could fit CPM well. The general structure of the GCM can be written as

$$C_n[t] = \Lambda_0[t]\,\tau_{0n} + \Lambda_1[t]\,\tau_{1n} + \cdots + \Lambda_k[t]\,\tau_{kn} + \varepsilon_n[t], \qquad (1)$$

where $C_n[t]$ is a multioccasion vector whose elements are the observed CPM values of the $n$th country in the $t$th month. The basis vectors $\Lambda_0, \Lambda_1, \ldots, \Lambda_k$ are collectively responsible for intraindividual change, i.e., they capture how CPM grows in a country over the months, and they define the shape of that change (for example, linear or exponential). The latent (unobserved) variables $\tau_0, \tau_1, \ldots, \tau_k$ capture interindividual differences in intraindividual change among countries. In model (1), each interindividual difference variable $\tau_j$ is associated with the corresponding intraindividual change vector $\Lambda_j$. Generally, the latent variables $\tau_0, \tau_1, \ldots, \tau_k$ are assumed to follow a multivariate normal distribution with mean vector $(\mu_0, \mu_1, \ldots, \mu_k)$ and variances and covariances $\sigma_{ij}$, $i, j = 0, 1, \ldots, k$.
The mean vector captures the average pattern of intraindividual change, whereas the variances and covariances represent the extent to which countries differ in these aspects and how those differences are related. The time-dependent residual $\varepsilon_n[t]$ is assumed to have mean 0 and the same variance, $\sigma^2$, at each occasion, and to be uncorrelated with the other variables. We first describe the linear, exponential, latent, and multiphase GCMs as special cases of (1) and then select the best-fitting model for CPM.
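Under the assumptions just stated (residuals with mean 0, a common variance $\sigma^2$, and no correlation with the latent variables or across occasions), model (1) implies that the mean of $C_n[t]$ is $\sum_j \Lambda_j[t]\,\mu_j$ and that the covariance between months $s$ and $t$ is $\sum_{i,j} \Lambda_i[s]\,\sigma_{ij}\,\Lambda_j[t]$, plus $\sigma^2$ when $s = t$. The plain-Python sketch below computes these implied moments. It illustrates the algebra only and is not the estimation routine used in the study; the function name and toy inputs are hypothetical.

```python
def implied_moments(basis, mu, phi, sigma2):
    """Model-implied mean vector and covariance matrix of C_n[t] under model (1).

    basis : list of K basis vectors Lambda_k, each of length T
    mu    : list of K latent means mu_k
    phi   : K x K latent covariance matrix (sigma_ij)
    sigma2: residual variance, assumed equal at every occasion
    """
    K, T = len(basis), len(basis[0])
    mean = [sum(basis[k][t] * mu[k] for k in range(K)) for t in range(T)]
    cov = [
        [
            sum(basis[a][s] * phi[a][b] * basis[b][t] for a in range(K) for b in range(K))
            + (sigma2 if s == t else 0.0)  # residual variance only on the diagonal
            for t in range(T)
        ]
        for s in range(T)
    ]
    return mean, cov


# Toy linear GCM with three occasions (t = 0, 1, 2).
mean, cov = implied_moments(
    basis=[[1, 1, 1], [0, 1, 2]], mu=[10, 2], phi=[[4, 1], [1, 1]], sigma2=0.5
)
print(mean)  # [10, 12, 14]
print(cov)
```

Fitting a GCM amounts to choosing the parameters so that these implied moments match the observed means and covariances as closely as possible.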

Linear Growth Curve Model.
In a linear GCM, the outcome variable changes along a straight line, which may be increasing, decreasing, or constant over time. From (1), a linear GCM is described by two basis vectors, $\Lambda_0$ and $\Lambda_1$. Since the model is applied to the 12 months from August 2020 to July 2021, coded $t = 0, 1, \ldots, 11$, we have

$$\Lambda_0 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]', \qquad \Lambda_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]',$$

where $\Lambda_0$ describes the initial level of the outcome variable when the other effects are 0, and $\Lambda_1$ governs the growth or decline in $C_n[t]$. Thus, countries can differ from each other in two ways: in their latent intercept ($\tau_0$) and in their latent slope ($\tau_1$). All entries of $\Lambda_0$ are fixed to 1, meaning that the intercept contributes equally to the measures in every month. The linear model can be extended to a quadratic GCM by adding a third basis vector, $\Lambda_2$, whose entries are the squares of those in $\Lambda_1$. In the quadratic GCM, the initial level of the outcome variable is captured by $\tau_0$, and the linear and quadratic changes at successive time points are governed by $\tau_1$ and $\tau_2$.
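As a worked illustration of the linear GCM, the sketch below builds $\Lambda_0$ and $\Lambda_1$ for the 12 months, assuming the months are coded $t = 0, 1, \ldots, 11$ (August 2020 = 0), and computes the average trajectory $\mu_0\Lambda_0[t] + \mu_1\Lambda_1[t]$ from the mean estimates reported for the linear GCM in Section 5 (77.1223 for $\tau_0$ and 0.0554 for $\tau_1$). The time coding is an assumption; a different origin for $t$ would change the interpretation of the intercept.

```python
MONTHS = ["Aug 2020", "Sep 2020", "Oct 2020", "Nov 2020", "Dec 2020", "Jan 2021",
          "Feb 2021", "Mar 2021", "Apr 2021", "May 2021", "Jun 2021", "Jul 2021"]


def linear_basis(n_months=12):
    """Lambda_0 (all ones) and Lambda_1 (t = 0, 1, ..., n_months - 1)."""
    return [1.0] * n_months, [float(t) for t in range(n_months)]


def mean_trajectory(basis, latent_means):
    """Average trajectory: sum over k of latent_means[k] * basis[k][t]."""
    return [sum(lam[t] * mu for lam, mu in zip(basis, latent_means))
            for t in range(len(basis[0]))]


trajectory = mean_trajectory(linear_basis(), [77.1223, 0.0554])
for month, value in zip(MONTHS, trajectory):
    print(f"{month}: {value:.4f}")
```

Under this coding the average trajectory rises from 77.1223 in August 2020 to 77.1223 + 11 × 0.0554 = 77.7317 in July 2021.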

Exponential Growth Curve Model.
In the exponential GCM, the two basis vectors $\Lambda_0$ and $\Lambda_1$ produce exponential intraindividual change:

$$\Lambda_0 = [1, 1, \ldots, 1]', \qquad \Lambda_1 = [1, e^{-\lambda}, e^{-2\lambda}, \ldots, e^{-11\lambda}]',$$

i.e., $\Lambda_1[t] = e^{-\lambda t}$ for $t = 0, 1, \ldots, 11$, where $\lambda$ is estimated from the observations. Interindividual differences among countries are captured by two latent random variables, $\tau_0$ and $\tau_1$. Here, $\tau_{0n}$ can be interpreted as the asymptotic level of $C_n[t]$, and the sum of the latent slope score and the latent asymptotic score, $\tau_{0n} + \tau_{1n}$, is the value of $C_n[0]$. The variable $\tau_{1n}$ represents a country's potential for change in $C_n[t]$ from its initial level over the following months. The parameter $\lambda$ is the rate at which $C_n[t]$ approaches the asymptotic level. It is modeled as identical for all countries, so each country's trajectory changes in one direction only (continuously increasing or decreasing toward its asymptotic level, $\tau_{0n}$) at a constant exponential rate across the entire observation period from August 2020 to July 2021. This assumption can be relaxed in future studies.
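The exponential basis can be sketched the same way. The code below assumes the parameterization $\Lambda_1[t] = e^{-\lambda t}$ with $t = 0, 1, \ldots, 11$, under which $\tau_0 + \tau_1$ is the initial level and $\tau_0$ is the asymptote. The values of $\tau_0$, $\tau_1$, and $\lambda$ are hypothetical and chosen only to show the shape of the curve.

```python
import math


def exponential_basis(lam, n_months=12):
    """Lambda_0 (all ones) and Lambda_1[t] = exp(-lam * t) for t = 0, ..., n_months - 1."""
    return [1.0] * n_months, [math.exp(-lam * t) for t in range(n_months)]


tau0, tau1, lam = 80.0, -5.0, 0.3  # hypothetical asymptote, slope score, and rate
lam0, lam1 = exponential_basis(lam)
trajectory = [tau0 * a + tau1 * b for a, b in zip(lam0, lam1)]
print(round(trajectory[0], 4))   # 75.0 = tau0 + tau1, the initial level
print(round(trajectory[-1], 4))  # close to, but below, the asymptote tau0 = 80
```

Because $\tau_1$ is negative here, the trajectory increases monotonically toward $\tau_0$; a positive $\tau_1$ would give a decreasing trajectory.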

Latent Growth Curve Model.
The basis coefficients of a latent GCM are estimated freely, so that the shape of the trajectory is determined by the data, whereas in the models discussed above the basis coefficients are fixed in advance. Here, the basis vectors are

$$\Lambda_0 = [1, 1, \ldots, 1]', \qquad \Lambda_1 = [0, \lambda_1, \lambda_2, \ldots, \lambda_{10}, 1]'.$$

We fixed the first and last basis coefficients to 0 and 1, respectively, to identify the model; with this choice, $\tau_0$ represents the initial level and $\tau_1$ the total change from August 2020 to July 2021. In the latent GCM, the nonlinear pattern of intraindividual change is captured by the vector $\Lambda_1$ and a single interindividual difference variable, $\tau_1$.
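A useful way to read the latent basis is as the proportion of total change reached by month $t$: with the end points fixed to 0 and 1, each free coefficient places that month between the initial and final levels. The sketch below builds $\Lambda_1$ from ten free coefficients (hypothetical inputs) and shows that equally spaced coefficients $t/11$ reduce the latent GCM to a straight line, so the linear GCM is a special case of it.

```python
def latent_basis(free_coefficients, n_months=12):
    """Lambda_0 (all ones) and Lambda_1 = [0, lambda_1, ..., lambda_{n-2}, 1].

    The first and last coefficients are fixed (0 and 1) for identification;
    the interior ones would be estimated freely in the latent GCM.
    """
    if len(free_coefficients) != n_months - 2:
        raise ValueError("expected n_months - 2 free coefficients")
    return [1.0] * n_months, [0.0, *free_coefficients, 1.0]


# Equally spaced coefficients t/11 reproduce a straight line between the end points.
lam0, lam1 = latent_basis([t / 11 for t in range(1, 11)])
print([round(x, 3) for x in lam1])
```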

Multiphase Growth Curve Model.
A multiphase GCM joins separate (spline) regression segments over different time intervals. Because many countries have experienced several COVID-19 waves, a multiphase model may be a good choice for CPM. Figure 1 shows this pattern: the rate of change of COVID-19 cases is not uniform over the months considered. Of the many possible multiphase GCMs, we use $\mathrm{MP}_{[3,4,5]}$, where the subscript vector gives the length (in months) of each phase. Since the data cover August 2020 to July 2021, the vector [3,4,5] denotes three phases: Phase I (August, September, and October 2020), Phase II (November 2020, December 2020, January 2021, and February 2021), and Phase III (March, April, May, June, and July 2021). In the $\mathrm{MP}_{[3,4,5]}$ model, Phase I is the baseline phase, modeled via $\Lambda_0$, whereas Phases II and III are modeled via $\Lambda_1$ and $\Lambda_2$, respectively. Accordingly, the three intraindividual change vectors are

$$\Lambda_0 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]',$$
$$\Lambda_1 = [0, 0, 0, \lambda_3, \lambda_4, \lambda_5, 1, 1, 1, 1, 1, 1]',$$
$$\Lambda_2 = [0, 0, 0, 0, 0, 0, 0, \lambda_7, \lambda_8, \lambda_9, \lambda_{10}, 1]',$$

where the $\lambda_j$ are estimated from the data. Interindividual differences are governed by the latent random variables $\tau_0$, $\tau_1$, and $\tau_2$. Their means represent the average baseline level of $C_n[t]$, the amount of change in $C_n[t]$ during the second phase, and the additional change in $C_n[t]$ during the last phase, respectively. The variances of the latent variables represent the extent to which countries differ in these aspects of intraindividual change, and their covariances describe how interindividual differences in one aspect are related to those in the others.
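The layout of the three $\mathrm{MP}_{[3,4,5]}$ basis vectors can be checked with a small sketch. It assumes the coding $t = 0, \ldots, 11$ (August 2020 = 0) and that the final month of each phase is fixed to 1, and it uses hypothetical placeholder values for the free coefficients (not the estimates from the study). Combined with the latent means reported for $\mathrm{MP}_{[3,4,5]}$ in Section 5 (77.0714, −0.5005, and 0.2096), the mean trajectory at the end of each phase depends only on those means, not on the placeholders.

```python
def multiphase_basis(phase2_free, phase3_free):
    """Basis vectors for MP[3,4,5] over t = 0..11 (Aug 2020 .. Jul 2021).

    phase2_free: 3 free coefficients for Nov 2020 - Jan 2021 (Feb 2021 fixed at 1)
    phase3_free: 4 free coefficients for Mar 2021 - Jun 2021 (Jul 2021 fixed at 1)
    """
    lam0 = [1.0] * 12
    lam1 = [0.0] * 3 + list(phase2_free) + [1.0] * 6
    lam2 = [0.0] * 7 + list(phase3_free) + [1.0]
    assert len(lam1) == len(lam2) == 12
    return lam0, lam1, lam2


basis = multiphase_basis([0.25, 0.5, 0.75], [0.2, 0.4, 0.6, 0.8])  # placeholders
means = [77.0714, -0.5005, 0.2096]  # reported means of tau_0, tau_1, tau_2
trajectory = [sum(lam[t] * mu for lam, mu in zip(basis, means)) for t in range(12)]
print(round(trajectory[2], 4), round(trajectory[6], 4), round(trajectory[11], 4))
```

At the end of Phase I (October 2020), Phase II (February 2021), and Phase III (July 2021), the average trajectory equals 77.0714, 77.0714 − 0.5005 = 76.5709, and 76.5709 + 0.2096 = 76.7805, respectively.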
To select the best of the considered models, we use several well-known model-fit criteria: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the $\chi^2$ statistic with its degrees of freedom (d.f.). Lower values of AIC, BIC, RMSEA, and $\chi^2$ and higher values of TLI indicate a better-fitting model. Based on these criteria, Table 2 shows that the multiphase model, $\mathrm{MP}_{[3,4,5]}$, performs better than the others, so we regard the multiphase GCM as the most appropriate of the considered models. The coefficient estimates for all considered GCMs are given in Table 3 and discussed in Section 5. The structure plot of the multiphase GCM is given in Figure 4, where the dotted lines show fixed factor loadings and the dark lines show estimated factor loadings. We expect that adding suitable covariates may further improve the performance of the multiphase GCM.
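For reference, the criteria listed above have standard textbook definitions, sketched below in Python. Software packages differ in details (for example, some use $N$ rather than $N-1$ in the RMSEA denominator), so this sketch may not reproduce the values of a particular package exactly, and the inputs in the usage lines are hypothetical. Here `loglik` is the maximized log-likelihood, `k` the number of free parameters, `n` the sample size, and the baseline $\chi^2$ and d.f. used by TLI refer to the independence model.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: -2*loglik + 2k (lower is better)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian Information Criterion: -2*loglik + k*ln(n) (lower is better)."""
    return -2.0 * loglik + k * math.log(n)


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (lower is better)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index relative to a baseline model (higher is better)."""
    ratio_base, ratio_model = chi2_base / df_base, chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


print(aic(-100.0, 5), round(bic(-100.0, 5, 100), 3))
print(round(rmsea(150.0, 100, 101), 4), round(tli(200.0, 100, 1000.0, 100), 4))
```

BIC penalizes extra parameters more heavily than AIC once $\ln N > 2$, which is why the two criteria can rank richer models such as multiphase GCMs differently.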

Modeling with Time-Varying Covariate
Many factors influence the growth of CPM, and environmental temperature may be an important one. Section 2 suggested a possible dependence between CPM and temperature; in particular, Figures 2(a) and 3(a) suggest that the monthly gain in CPM follows the pattern of global temperature. We therefore expect that adding temperature as a covariate for all countries may improve the performance of the GCMs.
In general, two types of predictor variables are used in longitudinal studies: a covariate that is constant over the measurement occasions is called a time-invariant covariate (TIC), and one that varies over time is called a time-varying covariate (TVC). In this study, temperature is a TVC that is assumed to affect CPM directly in month $t$ through a coefficient $c[t]$. Model (1) can then be extended to include the TVC as

$$C_n[t] = \Lambda_0[t]\,\tau_{0n} + \Lambda_1[t]\,\tau_{1n} + \cdots + \Lambda_k[t]\,\tau_{kn} + c[t]\,T_n[t] + \varepsilon_n[t], \qquad (2)$$

where $T_n[t]$ is the temperature of the $n$th country in the $t$th month. Every GCM with the TVC fits better than the corresponding model without it introduced in the previous section. We also tried different combinations of phases to construct multiphase models with the TVC. Among the considered models, $\mathrm{MP}^{T}_{[3,4,5]}$ performs best, with AIC, BIC, TLI, RMSEA, and $\chi^2$ (d.f.) values of 18722.02, 18843.50, 0.79, 0.07, and 361.44 (194), respectively. The coefficient estimates for all considered GCMs with the TVC are given in Table 4. The structure plot of $\mathrm{MP}^{T}_{[3,4,5]}$ is given in Figure 5, where the dotted lines show fixed factor loadings and the dark lines represent the estimated factor loadings and the estimated TVC coefficients.
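To make the role of $c[t]$ concrete, the sketch below computes a country's model-implied mean trajectory when temperature enters as a TVC, $E(C_n[t]) = \sum_j \Lambda_j[t]\,\mu_j + c[t]\,T_n[t]$, and shows that raising the temperature by one unit in a single month shifts the expected value in that month only, by exactly $c[t]$. The linear basis, latent means, coefficients, and temperatures are hypothetical values chosen for illustration.

```python
def tvc_mean(basis, latent_means, c, temperature):
    """Expected C_n[t] = sum_k Lambda_k[t] * mu_k + c[t] * T_n[t] for one country."""
    return [
        sum(lam[t] * mu for lam, mu in zip(basis, latent_means)) + c[t] * temperature[t]
        for t in range(len(temperature))
    ]


basis = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical linear basis, 3 months
latent_means = [10.0, 1.0]                  # hypothetical mu_0, mu_1
c = [-0.1, -0.2, -0.1]                      # hypothetical month-specific TVC coefficients
temperature = [20.0, 25.0, 30.0]

base = tvc_mean(basis, latent_means, c, temperature)
warmer = tvc_mean(basis, latent_means, c, [20.0, 26.0, 30.0])  # +1 degree in month 1 only
print([round(b, 3) for b in base])
print([round(w - b, 3) for w, b in zip(warmer, base)])  # [0.0, -0.2, 0.0]
```

Because each month has its own coefficient, the covariate effect can differ in size (and sign) across months without changing the growth-curve part of the model.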

Results
In this section, we discuss the results in Tables 3 and 4. Table 3 gives the parameter estimates of the GCMs without covariates. For the linear GCM, the mean baseline level of $C_n[t]$ is 77.1223 ($\tau_0$), and it increases by 0.0554 ($\tau_1$) per month. For the quadratic GCM, the mean baseline level of the outcome variable is 77.5197, and the mean linear and quadratic components of change are a decrease of 0.3052 and an increase of 0.0240, respectively. For the exponential GCM, the baseline level is 76.9430, which is the sum of $\tau_0$ and $\tau_1$; thereafter, $C_n[t]$ approaches the country's asymptotic level at an exponential rate of 15.9286. For the latent GCM, the average baseline value is 77.2063 and the average total change is −0.2633. The estimated value of $C_n[t]$ in any month can be calculated as $77.2063 + (-0.2633)\,\lambda_t$; for example, in October 2020 the estimated value of $C_n[t]$ is 76.99637. For the multiphase model, $\mathrm{MP}_{[3,4,5]}$, the average baseline value is 77.0714, the average change from November 2020 to February 2021 is −0.5005, and the additional change from March 2021 to July 2021 is 0.2096.
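The latent-GCM prediction quoted above can be inverted to recover the implied basis coefficient for October 2020. Because the first and last coefficients are fixed at 0 and 1, this coefficient can be read as the share of the average total change (August 2020 to July 2021) already reached by October 2020. The arithmetic below uses only the three values reported in this section.

```python
mu0 = 77.2063        # reported average baseline value (latent GCM)
mu1 = -0.2633        # reported average total change (latent GCM)
pred_oct = 76.99637  # reported estimate of C_n[t] for October 2020

# Invert pred = mu0 + mu1 * lambda_t for the October 2020 basis coefficient.
lambda_oct = (pred_oct - mu0) / mu1
print(round(lambda_oct, 4))              # about 0.797
print(round(mu0 + mu1 * lambda_oct, 5))  # recovers 76.99637
```

In other words, the reported October 2020 estimate corresponds to roughly 80% of the average total change having occurred within the first three months.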
As with the latent GCM, $C_n[t]$ can be estimated for the multiphase model from the estimates of the respective coefficients. The covariances of $\tau_0$, $\tau_1$, and $\tau_2$ for all models are also provided in Table 3.
The variance terms represent the extent to which countries differ in each aspect of intraindividual change (e.g., the initial level), and the covariances indicate how these interindividual differences are related.

Figure 5: Structure plot for $\mathrm{MP}^{T}_{[3,4,5]}$.

Table 4 gives the coefficient estimates of the GCMs with temperature as a TVC. All coefficients in this table can be interpreted as in Table 3, except the regression coefficients of the covariate: a regression coefficient $c[t]$ means that a one-unit change in temperature in month $t$ is associated with a change of $c[t]$ units in $C_n[t]$. Notably, in almost all months, temperature is negatively associated with the growth of CPM.

Conclusion
In this study, the growth in CPM due to COVID-19 is the variable of interest. The CPM trajectories of different countries were studied from August 2020 to July 2021. Intraindividual change and interindividual differences were captured using linear, exponential, latent, and multiphase GCMs. Based on several fit criteria, the multiphase GCM performs better than the other models and can therefore be preferred for analysis. A number of factors are responsible for the rapid growth of CPM in a country, and these factors affect different countries to different degrees. In this study, environmental temperature was therefore considered as a covariate that may affect the growth of CPM. Different GCMs were fitted to the data with and without the covariate, and the fit criteria indicate that the GCMs improve when temperature is introduced as a covariate. Thus, temperature may be one of the factors responsible for the changes in CPM over the months. Nevertheless, other factors may also play an important role in the growth of CPM, and including them in future models may improve the results. Furthermore, model-based outcomes may differ when the growth of CPM is studied for a particular region.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.