Posterior Estimates of Dynamic Constants in HIV Transmission Modeling

In this paper, we construct a linear differential system in both continuous time and discrete time to model HIV transmission on the population level. The main question is the determination of parameters based on the posterior information obtained from statistical analysis of the HIV population. We call these parameters dynamic constants in the sense that these constants determine the behavior of the system in various models. There is a long history of using linear or nonlinear dynamic systems to study the HIV population dynamics or other infectious diseases. Nevertheless, the question of determining the dynamic constants in the system has not received much attention. In this paper, we take some initial steps to bridge such a gap. We study the dynamic constants that appear in the linear differential system model in both continuous and discrete time. Our computations are mostly carried out in Matlab.


Introduction
Patients infected with Human Immunodeficiency Virus (HIV) are very likely to develop Acquired Immunodeficiency Disease Syndrome-(AIDS-) related diseases that are usually fatal if not treated with effective antiretroviral therapies. Since the discovery of HIV in 1983, an efficacious vaccine is yet to be developed to fight the deadly virus. Although highly active antiretroviral therapies (HAART) invented in mid-1990s have saved millions of lives and deterred the disease progression of those infected, HIV infection remains a public health threat. Reducing the risk of HIV transmission is of top priority.
One particular challenge in HIV prevention is its long period of latency period. The average time of an HIV infected patient to become symptomatic with AIDS-related diseases can be more than 10 years [1]. In the sexual transmission of HIV, many of the HIV infected patients may not be aware of their HIV infection status, and the virus continues spreading to their HIV negative partners. Therefore, an indepth understanding of HIV transmission is the key to successful HIV prevention.
HIV dynamics have long been studied in the field of mathematical epidemiology using linear and nonlinear models [2,3]. The classic model in epidemiology is the SIR model, which considers the dynamics of the susceptible, infected, and recovered populations [4]. This model is not useful for HIV dynamics, as there is no recovered population. An extension of this is the SEIR model, which includes the population of individuals who are exposed but not yet infected. The period between exposure and infectiousness in HIV lasts about two to four weeks [1]. Since a recovered population does not exist, we can consider this period to have a negligible effect on population dynamics.
Hierarchical models are common in HIV modeling due to the high correlation between risky behavior and HIV incidence [5]. In this paper we will incorporate risk indirectly by considering diagnosed and undiagnosed populations. Intuitively, diagnosed individuals would modify their behavior relative to their behavior prior to the diagnosis.
In this paper, we shall form two models: a continuous time linear differential model and a discrete time differential model. These models are the most fundamental among their kinds. The focus will then be given to determination of the parameter estimates, the dynamic constants in these models. As we will show in this paper, the estimates of the dynamic constants depend on the type of model as well as the qualitative properties of the models.
There are two important dynamic constants in our model, namely, the transmission rates for diagnosed HIV population and for undiagnosed HIV population. One important finding in our study is that the transmission rates for the diagnosed and undiagnosed infected populations are comparable. This leads to our conclusion that the transmission rates should be attached to different groups of susceptibles based on their risk level.

General Nonlinear Differential Model
One of the frequently used mathematical models for HIV population dynamics can be described as follows. Let ( ) be the susceptibles. We divide the HIV positive population into two groups: 0 is the populations that are unaware of the infection; 1 is the populations that are aware of the infection. Let be the mortality rate for the group . Let be the growth rate of the susceptibles. Let 0 be the transmission rate of 0 group and let 1 be the transmission rate of 1 group. Then we have the following nonlinear differential equations: Here 1 1 ( ) ( ) counts for those who are infected by group 1 (per unit time), and among them is the proportion of those who are aware of their infection. The constant denotes the rate of the HIV positive population in 0 group who become aware of their infection (per unit time). So there is a flow of 0 ( ) from group 0 to 1 once a member from 0 finds out his/her infection through HIV testing.
Many variations of this nonlinear dynamic model have been considered and appeared in the literature to study the HIV population dynamics. For example, in [6], mortality rate of the susceptibles is considered and appears in the differential equation of ( ). In addition, the parameters are allowed to change but are piecewise constant.
In our differential equation model, we have a few constants: , , 0 , 1 , 0 , and 1 . These constants essentially determine the qualitative and quantitative properties of the mathematical model. We shall call these constants the dynamic constants of the model. Notice that some of the constants, like , may have prior estimates, based on the data collected directly from the groups and . Some of the constants, like , will have posterior estimates. The constants , may have prior estimates. Our main focus here is to give posterior estimates of these constants.
We shall remark here that the dynamic constants are model-dependent. This might not be obvious. Even though many of them can be estimated statistically without reference to any models, applying these estimates directly to the model may be problematic, as we shall see in the next section. In this paper, we take some initial steps to estimate the model-based dynamic constants.

Linear Differential Model and Preliminary Discussions
We shall now build a simpler linear model. The main assumption is that the susceptible population is a lot larger than 0 and 1 . The change of susceptible population, due to HIV infection, is quite small, comparing with the overall size of susceptible. Therefore, we may ignore the dynamics of susceptible population, by assuming that the susceptible population is a constant. This more or less justifies the use of linear system only involving 0 and 1 . Let us start with the HIV transmission rate estimates by Pinkerton [7]. The estimates of transmission rates are 0 ≅ 0.0927, are estimated in terms of infection transmitted per person per year. Since the overall susceptible population is a lot larger that , we can assume that HIV transmission events are proportional to the size of 1 and 0 . Based on this hypothesis, we may model HIV transmission by linear differential equations: The dynamic constants and remain unchanged. It is also known that 1 ≅ 1.9% [8]. There is no statistics done on 0 . So we can assume 0 ≅ 1.9% as well.
Next, we shall apply the known estimates and study our linear differential model. Notice that remain unknown at this moment. According to [8], is somewhere around 1/4. We may tentatively set = 1/4. Utilizing the estimates of dynamic constants directly from [7,8], let us consider several cases.
3.1. = 4/5. We start by assuming that takes the value of the overall portion of those who are aware of their infection. Now we have the following linear equations: Computational and Mathematical Methods in Medicine 3 We found that the two linear independent solutions have growth rate of However, we know that the growth of 0 + 1 is about 0.048. Hence our assumption = = 4/5 is not valid. Even if we ignore the mortality rate, we have This is still far below the estimated 4.8% growth rate.
3.2. = 1 or = 0. One extreme is that = 1, meaning that the population infected by 1 gets tested and becomes aware of their infection (within the first year). We have Under this assumption 0 will decrease at the rate of −0.268, which means that the population 0 will gradually vanish in a few years. This cannot be true. Another extreme is that = 0, meaning that the population infected by 1 will be initially unaware of their infection (within the first year). We have We have The overall HIV population growth will be less than 0.016. This is quite small comparing with the estimate that the growth rate is about 0.048.

, Not Fixed.
One might conclude that must be a much smaller number than 1/4, what we have initially assumed. We let and be unfixed. In this case, we have We have the matrix We know the growth rates are controlled by the eigenvalues of . In particular, we might assume that det( − ) = 0 with = 0.048. This will guarantee that the dominant term of the solution will grow at the rate of 0.048 (per year). Hence we obtain Simplifying it, we have Since 0 ≤ ≤ 1, we find that 0.043 ≥ ≥ 0.0258. This suggests that there are between 2% to 5% of 0 getting tested. This percentage seems to be too low comparing with the CDC estimate of about 25%.
We shall remark that our discussion is based on the estimates that 0 = 0.0927 and 1 = 0.0268 [7]. As we have seen, directly using these estimates as dynamic constants in differential equation modeling will be inadequate to produce the right kind of outcomes and trend. In this paper, we shall discuss posterior estimate of parameters and hope to find some remedy.

Posterior Estimate of Parameters
In our earlier discussion, we directly insert the transmission rates from the statistical analysis into the linear differential system. The result is not satisfactory. It is desirable to estimate the transmission rates that will produce the right kind of outcome from the linear differential system model. Let us recall the CDC data from 2007 to 2013 (in thousands) [8].
We first simplify our notation. Let N = ( 1 0 ). We rewrite our linear system as where The general solution to this system is Here 1 , 2 are eigenvalues of . They can be both real or complex. There is also a degenerate case 1 = 2 that we do not treat here. The behavior of the linear differential system is quite different in these two cases. It is not surprising that we need to use two different methods to estimate the matrix M.
Notice that the dominant term ( 892.3 115.57 ) exp 0.0273 suggested the overall rate of growth of HIV infected population grows at the rate close to 2.73%. This seems to be reasonable. But , the rate of flow of population from 0 to 1 , is estimated at −4%. This is completely off the mark. One remedy is that we first estimate the dominant term and then estimate the remainder.
where is sometimes called a phase constant. A simple curve fitting shows that So 1 is neglectable and 0 is about 21%. This again makes the model invalid.

Discussion.
In this section, we choose dynamics constants to fit the temporal data. We have found that these dynamic constants depend on the qualitative properties of the model. Yet, none of the dynamic constants we choose match perfectly with the existing estimates. One reason is that yearly data is not suitable for a continuous time model. Therefore, we shall explore the discrete time model.

Discrete Dynamic Model
We may regard N ( = 1, 2, 3, 4, 5, 6, 7) as a discrete time dynamical system. Let us assume that this discrete dynamics is defined by a transition matrix T: In principle, based on our earlier discussion, Now we would like to estimate T.

Basic Estimates.
The easiest way to find T is by considering the following matrix equations: (37) We can see some consistency among these transition matrices. For example, the (2, 1)-th entry has been around 3%. This translates into This is the rate of transmission for group 1 . It seems to be consistent with the estimate of [7]. (40)

(Arithmetic) Average Estimate of T. Now we may average all T's and obtain
Our estimate yields that ≅ 18%; in other words, about 18% of those unaware of their infection will become aware of their infection next year. We also have If the mortality rate 0 is set to be 0.019, then we have 0 = 0.017. Similarly, we have

Least Square Estimate of T.
Perhaps a good way to estimate T is the least square method. We write Applying the least square method, we find that the least square solution to T is This estimate seems to be better than the arithmetic average, in the sense that, irregularities will have smaller effect on the least square solution. Because we can reorder N 's and the least square solution does not change, we also avoid the pitfall that N and N +1 are correlated. We have our posterior estimates: This estimate is similar to the arithmetic average we just computed. The dynamic constant estimates will be very similar. We shall then look for a solution that is more robust. One particular reason that the least square estimate is not satisfactory is that there are additional relations like that least square method does not take into consideration. In other words, T 2 , T 3 can also be estimated and shall be taken into consideration when we estimate T. We shall offer one remedy that avoids this issue.

A More Robust Estimate.
One of the problems with our estimate is that N and N +1 are correlated to each other. As a remedy, we pick N 1 and N 6 as far from each other as possible.
We observe that We compute Then We have ≅ 0.1681,

5.5.
Discussion. This estimate of M is quite consistent with the least square estimate. Our estimate seems to suggest that the transmission rates for 0 and 1 may be in the similar range. By [8], assume that 0 = 1 = 0.019. We have Every year about 18.5% of those unaware of their HIV positiveness become aware of their infection due to testing. About 40% of those infected by 1 group become aware of their infection. This seems to be consistent with some of the observations in [7], with one exception; namely, in our estimates, the transmission rates for 1 and 0 are very close. Figures 1 and 2 show the difference between TN and N +1 .  Source: [8].

Arithmetic
vector N( ), suppose that N( + 1) = TN( ) with transitional matrix T. How should one estimate the matrix T?
As we discussed earlier, we can use least squares with the equations The least square estimate of T, in some sense, is very similar to the arithmetic mean of the transitional matrix T. But what makes better sense is a geometric mean. More precisely, we have to take into consideration that Suppose that T is the transitional matrix at time . Then a good estimate of T should be the "geometric average" of T . For scalars, one can define the geometric average of 1 , 2 , . . . , to be the th root of ∏ . But matrix multiplications are not commutative and one cannot define the geometric average of matrices. It remains a challenging problem to define computationally a geometric mean of T .

Roots
Estimate. Tentatively, we may define the geometric mean by taking roots. For example, we may now consider We can also consider We have Both estimates of T are consistent with the least square estimate and the estimates in the previous section. Above all, all our estimates point to the same range of transmission rates for both 0 and 1 .

Concluding Remarks
Now we shall compare our dynamic constant estimates in the linear differential model in continuous time and in discrete time.
In the continuous time model, we obtain the transmission rate 1 of about 4% for the 1 group, those who were aware of their HIV infection. Nevertheless, the rate of flow from 0 to 1 due to HIV testing turned out to be too low and 0 often came out to be negative, which cannot be the case. The best results are obtained when we assume the two eigenvalues are complex. In this case Yet seems to be quite low. According to the CDC report [8], is estimated at about 25%.
There are various reasons why our dynamic constants are inconsistent with known estimates. Firstly, the CDC data we use tends to underestimate the 0 and 1 population sizes, particularly in more recent years. The CDC estimates the sizes of the populations infected with HIV by back calculation. The estimates for any given year will increase as new diagnoses are obtained. HIV may go undiagnosed for up to 10 years without causing the death of the patient (Table 1) [5]. Depending on the stage of the disease, the individual will be counted as undiagnosed for a number of years prior to the diagnosis. This causes the estimates of the population sizes to be smaller than the actual size of the population. A new estimate of the HIV prevalence agrees with this conclusion [9]. Although this new estimate is more conservative than the back calculation method, it may still underestimate the 0 population. Both estimates show a downward trend in the data, but this is likely to be erased as more individuals are diagnosed in the later stages of the disease.
Secondly, our computations assume that the susceptible population is much larger than the infected population. However, failing to obtain the right estimates suggests that opposite may be true-the susceptible population could be much smaller. HIV infection is overrepresented in some subpopulations, such as men who have sex with men (MSM) [1]. This subpopulation is only about 5% of the US population, or approximately 15 million individuals. Not all MSM can be considered to have equal risk of contracting the disease, and since over 1 million individuals are currently living with the disease, the susceptible population may be comparable to the size of the infected population. For this reason, any differential system model of the HIV transmission must include the susceptible population. The discrete dynamics model seems to be most robust against the bias caused by back calculation and the size of the susceptibles. By ignoring the 2013 year's data, we are able to determine the transmission rates as 1 ≅ 0.0385, Also ≅ 0.184, not too different from the CDC estimate 0.25. We see that the dynamic constants in the discrete time model are less affected by the underestimation caused by back calculation. It is also true that the relative size of susceptibles has less effect on the discrete time model than on the continuous time model.
Finally, our estimates of the transmission rates for diagnosed and undiagnosed HIV population are relatively close. This is very different from the previous estimate, where the transmission rate of the undiagnosed population is about 4 times as high as the diagnosed population [7]. This implies that the transmission rates should be attached to the susceptible population. It makes sense to divide the susceptible population into groups depending on the possible transmission rates for them. We shall pursue this in a different paper.

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.