Drug Treatment Effect Model Based on MODWT and Hawkes Self-Exciting Point Process

In precision medicine, especially in the pharmacodynamic area, the lack of an adequate long-term drug effect monitoring model leads to a quite low robustness to the instant drug treatment. Modelling the effect of drug based on the monitoring variables is essential to measure the drug benefit and its side effect preciously. In order to model the complex drug behavior in the context of time series, a sin function is selected to describe the basic trend of heart rate variable that is medically monitored. A Hawkes self-exciting point process model is chosen to describe the effect caused by multiple and sequential drug usage at different time points. The model considers the time lag between the drug given time and the drug effect during the whole drug emission period. A cumulative Gamma distribution is employed to describe the time lag effect. Simulation results demonstrate the established model effectively when describing the baseline trend and the drug effect with low noise levels, where the maximal overlap discrete wavelet transformation is utilized for the information decomposition in the frequency zone. The real data of the variables heart rate and drug liquemin from a medical database is analyzed. Instead of the original time series, scale variable s4 is selected according to the Granger cointegration test. The results show that the model accurately characterizes the cumulative drug effect with the Pearson correlation test value as 0.22, which is more significant for the value under 0.1. In the future, the model can be extended to more complicated scenarios through taking into account multiple monitoring variables and different kinds of drugs.


Introduction
Pharmacokinetics (PK) and pharmacodynamics (PD) are aimed at building mathematical models to extract scientific basis of modern pharmacotherapy. Specifically, pharmacokinetics describes the drug concentration-time courses in body fluids resulting from the intake of a certain dose of drug, and pharmacodynamics describes the observed effect arising from a certain drug concentration [1]. Between PK and PD, PD is more important for describing the variation of body conditions after drug treatment. Following a drug treatment, the original organ function can be enhanced or suppressed. Thus, determining the accurate dosage of the drug is rather essential in the drug prescription. During the treatment process with a given drug dosage, the body condition may be improved but may also suffer some untoward reaction or adverse reaction, including side effect, toxic reaction, allergy reaction, secondary reaction, residual effect, and teratogenesis. Therefore, drug treatment can be helpful to the patient but might be harmful as well, and monitoring the drug effect by observing the clinically monitored variables plays an important role in understanding the mechanism.
The major objective of PD is to explore how the drug influences the monitored variables including heart rate, cardiac output, and mean arterial pressure. Modelling the drug effect can help improve medical precision by applying more suitable treatment for the patient under a target criteria of the monitored variable. Drug effect modelling is critical to realize the online forecasting of the dynamic status of patient's disease. On the other hand, this can reduce the expenditure of medical resources including human resources like physicians and caregivers, drug usage, and economic cost.
Due to the special property of drug effect, the influence of the drug on patient is complex. The instant usage of drug has a long-term and dynamic effect on the patient. For example, there may exist a time delay for some drug to show the medical effects, which may be half an hour in some special circumstances. The time length taken by the drug effect is different not only among different drugs but also for different intake methods. For epinephrine, the effect of intramuscular injection is approximately maintained between 10 and 30 minutes, while the effect of subcutaneous injection can take as long as around 1 hour. Drug effect may increase rapidly and then decrease slowly afterwards before finally vanishes. Before the drug effect vanishes, more drug dosage may be required to be applied to the patient. Besides, multiple drug usages can cause multiple cumulative effects. Determining the correct drug dosage becomes quite challenging as the monitored variables vary with an unpredictable trend.
To give a precise dose under a target value of the monitored variable, the whole past drug usage that is still effective should be taken into consideration, as their influence on the patient's future health condition still affects the current required dosage.
In this work, the influence of drug usage on the monitored variables is specifically analyzed, particularly in terms of the cumulative effect from the past drug usage. The rest of the paper is organized as follows: Section 2 gives the research review of drug effect modelling including the state of the art and our contributions, Section 3 presents the method proposed in this research, Sections 4 and 5 show the analysis and the simulation using the real medical data, and Section 6 presents a conclusion of the results and the viewpoints on the further research. All computations are implemented using the software R [2] of the version 4.0.2, and "waveslim" [3] was used for the wavelet decompositions. The hardware platform is iMac Pro (2017) configured with the processor 3.2 GHz 8-Core Intel Xeon W, the memory 32 GB 2666 MHz DDR4, and the Graphics Radeon Pro VEga 56 8 GB.

Literature Review
The approaches to establishing the models for characterizing the effect of drug on electrocardiogram (ECG) signals like heart rate can be generally divided into three main groups. The pros and cons of the available approaches are compared in detail in the following. The first group includes the typical statistical methods like linear regression, logit regression, analysis of variance (ANOVA) and basic statistical description methods like box plot. In order to study how the choice of anesthetic agent can greatly influence CSF tracer influx, Hablitz et al. [4] used the linear regression analysis. Linear regression with extensions like Lasso, Ridge, and Elastic-net such penalized linear regression methods are good at explanation of the real application background and are simple to understand. Capel et al. [5] used 1-or 2-way ANOVA to investigate the propensity of hydroxychloroquine to cause bradycardia. ANOVA is effective to compare the drug efficiency difference among two or more comparison experiments. In the research of Sun et al. [6], pairwise network meta-analyze was performed using DerSimonian-Laird random effects model to analysis the impact of GLP-1 receptor agonists on blood pressure, heart rate, and hypertension among patients with type 2 diabetes. Gilbert and Krum [7] analyzed the effects of antihyperglycemic drug therapy on heart failure in diabetes by using the meta analysis as well. The pros and cons of this type of methods include that these methods are rather efficient when the data has simple format including cross section data, and the research purpose is more focused on producing the medical experiment results instead of the method improvement. When the data structure is complicated or the objective is lied on the methodology, more interests are concentrated on the subsequent second or third group of methods.
The second type of drug effect models mainly includes machine learning and deep learning methods such as ensemble decision tree methods (AdaBoost or XGBoost), neural networks (convolutional neural network and longshort term neural network), support vector machine, and Bayesian classifiers [8,9]. For example, Sherman et al. [10] used machine learning methods to identify drug-cancer cell interaction based on some large in vitro databases. Bresso et al. [11] applied the methods decision trees and inductive logic programming to characterizing each drug's side-effect profiles in terms of drug and target properties. Costabal et al. [12] characterized the effect of 30 drugs on the QT interval using Gaussian process regression, sensitivity analysis, and uncertainty quantification. In the research of Juhola et al. [13], several machine learning methods are compared in the analysis of drug effects on iPSC cardiomyocytes, including decision trees, K-nearest neighbor nearest searching, multinomial logistic regression, and least squares support vector machines. Results show that random forests classifier and least-squares support vector machines have better performance than the other methods. Ekins et al. [14] compared the performance of different machine learning methods for end-to-end drug discovery and development including Naïve Bayesian, support vector machines, and more recently concerned deep neural networks. Madhukar et al. [15] proposed a Bayesian approach to the drug target identification using diverse data types. Soft computing techniques [16,17], which although have not been directly applied to the drug effect modelling, can still be the promising solutions to the problem. The pros and cons of this type of methods include the following aspects. It is pretty impressive that the machine learning methods have high prediction accuracy and many of them have reliable generalization. However, most of the machine learning methods are based on quite complex parametric systems and lack of interpretability of parameter meanings, such that may not be the ideal tools for achieving both high accuracy and good explanation.

2
Computational and Mathematical Methods in Medicine The third type of drug effect models mainly includes mathematical models such as Cox hazard model based on ordinary differential equations and partial differential equations. Chatterjee and Ahmad [18] applied the fractionalorder differential equation model to analyzing the COVID-19 infection of epithelial cells. Huang et al. [19] gave the hierarchical Bayesian inference for HIV dynamic differential equation models which incorporated multiple treatment factors. Thirumalai et al. [20] proposed the fractional differential equations based method to analyze the combined drug therapy for HIV infection. Leander et al. [21] used the stochastic differential equations to analyze the mixed effects involved in pharmacokinetic data of nicotinic acid. Fuentes-Gari et al. [22] have compared the performance of different differential equations in the leukemia treatment, from which it can be found that stochastic differential mixed effects models are useful tools for identifying incomplete or inaccurate model dynamics. In general, those mathematical models have sound performance in both prediction accuracy and explanation.
The motivation of this research is that although the existing results have achieved good performance in PD, especially on drug effect modelling, they still have some shortcomings unsolved due to the complex background of pharmacodynamic, that is, the instant usage of drug can have long and complicated effect on the patient that is shown with the monitored variables. To illustrate this complicated effect, a stochastic modelling approach called Hawkes point process model is introduced to describe stochastic point process. In this work, a Hawkes point process model [23], namely, the self-exciting point process [24] is proposed to characterize the complex drug effect. The essential property of self-exciting point process is the occurrence of any event increased the probability of further events occurring which is consistent with the behavior of drug on patient. This model origins from the area of earthquake behavior analysis [25] and is recently extended to the fields of finance [26], disease prediction [27], and social media [28], where the model has shown satisfactory performance, but it has not been used for drug effect modelling yet.
The acquired data sets of the input variables in this work are time series that have clearly nonstationary property. They are also stochastic time series involved with random noise. Instead of using the original data in the model, the maximum overlap discrete wavelet transform (MODWT) is applied for the feather extraction [29]. MODWT is a method of decomposing the original time series into scaling and wavelet coefficients on different resolution levels, which can be seen as its smooth information and detail information [30]. In this research, these coefficients are used as the deep information involved in the original information. The method Fourier transform [31] can also decompose similar information but it is not suitable for non-stationary time series [32].
The innovation and contribution points in this research mainly have three aspects. The first one and also the main one is that self-exciting point process is proposed to describe the complex behavior of drug on patient. The second one is that MODWT is applied as the feather extraction method to find deep information involved in the original time series. The third one is that the drug effect model developed in this research can describe the effect of a sequence of drug usage on the monitored variables instead of only one usage. The results can be further developed for assisting precision medicine.

Method
The data contain two variables, the drug usage variable d i and the monitored variable x t . d i denotes the ith drug dosage at time t i , and x t gives the value of the monitored variable at time t. The goal is to describe x t by using d i , that iŝ where g represents the other information included in the prediction of x t . In this paper, we mainly analyze the monitored variable heart rate as it is a typical variable that has similar properties to the other monitored variables. Variable x t such as the changes of the heart rate mainly contains three components: the original waveform change involved in the heart beat change itself, the trend change resulting from the disease, and the trend change arising from the drug usage. In order to keep track of the trend caused by the drug and disease, the wavelet transform method MODWT is applied before implementing the self-exciting point process.
Since the wavelet basis Haar wavelet is simple to understand and has good modelling performance, it is chosen as the basis for the MODWT. The Haar scaling function is defined as Using dilation and translation, the scaling function at resolution level j and location k is For more details, see Graps [33]. Then, the scale variables s j,k can be given by The Haar mother wavelet function ψðtÞ is defined as And the wavelet function at resolution level j and location k is ψ j,k ðtÞ = 2 j/2 ψð2 j t − kÞ. The wavelet coefficients d j,k are defined as d j,k = <xðtÞ, ψ j,k > . The scale variables vector s j = ðs j,0 , s j,1 , ⋯, s j,t Þ and wavelet coefficient vector 3 Computational and Mathematical Methods in Medicine d j = ðd j,0 , d j,1 , ⋯, d j,t Þ become new variables which can be used for classification and regression [34,35]. We name them as scale variables (or scale information) and detail variables (or detail information) in the following sections. For convenience, s j,t is rewritten as s t,j . By applying the Haar wavelet to the data x t , the new data W t is given by So far, the step of feature extraction has been finished by using MODWT with the input variables extracted as W. Instead of regarding all the variables in W as the target variables, Granger cointegration test is performed to verify how the drug influence the variables. The variable with the highest significance is chosen as the final target variable. In the Granger test, we set where ε t is the random noise that follows the Gaussian distribution. Granger test is based on the F test, and the hypothesis is defined as and the corresponding F statistic is where SSR r is the residual sum of squares being restricted, and SSR ur is that unrestricted. If F > F α ðm, n − kÞ, then, the hypothesis H 0 that d t is not the Granger reason of x t is rejected.
By applying the Granger cointegration test to all the variables in W, the most significant variable can be selected as the final target variable, and we can regard the variable as the original x t without the rest information g. In that case, the trend caused by the drug instead of x itself or the trend incurred by the disease can be discarded to some extent. For convenience, we still use the symbol x to represent the final target variable.
After getting the final target variable x, it can be applied to the Hawkes point process involved with the drug information. In the self-exciting point process, the target variable is a number that can be counted, and in this work, we extend it to a continuous context. As a result, the conditional intensity of x t during the time interval ðt j , t j+1 Þ is given by where g t includes the information available up to time t. After that, a fairly general self-exciting process can be defined in terms of an intensity of the form where t i is the time when the ith drug usage occurs. μ t > 0 provides a base level for the process, and we set it as a sin function for simplicity, The function γðt − t i , d i Þ ≥ 0 is defined as the exciting kernel of the process, which can be a Gamma distribution. The kernel provides the contribution to the intensity at time t that is made by all the previous drug usage events that   Computational and Mathematical Methods in Medicine occur at previous time t i < t, with the associated drug dosage d i . The meaning of the intensity function is that each event increases the intensity and then decays according to the function γ until the next drug usage occurs to push it up again. The chosen kernel function is a sigmoid function and a Gamma distribution function with the form as where a 1 , a 2 , b 1 , b 2 , m 1 , κ 1 , and κ 2 are all parameters to be determined using the real data. In this kernel function, the sigmoid function is proposed to describe the instant drug effect, which we assume the appropriate drug dosage can bring about the corresponding correct effect while using an extra dosage can make little effect when the drug has been already too much or rare. In that case, the sigmoid function is chosen to make it consistent with the assumptions. The Gamma function can be used to describe the trend of the drug effect from the increasing stage to the decreasing stage slowly. It can also describe the decaying process of the drug effect with the time. In the function of λ t , the cumulative of the γðt − t i , d i Þ describes all the previous drug effect until the current time. In order to calculate the parameters θ = fa 0 , b 0 , α 0 , β 0 , a 1 , a 2 , b 1 , b 2 , m 1 , κ 1 , κ 2 g, the ordinary least squares method (OLS) is used to estimate these parameters. The target function to be minimized can be expressed as  Table 1.

Computational and Mathematical Methods in Medicine
Sincex t is not easy to be obtained, Δx t is used for the replacement of it. Thus, the target function L can be given by Replacing Δx t with λ t in L gives It follows that Taking the derivative of each parameter in θ, for parameter a 1 , we have Similar results can be obtained for the other parameters in θ. It is clear that we cannot get a closed form solution from the derivation as the estimated parameters are mutually dependent with each other. Hence, an optimization method of Broyden-Fletcher-Goldfarb-Shanno (BFGS) is proposed to minimize L, which is a quasi-Newton method, also known as the variable scale method. This algorithm improves the weakness of Newton's method from the aspects that BFGS will not be easily affected by the initial value without the cost of computing the exact hessian matrix and its inverse in the process of each step of optimization. The NFGS, namely, the Newton improved BFGS, has the characteristic of fast searching of Newton's method and has an improvement in efficient searching for the global optimal solution.
The measurement of the goodness of fit has the metric using R 2 , which ranges from 0 to 1, with 1 as the best fit. R 2 has the form as where SST is the total sum of squares, and SSE is the residual (error) sum of squares, In addition to the R 2 , the Pearson correlation test value is also selected for comparison, which is formulated as Some other evaluation criteria to assess the performance of the models include mean absolute error and mean percentage error. They all give the measurement of the model fitness, and R 2 and the correlation test can have similar or better effect in measuring the results as their values are constrained in 0 to 1 or -1 to 1, such that the performance can be compared with the significant level. In conclusion, we develop a method to analysis the influence of drug usage on the monitored variable by using self-exciting point process with the kernel function of Gamma distribution.

Simulation
In this section, we analyze the performance of the method under different noise levels. Since the drug effect can produce the cumulative effect, as shown in Figure 1, we will also analyze how the drug effect influences the model performance. In this Figure 1, the black line represents the cumulative effect ∑ t i −t γðt − t i , d i Þ, and the dashed lines represent the separate drug effect γðt − t i , d i Þ for each i.
In order to explore how the parameters influence the model performance, the simulation is performed using dif-ferent parameter settings. The true parameter and the simulated parameters are shown in Table 1. With the parameter given in Table 1, the R 2 results are shown in Figure 2. It can be seen that some parameters have few influence on the R 2 , including b 1 , a 1 , a 2 , b 0 , a 0 . The parameters that have the apparent influence on the R 2 include b 2 , κ 2 , κ 1 , m 1 , b 0 , α 0 , β 0 : From Figure 2, it can be seen that when the values of b 2 and κ 2 keep in a fixed ratio around 10, the R 2 keeps in its good performance. When κ 1 increases from 0 to around its best value 0.3, the R 2 increases, but deceases afterwards when κ 1 continues to grow. For the parameter m 1 , R 2 has the similar trend. The model performance is not good when m 1 is below its best value but performs relatively well when m 1 is above its best value. For b 0 , α 0 and β 0 , as they are in the sin function, R 2 shows some kind of routinely performance.
The level of noise also influences the best fitting ability. That is why MODWT is applied before the model training. For example, as shown in Figure 3, by adding the noise which follows the Gaussian distribution, the monitored variable with noise becomes far away from the one without noise. This gap causes the R 2 to be smaller than it should be. By setting the noise level as 0.1 to 10, we can get the best possible R 2 value by choosing the best parameters, and the results are shown in Figure 4. The R 2 decreases as the noise level increases. When the variance of the noise level is bigger than 4, the R 2 decreases below 0.9. When the noise level is 5, the R 2 decreases below 0.8. This result also shows the best possible R 2 under different noise levels.

Real Data Analysis
The real data we use in this work is the circulatory failure data Hyland et al. [36]. The original source of the data comes from the Medical Information Mart for Intensive Care-(MIMIC-) III database [37], which provides the critical care data of over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center. Importantly, MIMIC-III was deidentified, and patient identifiers were  Instead of using all the variables, the variable heart rate is selected as the monitored variable as it has quite few missing values. The minimum and maximum of the heart rate are 61 and 117, respectively. The average and standard deviation are 77 and 9.17, respectively. The drug data in use is the amount of dosage of drug liquemin with unit 5000 U/ml for per dosage. The dosage in use in this real data is 25. The number of usage is 63 times. Liquemin is used to decrease the clotting ability of the blood and help prevent harmful clots from forming in blood vessels. This medicine is sometimes called a blood thinner, although it does not actually thin the blood. Specifically, it is also used in the treatment of heart attacks and unstable angina. Two timeseries phases are selected for the real data analysis which contain only liquemin. The flowchart of the proposed method is shown in Figure 5. As shown in Figure 6, the original heart rate variable is transformed by using MODWT to leave out the noise, and the scaling information is selected by using the Granger cointegration test. After that, the scaling information and the drug information are put into the To sum up, due to the complex development of the monitored variables, there are still some trends caused by the disease that cannot be fully described by our model. We propose a model that gives a different and efficient method for the drug effect measurement in pharmacodynamic.

Conclusion
In this research, we develop a Hawkes model by using selfexciting point process and sin function to describe the development of medical monitored variable heart rate. Selfexciting point process can describe the effect of the drug and the dosage of the drug. The sin function can describe the basic trend of heart rate. By combining these two functions together, some of the heart rate trend can be described. The model can be used for drug effect prediction and solve the problem of drug precision suggestion. It can also be used for the medical assistant by giving more precise drug dosage description. Specifically, the results show that the model can have a correlation Pearson test to be significant under the significant level of 0.1. The increases and decreases of the trained drug effect can fit the trend of the real drug effect.
The limitations of this research are summarized as the following. In our research, only one kind of drug is modelled, but if many kinds of drug are considered in the model, the model can also be complex. Specifically, in the further research, more complex model can be developed to describe the details of the monitored variable trend including the trend caused by the disease, in addition to the basic trend and drug effect trend. The trend caused by the disease is quite complex to be described but still can be modelled. The analysis of multiple drugs is not simply based on the adding of single drugs, in that case, using many self-exciting point process cannot meet the demand of multiple drug effect analysis. Mutual-exciting point process Hawkes model can be considered to describe the effect of multiple kinds of drug and cumulative drug effect dosage. If multiple drug effect can be modelled, then the further analysis can be concentrated on the inverse problem of drug dosage decision under a predefined monitored variable value.
The proposed modelling solution can be further extended to some other areas in addition to the medical area. For example, in the financial area, the model can be used to monitor the effect of economic events or political events to predict the stock price and security price trend after the event occurs. In the environmental area, the model can be used to monitor the effect of air pollution by decomposing the original time series into three aspects, namely, the basic routine trend, the trend arising from events, and that caused by solar circles. In conclusion, the model developed in this research is an efficient and generalized model for time series data which contains multiple kinds of trends either embraced by the time series itself or caused by the external events.

Data Availability
The source code in the method are available from the corresponding author upon request. The real data in application can be requested from Hyland et al. [36].