A Bioequivalence Test by the Direct Comparison of Concentration-versus-Time Curves Using Local Polynomial Smoothers

In order to test if two chemically or pharmaceutically equivalent products have the same efficacy and/or toxicity, a bioequivalence (BE) study is conducted. The 80%/125% rule is the most commonly used criteria for BE and states that BE cannot be claimed unless the 90% CIs for the ratio of selected pharmacokinetics (PK) parameters of the tested to the reference drug are within 0.8 to 1.25. Considering that estimates of these PK parameters are derived from the concentration-versus-time curves, a direct comparison between these curves motivates an alternative and more flexible approach to test BE. Here, we propose to frame the BE test in terms of an equivalence of concentration-versus-time curves which are constructed using local polynomial smoother (LPS). A metric is presented to quantify the distance between the curves and its 90% CIs are calculated via bootstrapping. Then, we applied the proposed procedures to data from an animal study and found that BE between a generic drug and its brand name cannot be concluded, which was consistent with the results by applying the 80%/125% rule. However, the proposed procedure has the advantage of testing only on a single metric, instead of all PK parameters.


Introduction
In order to test if two chemically or pharmaceutically equivalent products, for example, a generic drug and its brand name, have the same efficacy and/or toxicity, a bioequivalence (BE) study is usually conducted [1][2][3]. The objective of a BE trial is to determine whether the test ( ) and the reference ( ) formulation of a pharmaceutical product are "equivalent" with respect to blood concentration × time profiles. In contrast to a difference test, the null hypothesis in an equivalence test states that two agents differ in terms of the endpoint under consideration by at least the minimum 2 Computational and Mathematical Methods in Medicine tolerable amount, called the equivalence margin Δ, whereas the alternative hypothesis states that such difference is less than the equivalence margin Δ: 0 : ≥ Δ, In a BE test, some parameters derived from the concentration-versus-time curves are evaluated. Those parameters traditionally include the area under plasma concentrationversus-time curves (AUC 0 ), peak plasma concentrations ( max ), and its corresponding time ( max ). Among them, AUC is the most accepted measure of absorption rate. max is also of importance because, for some drugs, a certain level of concentration needs to be reached to guarantee the desired therapeutic effect. max is a relevant measure for drugs such as antibiotics that must reach the peak concentration quickly, but it may not be an appropriate measure for drugs requiring multiple dosage before a therapeutic effect is observed. Since none of these three parameters are universally superior to the others, a BE test usually considers them together. For example, under FDA regulations, BE can be claimed only when the 90% confidence intervals (CIs) for the ratio of max , AUC (0− ) , and AUC (0−∞) of the tested (e.g., generic) to the reference drug (e.g., brand name) are within 80% to 125% [4]. This is referred to as 80%/125% rule, corresponding to a ±0.223 rule on the logarithmic scale (of the / ratio). Hence, the value of Δ is usually set to 0.223. Of note, the 90% CIs of the BE-endpoint represent a 0.05 significance level on the equivalence test since the hypothesis-testing problem in an equivalence test is divided into two one-sided hypothesis tests [2,5,6]: one where the null hypothesis states that the difference between two agents is less than −Δ whereas the other one assumes such difference is larger than Δ: Although the BE parameters can be easily obtained from either one of the concentration-versus-time curves using a suitable pharmacokinetic (PK) model such as a onecompartment model or a nonparametric method [7], there are many parameters to be tested, which inflates the type I error rate, requiring the adjustment for multiple comparisons. Because these parameters may be highly correlated, such an adjustment is challenging.
On the other hand, some researchers had pointed out that the requirement of all confidence intervals falling within the equivalence bounds might lead to a conservative result, depending on the correlations among the PK parameters and the study power [8,9]. To address this, simultaneous testing of all PK parameters had been explored and developed; see, for example, the semiparametric Bayesian approach proposed by Ghosh and Gönen [10]. However, such multivariate methods are less popular than the univariate approach of testing PK parameters, partially because of the modeling complexity associated with multivariate methods as compared with their univariate counterparts.
More importantly, when the confidence intervals of max , max , and AUC between two drugs all fall within the equivalence boundary (such that BE is concluded), it does not imply that the drugs are clinically equivalent since the overall shapes of the concentration-versus-time curves may in fact differ [11]. A falsely determined BE between two drugs when they are not clinically equivalent may be very harmful to the public. To alleviate these limitations and drawbacks, we propose to make a direct comparison of the concentration-versustime curves that accounts for the differences between the profile shapes for BE testing. The main objective of this paper is to present a strategy to test BE with the aid of local polynomial smoother (LPS), which is used to construct concentrationversus-time curves. Then, a summary statistic, whose standard error is estimated by bootstrapping, is defined. This allows the calculation of CIs upon which decisions over equivalence between such two curves are made.
LPS is a flexible nonparametric regression method to model curves or surfaces. Specifically, the fitted regression function at a covariate value is based only on observations within a prespecified neighborhood of . LPS can be traced back to the late 19th century in actuarial sciences where it was used to estimate the gradation of mortality rates and in time series modeling [12]. In contrast, another popular nonparametric regression method, locally weighted scatterplot smoothing (LOWESS), obtained smooth fitted values by a weighted linear least squares regression over the prespecified spans. In this study, however, we do not emphasize their differences and instead treat them as synonymous.
With advances in computing, the potential applications of LPS continue to expand. For example, it has been used for dye normalization of two-color microarray experiments [13]. Recently, Anders and Huber [14] used LPS in R's DESeq package to estimate the mean-variance relationship in a negative binomial model, a technique now commonly used to model RNA-Seq data. The popularity of LPS in statistical applications may be due to its several advantages including satisfactory boundary behavior and straightforward interpretability of the nonlinear relationships [15,16].

Experimental Data.
The study included 24 beagle dogs weighting 9-11 kg, and those dogs were randomly assigned to receive either single or multiple (every four weeks for 3 times) administrations of either 1.4 mg/kg Sandostatin or a generic drug developed by GenSci. There were 6 dogs including 3 females and 3 males per treatment-dosage group. Blood samples for PK were collected into tubes containing heparin sodium and centrifuged at room temperature with 3500 rpm for 20 min to obtain plasma samples. This animal study was conducted in accordance with the rules and regulations of Chinese Pharmacopoeia, 2000 Edition/Version 2 and approved by the Committee on the Ethics of Animal Experiments of Jilin University. The plasma concentrations were determined using an Agilent 1100 liquid chromatographytandem mass spectrometry. In the single-dose regimen, blood samples were collected at the initial time corresponding to hour 0 and at the hours of 0.25, 0.5, 1, 1. 5,2,4,8,24,48,96,144,192,240,336,432, 528, 624, 720, 816, 912, 1008, 1104, and 1200.
Based on [7], PK parameters in this study were calculated as follows. For each dog, the maximum plasma concentration ( max ) and its corresponding time ( max ) were determined by visual inspection of the profiles. The apparent terminal elimination rate constant ( ) was calculated by linear regression of the natural logarithms of the terminal plasma concentrations. The terminal half-life ( 1/2 ) was derived as 0.693/ . The area under the curve (AUC) to the last measured point (AUC 0 ) was calculated using the trapezoidal rule. The AUC for the plasma concentration-versus-time function from 0 hours to infinity (AUC 0−∞ ) was calculated as the sum of AUC 0 and / , while was the last quantifiable concentration.

The LPS Model.
Let ( 1 , 1 ), . . . , ( , ) be covariateresponse pairs for observations. To simplify the notation, suppose there is only one covariate. Then, the response is related to the covariate through the following model: where the error term is assumed to be independently distributed with zero mean and variance equal to 2 . The goal of a local polynomial smoother is to estimate the smooth function ( ) = ( | = ) using an approximation provided by a local polynomial of low order in the neighborhood of the point . Usually, the model includes only first-and second-degree polynomials, that is, either locally linear or locally quadratic. This is because any function can be well approximated in a small neighborhood by low-order polynomials and the inclusion of higher order terms often does not improve the model fit dramatically and even may lead to overfitting instead. Specifically, for − ℎ ≤ ≤ + ℎ, The abovementioned approximation is fitted by local weighted least squares, where the estimated coefficients are those that minimize the following function: where the weighted function gives greater weight to in the vicinity of and is usually specified as a symmetric bounded function such as a normal kernel function.
There are two commonly used approaches for the selection of bandwidth ℎ in the equation above [12]. One choice is simply to make the bandwidth a constant, which is adequate if the regression function behaves smoothly. The other one is to allow the bandwidth to change as a function of . In our applications, we chose the latter by using a nearest-neighbor bandwidth selection method [17]. Briefly, in the nearestneighbor bandwidth selection method, a new parameter was included, which was multiplied by the sample size and rounded up to an integer . Then, the bandwidth ℎ( ) was defined as the distance from to the th closest . In this study, measurements were taken at nonuniformly spaced time points, with more frequent measurements at the beginning, where a larger variation of the PK dynamic was expected among subjects. Under this condition, the nearestneighbor bandwidth selection approach will provide better fit than the constant bandwidth. The nearest-neighbor bandwidth parameter was determined by a leave-one-out (LOO) cross-validation due to the small sample size. Specifically, for a grid of possible values, cross-validated mean square error (MSE) was calculated, and the corresponding to the minimum MSE was selected.

Statistical
Language. The statistical analysis was carried out in the R language version 2.15 (http://www.r-project.org/). The R-code is available upon request.

Testing the Equivalence of Two Curves by Combining LPS and Bootstrapping: The Proposed
Procedure. Here, we propose to evaluate bioequivalence between drugs by directly comparing the plasma concentration curves PC( ). Suppose we have two treatments where 1 subjects were randomly assigned to receive the test drug and 2 to the reference drug . We proposed to reframe the hypothesis represented in (1) as 0 : ( ( ) , ( )) > Δ, versus 1 : ( ( ) , ( )) ≤ Δ, with representing the difference between the PK curves for and drugs. For the study subjects, the plasma concentrations of these drugs were measured for those at a grid of time points. Using LPS (details in Section 2), concentrationversus-time curves for both treatments can be estimated as ( ) and ( ). To evaluate if the differences between two curves are within the tolerable range, we define the following estimator of the average difference between the two curves at a grid of prespecified time points, for example, the time points where the concentrations were taken in this study: wherêand̂are the fitted values of plasma concentration using LPS at time for the test drug and the reference drug , respectively. Here, = 1, 2, . . . , , and is the number of time points at which the difference between two curves is evaluated. Note that ln( ) is a measure of the difference between the curves and when the two curves are identical, and they should be identical at all time points and thus the value of ln( ) should be 0.
If the 90% CIs is within 0.8∼1.25, then the equivalence between two curves is claimed.

Case Study.
Octreotide is an octapeptide that mimics natural somatostatin pharmacologically even though it is a more potent inhibitor of growth hormone, glucagon, and insulin than the natural hormone. Brand name Sandostatin (Novartis Pharmaceuticals) of octreotide had been approved for the treatment of several diseases, such as acromegaly and gigantism. To evaluate the BE between a generic octreotide developed by GenSci Pharmaceuticals (Changchun, China) and Brand name Sandostatin, a pilot study using 24 beagle dogs was conducted. The generic drug is referred to as GenSci hereafter. Using this study as an example, we showcase how to incorporate LPS seamlessly with other statistical methods to test BE. Once concentration-versus-time curves are obtained via LPS, testing for BE will be akin to testing if the two LPSderived curves are the same. Firstly, we conducted an equivalence test using the procedure described in Methods. The fitted plasma concentration PC( ) curves are shown in Figure 1. Interestingly, the curves estimated by LPS suggested some PK differences between the two drugs. Specifically, we observed a surge in plasma concentration and then a steep drop in the GenSci group. In addition, the GenSci group entered the plateau stage faster and remained in this stage longer than the Sandostatin group, dropping down at a sharper scope. Based on these observations, we suspected that GenSci might have a better PK behavior than Sandostatin.
The 90% CIs for the proposed metric of GenSci against Sandostatin were calculated as 119.02% ∼166.99%. With its upper bound above 125%, the equivalence between the two drugs cannot be established based on this method. As a comparison, we also estimated the 90% CI of the ratio of the traditionally considered PK parameters (i.e., (AUC 0 ), max , and max ) based on the FDA guidelines [4]. The results summarized in Table 1 show that all 90% CIs of the ratio of (AUC 0 ), max , and max of GenSci against Sandostatin were outside 80% to 125%, the FDA-sanctioned BE criteria. In summary, BE cannot be established using either the FDA-criteria or our proposed procedure. Nevertheless, the proposed procedure has the advantage of testing only a single statistic and hence avoiding multiple hypotheses issues.
The sample size of 6 dogs per group raises major concerns in light of the failure to establish BE between GenSci and Sandostatin. To explore the impact of the sample size in this pilot study, we combined the data from this single-dose cohort with additional data from a multiple-dose cohort, but including only data collected in the first 720 hours where no boosts were given and the experimental methods\conditions were exactly the same as in the single-dose cohorts. Interested readers may see Section 2 for more information on how the experiments were conducted. Analysis on the combined cohorts including 12 dogs per group led to similar results. Given that this application represents a pilot study, as preliminary work for larger studies, no confirmative conclusions were drawn here. By presenting this study, our intent is to illustrate the LPS-based BE test as an alternative to the classical univariate tests on PK parameters.

Simulation Study.
We conducted a simulation study to characterize our BE-testing procedure. Here, we used the PK model presented in [11] where the concentration function is represented by where and are the absorption and the elimination rate constants. The concentration-versus-time curves for the reference drug and test drug were simulated as log ( ) = log ( ( )) + , log ( ) = log ( ( )) + ∇ + , where the correlation between two measures, on the logarithm scale, from the same subject ( = 1, 2, . . . , 2 ) was specified as a positive constant; that is, corr( , ) = Computational and Mathematical Methods in Medicine 5 Note: represents the absorption rate for the generic drug and represents the absorption rate for the reference drug; represents the sample size of each group; ∇ represents a constant difference between the concentrations of two drugs on the logarithm scale; is the correlation coefficient between measures over time from the same subject; its influence on the estimation is expected to be bigger when its value is larger.
for ̸ = . The curves above differ only by a constant ∇ on the logarithm scale. In the second set of simulations, the two concentration curves for the test drug and the reference drug were set to have different absorption rates (i.e., and ) instead of ∇ ̸ = 0. By varying the values for , ∇, and / and the sample size, we simulated 100 different scenarios in total. The empirical coverage of the 90% CIs for each scenario was calculated via simulating 1000 bootstrap replicates. The simulation results are presented in Table 2, from which we found that a sample size of 24 per group is large enough to provide an adequate statistical power for the proposed testing procedure. Furthermore, the empirical coverage of the 90% CIs when two curves are equivalent is approximately 95% or above, suggesting the 80/125% rule is applicable for the proposed metric.

Our Extension to the Superiority Test between Curves.
It is worth pointing out that by virtue of being coupled with a permutation test [19,20] to calculate an overall value, LPS can also be used to evaluate if two curves are different. In this case, the hypothesis tested is 0 : ( ( ) , ( )) = 0, with representing the difference between the two curves. As shown on the simulation studies (in Supplementary Material available online at http://dx.doi.org/10.1155/2016/4680642), the permutation-based test provides a valid means for testing the difference of two curves, achieving the desired type I error rate. We applied this procedure to longitudinal data collected by Orange et al. [21] and showed that the results obtained were consistent with those obtained by using a generalized estimating equation model [22,23]. Therefore, the proposed procedure to test the difference between curves using LPS can be employed in a wide range of applications in longitudinal data analysis.

Conclusions
In this study, we presented a framework which directly compares the concentration-versus-time curves constructed using LPS to test BE. This approach avoids multiple testing by evaluating the equivalence between curves with one single metric, instead of multiple PK parameters. Furthermore, since LPS can be viewed as an extension of linear weighted regression and can be easily implemented in any standard statistical software, our proposed procedure can be easily utilized in other applications.
van der Meersch et al. [24] have argued that the objectives and methodology of bioequivalence trials differ greatly from those of noninferiority and equivalence trials due to the decision criteria and study design. In our opinion, their claims are not persuasive. In terms of decision criteria, it is difficult to define an equivalence margin in a longitudinal sense because the measures such as plasma concentrations are collected over time. It might represent one major reason why the FDA and other regulatory agencies have defined BE by using confidence limits for a grid of PK parameters instead of a single hypothesis testing. Regarding the study design, van der Meersch et al. [24] deemed a crossover design to be suitable and appropriate. In practice, however, there are many BE studies using a parallel design that the FDA reports [4] explicitly listed the cases where a parallel design is considered more suitable than a crossover design. Given that a BE test is a longitudinal equivalence test by nature, we believe that the use of nonparametric smoothers such as LPS as a basis to construct the concentration-versus-time curves for direct comparisons warrants additional attention and research.