Determination of Cefoperazone Sodium in Presence of Related Impurities by Linear Support Vector Regression and Partial Least Squares Chemometric Models

This study presents a comparison between partial least squares regression (PLSR) and support vector regression (SVR) chemometric models. The two models are used to determine cefoperazone sodium in the presence of its reported impurities, 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole, in pure powders and in pharmaceutical formulations by processing UV spectroscopic data. A 3-factor, 4-level experimental design was used, resulting in a training set of 16 mixtures containing different ratios of the interfering moieties. For method validation, an independent test set of 9 mixtures was used to test the predictive ability of the established models. The results show that both models can determine cefoperazone in the presence of its impurities 7-aminocephalosporanic acid and 5-mercapto-1-methyl-tetrazole with high trueness and selectivity (101.87 ± 0.708 and 101.43 ± 0.536 for PLSR and linear SVR, respectively). The results for the drug products were statistically compared to a reported HPLC method and showed no significant difference in trueness and precision, indicating that the suggested multivariate calibration models are reliable and adequate for routine quality control analysis of the drug product. SVR gives more accurate results with lower prediction error than the PLSR model; however, PLSR is easy to handle and fast to optimize.

The literature presents several analytical methods for the assay of CEF in its pharmaceutical formulation, including spectrophotometric methods for determination of CEF [5,6], near-infrared reflectance spectroscopy [7], and derivative UV spectrophotometry for determination of CEF in combination with sulbactam [8]. Chromatographic methods were applied for analysis of CEF and sulbactam [9,10]; in addition, an HPLC method with a cyclodextrin stationary phase for determination of CEF, ampicillin, and sulbactam was reported [11]. CEF and sulbactam were also determined in plasma by an LC-MS/MS method [12]. Additionally, the electrochemical behavior and voltammetric determination of CEF [13,14] were reported.
The presented study has two main aims. First, the presented chemometric models show the ability of multivariate models to analyze CEF selectively in ternary mixtures with its two reported impurities using cost-effective and widely available instruments such as a UV spectrophotometer. Second, the study compares two methods of multivariate calibration, PLSR and linear SVR chemometric models, through the assay of CEF, 7-ACA, and 5-MER mixtures, indicating the advantages and limitations of each model. The selected models offer better trueness and precision for quantitative determination of CEF in pharmaceutical formulation compared to the reported HPLC method [15]. (El-Nasr Pharmaceutical Chemicals Co., Abu-Zaabal, Cairo, Egypt).

Standard Solutions.
(a) Stock standard solutions of 1 mg mL⁻¹ for each of CEF, 7-ACA, and 5-MER were prepared in 3 mL of 0.05 M K₂HPO₄ solution and the volume was completed with pure methanol. 7-ACA is soluble only in slightly alkaline solvent; accordingly, it was dissolved by adding a fixed small volume of 0.05 M K₂HPO₄ solution before adding methanol. This step was carried out for all stock solutions of CEF, 7-ACA, 5-MER, and the pharmaceutical formulations as well.
(b) A working standard solution of 100 µg mL⁻¹ of CEF was prepared in methanol. Two working standard solutions were prepared for each impurity. Working standard solution (1) of 100 µg mL⁻¹ and working standard solution (2) of 10 µg mL⁻¹ for both 7-ACA and 5-MER were prepared to allow preparation of final mixtures with very small concentrations of the impurities, up to 3% of CEF calculated on a molar basis.

Linearity.
UV spectra of a set of CEF standards from 1 to 70 µg mL⁻¹ were recorded from 210 to 300 nm. CEF exhibited linearity between 5 and 50 µg mL⁻¹ at its λmax of 229 nm. The superimposed spectra of 10 µg mL⁻¹ each of CEF, 7-ACA, and 5-MER are shown in Figure 2.

Calibration Set.
A 4-level, 3-factor calibration design was implemented using 4 concentration levels coded as +2, +1, −1, and +1, where (−1) is the central level for each of the components to be analyzed, including the main drug (CEF) and its two impurities (7-ACA and 5-MER). The design aims to span the mixture space appropriately, with 4 mixtures for every compound at every concentration level, giving 16 mixtures for the training set [16]. The central level selected for the design was 20 µg mL⁻¹ for CEF, and the concentration of each level for CEF depended on its calibration range. Concentration levels of the impurities were chosen to include them in amounts up to 3% of CEF, calculated on a molar basis, to cover most of the cases likely in future analyses. Table 1 represents the design matrix for concentrations. A 2D scores plot of the first two PCs of the concentration matrix was drawn to affirm the orthogonality, rotatability, and symmetry of the training set mixtures (presented as circles), as shown in Figure 3. Mean centering of all data was the preprocessing method that gave the best results.
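The scores-plot check on the concentration matrix can be sketched as follows. This is only an illustration under stated assumptions: the 16 × 3 matrix below is a hypothetical placeholder (it is not the design of Table 1), and the function name `pca_scores` is invented for the example. The matrix is mean centered and decomposed by SVD; the first two score columns are the coordinates that would be plotted.

```python
import numpy as np

def pca_scores(C, n_pcs=2):
    """Mean-centre the columns of C and return the first n_pcs PCA scores."""
    Cc = C - C.mean(axis=0)                       # mean centering
    U, s, Vt = np.linalg.svd(Cc, full_matrices=False)
    return U[:, :n_pcs] * s[:n_pcs]               # scores T = U * S

# Hypothetical 16 x 3 concentration matrix (columns: CEF, 7-ACA, 5-MER).
# Placeholder values only, NOT the design matrix of Table 1.
rng = np.random.default_rng(0)
C_demo = np.column_stack([
    rng.choice([10.0, 20.0, 30.0, 40.0], size=16),
    rng.choice([0.1, 0.2, 0.3, 0.4], size=16),
    rng.choice([0.1, 0.2, 0.3, 0.4], size=16),
])
scores = pca_scores(C_demo)   # column 0 = PC1, column 1 = PC2
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of 2D view described for Figure 3; by construction the two score columns are centered and mutually orthogonal.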

Test Set.
To examine the validity and prediction capabilities of the introduced chemometric models, an independent test set was obtained by preparing nine mixtures other than the training set mixtures but within the concentration space of the design, as indicated in Table 1. The positions of the training set and test set mixtures are shown in Figure 3.

Analysis of Cefobid and Cefoperazone Vials.
For each of the two dosage forms, an accurately weighed aliquot equivalent to 100 mg of CEF was transferred into a 100 mL volumetric flask. To prepare the stock solution, 3 mL of 0.05 M K₂HPO₄ solution was added and the volume was completed using pure methanol. The solution was then diluted to prepare a 100 µg mL⁻¹ working solution using methanol as solvent. Lastly, a 2 mL portion of the working solution was diluted to 10 mL with methanol. The average of three corresponding spectra was recorded. This experiment was replicated six times and the produced spectra were analyzed by the suggested models.
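The dilution scheme can be checked with a few lines of arithmetic. The sketch below reads the working-solution unit as µg mL⁻¹ and assumes a 10 mL to 100 mL step for the stock-to-working dilution (the text states only the resulting concentration); the helper name `dilute` is hypothetical.

```python
def dilute(conc, aliquot_ml, final_ml):
    """Concentration after diluting aliquot_ml of a solution to final_ml."""
    return conc * aliquot_ml / final_ml

# 100 mg of CEF in a 100 mL flask, expressed in µg/mL
stock = 100.0 * 1000.0 / 100.0          # 1000 µg/mL (1 mg/mL)
# Assumed 1:10 step to reach the stated 100 µg/mL working solution
working = dilute(stock, 10.0, 100.0)    # 100 µg/mL
# 2 mL of the working solution diluted to 10 mL
final = dilute(working, 2.0, 10.0)      # nominal concentration measured
```

Under these assumptions the measured solution has a nominal CEF concentration of 20 µg mL⁻¹, which coincides with the central level stated for the calibration design.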

Chemometric Methods
Multivariate calibration models are chemometric tools that set a relation between the spectra in data matrix X and the concentrations in data vector c. Multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLSR) are among the common models used for pharmaceutical analysis. PCR and PLSR are preferable to MLR, because MLR needs further variable selection steps to perform optimally and to avoid multicollinearity. Additionally, PCR and PLSR can cope with a large number of spectral variables by decomposing the X data matrix into a relatively small number of scores. The scores matrix T then replaces the original X data matrix in the further steps. PLSR is more developed than PCR, in that the c data vector also takes part in the construction of the scores [17,18]. Furthermore, the compression to a small number of scores works as a useful filter for noise [19]. Hence, PLSR is implemented in the presented analysis.

Partial Least Squares Regression (PLSR).
The PLSR model assumes a linear relation between the X data matrix and the response values in concentration vector c [20]. The data matrix X and the response vector c are decomposed using a given number of PLS components (latent variables, LVs) [21][22][23][24] as follows:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}, \qquad \mathbf{c} = \mathbf{T}\mathbf{q} + \mathbf{f} \tag{1}$$

where T and P are the scores and loadings for X and q is the loading vector for c. E and f are the residuals for X and c, respectively. PLSR is considered one of the best methods in multivariate calibration because it is reported to perform better than MLR and PCR in several pharmaceutical applications [20].
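A minimal sketch of a single-response PLS fit (PLS1, NIPALS-style) is given below to make the roles of the scores, loadings, and residuals concrete. It is not the authors' implementation: the function names, the Gaussian band shapes, the band centres other than 229 nm, and the concentration ranges are all assumptions chosen for a noise-free synthetic demo.

```python
import numpy as np

def pls1_fit(X, c, n_components):
    """Fit a PLS1 model (NIPALS) on mean-centred X and c."""
    x_mean, c_mean = X.mean(axis=0), c.mean()
    E, f = X - x_mean, c - c_mean          # residuals start as centred data
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)             # unit-length weight vector
        t = E @ w                          # scores for this component
        tt = t @ t
        p = E.T @ t / tt                   # loading for X
        q_a = f @ t / tt                   # loading for c
        E = E - np.outer(t, p)             # deflate X
        f = f - q_a * t                    # deflate c
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # regression vector in X space
    return b, x_mean, c_mean

def pls1_predict(model, X):
    """Predict concentrations for new spectra (rows of X)."""
    b, x_mean, c_mean = model
    return (X - x_mean) @ b + c_mean

# Synthetic, noise-free three-component demo (all values hypothetical).
wl = np.linspace(210, 300, 91)
pure = np.array([np.exp(-0.5 * ((wl - mu) / 12.0) ** 2) for mu in (229, 250, 265)])
rng = np.random.default_rng(1)
conc = rng.uniform([10, 0.1, 0.1], [40, 1.0, 1.0], size=(16, 3))
X = conc @ pure                            # Beer-Lambert additive mixtures
model = pls1_fit(X, conc[:, 0], n_components=3)
pred = pls1_predict(model, X)
new_pred = pls1_predict(model, np.array([[20.0, 0.5, 0.5]]) @ pure)
```

With noise-free additive spectra of three components, three PLS components reproduce the first-column concentrations essentially exactly; with real spectra the number of components has to be chosen by validation.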

Optimization of Number of PLS Components for PLSR Model.
To select the optimum number of PLS components, a bootstrap technique [25,26] was used. This technique is based on dividing the original training set into two-thirds (bootstrap training set) and one-third (bootstrap test set). A PLSR model is built with the bootstrap training set and used to predict the concentrations in the bootstrap test set, and the prediction error is calculated through the following equation:

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (c_i - \hat{c}_i)^2}{n}} \tag{2}$$

where $n$ is the number of bootstrap test set samples, $c_i$ is the known concentration of sample $i$, and $\hat{c}_i$ is the corresponding predicted concentration at the defined number of PLS components. Equation (2) represents just one iteration out of the 1000 iterations that were implemented in the presented study. The higher the number of iterations, the higher the probability that every sample is selected in both data sets (training set and test set). Finally, the average of the 1000 root mean square error of prediction (RMSEP) values was plotted against the corresponding number of PLS components to choose the optimum number of PLS components. The bootstrap training set was mean centered in every iteration.
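The component-selection loop can be sketched as below. Several points are assumptions rather than details from the study: the two-thirds/one-third split is drawn as a random permutation without replacement, the compact PLS1 routine is a generic NIPALS variant, and the data are synthetic spectra with added noise. Function names are hypothetical.

```python
import numpy as np

def pls1_coef(X, c, n_components):
    """PLS1 (NIPALS) regression vector for already mean-centred X and c."""
    E, f = X.copy(), c.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, q_a = E.T @ t / tt, f @ t / tt
        E, f = E - np.outer(t, p), f - q_a * t
        W.append(w); P.append(p); q.append(q_a)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def bootstrap_rmsep(X, c, max_components, n_iter=1000, seed=0):
    """Average RMSEP per number of PLS components over random 2/3-1/3 splits."""
    rng = np.random.default_rng(seed)
    n = len(c)
    n_train = round(2 * n / 3)
    rmsep = np.zeros((n_iter, max_components))
    for it in range(n_iter):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        # mean centre with the bootstrap training set in every iteration
        x_mean, c_mean = X[tr].mean(axis=0), c[tr].mean()
        for a in range(1, max_components + 1):
            b = pls1_coef(X[tr] - x_mean, c[tr] - c_mean, a)
            c_hat = (X[te] - x_mean) @ b + c_mean
            rmsep[it, a - 1] = np.sqrt(np.mean((c[te] - c_hat) ** 2))
    return rmsep.mean(axis=0)

# Synthetic demo data (hypothetical band positions, ranges, and noise level).
wl = np.linspace(210, 300, 91)
pure = np.array([np.exp(-0.5 * ((wl - mu) / 12.0) ** 2) for mu in (229, 250, 265)])
rng = np.random.default_rng(1)
conc = rng.uniform(5, 40, size=(16, 3))
X = conc @ pure + rng.normal(0, 0.01, size=(16, 91))
mean_rmsep = bootstrap_rmsep(X, conc[:, 0], max_components=5, n_iter=200)
best = int(np.argmin(mean_rmsep)) + 1     # candidate optimum number of components
```

Plotting `mean_rmsep` against the number of components gives a curve of the kind used to pick the optimum; the demo uses 200 iterations only to keep it quick, and `n_iter=1000` matches the number stated in the text.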

Support Vector Regression (SVR).
For a data set X ($N \times J$; $N$ samples by $J$ variables) with an output vector c, SVR models aim to find a multivariate regression function $f(\mathbf{x})$ that depends on X to predict a desired response (e.g., concentration of a chemical compound) from an object (e.g., a spectrum). SVR model equations are illustrated in literature [26][27][28] and the summary equation can be given as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \langle \mathbf{x}_i, \mathbf{x} \rangle + b \tag{3}$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers that satisfy the constraint $0 \le \alpha_i, \alpha_i^{*} \le C$. $C$ is known as the penalty error or regularization constant, which determines the trade-off between model simplicity and training error. The parameter $b$ is the offset of the regression function $f(\mathbf{x})$. Further illustrations of (3) and its parameters are found in literature [27,29,30]. The $\varepsilon$-insensitive loss function is an additional factor commonly applied for SVR and will be used and optimized in this study [31,32]. The SVR method can be applied to both linear and nonlinear data. A linear SVR model is used in this study, because the spectral data exhibit linearity, guaranteed through the well-planned experimental design. Finally, in the prediction step, the unknown $\hat{c}$ value can be calculated as given below [33]:

$$\hat{c}_{\mathrm{unknown}} = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \langle \mathbf{x}_i, \mathbf{x}_{\mathrm{unknown}} \rangle + b \tag{4}$$

Journal of Analytical Methods in Chemistry
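To illustrate the prediction step for a linear model, the sketch below evaluates the dual-form regression function, written here with the standard symbols α_i, α_i* (Lagrange multipliers) and b (offset), and shows that with a linear kernel it collapses to a single weight vector. The multipliers, offset, and spectra are hypothetical numbers for illustration only; they are not fitted values, and no SVR training is performed.

```python
import numpy as np

def svr_predict_dual(x_new, X_train, alpha, alpha_star, b):
    """f(x) = sum_i (alpha_i - alpha_i*) <x_i, x> + b with a linear kernel."""
    return float((alpha - alpha_star) @ (X_train @ x_new) + b)

def svr_primal_weights(X_train, alpha, alpha_star):
    """Equivalent weight vector w = sum_i (alpha_i - alpha_i*) x_i."""
    return (alpha - alpha_star) @ X_train

# Hypothetical values purely for illustration (not fitted multipliers).
X_train = np.array([[0.2, 0.5, 0.1],
                    [0.4, 0.9, 0.3],
                    [0.6, 1.4, 0.4]])
alpha = np.array([1.5, 0.0, 0.0])        # only support vectors are nonzero
alpha_star = np.array([0.0, 0.0, 0.7])
b = 0.25
x_new = np.array([0.5, 1.1, 0.35])

f_dual = svr_predict_dual(x_new, X_train, alpha, alpha_star, b)
f_primal = float(svr_primal_weights(X_train, alpha, alpha_star) @ x_new + b)
```

Both forms give the same prediction; the primal form makes it clear that a linear SVR, once trained, predicts with one dot product per spectrum.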

Optimization of the Linear SVR Model Parameters.
Optimum $\varepsilon$ and $C$ values were calculated by using a grid search based on 4-fold cross validation to give the lowest root mean square error of cross validation (RMSECV). The primary range of values was set for $\varepsilon$ (0.01-1) and $C$. For each set of SVR parameters, 4 samples ($M = 4$) were taken out and the linear SVR model was built on the remaining 12 ($N - M$) samples. RMSECV was then calculated for the $M$ left-out samples and, finally, the average RMSECV after removal of all samples was calculated as follows:

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{M} (c_i - \hat{c}_i)^2}{M}} \tag{5}$$

where $c_i$ is the true concentration for sample $i$ and $\hat{c}_i$ is the corresponding predicted concentration.
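The cross-validated grid search can be sketched generically as follows. To keep the example dependency-free, the regressor plugged in here is a ridge regression used as a stand-in; it is NOT an SVR. With an SVR trainer, the grid would instead hold the ε and C candidates. All function names, the synthetic data, and the grid values are assumptions.

```python
import itertools
import numpy as np

def rmsecv(X, c, fit_predict, params, n_folds=4, seed=0):
    """Average RMSECV over n_folds; each fold is left out exactly once."""
    idx = np.random.default_rng(seed).permutation(len(c))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)             # the remaining N - M samples
        c_hat = fit_predict(X[train], c[train], X[fold], **params)
        errors.append(np.sqrt(np.mean((c[fold] - c_hat) ** 2)))
    return float(np.mean(errors))

def grid_search(X, c, fit_predict, grid):
    """Return (best_params, best_rmsecv) over the Cartesian product of grid."""
    best = (None, np.inf)
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = rmsecv(X, c, fit_predict, params)
        if score < best[1]:
            best = (params, score)
    return best

def ridge_fit_predict(X_tr, c_tr, X_te, lam):
    """Stand-in regressor (ridge), NOT an SVR; swap in an SVR trainer here."""
    x_mean, c_mean = X_tr.mean(axis=0), c_tr.mean()
    Xc = X_tr - x_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]),
                        Xc.T @ (c_tr - c_mean))
    return (X_te - x_mean) @ b + c_mean

# Synthetic demo: 16 samples, so 4 folds leave out 4 samples each.
rng = np.random.default_rng(2)
X = rng.normal(size=(16, 6))
c = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2, 0.8]) + rng.normal(0, 0.05, size=16)
grid = {"lam": [1e-3, 1e-1, 10.0, 1000.0]}          # hypothetical grid
best_params, best_score = grid_search(X, c, ridge_fit_predict, grid)
```

Because the folds are fixed by the seed, every parameter combination is scored on the same splits, which keeps the RMSECV values comparable across the grid.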

Parameters' Optimization Results.
The bootstrap technique was applied to choose the optimum number of PLS components to build the best calibration model based on the training set. The optimum number was three, as shown in Figure 4. Concerning SVR, the grid search that gave the lowest RMSECV, equation (5), resulted in the values $\varepsilon$ = 0.02 and $C$ = 280.

Data Analysis
Results. This study aims to introduce a comparative study between two chemometric methods, PLSR and linear SVR, via analysis of CEF in the presence of its reported impurities, 7-ACA and 5-MER. The two multivariate models could handle the UV data and overcome the overlap of the components' spectra shown in Figure 2. Both models were able to determine the concentrations of CEF in the training set and test set, as shown by the high recovery % with low SD presented in Table 2. The RMSEP is a parameter used to evaluate the prediction abilities of the two models (Table 2). A comparative RMSEP plot of PLSR and linear SVR for prediction of the test set samples is presented in Figure 5.
Several criteria of comparison are included in our study: root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), and computational intensity and optimization steps. First, linear SVR gives a lower RMSEC (0.1960) than PLSR (0.2306), indicating better trueness, and the corresponding standard deviation is also smaller, indicating better precision. Second, the comparative bar plot in Figure 5 indicates that linear SVR gives a lower RMSEP (0.3386) than PLSR (0.4457), indicating a higher ability to process future samples and better generalization ability. Concerning computational intensity, PLSR is computationally simpler than SVR, which is more demanding and time-consuming because of the intensive calculations needed for optimization. The choice of optimum parameter values for the SVR model could be misleading, because parameters that give the lowest RMSECV (which may amount to overfitting, as in PLSR) can still give a better RMSEP, or vice versa. In this study, we apply 4-fold cross validation to optimize the SVR parameters and to evade overfitting by predicting small subsets of data instead of a single sample (as in the leave-one-out cross validation technique), thereby improving the robustness of the model and its generalization ability. A model that is more subject to overfitting is usually less robust. Accordingly, with PLSR, selecting too many components leads to a less robust model, that is, one with less ability to predict future samples that have unknown signals. Calculating the SVR parameters by 4-fold cross validation evades overfitting, giving better robustness and higher prediction ability for future samples and hence a more general model [34]. SVR is less prone to overfitting than PLSR [35,36].
The implemented linear SVR model in this study has another merit of optimizing only two parameters, unlike the nonlinear SVR models that use kernels and need optimization of more parameters and hence are more time-consuming in optimization process.
In conclusion, the PLSR model is considered one of the best in multivariate calibration and is commonly used in routine quality control applications and industry. It proved to be simple in optimization and computation and gives results comparable to reference HPLC methods despite processing simple UV data. However, SVR is still considered a more general model with higher predictive ability for future samples.
The proposed models were implemented for analysis of CEF in Cefobid 0.5 g vial and cefoperazone 1 g vial, and satisfactory results with good recoveries were obtained. These results were statistically compared to the results obtained by applying the reported HPLC method [15] using t- and F-tests. The calculated t- and F-values are less than the tabulated ones, showing no significant difference between the two introduced models and the reference HPLC method regarding both trueness and precision (Table 3).
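The two statistics behind such a comparison can be computed as sketched below: a two-sample Student's t with pooled variance (for trueness) and a variance-ratio F (for precision). The recovery values are hypothetical placeholders, not results from Table 3, and the function names are invented for the example. The calculated values would then be compared with tabulated critical values at the chosen significance level and degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample Student's t with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def f_statistic(a, b):
    """Variance ratio with the larger sample variance in the numerator."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

# Hypothetical recovery % values for illustration only.
proposed = [100.2, 99.8, 100.5, 100.1, 99.9, 100.3]
reference = [100.0, 100.4, 99.7, 100.6, 99.9, 100.2]

t_value = t_statistic(proposed, reference)
f_value = f_statistic(proposed, reference)
```

For six replicates per method, the t comparison has 10 degrees of freedom and the F comparison has (5, 5).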

Conclusion
The goals of this paper are to present two multivariate chemometric models, PLSR and linear SVR, for analysis of CEF in the presence of its reported impurities and to make a modest comparison between the two models, highlighting the advantages and limitations of each. Concerning predictive ability, linear SVR proved to be better than PLSR according to the RMSEP values, indicating better generalization ability. However, PLSR is simpler and faster to optimize. The two chemometric methods were also applied to the pharmaceutical formulations and statistically compared to the reference HPLC method [15]. The calculated t- and F-values were found to be less than the tabulated ones, showing no significant difference with respect to both trueness and precision. The proposed methods showed high selectivity and trueness. These advantages suggest the use of the proposed models for routine quality control analysis without interference from normally encountered pharmaceutical additives or impurities that could be present in minor ratios.
Additionally, the obtained results affirm the possibility of using modern chemometric approaches, especially linear SVR, for assay of different pharmaceutical dosage forms using accessible cheap and simple instruments like UV spectrophotometer even in presence of large number of interfering components with extremely overlapped spectra.

Disclosure
Parts of Experimental (Section 2) and Chemometric Methods (Section 3) are reproduced from our previous studies on other mixtures [35][36][37]. The authors would like to declare the presence of another study entitled "Determination of Cefoperazone Sodium in Presence of Related Impurities by Improved Classical Least Squares Chemometric Methods: A Comparative Study" which uses the same data sets as a case study to establish a comparison among a set of improved CLS models (not published to date).