Machine Learning for Predicting Hyperglycemic Cases Induced by PD-1/PD-L1 Inhibitors

Objective. Immune checkpoint inhibitors, such as programmed death-1/ligand-1 (PD-1/L1) inhibitors, can cause autoimmune-like disorders, and hyperglycemia ranks first among grade 3 or higher immune-related adverse events. Machine learning builds a model from past data to predict future data. Using post-marketing surveillance data, we aimed to construct a machine learning algorithm that efficiently and rapidly predicts hyperglycemic adverse reactions in patients using PD-1/L1 inhibitors.
Methods. In original data downloaded from the US Food and Drug Administration Adverse Event Reporting System (FAERS), a multivariate pattern classification method, the support vector machine (SVM), was used to construct a classifier that separates patients with adverse hyperglycemic reactions. With the core SVM function fixed, 10-fold, 3-time cross validation was used to optimize the composition of parameter values during model setup in R.
Results. The SVM prediction model was set up with the number type/number optimization method and the "rbf" kernel with the "nu-regression" type. Two key parameters (nu and gamma) showed high adjusted r² in curve regressions against case number (nu = 0.5649 × e^(−case/6984), gamma = 9.005 × 10^−4 × case − 4.877 × 10^−8 × case^2). This SVM model with computable parameters greatly improved the assessment indexes (accuracy, F1 score, and kappa) while keeping sensitivity and the area under the curve (AUC) at a comparable level.
Conclusion. We constructed an effective machine learning model based on compositions of exact kernels and computable parameters. The SVM prediction model can noninvasively and precisely predict hyperglycemic adverse drug reactions (ADRs) in patients treated with PD-1/L1 inhibitors, which could greatly help clinical practitioners identify high-risk patients and take preventive measures in time. In addition, this model setup process provides an analytic concept that can be extended to the prediction of other ADRs. Such ADR information is vital for improving outcomes by identifying high-risk patients, and this machine learning algorithm can eventually add value to clinical decision making.


Introduction
Diabetes mellitus and cancer are among the leading causes of death worldwide. As a major contributor to neoplastic transformation and an important prediabetes index, hyperglycemia (high blood glucose) is also influenced by cancer treatment. Higher blood glucose levels have been reported with programmed death-1 (PD-1) or ligand-1 (PD-L1) inhibitors and with immune checkpoint inhibitor regimens in clinical trials. Among the endocrine dysfunctions, hyperglycemia ranked third (after hypothyroidism and hyperthyroidism) among all-grade immune-related adverse events (irAEs) and first among grade III or higher irAEs [5,6]. Our previous research indicated that Nivolumab and Pembrolizumab were positive hyperglycemia-causing drugs [7,8]. In 2021, JAMA Oncology reported that the risks of chronic irAEs should be integrated into treatment decision making [3]. Prolonged exposure to hyperglycemia can epigenetically modify gene expression profiles in human cells, and this effect is sustained even after blood glucose is therapeutically controlled. Cancer cells exposed to hyperglycemia continue to grow aggressively even after euglycemia is restored. This metabolic phenomenon is called hyperglycemic memory, which contributes substantially to the pathology of various diabetic complications [9,10].
Worldwide public databases on adverse events provide much drug-usage information [11,12] and have become a new information source in the drug post-marketing phase. The US Food and Drug Administration (FDA) is responsible for protecting public health by ensuring the safety, efficacy, and security of drugs, biological products, and medical devices. The reports of the Food and Drug Administration Adverse Event Reporting System (FAERS) are evaluated by clinical reviewers to monitor the safety of products after they are approved. FAERS is a database that contains adverse event reports, medication error reports, and product quality complaints resulting in adverse events. The database is designed to support the FDA's post-marketing safety surveillance program for drug and therapeutic biologic products. The informatic structure of the FAERS database adheres to the international safety reporting guidance issued by the International Conference on Harmonization [13].
Machine learning (ML) "learns" a model from past data in order to predict future data. The key process is learning, which is a branch of artificial intelligence [14]. Artificial intelligence-based solutions can improve medication safety with minimal overhead for patients and health professionals [15]. Machine learning focuses on how computers learn from data, with an emphasis on efficient computing algorithms [16,17]. In the health record field, machine learning techniques have been a hot spot in data mining, and trajectory data mining has become an important research direction [18]. Modeling this big data requires managing overfitting, model interpretability, and computational cost [19]. Machine learning offers many advantages for the assimilation and assessment of complex health big data, including flexibility and scalability, and it is widely used in risk stratification, diagnosis, classification, and survival prediction. The diversity of health data calls for machine learning on demographic records, laboratory findings, images, and doctors' records, as well as for predictions of disease risk, diagnosis, prognosis, and appropriate treatments [20].
Many different statistical, probabilistic, and optimization techniques can be implemented as learning methods, such as logistic regression, artificial neural networks, K-nearest neighbor, decision trees, and Naïve Bayes [14]. However, traditional biostatistical methods (e.g., logistic regression or linear regression) could not provide higher precision in health-data prediction, especially in real-world research. The support vector machine (SVM) is a widely used supervised learning algorithm that analyzes data and recognizes patterns; it is mainly used for binary classification and regression with a linear or nonlinear decision boundary [21]. It aims to divide samples into meaningful groups so that labels can be predicted from one or more feature vectors. This decision boundary, a line or plane called the maximum margin hyperplane in the multidimensional feature space, is oriented so that it is as far as possible from the closest data points of each class. These closest points are called support vectors. SVM is powerful at recognizing subtle patterns in complex datasets and has been used to recognize handwriting, detect fraudulent credit card use, identify speakers, and detect faces [14].
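The hyperplane idea described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the study itself used R), and the weights `w` and offset `b` are hypothetical values, not a fitted model: the sign of the decision value tells which side of the hyperplane a point falls on, and the distance function shows what the margin measures.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical 2-D hyperplane x1 + x2 - 3 = 0 (not fitted to any data).
w, b = [1.0, 1.0], -3.0
for point in ([0.0, 0.0], [3.0, 3.0]):
    side = "positive" if decision_value(w, b, point) > 0 else "negative"
    print(point, side, round(distance_to_hyperplane(w, b, point), 3))
```

A trained SVM chooses `w` and `b` so that the smallest such distance over the training points (the support vectors) is as large as possible.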
The SVM-based approach performs well in managing sparse data in high dimensions, in that it outperforms other state-of-the-art competitors by providing the best compromise between prediction and computation time [19]. It has been applied to seizure prediction, detection, and classification [22]. One advantage of SVM is that it can classify with a small number of training samples; another is that it can solve both linear and nonlinear regression problems [23].
To date, how hyperglycemic occurrences correlate with personal-related features, and which features correlate most strongly with hyperglycemia, remains an urgent problem, and no such predictions have been reported. Predicting and avoiding severe ADRs can provide better guidance for clinical decision making. In this study, we aimed to construct an effective SVM-based machine learning model with high accuracy, low computational cost, and computable parameters to predict hyperglycemia in PD-1/PD-L1-treated patients.

Exclusion Criteria
Exclusion criteria were as follows:

2.2. Procedure. The procedure comprises data download, algorithm selection, key parameter regression, and model prediction performance (Figure 1).
Step 1. Data version
Step 1.2. Raw data and complete data. Original data were fixed into two versions: raw data (cases with missing values) and complete data (cases with missing value(s) deleted). The two versions of data contained the above 13 variables.
Step 2. Core SVM algorithm. Positive and negative cases could be coded as a 2-category factor type ("Yes" and "No") or a number type ("1" and "0") in SVM model setup.
There are two parameter optimization methods: number optimization and R-function optimization. In number optimization, the best values were defined by the composition of parameters, according to the best range of each parameter. In R-function optimization, the parameter range was passed to a built-in function (R function: tune.svm). If the best values were close to the range boundary, the range was adjusted and the optimization was run again. The factorized data were optimized via both the number and the function methods.
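As a rough illustration of number optimization, the sketch below enumerates every composition of candidate parameter values, keeps the best-scoring one, and flags parameters whose best value lies on the edge of its range (the cue to adjust the range and optimize again). It is written in Python for illustration only; `toy_score` is a hypothetical stand-in for a cross-validated index such as kappa, and the grids are made-up values.

```python
from itertools import product

def best_composition(score, grids):
    """Return the parameter composition (dict) with the highest score."""
    names = list(grids)
    best = max(product(*(grids[n] for n in names)),
               key=lambda values: score(dict(zip(names, values))))
    return dict(zip(names, best))

def on_boundary(best, grids):
    """Names of parameters whose best value sits on the edge of its range."""
    return [n for n, v in best.items() if v in (grids[n][0], grids[n][-1])]

# Hypothetical score standing in for a cross-validated index such as kappa.
def toy_score(p):
    return -((p["nu"] - 0.4) ** 2) - ((p["gamma"] - 3) ** 2)

grids = {"nu": [0.1, 0.2, 0.3, 0.4, 0.5], "gamma": [1, 2, 3, 4, 5]}
best = best_composition(toy_score, grids)
print(best, on_boundary(best, grids))
```

If `on_boundary` returns a non-empty list, the corresponding range would be shifted or widened before the next round.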
Step 2.1. Number/factor. For each drug, 600 cases (positive and negative cases in proportion) were extracted from raw data and complete data. All the type-optimization compositions were tested. In raw data, the results of factor number were very close to those of factor tune, so they are only displayed as "f_raw" and "n_raw." As R-tune optimization needs variables in the form of factors, there were no results for "nr" (number type/R-tune optimization), but only for "nn" (number type/number optimization). The data version was also determined at this step.
Step 2.2. Variables. The key to constructing an SVM model that can screen the active markers accurately is to select the appropriate variables. Variables were selected by two methods: the near zero variance method (R function: nearZeroVar; "T" means deletable) and model assessment (R function: varImp; "0" means deletable).
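The near zero variance screen can be sketched as follows. This is an illustrative Python re-implementation, not the R routine used in the study; the two thresholds are assumed to mirror the documented defaults of caret's nearZeroVar (frequency ratio above 95/5 and at most 10% unique values), and the example columns are hypothetical.

```python
from collections import Counter

def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """True if the variable looks deletable ("T" in the paper's notation)."""
    counts = Counter(values).most_common()
    if len(counts) == 1:                      # zero variance: one distinct value
        return True
    freq_ratio = counts[0][1] / counts[1][1]  # most common / second most common
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut

# Hypothetical columns: one dominated by a single level, one balanced.
serious = ["Y"] * 98 + ["N"] * 2
sex = ["F"] * 45 + ["M"] * 55
print(near_zero_variance(serious), near_zero_variance(sex))
```

A variable flagged here carries almost no information for separating cases, which is why it is a candidate for deletion before model setup.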
Step 3. Key parameters. With complete data, the modeling set was split by stratified random sampling into training data (70%) and testing data (30%), each containing positive and negative cases in proportion.
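A proportional 70/30 split of this kind can be sketched as below. The function is an illustrative Python stand-in for the R partitioning routine used in the study; the label vector (5% positive cases) and the seed are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within each class
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

labels = [1] * 10 + [0] * 190   # hypothetical: 5% positive cases
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx),
      sum(labels[i] for i in train_idx), sum(labels[i] for i in test_idx))
```

Splitting within each class guarantees that the rare positive cases appear in both parts, which a plain random split does not.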

Figure 1: Procedure (Steps 1 to 4). Step 1: original data fixed into raw data and complete data. Step 3: adjustment of parameters in 70% of the complete data to set up the model using 3 × 10 cross validation and parameter composition; the optimized parameters (gamma and nu values) were analyzed by curve regression. Step 4: model prediction performance in four indexes and ROC curves.
precise rate) values were selected as evaluation indicators. Parameter value determination (e.g., gamma, nu, cost): gamma and cost were set as the minimal values; nu was set as the mean value in parameter composition.
Step 3.2. Regressions for selected parameters. Regression analysis of the parameters (gamma, nu) against the corresponding case number was used to test for computable correlations.
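One way to search for an exponential relation of this kind is sketched below: for each candidate constant c, nu is regressed linearly on exp(−case/c), and the constant with the highest r² is kept, first on a coarse range and then on a narrower one. This is a Python illustration under stated assumptions: the regression includes an intercept (r² is computed as the squared Pearson correlation), and the data points are synthetic, generated from a known curve rather than taken from the study.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation, i.e. r2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_constant(cases, nus, candidates):
    """Candidate constant c maximizing r2 between nu and exp(-case / c)."""
    return max(candidates,
               key=lambda c: r_squared([math.exp(-n / c) for n in cases], nus))

# Synthetic points generated from a known curve (not the study's data).
cases = [500, 1000, 2000, 4000, 8000]
nus = [0.6 * math.exp(-n / 5000) for n in cases]
coarse = best_constant(cases, nus, range(1000, 10001, 1000))
fine = best_constant(cases, nus, range(coarse - 900, coarse + 901, 100))
print(coarse, fine)
```

Narrowing the candidate range around the current best value is what makes the search converge on a single constant.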
Step 4. Model prediction performance. The model prediction was performed on the testing data and on other testing drugs. Four indexes (accuracy, F1, kappa, and sensitivity) and ROC curves were checked to assess its effect.

Statistical Analysis.
Descriptive analysis was used to summarize patient demographic characteristics, with mean values for continuous variables and ratios for categorical variables. To explain the factors affecting hyperglycemia, "Reaction" was defined as the dependent variable and the others as explanatory variables. The t-test was used to compare normally distributed variables and to define 95% confidence intervals, and the Wilcoxon rank test was used to compare variables with other or unknown distributions. R (version 4.2.0 for Windows) was used for statistics, and its e1071 package was used to build the SVM model.

Results
A demographic summary of complete data for analysis is provided in Table 1. Due to the limited number (only 1) of positive cases in the complete data, cam and cem were not included in model setup. These two drugs as well as ipi were set as testing drugs.
A pilot assay was used for algorithm selection, based on 600 proportionally sampled cases of each drug. Positive and negative cases were classified as number type ("1," "0") or factor type ("Yes," "No"). The optimization methods were number and R-tune. Accuracy (total precise rate), F1 score, kappa (consistence), and sensitivity (positive precise rate or recall) values were evaluated as indicators of performance. The confusion matrix was calculated according to Table 2. The "fn" and "nn" compositions in complete data displayed the best performance (Figure 2).
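The four indexes can be computed from a 2 × 2 confusion matrix as in the following sketch. It is a Python illustration using the standard definitions (kappa is Cohen's kappa); the counts are hypothetical, and the function does not guard against empty classes.

```python
def classification_indexes(tp, fp, fn, tn):
    """Accuracy, F1 score, Cohen's kappa, and sensitivity from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)              # recall of positive cases
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "f1": f1, "kappa": kappa, "sensitivity": sensitivity}

# Hypothetical counts with few positive cases.
result = classification_indexes(tp=8, fp=4, fn=2, tn=186)
print(result)
```

With few positive cases, accuracy stays high almost regardless of how the positives are handled, whereas F1 score and kappa respond to errors on the positive class.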
One key to constructing an SVM model that can accurately screen the active markers is to select the appropriate variable indexes [23]. The variable selection methods used were the near zero variance method and model assessment.
Three variables (Nos. 9-11, Table 3) as well as "ID" and "Outcomes" were deleted in the following analysis. "Source" and "Reporter" indicated where the ADR report came from, without adding clinical information for ADR control; "Outcomes" did not influence "Reactions." The other 8 variables (Nos. 1-8, Table 3) were selected for model setup.
An SVM model is determined by its kernel ("l," "p," "r," "s") and type ("C," "one," "eps," "nu"). The five PD-1/L1 drugs were tested with the 4 × 4 kernel-type cross compositions. Because the indexes spanned a wide range, y-axis values were log-transformed in the boxplots. In number type, "r-nu" ("rbf" and "nu-regression") displayed better performance, especially for the important indexes of F1 score and kappa. In factor type, "p-C" displayed better performance, except for lower sensitivity. Moreover, "r-nu" performed better than "p-C" in F1 score and kappa (Figure 3). As "eps" and "nu" regression need numeric dependent variables, "r-nu" and number type/number optimization were selected for SVM model setup.
Generally, the modeling set was split by stratified random sampling into training data (70%) and testing data (30%), each containing positive and negative cases in proportion.
Figure 2: Performance for data version and algorithm classification. In raw data: f_raw: factor type; n_raw: number type. In complete data: nn: number type, number optimization; ft: factor type, R-tune optimization; fn: factor type, number optimization. To compare the effect of missing values, raw data and complete data (filtered from raw data) were both checked in the pilot test. The positive cases were marked as factor type ("yes" and "no") or number type ("1" and "0"); the optimization methods were number and tune. According to the SVM model options, there were 5 composition types. The assessing indexes of the tested drugs were checked to find the more optimized algorithm. All four indexes in raw data displayed lower scores, while those of the algorithms (fn, nn) in complete data displayed the highest scores.

Table 3 (partial; only the last two rows are recoverable). Each pair of values is nzv followed by Overall.
Serious: T 0, T 0, T 0, T 0, T 0
11 Source: T 0, T 0, T 0, T 0, T 0
nzv: near zero variance. Overall: overall value of model assessment. T: true. F: false ("T" or "0" means this variable should be deleted in each method).

Journal of Healthcare Engineering

Figure 3: Performance of kernel and type compositions. The SVM algorithm is based on its kernel ("l," "p," "r," "s") and type ("C," "one," "eps," "nu"). Drugs were tested with the 4 × 4 kernel-type cross compositions. In number type, "r-nu" ("rbf" and "nu-regression") displayed the highest scores in (b) and (c) and was still in the top class in (a) and (d). In factor type, "p-C" displayed the highest scores in (e)-(g) but lower scores in (h). Overall, "p-C" (the best in factor type) performed weaker than "r-nu" (number type). (a) Accuracy (log Y).

Parameters of SVM mainly included degree, cost, gamma, nu, and coef0. From the pilot study, nu was defined by the mean value, and cost (=1) and gamma by the minimal value in the best prediction range (1∼10), while the other parameters did not need to be adjusted. The main model setup algorithm is shown in Algorithm 1. The main model setup indicated that the critical points depended on the patterns of nu and gamma. In regression analysis, we found that the nu values (0 < nu ≤ 1) were exponential and the gamma values (1 ≤ gamma ≤ 10) were quadratic in case number; the regression formulas and curves are given in (1) and (2) and Figure 4.

$$\mathrm{nu} = 0.5649 \times e^{-\mathrm{case}/6984} \tag{1}$$

$$\mathrm{gamma} = 9.005 \times 10^{-4} \times \mathrm{case} - 4.877 \times 10^{-8} \times \mathrm{case}^{2} \tag{2}$$

The setup model with optimized parameters (type: nu-regression; kernel: rbf; parameters: nu and gamma from formulas (1) and (2)) was applied to the testing part (5 testing data sets and 3 drugs). Three indexes (accuracy, F1 score, and kappa) were greatly improved by the model compared with the initial prediction (Figure 5). Receiver operating characteristic (ROC) analysis was used to describe the discrimination accuracy of a diagnostic test or prediction model [24]. The diagnostic values from this model prediction and from single variables ("Reactions," "Reason," "Country," "Weight," "Year," "Age," "Priority," and "Sex") on the testing parts were evaluated by ROC curves; the predictive performance of the model was much better than that of any single variable (Figure 6).
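Formulas (1) and (2), with the coefficients reported in the abstract, can be evaluated directly for a given case number. The Python sketch below is illustrative only; the range guard in `parameters_for` is an assumption based on the stated ranges (0 < nu ≤ 1 and 1 ≤ gamma ≤ 10), not part of the published algorithm.

```python
import math

def nu_from_cases(n_cases):
    """Formula (1): nu = 0.5649 * exp(-case / 6984)."""
    return 0.5649 * math.exp(-n_cases / 6984)

def gamma_from_cases(n_cases):
    """Formula (2): gamma = 9.005e-4 * case - 4.877e-8 * case**2."""
    return 9.005e-4 * n_cases - 4.877e-8 * n_cases ** 2

def parameters_for(n_cases):
    """Return (nu, gamma), refusing case numbers outside the stated ranges."""
    nu, gamma = nu_from_cases(n_cases), gamma_from_cases(n_cases)
    if not (0 < nu <= 1 and 1 <= gamma <= 10):
        raise ValueError(f"case number {n_cases} is outside the usable range")
    return nu, gamma

print(parameters_for(2000))
```

Because formula (2) is a downward-opening parabola, gamma falls below 1 for small case numbers and again for very large ones, so the guard makes explicit that the formulas only apply within a limited range of case numbers.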

Discussion
The application of machine learning in healthcare delivery presents unique challenges that require data preprocessing, model training, and refinement of the system with respect to the actual clinical problem [20]. In this study, we have developed a machine learning algorithm with an appropriate core (kernel and type) and computable parameters.
Hyperglycemia is a serious ADR in cancer treatment [25], and it is urgent to predict its occurrence among cancer patients. Hyperglycemia influences the outcome of cancer therapy via various mechanisms such as promotion of inflammation [1] and immune destruction [26]. In the inclusion criteria, "Diabetic" is set as an item to include diabetic complications such as ketoacidosis and coma. As PD-1 combination therapies (e.g., nivo combined with ipi, and pem combined with chemotherapy [27]) have been approved by the FDA, ipi is also included to better test hyperglycemic ADR prediction in drug usage. To the best of our knowledge, this is the first time that hyperglycemia has been predicted from real-world clinical practice via a machine learning model. SVM is a structural dependence model that finds the maximum margin hyperplane between ADR and reported features. After training, new cases are projected into the same space to test which side of the hyperplane they are located on [22]. As adverse events may occur in only a small fraction of patients, the class imbalance was handled by splitting the data proportionally and randomly into training and testing parts with the R function createDataPartition.

Algorithm 1 (partial): parallel evaluation of the candidate nu values.

    library(snowfall); library(parallel)               # sfInit/sfExport/sfLapply; detectCores
    sfInit(parallel = TRUE, cpus = detectCores() - 1)  # leave one core free
    sfExport("train", "myfun", "j", "i", "K", "R")     # copy objects to the workers
    a <- sfLapply(nu.r, myfun)                         # apply myfun to each candidate nu
    b <- matrix(unlist(a), ncol = 6, byrow = TRUE)     # one row of 6 values per candidate

Figure 5: Prediction performance before/after parameter regression. TCN: total case number; PCN: positive case number in tested parts. a: accuracy; f1: F1 score; k: kappa; s: sensitivity. Red columns show the initial prediction before parameter regression; green columns show the prediction from the model. The model was set up from 5 drugs, so only their testing data were used for checking (a∼e); another 3 drugs were also used for checking (f∼h). The model improved greatly in accuracy, F1 score, and kappa, as red column scores were much lower than green ones.
SVM is powerful in data mining for better classification; however, it is greatly influenced by its parameters. It is inefficient to use traditional grid search, learning curves, and other parameter adjustment methods [28]. SVM is usually used as a control algorithm in diabetes prediction [29]. In contrast, our model obtained optimal parameters efficiently by considering the key parameters and their large search space, and it enhanced prediction precision. We discussed its parameter tuning and provided a new concept for parameter adjustment.

Figure 6: ROC curves. Predictions from single items ("reactions," "reason," "country," "weight," "year," "age," "priority," and "sex") stay around the diagonal with a low area under the curve (AUC), while the prediction from the model (red line) is nearly at the top in (a∼f). In (g∼h), the single variables did not stay around the diagonal, possibly because of the limited number of positive cases (only 1), but the red lines were still at the top. The predictive performance of the model was much better than that of any single variable.
In raw data, the high accuracy came from the data itself, as the low hyperglycemic incidence and the large number of negative cases pushed up accuracy. In complete data, number optimization performed better than R-tune optimization, in that R-tune leans toward various minor compositions rather than the comprehensive adjustment of number optimization (Figure 2).
Variable selection depended on both statistical and clinical significance. Because of missing values and duplicated cases, variables should be filtered in machine learning. Variable importance was tested by two methods: in the near zero variance method, variables displayed "T" as deletable; in the overall value of model assessment, variables displayed "0" as deletable. After comparing the two methods and considering clinical implications, the variables "reactions," "reason," "country," "age," "weight," and "year" were included. Though "Sex" was deletable in the overall method, the "nzv" method indicated it as meaningful. Furthermore, "Sex" was an important parameter clinically, so it was included among the variables. Based on the clinical and algorithmic assessment, "Serious," "Source," and "Reporter" were not included in model setup (Table 3).
An SVM model is usually defined by its kernel and type. In the cross compositions, "r-nu" in number type showed above-average values in accuracy and sensitivity and top values in F1 score and kappa, and it was better than "p-C" in factor type (Figure 3). Furthermore, "eps" and "nu" regression did not accept factor variables. The number type/number optimization and the "r-nu" kernel-type composition were selected for SVM model setup. Since the nu-classification SVM type did not work with either factor or number type here, it was not checked among the type and kernel compositions (Step 2.3).
Five parameters (degree, cost, gamma, nu, and coef0) were adjusted in parameter selection (Step 3.1). Because adjusting degree and coef0 did not change the results, these two parameters were left at the function defaults. To avoid overfitting, the cost value was set to "1" (the function default). Adjusting the other two parameters (nu and gamma) could improve model performance in 3 × 10 (10-fold, 3-time) cross validation.
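The 3 × 10 scheme means that each parameter composition is fitted and validated 30 times, on three different random 10-fold partitions. The index bookkeeping can be sketched as follows; this is an illustrative Python version with a hypothetical sample size and seed, and unlike the study's proportional split it does not stratify by class.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, validation_idx) for repeats x k folds."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                      # new random partition per repeat
        folds = [order[f::k] for f in range(k)]
        for f, validation in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, validation

splits = list(repeated_kfold(n_samples=100))
print(len(splits))   # 30 fits per parameter composition
```

Averaging an index over the 30 validation folds gives a steadier estimate for each (nu, gamma) composition than a single 10-fold pass.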
Then, nu was defined by the mean value and gamma by the minimal value from the model. Because the whole range of nu (0∼1) had been checked in cross validation, its mean value could cover all situations. Gamma, however, determines the distribution of the new feature space, meaning that the smaller the gamma value, the more support vectors there are. So, choosing the smallest gamma value would not be affected by its initial testing range. Unlike the previously published preprint [30], model results from 5 drugs were included because of newly imported cases and new positive case criteria. In the regression test, we found that if nu was chosen after minimizing the gamma value, the prediction performance was poor at both small and large case volumes (in linear, partially linear, or nonlinear regression). For balance, we chose gamma from 1∼10 and nu from 0∼1 in this study; Figure 5 displays their good prediction performance.
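The role of gamma in the "rbf" kernel can be seen from the kernel function itself, K(u, v) = exp(−gamma × ||u − v||²). The Python sketch below evaluates it for two hypothetical feature vectors at several gamma values in the 1∼10 range; it illustrates the kernel only and does not fit a model.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

# Two hypothetical feature vectors one unit apart.
u, v = [0.0, 0.0], [1.0, 0.0]
for gamma in (1, 5, 10):
    print(gamma, rbf_kernel(u, v, gamma))
```

The kernel value shrinks as gamma grows, so a larger gamma makes each training case influence only its close neighbourhood, while a smaller gamma spreads that influence more widely.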
In the exponential curve (1), the constant value (6984) was obtained by iterating over narrower and narrower ranges in the linear regression between the nu value and e^(−case number/constant), keeping the highest r². The ratio of positive hyperglycemic ADR cases is relatively low (<5%, Table 1), so the mass of negative cases pushed up sensitivity values in the initial prediction (Figure 5). The other three indexes could also support this model's good performance. The graphical ROC curve is produced by plotting sensitivity (true positive rate) on the y-axis against 1 − specificity (false positive rate) on the x-axis for the various values tabulated [31]. Areas under the ROC curves (AUC) from single variables were far smaller than those from model prediction (the uppermost red line in Figure 6). The prediction from the composite of variables was powerful, since the prediction from every single item alone is close to the diagonal line. In cam and cem, the low number of positive cases (only 1) influenced single-variable prediction, but these predictions were still lower than model prediction (the uppermost red line in Figure 6).
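The AUC comparison can be made concrete with a short sketch. It uses the rank-based form of the AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which equals the area under the ROC curve described above. The code is Python for illustration, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and scores, not taken from the study.
labels = [1, 1, 0, 0, 0, 0]
print(roc_auc(labels, [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]))   # informative score
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))   # uninformative score
```

An uninformative predictor gives an AUC of 0.5, which corresponds to a ROC curve lying on the diagonal.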
One downside of SVM is that finding the best model requires testing various compositions of kernels and model parameters. The success or failure of machine learning approaches on a given problem may vary strongly with the expertise of the user [14]. Our model process set up an efficient kernel and composite parameters and improved the prediction for positive cases.
Though SVMs are extremely powerful classifiers and the use of a public database raises no concerns about medicolegal implications, clinicians' understanding, or privacy and security, several limitations must be addressed. First, pharmacovigilance databases can only be used to predict correlations rather than causality. FAERS, as a spontaneous reporting system, states that there is no certainty that a reported event was due to the product and that reports may contain insufficient detail, be incomplete, or be duplicated [13]. Other drug factors (e.g., doses, frequency) and biological factors (e.g., genomic data, personal habits) were not included in the development of the present algorithm, which might prevent accuracy and F1 score from reaching 100%. It should be clear from the narrative examples used in this paper that the choice, tuning, and diagnosis of machine learning applications are far from mechanical [32]. Furthermore, formulas (1) and (2) are valid only within the given range of case numbers; if the case number is too large or too small, the model should be set up again.
Despite the limitations described above, this algorithm has provided meaningful information for adjusting care goals for these patients and for guiding further well-organized clinical studies.
ADR monitoring is one of the keys to pharmacovigilance during the drug post-marketing phase. On the one hand, it is challenging to find effective and fast methods given clinically heterogeneous results. On the other hand, ADR prediction is all the more important because ADR occurrence is usually unknown during clinical treatment, and real-time monitoring is expensive and inconvenient for discovering ADRs. This algorithm model, built from features clinically available at the time of presentation, proved robust and generalizable in the later testing parts.
This noninvasive and precise prediction could greatly help clinical practitioners to distinguish high-risk patients. Therefore, this study provides a direction for predicting hyperglycemic ADRs with these drugs.

Conclusion
In summary, the SVM model established here can noninvasively and precisely predict hyperglycemic ADR in patients treated with PD-1/PD-L1 inhibitors from given personal-related features and a given case number. The SVM model was set up based on compositions of correct kernels and computable parameters. The SVM model showed good prediction performance on testing data, which indicates that this model is robust and generalizable in this field. This model setup process provides an analytic concept that can be extended to the prediction of other ADRs. We also believe that the availability of medical and personal information will further improve this model. Such information from prediction is vital for preventing or even overcoming ADRs and for improving patient outcomes by distinguishing patients at high risk of hyperglycemia, and this machine learning algorithm can eventually add value to clinical decision making.

Data Availability
The data used to support the findings of this study are available from the FAERS public dashboard, and the algorithm(s) are available from the corresponding author on reasonable request.

Disclosure
A preprint of this paper has previously been published [30].

Conflicts of Interest
All the authors declare that there are no conflicts of interest.

Authors' Contributions
Jincheng Yang and Ning Li contributed equally to this work.