Near-Infrared Spectroscopy as a Process Analytical Technology Tool for Monitoring the Steaming Process of Gastrodiae rhizoma with Multiparameters and Chemometrics

Steaming is a vital unit operation in traditional Chinese medicine (TCM) that greatly affects the active ingredients and the pharmacological efficacy of the products. Near-infrared (NIR) spectroscopy is already widely used as a powerful process analytical technology (PAT) tool. In this study, the potential of NIR spectroscopy for monitoring the steaming process of Gastrodiae rhizoma was explored. Ten lab-scale batches were used to construct quantitative models for four chemical ingredients and the moisture change during the steaming process. Gastrodin, p-hydroxybenzyl alcohol, parishin B, and parishin A were modeled by different multivariate calibration models (SMLR and PLS), while the moisture content was modeled by principal component regression (PCR). In the optimized models, the root mean square errors of prediction (RMSEP) for gastrodin, p-hydroxybenzyl alcohol, parishin B, parishin A, and moisture were 0.0181, 0.0143, 0.0132, 0.0244, and 2.15, respectively, and the determination coefficients of prediction (Rp²) were 0.9591, 0.9307, 0.9309, 0.9277, and 0.9201, respectively. Results from three additional batches showed that the accuracy of the models was acceptable and that the method was fit for the subsequent drying step. In addition, the results demonstrated that the method was reliable in process performance and robustness. This method holds great promise for replacing the current subjective color judgment and the time-consuming HPLC or UV/Vis methods and is suitable for rapid online monitoring and quality control of the industrial TCM steaming process.


Introduction
Gastrodiae rhizoma (G. rhizoma), called "Tianma" in Chinese and regarded as one of the ten "magical plants" in China, has been widely used to treat diverse diseases including headache, migraine, dizziness, epilepsy, infantile convulsion, and tetany [1]. Phytochemical studies of G. rhizoma have shown that the major chemical constituents linked with the pharmacological activity of this plant are phenolic compounds, such as gastrodin, gastrodigenin, bis(4-hydroxybenzyl) ether, 4-hydroxybenzaldehyde, and parishin, and more than 30 phenolic compounds have been successfully isolated or transferred from G. rhizoma [2][3][4][5]. According to the Japanese and Chinese Pharmacopoeias, the steamed roots of G. rhizoma are defined as the standard form [6,7]. The steamed form is also the monarch drug, or major effective ingredient, of many Chinese patent medicines, such as "Quantianma Capsules," "Tianmaduzhong Capsules," and "Tianmasu Tablets." Therefore, the steaming process is a vital unit operation that affects the quality of the pharmaceutical products. The purpose of steaming is to change the properties of the medicine and expand its range of use, to reduce side effects, or to facilitate slicing [8]. Endpoint determination in the traditional steaming process usually considers only the gastrodin content of G. rhizoma and ignores other active ingredients, so it may neither reflect the changes in active ingredient content nor identify the endpoint accurately. Various instrumental techniques and methods have been developed for the qualitative and quantitative analysis of G. rhizoma constituents, including high-performance liquid chromatography (HPLC) or LC-MS, gas chromatography-mass spectrometry (GC-MS), and capillary electrophoresis (CE) [9][10][11][12]. However, these methods are often time-consuming, laborious, and tedious, since they require multiple steps of sample preparation.
Therefore, new approaches that can overcome these drawbacks are highly desirable. Process analytical technology (PAT) tools, which can increase process efficiency and ensure that final product quality is homogeneous and controllable, deserve attention for monitoring changes in the content of the main active components during the steaming process of G. rhizoma.
Near-infrared (NIR) spectroscopy fulfills many of the criteria of an ideal PAT tool for pharmaceutical applications and has already been validated for applications such as blend homogeneity, extraction, and active content and moisture determination [13][14][15][16][17]. Because NIR spectra feature broad, overlapping absorption bands spread over thousands of wavelength variables, assigning them to specific chemical group vibrations is rather difficult. Consequently, chemometric tools such as mathematical pretreatments and regression methods are used to extract the significant and useful information. To the best of our knowledge, there have been no reports so far on NIR spectroscopy as a PAT tool for monitoring the steaming process of G. rhizoma.
For qualitative and quantitative analyses by NIR spectroscopy, multivariate calibration models can be established by combining information-rich spectroscopy with efficient regression tools provided by modern mathematics. However, the selection of wavelengths/variables is of great significance for obtaining robust models with good performance. Nowadays, there are many mathematical strategies for variable selection such as stepwise multiple linear regression (SMLR), partial least squares (PLS), synergy interval PLS (Si-PLS), and principal component regression (PCR) [18][19][20], and some studies have shown that models built with effective wavelengths perform better. The aims of this study were to (1) compare different algorithms and build high-performance NIR calibration models for the steaming process of G. rhizoma, with PLS models evaluated for determining the contents of gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B and PCR models tested for determining the moisture content, and (2) investigate the feasibility and application of NIR spectroscopy for monitoring the changes in these chemical and physical properties during the steaming process of additional batches of G. rhizoma.

Steaming Process and Sample Collection.
The steaming process of G. rhizoma was simulated according to the actual process conditions. Upon arrival, each raw G. rhizoma was immediately washed clean and then crushed into a fluid homogenate. About 140 g of G. rhizoma was put into an electric steamer with a stirring paddle (Supor Group, Hangzhou, China) and heated for 10 minutes. The temperature of the herbal medicine during the steaming process was controlled at 100°C. A uniform distribution of samples across high and low concentrations is indispensable for obtaining similar prediction accuracy over the whole range [21]. Throughout the steaming process, samples were collected at 30 s intervals for reference analysis immediately after the spectral measurements. In this study, about 200 samples were collected from 10 different batches and used to build the calibration and validation models.
All procedures were strictly controlled to lower the risk from uncertain parameters during sample collection, separation, and processing.

Near-Infrared Spectra Acquisition.
The NIR spectra were collected using an Antaris II FT-NIR analyzer (Thermo Nicolet, USA) at room temperature using the standard method. An integrating sphere diffuse reflection mode with an InGaAs detector was selected to record the in-line NIR data dynamically throughout the steaming process.
Spectra were recorded in-line over the wavenumber range from 4,000 to 12,000 cm⁻¹ with a resolution of 8 cm⁻¹. Each spectrum was an average of 32 scans and was recorded as the logarithm of the reciprocal reflectance, log(1/R). The background spectrum was scanned with air as the reference in order to eliminate background effects. Because room temperature and relative humidity may influence the surface moisture of G. rhizoma, room temperature was maintained at 25°C and humidity at 80% during spectra collection.

Sample Preparation and Reference Assays for HPLC.
HPLC reference analysis was performed immediately after the sample NIR spectra were collected. The sample preparation was similar to the method used before [22]. About 0.5 g of G. rhizoma sample was weighed and extracted with 25 mL of dilute alcohol, and the total weight of the solution was recorded before ultrasonication for 30 minutes. The solution was then weighed again, the weight loss was made up, and the solution was filtered twice. A 10 mL aliquot of the subsequent filtrate was pipetted into an evaporating dish and evaporated to dryness; the residue was redissolved in acetonitrile : water (3 : 97) and made up to volume in a 10 mL volumetric flask. The extraction solution was filtered through a 0.45 μm filter membrane before HPLC analysis. We developed an HPLC method for the quantitative determination of gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B in the samples.
The chromatographic analysis was carried out on a Diamonsil C18 column (250 mm × 4.6 mm, 5 μm) at 30°C on a 1260 HPLC system (Agilent, Santa Clara, USA) consisting of a vacuum degasser, a thermostatic column compartment, a quaternary pump, an autosampler, and a diode array detector (DAD). The mobile phase consisted of 0.05% phosphoric acid solution (A) and acetonitrile (B) at a flow rate of 1.0 mL/min. The gradient program was as follows: 0-10 minutes, 3%-8% B; 10-18 minutes, 8%-12% B; 18-40 minutes, 12%-25% B; and 40-50 minutes, 25%-40% B. The detection wavelength was set at 270 nm, and the injection volume was 10 μL. The moisture content during the steaming process was determined by the weight loss method according to the Chinese Pharmacopoeia. All of the results acquired above were used as reference values for the NIR analysis. The standard stock solutions were prepared in advance by dissolving the four reference standards in 50% methanol (methanol : water = 1 : 1, v/v) to a final concentration of 0.0591 mg/mL for gastrodin, 0.0055 mg/mL for p-hydroxybenzyl alcohol, 0.0315 mg/mL for parishin A, and 0.0025 mg/mL for parishin B.

Spectra Transformation and Data Analysis.
The NIR spectra must be transformed to remove noise and irrelevant information, and variables must be selected to reduce redundancy and collinearity, in order to improve the prediction performance of the models. Therefore, spectra were transformed with several different methods, such as multiplicative scatter correction (MSC), standard normal variate transformation (SNV), first derivative (FD), and Savitzky-Golay (SG) smoothing. For the quantification of the four phenolic compounds, three different regression models, namely, SMLR, full-PLS, and Si-PLS, were established, and their performance was systematically compared. For the quantification of moisture content, a PCR model was constructed.
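The two scatter corrections named above can be sketched as follows. This is a minimal NumPy illustration, not the TQ Analyst implementation used in the study; the function names `snv` and `msc` are hypothetical, spectra are assumed to be stored one per row, and the mean spectrum is assumed as the MSC reference.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Assumption: when no reference is given, the mean spectrum of the set is used.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then remove the additive and multiplicative terms.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected
```

SNV works on each spectrum in isolation, whereas MSC depends on the reference spectrum, so a reference estimated on the calibration set would normally be reused when correcting new spectra.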

Evaluation of Model Performance.
The performance of the established models (i.e., SMLR, PLS, Si-PLS, and PCR) was evaluated by four performance indexes: the determination coefficient of calibration (Rc²), the root mean square error of calibration (RMSEC), the determination coefficient of prediction (Rp²), and the root mean square error of prediction (RMSEP). The determination coefficient (R²) reflects the agreement between the actual and predicted values of the quality parameters and thus indicates the predictive efficiency of the model. The models' efficiency can be assessed from RMSEC, RMSEP, and RPD (the ratio of the standard deviation of the validation set to the standard error of prediction). Generally speaking, a high-performance model should yield higher Rc² and Rp² values but lower RMSEC and RMSEP values.
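The article does not print the formulas behind these indexes, so the sketch below uses common conventions and should be read as an assumption: RMSE divides by the number of samples, R² is one minus the ratio of residual to total sum of squares, and RPD uses the RMSE in place of a bias-corrected standard error of prediction. The function names are hypothetical.

```python
import math

def rmse(reference, predicted):
    """Root mean square error; gives RMSEC on calibration data and RMSEP on validation data."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def r_squared(reference, predicted):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def rpd(reference, predicted):
    """Ratio of the reference standard deviation to the prediction error.

    Assumption: RMSE stands in for the standard error of prediction.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    return sd / rmse(reference, predicted)
```

Applying the same functions to the calibration set and to the validation set gives the Rc²/RMSEC and Rp²/RMSEP pairs, respectively.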

Software.
The samples were divided into calibration and validation sets by the Kennard-Stone algorithm. All processing of NIR spectra and all chemometric methods, including spectral transformation, wavelength/variable selection, and model construction (SMLR, full-PLS, Si-PLS, and PCR), were carried out using TQ Analyst software (version 8.0, Thermo Nicolet, USA). A paired t-test was performed using SPSS 17.0 (SPSS Standard version 17.0, SPSS Inc., Chicago, IL), a simple and efficient tool for data analysis, to determine whether the contents of the five components obtained by HPLC and by in-line NIR analysis differed.

Results of Reference Values Analysis.
All samples collected were analyzed using the HPLC method described in Section 2.4. A representative chromatogram from the steaming process is shown in Figure 1; the four phenolic compounds (gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B) were all baseline separated. The HPLC method was validated before the samples were analyzed. The main parameters of the HPLC method are listed in Table 1.
The moisture and the concentrations of the four analytes in the steaming process samples are listed in Table 2.
The measured contents of the four phenolic compounds and moisture in raw G. rhizoma varied markedly between samples. The gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B contents ranged from 0.18% to 0.54%, 0.10% to 0.34%, 0.20% to 0.54%, and 0.03% to 0.25%, respectively, with RSD values of 7.8%, 5.8%, 4.1%, and 3.5%, respectively. Because the quality of raw G. rhizoma materials varies with geographical source, harvest time, cultivation conditions, and storage, quality control is essential for the steaming process.

Analysis of Near-Infrared Spectra.
The raw NIR spectra of the samples collected during the steaming process are shown in Figure 2(a) and track the changes in physical and chemical attributes well. As the spectra show, the absorbance increased as steaming progressed. Although NIR spectra are dominated by overtones and combination bands of hydrogen-containing groups such as -OH, -CH, and -NH, several characteristic absorption peaks can still be identified. According to previous studies [23][24][25], the region from 4200 to 5000 cm⁻¹ is attributed to C-H, O-H, and N-H stretching and C-H deformation in the phenyl ring, consistent with the several phenyl groups in the molecular structures of gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B. In addition, the intense absorptions at 5155 cm⁻¹ and 6944 cm⁻¹ are attributed to the first overtone and deformation of O-H in water [26]. To some extent, the spectral changes in those ranges describe the characteristics of the samples during the steaming process. Therefore, multivariate calibration techniques are useful for revealing the relationship between the NIR spectra and the parameters. Preprocessing the raw spectra to enhance the signal-to-noise ratio and remove irrelevant variation is a necessary step in building high-performance models. The first derivative (FD) is a good choice for eliminating the baseline drift and scattering effects that arise from the inhomogeneous distribution and irregular shape of the particles. The Savitzky-Golay filter algorithm was used to suppress the noise amplified by differentiation. Finally, in this study, FD with SG smoothing (7th order polynomial, a 5-point window) (FD/SG) was adopted. The spectra processed by FD/SG are shown in Figure 2(b).
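A Savitzky-Golay derivative fits a low-order polynomial to each sliding window by least squares and reads the derivative off the fitted coefficients. The sketch below shows this with NumPy only; the function names are hypothetical, and the window and order used in the usage note are illustrative values rather than the study's settings, since the polynomial order has to be smaller than the window length for the fit to be determined.

```python
import math
import numpy as np

def savgol_coefficients(window, polyorder, deriv=0, delta=1.0):
    """Convolution weights for a Savitzky-Golay filter centred on each point."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps window values to the fitted
    # coefficient of offset**deriv; scale it to a derivative in units of delta.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv) / delta ** deriv

def savgol_rows(spectra, window, polyorder, deriv=0, delta=1.0):
    """Apply the filter to each spectrum (row); edge points without a full window are dropped."""
    weights = savgol_coefficients(window, polyorder, deriv, delta)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    return np.array([np.correlate(row, weights, mode="valid") for row in spectra])
```

For example, `savgol_rows(spectra, window=7, polyorder=2, deriv=1)` returns smoothed first-derivative spectra that are six points shorter than the input, three points being lost at each edge.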

Division of Calibration and Validation Sets.
All samples were divided into two subsets: the calibration and the prediction subsets. The former was used to establish the models and the latter to test their accuracy. Initially, spectral outliers were identified by principal component analysis (PCA). According to the original PCA score plots, samples 56, 63, 72, and 141 were abnormal points and had to be eliminated before model calibration. Then, the Kennard and Stone (K-S) algorithm was adopted to ensure that both sets were well proportioned; it covers the multidimensional space uniformly by maximizing the Euclidean distances between the already selected objects and the remaining objects. Finally, about four-fifths of the samples (156) were chosen as the calibration set, while the remaining 40 samples were selected as the prediction set. The sample parameters (mean, range, and standard deviation) of the calibration and validation sets are listed in Table 3, which indicates that the samples in the two sets were distributed appropriately.
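The Kennard-Stone selection described above can be sketched as follows. This is a generic NumPy version of the algorithm, not the routine used in the study; the function name `kennard_stone` is hypothetical, and the choice to seed the selection with the two most distant samples is the usual convention, assumed here.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances via the Gram matrix (clipped against rounding error).
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    # Seed with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample whose nearest selected neighbour is farthest away.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

With the sample counts reported here, a call such as `kennard_stone(spectra, 156)` would return 156 calibration indices and leave 40 for prediction. In practice the distances are often computed on PCA scores rather than raw spectra, which is a design choice this article does not specify.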

Spectral Transformation and Variables Selection.
As mentioned above, various signal transformation methods can be used to remove radiation scattering and baseline drift. For example, MSC and SNV are useful for correcting light scattering effects, while the first derivative (FD) and second derivative (SD) can eliminate baseline drift and peak overlap, although they amplify noise, which is why they are combined with SG smoothing. The calibration performance for the four phenolic compounds and moisture with different methods, evaluated by RMSEC and Rc², is shown in Table 4. The optimal spectral transformation for the gastrodin model was MSC + SG9 + FD, with an RMSEC and Rc² of 0.0160 and 0.9610; for the p-hydroxybenzyl alcohol model, MSC + SG7 + FD, with an RMSEC and Rc² of 0.0165 and 0.9331; for the parishin B model, MSC + SG7 + FD, with an RMSEC and Rc² of 0.0108 and 0.9245; for the parishin A model, MSC + SG9 + FD, with an RMSEC and Rc² of 0.0181 and 0.9561; and for the moisture model, MSC + SG9 + SD, with an RMSEC and Rc² of 1.7 and 0.9513.

Development of Calibration Models.
In this study, four multivariate calibration models, namely SMLR, full-PLS, Si-PLS, and PCR, were used to establish the calibration models, and their performance was compared and validated. Specifically, SMLR is an early regression method suited to simple systems in which the variables are linearly related. However, it is prone to overfitting and to losing useful spectral information. PCR is a linear regression tool that decomposes the spectral information in the X matrix, and selecting the best number of principal factors is an important step. Full-PLS is an improved multivariate regression tool that uses the information in both the X and Y matrices and is built on the full spectrum. The Si-PLS model, by contrast, is developed on a combination of optimal subintervals, a procedure that tends to perform better than the full-PLS model.

Results of PLS Models.
Gastrodin, p-hydroxybenzyl alcohol, parishin A, and parishin B were modeled by SMLR, full-PLS, and Si-PLS. These models were established using 8 batches as the calibration set and verified with two batches as the validation set. The number of latent variables (LVs) was optimized by leave-one-out cross-validation and chosen at the minimum value of RMSECV. The models' performance was evaluated by the Rc² and RMSEC values. The performance for the five parameters is shown in Table 5.
As can be seen from Table 5, the SMLR models performed worse than the other models, probably because they discard some important spectral information, which reduces their prediction ability. Given this, we selected PLS methods, including the full-PLS and Si-PLS models. The number of principal components (PCs) is critical to the performance of the full-PLS models. In this study, the optimum numbers of PCs of the full-PLS models for gastrodin, p-hydroxybenzyl alcohol, parishin B, and parishin A were 8, 7, 8, and 6, respectively. The results indicated that the calibration model can be further improved to show a better performance.
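Choosing the number of latent variables at the minimum leave-one-out RMSECV can be sketched as below. This is a plain NIPALS PLS1 written with NumPy as a stand-in for the full-PLS routine in TQ Analyst; it does not implement the interval search of Si-PLS, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS; returns (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))   # weight vectors
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores for this latent variable
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])   # deflate X
        yr = yr - q[a] * t               # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

def loo_rmsecv(X, y, max_components):
    """RMSECV for 1..max_components latent variables under leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    curve = []
    for a in range(1, max_components + 1):
        press = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            model = pls1_fit(X[keep], y[keep], a)
            press += (pls1_predict(model, X[i:i + 1])[0] - y[i]) ** 2
        curve.append(float(np.sqrt(press / n)))
    return curve
```

The selected number of latent variables is then `int(np.argmin(curve)) + 1`. Leave-one-out refits the model once per sample, so for a few hundred spectra it is slow but tractable.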

Results of PCR Models.
The moisture content was modeled by PCR. PCR is a linear multivariate regression tool that projects the spectral data onto a small number of principal components and regresses the response on the resulting scores. Some studies have shown that the intense absorptions of the NIR spectra at 5155 cm⁻¹ and 6944 cm⁻¹ are attributed to the first overtone and deformation of O-H in water [21], the former band being stronger than the latter. Taking this information on the characteristic moisture absorption into account, the models were first established using 8 batches as the calibration set and verified with two batches as the validation set. The Mahalanobis distance was used to remove spectral outliers from the PCR model (Figure 4). The result suggested that 4 samples should be ignored, and the remaining 156 samples were used to establish the calibration model. Then, MSC was chosen as the signal transformation method because it reduces the effect of scattered light on diffuse reflection NIR spectra, and SD with SG smoothing (9th order polynomial, a 5-point window) (SD/SG) was adopted (Figure 5). As can be seen from Figure 5, the selected spectral area is related to the moisture content of the sample. This was further confirmed by the first loading factor of the PCR model in Figure 6, which reflects the main loading information of the spectrum. Finally, the performance of the PCR model was evaluated in terms of the Rc² and RMSEC values. The result of the moisture PCR model is shown in Figure 7.
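As a rough illustration of the two steps above, the sketch below fits a PCR model through a singular value decomposition and computes a Mahalanobis distance in score space for outlier screening. It is written with NumPy under stated assumptions, not taken from TQ Analyst: the function names are hypothetical, and the distance is computed on the leading principal component scores, which is one common convention.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # loadings, one column per component
    T = Xc @ V                              # scores
    # Scores are orthogonal with T.T @ T = diag(s**2), so the regression decouples.
    gamma = (T.T @ (y - y_mean)) / s[:n_components] ** 2
    return V @ gamma, x_mean, y_mean

def pcr_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

def score_mahalanobis(X, n_components):
    """Mahalanobis distance of each sample in the space of the leading PC scores."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    # Scores are U * s with variance s**2 / (n - 1), so the scaled squared
    # score for sample i on component k reduces to (n - 1) * U[i, k]**2.
    return np.sqrt((n - 1) * (U[:, :n_components] ** 2).sum(axis=1))
```

Samples whose distance is far above the rest would be candidates for removal before `pcr_fit` is called; the cutoff is left to the analyst because the article does not state the threshold that was used.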

Validation of Best-Fitted Calibration Models.
To predict the contents of gastrodin, p-hydroxybenzyl alcohol, parishin A, parishin B, and moisture in the steaming process, two batches comprising 40 samples were selected as the validation set. RMSEP was used as the most critical performance index for evaluating the predictive ability of the calibration models. The RMSEP values for the five parameters were sufficiently low and the Rp² values were high enough, which means that the models had good predictive ability. The performance indexes (Rp², RMSEP, and RSEP) used to evaluate the predictive ability of the PLS (full-PLS and Si-PLS) and PCR models are shown in Table 6. Prediction results of the models are shown in Figures 7 and 8. The results show that the established models had satisfactory predictive ability and could be used to monitor the contents of the four analytes and the changes in moisture during the steaming process of G. rhizoma.

In-Line and Real-Time Monitoring.
In traditional Chinese processing of G. rhizoma, steaming is the first step in the manufacturing procedure; however, batch-to-batch variation in the raw Chinese medicine materials (origin, variety, and so on) and in the operating environment (equipment, operator, procedure, and so on) can affect the next drying process. To improve production efficiency and ensure uniform and controllable quality, in-line and real-time parameter measurements based on a PAT technique such as NIR are necessary.
In this study, the best-fitted models were calculated, validated, and uploaded to the NIR instrument described above. This NIR method was then used to monitor the steaming process in-line for three additional batches (provided by the Guizhou Jiulong Group) that had not been used in either the calibration or the validation set. As shown in Figure 9, the gastrodin, p-hydroxybenzyl alcohol, parishin A, parishin B, and moisture values predicted by NIR agreed well with the reference values for these additional batches, indicating that these models could be successfully applied in real time to monitor the steaming process of G. rhizoma.
In addition, to evaluate the robustness of the established NIR method, the components' contents obtained by HPLC and by in-line NIR analysis were compared using a paired t-test (Table 7). Before the paired t-test, the variances of the two methods were compared using an F-test to assess whether they differed significantly. The experimental statistic was lower than the critical value (for a significance level of 0.05); thus, it can be concluded that there were no significant differences between the standard deviations of the methods for the sample sets.
Likewise, the experimental t statistics were lower than the critical t value (for a significance level of 0.05) for all five quality control indicators. Thus, the robustness of the two methods was comparable, and the established NIR models were considered acceptable from a manufacturing perspective.
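The two test statistics used in this comparison can be computed as sketched below with the Python standard library. This illustrates the statistics only, not the SPSS procedure used in the study; the function names are hypothetical, placing the larger variance in the numerator of the F ratio is an assumed convention, and the critical values at the 0.05 level must still be taken from statistical tables or software.

```python
import math
import statistics

def variance_ratio(a, b):
    """F statistic for comparing two sample variances (larger variance in the numerator)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for two matched series of results."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)       # sample standard deviation of the differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

Here `a` and `b` would hold the HPLC and NIR results for the same samples in the same order. The methods are judged comparable when the absolute t statistic and the F ratio both fall below their critical values.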
Finally, if NIR had been applied to control the endpoint of steaming, the batches would have been released to the next drying step in real time. In the traditional Chinese operational mode, by contrast, the batch operator waits until the set time has elapsed or relies on the analysis results. This "conventional procedure" implies not only a considerable increase in manufacturing effort, which is time-consuming, laborious, and tedious, but also a potential risk of oversteaming. The validated NIR application presented here can now be applied to realize these benefits.

Conclusions
The steaming process of G. rhizoma was simulated at lab scale in this study to evaluate the feasibility of NIR spectroscopy as a PAT tool for improving the quality control efficiency of G. rhizoma processing. First, reliable and robust NIR quantitative models of gastrodin, p-hydroxybenzyl alcohol, parishin A, parishin B, and moisture in the steaming process were established and validated, and the proposed Si-PLS algorithm was superior to the other models. Then, the established best-fitted models were applied in real time to the release test of G. rhizoma, which can guarantee stable and reliable quality of the steaming process. Additionally, compared with traditional methods such as HPLC, this nondestructive and rapid technique offers significant advantages, especially in improving the stability and uniformity of the product, which is beneficial to industrial production. Overall, the results of this study showed that the NIR technique coupled with multivariate regression tools could be applied successfully for real-time and in-line measurements of the steaming process of G. rhizoma on an actual industrial scale. Because these results are preliminary, more samples will be needed in further study to confirm them and to develop more robust prediction models.

Data Availability
The data used to support the results of this study are included within the article. Any further information is available from the authors upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.