Robust PLS Prediction Model for Saikosaponin A in Bupleurum chinense DC. Coupled with Granularity-Hybrid Calibration Set

This study demonstrated the effect of particle size on the measurement of saikosaponin A in Bupleurum chinense DC. by near infrared reflectance (NIR) spectroscopy. Four granularities were prepared: powder samples passed through 40-mesh, 65-mesh, 80-mesh, and 100-mesh sieves. The effects of granularity on NIR spectra were investigated and were shown to be wavelength dependent: NIR intensity was proportional to particle size in the first combination-overtone region and the combination region. A local partial least squares (PLS) model was constructed separately for each granularity, and data-preprocessing techniques were applied to optimize each calibration model. The 65-mesh model exhibited the best prediction ability, with a root mean square error of prediction (RMSEP) of 0.492 mg·g⁻¹, a correlation coefficient of prediction (R_P) of 0.9221, and a relative predictive determinant (RPD) of 2.58. Furthermore, a granularity-hybrid calibration model was developed by incorporating granularity variation into the calibration set. The granularity-hybrid model performed better than the local models, and its predictions for the 65-mesh samples were again the most accurate (RMSEP = 0.481 mg·g⁻¹, R_P = 0.9279, RPD = 2.64). These results provide guidance for constructing a robust model with a granularity-hybrid calibration set.


Introduction
Near infrared (NIR) reflectance spectroscopy is widely used for quality assessment of solid samples in pharmaceuticals, agriculture, food, fruit, forage, and other areas because of its rapid measurement speed, flexibility, and little or no need for sample preparation [1][2][3]. The technology has also found many applications in Chinese herbal medicine (CHM), including quality control of raw materials [4], manufacturing process control [5][6][7][8], and quality assessment of final dosage forms [9]. Sample preparation before NIR analysis is vital for CHM because the raw materials are irregular in shape and have coarse surfaces. It is usually performed by crushing the sample into powder and controlling the particle size by passing the ground powder through sieves, so that sample presentation remains consistent.
However, in the presentation of CHM samples, different particle sizes affect sample homogeneity, packing density, and surface, all of which introduce uncontrolled variation that leads to differences in optical path length and to multiplicative light scattering effects [10,11]. Several mathematical methods, such as multiplicative scatter correction (MSC) [12], standard normal variate (SNV), extended multiplicative scatter correction (EMSC) [13], orthogonal signal correction (OSC) [14], and optical path length estimation and correction (OPLEC) [15], have been used to mitigate light scattering effects. However, the degree of scattering that must be corrected differs with the granularity of the sample.
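Two of the scatter corrections named above, SNV and MSC, operate row by row on a spectral matrix and can be sketched in NumPy as follows. This is a minimal illustration assuming spectra are stored as rows of a 2-D array and that MSC uses the mean spectrum as its reference by default; it is not the implementation in the software packages used in this study.

```python
from typing import Optional

import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction of each row against a reference spectrum.

    The reference defaults to the mean spectrum of the set being corrected.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit row ~ offset + slope * ref, then remove the additive
        # offset and divide out the multiplicative slope.
        slope, offset = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - offset) / slope
    return corrected
```

For spectra that differ only by an additive offset and a positive multiplicative factor, both functions collapse the rows onto a common shape, which is what scatter correction aims for; real spectra also carry chemical variation, so the correction is only approximate.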
Journal of Analytical Methods in Chemistry
In addition, sample presentation to the instrument (e.g., particle size) has been found to affect the characteristics of NIR spectra and therefore deserves close attention, because it determines the robustness and accuracy of NIR as an analytical technique. In a study of the effect of soil particle size (SPS) on the NIR measurement of exchangeable sodium (Na), NIR accuracy for soils with larger particles (SPS-0.212, 0.212 mm) was higher than for soils with smaller particles (SPS-0.053, 0.053 mm) [16].
Therefore, how to ensure low noise and good NIR model performance across different granularities deserves clarification, yet little work has addressed this issue for CHM. David reported a method for quantifying the median particle size of a dry powder from preprocessed NIR spectra; a quadratic model was developed to explain these summations as a function of median particle size, since the effect of densification was minimal [17]. Sarraguça et al. compared estimates of the particle size distribution of a pharmaceutical powder obtained by NIR, made by considering the former data blocks separately and together in a multiblock approach [18]. Furthermore, Bittner developed a particle size determination method for amoxicillin trihydrate particles and found a linear relationship between particle size and absorbance at specific wavenumbers [19].
Nevertheless, there is only one paper on the particle size of CHM in NIR measurement, which elucidates the influence of granularity on the NIR spectral characteristics of Coptis chinensis [20]. Few studies have focused on the effect of granularity on the quantitative analysis of active pharmaceutical ingredients (API) in CHM, and there is no generally accepted method to guide the crushing process.
Bupleurum chinense DC. is a well-known CHM that is used in at least 66% of the prescriptions in Chinese medicine and Kampo medicine [21]. Saikosaponins have been demonstrated to be the major active ingredients of Bupleurum chinense DC. Therefore, the content of saikosaponin A (SSA) was quantitatively analyzed by NIR in samples of different particle sizes, with the aim of presenting a methodology for investigating the effects of granularity in different NIR frequency ranges. A partial least squares (PLS) regression model incorporating samples of various granularities into the calibration set was developed for the low-content SSA in Bupleurum chinense DC.

Sample Preparation.
All Bupleurum chinense DC. samples were collected from different growing regions of China to increase geographical variation. All samples were identified by Dr. Chunsheng Liu (Beijing University of Chinese Medicine, China). Sample origins and numbers of samples are shown in Table 1.
After soil dust was brushed off the surface, the Bupleurum chinense DC. material was crushed into pieces with a disintegrator. The pieces were then ground with a blender and screened through a 20-mesh sieve. Finally, the powder was divided into four parts, and each part was further ground and screened through a 40-, 65-, 80-, or 100-mesh sieve, respectively.
Table 1: Sample origins and numbers of samples (only fragments survive: Shanxi, Unknown, 6∼9; Shanxi, Unknown, 10∼14; Shanxi, Hebei, Cultivated).
NIR Spectra Acquisition.
About 1 g of sample powder was packed into the sample cup. NIR spectra were acquired in reflectance mode with the integrating-sphere module of an Antaris I FT-NIR analyzer (Thermo Fisher, USA). Each spectrum was the average of 64 successive scans, with air as the background. The spectral range was 10000-4000 cm⁻¹ with a data interval of 1.928 cm⁻¹. To ensure analytical accuracy, each sample was measured in triplicate and the mean of the three spectra was used in the subsequent analysis. To avoid effects of laboratory conditions such as temperature and humidity, the room temperature was controlled at 25 °C and the humidity was kept at ambient level.
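As a minimal sketch of the spectral bookkeeping described above, the snippet below builds a wavenumber axis for the stated 10000-4000 cm⁻¹ range at roughly 1.928 cm⁻¹ spacing and averages the three replicate spectra of one sample. The exact number of points exported by the instrument software, and whether replicates are averaged as reflectance or as log(1/R), are assumptions here.

```python
import numpy as np

# Settings taken from the text; the exact point count exported by the
# instrument software is an assumption.
WN_START_CM1 = 4000.0
WN_END_CM1 = 10000.0
STEP_CM1 = 1.928


def wavenumber_axis(start=WN_START_CM1, end=WN_END_CM1, step=STEP_CM1):
    """Evenly spaced wavenumber axis (cm^-1) from start to end with roughly `step` spacing."""
    n_points = int(round((end - start) / step)) + 1
    return np.linspace(start, end, n_points)


def average_replicates(replicates):
    """Mean spectrum of one sample from its replicate spectra (rows = replicates)."""
    replicates = np.asarray(replicates, dtype=float)
    if replicates.ndim != 2:
        raise ValueError("expected shape (n_replicates, n_points)")
    return replicates.mean(axis=0)
```

Using `np.linspace` with a computed point count keeps both range limits exact, whereas `np.arange` with a non-integer step can drop or overshoot the endpoint.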

Reference Analysis Method.
The reference method for SSA determination was the high performance liquid chromatography (HPLC) assay for Bupleurum chinense DC. given in the Chinese Pharmacopoeia (ChP, 2010 Edition). SSA (12.5 mg) was accurately weighed on an XS205DU electronic balance (Mettler Toledo, Greifensee, Switzerland) and dissolved in methanol in a 25 mL volumetric flask. Chromatographic analysis was conducted on a Wondasil C18 column (250 mm × 4.6 mm, 5 μm, SHIMADZU, Japan) at 30 °C using an Agilent 1100 series HPLC system equipped with a quaternary solvent delivery system, an autosampler, and a DAD detector. The detection wavelength was 210 nm, the flow rate was 1.0 mL/min, and the linear gradient elution program is given in Table 2.

Data Pretreatment and Analysis.
Computations were performed with the TQ Analyst software package (version 8.0, Thermo Scientific, Madison, USA); other data analyses were performed with Unscrambler 9.7 (Camo Software AS, Norway) and MATLAB 7.0 (MathWorks Inc., USA). Some of the algorithms used in this paper were developed in-house.
Figure 1 shows typical HPLC chromatograms of a Bupleurum chinense DC. extract. The retention time of SSA in the sample solution was the same as that in the reference standard solution. The calibration curve of the HPLC method was established before real samples were analyzed and showed good linearity (y = 0.0031x + 0.0126, R² = 0.9999) over the range 0.804-6.432 μg.
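The HPLC calibration line above is an ordinary least-squares fit. The sketch below shows how the slope, intercept, and R² of such a line are obtained and how the line is evaluated at new points; the roles of x and y (for example peak area versus injected amount of SSA) are not specified here and are left generic, and any values passed to it in practice would come from the standard solutions.

```python
import numpy as np


def fit_calibration(x, y):
    """Ordinary least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    ss_res = float(np.sum((y - fitted) ** 2))      # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    return float(slope), float(intercept), 1.0 - ss_res / ss_tot


def apply_calibration(x_new, slope, intercept):
    """Evaluate the fitted line at new x values."""
    return slope * np.asarray(x_new, dtype=float) + intercept
```

For points lying exactly on y = 2x + 1, the fit returns slope 2, intercept 1, and R² = 1; scatter around the line lowers R² below 1.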

Effects of Granularity on Absorption Characteristics of Overtones and Combination of NIR.
Figure 2(a) shows typical raw spectra of one sample at different granularities, and Figure 2(b) shows how the overtone and combination characteristics of the NIR spectra vary with granularity. The differences in spectral characteristics were clearly related to granularity, and the effects were wavelength dependent. According to the Kubelka-Munk function,

$$f(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{K}{S} \qquad (1)$$

where R∞ is the diffuse reflectance of an optically thick sample layer, K is the absorption coefficient, and S is the scattering coefficient. f(R∞) is inversely proportional to S and decreases as R∞ increases, so, for a given K, a smaller S corresponds to a lower reflectance R. Previous research demonstrated that S is inversely proportional to particle size [22]; log(1/R) should therefore increase with particle size. However, Figure 2 shows that this principle held for the NIR spectra of Bupleurum chinense DC. only in the first combination-overtone region (FCOT, 7100-5000 cm⁻¹) and the combination region (CR, 5000-4000 cm⁻¹). In these regions, log(1/R) was sensitive to granularity and tended to increase with particle size. NIR absorption in the CR region (RSD above 0.035) was more easily affected by granularity than that in the FCOT region (RSD 0.025-0.035). In the second combination-overtone region (SCOT, 7100-10000 cm⁻¹), by contrast, log(1/R) was relatively stable and less susceptible to disturbance (RSD below 0.015).
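The Kubelka-Munk function (1) and the region-wise RSD comparison can be illustrated with a short NumPy sketch. It assumes log(1/R) uses base-10 logarithms and that RSD is computed across the granularity spectra of one sample at each wavenumber and then averaged within each region; how the reported RSD values were actually obtained is not stated, so this is one plausible reading.

```python
import numpy as np

# Spectral regions (cm^-1) as defined in the text.
REGIONS = {
    "SCOT": (7100.0, 10000.0),
    "FCOT": (5000.0, 7100.0),
    "CR": (4000.0, 5000.0),
}


def kubelka_munk(reflectance):
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R), which equals K/S (0 < R <= 1)."""
    r = np.asarray(reflectance, dtype=float)
    return (1.0 - r) ** 2 / (2.0 * r)


def apparent_absorbance(reflectance):
    """Apparent absorbance log10(1/R)."""
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))


def region_rsd(wavenumbers, reflectance_by_granularity):
    """Mean RSD of log(1/R) across granularities within each region.

    reflectance_by_granularity: array of shape (n_granularities, n_points)
    holding one sample's spectra at each granularity (assumed layout).
    """
    absorbance = apparent_absorbance(reflectance_by_granularity)
    # RSD at each wavenumber: SD across granularities divided by their mean.
    rsd = absorbance.std(axis=0, ddof=1) / absorbance.mean(axis=0)
    wn = np.asarray(wavenumbers, dtype=float)
    result = {}
    for name, (low, high) in REGIONS.items():
        in_region = (wn >= low) & (wn <= high)
        result[name] = float(rsd[in_region].mean())
    return result
```

Because `kubelka_munk` falls as reflectance rises (for example 0.25 at R = 0.5 and 0.025 at R = 0.8), any factor that lowers S, such as coarser particles, pushes R down and log(1/R) up.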

Optimization of NIR Data-Preprocessing Methods.
To avoid bias in sample selection, the Kennard-Stone (KS) algorithm was used to split the NIR data set into calibration and validation sets. Sixty samples covering twenty concentration levels were used as the calibration set, and the remaining samples formed the validation set (Table 3). Outliers were removed before model calibration using Dixon's test, in which the gap between a suspect value and its nearest neighbour is divided by the range of the data; if this ratio exceeds the critical value at the 95% confidence level, the value is rejected as an outlier. Data-preprocessing techniques were investigated before calibration development. To correct the spectra for scattering, the empirical scatter-correction methods MSC and SNV were applied. First-derivative (1D) and second-derivative (2D) transformations were then combined with these corrections to reduce the baseline variation observed in the original diffuse reflectance spectra and to enhance spectral features. Smoothing methods, namely the Savitzky-Golay filter (SG) and the Norris derivative filter (ND), were used to suppress the background noise amplified by differentiation. The optimal preprocessing method was the one giving the lowest predicted residual error sum of squares (PRESS) (Figure 3). The optimal preprocessing differed slightly among the granularities of Bupleurum chinense DC.
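The Kennard-Stone split mentioned above can be sketched as follows. This is a generic implementation of the algorithm operating on the spectral matrix with Euclidean distances; whether the study applied KS to spectra or to reference values, and how the twenty concentration levels entered the selection, is not stated, so treat it as an illustration rather than the procedure used in TQ Analyst.

```python
import numpy as np


def kennard_stone(X, n_calibration):
    """Kennard-Stone split of the rows of X into calibration and validation indices.

    Starts with the two most distant samples, then repeatedly adds the sample
    whose distance to its nearest already-selected sample is largest.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_calibration <= n_samples:
        raise ValueError("n_calibration must be between 2 and the number of samples")
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b (clipped for rounding).
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.clip(d2, 0.0, None))
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [k for k in range(n_samples) if k not in selected]
    while len(selected) < n_calibration:
        # Distance from every remaining sample to its closest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

For one granularity, `kennard_stone(X, 60)` would return 60 calibration indices and the remaining validation indices for a spectra matrix `X`. Because KS always picks the extremes first, the calibration set spans the data space and the validation samples fall inside it.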

Effects of Granularity on Local PLS Model Prediction Ability.
After the best data pretreatments were applied, four local PLS models were constructed from powder samples screened through 40-, 65-, 80-, and 100-mesh sieves, respectively. To compare the prediction performance of the local models, test-set validation was performed; the results are given in Table 4 and the correlation diagrams in Figure 4. To avoid overfitting, […] the solid sample, sample granularity should be considered. Furthermore, the local model was not entirely satisfactory, even though its correlation coefficient exceeded 0.9. To further improve model performance, a granularity-hybrid calibration model was tested, as described in the next section.

Construction of Granularity-Hybrid Calibration Model.
To develop a robust calibration model and enable its successful application, another way to accommodate particle size variation is to construct a granularity-hybrid calibration model (GH model) whose calibration set includes the calibration samples of every granularity (240 samples: 40, 65, 80, and 100 mesh). The validation set of each particle size was then predicted by the GH model, as shown in Figure 4. RMSEP and R_P were compared to determine whether the model built on the granularity-hybrid sample set predicted more accurately. The performance of GH models built with different data-preprocessing methods is given in Table 5; MSC + 1D + SG was the best method for GH model development. The correlation diagram of the GH model is shown in Figure 5. As with the local models, the 65-mesh samples were predicted most accurately according to the chemometric indicators. The 80-mesh and 100-mesh samples ranked second, with no significant difference between them, and the prediction performance for the 40-mesh samples was again the worst. These results provide basic guidance for sample preparation. The GH model clearly outperformed the local models, demonstrating that a hybrid calibration model is a good alternative for dealing with granularity variation.

Conclusions
The effects of granularity on NIR were investigated, and the results showed that its influence on NIR spectra was wavelength dependent: NIR intensity was proportional to particle size in the FCOT and CR regions. After appropriate data preprocessing, the local PLS model of 65-mesh samples showed the best prediction ability for Bupleurum chinense DC. Furthermore, a granularity-hybrid calibration model was developed by incorporating granularity variation. Its performance was better than that of the local models, demonstrating that a hybrid calibration model is a good alternative for dealing with granularity variation. These results provide guidance for sample preparation in NIR analysis of CHM and a reference for constructing robust models that eliminate granularity effects.