A Dataset on Corn Silage in China Used to Establish a Prediction Model Showing Variation in Nutrient Composition



Introduction
The Chinese dairy industry has developed rapidly over recent decades and needs to provide milk products for a quarter of the world's population. To adapt to intensive production systems, it also requires a sustainable supply of feedstuffs. A series of studies indicates that variation in the nutrient composition of feedstuffs contributes to variation in dairy production [1,2]. For example, a total mixed ration (TMR) containing 13.1% crude protein (CP) significantly reduced milk yield in dairy cows compared with one containing 16.2% CP [3]. An increase in ether extract (EE) improved the milk fat percentage of dairy cows [4]. A dietary starch content of 28.5% resulted in higher dry matter intake (DMI) and milk yield (MY) than a content of 24% [5]. Thus, it is important to determine the nutrient composition of feedstuffs before they are used.
Whole plant corn silage (WPCS) is the main ingredient of dairy TMR under most dietary regimes, especially for high-yielding cows, and its proportion in the TMR has reached 42% [6]. Its extensive use can be attributed to its high and stable production together with its high contents of total digestible nutrients and metabolizable energy [6][7][8][9]. In addition to genotype and harvest maturity, the yield and nutritional quality of WPCS are strongly influenced by environmental conditions [10][11][12]. For example, high growing temperatures can reduce the digestibility of corn silage because of a substantial increase in lignin content in the stover and a decrease in starch content in the cobs [13,14]. Moreover, previous studies have reported that precipitation is one of the most influential abiotic factors for plant productivity [15], and drought stress generally delays plant growth and development by decreasing cell elongation and reducing photosynthesis [16]. Furthermore, soil moisture and growing temperature are closely related to DM yield because they affect the canopy and anatomical development of maize [17]. Therefore, dairy farms need to determine the nutritional quality of roughage delivered from different growing regions before it is formulated and fed, which provides the fundamental information needed to meet exact nutrient requirements.
Traditional wet chemical analysis requires considerable human, material, and financial resources, and the reagents used can cause environmental pollution [18]. Therefore, real-time, efficient, and environmentally friendly techniques have attracted widespread interest. As a fast, simple, noninvasive, and economical technology, attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectroscopy can complement or replace existing techniques [19][20][21][22]. Using the ATR-FTIR technique, previous studies [23,24] constructed prediction models for dry matter (DM), CP, neutral detergent fiber (NDF), and acid detergent fiber (ADF) contents in plants. These studies predominantly used traditional linear regression methods, such as partial least squares regression (PLSR), to build prediction models relating the nutrient contents of feedstuffs to their spectral information. More recently, artificial neural networks (ANNs) can bring significant improvements in model development because they can build complex, potentially nonlinear relationships without prior assumptions about the underlying data-generating process [25]. The backpropagation artificial neural network (BP-ANN), which is trained with the error backpropagation algorithm, is the most representative and widely used ANN [26][27][28][29]. However, to the authors' knowledge, there is limited information on the application of ATR-FTIR spectroscopy together with PLSR and BP-ANN methods to predict the nutrient content of WPCS collected from various growing regions in China.
The objective of this study was to evaluate between-region differences in the nutritive components and rumen degradation of WPCS and to develop rapid and efficient models for predicting the nutrient concentrations of WPCS based on ATR-FTIR spectroscopy combined with PLSR and BP-ANN algorithms. The model with the better prediction performance was then selected for further applications.

Sample Preparation and Chemical Measurements.
The Zhengdan 958 cultivar of WPCS was selected for this study from eight different areas of China; the location information is shown in Table 1. In each area, three plots were selected, and 20 plants from each plot were cut 10-15 cm above the ground at the half milk line stage of kernel maturity. The outer 1 m of each plot was excluded from sampling to ensure uniformity of the sampled plants. After harvesting, the whole plants were chopped into 2 cm sections and immediately transported to the laboratory, where the inoculated plant material was vacuum-sealed into polyethylene bags (25 × 30 cm). The bags were then stored in the dark at ambient temperature until analysis. The filling, compression, and sealing processes were the same for all twenty-four bags.
After 60 d of fermentation, the polyethylene bags were opened, and samples were collected for the measurement of nutrient concentrations and digestibility. Twenty-four subsamples, one from each bag, were dried in a forced-air oven at 65°C for 48 h to determine DM. They were then ground in a Wiley mill (Model no. 2; Arthur H. Thomas Co., Philadelphia, PA) to pass through a 1 mm screen for chemical composition analysis or a 4 mm screen for in situ disappearance measurements. Crude protein (CP) was measured using method 988.05 of the Association of Official Analytical Chemists [30]. Neutral detergent fiber (NDF) and acid detergent fiber (ADF) analyses were performed in an ANKOM 200 fiber analyzer (ANKOM Technologies, Macedon, USA) using thermostable α-amylase [31]. EE was determined using an automatic extractor (ANKOM XT101; ANKOM Technology Corp., Macedon, NY, USA). Ash was determined by combustion at 600°C for 6 h in a furnace according to method no. 924.05 [32]. The starch content was analyzed using a total starch assay kit (Megazyme, Bray, Ireland; method no. 996.11) based on the AOAC method [32].

Animals and Digestibility Measurements.
Three healthy Holstein dairy cows (139 ± 15 days in milk, 2.50 ± 0.50 parity) fitted with permanent rumen fistulas at the experimental base of China Agricultural University were used for the in situ incubation study. The trial procedure was submitted to the Experimental Animal Welfare and Animal Ethics Committee of China Agricultural University (approval no. CAU2021009−2). The animals were fed a TMR with a forage-to-concentrate ratio of 60:40 twice daily at 07:00 h and 21:00 h. The TMR components and nutrient levels are shown in Table 2. Subsamples (ca. 7 g) were randomly incubated in sealed nylon bags (10 × 20 cm, pore size 40 μm) in the rumen of the fistulated cows for 6, 24, 30, and 48 h, using the "gradual in/all out" schedule. Starch digestibility after 6 h of incubation and NDF digestibility after 30 h of incubation are associated with feed value and quality [33,34]. Three replicate bags per sample, from individual cows, were used at each incubation time point. After incubation, all the nylon bags were removed from the rumen, washed six times under cold running tap water, and then dried to constant weight in a forced-air oven at 65°C. The dried residues of the replicate bags of each sample were pooled by incubation time, mixed, ground, and stored in sealed plastic bags for further analysis. The rumen degradation characteristics were calculated using the following formula [35,36]:
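Ruminal disappearance of a nutrient at incubation time t is commonly computed from the nutrient mass in the bag before and after incubation. The Python sketch below assumes that standard definition; it is not necessarily the exact formula of [35,36], and the function name `in_situ_disappearance` and the example masses are hypothetical.

```python
def in_situ_disappearance(incubated_dm_g, nutrient_pct, residue_dm_g, residue_nutrient_pct):
    """Percentage of a nutrient that disappeared from a nylon bag during incubation.

    Assumed definition (hypothetical helper, not taken from the paper):
        D_t = (W0 * C0 - Wt * Ct) / (W0 * C0) * 100
    where W0 and Wt are the dry masses before and after incubation (g), and
    C0 and Ct are the nutrient concentrations (% of DM) in the sample and residue.
    """
    initial_g = incubated_dm_g * nutrient_pct / 100.0
    remaining_g = residue_dm_g * residue_nutrient_pct / 100.0
    if initial_g <= 0:
        raise ValueError("the incubated sample must contain the nutrient")
    return (initial_g - remaining_g) / initial_g * 100.0


# Hypothetical example: 7 g DM incubated, 4.9 g DM residue recovered.
dm_loss = in_situ_disappearance(7.0, 100.0, 4.9, 100.0)   # DM itself: C0 = Ct = 100 %
ndf_loss = in_situ_disappearance(7.0, 45.0, 4.9, 55.0)    # NDF rises from 45 % to 55 % of DM
print(f"DM disappearance: {dm_loss:.1f} %, NDF disappearance: {ndf_loss:.1f} %")
```

For DM itself both concentrations are 100%, so the expression reduces to the relative loss of bag dry mass.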

Sample Preparation, ATR-FTIR Spectra Analysis, and Model Building.
To establish stable and precise predictive models of nutritional components (DM, CP, EE, ash, NDF, ADF, starch, Ca, and P), 974 WPCS samples (43 cultivars) were collected from more than 200 dairy farms located in Beijing, Tianjin, Ningxia, Inner Mongolia, Shandong, Heilongjiang, and other sites. Relevant information on these samples is shown in Table 3. The physical parameters of fresh plants, including whole-plant height and weight, kernel number, and ear number and weight, were measured immediately at harvest, and the emergence rate was calculated later. All the selected samples were chopped into small particles (1−2 cm) and transported to the laboratory, where they were ground through a 1.0 mm screen for chemical analysis or a 0.25 mm screen for molecular spectral analysis. ATR-FTIR spectra were acquired using a Fourier-transform spectrometer (FOSS-DS-2500, FOSS Analytical SA, DK-3400 Hillerød, Denmark). Two grams of each ground WPCS powder was placed into a glass sample holder. During each scanning procedure, spectra were recorded over the 800-2500 nm range at 1 nm intervals, and 32 scans at a resolution of 8 cm⁻¹ were taken per side and averaged into a single spectrum. Each sample was scanned three times, and the average was used for spectral analysis. The spectral absorbance values were obtained as log(1/R), where R is the sample reflectance. The raw ATR-FTIR spectra of the 974 samples are shown in Figure 1.
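The conversion of repeated reflectance scans into one absorbance spectrum per sample can be sketched as follows. This is an illustrative Python/NumPy sketch, not the spectrometer software; the wavelength grid follows the text (800-2500 nm at 1 nm intervals), while the function name and the choice to convert each scan to log(1/R) before averaging are assumptions.

```python
import numpy as np

# 800-2500 nm at 1 nm intervals, as described in the text (1701 points).
WAVELENGTHS_NM = np.arange(800, 2501)


def absorbance_spectrum(reflectance_scans):
    """Convert repeated reflectance scans of one sample to a single log(1/R) spectrum.

    reflectance_scans: array of shape (n_scans, 1701) with reflectance R in (0, 1].
    Assumption: each scan is converted to absorbance first and the scans are then
    averaged; the text does not state the order of these two steps.
    """
    r = np.asarray(reflectance_scans, dtype=float)
    if r.ndim != 2 or r.shape[1] != WAVELENGTHS_NM.size:
        raise ValueError("expected an array of shape (n_scans, 1701)")
    if np.any(r <= 0):
        raise ValueError("reflectance must be strictly positive")
    absorbance = np.log10(1.0 / r)      # A = log(1/R) for every scan
    return absorbance.mean(axis=0)      # average of the repeated scans


# Hypothetical example: three scans of one sample with flat reflectance values.
scans = np.vstack([np.full(1701, 0.1), np.full(1701, 0.01), np.full(1701, 1.0)])
spectrum = absorbance_spectrum(scans)
print(spectrum.shape, spectrum[:3])
```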
Raw spectra measured with the ATR-FTIR spectrometer contain noise and background information in addition to sample information. Therefore, spectral preprocessing was necessary before calibrating a reliable, accurate, and stable model. In the current study, mean centering was applied as the spectral preprocessing step. A principal component analysis (PCA) model was used to detect outliers and to reduce the dimensionality of the spectral data of the WPCS samples through principal component (PC) scores [27].
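A minimal sketch of mean centering followed by PCA, using a singular value decomposition in NumPy. Flagging outliers by Hotelling's T² in the retained score space is one common choice and is an assumption here, since the text does not state which outlier criterion was used in Unscrambler; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def mean_center_pca(spectra, n_components):
    """Mean-center spectra (rows = samples) and return PCA scores via SVD."""
    X = np.asarray(spectra, dtype=float)
    mean_spectrum = X.mean(axis=0)
    Xc = X - mean_spectrum                      # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                # one row per PC
    explained = (s ** 2) / np.sum(s ** 2)       # fraction of variance per PC
    return scores, loadings, explained[:n_components], mean_spectrum


def hotelling_t2(scores):
    """Hotelling's T^2 of each sample in the retained PC space."""
    score_var = scores.var(axis=0, ddof=1)
    return np.sum(scores ** 2 / score_var, axis=1)


# Synthetic example: 100 "spectra" with 50 points each; sample 0 is shifted upwards.
rng = np.random.default_rng(42)
spectra = rng.normal(size=(100, 50))
spectra[0] += 20.0
scores, loadings, explained, _ = mean_center_pca(spectra, n_components=5)
t2 = hotelling_t2(scores)
print("largest T^2 at sample", int(np.argmax(t2)), "cumulative variance", explained.sum())
```

In practice a sample would be flagged when its T² exceeds a confidence limit chosen by the analyst; that limit is not given in the text. The same score matrix doubles as the reduced input for later models.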
The PLSR algorithm implemented in Unscrambler X 10.4 software (CAMO Software AS, Oslo, Norway) was used to establish one predictive model. A three-layer (input, hidden, and output layers) BP-ANN implemented in MATLAB R2019a (MathWorks Inc., Natick, MA, USA) was used as the other predictive model [37]. To assess the efficiency of the multivariate calibration models, the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP) were calculated according to the following equation [27]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $\hat{y}_i$ and $y_i$ are the predicted and measured values (nutrient content of the WPCS), respectively; evaluated over the calibration set this gives RMSEC, and over the prediction set it gives RMSEP. The coefficients of determination for calibration (R²c) and prediction (R²p) are generally used to evaluate the agreement between predicted and measured results:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the average measured value of the WPCS samples and $n$ denotes the number of WPCS samples in the dataset. A model with high R² and low RMSEC demonstrates superior performance [30]. The model may be used for crude prediction if 0.66 ≤ R² ≤ 0.81, for more accurate prediction if 0.82 ≤ R² ≤ 0.90, and for normal analysis if R² ≥ 0.91 [31].
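The calibration statistics and the R² usage bands described in this paragraph translate directly into code. The sketch assumes R² is computed as 1 − SS_res/SS_tot; the helper names are hypothetical, and values falling between the quoted bands (e.g. 0.81-0.82) are assigned to the lower band, which the text does not specify.

```python
import numpy as np


def rmse(measured, predicted):
    """RMSEC or RMSEP, depending on which set the values come from."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    ss_res = np.sum((y_hat - y) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def usability(r2):
    """Map R^2 to the usage bands quoted in the text [31].

    Assumption: values in the gaps between the quoted bands go to the lower band.
    """
    if r2 >= 0.91:
        return "normal analysis"
    if r2 >= 0.82:
        return "more accurate prediction"
    if r2 >= 0.66:
        return "crude prediction"
    return "not recommended"


# Hypothetical measured and predicted values (% of DM).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(measured, predicted)
print(rmse(measured, predicted), r2, usability(r2))
```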

Statistical Analysis.
Data on nutritional components and rumen degradation kinetic parameters were analyzed using one-way ANOVA in SAS 9.2 software (SAS Institute, Cary, NC, USA), and Duncan's method was used for multiple comparisons, based on the following model:

$$Y_{ijk} = \mu + T_i + D_j + e_{ijk}$$

where $Y_{ijk}$ represents the nutritional components and real-time degradation rate, $\mu$ is the overall mean, $T_i$ represents the effect of the different growing regions of WPCS, $D_j$ is the random effect, and $e_{ijk}$ is the model error.
For all statistical analyses, a significant difference was declared at P < 0.05, whereas a tendency was identified at 0.05 ≤ P ≤ 0.10.

Nutritional Contents and Rumen Degradation of WPCS Grown in Different Regions.
As the main roughage source in ruminant diets, WPCS has attracted widespread attention for its nutritional content and rumen degradability. According to Table 4, except for CP, the nutrient components of WPCS, including DM, NDF, ADF, EE, ash, and starch, varied considerably among regions. WPCS grown in Wuxi had the highest DM content (93.89%), whereas that grown in Jinan had the lowest (92.48%). Meanwhile, the highest NDF (47.19%) and ADF (26.77%) concentrations were observed in WPCS cultivated in Jinan. WPCS from Ningxia had the highest EE content (3.32%), while that from Liaoning had the lowest (2.33%). These results imply that climatic conditions such as precipitation and growing temperature can affect nutrient accumulation in WPCS [38]. Higher soil pH accelerates the deposition of fatty acids in plants [39], which suggests that the differences in EE content among regions may be related to soil properties such as pH and salinity. The WPCS from Jinan and Ningxia had higher ash contents than that from Lanzhou and Durbert. This may be attributable to differences in the evenness of the harvesting ground: when the ground is uneven, more soil is picked up with the forage and a higher ash content is detected. Starch is one of the main factors that influence the milking performance of cows [40][41][42]. The starch results of our study suggest that WPCS grown in northeast China may support greater milk production. Figure 2 shows that the DM degradability of WPCS planted in Bayannur was substantially higher than that of WPCS planted in Jinan, and the ruminal starch degradation of WPCS from Bayannur was also the highest after 6 h of incubation. The differences in rumen degradation among regions could be explained by the effective area available for rumen microbial colonization of the feed and by protein structure [36,37]. The passage rate of digesta through the forestomachs, which is influenced by particle size, rumen washout, rumen wall distension, and papillae tactile signals, may also contribute to the differing results [43].
Sugar digestibility may be another reason for the discrepancy [44], and it is worth investigating in the future. In the current study, the 6 h and 48 h rumen degradation of NDF and ADF in WPCS from different regions reflected their different nutritional uses [45].

Establishment and Validation of the PLSR Model.
The substantial variation in nutritional indices and rumen degradation indicates that it is necessary to evaluate the nutrients in roughage before it is priced, formulated, and used. However, traditional chemical methods not only consume human, material, and financial resources but also carry a risk of environmental pollution from reagents [18], which reduces dairy farm profits and is inconsistent with sustainable development. Conventional methods can also introduce errors owing to differences among experimenters and instruments. Therefore, a rapid, efficient, and environmentally friendly technique needs to be explored. The use of ATR-FTIR technology has expanded considerably worldwide because it can be applied in the field or online and can evaluate large numbers of samples over relatively short timescales. Therefore, 43 WPCS cultivars from over 200 dairy farms located in five Chinese regions were collected to establish models for predicting nutrient contents. As shown in Table S1, high coefficients of variation (CV) were calculated, especially for Ca (33.58%), ash (24.08%), and starch (23.64%), which were followed by ADF […] Tables S2 and S3. PLSR is the most commonly used regression method for quantitative analysis of ATR-FTIR spectra [46]. In this study, cross-validation was performed on the calibration set to select the optimal number of factors for the PLSR model [22]. As the number of factors increases, each additional factor adds progressively less explained variance. The closer the explained variance is to 1, the higher the accuracy of the constructed model. However, too many factors lead to overfitting, which appears as a wide gap between calibration and prediction performance [25]. Therefore, a carefully chosen number of factors is more conducive to establishing an optimal model. All WPCS samples were randomly divided into N subsets of similar size, each accounting for approximately 5% of the total samples.
Subsequently, one of the N subsets was held out as the prediction set, and the remaining samples were used as the calibration set; this was repeated for each subset (for more details on the PLSR models, please refer to Xing et al. [47]). RMSE and R² were used as criteria to select the optimal calibration model, which was then applied to the prediction set. The smaller the RMSE and the larger the R², the better the prediction performance of the model [48].
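The factor-selection procedure described above (N random folds of about 5% each, with the RMSE of cross-validation tracked per factor number) can be sketched with a small NIPALS PLS1 implementation. This is an illustrative NumPy sketch, not the Unscrambler implementation; picking the factor count at the minimum RMSECV is a simplifying assumption (software often prefers a smaller count within a tolerance of the minimum), and all names and data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # mean-centered working copies
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                   # weight vector
        t = Xr @ w                               # scores
        tt = t @ t
        p = Xr.T @ t / tt                        # X loadings
        q = (yr @ t) / tt                        # y loading
        Xr = Xr - np.outer(t, p)                 # deflate X
        yr = yr - q * t                          # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)       # regression vector in original X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def rmsecv_by_factors(X, y, max_factors, n_folds=20, seed=0):
    """RMSE of cross-validation for 1..max_factors factors (20 folds = ~5 % each)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = np.zeros(max_factors)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for a in range(1, max_factors + 1):
            model = pls1_fit(X[train], y[train], a)
            resid = pls1_predict(model, X[held_out]) - y[held_out]
            sq_err[a - 1] += np.sum(resid ** 2)
    return np.sqrt(sq_err / len(y))


# Synthetic example: 8 variables with very different variances.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 8)) * np.array([5, 4, 3, 2, 1, 1, 1, 1])
y = X.sum(axis=1) + 0.05 * rng.normal(size=120)
rmsecv = rmsecv_by_factors(X, y, max_factors=8)
best = int(np.argmin(rmsecv)) + 1
print("RMSECV per factor:", np.round(rmsecv, 3), "-> choose", best, "factors")
```

With real spectra the RMSECV curve typically flattens after a few factors, and the gap between calibration and cross-validation error is what signals overfitting.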
A summary of the optimal number of factors for the different nutrients in WPCS, together with the calibration and prediction results, is shown in Table 5 and Figure S1. The PLSR models showed excellent prediction performance for NDF, ADF, and starch of WPCS samples, with R²c of 0.910, 0.921, and 0.933 and R²p of 0.904, 0.916, and 0.929, respectively. Our results were partially similar to those of Werbos et al. [49], who constructed optimal prediction models for NDF and ADF. This similarity may be explained by the high contents of NDF and ADF: the hydrogen-containing groups they contain produce pronounced absorption peaks in the near-infrared region. An ANKOM 2000i analyzer (ANKOM Technology, USA) was used to measure NDF and ADF, and six parallel replicates ensured the accuracy of the analysis. However, He et al. [24] reported that the predictive performance for NDF and ADF contents was lower than that for other nutritional items. This may be related to the source and number of samples, the sensitivity of ATR-FTIR, and the accuracy of the chemical determinations [50].
Strong performance was also obtained for DM and CP, with R²c values of 0.836 and 0.903 and R²p values of 0.823 and 0.900, respectively. EE and ash were predicted with R²c values of 0.788 and 0.795 and R²p values of 0.763 and 0.799, respectively. Overall, the R² values obtained in the current study are usable for screening and most applications according to Williams [51]. However, neither Ca nor P could be predicted from the available data because of their low R² values. A likely explanation is the lack of ATR-FTIR absorption features for minerals, whose apparent spectral signal may instead be related to water absorption bands [22]. This limitation of ATR-FTIR technology in assessing certain nutrients has not been adequately addressed and warrants further research.

Establishment and Validation of the BP-ANN Model.
The BP algorithm was initially proposed by Werbos [52], and its application to training ANNs was popularized by Niu et al. [53]. Loosely modeled on neurons in the brain, the BP-ANN is a powerful intelligent chemometric method for data processing [28]. The working principle of BP-ANN was described by Pérez-Marín et al. [29]. In this study, the 974 WPCS samples were divided into calibration and prediction sets at a ratio of 9:1, giving 877 calibration and 97 prediction samples. Before BP training, the parameters were set as follows: 20 principal components were used as inputs because together they explained more than 99% (close to 100%) of the population variability. The transfer function of the hidden layer was tansig, and the hidden layer had 6 nodes. The transfer function of the output layer was purelin, and the output layer had 1 node. The Levenberg-Marquardt (LM) algorithm and the ADAPT gradient descent with momentum learning function were employed for model training, with a learning rate of 0.001.
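The network configuration above (20 PC scores as inputs, 6 tansig hidden nodes, 1 purelin output node, 9:1 split) can be sketched in NumPy. The sketch is not the MATLAB toolbox: tanh stands in for tansig, a linear output for purelin, and full-batch gradient descent with momentum replaces Levenberg-Marquardt training, so its learning rate is not comparable to the 0.001 setting in the text. Function names, the learning rate, and the synthetic data are hypothetical.

```python
import numpy as np


def split_calibration_prediction(n_samples, cal_fraction=0.9, seed=0):
    """Random 9:1 split into calibration and prediction index sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = int(round(cal_fraction * n_samples))
    return order[:n_cal], order[n_cal:]


def train_bp_ann(X, y, n_hidden=6, lr=0.02, momentum=0.9, epochs=4000, seed=0):
    """Three-layer BP-ANN: tanh (tansig-like) hidden layer, linear (purelin-like) output.

    Trained by full-batch gradient descent with momentum on the mean squared
    error; this stands in for MATLAB's Levenberg-Marquardt training.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    params = [
        rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),    # W1
        np.zeros(n_hidden),                                            # b1
        rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1)),  # W2
        np.zeros(1),                                                   # b2
    ]
    velocity = [np.zeros_like(p) for p in params]
    target = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(X)
    for _ in range(epochs):
        W1, b1, W2, b2 = params
        hidden = np.tanh(X @ W1 + b1)                     # forward pass, hidden layer
        output = hidden @ W2 + b2                         # forward pass, output layer
        d_out = 2.0 * (output - target) / n               # dMSE/d(output)
        d_hidden = (d_out @ W2.T) * (1.0 - hidden ** 2)   # error backpropagated through tanh
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0), hidden.T @ d_out, d_out.sum(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v                                        # in-place parameter update
    return params


def predict_bp_ann(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic stand-in for 974 samples x 20 PC scores with a nonlinear target.
rng = np.random.default_rng(3)
scores = rng.normal(size=(974, 20))
target = np.sin(scores[:, 0]) + 0.5 * scores[:, 1]
cal, pred = split_calibration_prediction(len(target))
params = train_bp_ann(scores[cal], target[cal])
cal_mse = float(np.mean((predict_bp_ann(params, scores[cal]) - target[cal]) ** 2))
pred_mse = float(np.mean((predict_bp_ann(params, scores[pred]) - target[pred]) ** 2))
print(len(cal), len(pred), round(cal_mse, 4), round(pred_mse, 4))
```

Switching to Levenberg-Marquardt would replace only the update rule; the forward pass and the layer structure stay the same.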
The measured and predicted values of the nutrient contents in WPCS are shown in Figure S2, and Table 6 shows the evaluation parameters of the BP-ANN model. These results indicate that the BP-ANN model exhibited excellent prediction performance for CP (R²c = 0.945; R²p = 0.927), NDF (R²c = 0.965; R²p = 0.935), ADF (R²c = 0.991; R²p = 0.975), and starch (R²c = 0.972; R²p = 0.944). DM (R²c = 0.900; R²p = 0.845), EE (R²c = 0.886; R²p = 0.853), and ash (R²c = 0.902; R²p = 0.847) were also well predicted. However, poor prediction performance was observed for Ca (R²c = 0.730; R²p = 0.509) and P (R²c = 0.615; R²p = 0.453). The successful prediction models, especially for NDF, ADF, and starch, may be a result of the large number of samples obtained from five Chinese regions spanning an extensive geographical range. ATR-FTIR is a typical indirect analytical technique, and its accuracy is strongly associated with the precision and accuracy of the reference chemical measurements. In addition, the sample set should be continuously enlarged and new data added to the system to maintain predictive accuracy.

Performance Evaluation of the PLSR and BP-ANN Multivariate Calibration Methods.
The evaluation parameters for the comparison of the PLSR and BP-ANN models are shown in Table 7. The BP-ANN model exhibited more effective prediction performance for the nutrient contents of WPCS than the PLSR model, with higher R²c and R²p and lower RMSEC and RMSEP values. This advantage can be attributed to the flexibility of the BP-ANN method, which can capture both linear and nonlinear relationships between the ATR-FTIR spectral data and the corresponding physicochemical attributes [28]. The use of BP-ANN reduced the training time and provided higher computational efficiency than the PLSR method.

Conclusions
In conclusion, the nutrient composition and rumen degradation of WPCS grown in different regions showed substantial discrepancies. Based on the representative data, ATR-FTIR technology proved to be an efficient and simple tool for predicting the nutritional components of WPCS, which can help optimize feed formulation quickly and improve the productivity of the dairy industry. Furthermore, the BP-ANN algorithm markedly improved the models developed and ultimately provided a more rapid and reliable model because of its self-learning, self-organizing, fault-tolerant, and highly nonlinear computing abilities. Finally, the extensive WPCS samples collected from different regions and dairy farms improved the robustness and universality of the present study and enhanced the practical applicability of the models.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Figure S1: distribution of predicted and measured nutrient contents of whole plant corn silage based on the PLSR model. Figure S2: distribution of predicted and measured nutrient contents of whole plant corn silage based on the BP-ANN model. Table S1: the nutrient contents and variation ranges of whole plant corn silage. Table S2: the rumen degradation and variation ranges of whole plant corn silage.