Classification of Different Blueberry Cultivars by Analysis of Physical Factors, Chemical and Nutritional Ingredients, and Antioxidant Capacities

Blueberry fruits of different cultivars differ in their quality indices. In this work, three types of quality factors, including 6 physical parameters, 12 chemical and nutritional components, and 3 antioxidant indices, were measured to compare and classify blueberry fruits from 12 different cultivars in China. Using the autoscaled quality-factor data, unsupervised principal component analysis was performed for exploratory analysis of intercultivar differences and the influences of quality factors. A supervised classification method, partial least squares discriminant analysis (PLSDA), was combined with the global particle swarm optimization (PSO) algorithm and two multiclass strategies, one-versus-rest (OVR) and one-versus-one (OVO), to select discriminative quality factors and develop classification models of the 12 cultivars. As a result, OVO-PLSDA with 8 quality factors achieved a classification accuracy of 0.915. This study provides new insights into the quality variations and key factors among different blueberry cultivars.


Introduction
Blueberry, which is native to North America, has been widely cultivated around the world [1]. It is a very popular fruit due to its pleasant flavor, high nutritional value, and health benefits [2,3]. Fresh blueberry fruits are rich in various nutritional ingredients, such as anthocyanins, polyphenols, flavonoids, polysaccharides, vitamins, minerals, and dietary fibres [4]. Moreover, modern scientific experiments have revealed many of its functional activities, such as antioxidant, antimicrobial, antihypertensive, anti-inflammatory, and neuroactive properties, and its ability to prevent obesity, diabetes, cancer, and other chronic diseases [5][6][7][8][9][10]. Besides being consumed as fresh fruits, blueberries are also widely used to produce natural extracts, blueberry wine, beverages, jams, preserved fruits, and food ingredients or additives. The cultivation of blueberry in China started in the mid-1980s and began to be popularized in the early 21st century. At present, the main cultivation area amounts to about 3500 hm² across Northeast and North China, Jiangsu, Zhejiang, Liaodong Peninsula, and Southwest and South China [11]. According to an incomplete survey, over 100 different cultivars of blueberries have been introduced and bred in China, among which about 10-15 cultivars are important in the domestic market [12].
The physical and chemical quality factors, the nutritional ingredients, and the functional activities of blueberries depend largely on the specific cultivars and cultivation conditions. Many sensory indices of fruits (such as hardness, brittleness, and chewiness) are closely related to texture and physical factors, which directly affect their storage and transportation characteristics. They have become important indices for testing fruit quality and the primary factors for evaluating the acceptability of freshly consumed fruits [13]. The levels of chemical indices and nutritional ingredients also play an important role in the flavor and nutritional value of blueberries and have been intensively studied and compared among different cultivars and producing areas [14]. Among the various functional activities, the antioxidant capacity and antioxidant substances, such as polyphenols (especially anthocyanins) and flavonoids, are also associated with many other functional activities and have been compared among blueberries from different cultivars, geographical origins, and postharvest processing methods [15].
Statistics and chemometrics have been widely used to reveal the contributions of multiple variables in complex chemical systems [16][17][18][19][20][21]. At present, studies on blueberry quality mainly focus on one or several indicators, and few studies have comprehensively evaluated physical, chemical, and nutritional quality factors among different cultivars of blueberries [22][23][24]. The objective of this work was to study the quality variations among some major blueberry cultivars by a fusion analysis of physical and chemical factors, nutritional ingredients, and antioxidant abilities. To reveal the key factors that differ among blueberry cultivars, unsupervised principal component analysis (PCA) was complemented by supervised partial least squares discriminant analysis (PLSDA), which was used to develop multiclass classification models with feature sets selected by the global particle swarm optimization (PSO) algorithm [25][26][27][28].

Blueberry Samples.
Mature blueberry fruit samples (N = 366) of 12 different cultivars were provided by several local blueberry orchards in Huaining, Anhui province. The blueberry fruit was harvested within 3 days after attaining the maximum blue color. After harvesting, the blueberries were packed in fresh-keeping boxes, placed in 4°C incubators, and transported to the lab on the second day. Intact fruits with uniform size and color were selected, and their quality indexes were measured. Fresh fruits were washed and dried, packed and sealed, and frozen at −18°C to be used for determination of physicochemical indexes.
The detailed information about sample size, cultivar, and sources is listed in Table 1.

Quality Analysis of Blueberry Fruit.
In this work, a set of 21 quality factors, including 6 physical factors, 12 chemical and nutritional components, and 3 antioxidant indices, was determined for the collected blueberry fruit. The 21 quality factors are listed in Table 2.

Measurements of Physical Factors.
For each sample, 10 blueberry fruits were randomly selected, and L* and hardness were measured at 3 sites along the equatorial line. The measurement of lightness value (L*) was performed using a CR-400 Chroma Meter (Minolta, Osaka, Japan). The hardness value was determined using the method by Hu et al. with a TA.XTplus texture analyser (Stable Micro Systems, England) [29]. The average single fruit weight, shape index (the ratio of maximum height to width), and specific gravity were also measured on 10 randomly selected fruits. To measure the juice yield, fruits (20 g) were beaten and centrifuged at 6000 r/min for 15 min to obtain the upper juice.

Determination of Chemical Factors and Nutritional Ingredients.
The total soluble solid (in °Brix) was analyzed using a PAL-1 Digital Hand-Held Pocket Refractometer (Atago, Japan). The pH value was measured on blueberry pulp using a PB-10 pH meter (Sartorius, Germany). The titratable acidity (TA) value was analyzed using an indicator method for acid-base titration [30]. The level of vitamin C (ascorbic acid) was determined by the 2,4-dinitrophenylhydrazine colorimetric method with a U-3900 ultraviolet-visible spectrophotometer (Hitachi, Tokyo, Japan) [31]. The contents of total phenols (TP) were analyzed using the Folin-Ciocalteu method [32]. The standard curve was made using pyrogallic acid (in ethanol), and TP content was computed as gallic acid equivalents per 100 g fresh weight (mg GAE/100 g FW). The analysis of total flavonoids (TF) was performed using the AlCl3 colorimetric method [33]. TF content was expressed as catechin equivalents per 100 g fresh weight (mg CE/100 g FW). The total soluble sugar was determined using the modified anthrone-sulfuric acid method [34]. The level of reducing sugar was analyzed using the direct titration method of copper tartrate solution [35]. The moisture was determined by the direct drying method [36]. The ash content was analyzed by burning and weighing [37]. Protein determination was performed using the classical Kjeldahl method [38]. The contents of anthocyanins were determined by the differential pH method [39].

Antioxidant Analysis.
Three different antioxidant indices were measured to compare the antioxidant capacities of blueberries. The scavenging capacities for 2,2-diphenyl-1-picrylhydrazyl (DPPH) and hydroxyl radicals and the ferric reducing antioxidant power (FRAP) values were determined following the procedures described for blueberry samples [5,40,41].

Chemometrics Data Analysis.
Considering the scale variations in different quality factors, each factor was autoscaled; namely, the values were made to have a zero mean and a standard deviation of 1. For exploratory analysis of the data, unsupervised principal component analysis (PCA) was performed to show the class distributions of blueberry [42]. The DUPLEX algorithm was performed to divide the data of each class into training and test objects, which were combined to generate the final training and test sets [43].
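The autoscaling step can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the function name `autoscale` and the toy values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Column-wise autoscaling: subtract the column mean, divide by the column SD.

    `rows` is a list of samples, each a list of quality-factor values.
    Assumes the sample standard deviation (n - 1); a constant column
    (SD = 0) would raise ZeroDivisionError and must be handled by the caller.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

# Illustrative, made-up values: two factors on very different scales.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled = autoscale(toy)
```

After scaling, every factor contributes on a comparable scale, so that PCA and PLSDA are not dominated by the factors with the largest raw ranges.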
In order to probe and reveal the key quality factors reflecting the cultivar variations of blueberry, the global particle swarm optimization (PSO) algorithm was used to select the most discriminative feature sets [28]. PSO imitates the social behavior of bird flocking: a population of particles, or candidate solutions, is improved iteratively to approach the best solution by combining random search with the best known solutions. PSO is started with a population of random feasible solutions. In this work, 100 initial feasible solutions were randomly generated as strings of 0s and 1s, where 0 and 1 represent the absence and presence of a quality factor, respectively. Discriminative feature sets were selected to obtain the lowest overall classification error rate of Monte Carlo cross validation (OCERMCCV), defined as

$$\mathrm{OCERMCCV} = \frac{\sum_{i=1}^{B} M_i}{\sum_{i=1}^{B} N_i}$$

where B is the number of random data splits by MCCV, and M_i and N_i are the numbers of misclassified objects and test objects for the ith data split, respectively [50].
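A small sketch of how this error rate could be computed is given here in Python. It assumes that the overall rate pools misclassified objects over pooled test objects across the B splits, which coincides with the average per-split error rate whenever every split holds out the same number of objects. The function names are hypothetical, and the defaults (100 splits, 20% held out) mirror the settings reported in Results and Discussion.

```python
import random

def mccv_splits(n_objects, n_splits=100, validation_fraction=0.2, seed=0):
    """Yield (training_indices, validation_indices) for Monte Carlo cross validation.

    Each split is an independent random partition of the training objects.
    """
    rng = random.Random(seed)
    n_val = max(1, round(n_objects * validation_fraction))
    indices = list(range(n_objects))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_val:], shuffled[:n_val]

def ocer_mccv(misclassified, tested):
    """Pooled error rate: sum of M_i divided by sum of N_i over all splits."""
    if len(misclassified) != len(tested):
        raise ValueError("one M_i and one N_i are needed per split")
    return sum(misclassified) / sum(tested)

# Splits for a training set of 236 objects (the size reported in this work).
splits = list(mccv_splits(236))
```

In a full workflow, a classifier would be fitted on each training part and the misclassified validation objects counted to give M_i; that model-fitting step is omitted here.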

Software.
All the data processing and chemometric algorithms were performed on MATLAB 7.0.1 (MathWorks, Sherborn, MA, USA). The DUPLEX algorithm was performed using the codes included in the TOMCAT toolbox [51]. All the other data analysis algorithms, including PCA, OVO, OVR, PLSDA, and PSO, were performed using self-compiled MATLAB codes.

Results and Discussion
The ranges and standard deviations (SD) of the 21 quality factors of the 12 blueberry cultivars are summarized in Table 3, where the raw data of quality factors have different scales. To illustrate the distribution of different classes, principal component analysis (PCA) was performed on the autoscaled data (Figure 1). The first two principal components (PCs) account for 87.59% of the total data variance. Projection of the 12 classes onto the first 2 PCs showed the variations among different cultivars of blueberries. The loadings of the first PC (Figure 1) indicated that the levels of 1 physical parameter (hardness) and 5 chemical and nutritional components (vitamin C, total phenols, total flavonoids, proteins, and anthocyanins) contribute significantly to the class separation achieved by PC1. For the second PC (Figure 1), 5 parameters had important contributions, including 2 physical parameters (average single fruit weight and shape index), 2 chemical and nutritional components (titratable acidity and anthocyanins), and an antioxidant index (scavenging capacity of DPPH radical). Notably, the level of anthocyanins was a key quality factor for discriminating different blueberries, as it plays an important role in both PC1 and PC2.
Although PCA could obtain some separation of different blueberries, supervised methods were needed to achieve more accurate classification models based on key quality parameters. Therefore, multiclass classification models were developed using OVR-PLSDA and OVO-PLSDA models with subsets of key quality parameters selected by the PSO algorithm. To obtain representative training and test data sets, the DUPLEX algorithm was performed on each of the 12 classes of blueberries to divide the measured data into training and test objects (Table 1). The training and test objects from each class were combined to generate the final training and test sets, including 236 and 130 objects, respectively. For both OVR-PLSDA and OVO-PLSDA, the number of significant latent variables (LVs) of each binary PLSDA submodel was determined using MCCV to obtain the lowest OCERMCCV. With different sizes (3-15) of parameter subsets, PSO was performed to search for the optimal subsets by minimizing the OCERMCCV. In this study, the number of data splits of MCCV was 100. For each data split, 80% of the training objects were used for model development, and 20% of the training objects were used for validation. For PSO, the algorithm was stopped when the value of the objective function (OCERMCCV) could not be reduced by 0.1% in the next cycle. The maximum total number of PSO cycles was set to 100, which to our knowledge was sufficient to solve the small-scale (21 variables to be selected) optimization problem in this work. To examine the optimization performance of PSO, a 100-cycle PSO was performed to search for the best subsets for the 8-variable OVO-PLSDA and 10-variable OVR-PLSDA, and the lowest OCERMCCV for each cycle is shown in Figure 2. Though there were slight fluctuations of OCERMCCV, PSO could significantly reduce OCERMCCV with sufficient cycles, and the searching results were stable for both methods.
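The search procedure can be illustrated with a generic binary PSO, sketched in Python. This is not the authors' implementation: the sigmoid transfer rule, the inertia weight `w`, the acceleration constants `c1` and `c2`, and the velocity clamp are textbook choices assumed here, and the sketch does not enforce a fixed subset size. The stopping rule reads "could not be reduced by 0.1%" as an absolute drop of less than 0.001 per cycle, which is one possible interpretation. `toy_error` is a hypothetical stand-in for the OCERMCCV of a PLSDA model.

```python
import math
import random

def binary_pso(objective, n_features, n_particles=100, max_cycles=100,
               tol=0.001, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over 0/1 masks; returns (best_mask, best_value, history)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best value before the first cycle, then after each cycle
    for _ in range(max_cycles):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp keeps exp() well behaved
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
        if tol is not None and history[-2] - history[-1] < tol:
            break  # improvement in this cycle fell below the tolerance
    return gbest, gbest_val, history

# Hypothetical objective: fraction of positions differing from a hidden mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

def toy_error(mask):
    return sum(a != b for a, b in zip(mask, TARGET)) / len(TARGET)

best_mask, best_val, history = binary_pso(toy_error, n_features=21, tol=None)
```

Passing `tol=None` runs all cycles, which corresponds to the 100-cycle run used to examine convergence; `history` then plays the role of the per-cycle lowest error curve.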
The classification results of OVR-PLSDA and OVO-PLSDA with selected subsets of quality factors are summarized in Table 4. In terms of OCERMCCV, for OVR-PLSDA, the 3 best subsets included 10, 12, and 9 quality parameters, respectively, and for OVO-PLSDA, the 3 best subsets had 8, 9, and 8 quality parameters, respectively. Generally, OVO-PLSDA could obtain better training and prediction accuracy than OVR-PLSDA. This could be attributed to the large submodel complexity and uneven class sizes of OVR. Both OVR-PLSDA and OVO-PLSDA required at least 8 or 9 features to obtain the best classification accuracy for the 12 blueberry cultivars, indicating that the variations among different blueberries were multivariate.
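The structural difference between the two multiclass strategies can be made concrete with a short Python sketch. The cultivar labels are placeholders, the function names are hypothetical, and majority voting for OVO is a common decision rule assumed here rather than one stated in the text.

```python
from collections import Counter
from itertools import combinations

def ovr_tasks(classes):
    """One binary task per class: that class against all the others pooled."""
    return [(c, [other for other in classes if other != c]) for c in classes]

def ovo_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

def ovo_vote(pair_winners):
    """Majority vote over pairwise decisions.

    `pair_winners` maps each (class_a, class_b) pair to the class predicted
    by that binary submodel. Ties go to the class that received a vote first.
    """
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]

cultivars = [f"cultivar_{k}" for k in range(1, 13)]  # placeholder labels
n_ovr = len(ovr_tasks(cultivars))
n_ovo = len(ovo_tasks(cultivars))
```

For 12 classes, OVR builds 12 submodels that each separate one cultivar from the other 11 pooled together, whereas OVO builds 66 submodels that each involve only two cultivars. This is consistent with the uneven class sizes and higher submodel complexity attributed to OVR above.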
To demonstrate the variations in key quality factors, the pooled coefficient of variation (CV) for each variable was computed and compared. The best classification accuracy of 0.915 was obtained by OVO-PLSDA using 8 selected quality parameters, including hardness (CV, 0.2593), juice yield (CV, 0.1516), titratable acidity (CV, 0.2692), vitamin … Figure 3 shows that the correlation between titratable acidity and vitamin C was 0.64 and that between anthocyanins and DPPH was 0.56. The low correlations among the key quality factors also imply that the discrimination of different blueberries requires multivariate quality factors. Key quality factors could be identified from their high frequency of being included in the selected sets in Table 4, including hardness (4), total flavonoids (6), vitamin C (5), total phenols (5), anthocyanins (6), and DPPH (5), indicating that all 3 types of quality factors are useful to characterize and discriminate different blueberries. The results demonstrated that the quality of blueberry is multivariate in nature and that its analysis requires the aid of chemometrics.
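A pairwise correlation between two quality factors can be computed as in the Python sketch below. The Pearson coefficient is assumed, since the text does not name the correlation measure, and the function name and numbers are illustrative only, not measured data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values for two hypothetical factors measured on five samples.
factor_a = [0.8, 1.1, 0.9, 1.4, 1.2]
factor_b = [10.2, 12.5, 9.8, 13.1, 11.0]
r = pearson(factor_a, factor_b)
```

Coefficients well below 1 in magnitude indicate that two factors carry partly independent information, which is the sense in which low correlations support a multivariate description.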

Conclusions
In this study, the quality factors of blueberry fruits from 12 different cultivars in China were compared by analysis of 6 physical parameters, 12 chemical and nutritional components, and 3 antioxidant indices. Unsupervised PCA and supervised PLSDA were used to reveal the variations among different blueberries and to identify the key quality factors aided by the PSO algorithm. The results indicated that high classification accuracy (0.915) of the 12 blueberries could be obtained by using 8 quality factors, and all 3 types of quality factors are useful to characterize and discriminate different blueberries. Key quality factors were identified from their high frequency of being selected by PSO, including hardness, total flavonoids, vitamin C, total phenols, anthocyanins, and DPPH.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.