Application of Common Components Analysis to Mid-Infrared Spectra for the Authentication of Lebanese Honey



Introduction
Honey is a major product of the food industry owing to its valuable nutritional and medicinal benefits. It is highly valued for its potential beneficial activities in the human body as an anti-inflammatory, antidiabetic, and antibacterial agent [1].
The honey trade is a significant global industry that provides various economic and nutritional benefits, but it also faces challenges that must be addressed to ensure the quality and authenticity of the product. Food safety standards are also very important in the honey trade. Various laws and regulations, such as the Codex Alimentarius standard (CODEX STAN 12-1981, Rev. 1-200) and the European Commission directive (2001/110/EC), establish guidelines for honey quality.
The production of honey and its quality depend strongly on the floral and nectar sources foraged by honeybees. Thus, honey is divided into two categories: nectar honey and honeydew honey. Nectar, a sugar-based liquid, originates from the sap of plants and is secreted by special glands called "nectaries," whereas honeydew is a secretion produced by parasitic insects that suck on the plant sap [2].
The main constituents of nectar, which are carbohydrates (especially sucrose and glucose) in addition to water, are considered constant for a given plant species and have a direct effect on honey composition [3]. Glucose and fructose dominate the composition of honey; other constituents include enzymes, amino acids, organic acids, carotenoids, vitamins, minerals, and aromatic substances [4].
According to the Codex Alimentarius standard, "honey should not have any ingredients added; no particular constituent can be removed from it; it does not have any objectionable matter, flavor, aroma, or taint absorbed from foreign matter during processing and storage; and it should not be heated or processed to such an extent that its essential composition is changed and/or its quality impaired" [5].
Honey produced by bees can be either unifloral or multifloral (polyfloral). The latter contains appreciable quantities of nectar and/or honeydew from different plant species. In contrast, unifloral honey is composed of nectar or honeydew from a single plant species. In practice, honey produced by free-flying bees is very rarely purely unifloral [6]. In addition, honey composition, flavor, and aroma vary significantly with botanical and geographical source as well as with flowering and harvesting seasons, production methods, and storage conditions [7], all of which increase its complexity.
Nowadays, the major concern is to ensure honey authenticity, since the quality and economic value of honey depend on it. The authentication of honey involves two main aspects: honey production and honey origin. The first aspect relates to honey processing, such as the addition of sugar syrup, filtration, heating, and moisture content. The origin of honey, which includes both geographical and botanical origins, has a significant impact on its market price and overall quality. Monitoring and controlling the authenticity of honey origin is therefore of great importance, and it is achievable by combining reliable analytical methods with advanced chemometric tools [8]. Over the years, many studies have investigated and developed new analytical techniques for this purpose. The authentication and characterization of honeys according to their botanical and geographical origins have usually been performed through classical complementary methods: sensory, melissopalynological, and physicochemical analyses [9]. However, these methods are laborious and time consuming, and they require specialized skills.
The potential of mid-infrared spectroscopy coupled with different chemometric techniques has been evaluated for the authentication of foods [10]. Combining these methods was found to be a promising approach for the authentication of unifloral honey [11,12]. For example, David et al. used MIR-ATR combined with PLS-DA to classify Romanian honey by botanical origin and harvesting year [13], while Kasprzyk et al. employed FTIR-ATR followed by discriminant analysis to authenticate unifloral rape honey [11]. However, the characterization and authentication of polyfloral honey remain quite challenging.
In Lebanon, beekeeping is a main activity in the agricultural sector. It is one of the oldest professions in the country and has an important economic impact [14]. Because the country is small yet offers, within its 10,452 km², many geographical and topographical morphologies, altitudes ranging from 0 to 3000 meters, and 4 distinct seasons, honey production in Lebanon is essentially multifloral [15].
According to Lebanon's 5th national report to the Convention on Biological Diversity, Lebanon has around 9,116 known species, of which 4,486 are fauna species and 4,630 are flora species, distributed over five geographical regions: (1) the Coastal Zone, (2) the Mount Lebanon, (3) the Bekaa Plain, (4) the Anti-Lebanon, and (5) the South Lebanon [16].
Beekeepers in Lebanon tend to move their hives to follow the blossoming flowers. This seasonal migration is known as transhumance. In Lebanon, vertical transhumance is the basis of professional beekeeping. It consists of moving hives from higher to lower altitudes according to the seasonal changes, the temperature variations, and the blooming of flowers [14]. This frequent practice makes the authentication of honeys according to their geographical and botanical origin even more difficult.
Consequently, the aim of this work is to study the efficiency of common components analysis (CCA), a novel exploratory chemometric tool, when applied to mid-infrared spectra, in determining certain aspects of the authenticity of multifloral Lebanese honey.

Materials and Methods
A total of 96 Lebanese honey samples were collected directly from beekeepers just after extraction and stored at 4 °C until analysis. They originate from all Lebanese regions, at altitudes ranging from 0 to 1950 meters, and have different botanical origins. The distribution of the samples according to geographical and botanical origins is summarized in Table 1.

MIR Spectroscopy.
Mid-infrared (MIR) spectra were recorded using a Thermo Scientific Nicolet iS5 spectrometer equipped with an attenuated total reflectance (ATR) ZnSe accessory. MIR measurements were made using a Globar IR source and a DTGS-KBr detector. The honey samples were analyzed in triplicate without any prior preparation, with 64 scans per spectrum at a spectral resolution of 4 cm⁻¹ in the wavenumber range from 500 to 5000 cm⁻¹.
The MIR spectra were collected in a matrix of size (288, 2335), with 288 rows corresponding to the triplicate analyses of the 96 samples and 2335 columns (variables) corresponding to the MIR wavenumbers (cm⁻¹).

Data Preprocessing.
Data preprocessing methods are chosen based on the data analysis methods. All preprocessing methods aim to reduce random noise and systematic variation in the data to improve their analysis. For this reason, data preprocessing is an important step, especially when reasonable results are to be obtained [18].

Journal of Spectroscopy

First, the wavenumber region 1830-5000 cm⁻¹ of the acquired MIR spectra, related to water absorption bands, was removed for all samples. Then, four pretreatments were compared to choose the most suitable one: standard normal variate (SNV) with column centering, probabilistic quotient normalization (PQN) with column centering, first-derivative calculation, and detrending with SNV. CCA was then applied to each pretreated dataset, and the most suitable pretreatment was selected as the one giving the clearest group separation pattern. The pretreatment thus chosen was SNV with column centering applied to the truncated spectral data. In SNV, each individual spectrum is normalized to zero mean and unit variance [19,20] so as to eliminate baseline shifts and uncontrolled variations in global signal intensity. The mid-infrared data were then centered by column (Figure 1).
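The chosen pretreatment can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the MATLAB code used in this work: the function names, the synthetic random matrix standing in for the measured spectra, the evenly spaced wavenumber axis, and the use of the sample standard deviation (`ddof=1`) in SNV are all assumptions made for the example.

```python
import numpy as np

def truncate_spectra(X, wavenumbers, cutoff=1830.0):
    """Drop the variables at or above `cutoff` (cm^-1), i.e. the 1830-5000 region."""
    keep = wavenumbers < cutoff
    return X[:, keep], wavenumbers[keep]

def snv(X):
    """Standard normal variate: scale each spectrum (row) to zero mean, unit variance."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)  # assumption: sample standard deviation
    return (X - mean) / std

def center_columns(X):
    """Subtract the mean spectrum so that every wavenumber (column) has zero mean."""
    return X - X.mean(axis=0, keepdims=True)

# Hypothetical stand-in for the (288, 2335) matrix of measured spectra
rng = np.random.default_rng(0)
wavenumbers = np.linspace(500.0, 5000.0, 2335)  # assumed evenly spaced axis
X_raw = rng.normal(loc=5.0, scale=1.0, size=(288, 2335))

X_trunc, wn_trunc = truncate_spectra(X_raw, wavenumbers)
X_pre = center_columns(snv(X_trunc))
```

SNV acts on each spectrum (row), whereas centering acts on each wavenumber (column), so after both steps the columns have exactly zero mean while the rows are only approximately standardized.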

Common Components Analysis (CCA).
The pretreated matrix was analyzed by common components analysis (CCA) as a novel exploratory tool.
Common components analysis is a modification of the common components and specific weights analysis procedure (CCSWA or "ComDim") [21].
The original ComDim algorithm aims to describe m data tables observed for the same n samples but with possibly different numbers of variables. Extracted components correspond to the maximum amount of variance that is common to the largest number of tables. The method consists in determining a common space for all m data tables, with each matrix having a specific contribution ("salience") to the definition of each dimension of this common space. For each successive common component, the procedure calculates a common scores vector (the coordinates of the n samples along the direction defined by that common component) [22].
The idea of the ComDim algorithm is to calculate a weighted sum $W_G$ of the samples' variance-covariance matrices, $W_i = X_i X_i^{T}$, using an initial weighting or salience ($\lambda_i$) of 1 for all tables. The vector of scores of the first normed principal component is extracted from $W_G$ as an initial estimate of the first "common component" (CC). The salience of each block $W_i$ is then recalculated from these scores. The estimation of the global scores and saliences is optimized by iterative recalculation until convergence. After computation of the first CC, each original matrix $X_i$ is deflated, and the procedure is repeated for the calculation of the second CC, and so forth [21].
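The iteration described above can be sketched in Python with NumPy. This is an illustrative reading of the procedure, not the implementation used in this work. Several details are assumptions not stated in the text: the name `comdim`, the scaling of each centered table to unit Frobenius norm, the salience update $\lambda_i = q^{T} W_i q$ (with $q$ the unit-norm scores vector), the convergence test on the saliences, and the synthetic random tables.

```python
import numpy as np

def comdim(tables, n_components=2, tol=1e-10, max_iter=500):
    """Minimal ComDim sketch for a list of (n, p_i) tables sharing the same n samples."""
    # Assumption: column-center each table and scale it to unit Frobenius norm
    blocks = []
    for X in tables:
        Xc = X - X.mean(axis=0, keepdims=True)
        blocks.append(Xc / np.linalg.norm(Xc))

    n, m = blocks[0].shape[0], len(blocks)
    scores = np.zeros((n, n_components))
    saliences = np.zeros((m, n_components))

    for k in range(n_components):
        W = [X @ X.T for X in blocks]      # W_i = X_i X_i^T
        lam = np.ones(m)                   # initial salience of 1 for all tables
        for _ in range(max_iter):
            W_G = sum(l * Wi for l, Wi in zip(lam, W))
            _, vecs = np.linalg.eigh(W_G)  # eigenvalues in ascending order
            q = vecs[:, -1]                # first normed principal component of W_G
            new_lam = np.array([q @ Wi @ q for Wi in W])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, k] = q
        saliences[:, k] = lam
        # Deflate each table by removing the part explained by q
        blocks = [X - np.outer(q, q @ X) for X in blocks]
    return scores, saliences

# Hypothetical example: three random tables observed for the same 20 samples
rng = np.random.default_rng(1)
tables = [rng.normal(size=(20, p)) for p in (5, 8, 3)]
scores, saliences = comdim(tables)
```

Because every table is deflated by the same scores vector, successive common components are mutually orthogonal, and each column of `saliences` indicates how strongly each table contributes to the corresponding component.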
Common components analysis has been proposed as a modification of ComDim where each table contains a single variable. The CCs calculated by CCA are based on linear combinations of the individual variables with strong weightings for a given dispersion of the observations, hence grouping together strongly correlated variables [22].
The advantages of using CCA in exploratory analysis lie in its ability to provide a holistic view of multivariate patterns and its robustness to noise in the data. While some exploratory methods focus on individual datasets or specific aspects of the data, CCA brings a unique perspective by emphasizing the commonality between sets of variables.
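To make the single-variable case concrete, the ComDim quantities can be specialized as follows. This is a sketch under assumptions: the symbols $x_i$ (the $i$-th column of the data matrix $X$ with $p$ variables), $q$ (the unit-norm common scores vector), and $\Lambda$ are introduced here for illustration, and the salience update $\lambda_i = q^{T} W_i q$ is the usual ComDim rule, which the text does not state explicitly.

$$
W_i = x_i x_i^{T}, \qquad
W_G = \sum_{i=1}^{p} \lambda_i \, x_i x_i^{T} = X \Lambda X^{T}, \qquad
\lambda_i = q^{T} W_i \, q = \left( x_i^{T} q \right)^{2},
$$

with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. Under these assumptions, the salience of a variable is the squared projection of that variable onto the common scores, so variables that vary together along the same direction receive large weights on the same common component.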
Both preprocessing and data treatment were performed using MATLAB R2017a (simulink).

Results and Discussion
Interpretation of the MIR Spectra.
An example of a mid-infrared spectrum of a honey sample is shown in Figure 2, and the band attribution in the different zones of the spectrum is summarized in Table 2.
Earlier studies of Anjos et al. and Wang et al. allowed the differentiation of glucose, fructose, and sucrose by analyzing the spectral region 750-1500 cm⁻¹. According to Anjos et al., sugar characterization could be determined by the peaks related to carbohydrate structure at 918 cm⁻¹, which corresponds to C-H bending, and at 1043 cm⁻¹ and 1254 cm⁻¹, which correspond to the C-O stretch in the C-OH group as well as the C-C stretch. A small peak at 1110 cm⁻¹ is linked to the stretching of the C-O bond of the C-O-C linkage. A peak at 1321 cm⁻¹ is due to O-H bending of the C-OH group, and the peak at 1411 cm⁻¹ is due to a combination of O-H bending of the C-OH group and C-H bending of alkenes [26]. A more detailed assignment was provided earlier by Wang et al., in which the characteristic absorption bands identified for glucose are at 991, 1031, 1080, 1107, 1151, and 1367 cm⁻¹ with a key peak at 1032 cm⁻¹; those for fructose are at 966, 1063, 1083, 1155, 1254, 1346, 1416, and 1456 cm⁻¹ with a key peak at 1053 cm⁻¹; and those for sucrose are at 995, 1055, 1113, and 1138 cm⁻¹ with key peaks at 994 and 1049 cm⁻¹. The characteristic absorption bands for carbohydrates in the region 904-1153 cm⁻¹ were attributed to the stretching modes of C-O and C-C bonds, whereas those in the region 1199-1474 cm⁻¹ were due to the bending modes of the O-C-H, C-C-H, and C-O-H groups [27].

Interpretation of the CCA Results.
Mostly multifloral honey is produced in Lebanon, as each honey sample contains contributions from several botanical sources and plant species. For this reason, two approaches were suggested to characterize the honey samples: the first classification is based on the floral sources on which the bees fed, according to the bees' nutritional sheet provided with each honey sample. The second classification is based on the Lebanese geographical regions where the samples were collected (Table 1).
The CCA results are presented in terms of scores and loadings plots. The scores plot represents the distribution of the samples and their replicates in the space of the common components, while the loadings plot shows the contribution of the initial variables to this distribution.
The CCA scores plot of CC1 vs CC2, with grouping based on the botanical origin of honey (Figure 3), shows a separation of the honey sample groups mainly along CC2: the Honeydew and Citrus blossom honey samples are located on the positive side of the CC2 axis, while the multiple flowers and Mountain honey samples are positioned on the negative side.
The CCA loadings plot of CC2 (Figure 4) shows the contribution of the variables related to the MIR wavenumbers characterizing fructose. These are the negative peaks at 1051 cm⁻¹ and 962.3 cm⁻¹ on the loadings plot of CC2 (the second common component), which strongly influence the position of the honey samples on the negative side of the CC2 axis. This means that the multiple flowers and Mountain honey samples are richer in fructose than the Honeydew and Citrus blossom honey samples.

Table 2: Band attribution in the honey MIR spectrum [23-25].

When comparing the dispersion of the Honeydew honey samples along the first component CC1, it is worth mentioning that most of these samples are concentrated on the negative side of CC1. The loadings plot of CC1 (Figure 5) shows an intense negative peak at 997 cm⁻¹. This wavenumber is attributed to sucrose. Hence, most of the Honeydew honey samples analyzed with mid-infrared spectroscopy have high sucrose levels, noting that the predominant type of Honeydew honey collected from Lebanese regions is oak honey.
This result agrees with the study of Poyrazoglu et al., which reported that pine and oak honeys (Honeydew honey samples) contained lower concentrations of fructose and higher concentrations of sucrose than all the floral honeys, which showed a relatively higher concentration of fructose [28].
This result also agrees with that of Gok et al., who reported a difference between honey samples originating from trees (honeydew) and those from flowers (floral). The difference was likewise attributed to the carbohydrate content [29].
In fact, the same sugars are present in the different honey types, but in different percentages and proportions, which are mainly related to the flora [30]. The main sugars found in honey are fructose and glucose, which are products of the breakdown (hydrolysis) of sucrose by enzymes deposited in the honey by the bees. According to the Codex Alimentarius, as well as the European Commission, Honeydew honey should contain a minimum of 45 g/100 g of the sum of fructose and glucose, while floral honey should contain a minimum of 60 g/100 g of the sum of fructose and glucose. Many studies, notably that of Bergamo et al., showed that Honeydew honey contains a higher concentration of di- and trisaccharides, as well as lower mean contents of glucose and fructose, than nectar honeys [31].
Furthermore, the difference between the Lebanese geographical regions where the honey samples were harvested is also discussed.
The CCA scores plot of CC1 vs CC2, with grouping based on Lebanese geographical regions (Figure 6), shows that honey samples collected from the coastal zone, Mount Lebanon, and the South of Lebanon show no between-group difference. The only separation seen concerns the samples collected from the Bekaa plain, which are located on the negative side of the CC2 axis.
The intense negative peaks (at 1051 cm⁻¹, 975.8 cm⁻¹, and 962.3 cm⁻¹) observed on the CC2 loadings plot (Figure 4) correspond to the MIR variables that contribute significantly to the separation of the honey groups along the CC2 axis and strongly influence the position of the honey samples collected from the Bekaa plain on the negative side of CC2. This means that honey samples collected from the Bekaa plain are richer in fructose than the honey samples collected from the other regions.
According to the "second report on the state of plant genetic resources for food and agriculture" in Lebanon, the Bekaa plain is known for its cultivated field crops, which include cereals, potatoes, sugar beets, onions, forages, tobacco, and food legumes; these crops can be a source of nectar, which may explain the high fructose concentration found in honey samples collected from the Bekaa plain [32].
On the other hand, a closer look at the distribution of the honey samples collected from the coastal zone on the CCA scores plot of CC1 vs CC2 (Figure 6) clearly shows two distinct groups separated along the CC1 axis: one group located on the negative side and corresponding to the Honeydew honey, and another group located on the positive side and corresponding to the multiple flowers honey. The loadings plot of CC1 (Figure 5) again shows that the variable contributing significantly to this separation is the one corresponding to the intense negative peak at 997 cm⁻¹ characterizing sucrose. This variable describes the samples located on the negative side of the CC1 axis, which represent most of the samples collected from the coastal region and correspond to the Honeydew honey.
Again, according to the "second report on the state of plant genetic resources for food and agriculture" [32], pine trees, carob trees, storax, oak trees, and willows are mainly found in the coastal zone of Lebanon and thus could be the main source of honeydew. This would explain why most of the honey samples collected from the coastal region have a higher sucrose concentration and are located on the negative side of the CC1 axis.
This differentiation observed in the coastal region is not evident for the South Lebanon and Mount Lebanon regions. This is due to other factors related to altitude and climatic diversity inside each of these Lebanese zones.
In the end, in the case of Lebanese honey, it is very challenging to discriminate between all botanical origins and geographical zones. This is due to the seasonal migration activity, also called transhumance, performed by the beekeepers all year long. Beekeepers tend to move their hives following the different seasons and flower blooms to increase honey production [33]. According to a survey of Lebanese beekeepers [14], 87% of them practice transhumance, with the lowest rate of migratory beekeeping found among beekeepers of the Baalbeck-Hermel region, which may explain the distinction of the honey samples collected from this region. Considering the small area of Lebanon, fields with uniform plant cover do not exist. Honeybees are hence offered a diverse and variable floral environment composed of different plant species all year long. Therefore, honey samples produced from different floral origins cannot all be differentiated, although a difference is seen between floral and Honeydew honey samples. This activity is one of the main reasons behind the heterogeneity of Lebanese honey.

Conclusions
This study allowed the characterization of Lebanese multifloral honeys based on their botanical and geographical origin with a view to their authentication, a task which has so far remained very challenging. The application of CCA as a novel exploratory tool to mid-infrared spectroscopy data showed a high potential for discriminating between the botanical origins of Lebanese honey. This difference was explained by variables that correspond to MIR wavenumbers related to specific vibrations of fructose and sucrose functional groups.
Moreover, the application of CCA to the MIR spectral data, with grouping based on the geographical origin, made it possible to discriminate honey samples collected from the Bekaa plain from the honeys collected from the other Lebanese regions. Honey samples collected from the Bekaa plain were shown to be richer in fructose than the other honey samples. This specificity of the region is attributed to its low transhumance activity and to the types of crops cultivated there.
This work constitutes a step forward towards the authentication of multifloral honeys, which has until now been very difficult. The application of CCA to MIR data has been shown to be a promising and efficient method for the characterization of Lebanese multifloral honeys.

Table 1: The grouping of honey samples. Column headings: number of samples; grouping based on the botanical origin of honey.

... centering applied on MIR spectra

Figure 2: MIR spectrum of a honey sample.

Table 2:
MIR bands: MIR bands attribution
Band A (3000-3700 cm⁻¹): The stretching of -OH from carbohydrates, water, and organic acids present in the honey
Band B (2700-3000 cm⁻¹): The stretching vibration of the C-H bonds that principally constitute the chemical skeleton of sugars, as well as the stretching of O-H and NH3 due to carbohydrates, carboxylic acids, free amino acids, and phenolics
Band C (1600-1700 cm⁻¹): The stretching vibrations of the ketone functional groups C=O of fructose and the aldehyde functional groups CH=O of glucose, as well as the stretching band of C=C of phenolic molecules
Fingerprint region D (700-1470 cm⁻¹): The stretching vibrations of C-C and C-H bonds and the bending vibrations of C-H present in the chemical structure of carbohydrates and C-O in phenols; a combination of O-H bending of the C-OH group and C-H bending of alkenes; flavanol and phenol vibrations

Figure 3: CC1 vs CC2 scores plot with the grouping based on the botanical origin of honey.

Figure 6: CC1 vs CC2 scores plot with the grouping based on the Lebanese geographical regions.