Robustness of Tomato Quality Evaluation Using a Portable Vis-SWNIRS for Dry Matter and Colour

The utility of a handheld visible-short wave near infrared spectrophotometer utilising an interactance optical geometry was assessed in context of the noninvasive determination of intact tomato dry matter content, as an index of final ripe soluble solids content, and colouration, as an index of maturation to guide a decision to harvest. Partial least squares regression model robustness was demonstrated through the use of populations of different harvest dates or growing conditions for calibration and prediction. Dry matter predictions of independent populations of fruit achieved R2 ranging from 0.86 to 0.92 and bias from −0.14 to 0.03%. For a CIE a⁎ colour model, prediction R2 ranged from 0.85 to 0.96 and bias from −1.18 to −0.08. Updating the calibration model with new samples to extend range in the attribute of interest and in sample matrix is key to better prediction performance. The handheld spectrometry system is recommended for practical implementation in tomato cultivation.


Introduction
Tomato fruit quality is assessed in terms of dry matter content (DM, also known as total solids content), soluble solids content (SSC), and external colour, and less commonly in terms of firmness and titratable acidity [1,2]. There is also interest in the level of lycopene, given suggested health benefits [3].
Starch is stored in immature fruit, with conversion to reducing sugars (e.g., glucose and fructose) around 40 days after anthesis [4]. In fruit near maturity, the DM of tomato pulp consists of soluble sugars (50% w/w), insoluble solids (25%), organic acids (13%), minerals (8%), and others (4%) [5]. In the tomato processing industry, grower payments may be linked to DM content of pulp, as processing to products such as ketchup involves concentration of soluble solids content. Maximisation of fruit solids content is a matter of maximising carbohydrate import to the fruit, through use of appropriate genetic material (varieties), growing conditions, and agronomic practices [2]. For example, the assessment of fruit for DM and soluble solids content is practiced in varietal screening programs [6]. It is also well established that fruit of plants exposed to moderate salinity will accumulate a higher level of solids [7]. Gauging the appropriate time of harvest is a key agronomic decision, balancing the need to leave the fruit on vine to maximise solids content, and the need to harvest fruit before ripening proceeds to the point where loss of firmness means that the fruit will be damaged during harvest, transport, and processing.
Near infrared spectroscopy (NIRS) is a potential tool for nondestructive measurement of several fruit attributes. NIRS assessment of intact fruit (vegetable) DM content was first reported for onion [8]. Numerous reports have subsequently been produced for use of this technology with a range of fruit commodities; however many studies are compromised either in terms of matching the optically sampled volume to the volume of tissue sampled for reference analysis, or in terms of overfitting of a multivariate model, without testing on a truly independent population. Nonetheless, the technology is in commercial use, both for in-line sorting of fruit operating on conveyor systems moving at up to 1 m/sec and for assessment of fruit on tree, based on use of handheld near infrared spectrometers, for example, in noninvasive assessment of DM of mango fruit on tree [9]. Such noninvasive data is of use in decision support systems around assessment of fruit maturation or ripening (e.g., Jordan and Loeffen [10]).

2
International Journal of Analytical Chemistry The first consideration of the use of NIRS to assess intact tomato soluble solids content (SSC) was made by Slaughter et al. [11] using a laboratory bench top system (NIR System 6500) using fibre optics in an interactance geometry. Spectra were acquired at four equatorial positions and at the acropetal end of fruit. Partial least squares regression (PLSR) models based on spectra from the equatorial region were better than that from acropetal end in terms of correlation coefficient and root mean square of errors in calibration (RMSEC). Based on a 800-1000 nm spectral window and use of 12 PLS factors ( ), 2 of 0.85 and RMSEC on SSC of 0.27% were achieved, with validation statistics of 2 = 0.79, root mean square of error of prediction (RMSEP) of 0.33%, and bias of 0.05%. However, the validation set was not independent of the calibration population (of 400 spectra initially taken, 300 were used as training and 100 as prediction set).
Subsequently, a number of researchers have considered use of NIR/IR spectroscopy for assessment of quality parameters of tomato juice, puree, or intact fruit [12][13][14]. These studies have involved a range of equipment, varying in wavelength range, wavelength dispersion mechanism, and optical geometry employed (reflectance, diffusive reflectance, transmittance, or interactance), in data preprocessing (first derivative, second derivative, and multiplicative scatter correction, MSC), and in the calibration model development method (principal component regression, PCR; multiple linear regression, MLR; PLSR1; PLSR2; and modified partial linear square, MPLSR). For example, for a set of transmission spectra of intact fruit acquired with a MMS1 NIR spectrometer, Khuriyati and Matsuoka [15] reported that PLSR models of SSC were better when based on second derivative spectra ( 2 = 0.92, RMSEC = 0.36%) than smoothed log (1/ ) spectra. However, this study lacks documentation of regression coefficients (the smoothness of which allows a judgement of likely model robustness), and again the validation and calibration sets were selected from the one population (i.e., the validation set is not independent of the calibration set).
In general there has been a shift from use of reflectance to interactance or partial transmittance geometries. Indeed, Clement et al. [16] severely criticised use of reflectance spectra for SSC estimation of intact fruit, due to poor validation performance.
A number of workers have used a spectral range of 900 nm and above for PLSR calibration models of tomato juice and puree SSC [13,14,17], but Shao and He [18] attempted use for intact fruit. As the longer wavelengths have high absorbance by water, the effective depth of optical assessment is low (1-2 mm) and thus effectively only the skin will be assessed. It is expected that this model will lack robustness; that is, it will perform relatively poorly in prediction of independent populations of intact fruit that vary in skin properties.
The Herschel wavelength region (720-1100 nm, also known as the short wave NIR or SWNIR) is typically employed due to greater effective penetration through biological tissue and thus optical sampling volume than for longer wavelengths. The miniature Zeiss MMS1 Si photodiode array spectrometer operates over the wavelength range 300-1100 nm and has seen used in several studies involving tomato. For example, in common with Khuriyati and Matsuoka [15] and Walsh et al. [19], Khuriyati et al. [12] used this spectrometer in development of a dry matter PLS model on tomato DM, based on interactance spectra. A coefficient of determination ( 2 ) of 0.88 was achieved, but again the model was not tested on independent fruit populations. Most recently, Radzevicius et al. [20] reported the use of the luggable NIR Case NCS001A (Sacmi Imola S.C., Italy) for estimation of tomato DM, SSC, and firmness, but calibration statistics were not presented.
Other limitations of past studies include a low standard deviation (SD) in the attribute of interest in the training or validation set, as a low range in the population will necessarily be associated with an apparently low RMSEC and a low 2 . For example, poor SSC model performance for a data set based on interactance SWNIR spectra of intact tomato fruit was due to a population SD of only 0.3% [19].
The measurement of tomato colour using portable SWNIR instrumentation (FQA-NIRGUN, interactance optics, 623-1052 nm) was reported by Kusumiyati et al. [21] for tomato both on and off vine, with better results achieved using a PLSR than a PCR model. For the PLSR model of CIE * , calibration ( 2 = 0.92; RMSEC = 3.31; = 4) and prediction statistics of 2 = 0.92, SEP = 3.19, and RPD (ratio of bias corrected SEP to standard deviation) of 4.76 were reported. Clement et al. [16] assessed tomato colour (Hunter a) based on reflectance spectra (400-1500 nm) of intact tomato acquired using a Varian Cary 500 UV-Vis-NIR spectrophotometer (Varian Inc., Palo Alto, CA). PLSR model statistics for Hunter a were 2 = 0.981, RMSECV = 1.126, and standard deviation ratio (SDR, a ratio of RMSECV to SD) = 7.32. The spectral window of 400-1000 nm produced a better result than the 900-1500 nm window, as expected for this attribute. More recently, Camps [22] reported the use of a handheld diffuse reflectance NIR spectrometer using the 900-1700 nm region (Phazir; Analyticon Instruments, Germany) to assess CIE * and * values of tomato, reporting a calibration 2 of 0.903 and 0.846 and RMSECV of 2.26 and 3.02, respectively. It would have been interesting to see comparative results based on the visible-SWNIR region, and prediction of independent test sets.
Thus previous reports indicate the potential of SWNIRS in a reflectance geometry for assessment of intact tomato colour, and interactance or transmission geometries for DM and SSC assessment. A number of studies refer to the potential for field portable instruments, allowing application to the agronomy of the crop. However, a consistent criticism of all previous studies is the use of one population divided into a calibration and a "validation" set, with validation samples often deliberately chosen to be representative of the calibration set (i.e., same mean and SD). Such validation sets are not independent of the calibration set and the validation statistics can be expected to be optimistic in terms of the performance of the model in prediction of future predictions, as must occur in practical use. The current study is undertaken to evaluate the use of portable SWNIRS for field assessment of colour and DM of intact tomato, involving a rigorous verification study across truly independent validation sets.
International Journal of Analytical Chemistry 3

Plant Culture.
One hundred four-week old tomato (Lycopersicum esculentum L., var. Roma) seedlings were transplanted to a hydroponic system consisting of 10 L plastic containers (single plant per container), connected to an irrigation system to supply nutrient solution. The base nutrient solution consisted of a solution of electrical conductivity of 1.36 mS cm −1 and pH 6.0. Temperature of the solution varied between 25 and 31 ∘ C throughout the trial period. To induce a range in fruit dry matter and colour, several treatments were imposed.
At fruit bearing stage, plants were assigned into five treatment groups, each group with four replicates (i.e., 20 plants per treatment) in a randomized complete block design. The five treatments were (i) plants on normal hydroponic solution (base), (ii) base with added KNO 3 (198 mg L −1 ; final solution conductivity of 1.6 mS cm −1 ), (iii) base with added NaCl (2 g L −1 final solution conductivity of 5.5 mS cm −1 ), (iv) base with leaf removal (to one leaf/truss), and (v) base with fruit removal (to four fruit/truss). For the KNO 3 and NaCl treatments, salt was added to the base solution once the plant started producing flowers, and the leaf and fruit removal treatments were started when fruitlets were visible. Four weeks after flowering, the strength of the normal hydroponic solution was increased to 1.63 mS cm −1 while NaCl and KNO 3 treated solution were increased to 6.1 and 1.93 mS cm −1 , respectively.

Fruit Measurement.
Visible-SWNIR spectra of intact fruit were acquired with a "Nirvana" SWNIR spectrometer (no longer produced, but equivalent to the F750, Felix Instruments, WA, USA). This unit employs an interactance geometry, a halogen lamp, and a Zeiss MMS1 Si photodiode array with approximately 3.3 nm pixel spacing and 10 nm wavelength resolution. The interactance probe involves a receiving optical probe placed in front of a collimated light source, such that the probe casts a shadow onto the sample and the probe views the cast shadow on the sample [23]. Thus detected light has passed through part of the fruit.
Five sets of tomato fruit (termed populations 1 to 5) were harvested at intervals of 10 days from the seventh week after planting from all five growing condition treatments. On each harvest date, two plants were chosen at random from each treatment and all fruit were harvested from first to fourth trusses (resulting in between 50 and 215 fruit in each harvest, Table 1). These fruit were numbered in the field, brought into the laboratory (at 22 ∘ C), and marked at a location on the fruit equatorial region. The SWNIR spectrometer was used to collect interactance spectra, in duplicate, at each marked spot. A spot colour reading in CIE1976 * * * space was made at the same position with a Minolta colorimeter (CR-400, D65 illuminant, 2 ∘ angle of observer). Following these measurements, a one centimetre sided cube of fruit flesh from the point of scanning was collected, diced, and placed in a forced air oven at 65 ∘ C for 48 hour for dry matter estimation.
To emulate field use of the handheld spectrometer in monitoring of fruit maturation, five plants were randomly selected from each of the five treatments (total = 25) and four fruit per plant (at least one fruit from each of second to fourth truss) were tagged (total = 100 fruit). The tagged fruits were scanned with the handheld spectrometer at a position on the fruit equator. Scanning was repeated each week of the same location on each fruit, from sixth week after planting until most of the fruit were fully red ripe (11 weeks after planting).

Data Preparation and Analysis.
The absorbance spectral data acquired from handheld spectrometer were transformed to (3 nm) interpolated second derivative (Savitzky-Golay second-order, 7 point window) spectra (referred to as "second derivative" spectra). Partial least squares regression models were developed using the multivariate data analysis software (The Unscrambler V 10.2, Camo Inc., Norway). Models were developed using the NIPALS (nonlinear iterative partial least squares) algorithm and were cross validated with random subsets of four samples.
DM and CIE * models were developed using population 1 and "updated" by sequential expansion of the calibration set with populations 2 to 5. These PLSR models were tested on separate fruit populations (i.e., model based on population 1 used in prediction of population 2, and model based on populations 1 and 2 used in prediction of population 3, etc., with the model based on populations 1 to 5 used in prediction of the populations of fruit from the five growing conditions).

Spectral Features.
Fruit absorbance spectra were characterised by a peak at 675 nm ascribed to chlorophyll, a peak ascribed to lycopene at 575 nm, and peaks ascribed to O-H vibrations of water at 840 and 960 nm (Figures 1(a) and 1(b)). As fruit matured, absorbance spectra were characterised by a decrease in the positive peak at 675 nm related to chlorophyll and an increase in a peak related to lycopene at 575 nm. Similar trends in peak changes related to pigment and water features were also reported by Clement et al. [16].

Calibration Model.
Dry matter was better modelled using second derivative than absorbance spectra (data not shown), with a resulting cross validation coefficient of determination ( 2 cv ) ranging from 0.90 to 0.93 and a RMSECV < 0.5% and a ratio of RMSECV to population SD > 3.0 (Table 1, Figure 2). The intact tomato DM models were superior to that reported by Walsh et al. [19] due to the poor standard deviation (SD) of the population employed in that study (0.30, compared to more than 1.5% in the current study). The results were also superior to that reported by Khuriyati and Matsuoka [15] although that study employed a population with SD = 1.84% for DM and used short wave NIR in an interactance geometry. As expected, the result was inferior to that for tomato puree DM model from mid-infrared reflectance spectra as reported byŚcibisz et al. [14], given the influence of fruit skin on intact fruit spectra. The PLSR model -coefficient weightings were consistent across models (based on different treatments/populations), with weighting around 830, 880, and 910 nm (Figure 4), similar to that reported by Walsh et al. [19], and consistent with involvement of the  second overtone of O-H, fourth overtone C-H, and the third overtone feature of C-H, respectively [24,25]. The presence of relatively "smooth" -coefficients, consistent across models, bodes well for the robustness of the model in prediction of independent sets.
The colour (CIE * ) model based on second derivative spectra was poor for a data set involving population 1 only, due to the low attribute range in immature, green fruit (Table 1). Subsequent populations modelled well for CIE * , with 2 > 0.9, RMSECV∼3.5, and SDR > 4 (Figure 3), although for one population (Pop 2) an excessive number of factors was adopted. The colour PLSR model possessed a negative weighting at 500-600 nm and positive weighting between 600 and 720 nm ( Figure 5). As for DM models, the CIE * model -coefficients were consistent across models based on different populations.

Robustness of Calibration Model. Previous research
reports have separated a subset of a given population for use in validation of a prediction model developed on the remainder of the population. In commercial practice, an existing model must be used in prediction of an incoming population, not included in the calibration set. Samples from the new population may be used to update the model, in preparation for prediction of new sets. To emulate this practice, a model was developed and used in prediction of an independent set, and then this combined data set was used to create  (Table 2); however, a DM model based on combined populations ( Figure 2) was more effective. For example, population 5 was predicted well by a model based on populations 1 to 4, with 2 = 0.92, RMSEP = 0.5, and bias = 0.03. A model used to predict DM of fruit from a given agronomic treatment achieved a lower 2 than for the time series predictions, due to the lower SD of these validation sets. In general, prediction accuracy (RMSEP) of 0.5% was achieved, with a bias less than 0.2% (Table 2). Tomato colour is determined by the content of pigments, that is, chlorophyll and lycopene, in the fruit mesocarp. Given the thin skin of a tomato, the colour measurements made using the diffuse reflectance optics of the Minolta colorimeter will gather information from a depth of several millimetres into the fruit, while the interactance optics of the SWNIR spectrometer will collect information to a greater depth (ca. 10-20 mm). However, given the good performance of the SWNIR instrument based models created using reference data from the Minolta unit, the tissue volumes sampled by the two instruments must have similar pigmentation. CIE * models based on a single or few populations were poor in prediction performance as these populations include a narrow colour range in the calibration set ( Table 2). With addition of fruit of all colour stages in the calibration set, the prediction results were acceptable. For example, prediction statistics for a model based on populations 1 to 4 used in prediction of population 5 and of fruit subject to various agronomic treatments included 2 values from 0.82 to 0.95 and a bias of −1.18 to 0.67.
As an example of practical implementation, PLSR models on DM and CIE * (based on fruit grown under a range of growing conditions) were used in prediction of these attributes given spectra collected nondestructively each week from fruit of the various agronomic treatments ( Figure 6). This emulates how the handheld spectrometry system could be used on farm to monitor fruit maturation.

Conclusion
A handheld interactance geometry visible-SWNIR spectrophotometer suitable for field practice was used to judge maturity of tomato fruit based on pigmentation as indexed by CIE * and eating quality, as indexed by DM, both on and off vine. PLSR models based on visible wavelengths were robust in prediction of new data sets, with low bias, 2 > 0.75 and 0.82, and RMSEP < 0.84 and 6.88 on the attributes of DM (%) and CIE * , respectively. The range of DM in fruit from plants under a given growing condition was relatively limited, so use of a range of growing conditions to achieve a wide range of fruit DM is recommended for DM model development. Fruit of different maturities from a given growing condition is adequate for providing a range of colour. "Updating" a calibration model by addition of current samples to the training population to extend the range of the attribute of interest and the range of chemical and physical matrices to be found associated with tomato fruit is recommended to achieve robust practical use. This technology should have value in tomato breeding programs as a selection tool, to hydroponic producers wishing to increase eating quality by manipulation of solution conductivity, and as a quantitative tool in estimating time to harvest. Time courses of attribute development ( Figure 6) can be used to inform crop management.

Additional Points
Highlights. (i) Application of a handheld visible-NIR spectrometer using an interactance geometry to judge maturity Week after planting and quality of tomato fruit based on colour (CIE * ) and DM, both on and off vine, is demonstrated. (ii) Dry matter predictions achieved 2 ranging from 0.86 to 0.92 and bias from −0.14 to 0.03%. For CIE * , prediction 2 ranged from 0.85 to 0.96 and bias from −1.18 to −0.08.