A Statistical Estimation Approach for Quantitative Concentrations of Compounds Lacking Authentic Standards/Surrogates Based on Linear Correlations between Directly Measured Detector Responses and Carbon Number of Different Functional Groups

A statistical approach was investigated to estimate the concentration of compounds lacking authentic standards/surrogates (CLASS). As a means to assess the reliability of this approach, the response factor (RF) of CLASS is derived by predictive equations based on a linear regression (LR) analysis between the actual RF (by external calibration) of 18 reference volatile organic compounds (VOCs) consisting of six original functional groups and their physicochemical parameters ((1) carbon number (CN), (2) molecular weight (MW), and (3) boiling point (BP)). If the experimental bias is estimated in terms of percent difference (PD) between the actual and projected RF, the least bias for 18 VOCs is found from CN (17.9 ± 19.0%). In contrast, the PD values against MW and BP are 40.6% and 81.5%, respectively. Predictive equations were hence derived via an LR analysis between the actual RF and CN for 29 groups: (1) one group consisting of all 18 reference VOCs, (2) three out of six original functional groups, and (3) 25 groups formed randomly from the six functional groups. The applicability of this method was tested by fitting these 29 equations into each of the six original functional groups. According to this approach, the mean PD for 18 compounds dropped as low as 5.60 ± 5.63%. This approach can thus be used as a practical tool to assess the quantitative data for CLASS.


Introduction
Gas chromatographs (GCs) equipped with either a flame ionization detector (FID) or a mass spectrometer (MS) have been most commonly used for the analysis of diverse organic compounds in environmental media [1,2]. If the MS uses electron ionization (EI) at 70 eV, the RF (per unit weight) should be generally constant within each compound class (e.g., linear increase of the response with MW) [3]. The quantitative analysis of the reference compounds can be made by external calibration using standards containing each reference compound. Because the number of target compounds contained in a standard is limited, it is difficult to derive quantitative data for some important compounds, owing either to the lack of proper standard materials (i.e., standards for many chemicals are not available commercially) or to the complexity involved in standard preparation. The number of possible compounds and their isomers increases rapidly with increasing MW, and consequently, synthesis of all proper standards is unrealistic except possibly for the lowest MW analytes. Thus, the development of predictive algorithms for compounds lacking authentic standards/surrogates (CLASS) is one realistic option to assign semiquantitative values for such chemical species.
Ahn et al. [4] investigated a method for predicting the concentration of various CLASS using liquid phase standards of 54 VOCs based on two contrasting approaches: (1) direct injection (DI) and (2) headspace solid-phase microextraction (HS-SPME). Because the DI approach does not involve any necessary pretreatment procedures for environmental 2 The Scientific World Journal samples, its practicality can be significantly restricted. On the other hand, as the HS-SPME approach is subject to relatively low recovery, it is difficult to quantify trace level samples with reasonable reliability [5]. In addition, as 54 reference VOCs of Ahn et al. [4] basically represent the water quality index, the applicability of such an approach requires testing against chemicals with different characteristics.
To extend our efforts to estimate CLASS quantitatively, we explored the reliability of our earlier model introduced by Ahn et al. [4] by fitting it against several representative odorous VOC classes with modified grouping approaches. Moreover, to widen the applicability of this statistical approach to the trace-level analysis of environmental samples, we conducted a series of validation tests to assure the experimental reliability of our approach, based on calibration experiments with a combination of the sorbent tube (ST) and thermal desorption (TD) methods [6,7]. Through a comparative analysis between the measured and predicted RF values of the selected reference compounds, the reliability of this modified estimation approach has been examined in a number of respects. Overall, we present a rough but systematic solution for quantification of CLASS in environmental media based on the predictive equations developed in this preliminary study.

Experimental Approaches.
To generate predictive equations for the projected RF value of CLASS, a linear plot was drawn by considering the relationship between the directly measured (actual) RF values of reference odorous VOCs (mostly with low odor thresholds) and their physicochemical parameters (e.g., carbon number (CN)). Then, the projected RF values for each compound were evaluated for reliability by direct comparison against the actual RF values. Table 1 shows the conceptual schematic of the experimental approaches used in this study in reference to the previous study of Ahn et al. [4]. As the experimental approach used in this study has been considerably modified from that of Ahn et al. [4], we reevaluated the feasibility of that study as the major reference during the course of this study. To facilitate this reevaluation, the two data sets obtained by Ahn et al. [4] ((1) direct injection of liquid standard (DILS) and (2) solid-phase microextraction (SPME)) are referred to as "Exp-DI" and "Exp-SPME," respectively. In contrast, as the analytical method used in the present study was based on the thermal desorption (TD) method, it is named "Exp-TD" for comparison with the two previous approaches.

Selection and Preparation of Working Standards.
A total of 19 VOCs were initially selected as the reference analytes for this study: (1) five aldehydes: acetaldehyde (AA), propionaldehyde (PA), butyraldehyde (BA), isovaleraldehyde (IA), and n-valeraldehyde (VA); (2) six aromatics: benzene (B), toluene (T), styrene (S), p-xylene (p-X), m-xylene (m-X), and o-xylene (o-X); (3) four carboxylic acids: propionic acid (PPA), butyric acid (BTA), isovaleric acid (IVA), and n-valeric acid (VLA); (4) two ketones: methyl ethyl ketone (MEK) and methyl isobutyl ketone (MIBK); (5) one alcohol: isobutyl alcohol (i-BuAl); and (6) one ester: n-butyl acetate (BuAc) (Table 2). For the reader's reference, all these selected VOCs have low odor thresholds except for benzene [8,9]. Primary-grade chemicals containing these VOCs were purchased at a purity of ≥97%. The liquid-phase working standards (L-WS) were prepared by gravimetric dilution of the primary-grade chemicals with methanol. Table 1S shows the detailed procedures for making the L-WS (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/241585). The basic information on the 54 reference VOCs selected by Ahn et al. [4] for Exp-DI and -SPME is also provided in Table 2S. The procedures used for the preparation of those working standards have been described elsewhere [4]. The 19 VOCs selected in this study cover several functional groups. However, as the number of these target compounds is limited, we plan to add more reference compounds to expand the applicability of our method as well as its validation.

Instrumental System.
The analysis of VOC samples in this study was carried out using a GC (Shimadzu GC-2010, Japan) equipped with an MS (Shimadzu GCMS-QP2010, Japan) and a UNITY thermal desorber (Markes International, Ltd, UK). The sorbent tube, filled with 100 mg each of Tenax TA, Carbopack B, and Carboxen 1000, was used as the collection medium to preconcentrate the L-WS. The cryofocusing trap in the TD unit was packed with Tenax TA and Carbopack B at a 1 : 1 volume ratio (inner diameter = 2 mm and total sorbent bed length = 50 mm). The reference VOCs were separated on a CP-Wax column (diameter = 0.25 mm, length = 60 m, and film thickness = 0.25 μm) with a 50 min running cycle. Detailed information on the instrumental system is given in Table 3.

Data Acquisition and Quality Assurance.
In this study, the calibration data for all reference VOCs were obtained from their L-WS with the aid of the sorbent tube approach. For quantification of CLASS, the total ion chromatogram (TIC) mode must be used, because the RF patterns of CLASS against CN can be obtained only in the full scan (TIC) mode of the MS system. If the selected ion monitoring (SIM) mode is used, the RF patterns of CLASS cannot be found. Although it is not easy to separate diverse compounds using a GC system, many (though not all) of them can be separated by controlling the TD-GC conditions. Details of the ST approach have been described elsewhere [10]. The L-WS was injected directly into the sampling inlet of the ST using a microsyringe, while a flow of inert backup gas was constantly maintained (flow rate = 50 mL min⁻¹ for 10 min). Once the ST was loaded with the L-WS containing 19 reference VOCs, it was analyzed by the TD-GC-MS system. Two calibration experiments were conducted using the L-WS prepared at five concentration levels (in case of benzene: 1.22, 6.12, 12.2, 24.5, and 61.2 ng μL⁻¹). The reproducibility of experimental data was assessed in terms of relative standard errors (RSE; %) using triplicate analyses of the fourth calibration point (24.5 ng μL⁻¹ in case of benzene). Detectability of reference compounds was calculated as the method detection limits (MDLs) by following the relevant US EPA guidelines. Seven repeated analyses were made using the standard of 61.2 pg μL⁻¹ (in case of benzene), which was obtained by diluting the standard of the first calibration point (Table 1S). The resulting SD values were then multiplied by 3.14 (Student's t-value at the 99% confidence level for seven replicates) to yield the MDL in mass quantity (pg), which ranged from 10.1 pg (o-X) to 3,878 pg (AA) (mean 235 ± 883 pg).
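The MDL arithmetic described above (the SD of seven replicate analyses multiplied by 3.14) can be sketched as follows. This is a minimal illustration rather than the authors' data-processing code: the function name and the replicate values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption.

```python
import statistics

# Multiplier quoted in the text for seven replicate analyses
# (Student's t with six degrees of freedom).
T_VALUE_7_REPLICATES = 3.14


def method_detection_limit(replicate_masses_pg, t_value=T_VALUE_7_REPLICATES):
    """Return the MDL (pg) as t times the sample SD of the replicate results."""
    if len(replicate_masses_pg) < 2:
        raise ValueError("need at least two replicates to compute an SD")
    sd = statistics.stdev(replicate_masses_pg)  # sample SD, n - 1 denominator
    return t_value * sd


# Hypothetical replicate results (pg) for one compound; not measured data.
replicates = [58.0, 63.5, 60.2, 61.9, 59.4, 64.1, 62.3]
print(f"MDL = {method_detection_limit(replicates):.1f} pg")
```

If a different number of replicates were analyzed, the multiplier would have to be replaced by the t-value for the corresponding degrees of freedom.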

Development of Predictive Equations for CLASS by Linear Regression (LR) Analysis.
In this study, calibration of liquid-phase standards containing 19 reference compounds was initially conducted. The basic calibration data (RF, R², and RSE (%)) for each individual compound were obtained from replicate calibration experiments of the L-WS containing those VOCs. As shown in Table 4, the coefficients of variation (CV, %) for RF were fairly constant and low for all VOCs, with a mean CV (n = 19) of 1.30 ± 1.12%. As such, the calibration results of all VOCs were fairly stable and reproducible. As a simple means to develop predictive equations for CLASS, a linear regression (LR) analysis was made between the actual RF and three key parameters: (1) CN, (2) MW, and (3) BP. To obtain the best predictive equations, we first attempted to establish the key variable for the equations. To this end, the compatibility of each of the three key variables was initially checked between the directly measured (actual) and projected RF values (Table 5). In addition, another type of test is described in the next section, in which such compatibility is evaluated by grouping all target compounds into various segments to yield the best match between prediction and real measurements (Table 6). In our laboratory, we are currently developing methods to test the reliability of the CLASS method by measuring standards containing chemicals that do not match the target compounds used to derive the predictive equations [11]. In the meantime, the validity of our methodology in this work can be assessed through a number of indirect evaluation procedures described in Tables 5 and 6. Because of its unusually low recovery, however, AA was excluded, and the predictive equations were developed using 18 instead of all 19 reference compounds.
The unfeasibility of the ST/TD approach for AA had already been demonstrated in our recent study [10]: the RF value of AA is considerably lower (9 to 55 times) than those of the other aldehydes (AA = 1,854, PA = 16,117, BA = 78,028, IA = 102,217, and VA = 88,556). Hence, all of our predictive equations are tested based on 18 reference compounds (without AA). Table 5 shows the predictive equations derived by an LR analysis of the 18 reference VOCs between the actual RF and the three key parameters, along with the percent difference (PD) values between the actual and projected RF. In Table 5, the projected RF values for each of the 18 reference compounds are summarized for the whole group and for three functional groups (aldehyde, aromatic, and carboxylic). If the PD values were calculated using the equations for all the reference VOCs (n = 18) as a single group, the least magnitude of PD (or least bias) was found from the CN (17.9 ± 19.0%). In contrast, the results for MW and BP were 40.6 ± 27.4% and 81.5 ± 112%, respectively. In addition, a comparison of the coefficient of determination (R²) showed that the CN had the highest R² value of 0.9396, while those of the MW and BP were 0.5445 and 0.1404, respectively. It is difficult to derive the predictive equation for quantification of CLASS using only the physicochemical parameters because of such biases in their predictive equations. To find the least biased predictive equations possible, the PD and R² values derived from the 18 VOCs were examined as a whole and for the three functional groups (aldehyde (n = 4), aromatic (n = 6), and carboxylic (n = 4)). The results shown in Table 5 indicate that the PD and R² values of the three functional groups are better than those derived from the 18 VOCs as a whole. For instance, the results of the three groups against CN showed PD values of 9.83 ± 6.55%, 2.03 ± 0.76%, and 12.7 ± 4.30%, respectively (with all their R² values above 0.9).
In addition, the PD values examined in terms of MW and BP also improved significantly. In the case of the former, the PD values varied between 1.73% and 12.7%, while those of the latter were between 2.08% and 17.7% (Table 5). Their R² values ranged from 0.9013 to 0.9738 and from 0.7813 to 0.9687, respectively. This means that the RF values of VOCs are significantly associated with physicochemical parameters such as MW and BP if the VOCs belong to the same functional group. The projected equations for each functional group are also plotted in Figure 1.
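The regression step described in this section (fit RF against CN, project the RF from the fitted line, then compare projected and actual RF) can be sketched as below. This is only an illustration under stated assumptions: the PD is assumed to be the absolute difference normalized by the actual RF, the function names are hypothetical, and the RF values are made-up numbers, not the data of Tables 4 and 5.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def percent_difference(actual_rf, projected_rf):
    """Assumed PD definition: |actual - projected| / actual * 100."""
    return abs(actual_rf - projected_rf) / actual_rf * 100.0


# Hypothetical (carbon number, actual RF) data for one functional group.
carbon_numbers = [6, 7, 8, 8, 8, 8]
actual_rfs = [88000.0, 112000.0, 139000.0, 141000.0, 137500.0, 140500.0]

slope, intercept, r_squared = fit_line(carbon_numbers, actual_rfs)
pds = [
    percent_difference(rf, slope * cn + intercept)
    for cn, rf in zip(carbon_numbers, actual_rfs)
]
print(f"RF = {slope:.0f} * CN + {intercept:.0f} (R^2 = {r_squared:.4f})")
print(f"mean PD = {sum(pds) / len(pds):.2f}%")
```

The same two functions apply unchanged when MW or BP is used as the x variable instead of CN.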

LR Analysis against CN with Arbitrary Chemical Grouping.
As mentioned in Section 3.1, the projected RF values of the reference VOCs were first derived by an LR analysis between the actual RF and the three key parameters. As a result, we found that the CN (among all three physical factors) yielded the optimal (minimum) PD values (between actual and projected RF values). However, the results for MW and BP also improved if the analysis was made after dividing the compounds into a number of small groups. Hence, to develop the predictive equations with the least bias (or the smallest PD values), our 18 reference compounds were classified further to yield a total of 29 VOC groups, as shown in Table 6. These 29 groups consist of (1) one group consisting of all 18 compounds, (2) three (out of the original six) functional groups consisting of at least four compounds, and (3) 25 arbitrary groups formed randomly by the combination of the original six functional groups.
The sorting scheme for the aforementioned third case (n = 25) can be explained as follows. If the three functional groups (aldehyde, aromatic, and carboxylic) that each consist of at least four compounds are referred to as "Major," then the rest of the original functional groups (ketone, alcohol, and ester), with no more than two compounds each, are referred to as "Minor." Hence, these major and minor groups were combined to produce 25 different arbitrary groups, for example, (i) Major (single pair) (I or II or III) + Minor (single pair) (IV or V or VI) = 9; the full set of combinations is listed in Table 6. Note that the CN was used as the main variable to obtain the predictive equations. Thus, we computed the projected equations for each of these 29 VOC groups as a function of the CN (Table 6).
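The count of nine groups in item (i) follows from pairing one of groups I to III with one of groups IV to VI. A short sketch of that enumeration is given below; the dictionary names are hypothetical, and the assignment of code III to the carboxylic group is inferred from the order in which the groups are listed. Only item (i) is enumerated here, since the remaining combinations are not spelled out in the text.

```python
from itertools import product

# Group codes as used in Table 6 (code III for carboxylic is an inference).
MAJOR = {"I": "aldehyde", "II": "aromatic", "III": "carboxylic"}
MINOR = {"IV": "ketone", "V": "alcohol", "VI": "ester"}

# Every pairing of one group from I-III with one group from IV-VI.
pairs = list(product(MAJOR, MINOR))

for first, second in pairs:
    print(f"{first} + {second}: {MAJOR[first]} + {MINOR[second]}")
print(f"{len(pairs)} combined groups")
```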
If each of the 29 predictive equations is tested against the 18 reference compounds as a whole (as a single group), the 29 resulting PD values average 14.9 ± 9.90% (Table 6). The mean R² of all projected equations yielded a fairly high value of 0.8955 ± 1.490 (n = 29). Likewise, the 29 predictive equations can also be fitted against each of the six main VOC functional groups (I through VI in Table 6). As we eventually intend to apply these predictive equations to any CLASS according to the identity of its functional group, we attempted to find the best-fitting equation by testing all 29 equations against each of the six original functional groups sorted from the 18 reference compounds. According to this approach, the PD values fell into two contrasting patterns. First, the projected RF values of aromatic (II), ketone (IV), and ester (VI) were close to their actual RF values derived from the L-WS, yielding mean PD values below 10% (aromatic (n = 17) = 3.26 ± 0.74%, ketone (n = 8) = 8.95 ± 5.36%, and ester (n = 8) = 4.61 ± 3.67%). In contrast, aldehyde (I) and alcohol (V) had relatively high PD values of 33.0 ± 12.1% (n = 17) and 34.1 ± 14.0% (n = 8), respectively. However, if the minimum PD value derived by the 29 equations is taken for each of the six groups, the PD values of the six functional groups are greatly reduced (all below 10%, in the range of 0.27% (ester) to 9.87% (carboxylic)) (Table 6). Table 7 shows the results of a stepwise approach to find the optimal PD of the 18 reference compounds: (1) by matching the best equation to a given functional group and (2) by computing the PD value for each member of each functional group with the equation selected in the first step. As we intend to find and apply the predictive equations for CLASS, we selected the best equation for each of the six functional groups that comprise the 18 reference compounds.
As shown in Table 7, if the best-fit equations for each of the six classes are applied, the PD values for all 18 compounds are dramatically reduced (5.60 ± 5.63%). The effect of this grouping scheme on the estimation bias can be tested as follows. If the 18 reference compounds are classified in a simpler form, for example, into three groups instead of six, the PD values rise again from the 5% level to nearly 10% (Table 3S). Thus, by following this type of procedure to match the best equation to each member of the original functional groups, one is able to find the optimal projected RF values.
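The stepwise selection summarized in Table 7 (test every candidate equation against one functional group, then keep the equation with the lowest mean PD) can be sketched as follows. This is a hedged illustration, not the authors' procedure in code form: the PD is assumed to be normalized by the actual RF, and the equation codes, slopes, intercepts, and RF values are all hypothetical.

```python
def percent_difference(actual_rf, projected_rf):
    """Assumed PD definition: |actual - projected| / actual * 100."""
    return abs(actual_rf - projected_rf) / actual_rf * 100.0


def mean_pd(equation, compounds):
    """Mean PD of one (slope, intercept) equation over (CN, actual RF) pairs."""
    slope, intercept = equation
    pds = [percent_difference(rf, slope * cn + intercept) for cn, rf in compounds]
    return sum(pds) / len(pds)


def best_equation(equations, compounds):
    """Return (code, mean PD) for the candidate with the lowest mean PD."""
    scored = ((code, mean_pd(eq, compounds)) for code, eq in equations.items())
    return min(scored, key=lambda item: item[1])


# Hypothetical candidate equations keyed by group code: (slope, intercept).
candidate_equations = {
    "all-18": (20000.0, -30000.0),
    "II": (25000.0, -60000.0),
    "II+IV": (23000.0, -50000.0),
}
# Hypothetical (CN, actual RF) pairs for one functional group.
aromatic = [(6, 91000.0), (7, 113500.0), (8, 141000.0)]

code, pd_value = best_equation(candidate_equations, aromatic)
print(f"best equation: {code} (mean PD = {pd_value:.2f}%)")
```

Repeating the call once per functional group gives one selected equation per group, which is the first step of the two-step scheme; the second step is the per-compound PD computed with that selected equation.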

Comparison of the Projected RF Values with the Previous Study.
In this section, we investigated a simple means to test the validity of four predictive equations using the data sets from other studies (e.g., [4,11]). Through the reevaluation of those previous studies, we explored the validity of our new approach, namely, whether the RF patterns of VOCs against their CNs (for a given VOC group) were formed or not. In the previous study of Ahn et al. [4], some predictive methods were investigated to estimate the concentrations of various CLASS using liquid-phase standards containing 54 VOCs based on the two contrasting experimental approaches of Exp-DI and Exp-SPME (Table 4S). However, unlike the case of this study, the calibration data of Ahn et al. [4] were evaluated in a simplified way without considering chemical grouping of CLASS. The results of Ahn et al. [4] were hence reinterpreted to allow comparison with this study on a parallel basis as follows.
The raw calibration data of the Exp-DI and -SPME of Ahn et al. [4] presented in Table 3 yielded mean R² values of 0.9666 ± 0.0275 (n = 51) and 0.9856 ± 0.0410 (n = 50), respectively. The two methods investigated previously also had good analytical reproducibility (RSE below 3% for all). Out of the 54 compounds analyzed in the previous study, we focused on 49 compounds for a parallel comparison, excluding five (1,2-dichloroethane, 1,1,1,2-tetrachloroethane, methylene chloride, p-xylene, and hexachlorobutadiene) because of problems with detectability or irregular calibration behavior.
To make a meaningful evaluation of the Ahn et al. data, LR analysis was also conducted against three parameters ((1) CN, (2) MW, and (3) BP), and the results are given in Table 5S. If the projected equations were derived after dividing all chemicals into multiple VOC functional groups, the PD values against the three physical parameters improved further into a narrow range. For instance, in the case of Exp-DI, the mean PD results of haloalkane (n = 17) fell in the range of 30.2% (BP) to 35.9% (MW), while those of aromatic (n = 24) fell in the range of 28.3% (CN) to 31.2% (MW). In the case of Exp-SPME, the mean PD results of haloalkane (n = 17) fell in the range of 48.7% (BP) to 50.5% (MW), while those of aromatic (n = 24) ranged from 11.8% (CN) to 20.3% (MW).
To derive the best predictive equations using the data sets of Ahn et al. [4], we checked the compatibility between the two RF values across a maximum of 13 groups formed from the 49 reference VOCs: (1) one group consisting of all 49 compounds, (2) two functional groups (multiple components with different CNs), and (3) 10 arbitrary groups randomly formed by the combination of the four original VOC functional groups (Table 6S). Then, the PD values were computed for all components (as a single group) and for each of the four VOC groups by fitting the 13 predictive equations. However, in comparison with the present study, the Exp-DI and -SPME generally had weak linearity, with mean R² values of 0.4576 and 0.4565, respectively; these values were far lower than the Exp-TD counterpart (0.8955) in this work. The minimum PD values of all reference VOCs of Ahn et al. [4], if assessed by the best-fit equations, averaged 27.5 ± 34.2% (Exp-DI) and 30.3 ± 45.3% (Exp-SPME) (Table 7S). Although the number of reinvestigated model compounds (n = 49) is much larger than that of this study (n = 18), the compatibility of RF values in the previous study is much poorer than in this study (Exp-TD = 5.60 ± 5.63%).
In line with our efforts to develop estimation methods, Allgood et al. [12] also attempted to predict the concentration of CLASS in complex mixtures. These authors examined the ratio of relative sensitivity (comparable to RF in this work) among 43 reference compounds (VOCs) against their MWs. The RF values of all reference compounds measured by a GC-MS system yielded two types of calibration results: (1) molar sensitivity and (2) mass sensitivity. More specifically, these authors used the mole and mass values of each reference compound as the variable of the x-axis to derive sensitivity (or RF) (y-axis = peak area). The relative molar and mass sensitivities of all reference compounds were then calculated against n-octane as the key reference (relative sensitivity = reference compound sensitivity/n-octane sensitivity).
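The two sensitivity types and their normalization to n-octane can be sketched as follows. This is a simplified single-point illustration, not the calculation of Allgood et al. [12], who derived sensitivities from calibration plots; the function names, peak areas, and injected masses are hypothetical.

```python
def molar_and_mass_sensitivity(peak_area, mass_ng, molecular_weight):
    """Return (molar, mass) sensitivity as peak area per nmol and per ng."""
    moles_nmol = mass_ng / molecular_weight  # ng / (g/mol) = nmol
    return peak_area / moles_nmol, peak_area / mass_ng


def relative_to_reference(compound, reference):
    """Divide each sensitivity of a compound by that of the reference."""
    return tuple(c / r for c, r in zip(compound, reference))


# Hypothetical single-point data: (peak area, injected mass in ng, MW).
n_octane = molar_and_mass_sensitivity(1.50e6, 10.0, 114.23)
toluene = molar_and_mass_sensitivity(1.80e6, 10.0, 92.14)

rel_molar, rel_mass = relative_to_reference(toluene, n_octane)
print(f"relative molar sensitivity = {rel_molar:.3f}")
print(f"relative mass sensitivity = {rel_mass:.3f}")
```

By construction, the reference compound itself has a relative sensitivity of 1 on both scales.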
Allgood et al. [12] found that the relative molar sensitivities yielded a stronger linear correlation with the MW (R² = 0.7878) than did the relative mass sensitivities (R² = 0.0968). Although these authors assessed sensitivity only in relation to molar and mass parameters, their results can also be reevaluated by adopting our approach, wherein the CN is used as the key criterion instead of MW. Hence, the data of Allgood et al. [12] were newly examined by the same procedure as in this study. To derive the best results based on our approach, all chemicals (n = 43) tested by Allgood et al. [12] were grouped into five functional groups and plotted against CN. As expected, the R² values then improved significantly (R² and p values = 0.8717 and 2.10E−03 (haloalkane, n = 7), 0.8518 and 5.12E−05 (aromatic, n = 11), 0.9963 and 3.87E−02 (ketone, n = 3), 0.9243 and 5.50E−04 (aromatic ketone, n = 7), and 0.8929 and 2.12E−01 (phthalate, n = 3)). As such, these results are far better than the original results simply examined against MW (R² = 0.4427 and p value = 1.13E−06) (Figure 2). The results of our comparative efforts to reevaluate the two previous studies [4,12] consistently confirm that the modified statistical approach tested in this study can produce improved predictions of CLASS in relation to the selected reference chemical group.

Conclusions
External calibration has been the most commonly used method for the quantitative analysis of diverse VOCs in environmental media. If the number of detected VOCs reaches hundreds to thousands, assessment of all individual components in a quantitative sense is not easy, because not all detected VOCs can be calibrated against standards (because of the unavailability of standard materials, etc.). In this study, 18 VOCs representing six functional groups were selected as the reference to develop predictive equations to assess the concentrations of CLASS belonging to any of those functional groups. To find the optimal predictive equations for each reference compound, we conducted a series of LR analyses between their actual RF values (derived by external calibration of the liquid standard by the sorbent tube method) and three physicochemical properties (of the model compounds): (1) CN, (2) MW, and (3) BP.
As a means to validate the applicability of the predictive equations for CLASS, a total of 18 reference VOCs were arbitrarily classified into 29 VOC groups by the combination of the six original functional groups. Then, the reliability of this approach was evaluated by assigning the six best-fit equations to each of the six groups and examining the PD values between the actual and projected RFs. If the optimal PD values are derived for each of the 18 compounds, they average as low as 5.60 ± 5.63% (range of 0.27% (ester) to 18.6% (PA)). As a result, we were able to demonstrate that the projected RF values of the 18 reference VOCs, if assessed by this statistical approach, can agree well with their actual RF values determined experimentally. In other words, if the predictive equations are used to estimate the concentration of CLASS in real environmental samples, it is possible to derive quantitative concentration data for the CLASS with a fairly low experimental uncertainty.
In conclusion, if one can carry out external calibration for a number of VOCs with diverse chemical functional groups, those calibration data may be used to develop predictive equations to quantitatively determine CLASS with reasonably good confidence. As a new task, we are currently making extensive efforts to validate the experimental performance of this approximation tool against various standard mixtures and real environmental samples. The results of this validation will be reported soon. In the meantime, the results obtained in this preliminary study offer valuable insights into the potential use of this predictive approach for various CLASS.