Ordinal Analysis of Variation of Sensory Responses in Combination with Multinomial Ordered Logistic Regression vs. Chemical Composition: A Case Study of the Quality of a Sausage from Different Producers

Te newly developed statistical technique of two-way ordinal analysis of variation (ORDANOVA) was applied for the frst time to sensory responses in combination with multinomial ordered logistic regression of a response category vs. chemical composition. A corresponding tutorial is provided. As a case study, samples of a sausage from diferent producers, purchased at the same time from a market, were compared based on sensory responses of experienced experts. A decomposition of total variation of the ordinal data and simulation of the multinomial distribution of the relative frequencies of the responses in diferent categories showed a statistically signifcant diference between the producers’ samples, and an insignifcant diference between the experts’ responses related to the same sample. Te capabilities of experts were also evaluated. Te infuence of chemical composition of a sausage sample on the probability of a response category was modeled using multinomial ordered logistic regression of the response on mass fractions of the main sausage components. Tis statistical technique can be helpful for understanding sources of variation of sensory responses on food quality properties. It is also promising for a revision of specifcation limits for chemical composition, as well as for the prediction of sensory properties when the chemical composition of the product is subject to quality control.


Introduction
Despite the development of sophisticated chemical analytical methods and modern equipment, sensory analysis based on human organoleptic examination of food products, such as a sausage, will be always important for the people who are consumers of the products.
Te international guidelines for sensory analysis [1] recommend the use of quantitative scales for human (assessor/expert) responses. For example, sensory responses on properties of a sausage quality can be classifed by the following fve quality categories: (1) very bad; (2) poor; (3) satisfactory; (4) good; and (5) excellent. Analysis of variance (ANOVA) is applied for interpretation of such property values [2][3][4]. Other statistical methods developed mainly for quantitative continuous values, including regression analysis, are also applied [5,6]. An implementation of fuzzy logic for modifcation of sensory responses to obtain comparable scores [7,8] has attracted attention. Statistical methods for sensory analysis, in particular for meat and meat products, are regularly reviewed [9,10].
However, responses on an ordinal property scale are categorical data for which a total ordering relation can be established according to the magnitude, with other quantities of the same kind, but no algebraic operations exist among those quantities [11,12]. Teir legitimate operations can be "equal/unequal" and "greater than/less than" [13]. Te addition of categorical data is not a legitimate operation by defnition, whereas one of the ANOVA assumptions is that the treated quantities are additive [14] and calculation of their mean and variance, is possible. Terefore, statistical techniques based on the ANOVA should not be applied directly to ordinal data.
Sensory responses may also depend on the chemical composition of a food product [15,16]. Relationships between sensory evaluation and the chemical composition or physical properties of meat and meat products are a subject of research [17]. Regression analysis is the known tool for studying and modeling such relationships. However, as in the applications of ANOVA, the additivity assumption should not be violated in the regression analysis use [18].
Te aim of the present paper is to overcome the "additivity" problem by implementation of the newly developed two-way ordinal analysis of variation (ORDANOVA) [19] combined with multinomial ordered logistic regression [20][21][22]. As a case study, samples of a kind of commercially available sausage from diferent producers were compared based on the ordinal data from the examination of quality properties by experienced experts. Te infuence of the chemical composition of a sausage sample on the probability of a response category was modeled.

Statistical Approach-A Tutorial
2.1.1. Two-Way ORDANOVA without Replication. Te approach of ORDANOVA is to calculate for each property the number of expert responses related to the same category, and then to analyze their relative frequency as a fraction of the total number of the responses for all categories.
A vector of expert responses for a given property as a random variable Y � (Y 1 , Y 2 , ., Y K ) on an ordinal scale with K categories (classes or levels k = 1, 2,. . ., K) is characterized by a probability vector p � (p 1 , p 2 , . . . , p K ), where p k is the theoretical probability of responses related to the kth category and K k�1 p k � 1. Let F k denotes the cumulative theoretical probability up to the kth category, F k � k l�1 p l and F K � 1. Te probability P of receiving a set of responses n � (n 1 , n 2 , . . . , n K ), where n k is the number of responses related to the kth category ( K k�1 n k � N), can be calculated based on the multinomial distribution with parameters (N, p) as the probability mass function [23]: Note that the probability of the event Y k � n k is P(Y k � n k ) � p k by defnition. Variability in Y is explained in the present study by two independent factors as follows: sausage producers and experts as random factors. Te examination of the samples of sausage was a properly conducted blind test, in which an expert has no opportunity to favor the sausage of a particular producer. No interaction between the two factors can be analyzed, since only one response at the specifed levels of the factors (at each cell in the cross-over balanced design) was obtained, by analogy with laboratory profciency testing [24]. Te frst factor X1 had I levels (sausage samples of I producers were tested) and the second factor X2 had J levels (J experts examined the samples' quality). Tere were N responses in total for a given property, each of them was of one of the K categories of variable Y. Being a result of assessment of a sausage sample of producer i (one of the I levels of the frst factor X1, i = 1, 2,. . ., I), each response was also associated with the expert j (one of the J levels of the second factor X2, j = 1, 2,. . ., J). Tus, it was assumed that each one of the I × J = N cells (i, j) contains only one response belonging to one of the K categories k. In other words, a response at each cell (i, j) was of one category k chosen by expert j when examined a sample of producer i, and correspondingly, the number of responses of the chosen categories n ijk = 1 and zero for all other categories in the same cell.
Treating N responses as a statistical sample and Y ijk as a random variable, then, for the chosen k, p ijk � 1 and zero for all other categories in the same cell. Tus, F ijk � k l�1 p ijl denotes the sample cumulative frequency of responses of categories l = 1, 2,. . ., k, i.e., up to the kth category in cell (i, j); F i.k is the sample cumulative relative frequency of responses up to the kth category at level i of factor X1; F .jk denotes the cumulative relative frequency at level j of factor X2; F ..k denotes the cumulative relative frequency of all responses up to the kth category: Te total sample variation V T of the response variable Y, normalized to the [0, 1] interval, is defned in the two-way ORDANOVA model [19] as follows: In the model without replication, V T is partitioned between the (inter) covariation component C B and the within (intra) residual variation V W , i.e., V T � C B + V W . For the present study, the variation 2 Journal of Food Quality (4) characterizes the between-producer variation of the responses, while the variation is the within-producer variation. Te individual efects of factors X1 and X2 (producers and experts, respectively) can be evaluated using the Another C B decomposition, helpful for comparing the capability of the participating experts (as a group) to identify diferent categories, consists of evaluating the following parts of C B related to the responses of categories less than or equal to category k: For the frst category k � 1 of a sausage property by standard [25], the corresponding C B part is C B (Y 1 ) as a response of category less than 1 does not exist. For the second category, C B (Y 1 + Y 2 ) is the variation of the expert responses of categories k � 1 and 2 (up to 2) because of the cumulative properties of the relative frequencies in (7). Terefore, the variation of responses of the participating experts for category , and so on. A greater value of C B ( k l�1 Y l ) indicates a weaker capability to identify category k. Note that the capability characterizing dispersion of the responses is analogous to measurement reproducibility [12]. When the cumulative relative frequencies in (7) achieve 1, the variation C B ( k l�1 Y l ) � 0. Tis is always the case for the last category K, but may also be for (K -1) or even for (K -2) [26], depending on the form of the cumulative distribution function.
Te null hypothesis of homogeneity of the responses states that the probability of classifying the responses as belonging to the k-th category does not depend on the levels of the frst factor (levels i) nor on those of the second factor (levels j), i.e., p ijk � p k for all i � 1, 2, ..., I and j � 1, 2, ..., J. Under this hypothesis, the following relations are applicable: where E(. . .) denotes the mathematical expectation of the variation in parentheses; Te numerator of the last term in equation (8) is equal to the population of the total ordinal variation corresponding to the probability vector p � (p 1 , p 2 , . . . , p K ).
To check the statistical signifcance of both the factor efects, the following signifcance indices (test statistics) have been defned: Testing the null hypothesis H 0 on the efect signifcance requires knowledge of at least the asymptotical distribution of the index SI for calculation of the critical values of the indices at a given level of confdence (1 − α) · 100 %.
A calculator tool for this purpose was proposed for the two-way ORDANOVA in reference [19]. Te tool calculates from the empirical data the sample vector of relative frequencies p � (p ..1 , p ..2 , . . . , p ..K ), as well as the variation components (C B X1 , C B X2 , V W , V T ) and the empirical signifcance indices (SI X1 , SI X2 ). Te critical values SI crit for the indices in equation (9) are recovered through a Monte Carlo simulation based on at least 10000 trials. Te null hypothesis H 0 is rejected when the signifcance index SI exceeds the critical value SI crit at the (1 − α) · 100 % level of confdence, concluding that a statistically signifcant efect on the response variable Y is detected. Te calculator is freely available by following the link [27].

2.1.2.
Multinomial Ordered Logistic Regression. Multinomial ordered logistic regression (ordered logit) is quite commonly applied in medicine [28], insurance, actuarial, and fnancial applications [29,30], as well as to describe consumer purchasing behavior [31]. Te ordered logit model can be considered as an extension of the logistic regression model for dichotomous dependent variables, when more than two ordinal response categories are used. It is applied here for analysis of ordinal quality sensory responses vs. chemical composition of a sausage. Tis model is based on the following concepts.
When Y is an ordinal outcome with K categories, the cumulative probability of responses of categories l less than or equal to a category k is P(l ≤ k). Te odds of responses being less than or equal to a category k can be defned as P(l ≤ k)/P(l > k), for k = 1,. . ., K − 1. When k = K, the denominator is zero and the odds are not defned. Te log odds, called logit, is defned as logit(P(l ≤ k)) � log (P (l ≤ k)/P(l > k)). Te ordinal logistic regression model is parameterized as follows: where β k0 is the intercept (cutof point) for category k; c 1 to c m are the contents (mass fractions) of the main sausage components (continuous latent variables); β 1 to β m are the corresponding regression coefcients (slopes), constant across categories. Note that this model is based on the parallel regression (proportional odds) assumption: the logit dependences on sausage compositions are parallel hyperplanes for diferent categories k, and hence, the intercepts are diferent for each category but the slopes are constant across categories. Te odds of being less than or equal to category k are as follows: A computer program developed in the R environment for calculation of model parameters, including their confdence intervals and goodness-of-ft measures for the model, is described, for example, at the web page [32]. Te following logit format is implemented in the R program as follows: where η = − β for all the regression coefcients of components' contents c 1 to c m . Inverting equation (12), the probability of getting a response of a certain category k or below can be obtained [33] as follows: Te "PoLR" function of the "MASS" package [34] was used to ft multinomial ordered logistic models to the experimental data, and function "predictorEfects" of the "effects" package [35] was used to calculate and plot probabilities of obtaining a response equal to a certain category k.
Goodness-of-ft of these models can be evaluated by calculating one of the several pseudo-R values [32], which estimates the variability in the outcome of the ftted model. For example, McFadden's pseudo-R 2 is defned as follows: where M full is a full model with predictors; M intercept is the model without predictors, i.e., containing the intercept only; and L is the estimated likelihood.
When the M full model does not predict the outcome better than the M intercept model, its ln L(M full ) is not much greater than ln L(M intercept ); hence, the corresponding ratio is close to 1 and the McFadden's pseudo-R 2 is close to 0: the model has poor predictive value. Conversely, when the M full model is good, its ln L(M full ) is close to zero (since the likelihood value for each observation is close to 1), and McFadden's pseudo-R 2 is close to 1, indicating a successful predictive ability. If comparing two models on the same data, McFadden's pseudo-R 2 would be higher for the model with the greater likelihood.
Te "PseudoR2" function of the R package "DescTools" [36] was applied for the corresponding calculations. Note that correlations between contents of the sausage main components may afect the regression coefcients and p values, but they do not infuence the predictions, precision of predictions, or the goodness-of-ft statistics [22].

Experimental Methods.
Te comparative testing of a sausage as a consumer product [37] was organized by V.M. Gorbatov Federal Research Center for Food Systems, Russia. Samples of boiled-smoked sausage "Moscowskaya" [38] from I = 16 producers for comparison were purchased from a market in 2021, practically simultaneously. Tis sausage is prepared from beef and pork fat with addition of sodium nitrite curing salt (0.4-0.6 % mass fraction of NaNO 2 in NaCl) and spices. Its main chemical components are protein, fat, moisture, and salt.
All samples were examined before their expiration dates (set by the producers) by J = 3 assessors/experts [12] of the Research Center, women of 45-55 years old, each having more than 15 years' experience in sensory analysis of meat products. Examination of samples was performed in a test room [39] with individual testing cubicles.
Five quality sensory properties of the samples were assessed as follows: (1) appearance and packaging, named here "appearance"; (2) consistency; (3) color and appearance of cut sausage, named here "color"; (4) taste; and (5) smell. An expert response related to each quality property was ordered by K = 5 categories from "very bad" to "excellent" (k = 1, 2,. . .,5). A total number N = I × J = 48 responses was obtained for each property, and 48 × 5 = 240 responses for the fve properties.
Contents (mass fractions) of the m � 4 main components were taken from the certifcates of the producers. In total, I × m � 64 continuous quantitative values were obtained.
Te data set, including both qualitative and quantitative data (RawData Microsoft Excel fle) is available at Zenodo [40].

Methods of Sensory Examination.
Te experts who participated in the examination of the sausage samples were not informed about quantitative characteristics of the samples' chemical composition. Te quality parameters of a sample, which was one whole packaged sausage, were examined by a standard protocol [25], in the following sequence: (a) appearance-by visual external examination of the packaged sausage; (b) consistency-by pressing with a spatula or fngers on the outside of the sausage, and then, after removing the casing from the sausage and cutting with a sharp knife into thin slices perpendicular to the surface of the product; (c) color-visually; (d) and (e) taste and smell-by pressing and chewing a slice.

Methods of Testing Chemical Composition.
Te standard Kjeldahl method was used for measurement of protein content c 1 [41]. Te standard method used for measurement of fat content c 2 [42] is based on multiple extractions from a dried sample with a solvent (hexane, diethyl ether, or petroleum ether) in a Soxhlet fat-extraction apparatus. Ten, the solvent is removed and the fat dried to constant weight. Te standard measurement method for moisture content c 3 [43] consists of drying a sample with sand to constant weight at a temperature of (103 ± 2)°C. Salt content c 4 was measured by Mohr's standard titration method [44]. Measurement results are expressed as percent mass fractions. More details including estimation of measurement uncertainties in testing a sausage composition, its specifcation limits and conformity assessment, are described in the paper [45].

Results and Discussion
A fowchart of the data treatment is presented in Figure 1. ORDANOVA starts from calculation of frequencies of expert responses (of diferent categories) and evaluation of the total variation. Te next step is decomposition of the total variation into components with the purpose to assess the efects of two factors infuencing the variation: "producer" and "expert." Another kind of decomposition allows assessment of the abilities of the experts to determine diferent categories of the same quality property. Te components of variation obtained are used for testing hypotheses on the homogeneity of the producers (i.e., the responses to their sausage quality properties) and homogeneity of the experts (their responses to a property of the same sausage sample). When responses of diferent experts are inhomogeneous, and/or the responses to diferent sausage producers are homogeneous, the analysis is ended. Otherwise, it is assumed that the diference between responses to the sausage quality of diferent producers is caused by the diferences in the sausage chemical composition, expressed as mass fractions of the main components. Tis hypothesis is tested with multinomial ordered regression analysis. If any of the regression coefcients are statistically signifcant (the hypothesis is not rejected), probabilities of obtaining a response related to a specifc category for diferent component contents are calculated. Te last step is plotting such probabilities for visualization of the results and their discussion.

Implementation of Two-Way ORDANOVA without
Replication. Frequencies of the responses from the data set are shown in Table 1 by categories (rows) and experts for each quality property of the sausage (columns). All purchased sausage samples were of higher quality than category 2 for any property. Two out of three experts rated the appearance of the whole sausage samples greater than category 3. Also, all experts reported that the sausage consistency was greater than category 3.
Total variation V T of the responses by equation (3), partitioned into the between-producer variation C B by equation (4) and the within-producer residual variation V W by equation (5) are presented in Table 2, which includes the individual efects C B X1 and C B X2 of factors X1 and X2 (producers and experts, respectively) evaluated using the C B decomposition by equation (6). To check the statistical signifcance of both the factor efects the signifcance indices SI X1 and SI X2 were also calculated by equation (9) at the degrees of freedom df X1 � 15 and df X2 � 2. Te critical index values SI crit at the 95% level of confdence in Table 2 were obtained using simulations of SI distributions [27].
Tere is a statistically signifcant diference at 95 % level of confdence between the producers related to all the quality parameters of the sausage (appearance, consistency, color, taste, and smell). Tis diference is called the "inhomogeneity" of the producers. At the same time, the signifcance index values of the expert factor do not exceed its critical value at the 95 % level of confdence, i.e., the hypotheses on the "homogeneity" of expert responses with regard to each of the fve sausage properties are not rejected.
Decomposition of the between producers' variation C B into kth parts C B ( k l�1 Y l ) according to the response categories up to k by equation (7) and corresponding C B ( k− 1 l�1 Y l ) values are shown in Table 3. Comparing the capability of the participating experts (as a group) to identify diferent categories, it is important to remember that a greater value of variation C B ( k l�1 Y l ) indicates a weaker capability to identify the particular category k. Tis decomposition demonstrates that identifcation of very bad and poor sausage quality (k = 1 and 2, respectively) was the simplest task for the experts: C B (Y 1 ) = 0 and C B (Y 1 + Y 2 ) = 0 as no expert responses of these categories were obtained for any property. Te most complicated part of their examination was to identify a diference between satisfactory and good quality (k = 3 and 4, respectively). Te variation up to the last category C B ( 5 k�1 Y k ) equals zero by defnition of equation (7). However, the number of responses of the category of perfect quality (k = 5) was maximum for each property shown in Table 1, and detection of this category was not a problem for the experts. It is also interesting that the maximum controversial responses here are on the sausage taste and especially on smell. Of course, the results shown in Table 3 do not mean in general that the examined ordinal variables have collapsed into a binary group.

Implementation of the Multinomial Ordered Logistic
Regression. Intervals of the sausage main component contents in the data set, their means, and standard deviations are shown in Table 4. Te multinomial ordered logistic regression model by equation (12) was ftted to the component contents in order to predict appearance, color, taste, and smell, assessed by experts according to the three categories shown in Table 1     Journal of Food Quality k = 3, 4, and 5. A logistic regression for dichotomous (binary) outcome variables was used for prediction of consistency, since the corresponding expert responses ( Table 1) were only of two categories, k = 4 and 5. Since for each categorical variable, the responses were found to be homogeneous among the three experts in the ORDANOVA study, all their outcomes were taken together, constituting the set of values to be used in the regression. Te calculated results are presented in Table 5, where the estimates for coefcients β k0 and η related to each component content are Table 3: Decomposition of the between producers' variation component according to the response categories.   Journal of Food Quality 7 reported with their standard errors and 95 % confdence intervals (from the 2.5 % to the 97.5 % quantile). Te η subscripts from 1 to 4 correspond to the subscripts of the component contents c.
Te estimated odds ratios, derived by exponentiating the coefcients, and McFadden's pseudo-R 2 values by equation (14) are also shown in the table. For example, the model by equation Note if a confdence interval does not cross zero, the parameter estimate is statistically signifcant. However, the confdence interval for the estimate η 4 � 2.22 (of the regression coefcient of the salt content c 4 ) crosses zero and this means that η 4 is statistically not signifcant in this case. In other words, the salt content values in the interval shown in Table 5 do not infuence the probability of the category appearance of a whole sausage.
Probabilities P of obtaining a response of category k by dependence on protein content c 1 , calculated at the mean values of contents of other main components (Table 4), are shown in Figure 2(a). In the case of three categories of the observed responses (k � 3, 4 and 5), the probability P(l � 3) � P(l ≤ 3), P(l � 4) � P(l ≤ 4) − P(l � 3), and P(l � 5) � 1 − P(l � 3) − P(l � 4), where P(l ≤ 3) and P(l ≤ 4) can be evaluated by equation (13). Te P values for k � 3, 4, and 5 are shown by lines 1, 2, and 3 in Figure 2(a), respectively. Te P dependences on fat content c 2 and on moisture content c 3 are shown in Figures 2(b) and 2(c), respectively. Tey were also calculated at contents of other main components equal  (Table 4). Line 1 is for the category "satisfactory," line 2 for the category "good," and line 3 for the category "excellent." to their observed mean values in Table 4. Te infuence of all the component contents on the probability values is very similar, but the probability curves vs. contents of protein and moisture are cut at the lower limits of the content intervals (their minimum values observed in the comparison). Te full picture is shown in Figure 2(b) for the probabilities vs. contents of fat, where the probability values of the appearance categories vary from zero to the maximum, or from 1 to 0 and vice versa. Note that increasing c 1 , c 2 , and c 3 leads to an increasing probability of the responses of the highest appearance category k � 5 (excellent quality).
Te salt content c 4 does not infuence probabilities of responses of consistency dichotomous categories k � 4 and 5; hence, it was removed from the list of regressors. Te confdence interval of the intercept β k0 in the consistency model in Table 5 crosses zero. However, the intercept closeness to zero does not refect infuence of a component content on the probabilities of consistency categories. Since responses of two categories were obtained, the probability of one of them is the complement to the other, i.e., P(l � 5) � 1 − P(l � 4) and vice versa. Terefore, P(l � 5) only is shown in Figure 3 to illustrate the probability dependence on component contents.
Te probabilities of responses of diferent color categories do not depend on the contents of fat and salt, c 2 and c 4 , in their observed intervals. Model "Color * " without these two variables has the same McFadden's pseudo-R 2 value 0.09 as full model "Color," and practically, the same regression coefcients for c 1 and c 3 in Table 5. Probabilities P of responses of diferent color categories in dependence on  Table 4.  Table 4. Te line numbers are as in Figure 2. content of protein c 1 are shown in Figure 4(a), and on moisture content c 3 in Figure 4(b).
Te full models for taste and smell are the best ftting models among the qualitative sausage properties; their McFadden's pseudo-R 2 values in Table 5 are about two to three times greater than those for appearance, color, and consistency. Dependences of probabilities P of responses of diferent taste and smell categories on the contents of main components, calculated at contents of other main components equal to their observed mean values, are shown on Figures 5 and 6, respectively.
In general, the maximum probability of responses of each category of taste and smell is reached at increasing contents of the infuencing main components. Similar  Table 4. Te line numbers are as in Figure 2.  Table 4. Te line numbers are as in Figure 2.
efects are also observed in the plots in Figure 2 for appearance as follows: the frst category reaching its maximum probability in the studied ranges of the component contents is 3, then 4, and fnally 5, i.e., higher categories are more probable with greater contents of components. However, the salt contents in the interval considered in this study do not signifcantly infuence responses on appearance, color, and consistency. At the same time, taste ( Figure 5) and smell ( Figure 6) are infuenced by the salt contents in a reverse order than contents of other main components; the greater the salt content, the lower category is the more probable. Te probabilities of responses of the excellent quality category P(l � 5) of both, taste and smell, increase with contents of protein up to c 1 of about 17 %; fat, c 2 of about 26 %; and moisture, c 3 of about 58 %, while the minimal salt content c 4 = 2.2 % is preferable. Tese estimates could be helpful for a revision of the specifcation limits of the sausage composition, necessarily taking into account the mass balance constraint; the sum of actual values of mass fractions of the four main components should be equal to 100 % or to a positive fraction less than 100 % [45].

Conclusions
A data set of ordinal responses of three experienced experts who assessed fve quality properties of samples of a boiled, smoked sausage from sixteen producers was analyzed. Te responses were ordered using fve categories. Implementation of ORDANOVA allowed decomposition of total variation of the ordinal data and simulation of the multinomial distribution of the relative frequencies of the responses in diferent categories. A statistically signifcant diference in quality properties of the sausages from diferent producers was detected, while the diference between responses of the experts was insignifcant.
Capabilities of the experts to identify diferent categories of the quality properties were also evaluated. It was shown that identifcation of "very bad" and "poor" quality, as well as "perfect" quality is the simplest task for the experts. Te most complicated part of their examination was to identify a diference between "satisfactory" and "good" quality-the closest categories.
Infuence of chemical composition of a sausage sample on the probability of a response category was modeled using the multinomial ordered logistic regression of the response category on mass fractions of four main sausage components. Obtained estimates could be helpful for a revision of the specifcation limits of the sausage composition, as well as for prediction of the product sensory properties when its chemical composition is under quality control.

Data Availability
All relevant data are included in the paper or the data set is available at Zenodo [40]. https://doi.org/10.5281/zenodo. 7213008.

Conflicts of Interest
Te authors declare that they have no conficts of interest to disclose.