Quality Control of Olive Oils Using Machine Learning and Electronic Nose

The adulteration of olive oils can be detected with chemical tests, but these are expensive and time-consuming. This study therefore focuses on reducing both time and cost. For this purpose, raw data were collected with an e-nose from olive oils from different regions of Balikesir, Turkey. This study presents two methods for the quality control of olive oils. In the first method, the 32 inputs are applied to the classifiers directly. In the second, the 32-input data are reduced to 8 inputs by Principal Component Analysis, and these 8 inputs are applied to the classifiers. Different machine learning classifiers, namely, Naïve Bayesian, K-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), Decision Tree, Artificial Neural Networks (ANN), and Support Vector Machine (SVM), were used. The performances of these classifiers were then compared according to their accuracies.


Introduction
The olive industry is a very important source of income in the Mediterranean region. More than 9.4 million tons of olive fruit are produced per year in the world. There are approximately 805 million olive trees, 98% of which are in the Mediterranean region. Every year, the olive oil production value is about $2.5 billion [1]. People consume 60 million tons of seed oil and 2 million tons of olive oil per year [2]. Olive oil and virgin olive oil are among the most widely used ingredients in Mediterranean kitchens. Olive oil is frequently more expensive than other vegetable oils; therefore, adulteration with cheaper or lower quality oils can offer significant economic advantages. The most frequent adulterants in olive oil are sunflower oil, maize oil, and coconut oil [3][4][5][6] and even hazelnut oil [7]. Thus, continued vigilance is required to control the adulteration of olive oil products and to protect the interests of consumers as well as of the industry in general [8].
An electronic nose (e-nose) is an instrument that imitates the sense of smell. It has an array of sensors to detect and recognize different odors at low cost [9]. The e-nose is very useful in various applications such as food, cosmetics, pharmaceuticals, and environmental disciplines [10], as well as in clinical diagnostics [11]. Many papers in food science and technology describe the use of electronic noses, and food control is considered the most important field of application [12][13][14][15][16][17]. Electronic noses have been used to analyze several foods and drinks. Taurino et al. have studied a particular application of a semiconductor thin-film-based sensor array for the differentiation of dissimilar olive oils by using Principal Component Analysis (PCA) [18]. Yu [...] differentiation of these honeys by their aroma combination using PCA and canonical discriminant analysis (CDA) [22]. Shaw et al. have analyzed and separated orange juice samples with an electronic nose by discriminant analysis [23]. Zhang et al. have used a sensor array to build a quality index model for evaluating peach quality using techniques such as linear regression, quadratic polynomial step regression, and a backpropagation network [24]. They have also evaluated pear aroma across different picking dates by using multiple linear regression (MLR) and ANN [25]. Gómez et al. have evaluated the ability of an electronic nose to monitor changes in volatile production across the ripeness stages of tomato by using PCA and LDA [26]. Ordukaya and Karlik have proposed the identification of fruit juice and alcohol mixtures by using different machine learning methods [27].
The objective of this study is the identification and classification of olive oil. For this purpose, two new methods are proposed for the quality control of 12 different types of olive oil by using an electronic nose and machine learning algorithms.

Materials and Methods
2.1. Collecting the Dataset. Twelve different kinds of olive oil samples were collected from various regions of Balikesir, Turkey. There are 10 different samples for each kind of olive oil, and every sample is 50 ml. The raw data for each kind of olive oil were digitized with an e-nose that has 32 sensors (1, 2, ..., 32) to generate the training and test sets. The e-nose used is the Cyranose 320, which consists of carbon and polymer sensor elements whose resistances change when exposed to the different vapors of each class of olive oil. The quality control measurement setup and the collected olive oil samples are shown in Figure 1.
In this study, 12 different olive oil classes were constituted for both the training and test sets without mixing them. Each class is therefore 100% pure; that is, it contains only one kind of olive oil. This study proposes two different methods for analyzing olive oils for quality control. With this technique, quality control of olive oils becomes easier, and the adulteration of olive oils can be reduced. The setup of the first method is illustrated in Figure 2.
First, the e-nose, which has 32 odor sensors, smells the aroma of each olive oil sample. After digitization, these odor data are normalized by using the Z-transformation method. Finally, the types of olive oil are classified by a machine learning algorithm using the 32 normalized inputs. Different machine learning classifiers, namely, Naïve Bayesian, K-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), Decision Tree, Artificial Neural Networks (ANN), and Support Vector Machine (SVM), were used.
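The normalization step can be sketched as follows, assuming it is the standard z-score transformation (each sensor column rescaled to zero mean and unit variance). The function name, the toy readings, and the use of the population standard deviation are illustrative assumptions; the source does not state which variant its tool applies.

```python
from statistics import mean, pstdev

def z_transform(rows):
    """Standardize each column (sensor) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation (an assumption); constant columns fall back to 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical readings: 3 samples x 2 sensors (the real data has 32 sensors).
readings = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = z_transform(readings)
```

After the transformation every sensor contributes on the same scale, which matters for distance-based classifiers such as K-NN.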
In the second method, the data collected from the e-nose sensors are reduced from 32 inputs to 8 inputs by using Principal Component Analysis as a feature extraction method. The data are then normalized using the same normalization method and analyzed with a machine learning classifier. The second method is illustrated in Figure 3.
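A minimal sketch of the reduction step is given below, assuming PCA is computed from the sample covariance matrix. It uses power iteration with deflation in plain Python purely for illustration; the function names and toy data are hypothetical, and a production implementation would normally rely on a numerical library's eigendecomposition or SVD.

```python
import math
import random

def center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def leading_eigenpair(matrix, iterations=300, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in the deflated matrix
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca_reduce(rows, n_components):
    """Project rows onto their first n_components principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation removes the component just found so the next one can emerge.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components]
            for row in centered]

# Hypothetical 2-sensor data lying on the line y = 2x; the real input is 120 x 32.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
reduced = pca_reduce(toy, 1)
```

Applied to the study's data, a call such as `pca_reduce(samples, 8)` would turn each 32-value sensor reading into 8 component scores.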
For the first method, each class of olive oil has 10 different samples, so we had a 120 × 33 training and test matrix for 12 classes (32 inputs and one class label). For the second method, each class of olive oil also has 10 different samples, so, after feature extraction, we had a 120 × 9 training and test matrix for 12 classes (8 inputs and one class label). In both cases, the training and test data samples were collected from olive oils for quality control. The test data samples for the first and second methods can be seen in Table 1.

Naïve Bayesian Classifier.
In supervised learning, a labeled training set is presented to the learning algorithm. The learning rule uses the training set to build a model that maps unlabeled samples to class labels. This model serves two purposes:
(a) It can be used to predict the labels of unlabeled instances.
(b) It can provide valuable insight for people trying to understand the domain.
This basic method is especially useful when the model must be understood by the user. The probability of a class label value $C$ for an unlabeled sample $X = \langle A_1, \ldots, A_n \rangle$, which consists of $n$ attribute values, is given by Bayes rule:
\[ P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \propto P(A_1, \ldots, A_n \mid C)\,P(C). \]
$P(X)$ is the same for all label values, and by the conditional independence assumption
\[ P(A_1, \ldots, A_n \mid C)\,P(C) = \prod_{j=1}^{n} P(A_j \mid C)\,P(C). \]
The above probability is calculated for each class, and the prediction is the class with the largest posterior probability. This model is very robust and continues to perform well even in the face of obvious violations of this independence assumption [28].
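The decision rule above can be sketched in code. The source does not say how the per-attribute likelihoods are modeled, so this sketch assumes a Gaussian density for each numeric attribute; the function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the class prior and per-attribute mean/variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class maximizing log P(C) + sum_j log P(A_j | C)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
            for value, mean, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Hypothetical two-class, two-attribute data (the study has 12 classes).
samples = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1]]
labels = ["A", "A", "B", "B"]
model = fit_gaussian_nb(samples, labels)
```

Working with log probabilities avoids numerical underflow when many attribute likelihoods are multiplied, as with 32 sensor inputs.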

Journal of Food Quality

K-Nearest Neighbors (K-NN).
K-NN is a method for classifying objects based on similarity. Similar objects are close to the other objects in their class, and dissimilar objects are far from them. The dissimilarity between two objects is therefore measured by the distance between their digitized data. The nearest neighbor algorithm consists of computing the distances between the digitized data of the objects in the feature set. The nearest neighbors are the objects at the smallest distances from a given object. Euclidean, Minkowski, or Mahalanobis distance measures are used to calculate the distance between objects [29,30].
In this study, we used K = 12 for the quality control application of olive oils. We used the same parameters for both methods. The parameters of the K-NN used are as follows:
(i) Measure type is Numerical Measures.
(ii) Numerical measure is Kernel Euclidean Distance.
(iii) Kernel type is Anova.
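The voting scheme can be sketched as below. For simplicity this sketch swaps in the plain Euclidean distance rather than the kernel Euclidean distance with an Anova kernel listed above, and it uses a small hypothetical dataset with k = 3 instead of the study's K = 12; the function name is illustrative.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Label the query by majority vote among its k nearest training samples."""
    # Plain Euclidean distance (an assumption; the study uses a kernelized distance).
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class data in two dimensions.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(train_x, train_y, [0.5, 0.5], k=3)
```

Replacing `math.dist` with a kernel-induced distance would change only the first element of each tuple; the sorting and voting logic stays the same.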

Linear Discriminant Analysis (LDA).
LDA is a signal classification technique that maximizes class separability, creating projections in which the samples of each class form compact clusters and the different clusters are far from each other. These projections are defined by the first eigenvectors of the matrix $S_W^{-1} S_B$, where $S_W$ and $S_B$ are the within-class and between-class covariance matrices, respectively [31]. LDA maximizes the ratio of the between-class variance to the within-class variance in any particular dataset, thereby guaranteeing maximal separability [32].
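For reference, the quantities behind this description can be written out in the standard Fisher formulation. The symbols are assumptions introduced here, not the source's notation: $x$ is a sample vector, $\mu_c$ the mean of class $c$, $\mu$ the overall mean, $N_c$ the number of samples in class $c$, and $C$ the number of classes (12 in this study). The matrices are given in their unnormalized scatter form.

```latex
% Within-class and between-class scatter matrices
S_W = \sum_{c=1}^{C} \sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\top}, \qquad
S_B = \sum_{c=1}^{C} N_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top}

% Fisher criterion and the eigenproblem whose leading eigenvectors w_i form W
J(W) = \frac{\left| W^{\top} S_B W \right|}{\left| W^{\top} S_W W \right|}, \qquad
S_W^{-1} S_B\, w_i = \lambda_i\, w_i
```

Because $S_B$ has rank at most $C - 1$, at most 11 discriminant directions carry information for a 12-class problem.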

Decision Tree.
Much of the work depends on replacing human decision-making ability with automatic decision-making algorithms. The decisions under consideration involve identifying builders and builder labels in classification applications.
Decision Tree classification algorithms account for both of these tasks. By assigning a probability distribution to the possible choices, decision trees provide a ranking system that not only specifies the order of preference for the possible choices but also gives a measure of the relative likelihood that each choice is the one that should be selected [33]. In this study, the minimal size for a split was 4 and the minimal leaf size was 1. We used the same parameters for both methods. The parameters of the Decision Tree used are as follows:
(i) Criterion is gain ratio.
(vi) Minimal size for split is 4.
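The gain ratio criterion named above can be illustrated with a short sketch. It assumes the usual definition (information gain divided by split information) and a binary threshold split on one numeric attribute; the function names and toy values are hypothetical and do not reproduce the authors' tool.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one numeric attribute."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing has no value
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info

# Hypothetical readings of one sensor for two classes.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["A", "A", "A", "B", "B", "B"]
best = gain_ratio(values, labels, 0.5)
```

Dividing by the split information penalizes splits that carve off very small subsets, which plain information gain tends to favor.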

Artificial Neural Networks (ANN).
ANN is a machine learning technique, implemented in hardware or software, that is based on the operation of the human brain. ANN can extract meaning from intricate or imprecise data, defining patterns and discovering trends that are difficult for humans or conventional computer applications to identify [34,35]. Artificial neural networks have been primarily used to analyze complex data relations in many academic and industrial fields [36,37]. We used the conventional Backpropagation (BP) algorithm in our application. The Backpropagation algorithm is an iterative gradient algorithm that minimizes the root mean square error. Every layer is connected to the previous layer [30,38]. In this study, two different multilayer perceptron (MLP) network architectures were used. The first consists of 32 input nodes, 23 nodes in a hidden layer, and 12 outputs. The second consists of the 8 inputs reduced by PCA, 11 nodes in a hidden layer, and 12 outputs. The learning rate and the momentum coefficient were the same for both architectures; their optimum values were found to be 0.3 and 0.2, respectively, for 500 iterations.
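The roles of the learning rate (0.3) and the momentum coefficient (0.2) can be illustrated on a single weight. This is only a sketch of the gradient update with momentum that backpropagation applies to every weight; the quadratic loss, the function name, and the starting value are hypothetical stand-ins for the network's actual error surface.

```python
def train_weight(gradient, w0, learning_rate=0.3, momentum=0.2, iterations=500):
    """Gradient descent with momentum on one weight."""
    w, velocity = w0, 0.0
    for _ in range(iterations):
        # New step = fraction of the previous step minus the scaled gradient.
        velocity = momentum * velocity - learning_rate * gradient(w)
        w += velocity
    return w

# Hypothetical loss (w - 3)^2, whose gradient is 2 * (w - 3) and minimum is w = 3.
final_w = train_weight(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In a full MLP the same rule is applied to each weight, with the gradient supplied by backpropagating the output error through the hidden layer.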

Support Vector Machines (SVM).
SVM is a supervised learning method formulated for binary classification problems. The learning problem is formulated as a quadratic optimization problem [39]. SVM constructs an optimum separating hyperplane in such a way that the margin of separation between two classes is maximized [30,40]. The same parameters were used for both methods. The SVM parameters are as follows:
(i) SVM type is C-SVC.
(ii) Kernel type is linear.
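For reference, the quadratic optimization problem behind a C-SVC with a linear kernel can be written in its standard soft-margin form. The symbols are assumptions introduced here: $x_i$ are the training vectors, $y_i \in \{-1, +1\}$ their binary labels, $\xi_i$ the slack variables, $N$ the number of training samples, and $C$ the penalty parameter, whose value is not given in the source.

```latex
% Soft-margin (C-SVC) primal problem with a linear kernel
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \ldots, N

% Resulting binary decision function
f(x) = \operatorname{sign}\left(w^{\top} x + b\right)
```

A multiclass task such as the 12-class problem here is usually reduced to several binary problems of this form (for example, one-vs-one); which scheme the authors' tool applies is not stated.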

Results and Discussion
For the first method, which has 32 inputs, we created a confusion matrix for the quality control of olive oils with the Artificial Neural Network. This confusion matrix was calculated from the training and test data for the first method, as seen in Table 2.
For the second method, which has been reduced to 8 inputs, we created a confusion matrix for the quality control of olive oils with Naïve Bayes. This confusion matrix was calculated from the training and test data for the second method, as seen in Table 3.
In the second method, the 32 inputs were reduced to 8 by using Principal Component Analysis (PCA). Table 4 shows the best success rates for both ANN and SVM on input datasets reduced from 32 inputs to 7, 8, 9, and 10 inputs. The optimum number of inputs was found to be 8. According to the TP (sensitivity) test results for both the ANN with 32 inputs and the Naïve Bayes classifier with 8 inputs, olive oil types 1, 3, 8, 9, 10, 11, and 12 are more correctly identified than the others. Moreover, according to the TN (specificity) test results, olive oil types 2, 5, and 6 are the least correctly identified. Using the first method, the accuracies of the different machine learning classifiers on both the training and test datasets of olive oils are given in Table 5. As seen in this table, the highest training accuracy using the first method was 98.33%, for Decision Tree. Similarly, the highest test accuracy using the first method was 65.83%, for ANN.
Using the second method, which reduces the inputs, the accuracies of the different machine learning classifiers on both the training and test datasets of olive oils are given in Table 6. As seen in this table, the highest training accuracies using the second method were 98.33% and 95%, for Decision Tree and ANN, respectively. Similarly, the highest test accuracies using the second method were 70.83% and 70.0%, for Naïve Bayes and ANN, respectively.
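The reported percentages can be related to sample counts with a small calculation. It assumes that every accuracy is computed over 120 samples (12 classes × 10 samples); the correct-sample counts below are back-calculated under that assumption and are not reported in the source.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

TOTAL = 120  # assumption: 12 classes x 10 samples per class

# Hypothetical counts that reproduce the reported accuracies.
implied_correct = {
    "ANN, 32 inputs (test)": 79,
    "Naive Bayes, 8 inputs (test)": 85,
    "ANN, 8 inputs (test)": 84,
    "Decision Tree (training)": 118,
    "ANN, 8 inputs (training)": 114,
}
accuracies = {name: accuracy_percent(n, TOTAL) for name, n in implied_correct.items()}
```

Under this assumption, the gap between the best test results of the two methods (65.83% versus 70.83%) corresponds to six additional correctly classified samples.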
As a result, for the first method, as seen in Table 5, the best technique for the quality control of olive oils with 32 inputs is the Artificial Neural Network (ANN), with 65.83%. For the second method, as seen in Table 6, the best technique for the quality control of olive oils with 8 inputs is Naïve Bayes, with 70.83%. In general, as shown in Tables 5 and 6, the success rates in Table 6 are better than those in Table 5. Reducing the 32 inputs to 8 inputs saves processing time and simplifies the techniques. A comparison of the success rates of the classifiers for both methods is shown in Figure 4. True positive (TP), false positive (FP), false negative (FN), and true negative (TN) values were then calculated from the confusion matrix. From the TP, FP, FN, and TN values, rates such as class precision and error rate (ER) were calculated by using (3), (4), (5), (6), and (7), respectively.
\[ \text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}, \qquad \text{error rate (ER)} = 1 - \text{ACC}. \]
Maximum sensitivity corresponds to a test with no FN (TP/TP = 1, or 100%); therefore any negative result must be a true negative. Similarly, maximum specificity corresponds to a test with no FP (TN/TN = 1, or 100%); therefore any positive result must be a true positive.
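The per-class rates can be computed from a multiclass confusion matrix as sketched below. The sketch assumes the standard one-vs-rest definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/total) and a matrix whose rows are true classes and whose columns are predictions; the function name and the 3 × 3 matrix are hypothetical.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class index k; rows are true classes, columns predictions."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k samples predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # other samples predicted as class k
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }

# Hypothetical 3-class confusion matrix (the study's matrices are 12 x 12).
confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(confusion, 0)
```

Calling the function once per class index gives the class-wise figures that allow well-identified and poorly identified oil types to be compared.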

Conclusion
In this study, we presented two approaches for the classification of olive oils for quality control using an e-nose and different machine learning algorithms. To assess the performance of different machine learning algorithms, Naïve Bayesian, K-NN, LDA, Decision Tree, ANN, and SVM were used, and their accuracies were compared for both methods. Both methods use the same machine learning algorithms; the only difference is that the second is used with a data reduction algorithm, PCA. Comparing the performance of the two methods, we found that the best accuracy of the second method was better than that of the first. We therefore consider the second method appropriate for this application. The best success rate in the second method was obtained with the Naïve Bayes classifier, at 70.83%. We also compared the success rates of different machine learning techniques for the quality control of olive oils. We aimed to support better quality control and cost analysis for olive oil processing. The results show that both proposed methods are faster and much cheaper than classical chemical analysis techniques for the identification and classification of different types of olive oil.
In future work, other data reduction and hybrid algorithms can be used, and their performances can be compared according to their accuracies. Moreover, these methods can be used for the quality control of different types of olive oil collected from other Mediterranean regions.

Additional Points
Practical Application. In this application, we want to reduce the time and cost of the real-time olive oil quality control process. We aim to develop a portable control device with machine learning classifiers such as Naïve Bayesian, K-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), Decision Tree, Artificial Neural Networks (ANN), and Support Vector Machine (SVM). Thus, olive oil quality can be controlled more rapidly and at lower cost without the need for a laboratory and chemical analysis.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Emre Ordukaya carried out the studies, participated in collecting the data, applied the machine learning techniques, and drafted the manuscript. Bekir Karlik helped to draft the manuscript. All authors read and approved the final manuscript.