Application of Artificial Neural Network in Simulation of Supercritical Extraction of Valerenic Acid from Valeriana officinalis L.

The application of an artificial neural network (ANN) has been studied for simulation of the extraction process by supercritical CO2. Supercritical extraction of valerenic acid (VA) from Valeriana officinalis L. has been studied and simulated in terms of the significant operational parameters: pressure, temperature, and dynamic extraction time. An ANN with a multilayer perceptron (MLP) model is employed to predict the amount of extracted VA as a function of the studied variables. Test, validation, and training data sets in three different scenarios are selected to predict the amount of extracted VA at new values of dynamic extraction time, working pressure, and temperature. The Levenberg-Marquardt algorithm has been employed to train the MLP network. The model in the first scenario has three neurons in one hidden layer, and the models associated with the second and third scenarios have four neurons in one hidden layer. The determination coefficients are calculated as 0.971, 0.940, and 0.964 for the first, second, and third scenarios, respectively, demonstrating the effectiveness of the MLP model in simulating this process with any of the scenarios and its accurate prediction of the extraction yield under different working conditions of pressure, temperature, and dynamic extraction time.


Introduction
Valerian essential oils and extracts of the valerian root have long been used as sedatives. Therefore, extensive studies have been performed on the extract of the valerian root in recent years [1,2]. These studies have revealed antispasmodic and sedative properties of valerian [3][4][5] and attributed its medical properties mainly to valerenic acid (VA) [5]. Owing to these findings, VA is used in the formulation of many drugs and cosmetic products. Different methods have been employed for the extraction of the valerian root extract [6,7], which fall into two major categories. Hydrodistillation, the first category, is inexpensive and easy to implement. However, its main disadvantage lies in operating at the boiling point of water, leading to the loss of many water-sensitive or temperature-sensitive compounds. Supercritical fluid extraction (SFE), the second category, has recently been used to a great extent for the extraction of natural materials [6][7][8][9]. Operation at lower temperatures, fast processing, use of a clean and inexpensive fluid, and easy separation of the products are the main attractive features of SFE. Although SFE is more costly than other methods such as hydrodistillation, its distinctive properties have made it popular in the food and drug industries.
The previously mentioned features of SFE have drawn the attention of scientific and industrial bodies to this technique. Owing to the importance of accurate simulation in the design and control of the extraction process, different models have been proposed for simulation and prediction of the supercritical fluid extraction process. Traditional modeling methods, which are based on the balance of mass, energy, and momentum, require accurate process data [10][11][12][13]. These are time-consuming analytical methods which suffer considerable error when knowledge about the process is insufficient. On the other hand, computational intelligence- (CI-) based methods do not require an exact description of the process mechanism [14][15][16] and can simulate the system's output with reasonable accuracy using experimental data alone. These models are mainly applied when process complexities hinder analytical modeling. Artificial neural networks (ANNs), a class of CI-based models, were inspired by the parallel structure of neural computation in the human brain. The overall structure of an ANN model is determined by an algorithm or by the operator, and the network's parameters are tuned by learning algorithms and experimental data in order to minimize the output error. In this paper, three different MLP-based models in three scenarios are proposed for simulation of supercritical extraction from the roots of valerian (Valeriana officinalis L.). These models are constructed to perform predictions at new time, temperature, and pressure values (which are not present in the training data). Temperature, pressure, solvent flow rate, and particle size are considered as the input variables, while the yield of the extracted valerenic acid is regarded as the output variable. In each scenario, the training, validation, and testing data sets are selected according to the scenario's goal. Here, the Levenberg-Marquardt algorithm is used to train the network [17]. Different scenarios are studied to verify the capability and reliability of the ANN in simulation of the studied process. The literature survey revealed few studies on the application of ANNs to simulation of supercritical extraction of essential oils [18,19].

Supercritical Fluid Extraction
Compressing a fluid above its critical pressure (P_C) or heating a gas above its critical temperature (T_C) produces a supercritical fluid. In this state, the fluid behaves like a compressed gas or a dispersed liquid. Such fluids possess neither the specific properties of gases nor those of liquids: their viscosity is close to that of gases, while their density is close to that of liquids. The separation technique which uses this type of fluid is termed supercritical fluid extraction (SFE). The general schematic of the SFE process is depicted in Figure 1.
As shown in Figure 1, the extraction fluid, such as carbon dioxide, is first cooled in the condenser and turned into the liquid state. The obtained liquid is then pumped into the heater

The Neural Network
Inspired by biological neural systems, an ANN is comprised of a series of simple processing units, called neurons, and is intended to solve complex problems. Figure 2 shows the general architecture of a neuron.
It is evident in this figure that the input x_i is formed by the weighted summation of the input variables plus a constant value (I × ω_0) referred to as the bias. Then, the neuron's output signal, y_i, is determined by an activation function Φ(x). The activation function is normally a linear, sigmoid, or tangent hyperbolic function, and the activation functions of all hidden layer neurons are considered to be the same [17]. Hence, the output of the ith neuron with p input variables is calculated as follows:

y_i = Φ(x_i) = Φ( Σ_{j=0}^{p} ω_j u_j ),   (1)

where u = [u_0, u_1, u_2, . . ., u_p]^T and ω = [ω_0, ω_1, ω_2, . . ., ω_p] are the input and weighting vectors, respectively. In Figure 2, ω_0 is the bias weight, and the corresponding bias input I = u_0 is considered as unity.
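As a concrete illustration, the single-neuron computation described above can be sketched in a few lines of Python with NumPy; the input values and weights below are arbitrary illustrative numbers, not taken from the paper.

```python
import numpy as np

def neuron_output(u, w):
    """Output of one neuron: the weighted summation of the inputs
    (including the bias term u[0] = 1 with weight w[0]) passed
    through the tangent hyperbolic activation function."""
    x = np.dot(w, u)      # weighted summation x_i
    return np.tanh(x)     # activation Phi(x_i)

# Illustrative numbers only (not from the paper).
u = np.array([1.0, 0.5, -0.2])   # bias input u0 = 1 plus two inputs
w = np.array([0.1, 0.8, 0.3])    # w0 = 0.1 is the bias weight
y = neuron_output(u, w)          # tanh(0.1 + 0.4 - 0.06) = tanh(0.44)
```

The same pattern extends to a whole layer by replacing the weight vector with a weight matrix.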

Multilayer Perceptron Model.
Multilayer perceptron (MLP), as the most well-known ANN model, consists of an input layer, one or several hidden layers, and an output layer.The MLP's structure is illustrated in Figure 3.
In an MLP network, the input units just transfer the incoming data without performing any processing on them. After passing the input layer, the data reach the hidden layer, which can be formed by one or more layers. Having been processed in the hidden layer neurons, the data are transferred to the output layer. A tangent hyperbolic or sigmoid transfer function is normally employed in the MLP's hidden layer. In this paper, the tangent hyperbolic activation function has been used:

Φ(x_i) = (e^{x_i} − e^{−x_i}) / (e^{x_i} + e^{−x_i}),   (2)

where x_i is the weighted summation of the inputs and Φ(x_i) is the output of the ith neuron. The activation function in the output layer is usually considered to be linear, computing a linear combination of its inputs. The total output of a network with one hidden layer, M neurons in the hidden layer, and one neuron in the output layer is then calculated as follows:

y = Σ_{m=1}^{M} ω_m Φ(x_m) + (ω_0 × 1),   (3)

where (ω_0 × 1) is associated with the output neuron's bias.

Obviously, all possible structures must be tried to find the best performing network. Therefore, different numbers of network layers and neurons in each layer and different activation functions must be considered. The numbers of neurons in the input and output layers are the same as the numbers of input and output variables, respectively. The number of hidden layers and hidden layer neurons is determined by the complexity of the problem and the desired accuracy. Hence, the weighting coefficients and bias constants are the only unknown parameters of the network. The procedure of tuning these unknown parameters in order to minimize the output error is termed training. Neural network training is normally carried out in a supervised manner. In this method, input data are passed through the network, and the output is calculated. Next, the difference between the network output and the real values, regarded as the network error, is computed. Then, the training algorithm improves the weighting coefficients and bias constants such that the obtained error is reduced. This process continues until the desired error is achieved. In this paper, the Levenberg-Marquardt training algorithm is employed to train the MLP network [20].
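A minimal forward pass for the one-hidden-layer MLP described above can be sketched as follows. The random weights are for illustration only; the sizes P = 5 inputs and M = 3 hidden neurons mirror the paper's first-scenario network, so the parameter count comes out to (P + 1)M + (M + 1)Q = 22.

```python
import numpy as np

def mlp_forward(u, W1, b1, w2, b2):
    """One-hidden-layer MLP: tanh hidden units, linear output neuron."""
    h = np.tanh(W1 @ u + b1)   # hidden layer activations Phi(x_m)
    return float(w2 @ h + b2)  # linear output plus bias (w0 x 1)

P, M, Q = 5, 3, 1              # inputs, hidden neurons, outputs
rng = np.random.default_rng(0)
W1 = rng.normal(size=(M, P)); b1 = rng.normal(size=M)
w2 = rng.normal(size=M);      b2 = float(rng.normal())

n_params = (P + 1) * M + (M + 1) * Q   # 22 weights and biases in total
y = mlp_forward(rng.normal(size=P), W1, b1, w2, b2)
```

In practice these weights would of course be set by the training algorithm rather than drawn at random.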

Levenberg-Marquardt Algorithm. Newton optimization methods are based on the second-order Taylor expansion around the old weight vector. The direct Newton method encounters difficulty in the computation of the Hessian matrix. Therefore, scaling-factor methods such as the Levenberg-Marquardt algorithm have been proposed. Although the Levenberg-Marquardt algorithm and the Newton method are both built on the Hessian matrix, the former is designed to approach second-order training speed without having to compute the Hessian matrix directly. Consider the function V(w), which should be minimized with respect to the vector of network parameters w. In Newton's method, the vector w is updated as follows:

w_{k+1} = w_k − [∇²V(w)]^{−1} ∇V(w),   (4)

where ∇²V(w) is the Hessian matrix and ∇V(w) represents the gradient of V(w). If V(w) is regarded as the squared sum of errors,

V(w) = Σ_{i=1}^{N} e_i²(w),   (5)

then the gradient ∇V(w) and the Hessian matrix ∇²V(w) can be defined using the Jacobian matrix as follows:

∇V(w) = J^T(w) e(w),   (6)

∇²V(w) = J^T(w) J(w) + S(w),   (7)

where S(w) = Σ_{i=1}^{N} e_i(w) ∇²e_i(w), e(w) = [e_1(w), . . ., e_N(w)]^T,

J(w) = [∂e_i(w)/∂w_j],  i = 1, . . ., N,  j = 1, . . ., N_P,   (8)

and w = [w_1, w_2, . . ., w_{N_P}] is the vector of network parameters.
It is obvious from (8) that the size of the Jacobian matrix is N × N_P, where N and N_P are the numbers of error terms (training samples) and network parameters, respectively. In the Gauss-Newton method, the second term in (7) is assumed to be zero. Hence, w is updated based on the following relation:

w_{k+1} = w_k − [J^T(w) J(w)]^{−1} J^T(w) e(w).   (9)

The difference between the Levenberg-Marquardt algorithm and the Gauss-Newton method can be stated as follows:

∇²V(w) ≈ J^T(w) J(w) + μI,   (10)

where I represents an N_P × N_P identity matrix. In each iteration, when V(w) increases, μ is multiplied by μ_inc; however, in case of a decrease in V(w), μ is divided by μ_dec. Therefore, w is updated as follows:

w_{k+1} = w_k − [J^T(w) J(w) + μI]^{−1} J^T(w) e(w).   (11)

The Levenberg-Marquardt algorithm, an appropriate optimization tool for small- and medium-sized networks (with fewer than a hundred parameters), has been employed in this paper. Different structures, including different numbers of layers and neurons as well as different activation functions, must be examined in order to find the best performing network. The type of activation function must be determined in the next step; we have used tangent hyperbolic and linear activation functions in the hidden layer and output layer, respectively. The network structure is decided in the third step. The number of input layer neurons equals the number of input variables, and the number of output layer neurons equals the number of output variables. Hence, only the number of hidden layers and their neurons must be determined. To do this, first a network with one neuron in its hidden layer is constructed. Then, further neurons are added one by one until the predetermined desired value of error is reached. Now, the network's parameters are tuned by the training algorithm. For networks with one hidden layer, the total number of weighting coefficients and biases is computed based on the following equation:

N_w = (P + 1)M + (M + 1)Q,   (12)

where M, P, and Q are the numbers of hidden layer neurons, inputs, and outputs, respectively. As the size of the network increases, the number of network parameters also increases. Therefore, there is an optimal value for the network's size.
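The Levenberg-Marquardt update can be sketched for a generic least-squares problem as below. This is a toy example (fitting a straight line, not the paper's network), and the μ-adaptation factors are illustrative choices rather than values from the paper.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu, mu_inc=10.0, mu_dec=10.0):
    """One Levenberg-Marquardt update: dw = -(J^T J + mu*I)^{-1} J^T e.
    mu is decreased when the step reduces the error and increased
    (with the step rejected) when it does not."""
    e = residuals(w)
    J = jacobian(w)
    A = J.T @ J + mu * np.eye(w.size)
    w_new = w - np.linalg.solve(A, J.T @ e)
    if np.sum(residuals(w_new) ** 2) < np.sum(e ** 2):
        return w_new, mu / mu_dec      # success: accept step, reduce mu
    return w, mu * mu_inc              # failure: keep w, increase mu

# Toy least-squares fit of the model y = a*x + b to noiseless data.
x = np.linspace(0.0, 1.0, 10)
y_data = 2.0 * x + 1.0
residuals = lambda w: w[0] * x + w[1] - y_data
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])

w, mu = np.array([0.0, 0.0]), 1e-2
for _ in range(50):
    w, mu = lm_step(w, residuals, jacobian, mu)
```

For this linear problem the fitted parameters converge to a = 2 and b = 1 within a handful of iterations.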
The closer the number of parameters is to the number of data, the less general the resulting network will be. Therefore, the number of network parameters should not exceed half the number of training data. Theoretically speaking, the training error of an ANN can approach zero. However, this situation is not desirable at all, because it reduces the network's flexibility: the knowledge acquired through the training process is not applicable to other data, resulting in a large error. This is called overtraining. To prevent this undesirable circumstance, the validation error is computed after each execution. When this error begins to rise, the training process is terminated.
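The early-stopping rule just described can be sketched as follows. The `step` and `val_error` callables and the `patience` parameter are illustrative abstractions introduced here, not names from the paper.

```python
import numpy as np

def train_with_early_stopping(step, val_error, w, max_epochs=1000, patience=5):
    """Run training updates but stop once the validation error keeps
    rising, returning the parameters with the best validation error.
    `step(w)` performs one training update; `val_error(w)` evaluates
    the validation loss."""
    best_w, best_err, bad = w.copy(), val_error(w), 0
    for _ in range(max_epochs):
        w = step(w)
        err = val_error(w)
        if err < best_err:
            best_w, best_err, bad = w.copy(), err, 0
        else:
            bad += 1
            if bad >= patience:        # validation error kept rising: stop
                break
    return best_w

# Toy check: each "training step" moves w left by 1; the validation
# error (w - 2)^2 bottoms out at w = 2, so training stops there.
best = train_with_early_stopping(
    step=lambda w: w - 1.0,
    val_error=lambda w: float((w[0] - 2.0) ** 2),
    w=np.array([5.0]),
)
```

Returning the best-so-far parameters (rather than the final ones) is what prevents the overtrained weights from being used.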

Application of MLP to Simulation of Valerenic Acid Extraction Process
Regarding the experimental data, it is expected that predicting the extraction yield at a new time, new pressure, or new temperature (which were not present in the training process) exhibits small error. On the other hand, prediction at a new particle size or new solvent flow rate brings about larger error. Therefore, three different testing scenarios were designed to forecast the extraction yield at new time, pressure, and temperature. In the first scenario, the amounts of extracted VA at 15 minutes of each execution were picked out as the testing set. Validation data were selected randomly from the other data, and the remaining data were allocated to the training set. In the second scenario, the fourth execution was chosen for testing purposes; validation data were again selected randomly from the other data, and the remaining data were assigned to the training set. The fifth execution was chosen as the testing set in the third scenario; like the other scenarios, validation data were selected randomly from the other data, and the remaining data were allocated to the training set. Thus, the first scenario was focused on prediction at a new time, while the second and third scenarios were chosen to predict the yield at a new pressure and a new temperature, respectively.
Then, different network structures for each scenario were trained by the Levenberg-Marquardt algorithm [20]. The final MLP networks associated with each scenario were evaluated by the R² error criterion, defined as follows:

R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)²,

where y_i is the experimental value, ŷ_i is the network's prediction, and ȳ is the mean of the experimental values.
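The R² criterion is straightforward to compute directly; a minimal sketch follows, with made-up sample numbers for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Determination coefficient: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_exp = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative "experimental" yields
y_net = np.array([1.1, 2.1, 2.9, 4.0])   # illustrative "predicted" yields
r2 = r_squared(y_exp, y_net)
```

An R² of 1 corresponds to a perfect fit, which is why values such as 0.94-0.97 indicate close agreement between the network and the experiments.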

First Scenario.
In this scenario, the data obtained at 15 minutes of each execution were chosen as the test set. Validation data were selected randomly from the other data, and the remaining data were allocated to the training set. The training, validation, and testing data sets consist of fifty, twenty, and ten members, respectively. The best performing MLP structure for this scenario had three neurons in one hidden layer. The constructed network comprises 22 parameters, which is reasonable regarding the number of training data. The value of the R² criterion for this network was 0.971. Figure 4 presents an illustrative comparison between the predicted and real values of extracted VA in the first scenario.

Second Scenario.
The fourth execution was selected for the second scenario. Validation data were selected randomly from the other data, and the remaining data were allocated to the training set. The training, validation, and testing sets consist of 54, 18, and 8 members, respectively. This scenario was intended to predict the extraction yield at a new pressure. The best MLP structure in this scenario has 4 neurons in one hidden layer and 29 parameters. The obtained value of R² in this scenario was 0.94. Figure 5 and Table 3 present a detailed comparison between the predicted and real values.


Third Scenario.

In our last scenario, the fifth execution data were used. Validation data were selected randomly from the other data, and the remaining data were assigned to the training set. In this scenario, the training, validation, and testing sets consist of 54, 18, and 8 members, respectively. This scenario was designed to predict the extraction yield at a new temperature. The optimal network for the third scenario was constructed with 4 neurons in one hidden layer and 29 parameters. The R² value for this scenario was 0.964. The comparison between real and predicted values is presented in Figure 6 and Table 4.
The comparisons performed in Figures 4, 5, and 6 and Tables 2-4 show that the network's outputs are quite accurate. Thus, the MLP model can be applied to predict the extraction yield at a new time, pressure, and temperature.

Discussion
As stated in the previous section, the best MLP models for the first, second, and third scenarios had three, four, and four neurons in one hidden layer, respectively. In the first scenario, several data points from each execution were present in the training set, leading to a simpler structure. However, the whole data of the fourth execution were chosen as the testing set for the second scenario, and similarly the fifth execution was assigned to the third scenario's testing data. Hence, the prediction in the second and third scenarios was more difficult, resulting in more complex structures. On the other hand, the first and third scenarios can be interpreted as interpolations, whereas the second scenario was an extrapolation. Referring to Table 1, it can be seen that the time variable takes the values of 1.5, 5, 9.5, 15, 21.5, 37.5, and 47 minutes; the temperature variable takes the values of 310, 318, 326, and 334 K; and the pressure takes the values of 15, 22, 29, and 36 MPa. The data points at 15 minutes of all executions were selected as the testing set in the first scenario. The whole data of the fourth execution, that is, at a temperature of 334 K, were assigned to the second scenario's testing set. In the third scenario, the whole data of the fifth execution, that is, at a pressure of 29 MPa, were chosen as the testing data. Obviously, the extrapolation error is higher than that of interpolation; therefore, the R² value for the second scenario is lower than those of the first and third scenarios.
The error of the MLP model can be attributed to three factors. Experimental error is the first factor. As shown in Figure 7, the experimental data curves do not have a smooth slope, and there are some breaking points, which are probably due to measurement error in the extraction yield. Network estimation is the second component of error. The curves predicted by the MLP network have a smooth slope, passing through the middle of the experimental data points, which leads to a difference between the network's output and the experimental data. If the training error were minimized completely, the outputs of the network would fit the breaking points present in the experimental data curves; the measurement errors would thereby be introduced into the network structure, resulting in inaccurate prediction of the testing data. Experimental factors that have been ignored are the third source of error.

Conclusion
Simulation of complex processes such as the supercritical extraction of valuable components from pharmaceutical plants requires stochastic methods. In this work, the multilayer perceptron ANN model has been applied to simulate VA extraction by supercritical CO2. To verify the reliability and capability of the ANN, three testing sets in three different scenarios were selected, and the Levenberg-Marquardt training algorithm was employed to find the optimal structure for each scenario. The R² values for the three scenarios were 0.971, 0.940, and 0.964, respectively. The small error values in all scenarios demonstrate the effectiveness of the MLP model in simulating the SFE process. It is concluded that the MLP model can be applied to simulate the extraction process and optimize the operational conditions of complex processes for which no theoretical models exist.

Figure 2: General architecture of a neuron.
and is heated to above T_C. The heated fluid is then pumped to the extractor. The extraction vial is filled with dried plant powder, and its temperature is kept constant. After the extraction, the fluid goes into the collecting vial, where its temperature and pressure are reduced; the extraction fluid is therefore exhausted, and the extraction yield remains in the collecting vial. In this study, the experimental data of the extraction of VA from the valerian plant by supercritical carbon dioxide have been used. In this process, changes in pressure and temperature cause changes in the fluid's density and viscosity and finally result in different extraction efficiencies. Change in particle size affects the mass transfer coefficients, and the extraction rate is affected by the fluid's flow rate. Generally, an increase in pressure, temperature, and solvent flow rate and a decrease in particle size improve the extraction efficiency.

Figure 3: The general structure of an MLP network with 5 inputs, 3 neurons in one hidden layer, and an output layer.

Structure of the Multilayer Perceptron Model.

Simulation by the MLP model includes several steps. In the first step, the experimental data are divided into three sets. The largest set, termed the training set, comprises 70 percent of the whole data. The validation set, as the second set, consists of 20 percent of the whole data, and the last 10 percent of the experimental data are allocated to the test set. The training set is used for tuning the network parameters, the validation set is employed to validate the model, and the test set is used to assess prediction.
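The 70/20/10 split can be sketched as below. This is a purely random split for illustration; the paper's scenarios instead selected the test sets deterministically (e.g., all 15-minute points, or a whole execution).

```python
import numpy as np

def split_70_20_10(X, y, seed=0):
    """Shuffle the data and split it into 70% training, 20% validation,
    and 10% test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(0.7 * len(X))
    n_va = int(0.2 * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# 80 illustrative samples (the scenario set sizes in the paper also
# sum to 80 points in total).
X = np.arange(80, dtype=float).reshape(80, 1)
y = np.arange(80, dtype=float)
train, val, test = split_70_20_10(X, y)
```

Fixing the seed makes the split reproducible, which matters when comparing network structures trained on the same partition.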

Figure 4: Comparison between real and predicted yield (%) after 15 min extraction in first scenario.

Figure 5: Comparison between real and predicted yield (%) in second scenario.

Figure 6: Comparison between real and predicted yield (%) in third scenario.

Figure 7: The experimental results (yield %), from the initial time to the end for all executions.

Operational conditions of the experiments.

The results of the constructed networks with their corresponding error.

Numerical comparison between real and predicted yield values in second scenario.

Numerical comparison between real and predicted yield values in third scenario.