Fault Isolation for Desalting Processes Using Near-Infrared Measurements

Because crude oil desalting plays an important role in the whole petroleum refining process, this paper uses near-infrared spectroscopy, which arises from molecular vibration, to detect and isolate potential faults of the desalting process. With the molecular spectral data provided by near-infrared spectroscopy, principal component analysis is adopted to monitor whether the process is in a normal operating condition. Since the dimension of the near-infrared spectra is much larger than the sample size, the least absolute shrinkage and selection operator is employed to select variables of the observed spectral data automatically. If a fault occurs, the same operator can be used to locate the spectral region affected by the failure. In this way, the root causes of faults can be tracked through the spectral bands that change. The proposed fault detection and isolation approaches are evaluated on near-infrared spectra sampled from a crude oil desalting process to show their effectiveness.


Introduction
Inorganic salts such as NaCl, CaCl2, and MgCl2 are commonly detected in crude oil. During the refining process, these chlorides may lead to severe problems such as reduced catalyst efficiency or lifetime (which is costly), equipment corrosion, pipeline corrosion, fouling, and plugging, all of which reduce the benefits of the petroleum refining industry [1][2][3]. Consequently, crude oil desalting is an indispensable step in the petroleum refining industry. Moreover, the modern petroleum refining process is highly integrated, so a failure may shut down the overall plant and even lead to a catastrophic accident [4][5][6]. Given the crucial role of desalting, real-time monitoring is necessary. The existing monitoring techniques can be classified into two main categories: model-based approaches and data-driven approaches [7][8][9][10][11][12]. Compared with the model-based methods, data-driven monitoring relies on analysing the conventionally measured variables of the industrial process, such as temperature, pressure, flow rate, and level [13][14][15][16], which makes it more suitable for practical use.
Among new analytical techniques, near-infrared (NIR) spectroscopy has been widely adopted in many fields because it is noninvasive, causes little pollution, and is suitable for online analysis [17][18][19][20]. Essentially, NIR is a vibrational spectroscopy technique that probes overtones and combination bands of the fundamental vibrations, mainly of hydrogen-containing groups such as -OH, -NH, and -CH [21]. The information presented in NIR spectra can be used to determine the composition of the sample or its physical properties. Generally speaking, NIR spectroscopy provides new perspectives for the analysis and synthesis of industrial processes at the molecular level. For these reasons, a significant number of works have exploited NIR for process monitoring with effective results. For example, using NIR as an in situ monitor, Santos et al. proposed a method to monitor the coffee roasting process online [22]. With the same purpose but in different fields, NIR has been employed to monitor the simulated moving bed and to control fluidized bed granulation and coating processes [23]. To monitor the production of 2G ethanol from lignocellulosic sugarcane residues, Pinto et al. combined NIR with partial least squares (PLS) regression to predict glucose and ethanol concentrations [24].
Nevertheless, to the best of our knowledge, the application of NIR to monitoring the crude oil desalting process has not been detailed in the existing literature. Specifically, the oil desalting unit separates the salts from the raw feed by means of solvent addition and extraction equipment. A simple desalting process with the NIR analyser layout is illustrated in Figure 1. The primary task is to monitor the process and make sure that the desalted oil transported to the following processing stage is qualified. If the operation is abnormal, it is essential to capture the faults early and identify their cause accurately so that the operators can troubleshoot as soon as possible.
Fault diagnosis techniques are among the essential methods for ensuring the safety of industrial processes. The overall concept of fault diagnosis includes fault detection, fault isolation, and fault analysis. Many different methods of fault diagnosis have been introduced in the literature [25][26][27]. A survey of the current results shows that only a few studies apply NIR to diagnosing or isolating process faults. Recently, Sales used the contribution plot method to handle the effects of disturbances [28]. Although this method is easy to understand and follow, its results are vulnerable to the smearing phenomenon. Specifically, a fault occurring at one process variable may cause a significant increase in the contributions of fault-free variables [29]. This phenomenon is more evident for NIR. For example, a fault caused by additional water changes the concentration, so that all the spectral regions change [21]. Consequently, contribution plots can no longer serve as a satisfactory index for pointing out the root causes of failures.
Based on the above motivations, the primary task of this paper is to monitor the crude oil desalting process using NIR measurements. The contributions of this paper are as follows: (1) the proposed method can isolate the fault and identify the variables primarily responsible for the failure after detection; (2) considering that the dimension of the NIR spectra is much larger than the sample size, the least absolute shrinkage and selection operator (LASSO) is employed to reduce the dimension of the spectra by imposing a bound on the $L_1$-norm of the regression coefficients; and (3) with the molecular spectral data collected by NIR, the least angle regression (LARS) algorithm is used to find the spectral region most affected by the failure, showing the causal origins of the fault clearly. Applying the proposed fault detection and isolation approaches to the crude oil desalting process demonstrates their effectiveness and advantages. The rest of this paper is organized as follows: in Section 2, fault detection based on principal component analysis (PCA) for the oil desalting process is reviewed. Then, the fault isolation method based on LASSO is elaborated in Section 3, where the motivation and advantages are discussed. Section 4 presents the application of the results to the crude oil desalting process. Finally, the conclusions are presented in Section 5.

PCA Algorithm for Fault Detection
The PCA method has been widely used for fault detection because of its simple structure and user-friendly features [30][31][32][33][34][35][36]. In this section, we briefly review the PCA method in the framework of fault diagnosis based on spectral data.
Following the standard modelling procedure, the obtained spectral data are split into two parts: the training data set and the testing data set. The training data set is used to identify a nominal model in a fault-free environment. Once the fault-free features of the process are extracted, any abnormal condition will deviate from the normal conditions and show inconsistency.
Collect all the spectral data in a matrix $X$ of dimension $n \times m$, where $n$ denotes the number of samples and $m$ the number of wavelengths. Typically, $m \gg n$. To build a PCA model, each column of $X$ is first normalized to zero mean and unit variance. Then, the PCA model can be formulated based on the normalized data as follows:
$$X = \sum_{i=1}^{R} t_i p_i^{T} + E,$$
where the number of principal components (PCs) in the model is denoted by $R$, $p_i$ is the loading vector for the corresponding PC, $t_i$ is the score vector, and $E$ represents the residual. Omitting the residual matrix $E$, the remaining part, denoted $\hat{X}$, is called the PC model. To show the uncertainties of the PC model, the covariance matrix $C$ is commonly computed by the following equation:
$$C = \frac{1}{n-1} X^{T} X = P \Lambda P^{T},$$
in which $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_R)$ is a diagonal matrix containing the eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_R$ of $C$, and $P = [p_1, p_2, \ldots, p_R]$ collects the corresponding eigenvectors with $P^{T} P = I$, where $I$ denotes the identity matrix. The cumulative percent variance (CPV) index can be used to determine the number of PCs to retain. That is, we can compute it by
$$\mathrm{CPV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{R} \lambda_i} \times 100\%.$$
To reduce the dimension of the original data, only a subset of $k$ PCs is selected. The number of PCs is determined by the percentage of the variance that should be captured; a commonly used threshold is 95%. Once the PCs are selected, we can use the Hotelling $T^2$ and SPE statistics as the fault detection indices [37]. The SPE statistic represents the change in the projection of the observed data onto the residual space at a given sampling instant.
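The CPV rule described above can be illustrated with a minimal sketch. The function name `select_num_pcs` and the eigenvalues below are hypothetical and chosen only for illustration; they are not taken from the desalting data.

```python
def select_num_pcs(eigenvalues, threshold=0.95):
    """Return (k, cpv): the smallest number of PCs whose cumulative
    share of the total variance reaches `threshold`, and that share."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues of the covariance matrix (illustrative only).
eigs = [6.0, 2.6, 1.0, 0.2, 0.2]
k, cpv = select_num_pcs(eigs, threshold=0.95)
print(k, round(cpv, 3))
```

With these numbers the first two PCs capture 86% of the variance and the first three capture 96%, so three PCs are retained under the 95% threshold.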
The $T^2$ statistic is a measure of the variation in the principal component subspace. It reflects the distance of each sampled data point from the origin of that subspace. In other words, the SPE reflects deviations of all variables in the data matrix from the PC model, whereas the $T^2$ statistic reflects the variation of the variables within the principal component subspace. For a normalized sample $x$, the $T^2$ statistic is computed as follows:
$$T_0^2 = t^{T} \Lambda_k^{-1} t = x^{T} P_k \Lambda_k^{-1} P_k^{T} x,$$
where $T_0^2$ is the monitored statistic, $k$ is the number of PCs retained, $t = [t_1, t_2, \ldots, t_k]^{T} = P_k^{T} x$ contains the scores on the selected PCs, and $\Lambda_k = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k)$. To carry out the fault diagnosis task, the control limit needs to be determined. Since the Hotelling $T^2$ follows the F-distribution, we have
$$T^2 = \frac{k(n^2 - 1)}{n(n - k)} F_\alpha(k, n - k),$$
where $T^2$ is the resulting control limit and $F_\alpha(k, n - k)$ represents the F-distribution. Here, $k$ and $n - k$ are the degrees of freedom, and $\alpha$ denotes the confidence level. The SPE statistic is expressed in terms of the residual subspace; the expression is written as
$$Q_0 = x^{T} (I - P_k P_k^{T}) x,$$
where $Q_0$ is the monitored statistic and $P_k = [p_1, p_2, \ldots, p_k]$ contains the loading vectors of the selected PCs. The control limit for SPE can be obtained as
$$Q = \theta_1 \left[ \frac{C_\beta \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}, \quad \theta_i = \sum_{j=k+1}^{m} \lambda_j^{i}, \quad h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2},$$
where $Q$ denotes the control limit, $\lambda_j$ are the eigenvalues of the covariance matrix of $E$, and $C_\beta$ is the standardized normal variable. The confidence level is $1 - \beta$, and $C_\beta$ takes the same sign as $h_0$. In simple terms, the training set is used to build the PCA model. Then, the statistics (Hotelling $T^2$ and SPE) are applied to detect deviations from normal behaviour using the control limits obtained from the training set. Finally, the new statistics are calculated for the test set and compared with the control limits to judge its status.
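As a rough illustration of how an SPE control limit can be computed from the residual eigenvalues, the sketch below assumes the widely used Jackson-Mudholkar approximation, which is consistent with the symbols $C_\beta$ and $h_0$ mentioned above but is an assumption about the exact formula intended here. The function name `spe_control_limit` and the eigenvalues are hypothetical.

```python
from statistics import NormalDist


def spe_control_limit(residual_eigenvalues, beta=0.05):
    """Jackson-Mudholkar approximation of the SPE limit at level 1 - beta.

    `residual_eigenvalues` are the eigenvalues NOT retained in the PC model.
    """
    theta1 = sum(lam for lam in residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    if h0 == 0.0:
        raise ValueError("h0 is zero; the approximation is undefined")
    # Standard normal deviate for the chosen confidence level.
    c_beta = NormalDist().inv_cdf(1.0 - beta)
    if h0 < 0:
        c_beta = -c_beta  # C_beta takes the same sign as h0
    term = (c_beta * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
            + 1.0
            + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)


# Hypothetical residual eigenvalues (illustrative only).
residual = [0.5, 0.3, 0.2]
q95 = spe_control_limit(residual, beta=0.05)
q99 = spe_control_limit(residual, beta=0.01)
print(q95 < q99)
```

A higher confidence level gives a larger limit, so fewer samples are flagged. The $T^2$ limit additionally needs an F-distribution quantile, which is not available in the Python standard library and is therefore left out of this sketch.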
Two statuses, normal and fault, are defined as follows:
$$\text{Normal: } T_0^2 < T^2 \text{ and } Q_0 < Q; \qquad \text{Fault: } T_0^2 > T^2 \text{ and } Q_0 > Q.$$
Sometimes, $T_0^2 < T^2$ with $Q_0 > Q$ (or $T_0^2 > T^2$ with $Q_0 < Q$) can also be observed. In these cases, one could also state that a fault is detected. However, for the sake of robustness, in this paper a fault is declared only when both indices exceed their thresholds simultaneously.
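The decision rule can be written compactly as below. This is only a sketch: the function name `classify_sample` and the `require_both` flag are hypothetical, and the flag is added here merely to contrast the conservative rule adopted in this paper with the alternative in which either index alone raises an alarm.

```python
def classify_sample(t2, spe, t2_limit, spe_limit, require_both=True):
    """Return "fault" or "normal" for one sample.

    require_both=True  : both T^2 and SPE must exceed their limits
                         (the rule adopted in this paper).
    require_both=False : either index exceeding its limit is enough.
    """
    t2_alarm = t2 > t2_limit
    spe_alarm = spe > spe_limit
    if require_both:
        is_fault = t2_alarm and spe_alarm
    else:
        is_fault = t2_alarm or spe_alarm
    return "fault" if is_fault else "normal"


# Illustrative values only: T^2 exceeds its limit but SPE does not.
print(classify_sample(12.0, 1.5, t2_limit=9.0, spe_limit=2.7))
print(classify_sample(12.0, 1.5, t2_limit=9.0, spe_limit=2.7,
                      require_both=False))
```

Requiring both alarms reduces false alarms, at the price of possibly missing faults that show up in only one of the two subspaces.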

Fault Isolation Based on LASSO Algorithm
Once a fault has been detected, some changes in the NIR spectra must have occurred. If we can identify which specific ranges of the spectra, or which molecular structures, vary accordingly, the fault can be located or isolated using the chemical properties of the desalting process and knowledge of NIR spectrometry. However, the dimension of the preprocessed NIR spectral data used for fault detection is high, much larger than the sample size, which makes it difficult to determine the spectral region affected by the fault. Therefore, a suitable method to deal with this issue needs to be developed [15].

Fault Isolation Using LASSO.
Suppose that an observation $x$ has been detected as faulty. Using the reconstruction method, the corresponding normal value $x^{*}$ can be expressed as follows [38]:
$$x^{*} = x - \omega \circ \upsilon,$$
where $\circ$ is the Hadamard product (if $J = \omega \circ \upsilon$, then the $i$-th element of $J$ is the product of the $i$-th elements of $\omega$ and $\upsilon$), $\omega$ is a direction vector with elements 1 or 0 representing whether or not the corresponding values are influenced by the fault, and the elements of $\upsilon$ denote the degree to which the corresponding values are influenced by the fault. The purpose of this method is to eliminate the effects of the fault by bringing $x^{*}$ as close as possible to the normal region. According to the monitoring statistics of the PCA-based fault detection approach in this paper, the above method can be formulated as the following optimization problem:
$$\min_{\omega \circ \upsilon} \; (x - \omega \circ \upsilon)^{T} P \Lambda^{-1} P^{T} (x - \omega \circ \upsilon).$$
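The effect of the reconstruction can be seen in a toy example. The sketch below assumes that the monitoring statistic takes the quadratic form $x^{T} P \Lambda^{-1} P^{T} x$; the loading vectors, eigenvalues, sample, and fault vectors are all hypothetical and chosen so that the arithmetic can be checked by hand.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [ai * bi for ai, bi in zip(a, b)]


def t2_statistic(x, loadings, eigenvalues):
    """T^2 = sum_i (p_i^T x)^2 / lambda_i over the retained PCs."""
    total = 0.0
    for p, lam in zip(loadings, eigenvalues):
        score = sum(pi * xi for pi, xi in zip(p, x))
        total += score * score / lam
    return total


# Hypothetical model: 3 variables, 2 retained PCs (illustrative only).
loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # loading vectors p_1, p_2
eigenvalues = [4.0, 1.0]

x = [6.0, 0.5, 0.1]        # faulty observation
omega = [1, 0, 0]          # only the first variable is affected
upsilon = [5.0, 0.0, 0.0]  # magnitude of the fault on each variable

x_star = [xi - fi for xi, fi in zip(x, hadamard(omega, upsilon))]
print(t2_statistic(x, loadings, eigenvalues))       # before reconstruction
print(t2_statistic(x_star, loadings, eigenvalues))  # after reconstruction
```

Removing the fault contribution from the first variable brings the statistic from 9.25 down to 0.5, that is, back towards the normal region.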

Mathematical Problems in Engineering
Since the information reflected by the NIR spectra within a certain band is the same, the dimension can be reduced by incorporating some constraints. Therefore, the fault isolation problem can be treated as a subset selection problem. Introducing the $L_1$-norm penalty, the optimization problem can be reformulated as
$$\min_{w} \; (x - w)^{T} P \Lambda^{-1} P^{T} (x - w) + \alpha \|w\|_1, \tag{12}$$
where $w = \omega \circ \upsilon$, $\|\cdot\|_1$ represents the $L_1$-norm, and $\alpha$ is the penalty factor. Consider the following linear regression model:
$$y = Z\beta + \varepsilon, \tag{13}$$
where $y$ is the response matrix, $Z$ is the predictor matrix, $\varepsilon$ is the residual matrix, and $\beta$ contains the unknown regression coefficients to be estimated. The LASSO algorithm, which introduces the $L_1$-norm penalty term into the objective function, is expressed as follows [39]:
$$\min_{\beta} \; (y - Z\beta)^{T} (y - Z\beta) + \alpha \|\beta\|_1. \tag{14}$$
This algorithm not only shrinks the regression coefficients but also selects the variables automatically. This characteristic makes it possible to handle the high dimension of the NIR spectra. Noticing the form of equation (12) and performing the Cholesky decomposition of $\Lambda^{-1}$ gives $\Lambda^{-1} = K K^{T}$. Equation (12) can then be transformed into the following form:
$$\min_{w} \; \left[ (PK)^{T} x - (PK)^{T} w \right]^{T} \left[ (PK)^{T} x - (PK)^{T} w \right] + \alpha \|w\|_1. \tag{15}$$
Comparing equations (14) and (15), LASSO can be applied to solve the reconstruction-based fault isolation problem formulated in equation (12) by setting $y = (PK)^{T} x$, $Z = (PK)^{T}$, and $\beta = w$.
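To make the mapping $y = (PK)^{T} x$, $Z = (PK)^{T}$, $\beta = w$ concrete, the following sketch solves the resulting LASSO problem for a toy model. Coordinate descent is used here only because it is short to write down; it is a swapped-in solver, not the LARS procedure used in this paper. The function names, the toy model (axis-aligned loadings, so that $K$ is diagonal), and the value of $\alpha$ are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(Z, y, alpha, sweeps=200):
    """Minimize (y - Z b)^T (y - Z b) + alpha * ||b||_1 by coordinate descent.

    Z is a list of rows; returns the coefficient list b.
    """
    n_rows, n_cols = len(Z), len(Z[0])
    beta = [0.0] * n_cols
    residual = list(y)  # equals y - Z @ beta while beta is all zeros
    col_sq = [sum(Z[r][j] ** 2 for r in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # column carries no information; keep zero
            # Correlation of column j with the residual excluding beta_j.
            rho = sum(Z[r][j] * (residual[r] + Z[r][j] * beta[j])
                      for r in range(n_rows))
            new_beta = soft_threshold(rho, alpha / 2.0) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for r in range(n_rows):
                    residual[r] -= Z[r][j] * delta
                beta[j] = new_beta
    return beta


# Toy model: P has columns e1, e2 and Lambda = diag(4, 1), so the Cholesky
# factor of Lambda^{-1} is K = diag(0.5, 1) and Z = (PK)^T is 2 x 3.
lam = [4.0, 1.0]
Z = [[lam[0] ** -0.5, 0.0, 0.0],
     [0.0, lam[1] ** -0.5, 0.0]]
x = [6.0, 0.5, 0.1]  # faulty observation
y = [sum(row[j] * x[j] for j in range(3)) for row in Z]  # y = (PK)^T x
w = lasso_coordinate_descent(Z, y, alpha=1.0)
print(w)
```

In this toy case only the first entry of $w$ is nonzero, so only the first variable is isolated as affected by the fault.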

Algorithm of LASSO.
Once the penalty factor $\alpha$ is fixed, equation (15) can be solved readily. Nevertheless, different values of $\alpha$ yield different results for $w$. Larger values of $\alpha$ produce fewer nonzero entries in $w$. In other words, as $\alpha$ increases, fewer bands are treated as the spectral regions affected by the potential fault. If $\alpha$ is too large, some bands affected by the fault may not be isolated. Conversely, if $\alpha$ is too small, some bands not affected by the fault may be selected.
Thus, a suitable procedure for selecting $\alpha$ is needed. According to the results in [40], there exists a finite sequence $\alpha_0 > \alpha_1 > \cdots > \alpha_H = 0$. For $\alpha > \alpha_0$, all values of $w_j^{\alpha}$ are shrunk to zero, where $w_j^{\alpha}$ is the $j$-th entry of $w$ calculated by solving the optimization problem in equation (12). In each interval $(\alpha_{h+1}, \alpha_h)$, the active set $\Gamma(\alpha) = \{ j : w_j^{\alpha} \neq 0 \}$ does not change, and the values $\alpha_h$ are called transition points. In other words, the first transition point isolates the value most affected by the fault, the second transition point isolates the second most affected value, and so on. Using the LARS algorithm proposed by Efron et al. in [41], the spectral region most affected by the fault can be located. The LARS algorithm is given in Algorithm 1.
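The influence of $\alpha$ on the number of selected entries can be checked in the simplest possible setting. The sketch below assumes an identity predictor matrix ($Z = I$), for which the LASSO solution reduces to soft thresholding at $\alpha/2$. This is an illustrative assumption, not the matrix used in this paper, and the response values are hypothetical.

```python
def lasso_identity_design(y, alpha):
    """Closed-form LASSO solution for Z = I.

    Each coordinate minimizes (y_j - b_j)^2 + alpha * |b_j|, which gives
    soft thresholding of y_j at alpha / 2.
    """
    half = alpha / 2.0
    solution = []
    for value in y:
        if value > half:
            solution.append(value - half)
        elif value < -half:
            solution.append(value + half)
        else:
            solution.append(0.0)
    return solution


# Hypothetical response vector (illustrative only).
y = [3.0, -1.2, 0.4, 0.05]
alphas = [0.05, 0.5, 2.0, 8.0]
counts = [sum(1 for b in lasso_identity_design(y, a) if b != 0.0)
          for a in alphas]
print(counts)
```

As $\alpha$ grows from 0.05 to 8.0, the number of nonzero entries drops from 4 to 3, 2, and finally 0, matching the qualitative behaviour described above.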
Since the effectiveness of the PCA and LASSO methods has been well documented [35,42], the effectiveness of the proposed method can be expected, and we illustrate this point with the application in Section 4.
As shown in Figure 1, the NIR spectrometer probe was inserted into the desalted oil output pipe. The NIR spectra were continuously collected during the oil desalting process for 7 days, including the fault period. Each spectrum was collected with a resolution of 2.0 cm$^{-1}$. The raw NIR spectra are shown in Figure 3. Preprocessing methods such as standard normal variate (SNV) and the first derivative are applied to eliminate the spectral shifts caused by process temperature and light source ageing [43,44]. In this study, the first derivative was used, and the preprocessed spectra are shown in Figure 4.
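The two preprocessing options mentioned above can be sketched as follows. The derivative here is a plain finite difference, which is an assumption: the exact derivative filter used in this study is not stated. The function names and the toy spectrum are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and (sample) standard deviation."""
    mu = mean(spectrum)
    sd = stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def first_derivative(spectrum, spacing=1.0):
    """First derivative by forward finite differences (one point shorter)."""
    return [(spectrum[i + 1] - spectrum[i]) / spacing
            for i in range(len(spectrum) - 1)]


# Toy spectrum and the same spectrum with a constant baseline offset.
raw = [1.0, 1.5, 2.5, 2.0]
shifted = [v + 10.0 for v in raw]

d_raw = first_derivative(raw)
d_shifted = first_derivative(shifted)
z = snv(raw)
print(d_raw, d_shifted)
```

The derivative of the shifted spectrum equals that of the original, which shows why the first derivative removes additive baseline shifts such as those caused by temperature drift.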

Fault Detection and Isolation Results.
According to the collected spectra, the spectral data of the first 3 days are normal (training set), and the spectral data of the remaining 4 days are abnormal and include the fault period (test set). The preprocessed spectral data are used for fault detection based on the PCA algorithm. Each preprocessed spectrum constitutes a row of the data matrix, and the values at each wavelength constitute its elements. The data analysis is performed using the Unscrambler 9.6 (CAMO, Oslo, Norway) and MATLAB 7.5 software (MathWorks, MA, USA). The detection results for different faulty scenarios are shown in Figures 5-8. Specifically, we test the proposed method for the case that a fault occurs at sample = 300 in Figure 5, while an abnormal (not fault) operating condition exists from sample = 300 to sample = 700 in Figure 6. In Figure 7, a fault occurs at sample = 300 and disappears at sample = 700. Conversely, in Figure 8 a normal operating condition exists only from sample = 300 to sample = 700. All these results lead to the consistent conclusion that the proposed method can be used with the NIR measurements to give satisfactory monitoring performance for the oil desalting process.
Figure 2: LARS algorithm block diagram.
Algorithm 1: LARS algorithm.
(2) Get the correlation vector $c = P \Lambda^{-1} P^{T} (x - w_j)$ and find the greatest absolute correlation value $C = \max_i \{ |c_i| \}$, where $c_i$ is the $i$-th element of $c$. Then, update the active set as $\Gamma_j = \Gamma_{j-1} \cup \{ i : |c_i| = C \}$.
(3) Calculate the equiangular vector $u_j = S_j L_j$, where $S_j = (\ldots s_i z_i \ldots)_{i \in \Gamma_j}$ for $s_i = \mathrm{sgn}(c_i)$, $z_i$ is the $i$-th column of the matrix $Z$, $L_j = (1_j^{T} (S_j^{T} S_j)^{-1} 1_j)^{-1/2} (S_j^{T} S_j)^{-1} 1_j$, and $1_j$ is a vector of ones.
(4) Get the step size $\theta_j = \min^{+}_{i \in \Gamma^{\mathrm{com}}} \{ \cdots \}$, where $\min^{+}$ indicates that the minimum is taken over only positive components within each $i$.
(5) Let $j = j + 1$ and update $w_j$ for $Z w_j = Z w_{j-1} + \theta_{j-1} u_{j-1}$.
(6) Return to Step 2 until $(x - w_j)^{T} P \Lambda^{-1} P^{T} (x - w_j) < T^2$, which also means that the monitoring statistics fall within the corresponding control limit in Section 2.
Figure 5: The detection results based on the NIR spectra data for the case that a fault happens at sample = 300.
Figure 6: The detection results based on NIR spectra data for the case that an abnormal operating condition exists from sample = 300 to sample = 700.
Furthermore, sample 300 is the earliest spectrum containing information about the fault, as shown in Figure 5. Using this sample yields more accurate results and allows earlier advice to the operators so that they can take appropriate measures. The fault isolation result for this sample (sample = 300) is obtained using the algorithm described in Section 3. By reducing the value of $\alpha$, the affected spectral regions are selected one by one. Mapping the selected variables back to the wavelength axis of the preprocessed NIR spectra, the first isolated band (black area) ranges from 1711 to 1757 nm (5845-5692 cm$^{-1}$), and this region is the first overtone of C-H stretching. The second isolated band (red area) ranges from 1408 to 1453 nm (7102-6882 cm$^{-1}$), and this region is the first overtone of O-H stretching. The significant changes in O-H stretching indicate that a water-related event may have caused the process issues. After consulting with process reliability engineers, it was found that the incident was due to the corrosion of a separator heated by hot water. The leakage of water into the oil stream caused an extremely high water concentration, which may greatly increase the chloride content and damage the process equipment quickly. This fault isolation result shows the effectiveness of the LASSO-based approach for the oil desalting process and similar processes.
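The band limits above are quoted both in wavelength and in wavenumber. The two are related by $\tilde{\nu}\,[\mathrm{cm}^{-1}] = 10^{7} / \lambda\,[\mathrm{nm}]$, and the short sketch below (with a hypothetical helper name) reproduces the quoted values.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


# Band limits quoted in the text, in nm.
bands_nm = {
    "C-H first overtone": (1711, 1757),
    "O-H first overtone": (1408, 1453),
}

bands_cm = {
    name: tuple(round(nm_to_wavenumber(nm)) for nm in limits)
    for name, limits in bands_nm.items()
}
for name, limits in bands_cm.items():
    print(name, limits)
```

Note that a longer wavelength corresponds to a smaller wavenumber, which is why each wavenumber range is written in descending order.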

Conclusions
The potential of NIR combined with the LASSO algorithm for establishing monitoring schemes for the crude oil desalting process has been investigated in this paper. The fault detection method based on PCA is adopted to monitor the process. Then, the LASSO algorithm is used to find the spectral region affected by the disturbances and thus to support root cause diagnosis. With the help of the LARS algorithm, the spectral regions affected by the fault are located in order, from the most to the least affected, so that the fault can be isolated accurately. Because changes at the molecular level can be identified earlier than their physical manifestations in the process, NIR-based monitoring is more sensitive to early failures, which allows the operators to capture the faults earlier and leaves enough time to deal with the problem. In future work, we will investigate the adaptive LASSO algorithm, which applies adaptive weights to the $L_1$-norm penalty, to improve the fault isolation performance.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Figure 7: The detection results based on NIR spectra data for the case that a fault occurs from sample = 300 to sample = 700.
Figure 8: The detection results based on NIR spectra data for the case that faults occur from sample = 0 to sample = 300 and from sample = 700 to sample = 100.