Recognition Algorithm of Acoustic Emission Signals Based on Conditional Random Field Model in Storage Tank Floor Inspection Using Inner Detector

The acoustic emission (AE) technique is often used to inspect inaccessible areas of large storage tank floors, with AE sensors placed outside the tank. For tanks with fixed roofs, the drop-back signals caused by condensation mix with corrosion signals from the tank floor and interfere with the online AE inspection. The drop-back signals are very difficult to filter out using conventional methods. To solve this problem, a novel AE inner detector, which works inside the storage tank, is adopted and a pattern recognition algorithm based on the CRF (Conditional Random Field) model is presented. The algorithm is applied to differentiate the corrosion signals from interference signals, especially drop-back signals caused by condensation. Q235 steel corrosion signals and drop-back signals were collected both in the laboratory and in the field, and seven typical AE features based on hits and frequency were extracted and then selected by mRMR (Minimum Redundancy Maximum Relevance) for pattern recognition. To validate the effectiveness of the proposed algorithm, the recognition results of the CRF model were compared with those of BP (Back Propagation), SVM (Support Vector Machine), and HMM (Hidden Markov Model). The results show that the CRF model outperforms the other methods in training speed, accuracy, and ROC (Receiver Operating Characteristic) results.


Introduction
Acoustic emission (AE) is a useful method for testing the corrosion of the floor without opening the storage tank [1][2][3][4][5]. In conventional online tank floor tests [6], sensors are fixed by magnets outside the tank wall to collect signals. However, the AE test is susceptible to outside interference, such as sand collision and external vibration. To solve this problem, newly invented AE detection equipment is adopted in tank floor inspection; it works inside the tank to collect the AE signals and thereby avoids external disturbance [7]. Meanwhile, the condition of the acoustic field inside the storage tank is complicated. The characteristics of many noise signals inside the tank are quite similar to those of the corrosion signals of the tank floor, which would seriously influence the evaluation of the tank floor.
For tanks with fixed roofs, warm gas in the tank condenses into droplets when it meets the cold roof. The droplets fall from the roof to the water/oil surface and generate interference AE signals [8]. The interference signals caused by the droplets should be filtered out to secure the accuracy of corrosion source location and the efficiency of the tank floor evaluation. Guard sensors are usually employed to shield droplet noise signals during AE tests of the tank bottom. However, the space inside the inner AE detector is small and the hardware system of guard sensors is complicated, so guard sensors are not suitable for the inner AE detector. For this reason, a specific pattern recognition algorithm is proposed to filter out the interference signals.
Pattern recognition is often applied to identify AE signals caused by different sources. In 2008, Riahi et al. [9] used an artificial neural network system to differentiate between leakage and corrosion signals in AE testing of aboveground storage tank floors. Zhang et al. [10] proposed a method to detect the leakage of a gas pipeline valve by using the AE technique, and SVM (Support Vector Machine) was applied to recognize the leak level of the valve accurately. In the field of tool wear monitoring, Zhu et al. [11], Varma and Baras [12], Zhang et al. [13], and Chen et al. [14] all used HMM (Hidden Markov Model) to recognize the different tool wear states.
In this study, an algorithm based on the CRF (Conditional Random Field) model is proposed to differentiate drop-back noise from corrosion AE signals. Seven typical AE parameters, namely amplitude, counts, duration time, rise time, true energy, average frequency, and peak frequency, are extracted to create the classifier models by CRF, BP (Back Propagation), SVM, and HMM. The results show that the CRF model is better than the other three models in training speed, accuracy, and ROC (Receiver Operating Characteristic) results.
This paper is organized as follows. Section 2 introduces the basic principles of the CRF model. The experimental setup and procedure are illustrated and the feature extraction method for the AE signals is presented in Section 3. Section 4 shows the procedure for establishing the CRF model and the results obtained by comparing CRF with the other three classifiers. Section 5 presents the results of applying the CRF model in the field experiment. The summary of the paper is given in Section 6.

CRF Model
The CRF model is a typical discriminative model which was proposed by Lafferty et al. in 2001 [15]. A CRF may be viewed as an undirected graphical model, or Markov random field, which defines a single log-linear distribution over output variable sequences given a particular input random variable [16].
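Linear chain conditional random field (LC-CRF), shown in Figure 1, is one of the most commonly used forms of the CRF model. The input random variable $X = \{x_1, x_2, x_3, \ldots, x_n\}$ and the output random variable $Y = \{y_1, y_2, y_3, \ldots, y_n\}$ denote the observation sequence and the state sequence, respectively. If the conditional probability of $Y$ given $X$ is known, $Y$ tends to satisfy the maximum global conditional probability $Y^*$; that is,

$$Y^* = \arg\max_{Y} P(Y \mid X). \tag{1}$$

In this model, for the observation data $x$, the probability of the state sequence $y$ can be represented as

$$P(y \mid x) = \frac{1}{Z(x)} \exp\left(\sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i)\right). \tag{2}$$

$Z(x)$ is a normalization factor which can be described as

$$Z(x) = \sum_{y} \exp\left(\sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i)\right), \tag{3}$$

where $t_k$ and $s_l$ are the transition and state feature functions and $\lambda_k$ and $\mu_l$ are their weights.

For the AE testing on tank floors, the features extracted from the AE signals can be viewed as the observation sequence and the signal types can be viewed as the state sequence. Then, the CRF model can be created and the signals can be classified.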
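To make the roles of the exponent in equation (2) and the normalization factor in equation (3) concrete, the following is a minimal sketch that evaluates a linear-chain CRF by brute force on a short sequence. The score functions and their weights are hypothetical stand-ins for the weighted feature sums, not parameters from this study, and exhaustive enumeration is only feasible for very short sequences.

```python
import itertools
import math

STATES = (1, 2)  # 1 = corrosion, 2 = drop-back interference


def transition_score(prev_state, state):
    """Stand-in for sum_k lambda_k * t_k; the weights are hypothetical."""
    return 0.8 if prev_state == state else -0.8


def state_score(state, observation):
    """Stand-in for sum_l mu_l * s_l; the weights are hypothetical.

    Toy rule: large observation values favour state 1, small ones state 2.
    """
    sign = 1 if state == 1 else -1
    return 0.05 * (observation - 50) * sign


def sequence_score(states, observations):
    """Exponent of equation (2) for one candidate state sequence."""
    total = 0.0
    for i, (y, x) in enumerate(zip(states, observations)):
        total += state_score(y, x)
        if i > 0:
            total += transition_score(states[i - 1], y)
    return total


def partition(observations):
    """Z(x): sum of exp(score) over every possible state sequence."""
    return sum(
        math.exp(sequence_score(states, observations))
        for states in itertools.product(STATES, repeat=len(observations))
    )


def conditional_probability(states, observations):
    """P(y | x) as in equation (2)."""
    return math.exp(sequence_score(states, observations)) / partition(observations)


def most_probable_sequence(observations):
    """Y* of equation (1), found by exhaustive search."""
    return max(
        itertools.product(STATES, repeat=len(observations)),
        key=lambda states: sequence_score(states, observations),
    )


observations = [90, 85, 20, 10]
best = most_probable_sequence(observations)
```

Because $Z(x)$ sums over all state sequences, the probabilities returned by `conditional_probability` add up to one for any fixed observation sequence.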

Experimental Preparation and Feature Extraction
3.1. Experimental Setup. The experimental system consists of a water tank, the inner AE detector, and a specimen for corrosion experiments, as shown in Figure 2. The water tank in Figure 3, with dimensions of 1.4 m × 1.4 m × 1.5 m (length × width × height), is used to simulate a storage tank in the laboratory. The inner AE detector, which is used to collect AE signals, includes AE sensors, the amplifier, the data acquisition system, and batteries (see Figure 4(a)) [17]. The detector can move itself close to the tank floor to collect AE signals, so it weakens the interference caused by external disturbance and improves the signal-to-noise ratio (SNR) compared with the conventional AE testing method on tank floors. Four AE sensors are mounted in holes on the bottom of the detector to collect AE signals. The data acquisition system, which includes processing circuits, the AD sampling card, and the PC104 computer, is placed inside the shell to sample and save the collected signals.
The specimen, shown in Figure 4(b), is corroded by acid to simulate the corrosion in a tank floor. The material of the specimen is Q235 carbon steel sheet, which is identical to the material of storage tank floors. The specimen is machined to dimensions of 180 mm × 180 mm × 5 mm (length × width × thickness) with a surface roughness of 0.02 mm. A round, hollow vessel with an inner diameter of 50 mm is fixed on the specimen with epoxy. The surface of the specimen was ground with abrasive papers from 400 grade to 2000 grade, rinsed with acetone, degreased with deionized water, and dried in air. Before the experiments, the acid was poured into the vessel, which was then sealed with a lid wrapped with a matching ribbon.

Collection of Corrosion Signals.
To collect the corrosion signals, 5 mol/L H3PO4 was used as the test solution to react with the specimen and simulate the corrosion in tanks. R15 piezoelectric AE sensors produced by Physical Acoustics Corporation (PAC), with an operating frequency range of 50-400 kHz, were used in the experiment. The gain of the charge preamplifier is set to 60 dB, and the cut-off frequencies of the analog band-pass filter are 100 kHz and 400 kHz [18,19]. The sampling rate is 3 MHz and the sampling precision is 10 bits. During the experiment, the threshold level was fixed at 35 mV, which was slightly above the previously measured background noise.
A series of experiments were conducted in the laboratory. The specimen, prepared according to the procedure described above, was placed on the tank floor 15 cm below the inner detector in the water tank, and corrosion signals were collected for about 1 hour.
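As a quick check of what these acquisition settings imply, the sketch below converts the stated values into linear quantities. It assumes that the 60 dB figure is a voltage gain (20·log10 convention), which the text does not state explicitly; the function and variable names are illustrative only.

```python
def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in dB to a linear ratio (assumes 20*log10 convention)."""
    return 10 ** (gain_db / 20)


def quantization_levels(bits):
    """Number of distinct codes produced by an ADC with the given resolution."""
    return 2 ** bits


SAMPLING_RATE_HZ = 3_000_000   # 3 MHz, from the text
UPPER_CUTOFF_HZ = 400_000      # upper band-pass cut-off, from the text

gain_ratio = db_to_voltage_ratio(60)                        # linear amplification factor
adc_levels = quantization_levels(10)                        # codes of a 10-bit converter
nyquist_hz = SAMPLING_RATE_HZ / 2                           # highest representable frequency
samples_per_cycle = SAMPLING_RATE_HZ / UPPER_CUTOFF_HZ      # samples per period at 400 kHz
```

Under this assumption the preamplifier multiplies the sensor voltage by about 1000, and the 3 MHz sampling rate leaves a Nyquist frequency well above the 400 kHz filter cut-off.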

Collection of Drop-Back Signals.
The field experiment was conducted in a new fire-resistant water tank in good condition. The diameter of the tank is 6 m and the height is 10 m. The experimental settings were the same as those in the lab test. The temperature in the tank during the experiment was 23 °C, while the outside temperature was −15 °C. The drop-back signals were abundant because of the temperature difference between the warm gas in the tank and the cold fixed roof of the tank. After the environmental noise level was measured, the threshold was set higher than the background noise. During the experiment, drop-back signals were collected without the corroded specimens (see Figures 5 and 6).

mRMR is a feature selection method proposed by Peng et al. [22]. It seeks the features with the highest relevance to the target class while keeping the redundancy among the selected features at a minimum. With $D$ denoting the relevance and $R$ the redundancy, the operator $\Phi(D, R)$ is defined to combine $D$ and $R$, and the following simplest form is considered to optimize $D$ and $R$ simultaneously:

$$\max \Phi(D, R), \quad \Phi(D, R) = D - R. \tag{6}$$

The result of formula (6) is called the Mutual Information Difference (MID) and it is used to rank features. In practice, the seven features are ranked by mRMR as follows: F7, F4, F1, F3, F2, F6, F5. The first 4 features (peak frequency, rise time, amplitude, and duration time) are taken as the optimal feature set to train and recognize the samples.
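The MID ranking can be illustrated with a small sketch. It assumes discrete (already binned) feature values and uses the common greedy search in which features are added one at a time by maximizing relevance minus mean redundancy with the features already chosen; the text does not state which search procedure was used, and the toy features `A`, `B`, and `C` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Discrete estimate of I(x, y) = sum p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def mrmr_mid_ranking(features, labels):
    """Greedy MID ranking: repeatedly pick the feature that maximises D - R."""
    remaining = list(features)
    ranked = []
    while remaining:
        def mid(name):
            relevance = mutual_information(features[name], labels)
            if not ranked:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[chosen])
                for chosen in ranked
            ) / len(ranked)
            return relevance - redundancy

        best = max(remaining, key=mid)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical toy data: B duplicates A, C is weaker but carries new information.
labels = [1, 1, 1, 1, 2, 2, 2, 2]
features = {
    "A": [0, 0, 0, 1, 1, 1, 1, 1],
    "B": [0, 0, 0, 1, 1, 1, 1, 1],
    "C": [0, 1, 0, 0, 1, 1, 0, 1],
}
ranking = mrmr_mid_ranking(features, labels)
```

In this toy case the duplicate feature is ranked last even though its relevance equals that of the first pick, which is the behaviour the redundancy term is meant to produce.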

Classification Results and Discussions
In this section, the CRF, BP, SVM, and HMM classifier models are each used to distinguish the corrosion signals from the interference signals based on the extracted features. The results of the different models are then compared and discussed.

Establishment of CRF Classification Model.
In the LC-CRF model, the feature vectors of input sequences are required to be positive integers. Thus, the extracted features are normalized to the range 1 to 101 and used as the observation sequences. The states of the samples for corrosion and drop-back interference are labeled as 1 and 2, respectively. The application of the LC-CRF model includes two steps: training and recognition. The features of the training samples are used to calculate the model parameters $(\lambda, \mu)$. The conditional probability model is obtained by maximum likelihood estimation, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is used to find the optimal parameters of the model. Here, the initial model parameters are set to 0 and the convergence tolerance is 0.0001. During recognition, the features of the test samples are taken as input variables and the state sequences are calculated by the Viterbi algorithm using the model obtained in the first step. The method is summarized by the flowchart in Figure 7.
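The two recognition-side steps described above, mapping features to integers and decoding with the Viterbi algorithm, can be sketched as follows. Min-max scaling is assumed for the normalization (the text gives only the target range), and the score functions are hypothetical placeholders for a trained model rather than the parameters obtained in this study.

```python
def quantize(values, low=1, high=101):
    """Min-max scale raw feature values to integers in [low, high] (assumed scheme)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    span = high - low
    return [low + round((v - vmin) / (vmax - vmin) * span) for v in values]


def viterbi(observations, states, state_score, transition_score):
    """Return the highest-scoring state sequence under additive (log-domain) scores."""
    best = {s: state_score(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + transition_score(p, s))
            new_best[s] = best[prev] + transition_score(prev, s) + state_score(s, obs)
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def toy_state_score(state, observation):
    """Hypothetical learned score: high values favour state 1 (corrosion)."""
    sign = 1 if state == 1 else -1
    return 0.05 * (observation - 51) * sign


def toy_transition_score(prev_state, state):
    """Hypothetical learned score: staying in the same state is favoured."""
    return 0.8 if prev_state == state else -0.8


raw_feature = [0.9, 0.85, 0.2, 0.1]          # made-up raw values of one feature
observations = quantize(raw_feature)          # integers between 1 and 101
decoded = viterbi(observations, (1, 2), toy_state_score, toy_transition_score)
```

Unlike exhaustive enumeration, the Viterbi recursion costs time proportional to the sequence length times the square of the number of states.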

Establishment of BP, SVM, and HMM.
As stated, BP, SVM, and HMM are commonly used methods for the classification of AE signals. To allow a fair comparison of the recognition results, the three classifiers use the same training and test data as LC-CRF. The BP, SVM, and HMM models are established as follows.
BP is a common method of training artificial neural networks. The structure of a typical BP classifier is shown in Figure 8. To design a BP model, the following parameters should be determined: the function of the output layer, the function of the hidden layer, the learning rate, and the number of hidden layer nodes. In the tests, "" function is selected as the activation function of the hidden layer, and "" function is chosen as the transfer function of the output layer. The weights are tuned by gradient descent. The number of hidden layer nodes, set to 14 in this test, was twice the number of input nodes. In addition, the learning rate is 0.01.
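A minimal sketch of such a network is given below. The hidden-layer and output-layer function names are not recoverable from the text, so a tanh hidden layer and a linear output are assumed here purely for illustration; the 14 hidden nodes and the learning rate of 0.01 follow the text, while the toy samples, the number of epochs, and the per-sample update scheme are assumptions.

```python
import math
import random


def train_bp(samples, targets, n_hidden=14, learning_rate=0.01, epochs=300, seed=0):
    """Train a one-hidden-layer network by per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        # Assumed activations: tanh in the hidden layer, linear output.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)
        ]
        return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(samples, targets):
            hidden, y = forward(x)
            err = y - t
            total += 0.5 * err * err
            for j in range(n_hidden):
                # Gradient w.r.t. the hidden pre-activation, using the old w2[j].
                grad_pre = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= learning_rate * err * hidden[j]
                b1[j] -= learning_rate * grad_pre
                for i in range(n_in):
                    w1[j][i] -= learning_rate * grad_pre * x[i]
            b2 -= learning_rate * err
        losses.append(total / len(samples))
    return (lambda x: forward(x)[1]), losses


# Hypothetical normalized feature vectors; targets use the labels 1 and 2.
samples = [
    [0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.8, 0.7], [0.7, 0.7, 0.9, 0.8], [0.9, 0.9, 0.8, 0.8],
    [0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.3, 0.2], [0.3, 0.2, 0.2, 0.1], [0.1, 0.1, 0.2, 0.2],
]
targets = [1, 1, 1, 1, 2, 2, 2, 2]
predict, losses = train_bp(samples, targets)
```

The iterative nature of this loop is what makes BP training time depend on the learning rate and the number of iterations.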
The SVM relies on the central concept of a kernel for a number of tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by using different kernel functions and base algorithms. The structure of SVM is shown in Figure 9, where $K$ is the kernel function. Three parameters need to be determined to design an SVM model: the kernel function, the cost ($C$), and the gamma ($\gamma$). In this paper, the classical RBF kernel function is chosen, and the parameters $C$ and $\gamma$ are determined by fivefold cross-validation. In that case, the optimal $C$ is 2 and $\gamma$ is 22.627.

HMM is composed of a Markov chain and a stochastic process. The Markov chain corresponds to the state sequence, which is described by $\pi$ and $A$. The stochastic process corresponds to the observation sequence, which is described by $B$. So an HMM model can be described as

$$\lambda = (N, M, \pi, A, B), \tag{7}$$

where $N$ is the number of states of the Markov chain and $M$ is the number of possible observed values in each state. $A$ is the state transition probability matrix of size $N \times N$ and $B$ is the probability matrix of the observed values, of size $N \times M$. $\pi$ is the initial probability distribution vector of length $N$. So $N$ and $M$ must be fixed to establish an HMM classifier. The values of $N$ and $M$ are set to 6 and 8, respectively, by trial and error. The model parameters are calculated by the Baum-Welch algorithm with a convergence tolerance of 0.0001.
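To show how the HMM parameters are used at recognition time, the sketch below evaluates the likelihood of an observation sequence with the forward algorithm. It assumes the usual scheme of one HMM per signal class with the label given by the larger likelihood, which the text does not spell out, and it uses small hypothetical matrices (2 states, 3 symbols) instead of the 6 states and 8 symbols of the study.

```python
def forward_likelihood(observations, pi, A, B):
    """P(O | model) of a discrete HMM by the forward algorithm.

    Observation symbols are 0-based column indices into B.
    """
    n_states = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical per-class models; real parameters would come from Baum-Welch training.
MODELS = {
    "corrosion": {
        "pi": [0.5, 0.5],
        "A": [[0.9, 0.1], [0.1, 0.9]],
        "B": [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]],
    },
    "drop-back": {
        "pi": [0.5, 0.5],
        "A": [[0.8, 0.2], [0.2, 0.8]],
        "B": [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]],
    },
}


def classify(observations):
    """Pick the class whose HMM assigns the sequence the highest likelihood."""
    return max(MODELS, key=lambda name: forward_likelihood(observations, **MODELS[name]))


label_a = classify([2, 2, 1, 2])
label_b = classify([0, 0, 1, 0])
```

Each emission probability in `B` depends only on the current hidden state, which is the independence assumption that distinguishes HMM from the CRF model.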

Results and Discussions.
To verify whether the first 4 features are the optimal feature set, samples are also trained and tested by the CRF model using the first 3, the first 5, and all seven features, respectively. The accuracy rates are shown in Table 3. Using the first 4 or the first 5 features gives an accuracy rate of 100%, higher than the other two feature sets. So the first 4 features (peak frequency, rise time, amplitude, and duration time) are selected as the optimal feature set to train and recognize the samples. Using the same training and test sets, the recognition results of the CRF model and the other three algorithms (BP, SVM, and HMM) are compared on the same PC (Core 2 Duo E6300 with 3.2 GB of memory). The results are compared in terms of training time, accuracy, and the ROC (Receiver Operating Characteristic) curve. The maximum training time and accuracy rate are shown in Table 4.
It shows that the accuracy rate of the CRF is higher than those of the BP, SVM, and HMM models and that the training time of the CRF is the shortest. The gradient descent algorithm is used to adjust the parameters of BP, so it needs to iterate to reach the optimal parameters. Moreover, the maximum number of iterations, the learning rate, and the number of hidden layer nodes are often determined by experience or by trial and error. So the training speed and the accuracy rate of BP are lower and it is difficult to obtain the optimum network. The training speed and accuracy of SVM are higher than those of BP and HMM, but SVM is more suitable for small-sample situations than for AE testing, which is a large-sample situation. HMM is widely employed in many fields, but one of its disadvantages is that the model assumes that the observation value at one point depends only on the state of the Markov chain at that time and that the observations are independent of each other, while the features of
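The accuracy rate and the ROC curve used in this comparison can be computed as in the following sketch. The labels and classifier scores are made-up illustrative values, not results from this study, and the area under the curve is obtained with the trapezoidal rule.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def roc_points(true_labels, scores, positive):
    """(FPR, TPR) pairs obtained by sweeping a decision threshold over the scores."""
    n_pos = sum(t == positive for t in true_labels)
    n_neg = len(true_labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and t == positive for s, t in zip(scores, true_labels))
        fp = sum(s >= threshold and t != positive for s, t in zip(scores, true_labels))
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical example: label 1 = corrosion (positive class), 2 = drop-back.
true_labels = [1, 1, 1, 2, 2, 2]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]          # made-up scores for class 1
predicted = [1 if s >= 0.5 else 2 for s in scores]

acc = accuracy(true_labels, predicted)
area = auc(roc_points(true_labels, scores, positive=1))
```

An area close to 1 means the classifier ranks almost every positive sample above every negative one, independent of the threshold chosen.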

Application of CRF Model in Field Experiments
In Figure 12, the corrosion signals were collected in the laboratory and the quantity of AE hits (corrosion signals) varies with time. It is observed that the corrosion process can be divided into 4 zones. At the beginning (Zone 1), the phosphoric acid began to react with the steel plate. Because of the large contact area and high hydrogen ion concentration, the quantity of AE hits increased fast. Then, the hydrogen created during the reaction accumulated on the surface of the plate and formed bubbles, so the contact area decreased (Zone 2). As the reaction progressed, the bubbles merged into large bubbles and then burst. The acid came into full contact with the steel plate again, and the reaction rate and the growth rate of AE hits increased dramatically (Zone 3). As the concentration of hydrogen ions fell, the acid reacted with the steel plate more slowly than before and the quantity of AE hits grew slowly (Zone 4).
Figure 13(a) shows the relation between the quantity of AE hits and time before classification by the CRF model; it is almost linear and does not reflect the statistical law of the corrosion tests. After classification by the CRF model, the curve in Figure 13(b) also has 4 zones with the same characteristics as in Figure 12. There is a subtle difference between the turning points of the zones on the time axis in Figure 13(b) and those in Figure 12 because the setup time for the inner detector to start collecting signals in the field test was a little longer than in the laboratory test. The result shows that the data processed using the CRF model reflect the statistical law of the corrosion test and that the CRF model performs well in the field test application.
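The cumulative hit curves discussed here can be produced from time-stamped, classified hits as in the sketch below. The hit times, labels, and bin width are hypothetical; only the idea of counting hits cumulatively over time, before and after keeping the corrosion class, follows the text.

```python
def cumulative_hits(hit_times_s, labels, bin_width_s, duration_s, wanted_label=None):
    """Cumulative hit count at the end of each time bin.

    If wanted_label is None every hit is counted (unclassified curve);
    otherwise only hits with that label are counted (classified curve).
    """
    n_bins = int(duration_s // bin_width_s)
    counts = [0] * n_bins
    for t, label in zip(hit_times_s, labels):
        if 0 <= t < duration_s and (wanted_label is None or label == wanted_label):
            counts[min(int(t // bin_width_s), n_bins - 1)] += 1
    curve, running = [], 0
    for c in counts:
        running += c
        curve.append(running)
    return curve


# Hypothetical hits: arrival time in seconds and CRF label (1 = corrosion, 2 = drop-back).
hit_times_s = [5, 20, 70, 130, 140, 200]
labels = [1, 2, 1, 1, 2, 1]

all_hits = cumulative_hits(hit_times_s, labels, bin_width_s=60, duration_s=240)
corrosion_only = cumulative_hits(hit_times_s, labels, bin_width_s=60, duration_s=240, wanted_label=1)
```

Plotting the two lists against the bin end times gives the analogue of the unclassified and classified curves.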

Figure 1: Structure of linear chain conditional random field.

Figure 2: Diagram of the experimental system.

Figure 3: The water tank in the laboratory.

3.2.3. Collection of Mixed Signals. After the drop-back signals were collected in the field water tank, the corroded specimen was placed at the same position as in the lab test, and both the corrosion signals and the interference signals were acquired. During the one-hour experiment, 7475 groups of AE signals were collected for further analysis.

3.3. AE Feature Extraction and Sample Set. The AE feature parameters represent the characteristics of the corrosion signals, and seven typical feature parameters of AE signals are extracted to build the classification model [20, 21]. The features consist of five hit-based features, one comprehensive feature, and one frequency feature, as shown in Table 1. To realize the classification by pattern recognition, 260 groups of corrosion signals and 260 groups of drop-back interference signals were selected as samples to establish the classification model. The signals of each type were randomly divided into 2 sets: 200 groups were used as the training set and the other 60 groups were used as the test set. The composition of the training set and the test set is listed in Table 2.
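As an illustration of how hit-based features can be derived from a sampled waveform, the sketch below computes four of them using the conventional AE definitions (counts as positive-going threshold crossings, duration from first to last threshold exceedance, rise time from first crossing to peak). These definitions and the toy waveform are assumptions; the text does not give the exact extraction procedure used by the acquisition system. The default sampling rate and threshold follow the values stated in the experimental setup.

```python
def extract_hit_features(waveform_mv, sampling_rate_hz=3_000_000, threshold_mv=35.0):
    """Return amplitude, counts, duration and rise time of one AE hit, or None.

    waveform_mv is a list of samples in millivolts for a single hit.
    """
    crossings = [
        i for i in range(1, len(waveform_mv))
        if waveform_mv[i - 1] < threshold_mv <= waveform_mv[i]
    ]
    if not crossings:
        return None  # the signal never rises through the threshold
    above = [i for i, v in enumerate(waveform_mv) if v >= threshold_mv]
    first, last = crossings[0], above[-1]
    peak_index = max(range(first, last + 1), key=lambda i: waveform_mv[i])
    dt_us = 1e6 / sampling_rate_hz  # sample spacing in microseconds
    return {
        "amplitude_mv": waveform_mv[peak_index],
        "counts": len(crossings),
        "duration_us": (last - first) * dt_us,
        "rise_time_us": (peak_index - first) * dt_us,
    }


# Hypothetical decaying burst, sampled at 1 MHz for easy arithmetic.
toy_waveform = [0, 10, 40, -40, 80, -80, 50, -50, 20, -20, 0]
features = extract_hit_features(toy_waveform, sampling_rate_hz=1_000_000)
```

Frequency-based features such as peak frequency would additionally require a spectrum estimate of the same waveform, which is outside this sketch.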

Figure 5: The fire-resistant water tank in field test.

Figure 6: Testing in the field.
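The maximum relevance and minimum redundancy criteria of mRMR are

$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i, c), \qquad \min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i, x_j), \tag{4}$$

where $S$ is the initial feature set and $c$ is the target class set, $I(x_i, c)$ is the mutual information of feature $x_i$ and class $c$, $D$ is the mean value of all mutual information values between individual feature $x_i$ and class $c$, $I(x_i, x_j)$ is the mutual information of features $x_i$ and $x_j$, and $R$ means the mutual information between different features. Mutual information is defined in terms of probability density functions; given two random variables $x$ and $y$,

$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy. \tag{5}$$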

Figure 7: Flowchart of the identification process based on CRF.

Figure 10: The ROC curve of the recognition results for corrosion signals.

Figure 11: The ROC curve of the recognition results for drop-back interferences.

Figure 12: The relation curve of AE hits varying with time in laboratory test.

Figure 13: The relation curve of AE hits varying with time in field test. (a) Signals unclassified by CRF. (b) Signals classified by CRF.

Drop-back signals, which are caused by condensation in storage tanks with fixed roofs, are a major problem in online AE inspection of storage tank floors. In this paper, a new inner AE detector and a recognition algorithm based on the CRF model are applied to differentiate corrosion AE signals from drop-back interference. The AE parameters amplitude, counts, duration time, rise time, true energy, average frequency, and peak frequency were selected as feature parameters for recognition. Experiments were carried out in water tanks both in the laboratory and in the field to collect corrosion AE signals and drop-back signals. The recognition results of CRF are compared with those of the other 3 models, BP, SVM, and HMM. The comparisons of the accuracy, the training speed, and the AUC of the ROC curve show that the CRF outperforms the other three models in recognizing corrosion signals and drop-back interference signals.
In equations (2) and (3), $t_k(y_{i-1}, y_i, x, i)$ is a transition feature function of the entire observation sequence and the states at positions $i$ and $i-1$ in the state sequence; $s_l(y_i, x, i)$ is a state feature function of the state at position $i$ and the observation sequence; $\lambda_k$ and $\mu_l$, which need to be estimated from training data, denote the weight values of the transition feature function and the state feature function, respectively.

Table 1: Seven typical characteristic parameters of corrosion AE signals.

Table 2: Constitution of training set and test set.

Table 3: Classification results using different features.
In the last step of the field experiment, the inner detector collected both corrosion signals and drop-back noise. During the one-hour experiment, 7475 groups of AE signals were collected and classified using the CRF model: 1105 groups were classified as corrosion signals and the other 6370 groups were identified as drop-back interference. The quantity of corrosion signals is approximately equal to the quantity of corrosion signals collected in the laboratory over the same duration. To test the effect of the CRF model in the field environment, a statistical analysis method was used to compare the results. The relation curves, in which the cumulative quantity of signals varies with time, were obtained and are shown in Figures 12 and 13.