A Pinning Actor-Critic Structure-Based Algorithm for Sizing Complex-Shaped Depth Profiles in MFL Inspection with High Degree of Freedom

Magnetic flux leakage (MFL) inspection is one of the most efficient nondestructive methods for pipeline in-line inspection. Estimating the size of a defect from the MFL signal is one of its key problems. As the inspection signal is usually contaminated by noise, sizing the defect is an ill-posed inverse problem, especially when the depth is sized as a complex-shaped profile. An actor-critic structure-based algorithm is proposed in this paper for sizing complex depth profiles. By learning with more information from the depth profile without requiring the corresponding MFL signal, the proposed algorithm saves computational cost and is robust. A pinning strategy is embedded in the reconstruction process, which greatly reduces the dimension of the action space. The pinning actor-critic structure (PACS) helps to make the reward for the critic network more efficient when reconstructing depth profiles with high degrees of freedom. A nonlinear FEM model is used to test the effectiveness of the proposed algorithm under 20 dB noise. The results show that the algorithm reconstructs the depth profile of defects with good accuracy and is robust against noise.


Introduction
Magnetic flux leakage (MFL) is one of the most widely used NDT techniques and has been applied to the inspection of oil and gas pipelines since the 1960s. It is efficient in finding defects caused by corrosion and mechanical damage, as well as other metal loss defects, in pipelines and storage tanks [1][2][3]. It informs operators about the health condition of working facilities, which helps to prevent disasters to the environment, industry, and human beings caused by the leakage of explosive or dangerous chemicals. Estimating the shape of the defect is the key problem of inspection.
Though MFL is efficient in finding defects and anomalies, the reconstruction process from inspection signal to defect depth is not an easy task, as the signal is usually contaminated by sampling noise [4]. Reconstruction results with more details, such as the detailed shape of the defect rather than only length, width, and depth, are more helpful for estimating the health condition of the tested material [5]. Among length, width, and depth, depth reconstruction is the most challenging part as it is highly ill-posed. Unfortunately, reconstructing the defect shape with details makes the ill-posed inverse problem even harder to solve. The solutions of the MFL inverse problem can be classified as either non-model-based methods or model-based methods. Non-model-based methods solve this inverse problem by building a mapping between the sampled signal and the shape of the defect. Neural networks are usually used to build this mapping [6][7][8][9][10]. The input of the neural network can be the signal of MFL inspection, and the sizing information of the defect is set as the output. These methods are fast but rely heavily on the data set used to train the neural network. The accuracy is strongly affected by the quality of the training data set. A forward physical model is involved in the model-based methods. The forward model is used to give a simulated signal for a given depth profile. The simulated signal is compared with the reference signal. The residual error between the simulated signal and the reference signal provides information for the iteration strategy. By minimizing the residual error, the size of the defect is computed iteratively [11][12][13][14][15][16]. Numerical models and analytical models are the two categories of forward models for MFL. Analytical models are fast but have more limitations, as the model is derived with many simplifications, making it less accurate [17,18].
A numerical model provides accurate results, but it is computationally very costly, especially when a fine model is needed. Designing the iteration policy for a numerical model is another difficult problem. Classic methods design the policy with gradient information to minimize the residual error [19,20]. These methods usually need to assume the shape of the defect a priori. Another kind of solution uses a trained mapping to replace the numerical forward model. A novel iterative method of inversion using adaptive wavelets and a radial basis function (RBF) neural network is proposed in [5]. An RBF neural network is used as a forward model in [21]. Heuristic methods are the third kind of solution for designing the iteration policy. Han et al. proposed a particle swarm optimization method to solve this problem [22]. Li et al. proposed a modified harmony search algorithm as the iteration policy [23]. As these heuristic methods are not deterministic, they usually need a vast number of forward model evaluations.
Considering the state-of-the-art solutions, there are still some common problems in solving the problem of sizing the defects. First, for the non-model-based method, the mapping is trained according to the data without exploration to data not included in the training set. It makes the mapping highly rely on the distribution of the training data set. As the MFL inverse problem is ill-posed, the mapping from signal to defect profiles can also be troubled by the nonuniqueness of the mapping. Second, for the model-based method, the iteration strategy is designed based on the forward model in use and highly relies on it. For numerical model, it has high performance in simulating the inspection signal, but it is hard to build an iteration strategy based on it.
Though the reinforcement learning (RL) algorithm is basically a machine learning technique which needs training, its general structure is similar to the classic iteration method, which makes it possible to design an iteration strategy for a numerical forward physics model. An actor-critic structure is adopted to design the iteration strategy. The actor network gives the iteration strategy, and its performance is evaluated by the critic network, which improves the strategy given by the actor network in the coming steps. For a problem with a high-dimensional action space, the "reward" which is used by the critic network to improve the performance is not as efficient as it is for a problem with a lower-dimensional action space [30][31][32]. A pinning-based strategy is given in this paper to reduce the dimension of the action space, which helps to make the critic network more efficient. The principle of the actor-critic based structure is introduced in Section 2 along with the principle of MFL inspection. The detail of the PACS algorithm is described in Section 3. A simulated inspection signal from a nonlinear FEM model is used to test the performance of the proposed algorithm under 20 dB noise in Section 4. The conclusion is drawn in Section 5.

Physics Model.
The principle of MFL inspection is based on electromagnetic theory. By magnetizing the test material into saturation, a magnetic flux leakage can be detected by Hall-effect sensors where a defect is located. Strong permanent magnets are usually used to magnetize the testing material.
The Hall-effect sensors are usually located close to the surface of the tested material. The magnetizing and sensing principles are illustrated in Figure 1.
The principle of MFL inspection is magnetic, and Maxwell's equations can be used to describe its behavior:
$$\nabla \times \left(\frac{1}{\mu} \nabla \times A\right) = J, \quad (1)$$
where $\mu$, $J$, and $A$ represent the permeability of the media, the source current density, and the magnetic vector potential. In (1), $\mu$ is usually not a constant due to the property of the material and can be described as a function of the magnetic flux density $B$ as $\mu = \mu(B)$. The magnetic flux density which can be collected with the Hall-effect sensors is $B = \nabla \times A$. For a simple defect, the magnetic flux density $B$ is illustrated in Figure 2. The x-axis component is usually sampled as the inspection signal. There are two ways to solve the equation formulated as (1). A numerical solution such as the finite element method (FEM) is widely used to solve this partial differential equation. The other kind of solution is the dipole model, which simplifies the forward model in order to give an analytical solution [19,33,34].
As the analytical solution cannot provide enough accuracy for complex defects, numerical methods are usually used for these problems. FEM is a widely used method to obtain a numerical solution of a partial differential equation. The general process of a FEM solution is as follows. First, the partial differential equation is transformed into the corresponding variational functional equation. Then, the domain that needs to be computed is divided into a certain number of finite elements. By assembling the variational functional equations of all the elements within the domain, the solution can be obtained by solving
$$K a = f, \quad (2)$$
where $a$ represents the nodal solutions of the elements of the discrete approximation in the form of a vector, $K$ is the sparse stiffness matrix, and $f$ is the source vector containing the boundary conditions and model inputs.
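The assemble-then-solve process described above can be illustrated with a deliberately small sketch. It is not the nonlinear magnetostatic model of this paper: it assumes a one-dimensional Poisson problem, -u'' = 1 on [0, 1] with u(0) = u(1) = 0, linear elements, and a dense matrix, and the function names are hypothetical. It only shows how element contributions are assembled into K and f before K a = f is solved.

```python
def assemble_1d_poisson(n_elements, source=1.0):
    """Assemble K and f for -u'' = source on [0, 1] with u(0) = u(1) = 0.

    Toy stand-in for a FEM assembly step (assumption: linear elements,
    uniform mesh, dense storage for readability).
    """
    h = 1.0 / n_elements
    n_nodes = n_elements + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    f = [0.0] * n_nodes
    for e in range(n_elements):
        # Element stiffness (1/h) * [[1, -1], [-1, 1]], element load (source*h/2) * [1, 1].
        for a in (0, 1):
            f[e + a] += source * h / 2.0
            for b in (0, 1):
                K[e + a][e + b] += (1.0 if a == b else -1.0) / h
    # Homogeneous Dirichlet boundary conditions at both end nodes.
    for node in (0, n_nodes - 1):
        for j in range(n_nodes):
            K[node][j] = 0.0
            K[j][node] = 0.0
        K[node][node] = 1.0
        f[node] = 0.0
    return K, f


def solve_linear_system(K, f):
    """Solve K a = f by Gaussian elimination with partial pivoting."""
    n = len(f)
    A = [row[:] + [rhs] for row, rhs in zip(K, f)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (A[i][n] - sum(A[i][j] * a[j] for j in range(i + 1, n))) / A[i][i]
    return a


K, f = assemble_1d_poisson(8)
a = solve_linear_system(K, f)  # nodal solution vector
```

For this toy problem the nodal values can be checked against the analytical solution u(x) = x(1 - x)/2. A practical FEM code would use sparse storage and, for a nonlinear material law such as μ(B), an outer iteration around the linear solve.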

Complexity
Commonly, the MFL inspection model is built with the components described in Figure 1. As the sensor position is fixed relative to the magnetizing components, the defect location has to be rebuilt n times if a sequence of data with n points is sampled. This means repeating the forward model n times, which is computationally very costly. A simplified model, proposed in [14], is adopted in this paper. The simplified model is illustrated in Figure 3. As the principle of MFL inspection is to magnetize the material into saturation, a pair of parallel current layers is adopted to magnetize the testing material. The commonly used permanent magnets and yokes are removed. With this model, the principle and the nonlinear character are kept. A sampled signal with n sampling points only needs one run of the forward model, which saves a lot of computation. The region of interest (ROI) shown in Figure 3 represents the domain where the defect is going to be reconstructed by the proposed algorithm. The depth profile within the ROI consists of several subdefects, which make up a complex depth profile.

Principle of RL.
Reinforcement learning considers the paradigm of an agent interacting with its environment, aiming to learn a behavior which maximizes the reward. The agent consists of an actor network and a critic network. The actor network is trained to decide which action should be taken at the current state. The critic network evaluates each action based on its current state with the reward and improves the strategy of the actor network. There are four definitions in RL: state $x_t$ (denoted $s_t$ in the equations below), reward $r_t$, action $a_t$, and environment $E$. An agent takes action $a_t$ at the current state $x_t$, where $t$ is the discrete time-step. The action $a_t$ interacts with the environment and obtains a new state $x_{t+1}$ with a reward $r_t$, which evaluates the performance of this action. The action is defined by a deterministic policy $\pi: S \to A$, which is a mapping from states to actions. $S$ represents the state space, and $A$ represents the action space. A discounted sum of future rewards is called a return; in the standard DDPG form, its expectation is
$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[\sum_{i=t}^{T} \gamma^{\,i-t} r_i \,\middle|\, s_t, a_t\right], \quad (3)$$
where $\gamma$ is the discounting factor. It is also called the Q-function, which represents the expected return after taking an action $a_t$ in state $s_t$ and thereafter following the policy $\pi$. The critic is updated by minimizing the loss $L$ as follows:
$$L = \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2, \quad (4)$$
where
$$y_i = r_i + \gamma\, Q'\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\pi'}) \mid \theta^{Q'}\right).$$
Here $Q'$ and $\mu'$ denote the target critic and target actor networks, and $N$ is the number of sampled transitions. The actor policy is updated using the sampled gradient as follows:
$$\nabla_{\theta^{\pi}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\, a=\mu(s_i)} \nabla_{\theta^{\pi}} \mu(s \mid \theta^{\pi})\Big|_{s=s_i}. \quad (5)$$

In the problem of sizing the depth profile in MFL inspection, four parameters are involved in the reconstruction process: the reference depth profile $d_{ref}$, the reconstructed depth profile $d_t$ at time-step $t$, the reference signal $s_{ref}$, and the signal of the reconstructed depth profile $s_t$ at time-step $t$. The reference depth profile is the target of the reconstruction, which is not observable during the entire reconstruction process. The other three parameters are fully observable all the time. During training of a non-model-based method, the reference depth profile and the corresponding reference signal are used to train the mapping. The model-based method only involves the reference signal during the reconstruction process.
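The discounted return and the critic update described above can be sketched in a few lines. This is a minimal illustration under assumptions: the next-state Q values are passed in as plain numbers (in DDPG they would come from the target critic and target actor networks), and the function names are hypothetical rather than taken from the paper's implementation.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of future rewards, accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def critic_targets(rewards, next_q_values, gamma):
    """Bootstrapped targets y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})).

    next_q_values stands in for the target-network outputs (assumption).
    """
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(targets, q_values):
    """Mean squared error between the targets and the critic's current estimates."""
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / n


ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
targets = critic_targets([1.0, 2.0], [4.0, 0.0], gamma=0.5)
loss = critic_loss(targets, [1.0, 2.0])
```

A small discounting factor makes the critic focus on the immediate reward, whereas a value close to 1 weights long-horizon rewards almost equally.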
In this paper, three parameters, the reference depth profile, the reference signal, and the reconstructed depth profile, are used to train the proposed actor-critic structure-based algorithm. The involvement of the reconstructed depth profile gives more information about the depth profile space. The signal of the reconstructed depth profile is not utilized as it is computationally costly. This paper is inspired by the similarity between the training process of RL and the model-based iteration method. The similarity is illustrated in Figure 4. The iteration structure starts with an initial defect depth profile. According to the iteration strategy, a reconstructed depth profile is given. The signal of the corresponding depth profile is generated with the forward model. By comparing the signal of the reconstructed depth profile with the reference signal, a residual is obtained. This process iterates until the residual is smaller than a threshold, at which point the final reconstructed depth profile is obtained. The learning process of the actor-critic structure-based RL method proposed in this paper has similarities with the model-based solution mentioned above. The state can be the depth profile to be reconstructed, in a certain form. The agent is the strategy controlling the iteration process. The action is the output of the agent, which controls how to change the state until the termination criteria are satisfied. The strategy that controls the iteration process is learned from the data given and generated during the iteration process. This addresses the difficulty of designing an iteration strategy for numerical forward models. The performance of the strategy is evaluated by the critic network with rewards from each step. By involving the data generated during the iteration process, more data is available, which means the training data is not limited to the given data pairs of depth profile and corresponding signal.
This alleviates the problem that the non-model-based solution relies heavily on the distribution of the training data.
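The model-based iteration structure described above (initial profile, forward model, residual, update, threshold test) can be sketched as a short loop. Everything specific here is an assumption made for illustration: the forward model is a toy linear smoothing operator rather than a nonlinear FEM solver, and the iteration strategy is a plain step along the residual, which only works because the toy operator is symmetric and well behaved.

```python
def forward_model(depth):
    """Toy linear forward model: a [0.25, 0.5, 0.25] smoothing of the depth profile."""
    n = len(depth)
    signal = []
    for i in range(n):
        left = depth[i - 1] if i > 0 else 0.0
        right = depth[i + 1] if i < n - 1 else 0.0
        signal.append(0.25 * left + 0.5 * depth[i] + 0.25 * right)
    return signal


def reconstruct(reference_signal, threshold=1e-6, max_iterations=5000):
    """Iterate until the residual norm falls below the threshold."""
    depth = [0.0] * len(reference_signal)  # initial depth profile
    norm = float("inf")
    steps = 0
    for steps in range(max_iterations):
        simulated = forward_model(depth)
        residual = [s_ref - s for s_ref, s in zip(reference_signal, simulated)]
        norm = sum(r * r for r in residual) ** 0.5
        if norm < threshold:
            break
        # Stand-in iteration strategy: move the profile along the residual.
        depth = [d + r for d, r in zip(depth, residual)]
    return depth, norm, steps


reference_depth = [0.1, 0.3, 0.5, 0.3, 0.1]
reference_signal = forward_model(reference_depth)
depth, residual_norm, steps = reconstruct(reference_signal)
```

In the RL view of the same loop, the update line is what the actor network replaces: the agent outputs the change of the depth profile, and the critic scores it with the reward instead of the signal residual.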

Algorithm
In this paper, an actor-critic structure-based RL method for complex depth profile reconstruction is proposed. The Deep Deterministic Policy Gradients (DDPG) algorithm is adopted to train the actor-critic structure [35][36][37]. The definitions of the parameters for the problem of sizing the depth profile in MFL inspection are as follows. The state is defined as $s_t = (d_t, s_{ref})$, which consists of two parts, the normalized reconstructed depth profile and the reference signal. $d_t$ is the normalized reconstructed depth at time-step $t$. $s_{ref}$ is the sampled reference signal. As the signal outside the ROI carries fewer characteristics than the signal within the ROI, most of the sampling points outside the ROI are selectively removed to reduce the state dimension. The change of $d_t$ at each time-step is taken as the action. Different from the model-based method, which uses the residual between the reference signal and the signal of the reconstructed depth profile to evaluate the performance, the performance of the actor network is evaluated with the reward at each time-step. The reward is designed as the negative of the Euclidean distance between the reference depth profile and the reconstructed depth profile, as shown in the following equation:
$$r_t = -\left\lVert d_{ref} - d_t \right\rVert_2 = -\sqrt{\sum_{i} \left(d_{ref}(i) - d_t(i)\right)^2}. \quad (6)$$
As the target of this MFL inverse problem is sizing the depth profile as precisely as possible, each subdefect $d(i)$ needs to approach its corresponding reference subdefect with a small error. The complexity is therefore associated with the number of degrees of freedom of this inverse problem. With a high number of degrees of freedom, the reward that evaluates the performance of the actor network becomes less efficient, because the measurement of distance becomes less efficient in high-dimensional problems [30][31][32]. In this paper, instead of giving each subdefect an action to control its depth, a limited number of subdefects are selected to accept the action given by the actor network.
Subdefects accepting control from the actor network are called pinning subdefects in this paper. As the defects are usually caused by corrosion or mechanical damage, the difference between adjoining subdefects is usually not sharp. It is therefore possible to represent the depth profile with a few pinning subdefects, which are controlled by the action. Subdefects between two pinning subdefects are interpolated. By adopting this pinning strategy, the dimension of the action space is reduced significantly. The reward is still calculated using the full information of the depth profile. As the dimension of the action space is reduced, the measurement of distance becomes more efficient than when using the full space. The entire depth profile with pinning subdefects is illustrated in Figure 5. The flowchart of the PACS learning process within one episode is illustrated in Figure 6. The entire algorithm, including the pinning strategy and the learning process of the actor-critic structure-based reconstruction algorithm, is given in Algorithm 1.
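The pinning strategy and the reward can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the pin positions in `PIN_INDICES` are hypothetical (the actual positions are given in Figure 5), the cubic scheme is a cubic Hermite interpolation with finite-difference tangents chosen only because it fits in a few lines, and all function names are invented for the example.

```python
import math

# Hypothetical positions of 11 pinning subdefects among 49 subdefects.
PIN_INDICES = [0, 5, 10, 15, 20, 24, 28, 33, 38, 43, 48]


def interpolate_profile(pin_idx, pin_depth, n_subdefects):
    """Fill the full depth profile from the pinned depths with cubic Hermite segments.

    Assumes pin_idx is increasing, starts at 0, and ends at n_subdefects - 1.
    """
    k = len(pin_idx)
    # Finite-difference tangents at each pin (one-sided at both ends).
    tangents = []
    for j in range(k):
        lo, hi = max(j - 1, 0), min(j + 1, k - 1)
        tangents.append((pin_depth[hi] - pin_depth[lo]) / (pin_idx[hi] - pin_idx[lo]))
    profile = []
    seg = 0
    for i in range(n_subdefects):
        while seg < k - 2 and i > pin_idx[seg + 1]:
            seg += 1
        h = pin_idx[seg + 1] - pin_idx[seg]
        t = (i - pin_idx[seg]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        profile.append(
            h00 * pin_depth[seg] + h10 * h * tangents[seg]
            + h01 * pin_depth[seg + 1] + h11 * h * tangents[seg + 1]
        )
    return profile


def reward(d_ref, d):
    """Negative Euclidean distance between the reference and reconstructed profiles."""
    return -math.sqrt(sum((r - x) ** 2 for r, x in zip(d_ref, d)))


pin_depth = [0.0, 0.1, 0.25, 0.4, 0.5, 0.55, 0.5, 0.4, 0.25, 0.1, 0.0]
full_profile = interpolate_profile(PIN_INDICES, pin_depth, 49)
```

The actor only has to output 11 values per step, while `reward` is still evaluated on all 49 interpolated subdefects, which mirrors the separation between the reduced action space and the full-profile reward described above.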
From Algorithm 1, it can be seen that, within one episode, $s_{ref}$ as part of the state is combined with many depth profiles $d_t$ generated during the iteration process. This means that, in addition to the relationship between the final reconstructed depth profile and the reference signal, the depth profile space is also explored by the PACS proposed in this paper. It helps to have better

Figure 5 legend: pinning subdefects; interpolated subdefects.

Algorithm 1 (excerpt):
Generate an action $a_t = \mu(s_t \mid \theta^{\pi}) + N_t$ from the output of the actor network and the exploration noise process
(vii) Execute action $a_t$, obtain the new depth of the pinning subdefects $d^{p}_{t}$
(viii) Interpolate to get the full depth profile $d_t$ within the ROI, calculate the reward $r_t$ and the new state $s_{t+1}$
If the replay buffer $R$ is full then
(xi) Randomly sample $N$ pieces of data $(s_t, a_t, r_t, s_{t+1})$ from $R$
(xii) Update the critic network and actor network with (5)

Model and Error Definitions.
To test the accuracy and robustness of the proposed algorithm, a simplified nonlinear numerical forward model is adopted as in [14]. The detail of the forward model is illustrated in Figure 3, and the dimensions of the model can be found in Figure 7. There are 49 subdefects within the ROI. The adjoining subdefects are contiguous, and the span between the centers of two adjoining subdefects is 2 mm. 11 subdefects are selected as pinning subdefects. The position of each pinning subdefect is shown in Figure 5. Subdefects between two pinning subdefects are interpolated with cubic interpolation. The x-component of the signal sampled with a lift-off value of 1 mm above the surface is adopted as the reference signal. The current density carried in the parallel layers is 40000 kA/m² with opposite directions. The material is set as 1010 cold rolled steel. The properties of the material, including the B-H curve, can be found in [14].
In order to test the effectiveness of the algorithm proposed in this paper, three error measurements are given: root mean squared error (RMSE), peak depth error (PDE), and maximum deviation (MD). These measurements are described in (7)-(9) and illustrated in Figure 8. $d_{ref}(i)$ and $d(i)$ are the depths of the $i$th subdefect for the reference depth profile and the reconstructed depth profile, respectively. RMSE is commonly used in error measurements. PDE is the error between the maximum depths of the reconstructed depth profile and the reference depth profile. The two subdefects compared, one from the reconstructed profile and one from the reference profile, may not come from the same location within the ROI. A PDE value of 0.1 means a 1 mm error between peak depths if the wall thickness is 10 mm. MD is the maximum error between a reconstructed subdefect and the reference subdefect at the same location. The subdefects used to calculate the MD value may not be the maximum depth of either the reconstructed depth profile or the reference depth profile. An MD value of 0.1 means a 1 mm error if the wall thickness is 10 mm.
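The three error measurements can be sketched directly from their verbal definitions. The sign convention of PDE (reconstructed peak minus reference peak) is an assumption, since the text only states that PDE is the error between the two maximum depths; the function names are hypothetical, and depths are taken as normalized by the wall thickness.

```python
import math


def rmse(d_ref, d):
    """Root mean squared error over all subdefects."""
    n = len(d_ref)
    return math.sqrt(sum((r - x) ** 2 for r, x in zip(d_ref, d)) / n)


def peak_depth_error(d_ref, d):
    """Difference between peak depths; the two peaks may sit at different locations.

    Sign convention (reconstructed minus reference) is an assumption.
    """
    return max(d) - max(d_ref)


def max_deviation(d_ref, d):
    """Largest absolute error between subdefects at the same location."""
    return max(abs(r - x) for r, x in zip(d_ref, d))


d_ref = [0.1, 0.5, 0.3]
d_rec = [0.2, 0.4, 0.3]
errors = (rmse(d_ref, d_rec), peak_depth_error(d_ref, d_rec), max_deviation(d_ref, d_rec))
```

In this small example the peak depth is underestimated by 0.1 and the largest same-location deviation is also 0.1, which corresponds to 1 mm for a 10 mm wall.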

Computing Results.
The structure of the actor network and critic network of the PACS algorithm is described as follows.
There are 82 neurons in the input layer of the actor network: 11 neurons for the normalized depth vector of the pinning subdefects and 71 neurons for the normalized reference signal. The sampling positions of the signal used as part of the state are illustrated in Figure 9. There are 128 neurons in the first hidden layer and 80 neurons in the second hidden layer. The output layer has 11 neurons, which control the action for the pinning subdefects. The activation function of the output layer is set as "tanh", and all the other activation functions are set as "ReLU." The input layer of the critic network has two separate parts: one has 82 neurons, the same as the input layer of the actor network, and the other has 11 neurons for the action. Each part is connected to its own first hidden layer of 128 neurons. The second and third hidden layers have 50 neurons each. The output of the critic network is the Q value with one neuron. The number of episodes M is 5000, with T = 200 time-steps for each episode. The size of the replay buffer R is 1000000. The number of pieces of data sampled at each time-step, N, is 128. The stop criterion ε is 0.5. The soft updating parameter τ is 0.01. The discounting factor γ is 0.1.
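The actor layout described above (82 inputs, hidden layers of 128 and 80 units with ReLU, 11 tanh outputs) can be sketched as a plain forward pass. This is only an illustration: the paper's implementation uses TensorFlow, whereas the sketch below uses the Python standard library, randomly initialized weights, and hypothetical function names. The soft update with τ is shown on flat parameter lists for brevity.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (initialization scheme is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_actor(rng, sizes=(82, 128, 80, 11)):
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def actor_forward(layers, state):
    """ReLU on hidden layers, tanh on the output layer so actions lie in [-1, 1]."""
    x = state
    for idx, (weights, biases) in enumerate(layers):
        activation = math.tanh if idx == len(layers) - 1 else relu
        x = [activation(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def soft_update(target, source, tau=0.01):
    """Target parameters move a fraction tau towards the learned parameters."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


rng = random.Random(0)
actor = build_actor(rng)
state = [rng.uniform(0.0, 1.0) for _ in range(82)]  # 11 pinned depths + 71 signal samples
action = actor_forward(actor, state)
```

With these layer sizes the actor has 21835 trainable parameters, and the tanh output keeps each of the 11 depth changes bounded.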
There are 10000 randomly generated complex defects with corresponding sampled signals, generated with COMSOL Multiphysics 5.3a with MATLAB. 5000 of them are used as the training data set and the other 5000 as the testing data set. The proposed algorithm is coded with Python and TensorFlow 1.15. All computations are run on a laptop with an Intel i7-10750H processor and 16 GB RAM.
From error definitions (7)-(9), the reconstructed results can be shown from different aspects. The results are presented by selecting reconstruction results at different rankings of the MD value. The results at the 10%, 30%, 70%, and 90% rankings are shown in Figure 10. The MD value is sorted from the smallest to the largest, which means that Figure 10(a) has the smallest MD value and Figure 10(d) the largest among the results plotted in Figure 10. The corresponding values with the ranking of each error definition are listed at the bottom of each subfigure. From Figure 10, it can be seen that all the reconstructed results follow the depth profile well. The worst result in terms of PDE is in Figure 10(a), with a PDE ranking of 80.7% and a value of -0.0786. The worst result in terms of RMSE is in Figure 10(d), with an RMSE ranking of 87.2% and a value of 0.1461. These are relatively small errors, which supports the accuracy of the algorithm proposed in this paper. From the results in Figure 10, it can also be seen that the results reconstructed from the signal with 20 dB noise are close to the results reconstructed from the noise-free signal, which means the algorithm proposed in this paper is robust against noise. The signals for the corresponding reconstructed depth profiles in Figure 10 are shown in Figure 11. The noise-free signal and the signal with 20 dB noise are plotted in different colors. The values of the three error measurements are plotted in Figure 12 for the first 20 time-steps of reconstruction. They are generated from the reconstruction process of the results in Figure 10 with the 20 dB noisy signal. It can be seen from Figure 12 that all the error measurements converge in fewer than 10 time-steps. The results in Figure 12 show that the algorithm proposed in this paper converges quickly, within a limited number of steps, to the final reconstruction results.
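The text does not state how the 20 dB noise is generated. One common reading, assumed here, is additive Gaussian noise scaled so that the signal-to-noise ratio, 10·log10(P_signal / P_noise), equals 20 dB. The sketch below follows that assumption; the function names and the toy signal are hypothetical.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add Gaussian noise scaled so that the realized SNR equals snr_db."""
    signal_power = sum(s * s for s in signal) / len(signal)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    raw_power = sum(n * n for n in raw) / len(raw)
    scale = math.sqrt(target_noise_power / raw_power)
    return [s + scale * n for s, n in zip(signal, raw)]


def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its noisy version."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(42)
clean_signal = [math.exp(-((i - 35) / 10.0) ** 2) for i in range(71)]  # toy 71-point signal
noisy_signal = add_noise(clean_signal, 20.0, rng)
snr = measured_snr_db(clean_signal, noisy_signal)
```

At 20 dB the noise power is one hundredth of the signal power, so the noise amplitude is about one tenth of the signal's root-mean-square amplitude.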
To show the robustness of the algorithm proposed in this paper, the algorithm is trained with training data sets of different sizes. The size of the testing data set is 5000, which is the same as that in Figure 10. The sizes of the training data sets are set as 2000, 3000, 4000, and 5000, respectively. A 20 dB noise is also added to the testing data. The results are shown as error distributions in Figure 13. The error distributions are histograms; to keep the figure clear, they are plotted with markers in different colors instead of bars. The y-axis value of a marker represents the probability that the results fall into the corresponding span. The width of the span is equal to the distance between two adjoining markers along the x-axis. Each marker is located at the center of its corresponding span. The result is better if the error distribution is more concentrated and closer to zero. From Figure 13, it can be seen that, in contrast to the results from training with 2000 pieces of data, the results from 3000, 4000, and 5000 have similar error distributions. From Figures 13(b) and 13(c), the results from 3000 and 4000 are even slightly better than the results from 5000. The results from 2000 have a similar shape of error distribution, but the performance is clearly not as good as that from the larger training data sets in all aspects. The results in Figure 13 show that the required quantity of training data is not high and that the algorithm is robust. The actor-critic structured DDPG, the direct Gauss-Newton optimization (DGNO) in [14], and the RBF neural network based iteration (RBFNNI) in [21] are selected as representative methods to show the accuracy and robustness of the proposed algorithm. The results are shown in Figure 14, also in the form of error distributions. The meaning of the markers is the same as in Figure 13.
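The marker-style error distribution described above (one marker per span, placed at the span center, with the probability of falling into that span as its height) can be sketched as follows. The span limits, the number of spans, and the function name are hypothetical choices for illustration.

```python
def error_distribution(errors, lower, upper, n_spans):
    """Return span centers and the fraction of errors falling into each span."""
    width = (upper - lower) / n_spans
    counts = [0] * n_spans
    for e in errors:
        if lower <= e <= upper:
            # The upper edge is assigned to the last span.
            idx = min(int((e - lower) / width), n_spans - 1)
            counts[idx] += 1
    centers = [lower + (k + 0.5) * width for k in range(n_spans)]
    probabilities = [c / len(errors) for c in counts]
    return centers, probabilities


centers, probabilities = error_distribution([0.05, 0.15, 0.15, 0.35], 0.0, 0.4, 4)
```

Plotting `probabilities` against `centers` with markers for several methods on the same axes gives the kind of comparison shown in Figures 13 and 14, where a distribution concentrated near zero indicates a better result.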
From Figures 14(b) and 14(c), it can be seen that the reconstruction results from PACS clearly have the best performance, as their error distributions are more concentrated and closer to zero. From Figure 14(a), the results from DDPG and DGNO are slightly better than those from PACS. Considering the performance on MD and RMSE as well, the overall results from PACS are still better than those from these methods.

Conclusion
In this paper, a pinning actor-critic structure-based solution for sizing complex depth profiles with a high degree of freedom in MFL inspection is studied. By involving the actor-critic structure, a novel way of utilizing a fine numerical forward model in reconstructing the depth profile for MFL inspection is given. To address the reduced efficiency of the reward, which is measured as a Euclidean distance, in high-dimensional problems, a pinning strategy is given. By introducing the pinning subdefects, the action space has less variability than when every subdefect is given an action. The robustness of the reconstruction results is improved by involving PACS. The effectiveness of the proposed PACS is tested with simulation results from nonlinear numerical forward models of MFL inspection with FEM. The results, presented statistically, show the effectiveness of the proposed PACS. The depth profiles reconstructed from the signal with 20 dB noise are close to the depth profiles reconstructed from the noise-free signal, which indicates the robustness of the proposed PACS. The results also show good accuracy compared with representative solutions for depth profile reconstruction.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.