Welding Diagnostics by Means of Particle Swarm Optimization and Feature Selection

In a previous contribution, a welding diagnostics approach based on plasma optical spectroscopy was presented. It consisted of the employment of optimization algorithms and synthetic spectra to obtain the participation profiles of the species participating in the plasma. A modification of the model is discussed here: on the one hand the controlled random search algorithm has been substituted by a particle swarm optimization implementation. On the other hand a feature selection stage has been included to determine those spectral windows where the optimization process will take place. Both experimental and field tests will be shown to illustrate the performance of the solution that improves the results of the previous work.


Introduction
Welding processes play an important role in today's industry, as they are employed in a wide range of industrial scenarios. Typical examples include the fabrication of heavy components for nuclear power stations (e.g., steam generators), automobiles, aeronautic engines, and tubes for different energy applications or civil engineering. In some of these applications the demands in terms of welding quality are very strict: porosity produced during the tube-to-tubesheet welding process of a steam generator is a good example in this regard.
One of the main problems faced by engineers in the early stages of defining a specific welding procedure is the complexity of the physics involved in the process [1,2]. Although both theoretical and experimental works have been attempted, experience indicates that determining the optimal parameters for a given scenario requires preliminary studies in the laboratory and, afterwards, welding trials on coupons to verify the predicted behavior. In spite of all these efforts, defects can still appear during the process even if all the variables are carefully controlled. This implies the use of both destructive and nondestructive evaluation techniques to examine the resulting seams and to verify that they comply with the required standards.
Different monitoring approaches have been proposed for both laser and arc-welding processes based on the use of electric [3][4][5], acoustic [6,7], and optical sensors [8,9]. Industrial cameras within the visible range have also been employed, typically with the aid of filters and illumination sources [10,11], and infrared thermography has also been used for both online and offline inspection [12,13]. Among all these alternatives, only the first has been seriously commercialized, as it allows a reliable process window to be established. However, some defects, such as the presence of spurious materials in the joint, cannot be detected with this approach. Apart from the sensor technology chosen, considerable effort has also been devoted to processing strategies designed for defect detection [14,15] and classification [16][17][18].
Plasma optical spectroscopy has also been studied for its application in welding diagnostics [19,20], and it is currently one of the most promising solutions in this area. The immunity of the optical fiber to the strong electromagnetic interference generated during the process, the robustness of the spectroscopic analyses of the different species found in the plasma, and the possibility of identifying spurious materials in the weld pool are some of its most relevant advantages. The typical approach when using plasma spectroscopy in this context has been the determination of the plasma electronic temperature by means of two or more emission lines of the same species [19][20][21][22]. However, this approach has its limitations, such as the uncertainty in the identification of the plasma emission lines, which has led to the exploration of other monitoring parameters [23][24][25].
Recently, new analysis alternatives have been proposed: for example, Sibillano et al. [26] introduced the so-called Covariance Mapping Technique to the analysis of the plasma dynamics in laser welding, and Groslier et al. [27] studied the application of pitch analysis to the voltage and current signals of a lap-welding (MIG-MAG) process. Another method is based on the generation of synthetic spectra to be matched to real experimental data by means of optimization algorithms [28]. In this way the resulting participation profiles of the chosen species showed a clear correlation with the quality events. However, some issues have given rise to a revision and improvement of the previous model. On the one hand, the optimization algorithm previously selected, the CRS6 (Controlled Random Search-6), has been substituted by a simple implementation of the PSO (Particle Swarm Optimization) [29]. On the other hand, the Ar II profiles exhibited a lack of sensitivity to some defects in some of the experimental results discussed in [28], which was supposed to be related to the use of the relative intensities from the NIST [30] local database to generate the synthetic spectra. To solve this problem, a feature selection algorithm has been included in the model to select a narrower spectral range where the optimization will take place.

Spectroscopic Monitoring Parameters for Online Welding Inspection
The plasma electronic temperature $T_e$ is the spectroscopic parameter commonly employed for monitoring in this framework. Although a more precise estimation of this temperature can be obtained by means of the Boltzmann plot [31], this solution, which requires several emission lines and an additional regression process, is typically substituted by a simplified expression based on two emission lines [19]:

$$T_e = \frac{E_m(2) - E_m(1)}{k \ln\left[\dfrac{I(1)\,A(2)\,g(2)\,\lambda(1)}{I(2)\,A(1)\,g(1)\,\lambda(2)}\right]}, \tag{1}$$

where $E_m$ is the upper level energy, $k$ the Boltzmann constant, $I$ the emission line intensity, $A$ the transition probability, $g$ the statistical weight, and $\lambda$ the wavelength associated with its corresponding emission line; the indices 1 and 2 identify the two lines. For the particular case of arc-welding, (1) varies, including in the logarithm of the denominator the quotient between the emission line upper level energies [32].
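As a rough illustration of the two-line temperature estimate discussed above, the following Python sketch evaluates the ratio of upper level energy difference to the logarithm of the intensity ratio. The line parameters are made-up placeholders (not NIST values), the function and field names are hypothetical, and the intensities are synthesized from an assumed Boltzmann population only to check that the chosen temperature is recovered.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_intensity(line, temperature_k):
    """Relative line intensity for a Boltzmann-populated upper level (assumed model)."""
    return (line["A"] * line["g"] / line["wl"]
            * math.exp(-line["E"] / (K_B_EV * temperature_k)))


def electron_temperature(line1, i1, line2, i2):
    """Two-line estimate of T_e in kelvin from the measured intensities i1 and i2."""
    ratio = (i1 * line2["A"] * line2["g"] * line1["wl"]) / (
        i2 * line1["A"] * line1["g"] * line2["wl"]
    )
    return (line2["E"] - line1["E"]) / (K_B_EV * math.log(ratio))


# Placeholder line data (assumed values, NOT taken from any database):
# A in 1/s, g dimensionless, wl in nm, E (upper level energy) in eV.
LINE_1 = {"A": 1.0e7, "g": 4, "wl": 460.0, "E": 19.0}
LINE_2 = {"A": 2.0e7, "g": 6, "wl": 488.0, "E": 21.0}

if __name__ == "__main__":
    t_true = 10000.0
    i1 = boltzmann_intensity(LINE_1, t_true)
    i2 = boltzmann_intensity(LINE_2, t_true)
    print(electron_temperature(LINE_1, i1, LINE_2, i2))
```

With real data the two intensities would come from background-corrected peaks of the measured spectrum, and the estimate becomes sensitive to any misidentification of the lines, which is the limitation noted in the text.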
The appearance of defects is related to the occurrence of perturbations in the $T_e$ profile. Although the correlation between this spectroscopic parameter and the quality of the seams has been proved [19][20][21][22], some issues, such as the selection of the emission lines used in the $T_e$ estimation, have led to the investigation of alternative approaches.
The analysis of the wavelength associated with the maximum intensity of the plasma continuum radiation [23], the plasma RMS signal [25], and the line-to-continuum method used with a feature selection algorithm [24] are some solutions that have been recently investigated. A completely different approach was suggested in [28], where a model based on the determination of the so-called participation profiles of the plasma species was built by generating synthetic spectra and, afterwards, using optimization algorithms to try to match the real welding spectra. The synthetic spectra are created after the identification of the most significant species participating in the process, employing a local copy of a database with spectroscopic information on the required elements. Both central wavelengths and relative intensities are used in this process, but the latter give rise to convergence problems in the optimization stage if a wide spectral range is considered. This problem was identified in [28], as Ar II, the predominant species in our scenario in the wavelength range under analysis (195-535 nm), did not show the expected response to some defects, while other profiles (Fe I, Mn I, Ar I) allowed correct flaw detection.
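The idea of a synthetic spectrum weighted by participation values can be sketched in a few lines of Python. This is only an illustrative sketch: the Gaussian line shape, the squared-error mismatch, the line positions, and the relative intensities are all assumptions introduced here, not the line data or cost function of the original model.

```python
import math


def synthetic_spectrum(wavelengths, species_lines, participation, width_nm=0.5):
    """Sum of Gaussian lines per species, scaled by each species' participation."""
    spectrum = [0.0] * len(wavelengths)
    for species, lines in species_lines.items():
        weight = participation[species]
        for center_nm, relative_intensity in lines:
            for i, wl in enumerate(wavelengths):
                shape = math.exp(-0.5 * ((wl - center_nm) / width_nm) ** 2)
                spectrum[i] += weight * relative_intensity * shape
    return spectrum


def mismatch(measured, synthetic):
    """Sum of squared differences; an optimizer would minimize this value."""
    return sum((m - s) ** 2 for m, s in zip(measured, synthetic))


# Placeholder line list: (central wavelength in nm, relative intensity).
# These numbers are illustrative only and are not database values.
LINES = {
    "Ar II": [(476.5, 1.0), (480.6, 0.7)],
    "Ar I": [(420.1, 0.4)],
}

if __name__ == "__main__":
    grid = [470.0 + 0.1 * i for i in range(151)]
    measured = synthetic_spectrum(grid, LINES, {"Ar II": 0.8, "Ar I": 0.2})
    trial = synthetic_spectrum(grid, LINES, {"Ar II": 0.5, "Ar I": 0.5})
    print(mismatch(measured, trial))
```

In such a scheme the participation values are the unknowns of the optimization, and one set of values is obtained for every spectrum captured along the seam.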
A possible solution to this issue lies in the definition of narrower spectral windows where the optimization process and, consequently, the generation of the participation profiles will be performed. Obviously, this raises the question of which spectral ranges are most suitable in terms of defect detection. A similar problem was studied in [24], where a feature selection algorithm (SFFS) [33] was used to determine the emission lines that are most discriminating in terms of defect detection. Results showed a high dependency between the selected spectral band and the associated output monitoring profile. Apart from this modification, a simple implementation of the PSO (Particle Swarm Optimization) algorithm will be used instead of the original CRS6, as it will be shown that the former exhibits an improved computational performance.

Modifications to the Original Model
3.1. Optimization Algorithm: PSO. In the original implementation (see Figure 1) a controlled random search algorithm, the CRS6, was employed to perform the optimization stage. A natural evolution of the model lies in the inclusion of a better algorithm in terms of the computational performance of the whole solution. In this regard, it is worth mentioning that this model was not originally intended for real-time analysis, but rather to better understand the dynamics of the different species within the plasma and their behaviour when different defects appear in the welding process. However, it could be used as a support for other spectroscopic approaches for online monitoring (e.g., in feasibility studies), which justifies the search for more efficient implementations.
After some initial studies the PSO was chosen as a good candidate, given its simplicity and widespread use in several scenarios. In the field of welding some authors have chosen PSO algorithms to optimize key welding parameters [34] or for the training stage of neural networks [35]. PSO was originally proposed in 1995 [29] and is inspired by the social behaviour of bird flocking and fish schooling; it has undergone many changes since its original formulation, with new versions and applications. Apart from those already mentioned, typical fields of application for PSO have been image and video analysis, antenna design, and power generation and systems, to mention just some examples. The original PSO algorithm can be summarized as follows.
(1) Initialize a population array of particles with random position and velocities on D dimensions in the search space.
(2) Evaluate the predefined optimization fitness function for each particle.
(3) Compare the latest fitness evaluation of the current particle with its "previous best" $p_{best}$. If the current value is better, then $p_{best}$ will be updated and $p_i$ (previous best position) will be updated to the current location $x_i$.
(4) Determine the particle within the swarm with the best success so far ($g_{best}$) and assign its location to $p_g$.
(5) Change the velocity and position of each particle within the swarm according to the following expressions:

$$v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \left(p_{id} - x_{id}(t)\right) + c_2 r_2 \left(p_{gd} - x_{id}(t)\right), \tag{2}$$

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1). \tag{3}$$

(6) If the stopping condition is met, then exit with the best result so far; otherwise repeat from point 2.

Each particle within the swarm is defined by its position $X_i$ and velocity $V_i$ within the $D$-dimensional search space, where

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), \qquad V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}). \tag{4}$$

In (2) $w$ is the inertia weight, $c_1$ and $c_2$ are positive constants, typically defined as learning rates, and $r_1$ and $r_2$ are random functions in the range [0, 1]. Equation (2) describes a basic PSO algorithm, where the values of the parameters $w$, $c_1$, and $c_2$ may significantly affect the behaviour of the algorithm [15], even making it unstable. The inertia weight can be interpreted as the fluidity of the medium where the swarm particles move, and typical values lie between 0.4 and 0.9. Parameters $c_1$ and $c_2$ are typically set to 2, although they may have a significant influence on the search results. In addition, it is recommended to keep particle velocities within the range $[-V_{max}, +V_{max}]$, but the optimal value of $V_{max}$ depends on the specific problem under analysis. An alternative to (2) is the use of the so-called constriction method [36]:

$$v_{id}(t+1) = \chi \left[ v_{id}(t) + \varphi_1 r_1 \left(p_{id} - x_{id}(t)\right) + \varphi_2 r_2 \left(p_{gd} - x_{id}(t)\right) \right], \tag{5}$$

where

$$\chi = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \qquad \varphi = \varphi_1 + \varphi_2 > 4. \tag{6}$$

Typical values for these parameters are $\varphi = 4.1$, $\varphi_1 = \varphi_2$, and $\chi = 0.7298$. Although not necessary, it is recommended to establish $V_{max} = X_{max}$.
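The steps listed above, combined with the constriction method, can be sketched in Python as follows. This is a minimal sketch and not the implementation used in this work: the fixed iteration count as stopping rule, the clamping of positions to the search box, the use of the interval width as velocity limit, and all function names are assumptions made for illustration.

```python
import math
import random


def constriction(phi):
    """Constriction coefficient chi for phi = phi1 + phi2 > 4."""
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def pso(fitness, bounds, n_particles=20, n_iterations=200,
        phi1=2.05, phi2=2.05, seed=0):
    """Minimize `fitness` inside the box `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    chi = constriction(phi1 + phi2)
    # Step 1: random positions and velocities in the D-dimensional search space.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    # Steps 2-4 for the initial swarm: personal bests and global best.
    p_best = [p[:] for p in pos]
    p_best_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g][:], p_best_val[g]

    for _ in range(n_iterations):  # assumed stopping rule: fixed iteration count
        for i in range(n_particles):
            # Step 5: constricted velocity update followed by the position update.
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = chi * (vel[i][d]
                           + phi1 * r1 * (p_best[i][d] - pos[i][d])
                           + phi2 * r2 * (g_best[d] - pos[i][d]))
                v_max = hi - lo  # assumed velocity limit: width of the interval
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            # Steps 2-3: evaluate and update the particle's previous best.
            value = fitness(pos[i])
            if value < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], value
        # Step 4: best particle of the swarm so far.
        g = min(range(n_particles), key=lambda i: p_best_val[i])
        if p_best_val[g] < g_best_val:
            g_best, g_best_val = p_best[g][:], p_best_val[g]
    return g_best, g_best_val


if __name__ == "__main__":
    # Toy fitness with a known minimum at (0.3, 0.7).
    best, value = pso(lambda x: (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2,
                      [(0.0, 1.0), (0.0, 1.0)])
    print(best, value)
```

In the welding model the fitness would be the mismatch between the synthetic and the measured spectrum, and each dimension of the search space would correspond to the participation of one species.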
Once the solution described by (5) was implemented, some tests were performed to compare the performance of PSO with the results offered by CRS6, which are summarized in Table 1. In that table, Condition is the stopping condition ε of the algorithm, expressed in terms of $f(x)$, the function to minimize; $f^*$, its minimum; $x^*$, the value to be found in the optimization process; and the approximation to $x^*$ reached by the algorithm. Using a welding plasma spectrum captured during the experimental tests in the laboratory, both the convergence and the processing times of the PSO were determined under the different conditions described in Table 2. Particles is the number of particles considered in the swarm for the optimization process, Iterations the number of iterations considered in each search, Participation the relative concentration of the species (neutral atoms and ions) participating in the plasma, and Processing time the overall estimated computational time of the optimization process. Both mean and standard deviation (std) values of the Ar I and Ar II participation have been calculated, indicating the ability of PSO to converge to the expected solution. It should be mentioned that the optimization process was performed over a set of 150 identical spectra, thus simulating a perfect seam without any defect.
In terms of computational performance, it can be observed that PSO offers in this case processing times from 0.035 to 0.248 s (using 20 particles), while the results for CRS6 in Table 1 ranged from 0.11 to 0.79 s. It is also worth mentioning that the convergence values for both Ar I and Ar II are quite similar, but the standard deviation (std) is clearly higher for CRS6. Although the parameters to be adjusted in both cases are different, it seems clear that the computational performance of PSO exceeds that of CRS6, which justifies the inclusion of the former in the model under analysis.

3.2. Use of the SFFS Algorithm for Spectral Range Selection.
In the field of pattern recognition, that is, the automatic recognition, description, classification, and grouping of patterns in disciplines ranging from biology and psychology to computer vision or remote sensing [37], dimensionality reduction techniques are employed prior to recognition/classification. These techniques attempt to find the minimum number of dimensions in which a data set can be expressed without significant loss of information, thereby reducing the number of variables of the pattern representation (i.e., the number of features) required for the analysis. There are two main reasons to keep the number of features as small as possible: measurement cost and classification accuracy. A small number of features can alleviate the curse of dimensionality [38] if the number of training samples is limited; moreover, the classification hit rate can be greatly enhanced if class separability, or the distance among patterns belonging to different clusters, is simultaneously maximized. There exists a wide variety of characterization methods [37] that achieve these objectives in essentially two different ways. Feature selection algorithms select the (hopefully) best subset of the input feature set, while methods that create new features based on transformations or combinations of the original feature set are called feature extraction algorithms. Although both alternatives aim at maximizing class separability, feature selection is preferable when dealing with spectral data since it also provides physical insight into the problem [39]. Moreover, dimensionality reduction could be performed inversely or in advance to identify the spectral bands that best separate the classes (correct seams and flaws) and use them to construct the monitoring signal. In this way the signal-to-noise ratio of the latter and, as a consequence, the defect sensitivity would be clearly increased. The feasibility of this approach was demonstrated in a previous work [24], where the line-to-continuum method (i.e., the ratio between intensity lines and their adjacent background radiation) was used to generate the output monitoring profiles and the Bhattacharyya distance [40] was employed as the criterion to measure class separability for wavelength selection. This probabilistic distance is very convenient for evaluating class separability for normal distributions, and even for nonnormal cases it seems to be a reasonable measure [41]. The Mahalanobis distance, given by (8), is a particular case of the Bhattacharyya distance that assumes equal covariances of the classes:

$$D_M = (\mu_1 - \mu_2)^T \, \Sigma^{-1} \, (\mu_1 - \mu_2), \tag{8}$$

where $\mu_i$ is the mean of class $i$ and $\Sigma$ the covariance matrix. It is also widely used as a dissimilarity measure because it requires about $p^2$ flops for a multivariate feature characterized by its mean vector $\mu \in \mathbb{R}^p$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$, while the computation of the Bhattacharyya distance involves $p^3/3 + 2p^2$ flops [42].
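A small Python sketch may help to make the Mahalanobis criterion concrete for two classes (e.g., spectra from correct seams and from flaws). It assumes the quadratic (squared) form of the distance and a pooled covariance estimate shared by both classes; the function names are hypothetical, and the pure-Python linear solver is only there to keep the example self-contained.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equally sized feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def pooled_covariance(class_a, class_b):
    """Covariance matrix shared by both classes (equal-covariance assumption)."""
    dim = len(class_a[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for samples in (class_a, class_b):
        mu = mean_vector(samples)
        for x in samples:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for r in range(dim):
                for c in range(dim):
                    scatter[r][c] += diff[r] * diff[c]
    dof = len(class_a) + len(class_b) - 2
    return [[value / dof for value in row] for row in scatter]


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def mahalanobis(class_a, class_b):
    """Quadratic form (mu_a - mu_b)^T Sigma^-1 (mu_a - mu_b)."""
    diff = [a - b for a, b in zip(mean_vector(class_a), mean_vector(class_b))]
    z = solve(pooled_covariance(class_a, class_b), diff)
    return sum(d * zi for d, zi in zip(diff, z))


if __name__ == "__main__":
    correct = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    flawed = [[3.0, 0.0], [5.0, 0.0], [3.0, 2.0], [5.0, 2.0]]
    print(mahalanobis(correct, flawed))
```

Solving the linear system instead of inverting the covariance matrix explicitly is a common numerical choice; it fails if the pooled covariance is singular, which can happen when there are fewer samples than selected bands.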
Given the aforementioned need to constrain the optimization process to spectral ranges narrower than the one provided by the spectrometer, the use of the SFFS algorithm to identify suitable spectral regions seems interesting in this scenario, as it constitutes an automatic procedure that avoids specific studies for new processes or spectrometers. The performance of the Mahalanobis distance for retrieving the most appropriate wavelengths that will make up the output monitoring signal will be evaluated here. Let $X$ be the number of spectral regions where the optimization process and, consequently, the generation of the participation profiles will be performed; that is, $X$ is the number of bands to be selected. At a certain point of the selection process, $S$ is the current set of previously selected bands and $R$ is the set of remaining or unselected bands. The selection process starts with $S = \emptyset$. In the pseudocode that describes the selection procedure, "$S \cup S_{inc}$" denotes that the band $S_{inc}$ is included into the set $S$, "$S \setminus S_{exc}$" denotes that the band $S_{exc}$ is excluded from the set $S$, and $\emptyset$ is the empty set.
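A generic sequential floating forward selection loop, written with the inclusion and conditional exclusion steps described above, could look like the Python sketch below. It is not the pseudocode of this work: it follows the usual textbook formulation of SFFS, the `criterion` callable stands for any separability measure (such as the Mahalanobis distance), and the toy additive criterion in the usage example is purely illustrative.

```python
def sffs(features, criterion, n_select):
    """Sequential floating forward selection of `n_select` items from `features`.

    `criterion` maps a list of features to a score to be maximized.
    """
    if n_select > len(features):
        raise ValueError("n_select cannot exceed the number of features")
    selected = []      # the set S, initially empty
    best_score = {}    # best criterion value seen so far for each subset size
    while len(selected) < n_select:
        # Inclusion: add the band S_inc from R that maximizes the criterion.
        remaining = [f for f in features if f not in selected]
        s_inc = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [s_inc]
        size = len(selected)
        best_score[size] = max(best_score.get(size, float("-inf")),
                               criterion(selected))
        # Conditional exclusion: drop the least significant band S_exc while
        # doing so beats the best subset of that smaller size found so far.
        while len(selected) > 2:
            s_exc = max(selected,
                        key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != s_exc]
            score = criterion(reduced)
            if score <= best_score[len(reduced)]:
                break
            selected = reduced
            best_score[len(reduced)] = score
    return selected


if __name__ == "__main__":
    # Toy criterion: each hypothetical band contributes a fixed separability.
    weights = {"band_1": 1.0, "band_2": 5.0, "band_3": 3.0, "band_4": 4.0}
    print(sffs(list(weights), lambda s: sum(weights[f] for f in s), 2))
```

With an additive criterion such as the toy one, the floating step never triggers and the procedure reduces to plain forward selection; the exclusion step only matters when bands interact, as is generally the case with covariance-based distances.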

Experimental and Field Test Validation
The first studies were aimed at improving the results obtained for the Ar II species in [28], given the aforementioned lack of response to some defects. After an initial analysis via SFFS, some spectral bands were chosen by the algorithm as the most suitable for discriminating between spectra associated with correct seams and spectra associated with defects. The details of the experimental tests are described in the previous work, but it should be mentioned that a GTAW (Gas Tungsten Arc Welding) process was employed to weld AISI-304 stainless steel plates. Defects were provoked intentionally by introducing perturbations in the shielding gas (argon) flow rate. The first spectral bands chosen by the SFFS algorithm using the Mahalanobis distance are presented in Table 3. Among these wavelengths, those in the range from 460 to 490 nm are related to Ar II emission lines, which suggests the suitability of selecting that spectral window for the optimization process. Figure 2 depicts the result of using the windows between 470 and 483 nm and between 470 and 480 nm, respectively, in comparison to the original Ar II participation profile derived from the use of the whole spectral range of the spectrometer (Ocean Optics USB2000: 195 to 535 nm). It can be observed that the correlation between the defect in the seam (provoked by a perturbation in the shielding gas flow rate) and the resulting Ar II participation profile is significantly enhanced if the spectral range of the optimization process is reduced. The result is better for the narrower spectral range, which can be explained by the poor match obtained during the optimization process between the synthetic and the real spectra for the Ar II emission line located at 480.5 nm. A similar comparison is established in Figure 3, where another seam with two discontinuities can be observed. Again, the defects at x ≈ 4.5 and 6.5 cm are not clearly detected using the whole spectral range, but the use of the 470 to 480 nm window gives rise to a more sensitive monitoring signal.
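Restricting the optimization to a window such as 470 to 480 nm amounts to masking each captured spectrum before the fit. The following Python sketch illustrates this step under simple assumptions: spectra are plain lists sampled on a common wavelength grid, and `fit` is a hypothetical callable standing in for the optimization stage that returns one participation value per spectrum.

```python
def select_window(wavelengths, intensities, low_nm, high_nm):
    """Keep only the samples whose wavelength lies in [low_nm, high_nm]."""
    kept = [(wl, value) for wl, value in zip(wavelengths, intensities)
            if low_nm <= wl <= high_nm]
    return [wl for wl, _ in kept], [value for _, value in kept]


def participation_profile(spectra, wavelengths, fit, low_nm, high_nm):
    """Apply `fit` to the windowed version of every spectrum along the seam."""
    profile = []
    for intensities in spectra:
        window_wl, window_int = select_window(wavelengths, intensities,
                                              low_nm, high_nm)
        profile.append(fit(window_wl, window_int))
    return profile


if __name__ == "__main__":
    # Synthetic 1 nm grid covering the spectrometer range quoted in the text.
    grid = [195.0 + i for i in range(341)]
    spectrum = [float(i) for i in range(341)]
    # Stand-in for the optimizer: simply report the maximum in the window.
    print(participation_profile([spectrum], grid, lambda wl, s: max(s), 470, 480))
```

The same helper can be called with different windows (for instance 470 to 483 nm) to compare how the resulting profiles respond to a given defect.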
To extend the analysis to other processes and materials, several studies have been performed on data from field tests [25]. In this case the materials to be welded were Inconel-718 and Titanium 6Al-4V, with thicknesses of 2 and 1.6 mm, respectively. Filler wire was used for the former, and Ar was used as shielding gas (10 L/min), being also guided to the bottom side of the plates (30 L/min). The optical setup basically consisted of a 600 μm core diameter optical fiber connected to the spectrometer (again the USB2000), with the fiber end acting as input optics and located approximately 10 cm from the electrode tip. Apart from correct seams, different defects were provoked during the analyses to obtain the desired spectroscopic data.
Figure 4 shows an Inconel-718 seam cataloged as correct after visual and X-ray inspection. The Ar II participation profiles depicted do not show any clear perturbation, although both signals exhibit a significant noise level. It is worth mentioning that other spectroscopic parameters, like the plasma RMS profile [25], also show that behavior. A possible explanation for this can be found in perturbations affecting the process that do not give rise to defects. The associated heat input profile (acquired by Tecnalia [43] with an electric sensor system) is also constant (Figure 4(c)), as expected for a seam free of defects. A defective seam is analyzed in Figure 5, where the trajectory of the welding torch over the joint was deviated (Figure 5(a)). It can be seen that the heat input signal gives an indication of a defect at x ≈ 25 to 30 s, while the rest of the profile is almost constant. Two different participation profiles have been depicted in Figure 5(b), corresponding to two spectral ranges: 340 to 350 nm and 470 to 480 nm. The first band was chosen because the feature selection algorithm indicates the 344.15 nm band, with Fe I being in this case the species selected for the process. This window generates a monitoring signal with a strong perturbation correlated to the one observed in the heat input signal, although other regions also indicate the occurrence of defects. In comparison, the Ar I profile does not exhibit such clear perturbations in this case.
The seam shown in Figure 6 was performed to join two Inconel-718 plates with a misalignment of approximately 1 mm, the maximum allowed in this case being 0.3 mm (15% of the plate thickness). The heat input signal depicted in Figure 6(c) does not exhibit any perturbation, being constant during the whole process. Almost the same situation can be observed in the $T_e$ profile presented in Figure 6(d), calculated using the Ar II emission lines located at 460.96 and 487.99 nm, respectively. However, the participation profile in this case is somewhat noisier, suggesting that a defective situation has taken place.
Defects provoked by lack of cleanliness have also been studied on Ti6Al-4V plates, simulating this situation by applying oil on the joint before the welding process. In the test described in Figure 7 the presence of oil gives rise to a clear defect at x ≈ 30 s, which is signaled by the heat input profile. The reduction of the spectral range used to generate the Ar II participation profile (Figure 7) reveals the change in the Ar II signal at x ≈ 18 s, which can be associated with the application of oil at the middle of the welding path. The appearance of the defect later in the seam can be explained by the dragging of the oil by the welding arc up to the defect location. An attempt was made to reproduce the same defect in the seam depicted in Figure 8, but no defects were observed in this case after the visual inspection. Again, the heat input signal remains constant throughout the process, while the $T_e$ profile exhibits a clear slope and some subtle perturbations. The signal offered by the Ar II species (470-480 nm) clearly indicates the occurrence of defects, which seems to be in good agreement with the scenario under analysis.

Conclusion
An evolution of a spectroscopic model proposed in a previous contribution for welding diagnostics has been presented and discussed in this paper. The original proposal was based on the generation of synthetic spectra and the use of optimization algorithms to generate participation profiles of the species contributing to the welding plasma. It was demonstrated that a direct correlation existed between these profiles and the resulting seam quality, that is, the appearance of defects. However, the experimental tests demonstrated that Ar II, the predominant species within the spectral range under analysis, did not respond to some defects that were correctly signaled by other species.
A revision of the proposed model suggested that the problem could be caused by the use of the relative intensities from the NIST spectroscopic database for the creation of the synthetic spectra. In particular, the use of wide spectral ranges with those intensities seemed to give rise to the lack of sensitivity found in the Ar II participation profiles. A possible solution to this issue lies in the reduction of the spectral window where the optimization process takes place, which has been implemented in this paper with the aid of a feature selection algorithm that helps to indicate the suitable spectral bands to be used. It has been demonstrated that this new approach significantly improves the results obtained in the original work, given that the Ar II participation signal now shows a good correlation with the defects studied in the experimental tests. In addition, to extend the validity of the model, field tests on both Inconel-718 and Ti6Al-4V samples have been included in the analysis, also allowing the detection of different weld defects: trajectory deviation, misalignment, and lack of cleanliness.
Apart from the use of the SFFS algorithm with the Mahalanobis distance to perform the spectral range reduction for the optimization process, the CRS6 algorithm used for the optimization in the original contribution has also been substituted by a simple implementation of the PSO, improving in this way the computational performance of the processing scheme. Some issues remain unsolved and should be dealt with in the future to improve the proposed model. On the one hand, the use of the relative intensities in the generation of the synthetic spectra should be avoided: a solution to be explored might be based on a feedback scheme where the intensities of the chosen emission lines could be calculated from the estimation of a spectroscopic parameter, like the plasma temperature $T_e$, using different species in the process. On the other hand, it could also be interesting to try to relate the relative participation profiles of consecutive ionization stages for a given element via the Saha equation, although it should be studied whether this approach would be excessively costly in terms of the computational performance of the model. An application of this method might lie in the framework of LIBS (Laser Induced Breakdown Spectroscopy), where it might be used for a quantitative estimation of the composition of samples.

Figure 1: Schematic representation of the generation of the participation profiles from the creation of the synthetic spectra and the optimization process.

Table 2: Performance of PSO algorithm.

Table 3: Spectral bands chosen by the SFFS algorithm.