A set of designed experiments, involving the use of a pulsed Nd:YAG laser system milling 316L Stainless Steel, serves to study the laser-milling process of microcavities in the manufacture of drug-eluting stents (DES). Diameter, depth, and volume error are optimized as functions of the process parameters, which include laser intensity, pulse frequency, and scanning speed. Two different DES shapes that combine semispheres and cylinders are studied. Process inputs and outputs are defined by considering the process parameters that can be changed under industrial conditions and the industrial requirements of this manufacturing process. In total, 162 different conditions are tested in a process that is modeled with the following state-of-the-art data-mining regression techniques: Support Vector Regression, Ensembles, Artificial Neural Networks, Linear Regression, and Nearest Neighbor Regression. Ensemble regression emerged as the most suitable technique for this industrial problem; specifically, Iterated Bagging ensembles with unpruned model trees outperformed the other methods in the tests. This method can predict the geometrical dimensions of the machined microcavities with relative errors, with respect to the mean value, in the range of 3 to 23%, which is considered very accurate in view of the characteristics of this innovative industrial task.
Laser-milling technology has become a viable alternative to conventional methods for producing complex microfeatures on difficult-to-process materials. It is increasingly employed in industry because of its established advantages [
Although the manufacturing industry is interested in laser micromilling, and some research has been done to understand the main physical and industrial parameters that define the performance of this process, the conclusions show that analytical approaches cannot cover all real cases due to their complexity. Data-mining approaches represent a suitable alternative for such tasks due to their capacity to deal with multivariate processes and experimental uncertainties. Data-mining is defined in [
The experimental apparatus described in this study gathered the data needed to create the models. The experimentation consisted of milling microcavities in a 316L Stainless Steel workpiece using a laser system. A Deckel Maho Nd:YAG Lasertec 40 machine with a 1,064 nm wavelength was used to perform the experiments. The system is a lamp-pumped solid-state laser that provides an ideal maximum (theoretically estimated) pulse intensity of 1.4 W/cm^{2} [
Sphere geometry dimensions.
Geometry  Depth (μm)  Diameter (μm)  Volume (μm³)

Sphere 1 (e1)  50  166  721414
Sphere 2 (e2)  70  140  718377
Sphere 3 (e3)  90  124  724576
Cylinder geometry dimensions.
Geometry  Depth (μm)  Diameter (μm)  Length (μm)  Volume (μm³)

Cylinder 1 (c1)  50  130  55  723220
Cylinder 2 (c2)  70  110  46  721676
Cylinder 3 (c3)  90  100  36  725707
Cavity geometries used in the experiments.
A full factorial design of experiments was developed in order to analyze the effects of pulse frequency (PF), scanning speed (SS), and pulse intensity level (PI, percentage of the ideal maximum pulse intensity) on the responses. Some screening experiments were performed to determine the proper parametric levels. Three levels of each input factor were selected from these results, which are presented in Table
Factors and factor levels.
Factors  Factor levels  

Scanning speed (SS) (mm/s)  200  400  600 
Pulse intensity (PI) (%)  60  78  100 
Pulse frequency (PF) (kHz)  30  45  60 
Having performed the experimental tests, the inputs and outputs had to be defined to generate the data sets for the data-mining modeling. On the whole, the selection of the inputs is straightforward, because they are set by the specifications of the equipment: the inputs are the parameters that the process engineer can change on the machine. They are the same as those considered to define the experimental tests explained above. Table
Input variables.
Variable  Units  Range  Relationship

Programmed depth  μm  50–90  Independent
Programmed radius  μm  50–83  Independent
Programmed length  μm  0–55  Independent
Programmed volume  10³ μm³  718–726  Dependent
Intensity  %  60–100  Independent
Frequency  kHz  30–60  Independent
Speed  mm/s  200–600  Independent
Time  s  9–24  Independent
Programmed MRR  10³ μm³/s  30–81  Dependent
The definition of the data set outputs takes into account different interests that relate to the industrial manufacturing of DES. In some cases, a productivity orientation will encourage the process engineer to optimize productivity (in terms of the MRR), keeping geometrical accuracy under certain acceptable thresholds (by fixing a maximum relative error in the geometrical parameters). In other cases, geometrical accuracy will be the main requirement and productivity a secondary objective. In yet other cases, only one geometrical parameter, for example, the depth of the DES, will be critical, and the other geometrical parameters should be kept under certain thresholds, once again by fixing a maximum relative error for them. Therefore, this work considers as outputs the geometrical dimensions and the MRR actually obtained, their deviations from the programmed values, and the relative errors between the programmed and the real values. Table
Output variables.
Variable  Units  Range  Relationship

Measured volume  10³ μm³  130–1701  Independent
Measured depth  μm  25–230.60  Independent
Measured diameter  μm  118.50–208.80  Independent
Measured length  μm  0–70.20  Independent
Measured MRR  10³ μm³/s  12–121  Dependent
Volume error  10³ μm³  −980–596  Dependent
Depth error  μm  −180.60–56.90  Dependent
Width error  μm  −62.80–−16.63  Dependent
Length error  μm  −25.50–208.80  Dependent
MRR error  10³ μm³/s  −61–66  Dependent
Volume relative error  Dimensionless  −1.36–0.82  Dependent
Depth relative error  Dimensionless  −3.61–0.63  Dependent
Width relative error  Dimensionless  −0.55–0.10  Dependent
Length relative error  Dimensionless  −0.71–0.15  Dependent
In our study we analyze each output variable separately, by defining a one-dimensional regression problem for each case. A regressor is a data-mining model in which an output variable is estimated from the values of the input variables.
The aim of this work is to determine the most suitable regressor for this industrial problem. The selection is performed by comparing the root mean squared error (RMSE) of several regressors over the data set, where the RMSE is the square root of the mean of the squared differences between the predicted and the measured values over the data collection of instances.
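This selection criterion is simple to state in code; the following is a minimal NumPy sketch (illustrative only, not the software used in the study):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Predictions off by a constant 2.0 give an RMSE of exactly 2.0.
assert abs(rmse([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]) - 2.0) < 1e-12
```

A lower RMSE over cross-validated predictions indicates a more suitable regressor for a given output.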
We tested a wide range of the main families of state-of-the-art regression techniques, as follows.
Function-based regressors: we used two of the most popular algorithms, Support Vector Regression (SVR) [
Instance-based methods, specifically their most representative regressor, the k-nearest neighbor (kNN) regressor.
Decision-tree-based regressors: we have included these kinds of methods because they are used as base regressors in the ensembles, as explained in Section
Ensemble techniques [
One of the most natural and simplest ways of expressing relations between a set of inputs and an output is a linear function. In this type of regressor, the variable to forecast is given by a linear combination of the attributes with predetermined weights [
We used an improvement on this original formulation, selecting the attributes with the Akaike information criterion (AIC) [
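The AIC-based attribute selection can be illustrated with a short sketch. This is a hypothetical greedy backward-elimination procedure written with NumPy, using the Gaussian-likelihood form AIC = n·ln(RSS/n) + 2(k+1); the original work relied on a Weka implementation, so the details here are assumptions:

```python
import numpy as np

def fit_rss(X, y):
    """Ordinary least squares with intercept; returns the residual sum of squares."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ beta
    return float(resid @ resid)

def aic(X, y):
    """Akaike information criterion for a Gaussian linear model with k attributes."""
    n, k = X.shape
    return n * np.log(fit_rss(X, y) / n) + 2 * (k + 1)

def backward_select(X, y):
    """Greedy backward elimination: drop attributes while the AIC improves."""
    cols = list(range(X.shape[1]))
    best = aic(X[:, cols], y)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [j for j in cols if j != c]
            score = aic(X[:, trial], y)
            if score < best:
                best, cols, improved = score, trial, True
                break
    return cols

# Hypothetical data: only the first attribute carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=200)
selected = backward_select(X, y)
```

Because the AIC penalizes each extra attribute, the uninformative columns tend to be dropped while the informative one is retained.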
The decision tree is a data-mining technique that builds hierarchical models that are easily interpretable, because they may be represented in graphical form, as shown in Figures
Model of the measured volume with a regression tree.
Model of the measured volume with a model tree.
As base regressors, we have two types of regressors whose structure is based on decision trees:
In our experimentation, we used one representative implementation of each of the two families: the reduced-error pruning tree (REPTree) [
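REPTree and M5P are Weka implementations. As an illustrative stand-in (an assumption, not the paper's setup), scikit-learn's CART regressor can show the pruned/unpruned contrast through cost-complexity pruning:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(300, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)

# Unpruned: the tree is grown to full depth and fits the training data closely.
unpruned = DecisionTreeRegressor(random_state=0).fit(X, y)

# "Pruned" analogue: cost-complexity pruning trades leaves for generalization.
pruned = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)
```

The unpruned tree has many more nodes; as discussed later, this makes single trees less stable but can make them more useful inside an ensemble.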
An ensemble regressor combines the predictions of a set of so-called base regressors by means of a voting system [
Ensemble regressor architecture.
Random Subspaces follow a different approach: each base regressor is trained on a subset of fewer dimensions than the original space, randomly chosen for each regressor. This procedure is followed with the intention of avoiding the well-known problem of the curse of dimensionality.
Bagging is used in its initial formulation for regression.
Iterated Bagging combines several Bagging ensembles, the first one keeping to the typical construction and the others trained on residuals (differences between the real and the predicted values) [
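The residual-training idea can be sketched as follows. This is a simplified two-stage version written with scikit-learn (the published algorithm forms residuals from out-of-bag predictions; this sketch uses training-set residuals for brevity, and all names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

class IteratedBagging:
    """Sketch of Iterated Bagging: each stage is a Bagging ensemble; after the
    first, each stage is trained on the residuals left by the previous stages,
    and the predictions of all stages are summed."""

    def __init__(self, n_stages=2, n_estimators=25, random_state=0):
        self.n_stages = n_stages
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.stages_ = []

    def fit(self, X, y):
        target = np.asarray(y, dtype=float)
        for s in range(self.n_stages):
            # The default base learner is an (unpruned) CART regression tree.
            bag = BaggingRegressor(n_estimators=self.n_estimators,
                                   random_state=self.random_state + s)
            bag.fit(X, target)
            self.stages_.append(bag)
            target = target - bag.predict(X)  # residuals for the next stage
        return self

    def predict(self, X):
        return sum(stage.predict(X) for stage in self.stages_)

# Hypothetical data to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.1 * rng.normal(size=200)
ib = IteratedBagging().fit(X, y)
```

The second stage corrects systematic errors left by the first, which is the mechanism behind the strong results reported for this method later in the paper.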
Random Subspaces are used in their formulation for regression.
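In scikit-learn terms (an analogue, not the Weka code used in the paper), a Random Subspaces ensemble can be approximated with a Bagging ensemble that samples features instead of instances:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Hypothetical data set with 4 attributes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

# Random Subspaces analogue: each base tree sees a random 50% of the
# features and all training instances (no bootstrap resampling).
rs50 = BaggingRegressor(
    n_estimators=100,
    max_features=0.5,       # feature subset per base regressor
    bootstrap=False,        # keep all instances
    bootstrap_features=False,
    random_state=0,
).fit(X, y)
```

With 4 attributes and max_features=0.5, each of the 100 base trees is trained on exactly 2 randomly chosen attributes, mirroring the RS 50% configuration in the experiments.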
AdaBoost.R2 [
Additive Regression is a regressor based on a learning algorithm called Stochastic Gradient Boosting [
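A stochastic-gradient-boosting analogue is available in scikit-learn, where setting subsample below 1.0 gives the stochastic variant; the sketch below on synthetic data is an assumption, not the paper's configuration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical data for the sketch.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 3))
y = 10 * X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=300)

# subsample < 1.0 yields the "stochastic" variant: each stage fits a
# random fraction of the training instances to the current residuals.
sgb = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, subsample=0.8, random_state=0
).fit(X, y)
```

Like Iterated Bagging, each stage fits the residuals of the model built so far, but here the stages are small trees added with a shrinkage factor (the learning rate).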
This regressor is the most representative algorithm among instance-based learning methods. These kinds of methods forecast the output value using the stored values of the most similar instances of the training data [
how many nearest neighbors should be used to forecast the value of a new instance?
which distance function should be used to measure the similarity between the instances?
In our experimentation we used the most common definition of the distance function, the Euclidean distance, while the number of neighbors was optimized using cross-validation.
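This setup can be sketched with scikit-learn (an analogue of the Weka experiments; the data below are hypothetical): a kNN regressor with Euclidean distance whose k is chosen by cross-validation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data for the sketch.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=200)

# Euclidean distance (p=2) with k chosen from 1..10 by cross-validation,
# matching the optimization range described in the experiments.
search = GridSearchCV(
    KNeighborsRegressor(p=2),
    {"n_neighbors": range(1, 11)},
    scoring="neg_root_mean_squared_error",
    cv=10,
).fit(X, y)
best_k = search.best_params_["n_neighbors"]
```

The selected k balances noise averaging (large k) against locality (small k) for each output variable.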
This kind of regressor is based on a parametric function, the parameters of which are optimized during the training process, in order to minimize the RMSE [
The norms of
We have a convex optimization problem that is solved in practice using the Lagrange method. The equations are rewritten using the
From the expression in (
We used the multilayer perceptron (MLP), the most popular ANN variant [
We compared the RMSE obtained for the regressors in a 10-fold cross-validation.
All the ensembles under consideration have 100 base regressors. The methods that depend on a set of parameters are optimized as follows:
SVR with linear kernel: the trade-off parameter,
SVR with radial basis kernel:
multilayer perceptron: the training parameters
kNN: the number of neighbors is optimized from 1 to 10.
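Such a parameter optimization can be sketched for the radial-basis SVR with a cross-validated grid search; the exponential grids below are hypothetical, since the exact grids used in the study are not reproduced here:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Hypothetical data for the sketch.
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] + 0.05 * rng.normal(size=200)

# Assumed exponential grids for the trade-off parameter C and the
# radial-basis kernel width gamma.
grid = {"C": [2.0 ** e for e in range(-2, 6)],
        "gamma": [2.0 ** e for e in range(-4, 2)]}
tuned_svr = GridSearchCV(SVR(kernel="rbf"), grid, cv=5,
                         scoring="neg_root_mean_squared_error").fit(X, y)
```

The same pattern (a grid or range search driven by cross-validated RMSE) applies to the MLP training parameters and the kNN neighbor count.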
The notation used to describe the methods is detailed in Table
Methods notation.
Bagging  BG 
Iterated Bagging  IB 
Random Subspaces  RS 
Adaboost.R2  R2 
Additive Regression  AR 
REPTree  RP 
M5P Model Tree  M5P 
Support Vector Regressor  SVR 
Multilayer Perceptron  MLP
k-Nearest Neighbor  kNN
Linear Regression  LR
Regarding the notation, two abbreviations have been used besides those that are indicated in Table: P denotes ensembles built with pruned base trees and U denotes ensembles built with unpruned base trees.
Root mean squared error 1/3.
Volume  Depth  Width  Length  MRR  

RP  216482.27  26.89  5.81  5.79  17166.73 
M5P  210773.29  23.01  4.51  5.53  16076.33 
LR  197521.89  23.84  7.04  14.73  15015.93
kNN  193756.62  23.57  5.42  6.84  15842.42
SVR linear  197894.5  24.19  7.07  16.23  14800.32 
SVR radial basis  200774.24  18.85  4.38  5.35  14603.37 
MLP  207646.99  22.98  4.7  5.39  16292.43 
BG RP P  200212.92  21.49  4.97  5.08  15526.79 
BG RP U  205290.8  19.98  4.66  4.9  15603.89 
BG M5P P  200636.57  20.61  4.43  5.27  15293.92 
BG M5P U  197546.75  19.5  4.31  5.38  15022.1 
IB RP P  202767.85  22.05  4.87  5.14  15677.57 
IB RP U  219388.93  21.7  5.01  5.11  15911.96 
IB M5P P  197833.83  20.8  4.42  4.88  15318.01 
IB M5P U  195154.16  19.65  4.3  4.78  14830.83 
R2L RP P  191765.23  22.27  4.94  5.07  15788.47 
R2L RP U  206369.82  21.71  5.25  5.29  17004.31 
R2L M5P P  186843.92  20.81  4.37  4.84  15031.37 
R2L M5P U  181587.4  20.51  4.34  4.84  15209.01 
R2S RP P  193908.39  23.03  5.1  5.18  16007.43 
R2S RP U  200453.21  21.21  5.11  5.31  16401.24 
R2S M5P P  173117.72  20.83  4.49  4.71  15070.05 
R2S M5P U  173914.36  20.87  4.52  4.72  15245.81 
R2E RP P  192529.31  22.59  5.02  5.07  15854.03 
R2E RP U  205920.86  21.44  5.22  5.21  17318.45 
R2E M5P P  171056.22  21.25  4.53  4.66  15078.01 
R2E M5P U  172948.95  21.32  4.55  4.67  15090.23 
AR RP P  215750.39  25.51  5.42  5.56  17249.82 
AR RP U  274467.43  24.71  5.83  6.15  18628.28 
AR M5P P  208805.84  22.27  4.44  5.08  16076.5 
AR M5P U  194572.79  19.58  4.51  4.94  15538.29 
RS 50% RP P  200526.36  26.58  5.68  7.74  15854.27 
RS 50% RP U  201946.24  25.33  5.35  7.54  15669.28 
RS 50% M5P P  201013.27  26.6  5.38  15.42  15402.77 
RS 50% M5P U  199166.84  25.65  5.39  15.45  15349.84 
RS 75% RP P  199270.03  23.33  5.27  5.98  15861.58 
RS 75% RP U  207845.67  21.61  5.04  6.25  16251.64 
RS 75% M5P P  199648.96  23.55  4.65  8.29  15227.87 
RS 75% M5P U  197420.35  22.11  4.6  8.39  15295.71 
Root mean squared error 2/3.
Volume error  Depth error  Width error  Length error  MRR error  

RP  216875.74  30.66  5.91  5.62  16355.57 
M5P  214500.91  19.8  6.19  5.09  16009.84 
LR  197521.93  23.92  6.76  14.67  14963.09
kNN  193696.36  23.69  4.98  6.6  14636.51
SVR linear  197817.88  24.17  7.1  16.23  14784.77 
SVR radial basis  200785.96  18.98  4.42  5.37  14504.61 
MLP  206753.61  21.96  4.93  5.37  17382.69 
BG RP P  201037.57  23.18  4.62  4.98  15122.57 
BG RP U  205341.53  21.6  4.4  4.87  15327.2 
BG M5P P  200594.31  19.66  5.32  5.08  15162.5 
BG M5P U  197575.49  19.19  5.22  5.15  14860.54 
IB RP P  207848  23.05  4.78  5.18  15525.56 
IB RP U  216631.14  23.79  4.79  5.2  16532.31 
IB M5P P  201450.65  19.73  4.44  4.71  15177.67 
IB M5P U  198261.22  19.44  4.33  4.58  14933.58 
R2L RP P  201365.5  24.08  4.64  5.03  15442.66 
R2L RP U  208500.97  23.95  4.87  5.38  16778.3 
R2L M5P P  184799.5  20.61  4.68  4.77  14732.4 
R2L M5P U  183740.85  20.71  4.7  4.81  14817.65 
R2S RP P  195592.43  24.75  4.65  5.24  16107.62 
R2S RP U  201017.69  22.87  4.71  5.39  15871.67 
R2S M5P P  172775.15  21.09  4.53  4.72  14459.98 
R2S M5P U  173892.35  20.88  4.52  4.74  14497.38 
R2E RP P  195657.69  24.26  4.62  5.12  15645.02 
R2E RP U  206275.71  24.38  4.82  5.35  16721.26 
R2E M5P P  172196.89  21.61  4.57  4.75  14324.48 
R2E M5P U  173356.89  21.61  4.58  4.79  14371.36 
AR RP P  214978.3  27.77  5.49  5.43  16208.07 
AR RP U  271926.55  27.43  5.29  6.1  20571.27 
AR M5P P  211329.63  19.8  4.55  4.66  15995.84 
AR M5P U  195497.5  19.55  4.78  4.82  15448.38 
RS 50% RP P  200775.17  28.54  5.41  6.93  15217.65 
RS 50% RP U  202354.2  26.84  5.24  6.94  15074.78 
RS 50% M5P P  201054.45  25.91  5.98  11.81  15387.34 
RS 50% M5P U  199246.1  25.61  5.79  11.75  14940.89 
RS 75% RP P  199501.89  25.86  4.9  5.73  15165.01 
RS 75% RP U  206467.56  24.5  4.75  6.06  15743.5 
RS 75% M5P P  200160.47  22.05  5.76  6.66  15185.2 
RS 75% M5P U  197431.78  21.79  5.53  6.75  14729.75 
Root mean squared error 3/3.
Volume relative error  Depth relative error  Width relative error  Length relative error  

RP  0.3  0.54  0.04  0.09 
M5P  0.3  0.33  0.04  0.09 
LR  0.27  0.43  0.05  0.13
kNN  0.27  0.38  0.04  0.1
SVR linear  0.27  0.44  0.05  0.15 
SVR radial basis  0.28  0.29  0.04  0.1 
MLP  0.29  0.34  0.04  0.1 
BG RP P  0.28  0.4  0.04  0.08 
BG RP U  0.29  0.36  0.04  0.08 
BG M5P P  0.28  0.32  0.04  0.09 
BG M5P U  0.27  0.31  0.04  0.09 
IB RP P  0.29  0.38  0.04  0.09 
IB RP U  0.3  0.38  0.04  0.09 
IB M5P P  0.28  0.32  0.04  0.09 
IB M5P U  0.28  0.3  0.04  0.09 
R2L RP P  0.27  0.39  0.04  0.08 
R2L RP U  0.29  0.38  0.04  0.1 
R2L M5P P  0.25  0.32  0.04  0.09 
R2L M5P U  0.26  0.32  0.04  0.09 
R2S RP P  0.27  0.39  0.04  0.09 
R2S RP U  0.28  0.38  0.04  0.1 
R2S M5P P  0.24  0.32  0.04  0.1 
R2S M5P U  0.24  0.33  0.04  0.1 
R2E RP P  0.27  0.4  0.04  0.09 
R2E RP U  0.29  0.38  0.04  0.11 
R2E M5P P  0.24  0.33  0.04  0.1 
R2E M5P U  0.24  0.33  0.04  0.1 
AR RP P  0.3  0.47  0.04  0.09 
AR RP U  0.38  0.44  0.05  0.11 
AR M5P P  0.29  0.33  0.04  0.09 
AR M5P U  0.27  0.29  0.04  0.09 
RS 50% RP P  0.28  0.48  0.04  0.09 
RS 50% RP U  0.28  0.44  0.04  0.09 
RS 50% M5P P  0.28  0.43  0.04  0.09 
RS 50% M5P U  0.28  0.42  0.04  0.09 
RS 75% RP P  0.28  0.45  0.04  0.08 
RS 75% RP U  0.29  0.41  0.04  0.08 
RS 75% M5P P  0.28  0.36  0.04  0.09 
RS 75% M5P U  0.27  0.35  0.04  0.09 
Finally, a summary table with the best methods per output is shown. The indexes of the 39 methods that were tested are explained in Tables
Index notation for the nonensemble methods.
1  2  3  4  5  6  7 

RP  M5P  LR  kNN  SVR linear  SVR radial basis  MLP
Index notation for the ensemble methods.
BG  IB  R2L  R2S  R2E  AR  RS 50%  RS 75%  

RP P  8  12  16  20  24  28  32  36 
RP U  9  13  17  21  25  29  33  37 
M5P P  10  14  18  22  26  30  34  38 
M5P U  11  15  19  23  27  31  35  39 
Summary table.
Best method  Statistically equivalent  

Volume 

3, 4, 5, 
20, 22, 23, 27, 31, 35, 38, 39  


Depth 

9, 10, 11, 13, 14, 
23, 25, 31  


Width 

2, 
23,  


Length 

4, 
22, 23, 24, 25, 27, 30, 31, 37  


MRR 

2, 3, 4, 5, 7, 8, 
14,  
25,  


Volume error 

3, 4, 5, 
27, 31, 34, 35, 36, 38, 39  


Depth error 

2, 10, 11, 14, 
30, 31  


Width error 

6, 8, 
21, 22, 23, 24, 25,  


Length error

4, 8, 
30, 31  


MRR error 

1, 2, 3, 4, 5, 
13, 14,  
24, 25, 27, 28, 30, 31, 32, 33, 34, 35, 36  
37, 38, 39  


Volume relative error 

3, 4, 5, 
27, 31, 34, 35, 36, 38, 39  


Depth relative error 

2, 10, 11, 14, 
30, 31  


Width relative error 

2, 


Length relative error 

1, 2, 8, 10, 11, 12, 13, 14, 
28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 
The performance of each method may be ranked. Table
Linear models such as SVR linear (index 5) and LR (index 3) do not fit the data sets in this study very well. Both methods are ranked together in the middle of the table. The predicted variables therefore need methods that can deal with nonlinearities.
SVR with the Radial Basis Function kernel (index 6) is the only nonensemble method with competitive results, but it requires the tuning of 2 parameters. MLP (index 7) is not a good choice: it requires the tuning of 3 parameters and is not a well-ranked method.
For some ensemble configurations, there are differences in the number of statistically significant performances between the results from pruned and unpruned trees, while in other cases these differences do not exist; in general, though, the results of unpruned trees are more accurate, especially with the top-ranked methods. In fact, the only regressor that is not outperformed by any other method on any output uses unpruned trees. Unpruned trees are more sensitive to changes in the training set, so base regressors built on unpruned trees are more likely to output diverse predictions when trained in an ensemble. If the predictions of all base regressors agreed, there would be little benefit in using ensembles; diversity balances faulty predictions by some base regressors with correct predictions by others.
The top-ranked ensembles use the most accurate base regressor (i.e., M5P). All M5P configurations have fewer weak performances than the corresponding RP configurations. In particular, the lowest rank was assigned to the AR RP configurations, while AR M5P U came second best.
Ensembles that lose a lot of information, such as RS, are ranked at the bottom of the table. The table shows that the lower the percentage of features RS uses, the worse it performs. In comparison to other ensemble methods, RS is very insensitive to noise, so its poor ranking here suggests that the data are not noisy.
In R2 M5P ensembles, the loss function does not appear to be an important configuration parameter.
IB M5P U is the only configuration that never had significant losses when compared with the other methods.
Methods ranking.
Indexes  Methods  Number of defeats 

15  IB M5P U  0 
31  AR M5P U  1 
6, 14, 18, 19, 22, 23  SVR radial basis, IB M5P P, R2L M5P P,  2 
R2L M5P U, R2S M5P P, and R2S M5P U  
11, 26, 27  BG M5P U, R2E M5P P, and R2E M5P U  3 
10, 30  BG M5P P and AR M5P P  4 
9  BG RP U  5 
39  RS 75% M5P U  6 
2, 4, 16, 38  M5P, kNN, R2L RP P, and RS 75% M5P P  7
12, 13, 35  IB RP P, IB RP U, and RS 50% M5P U  8 
3, 5, 8, 17, 20, 24, 25, 34, 36  LR, SVR linear, BG RP P, R2L RP U,  9 
R2S RP P, R2E RP P, R2E RP U,  
RS 50% M5P P, and RS 75 % RP P  
7, 21, 37  MLP, R2S RP U, and RS 75% RP U  10 
32, 33  RS 50% RP P and RS 50% RP U  11 
1, 28  RP and AR RP P  12 
29  AR RP U  14 
Once the best data-mining technique for this industrial task has been identified, the industrial implementation of these results can follow the procedure outlined below.
The best model is run to predict one output, changing two input variables of the process in small steps and keeping the values of the other inputs fixed.
3D plots of the output against the two varied inputs should be generated. The process engineer can extract information from these 3D plots on the best milling conditions.
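The two steps above can be sketched as follows; the model and data here are synthetic stand-ins (a Bagging ensemble on made-up intensity/speed/frequency data), so only the grid-sweep mechanics carry over:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Stand-in model trained on synthetic (intensity %, speed mm/s, frequency kHz)
# data; in practice this would be the Iterated Bagging ensemble fitted to an
# experimental error data set.
rng = np.random.default_rng(5)
X = rng.uniform([60.0, 200.0, 30.0], [100.0, 600.0, 60.0], size=(300, 3))
y = 0.5 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(size=300)
model = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

# Sweep pulse intensity and scanning speed in small steps; fix the frequency.
pi = np.linspace(60.0, 100.0, 41)
ss = np.linspace(200.0, 600.0, 41)
PI, SS = np.meshgrid(pi, ss)
fixed_pf = 45.0
grid = np.column_stack([PI.ravel(), SS.ravel(), np.full(PI.size, fixed_pf)])
Z = model.predict(grid).reshape(PI.shape)  # response surface for a 3D plot
```

The resulting Z array can be passed, together with PI and SS, to a surface-plotting routine (e.g., matplotlib's plot_surface) to obtain plots like those described below.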
As an example of this methodology, the following case was built. Two Iterated Bagging ensembles with unpruned M5P trees as their base regressors were built for two outputs: the width error and the depth error, respectively. The models were then run by varying two inputs in small steps across the test range: pulse intensity (PI) and scanning speed (SS). This combination of inputs and outputs is of immediate industrial interest because, in a workshop, the DES geometry is fixed by the customer and the process engineer can only change three parameters of the laser-milling process (pulse frequency (PF), scanning speed, and pulse intensity). In view of these restrictions, the engineer will wish to know the expected errors in the DES geometry depending on the laser parameters that can be changed. The rest of the inputs for the models (DES geometry) are fixed at 70
3D plots of the predicted depth and width errors from the Iterated Bagging ensembles.
In this study, extensive modeling with different data-mining techniques has been presented for the prediction of geometrical dimensions and productivity in the laser milling of microcavities for the manufacture of drug-eluting stents. Experiments on 316L Stainless Steel were performed to provide data for the models. The experiments varied most of the process parameters that can be changed under industrial conditions: scanning speed, laser pulse intensity, and laser pulse frequency; moreover, 2 different geometries and 3 different sizes were manufactured within the experimental test to obtain informative data sets for this industrial task. In addition, a very extensive analysis and characterization of the results of the experimental test was performed to cover all the possible optimization strategies that industry might require for DES manufacturing: from high-productivity objectives to high geometrical accuracy along just one geometrical axis. By doing so, 14 data sets were generated, each of 162 instances.
The experimental test clearly outlined that the geometry of the feature to be machined affects the performance of the milling process. The test also showed that it is not easy to find the proper combination of process parameters to achieve the final part, which makes it clear that the laser micromilling of such geometries is a complex process to control. Therefore, the use of data-mining techniques is proposed for the prediction and optimization of this process. Each variable to predict was modeled by regression methods that forecast a continuous variable.
The paper shows an exhaustive test covering 39 regression method configurations for the 14 output variables. A
Future work will consider applying the experimental procedure to different polymers, magnesium, and other biodegradable and biocompatible elements, as well as to different geometries of industrial interest other than DES, such as microchannels. Moreover, as micromachining is a complex process in which many variables play an important role in the geometrical dimensions of the machined workpiece, the application of visualization techniques, such as scatter plot matrices and star plots, to evaluate the relationships between inputs and outputs will help us towards a clearer understanding of this promising machining process at an industrial level.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors would like to express their gratitude to the GREP research group at the University of Girona and the Tecnológico de Monterrey for access to their facilities during the experiments. This work was partially funded through Grants from the IREBID Project (FP7PEOPLE2009IRSES247476) of the European Commission and Projects TIN201124046 and TECNIPLAD (DPI200909852) of the Spanish Ministry of Economy and Competitiveness.