Using Artificial Intelligence Techniques to Improve the Prediction of Copper Recovery by Leaching

Copper mining is undergoing major changes driven by technological development in the area and the influence of industry 4.0. These changes, produced by the technological context and by stricter controls (e.g., environmental controls), are also becoming visible in Chilean mining. New regulations from the Chilean government and changes in the copper mining industry (such as a trend toward underground mining) are fostering the search for better results in typical processes such as leaching. This paper describes an experience using artificial intelligence techniques, particularly random forest, to develop predictive models for copper recovery by leaching, using data from an enterprise present in northern Chile for more than 20 years. Two models are presented, one built with actual operational data and the other with data generated in a controlled environment (piling). Well-classified values of 98.90% for operational data and 98.72% for pile/piling data were obtained. The methodology devised for the study can be transferred to piling columns or piles with other characteristics, though the operation must focus on copper leaching. With proper adjustments, it can even be transferred to leaching processes for other types of mineral.


Introduction
The Chilean mining industry, like mining worldwide, is experiencing major changes due to rapid technological advances in the so-called industry 4.0 [1]. According to Pietrobelli et al. [2], big mining companies have typically tended to control their operations from remote centers located in multinational corporations, resulting in little local innovation and development. This way of operating helps the macroeconomy, but it hinders diversification, knowledge transfer, and regional innovation in the value chain [3]. Another factor changing this trend is the significant fall in the copper price since 2015, which has fostered both technological advances that enable companies to face production costs [4] and greater regional innovation and development.
Chilean copper production represents 35% on a world basis [5,6]. On a local basis, the copper production industry is the country's most profitable, providing almost 15% of Chilean GDP and representing 50% of exports [7,8]. This Chilean predominant position in the copper industry is also complemented with leadership in other mineral products, such as lithium. To keep this leadership in the world's mining activity, Chile must ensure mining profitability in the short term. A valid strategy for this may be investing in technology and innovation, together with mining industry diversification.
Recent papers [5][6][7] report a trend to technological diversification in the sector, even from mining suppliers. Furthermore, as stated in [6], a recent report from the Chilean government declares the objective of promoting the establishment of 250 local suppliers for the mining sector in 2035. This strategy is expected to create knowledge about business and technology appropriate for current challenges, both elements being directed to local mining development and exports as well. This would result in an income of about US$10 thousand million.
For the aforementioned technological development and innovation, the Chilean mining industry is incorporating technology to develop intelligence system-type applications for supporting tasks such as copper recovery prediction. These systems are frequently based on artificial intelligence computing models. Apart from representing a technological contribution, these models are becoming a great help for predicting or reducing production costs [9,10], a very convenient fact for supporting modern technology characterized by a greater extraction complexity and increasing restrictions such as environmental ones [11].
In typical production processes such as leaching, predictive models have been used satisfactorily in the last decade to identify factors that allow production to increase [9,10]. Several cases illustrate predictive model generation using artificial intelligence, specifically soft computing [12]. In particular, this paper fully describes the process for developing predictive models [13] for copper recovery by leaching and the results obtained at SCM Franke Company, from the KGHM International Group, present in Chilean mining exploitation since 2009.
Recently, research into the applicability of artificial intelligence techniques such as predictive model algorithms, for copper recovery prediction, has been conducted. In this context, comparative studies of which predictive model algorithms are the most appropriate according to the characteristics of the copper mining production process have been published. Thus, advantages of using support vector machine (SVM), random forest (RF), artificial neural networks (ANN), gradient boosted trees (GBT), or wavelet neural network (WNN) are frequently reported in the literature (such as [14]). For example, in [15], a predictive modeling using SVM for copper potential mapping in the Kerman copper bearing belt in the south of Iran is reported. In [16], a comparative analysis of ANN, WNN, and SVM models to mineral potential mapping for copper mineralization is presented. As a particular result of this work, the authors highlight that WNN exhibits excellent learning ability compared to the conventional ANN.
Also, in [17], SVM, ANN, and RF were used to conduct predictive modeling of mineral prospectivity. For these algorithms, input data were obtained from GIS-based mineral prospectivity mapping of the Tongling ore district (eastern China). The authors conclude that the RF model outperformed the SVM and ANN models, giving greater consistency and better predictive accuracy. Another example of comparative analysis of predictive models using GBT, ANN, and RF is the work described in [16], where the authors highlight that the RF models obtained the highest coefficient of determination (R²) values, the lowest root-mean-square error (RMSE), and the highest residual prediction deviations (RPD).
There are several papers that report that RF and GBDT perform the best (see Table 1 for a comparison among these methods); therefore, and based on the described information in the previous paragraphs, the use of RF can more appropriately lead to the achievement of the stated objective.
This paper describes the tasks done to generate predictive models for copper recovery in leaching piles with low-grade material, using data from actual pile operation and those produced in a controlled environment (pilot), using the same artificial intelligence technique (random forest technique) in both cases to develop predictive models.
The remaining document is organized as follows: Section 2 describes the base concepts of the study and related work. Section 3 describes the experiment, the discretizing of the variables used in the model, data characteristics and how they were collected, work methodology, and the techniques used for analyzing results. Section 4 shows the results obtained for the two models, that is, operational data and piling data models. Section 5 deals with the discussion. Section 6 shows the conclusions of the paper. Finally, acknowledgments and bibliographical references are stated.

Concepts and Related Work
2.1. Leaching and Company Work. The copper leaching process involves tasks thoroughly identified by the industry, that is, irrigation start-up and maintenance, agglomerate condition evaluation, drainage distribution, pool solution inventory, PLS flow evaluation, and distribution and deposition of the material leached at the plant (harvest). Due to the nature and variability of the input material, these processes usually produce high levels of entropy and uncertainty (close to 20%) concerning copper recovery at the end of the harvest [9]. SCM Franke uses three industrial processes widely known in the production of metallic copper via hydrometallurgy: dynamic pile leaching, solvent extraction, and electrowinning [9]. The ultimate goal of these processes is to obtain the greatest copper production while saving resources and being as little aggressive to the environment as possible (a kind of environmental trade-off). The leaching process has been shown to be one of the most convenient ways to achieve this environmental trade-off. The objective of this paper is to predict copper recovery by dynamic pile leaching as accurately as possible, at about 95%, using the least possible amount of leaching material and the best irrigation homogeneity.

Related Work and Predictive Models. The development of applications using predictive modeling to improve mineral  [18][19][20].
Recent studies such as [3,9,10] reveal that one of the most critical tasks in prospective modeling is the selection of appropriate criteria and the application of sound innovating techniques to get the evidential characteristics of these criteria.
Traditionally, these criteria have been selected by different numerical methods, but in the last few decades, alternative techniques such as those from the artificial intelligence area have been applied for both criteria selection and the development of predictive models for mineral recovery [21]. In general, methods containing machine learning algorithms are being applied for building these predictive models.
In the literature, the methods referred to here have been grouped into two sets [21][22][23]: knowledge-driven models and data-driven models. Data-driven models are probabilistic models such as discriminant analysis or logistic regression [19,24]. The data-driven algorithms whose use is most often reported in the literature for sectors such as copper mining are artificial neural networks (ANN) [19,27,28], generally with backpropagation [25,26], and regression trees (RTs) [13,24,29]. Support vector machines (SVM) and random forest (RF) [29] are sometimes used in this domain [13,30,31]. In concrete mining tasks such as studying copper recovery, data-driven algorithms work from the data themselves, whereas knowledge-driven methods require consulting an expert in mineral extraction via hydrometallurgy. As a whole, ANN, RT, and SVM models require a sufficient number of records and parameters to achieve good quality in the resulting models.
The literature contains papers such as [32] that propose a comparison among the performance of predictive models. Table 1 shows that RF and GBDT perform the best, followed by SVM and ELM. Moreover, we observe that the interquartile ranges of RF, GBDT, and SVM are the smallest, showing that these three algorithms generally perform well, in terms of prediction accuracy, regardless of the datasets [32]. While ensemble and boosting methods have been reported to obtain good predictive performance in supervised learning, GBDT is generally less popular than RF. GBDT and RF show both best total average classification accuracy and best mean rank followed by SVM and ELM [32].
This study uses RF, a kind of predictive model based on decision trees. Previous works such as [33] define this kind of predictive model as "a type of predictive model that uses a decision tree to go from observations of an object (represented as the branches of a tree) to a certain conclusion about a target value of the object (represented by the tree leaves)." The interest of using RF is twofold. First, data-driven algorithms such as RF are frequently used to predict values of a target variable influenced by other variables (predictor variables) in datasets [33,34]. In this context, the RF model is adequate for generating a predictive model of copper recovery by leaching (the target variable for this work), because it provides a way to measure the influence of each predictor variable on the target variable. Second, one of the main benefits of RF is that it can be used to determine the importance of variables in a regression or classification problem intuitively [35]. RF can therefore rank each predictor variable by its importance for the target variable.
Prediction is a highly interesting topic in machine learning, which is, in turn, one of the branches of artificial intelligence. As mentioned above, RF is based on decision trees (DT). DT have been widely used in areas such as medicine to yield a diagnosis since they are easy to interpret. Basically, a DT is a hierarchical set of nodes (starting from a root node), where each node contains a decision based on the comparison of an attribute with a threshold value [36,37]. DT-based learning goes from the observation of an object, represented as the branches of a tree, to conclusions about a target value of the object, represented by the tree leaves [36,37].
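The node-and-threshold structure described above can be illustrated with a minimal sketch. The attribute names, thresholds, and labels below are hypothetical and chosen only for illustration; they are not taken from the SCM Franke data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """Internal node: compares one attribute with a threshold value."""
    attribute: str
    threshold: float
    left: Union["Node", str]   # followed when value <= threshold
    right: Union["Node", str]  # followed when value > threshold


def predict(tree, record):
    """Walk from the root node down to a leaf and return the leaf label."""
    node = tree
    while isinstance(node, Node):
        value = record[node.attribute]
        node = node.left if value <= node.threshold else node.right
    return node


# Hypothetical two-level tree (illustrative attributes and thresholds only).
toy_tree = Node(
    attribute="acid_dose",
    threshold=20.0,
    left="low",
    right=Node(attribute="operation_day", threshold=30.0,
               left="normal", right="high"),
)

print(predict(toy_tree, {"acid_dose": 25.0, "operation_day": 12}))
```

Each prediction follows a single root-to-leaf path, which is what makes an individual tree easy to read.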
Previous studies use artificial intelligence techniques for copper-related models. For example, in [8], a model based on fuzzy logic is reported to predict ground vibration and environmental impact due to blasts in the open-pit mine. For this model, the toolbox fuzzy logic of MATLAB was used. In [38], ANN was used to predict the copper ore flotation indices of separation efficiency within different operational conditions.

Experimental Description.
Operational and piling data are available for attaining the objective set by SCM Franke company (environmental trade-off described in Section 2). The company keeps records of planning and copper recovery by heap leaching. These are called operational data (industrial operation). Work has also been done with data collected in a controlled environment. These data are known as piling data, which are the result of tests in leaching columns using strictly controlled measures on irrigation rates, acid concentration in irrigation solutions, and operational cycles.
For the specific case of this study, both operational and piling data were collected by two student interns and Professor C. Leiva (the students under the supervision of Professor C. Leiva, coauthor of this paper), all from the Chemical Department at the Universidad Católica del Norte, Chile. In a similar way to [9], the parameters of these data groups are fully described below:
(iv) The height of a pile is defined by the production goals expected to be accomplished; that is, the piled fine copper tonnage with which the production to be obtained will be determined.
Due to the conditions of the process and operational decisions, the irrigation of some piles or modules in service was stopped, a fact that could render incongruent results when modeling the system. For this reason, and to avoid unnecessary "noise" in the system and the storage of poor data for the statistical model, the records of the nonirrigation periods were deleted from the database.
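The removal of nonirrigation records can be sketched as a simple filter. This is only an illustration: the field name `irrigation_rate` and the convention that a stopped module is recorded with a zero or missing rate are assumptions, and the sample values are invented to exercise the code.

```python
def drop_non_irrigation(records, rate_field="irrigation_rate"):
    """Keep only records taken while the pile or module was being irrigated.

    Assumption: a stopped module is stored with a zero or missing rate.
    """
    return [
        r for r in records
        if r.get(rate_field) is not None and r[rate_field] > 0
    ]


# Invented sample rows, not real plant data.
sample = [
    {"module": "A", "irrigation_rate": 8.5},
    {"module": "A", "irrigation_rate": 0.0},
    {"module": "B", "irrigation_rate": None},
]

filtered = drop_non_irrigation(sample)
print(len(filtered), "of", len(sample), "records kept")
```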
3.3. Piling Data. Piling (or pilot plant) was conducted in two agglomerate tanks of the same dimensions with a material whose granulometry was less than 13 mm in diameter. The mineral was put in contact (irrigated) with a solution of sulfuric acid and water and refined to form lumps of fine material; this was made in order to give the mineral a proper uniform size for the leaching stage and also help copper sulfidation via contact with acid solutions. The aforementioned conditions vary according to leaching cycles to obtain piling scenarios as close as possible to actual pile mineral exploitation. Piling data were obtained in the same way as explained for operational data.
3.4. Random Forest. As previously mentioned, random forest (RF) is a predictive model based on decision tree (DT). The RF supervised learning algorithm is based on the machine learning theory which belongs to the ensemble methods family [34]. These methods use supervised learning methodology over a set of labelled data (training set) to make predictions and produce a model which can be later used to classify nonlabelled data [39]. It uses supervised learning methodology to collect data from parameter values and threshold values, working on a set of training data [40]. The method combines the idea of bagging with the random selection of characteristics, so as to build decision trees using controlled variance [37].
The RF model is successfully used in classification and regression tasks, operating via the construction of multiple decision trees during training, with the purpose of discovering patterns existing in data. The method generates several trees as subsets by combining several automatic learning algorithms appropriately selected [33]. This method is a general technique of random decision trees that combines the idea of bagging with a random selection of characteristics, with the intention of building decision trees with controlled variance [34,35].
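The two sources of randomness mentioned above, bagging and random selection of characteristics, can be sketched as follows. This is a simplified illustration with placeholder feature names; the feature subset is drawn once per tree here for brevity, whereas standard random forest implementations usually draw a new subset at every split.

```python
import random


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(features, k, rng):
    """Random selection of characteristics: pick k distinct features."""
    return rng.sample(features, k)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
rows = list(range(10))          # stand-in for record indices
features = ["f1", "f2", "f3", "f4", "f5", "f6"]  # placeholder names

for tree_id in range(3):
    sample = bootstrap_sample(rows, rng)
    candidates = random_feature_subset(features, k=2, rng=rng)
    print(tree_id, sorted(sample), candidates)
```

Because each tree sees a different resample and a different set of candidate features, the trees are decorrelated, which is what allows the ensemble to reduce variance.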
RF is an ensemble method for classification and regression tasks, which operates through the construction of multiple decision trees during training [34]. Additionally, RF is useful for calculating the influence of the predictor variables on the target and for ranking the importance of each of these influences. This importance is calculated with a metric based on the impurity decrease at each node used for partitioning the data. In the case of classification, the predicted class is the mode of the classes provided by the individual trees. In the case of regression, the prediction is the average of the predictions of the individual trees. Random forests correct the tendency of DT to overfit their training set [41].
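A minimal sketch of the quantities mentioned above is given next. The text does not state which impurity measure was used, so Gini impurity is assumed here; the function names are illustrative.

```python
from collections import Counter
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted drop in impurity obtained by splitting parent into left/right."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children


def vote(tree_predictions):
    """Classification: the forest returns the mode of the per-tree classes."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average(tree_predictions):
    """Regression: the forest returns the mean of the per-tree predictions."""
    return mean(tree_predictions)


parent = ["low", "low", "high", "high"]
print(impurity_decrease(parent, ["low", "low"], ["high", "high"]))
print(vote(["normal", "high", "normal"]))
print(average([90.0, 92.0, 94.0]))
```

Summing the impurity decrease of every split that uses a given variable, over all trees, yields an importance score for that variable; normalizing the scores gives relative importances such as those reported as percentages.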
3.5. Case Study. Using operational and piling data, a case study was conducted with a database of about 30,000 records. For each parameter above, discrete values of low, normal, and high were devised according to threshold values previously defined by SCM Franke, which are commonly used in copper leaching. In particular, this discretization considered the standard deviation of the data (σ): low corresponds to values lower than -σ; normal corresponds to values in the interval [-σ, σ]; and high corresponds to values greater than +σ.
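The σ-based discretization can be sketched as below. Two assumptions are made that the text does not confirm: the interval [-σ, σ] is taken relative to the mean of each variable, and the population standard deviation is used. The company-defined thresholds themselves are not reproduced here.

```python
from statistics import mean, pstdev


def discretize(values):
    """Label each value as low, normal, or high by its deviation from the mean.

    Assumptions: deviations are measured from the mean, and sigma is the
    population standard deviation of the variable.
    """
    mu = mean(values)
    sigma = pstdev(values)
    labels = []
    for v in values:
        deviation = v - mu
        if deviation < -sigma:
            labels.append("low")
        elif deviation > sigma:
            labels.append("high")
        else:
            labels.append("normal")
    return labels


# Invented values used only to exercise the function.
print(discretize([1.0, 5.0, 5.0, 5.0, 9.0]))
```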
3.6. Methodology. The methodology consists of 4 steps. The initial collection of both operational and piling data is considered a stage previous to the methodology described below, since these data (mainly operational data) were collected during several years of operation. Parameter values were grouped in periods comprising days of operation, while the class (recovery) is described for each day of operation in each period. Figure 1 shows examples of this. Figure 1(a) shows daily recovery in two consecutive periods of operation, while Figure 1(b) shows daily recovery in two consecutive periods, but with pilot plant (piling) data. In detail, the steps of our methodology are as follows:
(1) Data Preparation. This stage included filtering tasks and data selection per leaching cycle. Plant data were obtained with a frequency of four hours over one year. Due to process conditions and operational decisions, the irrigation of some piles or modules in service was stopped during some periods, a fact that could render incongruent results when modeling the system. To ensure operational data congruence, records corresponding to irrigation suppression periods were deleted from the database; these records were being
To make the analysis in stage 3 above, a confusion matrix was considered. The confusion matrix facilitates the analysis necessary to determine classification errors by showing how the errors are distributed across the different categories.
In this matrix, performance indicators [42] frequently used to evaluate classifier performance are described. They are accuracy (Acc), recall (r), and precision (p). The way these indicators are calculated is described in Equations (1)-(3). The simplest indicator of classifier performance is accuracy (Acc), the ratio of correctly classified samples to the total number of examples in the dataset [33]. This indicator can be calculated from the confusion matrix according to Equation (1) (the dataset is assumed not to be empty). The other indicators, recall (r) and precision (p), are understood as relevance measures.
The p value is the ratio of true positives (a) among the elements predicted as positive (a + b). Conceptually, the p value refers to the dispersion of the set of values obtained from repeated measurements of a quantity. Specifically, a high p value indicates low dispersion in the measurements. The r value is the ratio of true positives (a) among all the elements that are actually positive (a + d).
$$\mathrm{Acc} = \frac{a + c}{a + b + c + d} \quad (1)$$
$$p = \frac{a}{a + b} \quad (2)$$
$$r = \frac{a}{a + d} \quad (3)$$
where a is the number of true positives, b is the number of false positives, c is the number of true negatives, and d is the number of false negatives.
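Using the a, b, c, d notation defined above and assuming the standard definitions of accuracy, precision, and recall, the three indicators can be computed from the confusion matrix counts as in the following sketch. The counts in the example call are invented and do not correspond to the results of this study.

```python
def classifier_metrics(a, b, c, d):
    """Return (accuracy, precision, recall) from confusion matrix counts.

    a: true positives, b: false positives,
    c: true negatives, d: false negatives.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("the dataset must not be empty")
    accuracy = (a + c) / total
    precision = a / (a + b) if (a + b) else 0.0  # share of predicted positives that are correct
    recall = a / (a + d) if (a + d) else 0.0     # share of actual positives that are found
    return accuracy, precision, recall


# Invented counts used only to exercise the function.
acc, p, r = classifier_metrics(a=90, b=10, c=80, d=20)
print(f"Acc={acc:.4f} p={p:.4f} r={r:.4f}")
```

For a three-label problem such as low, normal, and high, the same function can be applied per label by treating that label as positive and the other two as negative.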

Results
The problem described above was dealt with as a regression instance, seeking a numerical copper recovery prediction from the data in each dataset (operational and piling). So, a model was obtained for both operational and piling data, and the importance of the associated variables was studied in both cases. To obtain the models, the free RapidMiner Studio v9.0 was used. The model generation process started by preparing the data according to step 1 of the methodology above. After the data preparation process (according to Section 3), a file with 1638 records for piling and another with 2001 records for operation were obtained (both files in CSV format). Previous studies such as [12,34,43] indicate that a minimum of 1000 input cases for RF minimizes classification error and, at the same time, enables RF to make more stable predictions. So, both datasets are considered appropriate for generating the models.
To prepare the model evaluation, and in a similar way to [34], a parameter tuning phase was performed. The models were evaluated using these parameters (40-fold cross-validation, 10 times), and the final results were averaged. But the results of this validation were not good, for roundness. So, a method based on hold-out validation, similar to that performed in [34], was applied as follows: for each dataset, using our defined optimal parameterization, one part of the dataset was taken to fit the model and the rest of the sample was used for testing. In detail, 70% of the total data in each dataset was used to fit the models, leaving the remaining 30% for validation. The results and details are presented below. Table 2 summarizes the values obtained with RF in the parameter optimization process during training with the piling dataset. The parameters of interest for the optimal parametrization obtained in this model, that is, confidence (Con), number of trees (NoT), max depth (MDp), and accuracy (Acc), were used for interpreting results; these values are related to the confidence in a random tree model [43,44].
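The 70/30 hold-out scheme described above can be sketched as follows. Whether the original split was random, and with which seed, is not stated; the shuffle and the seed below are assumptions, and the study itself was carried out in RapidMiner rather than with this code.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into fitting and validation parts.

    Assumption: the split is random; the seed only makes the sketch repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Record indices standing in for the 1638 piling records.
train, test = holdout_split(range(1638))
print(len(train), len(test))
```

Unlike repeated cross-validation, every record is used exactly once, either for fitting or for validation, so the two parts never overlap.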

Model Based on Random Forest Using Piling Data.
Parameter Con is related to relative error, according to studies such as [1,44]. Therefore, the values of Con = {25, 40, 55, 70, 85, 100} were used for grouping the values of NoT, MDp, and Acc. Figure 2 shows the values of Acc for each value of Con. Figure 2 also shows that all the graphs indicate a decreasing trend for parameter Acc, except for Con = 40. In this figure, the best mean value of Acc is for Con = 100, the following best values being for Con = 40, 55, and 85. In all cases highlighted as the best, the average value of tree depth (MDp) is 8.5. This may be interpreted as follows: the best combination of parameters is given when the mean tree depth of 8.5 is achieved; that is, this value represents the optimal depth in this classification.
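The grouping of tuning results by Con can be sketched as below. The dictionary keys mirror the parameter names used in the text, but the run values are invented solely to exercise the code and are not results of this study.

```python
from collections import defaultdict
from statistics import mean


def mean_by_con(runs):
    """Group tuning runs by Con and average NoT, MDp, and Acc in each group."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["Con"]].append(run)
    return {
        con: {key: mean(r[key] for r in group) for key in ("NoT", "MDp", "Acc")}
        for con, group in grouped.items()
    }


# Invented runs, not taken from the tables of this paper.
runs = [
    {"Con": 25, "NoT": 50, "MDp": 6, "Acc": 90.0},
    {"Con": 25, "NoT": 70, "MDp": 8, "Acc": 92.0},
    {"Con": 40, "NoT": 60, "MDp": 10, "Acc": 88.0},
]

for con, summary in sorted(mean_by_con(runs).items()):
    print(con, summary)
```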
On the basis of the piling data, the confusion matrix of this model was also obtained. In this optimization, 80% of the data were used for cross-validation and 20% for validation [40] (Table 3). Table 4 shows the importance of variables for this model. The most important variable is "agglomerate H dose," followed by variable "RL." In contrast, the least important variable is "soluble Cu." Variables "operation day," "H fed," and "CO3 grade" are over 10% of the value of importance, a fact that may be interpreted as their having a good predictive capacity for this model. This is not so for variable "soluble Cu," which does not exceed the threshold value of 10%.

RF-Based Model Using Operational Data. This section describes the results obtained with the operational data. Table 5 summarizes the statistical values obtained with RF in the parameter optimization process during training with the operational dataset. Like the model using piling data, parameters NoT, MDp, and Acc of optimal parametrization were used for interpreting results, grouped according to parameter Con. Figure 3 shows that all the graphs indicate a decreasing trend for parameter Acc. Also, all the mean values of Acc are quite close to one another (Table 5).
As can be seen in Figure 3, the best is when Con = 25. Other important aspects are, on the one hand, that the mean depth of trees increased (mean value = 12.7) as compared with the previous model (mean depth = 9.2). This indicates that a greater number of depth cases per each tree were classified, which is good for the model. On the other hand, the number of trees decreased (mean value = 23.4) as compared with the number of trees of the piling data model (mean value of the number of trees = 62.5). This may indicate that, as a whole, data were easier to group for the model algorithm.
Thus, on the basis of the abovementioned data and as shown in Figure 3, it may be stated that optimal parametrization for the operational data model is better than its equivalent with piling data.
Similar to the previous piling model, the confusion matrix for this model was also obtained, the optimization procedure being the same as for the previous model. Table 6 shows that all the values of recall (r) exceed 93%, the lowest being for the label high, thus coinciding with the previous model. Given this coincidence, the conditions for classifying records in this label should be improved to make future classifications better. The performance of the model is reliable, given the value p = 98.90% and the value of accuracy. Table 7 shows the importance of variables for this model. The two most important variables here are the same as those of the piling data model (agglomerate H dose = 22.76% relative importance and RL = 18.86% relative importance). As can be seen in Table 7, the order of importance of variables is the same as shown in the previous model (Table 4), but the importance values are different. The least important variable in Table 7 is the same as in the previous model (soluble Cu). For this model, the percentage value of soluble Cu decreased by about 1%. This means that, although the order of importance of variables is maintained, the relative importance of the variables changes with respect to the previous model. Since this model was developed using operational data, it is prudent to consider that this order of importance is the most convenient. Figure 4 illustrates the contrast described above.

Figure 5 summarizes the importance of variables according to the RF models for each experiment. In particular, the figure shows that variable H+fed (volumetric flow of ILS solution) is the most important, followed by variables RL, total Cu grade, and day of operation. The order of importance of the variables is the same in both classifications; that is, reproducing the conditions of the leaching pile in a controlled environment (piling) accurately represents the pile, and therefore, piling can be used to predict pile copper recovery at a much lower cost and with reliability in the resulting predictive model.

Table 2: Mean values of NoT, MDp, and Acc (as a percentage) for each value of Con; optimal parametrization for each operational set.

Journal of Sensors

Discussion
Artificial intelligence techniques, specifically soft computing, are being used in productive industry to generate predictive models that improve industrial activity [25]. Random forest (RF) was used in this study to predict copper recovery by leaching. Predictive models using RF have been recently published by the mining industry, showing good results such as those reported in [3,12,33], but these studies were directed to objectives different from copper recovery prediction.
In recent papers such as [9], artificial intelligence computing tools (particularly machine learning algorithms) have been reported, but no evidence of the use of RF to predict copper recovery has been found in the literature. However, these works have helped to identify and relate information that directly influences the improvement of the copper recovery process by leaching.
The study published in [3] highlights that machine learning algorithms, such as artificial neural networks, regression trees, random forest, and support vector machines, are powerful tools currently scarcely used in the copper mining industry, though their use in the mining industry should tend to increase.
In RF, each tree is developed on the basis of the bootstrap algorithm philosophy. This may mean that the classification obtained for each tree is precise, thus causing a positive impact on the models presented here. In addition, this philosophy of work has made it possible to use all datasets in the classification and generate the models. The model precision obtained in this study is similar in both cases. The model for both datasets shows that a wealth of information was used to interpret the influence of predictive variables on class. For example, the order of the variables of interest is similar in both models and the performance shown by variables Acc, p, and r enables concluding that both models have a good quality and could be used to predict copper recovery in new cases with a good reliability value.
The capacity to identify the importance of variables for the model using training data (piling) is similar to the one shown by the model using actual data (operation). This was an expected result since the leaching material was the same in both cases, but this result validates the applicability of the machine learning algorithm selected for generating the models.
On the basis of the above described information, the objective of environmental trade-off was accomplished because model performance is optimal, and in both cases, the greatest number of records was classified as normal, when the acid irrigation rate lies between 20 and 50 g/l (normal value).

Conclusions
Copper recovery prediction by hydrometallurgical methods and, particularly, leaching is usually made with the help of mathematical models, but soft computing techniques can help create complex computational models [45] that help in this prediction. Recently, an increase in using soft computing tools in the industry has been observed [9,13,39], but in this particular case, the literature does not contain many studies reporting the use of RF to generate a copper recovery prediction model.
This study resulted in the generation of two copper recovery prediction models using the leaching method. Actual data (operation) were used in one of the models, while the other model was generated with hive-simulated data which had the same characteristics as the material to be leached and the lixiviant. In both cases, the models achieved an excellent predictive quality, one of the cases reaching 100% prediction for the label high, the mean being higher than 95% precision. In this way, it excelled in what was posed in the objective of this study (described at the beginning of this document).
As recently published in [9], a comparison between a linear model and an artificial neural network (ANN) for predicting copper recovery is made. One of the conclusions of this study is that ANN exceeds the linear model in terms of precision, but as a conclusion of the present work, the interpretation capacities of RF-generated models exceed those of
This study helped make a comparison between two copper recovery prediction models in the same work context. The adjustment precision measure indicates that the RF algorithm is highly useful for processes to predict future copper production.
Figure (axes: relative error, weeks of operation).
In addition, experience was gained in defining and implementing the predictive model in the leaching domain in this specific work context. This experience may be used for other simulations of processes aimed at improving copper production results at SCM Franke, or at other companies of the same industrial production sector, by means of soft computing techniques.
What was said about model performance, the capacity to identify the influence of variables on class, and the capacity to interpret results, etc., is very important in the copper industry because it allows generating supporting tools for material exploitation planning, along with viewing, via indicators generated with this type of model, copper recovery results in the presence of a certain material. It also allows properly selecting both the most influential variables and the values of those variables to achieve the desired recovery. This may have a considerable impact on the intelligent exploitation of this mineral, considering the increasing demand and lack of this industrial activity.
To conduct this study, a methodology was proposed; results obtained by following the methodological steps devised show excellent quality and are replicable for other copper leaching piles to study the future performance of copper recovery using the prediction method. Also, this methodology can be transferred to other copper leaching processes, including the knowledge of this particular process to generate a predictive model. In this way, this study may indicate a future line of research.

Data Availability
The input data used to support the findings of this study could be available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflict of interest.