Data Mining of the Thermal Performance of Cool-Pipes in Massive Concrete via In Situ Monitoring

Embedded cool-pipes are very important for massive concrete because their cooling effect can effectively prevent thermal cracks. In this study, a data mining approach to analyzing the thermal performance of cool-pipes via in situ monitoring is proposed. A detailed monitoring program is applied in a high arch dam project, which provides a large, good-quality data source. The factors and relations that govern the thermal performance of cool-pipes are obtained from a theoretical thermal model. The support vector machine (SVM) technique is applied to mine the data. The thermal performances of iron pipes and high-density polyethylene (HDPE) pipes are compared. The data mining result shows that iron pipe has better heat removal performance when the flow rate is lower than 50 L/min. It also reveals that a turning flow rate exists for iron pipe, which is 80 L/min. The prediction and classification results obtained from the data mining model agree well with the monitored data, which illustrates the validity of the approach.


Introduction
Temperature is an important factor that affects the health of massive concrete structures [1]. When the volume of a concrete structure is so large that it generates excessive heat and associated expansion [2], which may lead to a risk of thermal cracking, it is called "massive concrete" [3,4]. Postcooling measures, such as embedding cool-pipes, are commonly included in the design to mitigate such effects.
The first significant application of cool-pipes was in a small experimental section of the Owyhee Dam in Oregon in 1931, which was the tallest dam in the world at the time of its completion [5]. The Bureau of Reclamation considered the Owyhee experiment successful and extended its application to the construction of the Hoover Dam in 1936. Coils of 1-inch thick-walled steel pipe carrying running water were embedded in every slab. More than 582 miles (937 km) of cool-pipes were placed within the concrete [5][6][7]. The success of embedding steel cool-pipes in the Hoover Dam made the method as famous as the dam itself. The method has since been used in a number of large-scale massive concrete projects [8][9][10] over the last 70 years. Several researchers have studied the thermal effect of cool-pipes [7, 8, 11-16]. The Bureau of Reclamation first studied a theoretical solution for long iron pipes [7]. Zhu and Cai have made a great effort to carry out calculations for cool-pipes by using finite element methods [12,14]. Much research has focused on calculation models and methods for cool-pipes based on assumptions (e.g., the pipe is straight; the heat transfer between pipe and concrete is constant) that make the problem simple and solvable. However, during the construction of a massive concrete structure, the heat transfer process is very complicated. The cool-pipes are embedded in multiple U-shapes, and the heat transfer between fluid (water) and solid (pipe) is far from simple. A recent study pointed out that the convection coefficient cannot be considered constant [13]. A study of the in situ thermal performance of cool-pipes is therefore needed, which can help us understand the thermal mechanism of cool-pipes better.
To the knowledge of the authors, few studies have reported statistics on the thermal performance of cool-pipes in actual concrete engineering based on real measured data, especially for large-scale projects. An important reason for this scarcity is the lack of adequate monitoring measurements and detailed field data analysis. Fortunately, a 285 m high arch dam project under construction gives a perfect opportunity to conduct the work. During the construction period of the dam, the temperature and the flow of every cool-pipe in every slab are monitored. A large amount of data has been produced, on which we carry out data mining of the thermal performance of cool-pipes in this paper.
Many mathematical methods can be employed in data mining [17][18][19][20][21][22][23][24][25][26][27]. Support vector machine (SVM) is a novel machine learning method developed on the basis of statistical learning theory [28,29]. It is one of the most popular methods for pattern recognition, regression estimation, and time series prediction problems [30]. A typical process of SVM training can be summarized in four steps [31]: selecting the kernel function, selecting the smoothing parameters of the kernel function, choosing the penalty factor, and solving the quadratic problem. It has been shown that, compared with other machine learning methods such as the artificial neural network, SVM not only is easier to use [32] but also leads to higher accuracy and robustness. For specific cases, it converges 10 to 100 times faster in training [33,34].
Our paper is organized as follows: Section 2 describes the detailed thermal monitoring program, including the temperature measurement of concrete, inlet-outlet temperature, and flow of cool-pipes. Section 3 establishes the thermal model that guides the data mining. Section 4 describes the SVM method and its relevant concepts. Section 5 presents the details of the monitoring database and the automatic data mining program flow that was developed. Section 6 provides the data mining results, including the calibrated parameters, the numerical model to predict the outlet cool-pipe temperature, the evaluation of the thermal performance, and pipe material classification. Section 7 provides our conclusions.

In Situ Monitoring
2.1. Engineering Description. Xiluodu Dam is located in the middle of the Jinsha River, Yunnan Province in Southwest China. It is designed as an arch dam with a maximum height of 285.5 m and a crest length of 680 m [8]. Based on the project plan, the amount of concrete utilized was estimated to be $1315 \times 10^{4}$ m$^{3}$ during the construction period from 2009 to 2014. The rear view of the dam under construction is shown in Figure 1.
Across the river flow direction, the dam is divided into 31 monoliths [35]. Vertically, each monolith consists of a number of slabs poured at different times. Figure 2 shows the slab zoning map. Each slab is individually poured in a formed hexahedron block with a vertical thickness of 3.0 m or 1.5 m. The first slab was poured on March 27, 2009, and 2158 slabs altogether had been placed by December 2013. These slabs are the main study objects of the research presented in this paper.
Cool-pipes were horizontally embedded in every slab to control the concrete temperature properly. Circulating water was supplied via two large cooling water units, which could stabilize the supply temperature at a constant level on demand. Considering that water temperature increases along the flow direction, the flow direction was changed twice a day to make the concrete temperature field more uniform. Iron pipes and high-density polyethylene (HDPE) pipes are employed in this project.

Concrete Temperatures.
Since the start of the dam construction, digital temperature sensors were installed during the pouring of every concrete slab. Staggered cool-pipes were vertically embedded every 1.5 m (i.e., two layers in a 3.0 m thick slab). The sensors were vertically positioned midway between two cool-pipe layers, as shown in Figure 3. Concrete temperature was measured and recorded four or five times per day.

Cooling Water Temperatures and Flows.
Figure 4 shows the schematic layout of the cool-pipes in one slab. Faucets were connected to both ends of each cool-pipe. By letting water run from the tap, the inlet and outlet temperatures of the cool-pipes were measured using a mercury thermometer and recorded four or five times per day. At one end of every pipe, one water meter was installed (Figure 4). The cooling water flow rate was measured and recorded.

Thermal Model
A simplified thermal model of cool-pipe and concrete is sketched in Figure 5. Water flows through a pipe whose wall is in close contact with the concrete. Heat transfer per unit time is discussed below.
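The heat energy carried away by the water per unit length, which is denoted as $Q_w$, can be expressed as follows:

$$Q_w = \rho_w c_w v A_w \, dT_w, \tag{1}$$

where $\rho_w$ is the density of the water, $c_w$ is the specific heat capacity of water, $T_w$ is the water temperature, $dT_w$ is the temperature difference along the pipe (per unit length), $v$ is the water velocity, and $A_w$ is the cross section area of the pipe. The heat energy income for the concrete per unit length, which is denoted as $Q_c$, can be calculated by Newton's law of cooling, which is also called the Robin boundary condition:

$$Q_c = h \,(T_c - T_w)\, A_c, \tag{2}$$

where $h$ is the heat transfer coefficient between water and concrete, $T_c$ is the concrete temperature, and $A_c$ is the contact area of pipe wall per unit length. $A_w$ and $A_c$ can be further expressed by the geometry of the pipe:

$$A_w = \frac{\pi D^2}{4}, \qquad A_c = \pi D \, dL, \tag{3}$$

where $D$ is the diameter of the pipe and $L$ is the length of the pipe.

Assume that $T_c$ is constant along the pipe and let $\Delta T$ represent $(T_c - T_w)$. The differential of $T_w$ can then be expressed through the differential of $\Delta T$:

$$d(\Delta T) = -\,dT_w. \tag{4}$$

Based on the conservation of energy, $Q_w = Q_c$. We can obtain that

$$\frac{d(\Delta T)}{\Delta T} = -\frac{4h}{\rho_w c_w v D}\, dL. \tag{5}$$

Integrating (5) along the whole pipe gives

$$\ln\frac{\Delta T_{\mathrm{out}}}{\Delta T_{\mathrm{in}}} = -\frac{4hL}{\rho_w c_w v D}, \tag{6}$$

where subscripts in and out represent the inlet and outlet of the pipe, respectively. If the inlet status of the water pipe is known, the outlet water temperature can be determined by the following equation:

$$T_{w\text{-out}} = T_c - \left(T_c - T_{w\text{-in}}\right) \exp\!\left(-\frac{4hL}{\rho_w c_w v D}\right). \tag{7}$$

The temperature rise along the pipe, $T_{w\text{-out}} - T_{w\text{-in}}$, is an important indicator which directly gives the total energy taken up by the water. It can be obtained from (7):

$$T_{w\text{-out}} - T_{w\text{-in}} = \Delta T_{\mathrm{in}} \left[1 - \exp\!\left(-\frac{4hL}{\rho_w c_w v D}\right)\right]. \tag{8}$$

The physical properties of the water, $c_w$ and $\rho_w$, are constant under usual environmental conditions. $h$ is determined by the pipe material. In engineering cases, the diameter of a given pipe material is always fixed. In China's dam projects, $D$ is 32 mm for both HDPE pipe and iron pipe. Let $T_w^*$ denote $T_{w\text{-out}} - T_{w\text{-in}}$; $T_w^*$ can then be seen as a function of the following parameters:

$$T_w^* = f\left(\Delta T_{\mathrm{in}},\, v,\, L,\, h\right). \tag{9}$$

With $T_w^*$ known, the heat power absorbed by the cool-pipe, which is denoted as $P$, can be calculated by the following equation:

$$P = c_w \rho_w q_w T_w^*, \tag{10}$$

where $q_w = v A_w$ is the flow rate of the cooling water. In practice, the thermal process between the concrete and the cool-pipe is very complex and differs from the theoretical model proposed above in several respects.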
(1) Different pipe materials have different $h$ values. The value of $h$ is hard to determine and may change with the flow state.
(2) Concrete temperature is not constant along the pipe.
(3) The pipe is embedded in multiple U-shapes like a long crawling snake, not in a straight line.
Nevertheless, (9) gives good guidance for data mining: it tells us that $T_w^*$ depends on the inlet temperature difference ($\Delta T_{\mathrm{in}}$), the cooling water flow speed ($v$), the pipe length ($L$), and the pipe material ($h$).
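The closed-form relation of this section can be explored numerically. The sketch below evaluates the straight-pipe temperature rise $T_w^* = \Delta T_{\mathrm{in}}\,[1 - \exp(-4hL/(\rho_w c_w v D))]$ for two water velocities. The water properties are standard approximate values, and the heat transfer coefficient `h = 50` W/(m²·K) is purely illustrative, not a value measured in the project; the 220 m length and 32 mm diameter are the figures quoted in the text.

```python
import math

RHO_W = 1000.0   # water density, kg/m^3 (standard approximate value)
C_W = 4186.0     # specific heat of water, J/(kg*K) (standard approximate value)


def temperature_rise(delta_t_in, velocity, length, h, diameter=0.032):
    """Water temperature rise T_w* along a straight pipe (idealized model).

    delta_t_in: T_c - T_w-in in deg C; velocity in m/s; length in m;
    h: heat transfer coefficient in W/(m^2*K); diameter in m.
    """
    exponent = 4.0 * h * length / (RHO_W * C_W * velocity * diameter)
    return delta_t_in * (1.0 - math.exp(-exponent))


# Hypothetical inputs: h = 50 W/(m^2*K) is an illustrative guess only.
slow = temperature_rise(delta_t_in=10.0, velocity=0.2, length=220.0, h=50.0)
fast = temperature_rise(delta_t_in=10.0, velocity=1.0, length=220.0, h=50.0)
print(f"rise at 0.2 m/s: {slow:.2f} C, rise at 1.0 m/s: {fast:.2f} C")
```

In this idealized model the rise always shrinks as velocity grows and can never exceed $\Delta T_{\mathrm{in}}$, which is the qualitative trend the data mining is expected to refine.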

Data Mining Model
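The objective is to find a function $f(\mathbf{x}) \in \mathbb{R}$ that can separate the two classes with a margin as large as possible.
A linear estimation function that achieves this goal can be expressed as follows:

$$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b, \tag{12}$$

where $\mathbf{w}$ and $b$ are weight factors to be determined. Many possible linear classifiers exist that can separate the data. As shown in Figure 6, $H^*$ is the classifier plane, and H1 and H2 are the planes that pass through the nearest data points and are parallel to the classifier plane (the sample points on H1 and H2 are called support vectors). The basic idea of SVM is to find an optimal plane that maximizes the classification margin between H1 and H2. One way to ensure this is to minimize the norm of $\mathbf{w}$ [36], that is,

$$\min_{\mathbf{w},\, b} \ \frac{1}{2}\|\mathbf{w}\|^2. \tag{13}$$

The constraint condition of (13) can be expressed as

$$y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) \ge 1, \quad i = 1, 2, \ldots, n, \tag{14}$$

where $(\mathbf{x}_i, y_i)$ are the samples and $y_i \in \{-1, +1\}$ are their class labels. In particular cases, data of different labels cannot be clearly separated. A set of slack variables $\{\xi_i\}$ is introduced to measure the amount by which the constraints are violated, as shown in Figure 6. Equation (13) is then transformed into the following form:

$$\min_{\mathbf{w},\, b,\, \xi} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0, \tag{15}$$

where $C$ is a penalty constant chosen a priori, which determines the cost of constraint violation. As is well known, (15) is a quadratic programming optimization problem. The method of Lagrange multipliers is a good strategy for solving an optimization problem subject to constraints. Solving (15) can be converted into finding the saddle point of the following Lagrange function:

$$L(\mathbf{w}, b, \xi, \alpha, \beta) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i, \tag{16}$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are the Lagrange factors. The saddle point can be obtained by

$$\frac{\partial L}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - \beta_i = 0. \tag{17}$$

Substituting (17) into (16) yields the dual optimization problem:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \left(\mathbf{x}_i \cdot \mathbf{x}_j\right) \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \ \ 0 \le \alpha_i \le C. \tag{18}$$

The solution $\alpha_i^*$ of (18) determines the optimal parameter $\mathbf{w}^*$:

$$\mathbf{w}^* = \sum_{i=1}^{n} \alpha_i^* y_i \mathbf{x}_i. \tag{19}$$

With $\mathbf{w}^*$, the optimal parameter $b^*$ can be obtained from the boundary condition in (17), that is, from any support vector lying on the margin ($0 < \alpha_j^* < C$), which can be written as follows:

$$b^* = y_j - \mathbf{w}^* \cdot \mathbf{x}_j. \tag{20}$$

The optimal hyperplane decision function can finally be written as

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i \left(\mathbf{x}_i \cdot \mathbf{x}\right) + b^* \right). \tag{21}$$

For the regression case (Section 4.2), $\varepsilon$ is a loss precision, which is employed to create an insensitive band around the regression line, as shown in Figure 7.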
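A double set of slack variables $\{\xi_i, \xi_i^*\}$ makes the constraints more feasible. Equation (13) is then transformed into the following form:

$$\min_{\mathbf{w},\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) \quad \text{subject to} \quad \begin{cases} y_i - f(\mathbf{x}_i) \le \varepsilon + \xi_i, \\ f(\mathbf{x}_i) - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\ \xi_i^* \ge 0, \end{cases} \tag{23}$$

with $f(\mathbf{x}_i) = \mathbf{w} \cdot \mathbf{x}_i + b$. Similar to classification, (23) can be converted into a Lagrange function

$$\begin{aligned} L = {} & \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{n} \alpha_i \left(\varepsilon + \xi_i - y_i + \mathbf{w} \cdot \mathbf{x}_i + b\right) \\ & - \sum_{i=1}^{n} \alpha_i^* \left(\varepsilon + \xi_i^* + y_i - \mathbf{w} \cdot \mathbf{x}_i - b\right) - \sum_{i=1}^{n} \left(\beta_i \xi_i + \beta_i^* \xi_i^*\right), \end{aligned} \tag{24}$$

where $\alpha_i, \alpha_i^*, \beta_i, \beta_i^* \ge 0$ are the Lagrange factors. The saddle point satisfies

$$\mathbf{w} = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) \mathbf{x}_i, \qquad \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) = 0, \qquad C - \alpha_i - \beta_i = 0, \qquad C - \alpha_i^* - \beta_i^* = 0, \tag{25}$$

which yields the dual optimization problem

$$\max_{\alpha,\, \alpha^*} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) \left(\mathbf{x}_i \cdot \mathbf{x}_j\right) - \varepsilon \sum_{i=1}^{n} \left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{n} y_i \left(\alpha_i - \alpha_i^*\right) \tag{26}$$

subject to $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$. Its solution determines

$$\mathbf{w}^* = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) \mathbf{x}_i. \tag{27}$$

With $\mathbf{w}^*$, the optimal parameter $b^*$ can be calculated from the boundary condition in (25). Then the optimal regression function can finally be obtained.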
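To make the role of the slack variables and of the loss precision concrete, the following minimal sketch evaluates, for a fixed linear function $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, the classification slack $\xi_i = \max(0,\ 1 - y_i f(\mathbf{x}_i))$, the soft-margin objective $\tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$, and the $\varepsilon$-insensitive regression loss. It does not solve the quadratic program; the function names and the toy data are hypothetical.

```python
def linear_f(w, b, x):
    """Linear function f(x) = w.x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * f(x)) for one labelled sample (y is -1 or +1)."""
    return max(0.0, 1.0 - y * linear_f(w, b, x))


def soft_margin_objective(w, b, samples, c):
    """Primal classification objective 0.5 * ||w||^2 + C * sum(xi)."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + c * sum(hinge_slack(w, b, x, y) for x, y in samples)


def eps_insensitive(w, b, x, y, eps):
    """Regression loss max(0, |y - f(x)| - eps): zero inside the eps band."""
    return max(0.0, abs(y - linear_f(w, b, x)) - eps)


# Toy data (hypothetical): the third point lies on the wrong side of the plane.
samples = [((2.0, 0.0), 1), ((-2.0, 0.0), -1), ((0.5, 0.0), -1)]
w, b = (1.0, 0.0), 0.0
print(soft_margin_objective(w, b, samples, c=1.0))
print(eps_insensitive(w, b, (3.0, 0.0), 3.2, eps=0.5))
```

A larger `c` makes the misclassified third point more expensive, which is the trade-off the penalty constant $C$ controls.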

Kernel Function.
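The algorithms in Sections 4.1 and 4.2 are specified in linear space. In either a classification or a regression problem, the input data $\mathbf{x}$ is projected into a feature space $\mathcal{H}$ by a function $\varphi$, which is built during training. Furthermore, $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$ can be expanded into $f(\mathbf{x}) = \mathbf{w} \cdot \varphi(\mathbf{x}) + b$ to solve more complicated problems, such as nonlinear problems. Thus, formulas (18) and (26) can be rewritten as follows:

$$\begin{aligned} & \max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j), \\ & \max_{\alpha,\, \alpha^*} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) K(\mathbf{x}_i, \mathbf{x}_j) - \varepsilon \sum_{i=1}^{n} \left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{n} y_i \left(\alpha_i - \alpha_i^*\right). \end{aligned} \tag{28}$$

$K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^{T} \cdot \varphi(\mathbf{x}_j)$ is defined as the kernel function. A kernel function provides a way to project data into a higher dimensional space while operating in the original space. The kernel function is chosen beforehand; then, a subset of vectors $\mathbf{x}_i$ is determined during training, which are called "support vectors." The most commonly adopted kernel functions include the polynomial function, radial basis function (RBF), exponential function, and ANOVA function. The RBF is chosen in this paper because it is suitable for nonlinear problems; it can be written as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \left\|\mathbf{x}_i - \mathbf{x}_j\right\|^2\right). \tag{29}$$

Algorithm 1: (1) Create mesh grid of $C$, $\gamma$, $\varepsilon$.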
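As a small illustration of how a kernel replaces the inner product, the sketch below computes an RBF kernel value and the Gram matrix of a few points. The parameterisation $\exp(-\gamma\|\mathbf{x} - \mathbf{z}\|^2)$ is an assumption (other texts write the same kernel with a width $\sigma$), and the sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2); the gamma form is an assumption."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma):
    """Matrix of kernel values K[i][j] = K(points[i], points[j])."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical sample points in two dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
for row in gram_matrix(points, gamma=0.5):
    print(["%.4f" % value for value in row])
```

Nearby points give kernel values close to 1 and distant points give values close to 0, so a larger `gamma` makes the model more local.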

Parameter Calibration.
As can be deduced, the parameter $\gamma$ in the RBF and the parameters $C$ and $\varepsilon$ in SVM greatly influence the classification/regression result. Proper parameters are essential for obtaining satisfactory results; thus, cross validation (CV) is introduced. CV was first introduced to fix the overoptimistic result [37] derived from training an algorithm and evaluating its statistical performance on the same data [38,39]. $k$-fold CV is a common method used in many statistical problems, such as biometrics [40], computational physics [41], and artificial intelligence [42]. In this method, the data set $S$ is divided into several subsets:

$$S = S_1 \cup S_2 \cup \cdots \cup S_k, \tag{30}$$

where $k \ge 1$ is an integer. Every subset becomes the test set once, so that the training results get closer to the distribution of the original data without much increase in computing cost.
Combined with "grid search," $k$-fold CV is adopted in this paper to identify a good parameter combination of ($C$, $\gamma$, and $\varepsilon$) that makes the SVM model work well. The process is summarized in Algorithm 1.
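The combination of grid search and $k$-fold CV can be sketched as follows. This is not the authors' Algorithm 1: the function names are hypothetical, and to keep the example free of external dependencies the model being tuned is a simple RBF-weighted kernel smoother, used here only as a stand-in for an SVM. For an SVM, the grid would carry the keys for $C$, $\gamma$, and $\varepsilon$, and `fit_predict` would train the SVM on each training fold.

```python
import itertools
import math


def k_fold_indices(n, k):
    """Assign indices 0..n-1 to k folds in round-robin order (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def grid_search_cv(xs, ys, grid, k, fit_predict):
    """Return (best_params, best_mse) over the Cartesian product of `grid`.

    fit_predict(train_x, train_y, test_x, params) -> predictions is supplied
    by the caller; an SVM trainer would be plugged in here.
    """
    best_params, best_mse = None, math.inf
    folds = k_fold_indices(len(xs), k)
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        sq_err, count = 0.0, 0
        for fold in folds:  # every fold serves as the test set exactly once
            held_out = set(fold)
            train_x = [x for i, x in enumerate(xs) if i not in held_out]
            train_y = [y for i, y in enumerate(ys) if i not in held_out]
            preds = fit_predict(train_x, train_y, [xs[i] for i in fold], params)
            sq_err += sum((p - ys[i]) ** 2 for p, i in zip(preds, fold))
            count += len(fold)
        mse = sq_err / count
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


def kernel_smoother(train_x, train_y, test_x, params):
    """Stand-in model (NOT an SVM): RBF-weighted average of training targets."""
    out = []
    for x in test_x:
        weights = [math.exp(-params["gamma"] * (x - t) ** 2) for t in train_x]
        out.append(sum(w * y for w, y in zip(weights, train_y)) / sum(weights))
    return out


xs = [i / 10 for i in range(40)]           # synthetic inputs
ys = [math.sin(x) for x in xs]             # synthetic targets
grid = {"gamma": [2.0 ** p for p in range(-5, 6)]}   # 2^-5 ... 2^5
print(grid_search_cv(xs, ys, grid, k=5, fit_predict=kernel_smoother))
```

Scoring each parameter set on held-out folds rather than on the training data is what guards the search against the overoptimistic results mentioned above.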

Data Processor and Programming
5.1. Database. We established a database to store the large quantity of monitoring data. Microsoft SQL Server (Version 10.50.1600) is used as the database engine. All the monitored data introduced in Section 2 are stored in the database.

Data Cleaning.
Data cleaning is carried out as a preprocessing step for data mining, as follows.
(1) Monitored data that show signs of instrument failure are filtered out.
(2) The flow direction is changed every day, and each pair of $T_{w\text{-out}}$ and $T_{w\text{-in}}$ is automatically compared to guarantee that $T_{w\text{-out}} > T_{w\text{-in}}$.
(3) Pipe length is not studied as a variable in the data mining process, because most pipes in the project are 220 m long. For all the pipes, the length is therefore treated as a constant and is not selected during the data mining procedure.
(4) Because the HDPE pipes have the same diameter as the iron pipes in the project, the flow rate $q_w$ is adopted in data mining instead of the water speed $v$ for convenience.
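Cleaning steps (1) and (2) can be sketched as a single record filter. This is only one possible reading of the rules above: the plausibility bounds, the positive-flow check, and the choice to treat the colder pipe end as the inlet are assumptions made for illustration, and the function name is hypothetical.

```python
def clean_record(t_a, t_b, flow, t_min=0.0, t_max=60.0):
    """Return (t_in, t_out, flow), or None if the record should be discarded.

    t_a, t_b: water temperatures read at the two pipe ends (deg C).
    The bounds t_min/t_max and the flow > 0 check are hypothetical
    stand-ins for the instrument-failure criteria.
    """
    if any(value is None for value in (t_a, t_b, flow)):
        return None  # missing reading
    if not (t_min <= t_a <= t_max and t_min <= t_b <= t_max) or flow <= 0:
        return None  # implausible value, treated as instrument failure
    # The flow direction alternates, so the colder end is taken as the inlet.
    t_in, t_out = sorted((t_a, t_b))
    if t_out == t_in:
        return None  # no measurable rise, cannot satisfy t_out > t_in
    return t_in, t_out, flow


print(clean_record(14.0, 11.5, 30.0))
print(clean_record(85.0, 12.0, 30.0))
```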

Data Extractor.
A set of programs is developed in Python to query the database and export data samples. Given the large volume of stored data, the programs run in parallel to speed up processing.
Taking data mining on $T_w^*$ as an example, the relevant data include pipe material, inlet and outlet water temperature, water flow, and concrete temperature. They are stored in three different data tables, the structures of which are shown in Figure 8. Data extraction is carried out based on Algorithm 2.

Data Mining Implementation.
The SVM method is developed in Python programs to model the thermal performance.Based on the integration of database, data extractor, and SVM model engine, the whole program flow of data mining is shown in Figure 9.
Algorithm 2:
(2) for each Pipe in Pipes do:
(3)   Select * from TbSensorInfo where SlabId = Pipe.SlabId, record them as Sensors.
(4)   Select * from TbCoolRecord where Id = Pipe.PipeId, record them as Rows.
(5)   for each Row in Rows do:
(6)     for each Sensor in Sensors do:
(7)       Select Temperature from TbTemRecord where TemTime nearest* to Row.CoolTime and SensorId = Sensor.SensorId
(8)     end loop Sensors
(9)     $T_c$ = average(sum(Temperature))
(10)    output $T_c$, $T_{w\text{-in}}$, $T_{w\text{-out}}$, Flow
(11)  end loop Rows
(12) end loop Pipes
Note *: the readings of all the temperature sensors in the slab are collected. If the nearest time difference is larger than a threshold, the data at that time are not used.
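The nearest-time matching in steps (6) to (9) can be sketched in memory as follows. The database access is omitted, the 3-hour threshold and the sensor identifiers are hypothetical, and here a sensor whose nearest reading is too far from the cooling record is simply skipped, which is one possible reading of the note above.

```python
from datetime import datetime, timedelta


def nearest_reading(readings, target_time, max_gap):
    """Return the temperature closest in time to target_time, or None if too far.

    readings: list of (timestamp, temperature) tuples for one sensor.
    """
    if not readings:
        return None
    stamp, temp = min(readings, key=lambda r: abs(r[0] - target_time))
    return temp if abs(stamp - target_time) <= max_gap else None


def slab_concrete_temperature(sensors, cool_time, max_gap=timedelta(hours=3)):
    """Average the nearest-in-time reading of every sensor in the slab (T_c)."""
    temps = [nearest_reading(r, cool_time, max_gap) for r in sensors.values()]
    temps = [t for t in temps if t is not None]
    return sum(temps) / len(temps) if temps else None


# Hypothetical readings for three sensors in one slab.
base = datetime(2012, 6, 1, 8, 0)
sensors = {
    "S1": [(base, 20.0), (base + timedelta(hours=6), 21.0)],
    "S2": [(base + timedelta(hours=1), 22.0)],
    "S3": [(base + timedelta(hours=12), 30.0)],  # too far away, ignored
}
print(slab_concrete_temperature(sensors, base + timedelta(minutes=30)))
```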

Results and Discussions
6.1. Parametric Analysis. The SVM coefficients, including $C$, $\gamma$, and $\varepsilon$, need to be determined before the method is used. To the knowledge of the authors, no closed-form function exists that directly gives the optimal coefficients for a model. Therefore, parametric analysis is carried out to calibrate the model.
Taking the prediction of outlet temperature as a case study, $T_w^*$ is employed as the target variable, whereas $\Delta T_{\mathrm{in}}$ and $q_w$ are adopted as the observed variables. The data of cool-pipes embedded in monolith 15# are used as the training data set, and those of monolith 16# are used as the testing data set. The grid search method integrated with $k$-fold CV is used here to find optimal parameters, as mentioned in Section 4.4. The mean squared error (MSE) is applied to evaluate the model performance, which is expressed as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( f(\mathbf{x}_i) - y_i \right)^2, \tag{31}$$

where $\mathbf{x}$ stands for the combination of $\Delta T_{\mathrm{in}}$ and $q_w$ and $y$ stands for $T_w^*$. A small MSE indicates a good fit of the numerical model. The ranges of $C$ and $\gamma$ are both set to $2^{-5}$ to $2^{5}$. The MSE results for different $\varepsilon$ are drawn in logarithmic plots with $\log_2(C)$ and $\log_2(\gamma)$ as the x- and y-axis, respectively.
As shown in Figure 10, the model fits the training data set very well when $C$ and $\gamma$ are large, but it performs relatively poorly on the testing data set. This situation is called overtraining (overfitting). To avoid it, the optimal SVM coefficients are chosen as those that fit the testing data set best. The optimal results are listed in Table 1.
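The evaluation criterion is simple enough to state in a few lines. The sketch below computes the mean squared error between predicted and observed values and compares a training error with a testing error; all numbers are hypothetical and only illustrate how a large gap between the two signals overtraining.

```python
def mean_squared_error(predicted, observed):
    """Mean of squared differences between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical temperature rises (deg C): a model that matches its training
# data almost perfectly but misses the testing data shows overtraining.
train_mse = mean_squared_error([2.0, 3.1, 4.0], [2.0, 3.0, 4.0])
test_mse = mean_squared_error([2.0, 3.0, 4.0], [3.0, 2.0, 5.5])
print(train_mse, test_mse)
```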

Prediction of Water Temperature Rise.
As described in Section 3, the water temperature rise of the cool-pipe, $T_w^*$, depends on the inlet temperature difference and the cooling flow rate. We employ the data of monolith 15# as the training data set to model the cooling performance. All the other slabs are used as testing cases. The predicted outlet temperature agrees well with the monitoring result. Four slabs are taken as examples in Figure 11.

Thermal Performance Comparison.
The pattern of thermal performance of cool-pipes can be read from the numerical model trained on the data described above. The numerical results of $T_w^*$ at different combinations of $\Delta T_{\mathrm{in}}$ and $q_w$ are listed in Table 2. Three-dimensional visualization results of HDPE and iron pipes are also compared in Figure 12. From the results, the following conclusions can be drawn.
(1) The outlet water temperature of HDPE pipes decreases with the water flow rate in the lower flow range, below 30 L/min. In the higher flow range, say above 40 L/min, the outlet temperature is insensitive to the flow rate.
(2) [...] higher than this turning point, which can be seen in Figure 13. The turning point for iron pipe is at 80 L/min.
(3) Higher $\Delta T_{\mathrm{in}}$ results in higher $T_w^*$, which is obvious in the low flow rate range. When $q_w = 5$ L/min, $T_w^*$ is almost equal to $\Delta T_{\mathrm{in}}$. When $q_w$ is large, a change in $\Delta T_{\mathrm{in}}$ produces no obvious change in $T_w^*$.
(4) The rate of energy absorbed by the cool-pipe can be calculated by (10), which is an important index for the design of the cooling procedure. For the same flow rate, higher $T_w^*$ means that more heat can be removed by the cool-pipe. The power absorbed by HDPE and iron cool-pipes is plotted in Figure 14. It can be seen that the power increases with the flow rate and has a saddle point when $q_w$ is high.
(5) The turning point mentioned in (2) is of particular interest. The saddle point of the absorbed power is located near this point. We name this point the turning flow rate. For iron pipes, the turning flow rate is 80 L/min; the detailed reason for this will be studied in depth in future research.
(6) Iron pipe has better performance when the flow rate is lower than 40 L/min. When the flow rate is higher than 60 L/min, HDPE pipe absorbs heat better.
(7) When $\Delta T_{\mathrm{in}} = 2\,^{\circ}$C and $q_w > 80$ L/min, the HDPE pipe shows an unusual declining curve, which is not consistent with the other situations. The reason has not yet been established by the authors and will be studied in detail.
Note to Table 2: the number enclosed in brackets is the iron result, and the number outside the brackets is the HDPE result.
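The absorbed power discussed in item (4) follows from the product of specific heat, density, flow rate, and temperature rise, $P = c_w \rho_w q_w T_w^*$. The sketch below adds the unit conversion from L/min to m³/s. The water properties are standard approximate values, and the example flow rate and temperature rise are hypothetical, not values taken from Table 2.

```python
RHO_W = 1000.0   # water density, kg/m^3 (standard approximate value)
C_W = 4186.0     # specific heat of water, J/(kg*K) (standard approximate value)


def absorbed_power_kw(flow_l_per_min, temp_rise_c):
    """Heat power absorbed by the cooling water, in kW."""
    flow_m3_per_s = flow_l_per_min / 1000.0 / 60.0   # L/min -> m^3/s
    return C_W * RHO_W * flow_m3_per_s * temp_rise_c / 1000.0


# Hypothetical operating point: 30 L/min with a 2 deg C rise.
print(f"{absorbed_power_kw(30.0, 2.0):.3f} kW")
```

Because the temperature rise itself falls as the flow rate grows, the product can level off at high flow, which is consistent with the saddle point described in item (4).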

Classification of Different Pipe Materials.
The pipe material determines the value of $h$ in Section 3. In order to correctly classify the pipe materials, $\Delta T_{\mathrm{in}}$, $T_w^*$, and $q_w$ are adopted as the variables, and iron/HDPE is used as the classification label. The data of monolith 15# are used as the training data set, and 108 slabs are used as the testing data set.
A large amount of data exists for each slab as the product of the monitoring program. The labels of HDPE pipes are defined as 0 and those of iron pipes as 1. Every day's data of the slab are tested and labeled by the SVM classifier, and the following equation is used to estimate the pipe material of the slab:

$$\text{Material} = \begin{cases} \text{HDPE} & \text{when } \mathrm{avg} < 0.4, \\ \text{iron} & \text{when } \mathrm{avg} > 0.6, \\ \text{alternative} & \text{otherwise}, \end{cases} \tag{32}$$

where avg is the average value of the label values. The classification result is shown in Figure 15. The detailed classification results of two slabs are plotted as examples in Figure 16.
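The slab-level decision rule is a thresholded average of the daily labels and can be written directly. The function name is hypothetical; the label convention (0 for HDPE, 1 for iron) and the thresholds 0.4 and 0.6 are those stated above.

```python
def classify_slab(daily_labels, low=0.4, high=0.6):
    """Estimate the pipe material of a slab from its daily 0/1 SVM labels."""
    if not daily_labels:
        raise ValueError("at least one daily label is required")
    avg = sum(daily_labels) / len(daily_labels)
    if avg < low:
        return "HDPE"
    if avg > high:
        return "iron"
    return "alternative"   # average too close to 0.5 to decide


print(classify_slab([0, 0, 1, 0, 0]))   # mostly labelled HDPE
print(classify_slab([1, 1, 0, 1]))      # mostly labelled iron
print(classify_slab([0, 1]))            # undecided
```

Averaging over many days makes the slab-level result tolerant of individual misclassified days.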
The classification result shows that the SVM model can distinguish the different pipe materials well, with an accuracy of 83%. The HDPE pipes are classified with higher accuracy than the iron pipes. The result shows that classification by the SVM method is effective.

Conclusions
The heat transfer process inside concrete with embedded cool-pipes is very complex. Most current research focuses on theoretical calculations, and statistical study of actual measurements is limited. Based on a detailed monitoring program during the construction period of a super high arch dam, data mining of the thermal performance of the cool-pipes is carried out. SVM is applied as the data mining method. Guided by the thermal model built in this paper, the factors relevant to the outlet temperature of cool-pipes are determined and the relationship is numerically mined. The prediction result agrees well with the monitoring data, which verifies the validity of the approach proposed in this paper. The thermal performances of iron pipes and HDPE pipes are also analyzed in detail. Iron pipes perform better when the flow rate is low, and HDPE pipes perform better when the flow rate is high. Iron pipes have a turning flow rate of 80 L/min. The classification of pipe material is also conducted, which automatically and correctly distinguishes HDPE pipes from iron pipes on the basis of monitoring data.

Figure 1: Picture of the dam under construction.

Figure 4: Schematic diagram of the cool-pipes layout.

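4.1. SVM Classification. For a classification problem of two classes, a set of sample data $S$ is given as

$$S = \left\{ (\mathbf{x}_i, y_i) \right\}, \quad i = 1, 2, \ldots, n, \tag{11}$$

where $\mathbf{x}_i$ is the input vector of sample $i$ and $y_i \in \{-1, +1\}$ is its class label.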

Figure 11: Prediction outlet temperature and monitoring temperature.

Figure 12: 3D results of the numerical model.

Figure 13: Turning point of iron pipe.
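4.2. SVM Regression. Regression can also be performed by SVM. The goal is to find a function $f(\mathbf{x})$ that fits the targets of all data points and is as flat as possible. By flat, we mean a small $\mathbf{w}$, which leads to the following problem:

$$\min_{\mathbf{w},\, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to:} \quad \left| y_i - f(\mathbf{x}_i) \right| \le \varepsilon, \quad i = 1, 2, \ldots, n, \tag{22}$$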

Table 2: Result of $T_w^*$ ($^{\circ}$C) output by the numerical model.