We propose a deep learning approach to better utilize the spatial and temporal information obtained from image sequences of the self-compacting concrete- (SCC-) mixing process to recover SCC characteristics in terms of the predicted slump flow value (SF) and V-funnel flow time (VF). The proposed model integrates features of the convolutional neural network and long short-term memory and is trained to extract features and compute an estimate. The performance of the method is evaluated using the testing set. The results indicate that the proposed method could potentially be used to automatically estimate SCC workability.
Predicting the workability of self-compacting concrete (SCC) during the mixing process is an important research problem in construction engineering. SCC is a highly workable material that can flow to fill gaps in reinforcements, corners of moulds, and voids in rock blocks, without exhibiting any vibration and compaction during the placement process [
Slump flow and V-funnel tests [
The problems mentioned above can be solved if the SCC workability can be estimated during the mixing process. Chopin proposed that a concrete mixer can be considered as a large rheometer [
To compensate for the drawbacks of the aforementioned methods, the dependency on human experience must be reduced, and methods that dig deep into the information hidden behind the original data must be used. An alternative approach is to employ automatic feature learning using deep learning- (DL-) based models.
DL deals with the problem of data representation by introducing simpler intermediate representations that can be combined to build complex concepts. Therefore, no specific techniques need to be applied to extract features that represent the image data [
Deep convolutional neural networks (CNN) are a successful DL model that provides an extremely powerful tool for learning visual representations. The success of Krizhevsky et al. in the ImageNet classification and localization challenges [
Compared to CNN, recurrent neural network (RNN) models are “deep in time” and can form implicit compositional representations in the time domain [
Machine learning techniques have been successfully used in several applications in the field of construction engineering, such as automatic crack detection [
Our objective in this study was to design a method for estimating SCC workability during the mixing process. In this paper, we propose an end-to-end trainable DL architecture that combines CNN and LSTM and helps estimate SCC workability. Instead of using specific feature extractors, we train a neural network to find the relationships between image sequences and SCC workability. Data preparation is performed to make the raw data suitable for training. Using the trained model, the SCC workability characteristics can be predicted automatically. Moreover, the strategy for choosing a proper time resolution is discussed to optimize the performance of the DL method.
The remainder of this paper is organized as follows. In Section
Data preparation involves compiling data in a format that is suitable for use by DL approaches that estimate the workability characteristics. Owing to the nature of concrete experiments, the amount of raw data available for this task was limited, and techniques were applied to generate synthetic data. Another issue encountered in this task was overfitting, which is quite typical in the DL field; preprocessing methods were used to avoid this situation. Finally, an operation called data augmentation was considered to further reduce the overfitting problem without adding new information to the model.
The proposed model takes image sequences of the SCC mixing process as input. The raw data comprised videos of the SCC mixing process collected from various SCC experiments in the laboratory. Figure
Schematic diagram of the apparatus: (a) side view; (b) front view.
The single-shaft mixer used in the experiments.
During the mixing process, the smart phone was used to record a video through the opening hatch at a frame rate of 30 fps. Each video had a fixed resolution of 1,920 pixels × 1,080 pixels. After mixing and recording, traditional slump and V-funnel tests were conducted to label the mixing process videos with SF and VF values.
The cement used in all the experiments was 42.5R Portland cement. Polycarboxylate-based superplasticizer (SP), tap water, and fine and coarse aggregates were used to batch the SCC specimens. The SP, used as a water-reducing agent, had a 20% solid content. The fine aggregates comprised quartz sands with a maximum particle size of 5 mm, and the coarse aggregates included two types of crushed stones with maximum particle sizes of 10 and 20 mm, respectively. The relative densities of the fine and coarse aggregate were 2.83 and 2.69, respectively.
Each of the concrete mixes used in this study had a fixed fine aggregate content of 45% by volume and a fixed 20:10 mm coarse aggregate ratio of 1.5 : 1 by weight. The SCC volume for each experiment was 20 L. The single-shaft mixer described previously was used to mix the concrete. All the dry materials were initially mixed for 30 s. Then, water and SP were added, and the materials were wet-mixed for 240 s. The slump test and V-funnel test were conducted after the mixture was poured out. In some experiments, the produced SCC was kept in a container for a period of time (30, 60, or 90 min) and then poured into the mixer again for another mixing, which was recorded as well. After this mixing, the slump and V-funnel tests were conducted again, ensuring that the mixture appearing in each video had precisely corresponding SF and VF values.
Data comprising 31 videos of different workability characteristics were collected; the video indexes, mix composition, and workability characteristics of SCC are shown in Table
Mix composition and workability characteristics of mixing videos.
Video number | | SP (%) | Cement (kg/m3) | Water (kg/m3) | Hold time (min) | SF (mm) | VF (s) |
---|---|---|---|---|---|---|---|
1 | 1.10 | 0.90 | 527.74 | 183.46 | 0 | 575 | 35.8 |
2 | 1.10 | 0.90 | 527.74 | 183.46 | 30 | 355 | Blocked |
3 | 1.10 | 1.00 | 527.74 | 183.04 | 0 | 460 | 45.5 |
4 | 1.20 | 0.80 | 503.75 | 191.78 | 0 | 480 | 14.5 |
5 | 1.20 | 0.80 | 503.75 | 191.78 | 30 | 350 | 24.1 |
6 | 1.20 | 0.90 | 503.75 | 191.37 | 0 | 705 | 43.9 |
7 | 1.20 | 0.90 | 503.75 | 191.37 | 30 | 565 | 83.0 |
8 | 1.20 | 0.90 | 503.75 | 191.37 | 60 | 510 | 14.7 |
9 | 1.20 | 1.00 | 503.75 | 190.97 | 0 | 610 | 9.9 |
10 | 1.20 | 1.00 | 503.75 | 190.97 | 30 | 585 | 9.2 |
11 | 1.20 | 1.00 | 503.75 | 190.97 | 60 | 495 | 56.8 |
12 | 1.20 | 1.00 | 503.75 | 190.97 | 90 | 360 | Blocked |
13 | 1.30 | 0.90 | 481.85 | 198.60 | 0 | 610 | 18.1 |
14 | 1.30 | 0.90 | 481.85 | 198.60 | 30 | 485 | 69.0 |
15 | 1.30 | 0.90 | 481.85 | 198.60 | 60 | 340 | Blocked |
16 | 1.30 | 1.00 | 481.85 | 198.21 | 0 | 700 | 10.2 |
17 | 1.30 | 1.00 | 481.85 | 198.21 | 0 | 690 | 9.9 |
18 | 1.30 | 1.00 | 481.85 | 198.21 | 30 | 690 | 52.9 |
19 | 1.30 | 1.00 | 481.85 | 198.21 | 60 | 655 | 30.8 |
20 | 1.30 | 1.00 | 481.85 | 198.21 | 90 | 440 | Blocked |
21 | 1.11 | 0.85 | 525.24 | 184.50 | 0 | 625 | 26.0 |
22 | 1.11 | 0.90 | 525.24 | 184.29 | 0 | 675 | 13.0 |
23 | 1.13 | 0.85 | 520.31 | 186.12 | 0 | 598 | 16.4 |
24 | 1.13 | 0.90 | 520.31 | 185.91 | 0 | 675 | 13.0 |
25 | 1.15 | 0.85 | 515.47 | 187.72 | 0 | 600 | 13.0 |
26 | 1.15 | 0.88 | 515.47 | 187.61 | 0 | 679 | 31.0 |
27 | 1.15 | 0.90 | 515.47 | 187.51 | 0 | 648 | 13.0 |
28 | 1.17 | 0.85 | 510.71 | 189.28 | 0 | 680 | 11.6 |
29 | 1.17 | 0.90 | 510.71 | 189.08 | 0 | 668 | 9.8 |
30 | 1.18 | 0.85 | 508.37 | 190.05 | 0 | 668 | 64.0 |
31 | 1.20 | 0.90 | 503.75 | 191.37 | 0 | 710 | 52.0 |
It is noteworthy that the SCC mixtures corresponding to videos 2, 12, 15, and 20 were blocked during V-funnel tests because of large viscosities. The video data contained both spatial and temporal information, which were learned by the DL model. The frames of the videos were extracted as a number of images right after recording, as shown in Figure
The frames extracted from videos.
Preprocessing methods were applied to each frame, serving two purposes. First, they substantially reduced the resolution of the input image, thereby decreasing the computational requirements of the neural network. Second, they helped prevent the overfitting problem from affecting the prediction; the idea of overfitting is introduced in the next subsection. The preprocessing procedure comprised the following steps: (1) conversion of RGB images to grayscale, (2) affine transformation, (3) extraction of the region of interest (ROI), and (4) histogram equalization. Figure
Flowchart for preprocessing images.
Several preprocessing methods can be used to prevent the overfitting phenomenon. Overfitting occurs when a model fits the training data too well; in other words, the model learns the details and noise in the training data to such an extent that its performance on new data suffers. Noise or random fluctuations in the training data are picked up and learnt as features by the model. However, as these patterns do not carry over to new data, they degrade the model's ability to provide accurate predictions.
For example, the SCC paste might leave some marks on the inner wall of the mixer hatch, which could be recognized as a feature of SCC by mistake, as shown in Figure
Overfitting example: (a) the mark being learned as the feature of SF = 340 mm; (b) resulting inaccuracy in prediction of image of the same SF value.
Given the limited size of the dataset, preprocessing also provides a safeguard against overfitting. For example, clipping the mixer wall from the images prevents the learning algorithm from picking up this redundant information. Beyond this detail, colour, illumination, and perspective distortion also constitute noise that should not be learned by the model; corresponding approaches were therefore applied to eliminate them as well.
In recognition tasks involving natural images, colour provides extra information, and transforming images to grayscale may hurt performance [
The effect of this preprocessing is shown in Figure
Converting the RGB images to grayscale.
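As a minimal sketch, the conversion can be performed with a standard luma weighting (a NumPy illustration; the exact conversion routine used in the study is not specified):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB uint8 image to grayscale using ITU-R BT.601 luma weights."""
    return np.rint(rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)
```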
As shown in Figure
Different tripod placement and shooting angles cause different distortions.
The affine transformation technique is typically used to correct geometric distortions or deformations that occur owing to nonideal camera angles. It was appropriate for use in this case as the transformation preserves collinearity and ratios of distances. Transforming and fusing the images to a large, flat coordinate system helped eliminate distortion and enabled easier interactions and calculations that did not require accounting for image distortion.
In this study, the transformation target was set to a rectangle of fixed size 350 pixels × 200 pixels, removing the experimenter and the surrounding environment. The coordinates of the four corners of the mixer hatch were then recognized, and the transformation matrix was computed from the mapping between the original coordinates and the target rectangle. As a result, the grayscale image was converted into a subimage containing the mixer inner wall, rotating blades, shaft, and moving SCC. After the transformation, useless visual information was largely eliminated from the processed images. Figure
Affine transformation to the images.
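Since the four hatch corners are mapped to a fixed rectangle, the transform can be solved directly from the point correspondences. The sketch below is a hypothetical NumPy implementation (the study's actual routine is not given, and libraries such as OpenCV provide equivalents); it solves the general projective transform, of which the affine case is a special instance, with hypothetical corner coordinates:

```python
import numpy as np

def fit_transform(src, dst):
    """Solve the 8-parameter projective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Map one pixel coordinate through the fitted transform."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical hatch corners detected in a frame, mapped to the fixed 350 x 200 target.
corners = [(102, 88), (905, 121), (878, 702), (118, 679)]
target = [(0, 0), (350, 0), (350, 200), (0, 200)]
H = fit_transform(corners, target)
```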
As noted above, the SCC paste might leave some marks on the inner wall of the mixer hatch, which could be erroneously recognized as a feature of the SCC. The simplest way to avoid this problem was to extract the ROI, excluding the mixer wall, from the transformed images. This operation comprised three steps.
First, the three reference points used for extraction were identified. Let point A represent the bottom left corner of the mixer wall and point B the bottom right corner of the mixer wall. Let point C denote the bottom edge of the extracted region, defined as a point on the top edge of the rotating shaft.
The ROI was extracted using equation (
Finally, the extracted region was resized to 150 pixels × 50 pixels. Figure
Extracting the ROI.
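A minimal sketch of the crop-and-resize step is given below; the coordinates of A, B, and C are hypothetical, and nearest-neighbour resampling stands in for whatever resize routine was actually used:

```python
import numpy as np

def extract_roi(img, a, b, c):
    """Crop the region below the mixer wall corners (A, B) and above the shaft top edge (C)."""
    top = max(a[1], b[1])          # y just below the wall's bottom corners
    return img[top:c[1], a[0]:b[0]]

def resize_nn(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image."""
    h, w = img.shape
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return img[np.ix_(ys, xs)]
```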
Histogram equalization is used to flatten the image histogram. In this process, the image is modelled as a probability density function, and the processing attempts to make each intensity value equally probable. It is especially useful for images with poor contrast: images that look too dark, washed out, or too bright are good candidates, as their histograms are confined to a narrow range of intensities. Equalization flattens the histogram, stretches the dynamic range, and yields an image with better contrast.
In this study, illumination is a noise that should not be learned by the model. Therefore, histogram equalization was applied to each image to eliminate the illumination difference. Figure
Histogram equalization: (a) dark condition; (b) light condition.
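The equalization step can be sketched in a few lines (a standard 8-bit implementation; OpenCV's `equalizeHist` behaves similarly):

```python
import numpy as np

def equalize_hist(gray):
    """Classic histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each intensity through the normalized cumulative distribution.
    lut = np.rint((cdf - cdf_min) / (gray.size - cdf_min) * 255).astype(np.uint8)
    return lut[gray]
```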
As mentioned before, the amount of raw data was limited; therefore, techniques were applied to supplement it. Figure
Flowchart for enlarging the amount of training data.
Each of the 31 videos was first converted into a sequence of images arranged in time series, which included visual information about the entire mixing process, as introduced in Section 2.1. It was observed that the sequence contained several mixing cycles as the blades kept rotating. Owing to engineering practice and existing research [
To expand the amount of data, two parameters were utilized; these are shown in Figure
Definition of segmentation length (
As mentioned before, the mixer used had a fixed rotation speed of 51 rpm, and the video had a fixed frame rate of 30 fps. It was easily determined that each mixing cycle consisted of 35 images using equation (
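The figure of 35 images per cycle follows directly from the frame rate and the rotation speed:

```python
FPS = 30   # video frame rate (frames per second)
RPM = 51   # mixer rotation speed (revolutions per minute)

# frames per revolution = frames per minute / revolutions per minute
frames_per_cycle = round(FPS * 60 / RPM)
```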
After enlarging the data, the dataset was rearranged into several sequences using SF and VF values as corresponding labels; this is shown in Figure
The enlarged dataset.
Rotating blade phase angle and the corresponding images.
After data preprocessing and expansion, the raw data were transformed into several image sequences, each containing 7 preprocessed images of fixed size 150 pixels × 50 pixels. As a result, the amount of data grew remarkably, mitigating the overfitting effect. To further reduce overfitting, data augmentation was applied, which improved training performance without adding new information to the model. The data augmentation operations were typically chosen to be label-preserving, such that they could be trivially used to extend the training set and encourage the system to become invariant to these transformations [
Data augmentation operation.
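As a generic illustration of a label-preserving transform on an image sequence, a horizontal flip can be written as below; this specific operation is hypothetical here and not necessarily the one the study applied:

```python
import numpy as np

def flip_sequence(seq):
    """Mirror every frame in a (T, H, W) sequence left-right; the SF/VF labels are unchanged."""
    return seq[:, :, ::-1]
```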
As illustrated in Figure
Architecture of the proposed model.
The network architecture and the selected parameters of the layers are shown in Table
Trainable parameters of the model.
Layer | Kernel | Output shape | Function | Param # |
---|---|---|---|---|
Data | — | 7 × 50 × 150 | — | 0 |
Time distribute (conv 1) | 4 × 2 × 2 | 7 × 49 × 149 × 4 | ReLU | 20 |
Time distribute (pool 1) | 2 × 2 | 7 × 23 × 73 × 4 | — | 0 |
Time distribute (conv 2) | 4 × 2 × 2 | 7 × 23 × 73 × 4 | ReLU | 68 |
Time distribute (pool 2) | 2 × 2 | 7 × 11 × 36 × 4 | — | 0 |
Time distribute (flatten 3) | — | 7 × 1584 | ReLU | 0 |
LSTM 4 | — | 5 | — | 31800 |
fc 5 | — | 2 | — | 12 |
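The parameter counts in the table follow the standard formulas and can be checked directly (assuming single-channel input and Keras's default LSTM parameterization):

```python
def conv_params(kh, kw, c_in, filters):
    """2-D convolution: one (kh x kw x c_in) weight tensor plus one bias per filter."""
    return (kh * kw * c_in + 1) * filters

def lstm_params(input_dim, units):
    """LSTM: four gates, each with input weights, recurrent weights, and biases."""
    return 4 * (units * input_dim + units * units + units)

def dense_params(input_dim, units):
    """Fully connected layer: weight matrix plus biases."""
    return input_dim * units + units
```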
The model was implemented in Python using the Keras package. The CPU used was Intel® Core (TM) i7-4790, RAM was 12.0 GB, and GPU was NVIDIA GeForce GT 730.
A number of empirical hyperparameters were applied in the training. Cross entropy was employed as the cost function to measure the similarity between the predicted values and target values [
Samples of 5 randomly chosen videos were used for testing, while others served as the training and validation sets.
The learning and evaluation results are shown in Figure
Training and validation results.
After the training phase, the model was applied to the test sequences for further validation. The predictions of the sequences were computed together to obtain average SF and VF values corresponding to a specific video. Table
Prediction results of SF values.
Video number | GT SF (mm) | Prediction (mm) | RE (%) |
---|---|---|---|
1 | 575 | 587 | 2.1 |
2 | 355 | 386 | 8.7 |
4 | 480 | 487 | 1.7 |
25 | 600 | 677 | 12.9 |
27 | 648 | 676 | 4.4 |
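The relative errors (RE) in the table are simply the absolute deviation normalized by the ground truth:

```python
def relative_error(gt, pred):
    """Relative error in percent between a ground-truth value and a prediction."""
    return abs(pred - gt) / gt * 100
```

For example, video 1 (ground truth 575 mm, prediction 587 mm) gives 2.1%.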
Table
Prediction results of VF values.
Video number | GT VF (s) | Prediction (s) | RE (%) |
---|---|---|---|
1 | 35.8 | 18.2 | 49.6 |
2 | 200.0 | 176.6 | 11.7 |
4 | 14.5 | 21.5 | 43.5 |
25 | 13.0 | 8.3 | 35.9 |
27 | 13.0 | 14.1 | 8.1 |
The time resolution (
The
Time resolutions (
R | Lengths of sequences |
---|---|
3 | 11 |
5 | 7 |
7 | 5 |
9 | 3 |
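The sequence lengths in the table are consistent with retaining every R-th frame of one 35-frame mixing cycle (an arithmetic check, assuming the segmentation length equals one cycle):

```python
FRAMES_PER_CYCLE = 35  # one blade revolution at 30 fps and 51 rpm

def sequence_length(r):
    """Number of frames kept from one mixing cycle at time resolution r."""
    return FRAMES_PER_CYCLE // r
```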
Training time consumptions of different
The learning results obtained for different
Training results of different
Finally, the estimation accuracies of the same testing set were computed, as shown in Figures
Prediction results of SF values with different
Prediction results of VF values with different
Prediction results of SF values with different
Video number | GT (mm) | R = 3 | | R = 5 | | R = 7 | | R = 9 | |
---|---|---|---|---|---|---|---|---|---|
 | | Prediction (mm) | RE (%) | Prediction (mm) | RE (%) | Prediction (mm) | RE (%) | Prediction (mm) | RE (%) |
1 | 575 | 509 | 11.5 | 643 | 11.8 | 531 | 7.7 | 515 | 10.4 |
2 | 355 | 340 | 3.7 | 314 | 11.3 | 305 | 14.0 | 364 | 2.5 |
4 | 480 | 277 | 42.3 | 504 | 5.0 | 535 | 11.3 | 534 | 11.3 |
25 | 600 | 599 | 0.1 | 573 | 4.5 | 585 | 2.5 | 596 | 0.6 |
27 | 648 | 637 | 1.7 | 589 | 10.0 | 593 | 8.5 | 589 | 10.0 |
Prediction results of VF values with different
Video number | GT (s) | R = 3 | | R = 5 | | R = 7 | | R = 9 | |
---|---|---|---|---|---|---|---|---|---|
 | | Prediction (s) | RE (%) | Prediction (s) | RE (%) | Prediction (s) | RE (%) | Prediction (s) | RE (%) |
1 | 35.8 | 25.0 | 30.5 | 18.2 | 49.6 | 20.7 | 42.5 | 36.4 | 1.2 |
2 | 200.0 | 177.5 | 11.2 | 176.6 | 11.7 | 164.3 | 17.9 | 169.7 | 15.2 |
4 | 14.5 | 12.7 | 15.6 | 21.5 | 43.5 | 25.3 | 68.9 | 16.0 | 6.6 |
25 | 13.0 | 0 | 100.0 | 8.3 | 35.9 | 16.3 | 25.2 | 5.36 | 58.9 |
27 | 13.0 | 30.6 | 135.0 | 14.1 | 8.1 | 15.6 | 20.3 | 13.3 | 2.4 |
Thus, a framework can be constructed to determine the downsampling parameters, serving as a strategy for enlarging the original data. Figure
Downsampling parameters determination framework.
In this paper, a method was proposed for estimating SCC workability during the mixing process. A combined model based on CNN and LSTM was utilized to predict the SF and VF values of SCC. The SCC mixing videos were converted into a dataset of image sequences to fit the training needs of the proposed model. The trained DL model achieved good performance and could be taken into consideration for use in automated mixing plants. A framework to determine the data preparation strategy was introduced as well. The strategy mainly focused on the determination of the time resolution of the raw data. The proposed method provides an effective basis that will help develop a smart batching plant in the future.
The data collection approach used was easy to implement as there were no strict requirements that the tripod placement and shooting angle of the smart phone be maintained constant. This enabled collection of data from different experiment batches as long as the mixer volume was the same. The significance of the ease of setting up the experiment would be evident when collecting a high volume of data. Furthermore, the data feed in the proposed model comprised image sequences arranged in time series, which ensured that the potential data in the temporal information were used.
In future work, we propose taking more practical conditions into account during training. The training and testing images used in this study comprised preprocessed image sequences; however, the architecture of the model may also suit other types of data, such as videos captured from a 30 L single-shaft mixer or a twin-shaft mixer. In future research, we plan to collect a series of such video data to explore making the proposed model more flexible.
The data used to support the findings of this study are included within the article.
The authors declare that they have no conflicts of interest.
This work was supported by the National Natural Science Foundation of China, the National High Technology Research and Development Program 863, and the National Key Laboratory (nos. 51239006, 2012AA06A112, and 2015-KY-01).