Research on Classification of Fine-Grained Rock Images Based on Deep Learning

Rock classification is an important branch of geology: it helps us understand the formation and evolution of the planet, search for mineral resources, and so on. Traditionally, rocks are classified based on the experience of professionals. However, this approach is inefficient and susceptible to subjective factors. It is therefore of great significance to establish a simple, fast, and accurate rock classification model. This paper proposes a fine-grained image classification network that combines an image cutting method with a scoring-by-voting (SBV) algorithm to improve classification performance on a small number of fine-grained rock samples. The method uses image cutting to achieve data augmentation without adding extra datasets and uses voting over the scores of image blocks to obtain richer complementary information, thereby improving classification accuracy. On a test set of 32 images, the classification accuracies with SE-ResNet-50, ViT, and EfficientNet-B0 backbones are 75%, 68.75%, and 75%, respectively, which are 34.375, 18.75, and 43.75 percentage points higher than those of the original algorithms. These results verify the effectiveness of the proposed algorithm and show that deep learning has great application value in geology.


Introduction
Rocks are naturally occurring minerals or solid aggregates composed of minerals and other materials (volcanic glass, biological bones, rock debris, etc.) [1]. From a scientific perspective, the study of rocks helps reveal the geological evolution history, chemical composition, and petrological characteristics of a region. In practice, it helps locate mineral and water resources. Professionals can classify and identify rock samples directly. This method is mainly based on human observation and empirical classification.
This method cannot analyze rocks quantitatively, is inefficient, is strongly influenced by subjective human factors, requires a high degree of expertise, and is difficult to popularize [2]. The rise of deep learning has made automatic identification and classification of rocks possible [3, 4], and applying deep learning to rock image recognition and classification has significant practical value [5]. Rock identification and classification is a complex process [6]. In recent years, many scholars have studied rock classification in depth and achieved certain results. Lin et al. [7] used a convolutional neural network for rock recognition and reached a classification accuracy of 63% on images of 15 common rock types. Guojian and Peisong [8] proposed a rock slice image classification method that uses a residual network model to automatically extract and classify the features of rock slice images. Pascual et al. [9] used a 3-layer convolutional neural network (CNN) to classify rock images and improve recognition accuracy. Tian et al. [10] proposed an automatic sandstone classification method based on the support vector machine (SVM) [11], using principal component analysis (PCA) [12] to reduce the dimension of the feature space and an SVM to learn the relationship between the feature space and sandstone types.
On the test set, the SVM classifier reached a classification accuracy of 97.0%. Gao et al. [13] proposed a novel feature selection method, feature correlation combination (CFR); experiments showed that it effectively retains the relevant features of the data and eliminates redundant features as far as possible, improving classification accuracy. Although these studies have achieved certain results in rock image classification, they have several shortcomings: (1) training the model requires a large dataset, so the preparatory work is heavy and consumes considerable manpower and financial resources; (2) for high-resolution images, it is difficult to extract finer image information, and the quality of feature extraction is low; and (3) for fine-grained image classification, it is difficult to obtain accurate results.
To solve these problems, this paper proposes an algorithm that combines image cutting with the SBV algorithm. The specific contributions are as follows: (1) Without introducing interfering information, the dataset is expanded by cutting each image, so that more comprehensive feature information can be extracted from it.
(2) A scoring-by-voting (SBV) classification algorithm is proposed to obtain richer complementary information and improve the classification accuracy of the deep learning network model.

Data Augmentation Technology
A deep learning model is obtained by training a neural network, which is essentially a process of repeatedly adjusting parameters so that the model maps each image to its label [14]; the goal is to make the model's loss function as low as possible. Deep learning algorithms require large datasets as input so that the model parameters can be fully trained and the model generalizes well. It is therefore necessary to expand the sample dataset [15]. CutMix [16] is an image data augmentation method. First, it removes the pixel values in a region of an image. Second, it fills that region with the pixels of the corresponding region of another, randomly selected image from the training set. Finally, the labels are mixed in proportion to the area of the filled region, thus augmenting the training data. CutMix has several advantages: no uninformative pixels appear during training, which improves training efficiency; it retains the advantages of regional dropout; and, by requiring the model to recognize objects from partial views while adding information from other samples to the removed region, it further enhances the model's localization ability. Examples of CutMix-augmented images, the original images, and their labels are shown in Table 1.
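A minimal NumPy sketch of the CutMix step follows. It assumes images are arrays of shape (height, width, channels) and labels are one-hot vectors; the box-sampling rule (side lengths proportional to the square root of 1 − λ, a random centre, clipping at the border, and λ recomputed from the pasted area) follows the usual CutMix formulation rather than any code from this paper, and the function name `cutmix` is illustrative. With α = 1, as in the parameter settings, λ is uniform on (0, 1).

```python
import numpy as np


def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Paste a random box from img_b into img_a and mix one-hot labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)  # alpha = 1 gives lam ~ Uniform(0, 1)
    # Box side lengths chosen so that the box covers about (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    # Random box centre; the box is clipped at the image border.
    cy, cx = rng.integers(h), rng.integers(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute lambda from the area actually pasted (clipping may shrink the box).
    lam_adj = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return mixed, label, lam_adj
```

The mixed label always sums to 1, and its weight on the first image equals the fraction of that image left visible.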

Image Cutting.
A characteristic of rock images is that their local features are very similar to their global features, which naturally suggests using image cutting to expand the dataset as a form of data augmentation. The pixel array of the original image is uniformly cut into N × N = M smaller-resolution images, and these M images together make up the complete image. Figure 1 shows the cutting for N = 6.
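The cutting step can be sketched in NumPy as below. This is an illustrative helper (`cut_image` is a hypothetical name), assuming the image is an array of shape (height, width, channels) and that any border pixels left over by the integer division are discarded; the paper does not say how remainders are handled.

```python
import numpy as np


def cut_image(image, n):
    """Cut an image array into n * n equal tiles, ordered row by row.

    Tile size uses integer division (height // n, width // n); leftover
    border pixels, when a side is not divisible by n, are discarded.
    """
    h, w = image.shape[:2]
    th, tw = h // n, w // n
    tiles = []
    for r in range(n):
        for c in range(n):
            tiles.append(image[r * th:(r + 1) * th, c * tw:(c + 1) * tw])
    return tiles
```

For a 4096 × 3000 image and N = 6, each of the 36 tiles is 682 × 500 pixels (4096 // 6 = 682, 3000 // 6 = 500), leaving 4 pixels unused along the 4096-pixel side; the tiles would then presumably be resized to the 224 × 224 network input.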

Conventional Classification Training Network Structure.
The conventional network structure used for training is shown in Figure 2; it generally consists of a single backbone network module. First, the training images are input into the backbone network. Next, the network output is computed by the multilayer neural network, and the difference between this output and the expected output is used to construct the loss function. A suitable optimization algorithm then performs gradient descent on the loss function to update the network parameters. Finally, a good network model is obtained after a number of rounds of iterative training.
In recent years, with the vigorous development of the big data industry, deep learning has advanced by leaps and bounds, and researchers have proposed many classic neural network models. Three kinds of models are representative: carefully designed convolutional neural networks, such as LeNet, GoogLeNet, and SE-ResNet; networks obtained by compound model scaling combined with neural architecture search, such as EfficientNet and EfficientDet; and networks based on the Transformer architecture, which has become the standard in natural language processing (NLP), such as Vision Transformer (ViT) and DeiT. To test the robustness of the proposed framework on the rock classification problem, we tested it with one backbone of each type. SE-ResNet-50 [17] extracts more useful features and performs better than other networks of the same type; ViT [18] recasts image classification as an NLP-style sequence problem and achieved state-of-the-art results on multiple datasets; and EfficientNet-B0 [19] offers leading speed with high accuracy. All three are mainstream backbone network modules. Therefore, SE-ResNet-50, ViT, and EfficientNet-B0 were each selected as the backbone network module for rock image classification experiments, to validate the performance of the new framework.
The ViT network architecture is shown in Figure 3; the SE-ResNet architecture, constructed by introducing SE blocks into the ResNet model, is shown in Figure 4; and the network structure of EfficientNet-B0 is shown in Table 2.
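For readers unfamiliar with the SE block, the sketch below shows the squeeze, excitation, and scale steps of a Squeeze-and-Excitation block in NumPy for a single feature map of shape (C, H, W). The weights `w1`, `b1`, `w2`, `b2` are hypothetical inputs standing in for the two fully connected layers (with a bottleneck of C / r units for some reduction ratio r); in SE-ResNet the block is applied to the residual branch before the identity addition.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation forward pass for a feature map x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,) are illustrative
    parameters of the two fully connected layers.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                        # (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    s = np.maximum(0.0, w1 @ z + b1)               # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))    # (C,)
    # Scale: reweight each channel of the input feature map.
    return x * gate[:, None, None]
```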

Conventional Classification Prediction Network Structure.
The conventional network structure used for prediction is also shown in Figure 2; it is identical to the conventional classification training network structure. The image to be predicted is input into the network, the network computes the image's feature map, and the classification is completed.
Computational Intelligence and Neuroscience
Figure 2: The structure of the conventional classification network.
The network structure in Figure 2 produces the classification result given by formula (1), where x is the input image data, y is the classification result, f(·) denotes the feature extractor in the backbone network, and softmax(·) is the normalized exponential function that acts as the classifier.
$$y = \mathrm{softmax}\big(f(x)\big) \tag{1}$$
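Formula (1) can be illustrated with a small pure-Python sketch: `f` stands for any callable returning one score per class (a hypothetical stand-in for the backbone and its final layer), and the softmax turns those scores into probabilities whose largest entry gives the predicted class.

```python
import math


def softmax(scores):
    """Normalized exponential: maps real-valued scores to a probability vector."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, f):
    """y = softmax(f(x)); returns the most probable class index and all probabilities."""
    probs = softmax(f(x))
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label, probs
```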

Improved Classification Training Network Structure.
The network model designed for training in this paper is shown in Figure 5 and consists of two parts: image augmentation and the backbone network. Compared with the conventional classification training network, the difference is that each image is cut before entering the backbone network, and the resulting tiles are then fed into the network for training in random order.
The image augmentation module is the image cutting method described in Section 2.2, and the cutting rule is given by formula (2), where (A, B) is the resolution of the original image, (A1, B1) is the resolution after cutting, "//" denotes floor (integer) division, and N is the number of cuts along each side of the image.
$$A_1 = A \,//\, N, \qquad B_1 = B \,//\, N \tag{2}$$
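A sketch of how the training set might be assembled under this scheme: each training image is expanded into its N × N tile positions, and the resulting samples are shuffled before being fed to the network. The function `make_tile_samples` and the assumption that every tile inherits its parent image's label are illustrative; the paper states only that the cut images are input in random order. The pixels of each tile would be taken with the cutting rule in formula (2).

```python
import random


def make_tile_samples(images, n, seed=None):
    """Expand (image_id, label) pairs into shuffled per-tile training samples.

    Each sample is (image_id, row, col, label). Every one of the n * n tiles
    is assumed to inherit the label of its parent image.
    """
    samples = [
        (image_id, row, col, label)
        for image_id, label in images
        for row in range(n)
        for col in range(n)
    ]
    # Random order, so consecutive samples rarely come from the same image.
    random.Random(seed).shuffle(samples)
    return samples
```

With N = 6, each training image contributes 36 samples, which is how cutting enlarges the dataset without collecting new rock samples.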

Improved Classification Prediction Network Structure.
Unlike in the conventional classification network, the classification prediction network in this paper differs slightly from the classification training network. The network designed for prediction is shown in Figure 6 and contains three parts: image augmentation, the backbone network, and scoring by voting. The image augmentation and backbone network modules are the same as those in the classification training network; the prediction network adds a scoring-by-voting module. The K-nearest neighbor (KNN) classification algorithm [20] is a widely used classification algorithm whose core idea is that if most of the K nearest samples to a sample in the feature space belong to a certain category, the sample also belongs to that category and shares the characteristics of samples in that category. The method determines the category of a test sample from the categories of its one or more nearest neighbors, so its category decision depends on only a very small number of adjacent samples.
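The KNN decision rule can be sketched in a few lines of Python. This is a generic k-nearest-neighbour classifier using Euclidean distance, included only to make concrete the majority-vote idea that SBV borrows; `knn_predict` is a hypothetical name, and ties are broken in favour of the label encountered first among the sorted neighbours.

```python
import math
from collections import Counter


def knn_predict(query, train_points, train_labels, k=3):
    """Assign query the majority label among its k nearest training points."""
    # Indices of the k training points closest to the query (Euclidean distance).
    nearest = sorted(
        range(len(train_points)),
        key=lambda j: math.dist(query, train_points[j]),
    )[:k]
    votes = Counter(train_labels[j] for j in nearest)
    return votes.most_common(1)[0][0]
```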
Inspired by the KNN algorithm, this paper proposes the scoring-by-voting (SBV) algorithm, whose core idea is that an image is assigned the category that accounts for the largest share of the predictions for its cut images. Its mathematical expression is given by formula (3), where n denotes the class code, i indexes the cut images, N is the total number of cut images, y_i is the predicted class of the i-th cut image, F_i^n indicates whether the predicted class of the i-th cut image is n, and the arg max over n selects the class with the largest share.
$$y = \arg\max_{n} \sum_{i=1}^{N} F_i^{n}, \qquad F_i^{n} = \begin{cases} 1, & y_i = n \\ 0, & y_i \neq n \end{cases} \tag{3}$$
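The SBV rule in formula (3) amounts to a majority vote over the per-tile predictions. A minimal sketch, assuming the backbone has already produced one predicted class y_i for each cut image; `score_by_voting` is a hypothetical name, and because the paper does not specify tie-breaking, ties here go to the class that appears first in the list.

```python
from collections import Counter


def score_by_voting(tile_predictions):
    """Return the class predicted for the most tiles, plus the vote counts.

    Counting how often class n occurs in tile_predictions is the same as
    summing the indicators F_i^n over all tiles i in formula (3).
    """
    votes = Counter(tile_predictions)
    # most_common keeps first-encountered order among equal counts.
    return votes.most_common(1)[0][0], votes
```

For example, with 36 tiles per image, an image whose tiles are mostly predicted as fine sandstone is classified as fine sandstone even if a few tiles are misclassified.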

Data Preparation.
The experimental data were obtained by photographing cuttings and core samples at the logging site with industrial cameras, under white light in a dark box. The experimental categories cover four rock classes (mudstone, coal, fine sandstone, and siltstone), specifically dark gray mudstone, black coal, gray fine sandstone, light gray fine sandstone, dark gray silty mudstone, gray-black mudstone, and gray argillaceous siltstone, with a total of 315 rock samples. The images have one of two resolutions, 4096 × 3000 or 2448 × 2048; these high-resolution images make the cutting method of this paper possible. The number of samples of each rock type is shown in Table 3.

Parameter Setting and Experimental Evaluation Index.
The algorithm in this paper runs on a computer with an Intel(R) Core i5-6500 CPU and a GTX 1660 GPU under Windows 10; it is implemented in Python with the open-source deep learning framework PyTorch. Some parameter settings are shown in Table 4.
For the rock image classification problem, a label smoothing [21] cross-entropy loss function is introduced to suppress overfitting, and a stochastic gradient descent (SGD) optimizer with momentum is used to update the network weights. The initial learning rate is set to 0.01 and is adjusted by exponential decay, with a minimum learning rate of 1e−5. Each experiment trains for 500 rounds: CutMix data augmentation is used in the first 300 rounds and removed in the remaining 200 rounds. After an image is read, the input size of the network is 224 × 224 × 3. The α parameter of CutMix is set to 1, so λ is drawn uniformly at random from (0, 1).
The proposed classifier is valid for images that are not affected by uncertainty and inaccuracy; for images that are, fuzzy preprocessing of the images is necessary [22–24]. Therefore, we applied Gaussian filtering in the image fuzzy preprocessing phase.
Classification accuracy is used as the evaluation indicator to analyze the performance of the algorithm. Classification accuracy [25] is one of the most commonly used image classification metrics and is defined as the proportion of images correctly predicted by the model among all images it predicts, as shown in formula (5), where num_R is the number of correctly predicted images and num_A is the total number of predicted images.
$$\text{Accuracy} = \frac{num_R}{num_A} \tag{5}$$
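The loss, learning-rate schedule, CutMix switch, and accuracy metric described above can be sketched in plain Python as follows. The smoothing factor ε = 0.1 and decay factor γ = 0.98 are hypothetical values (Table 4 may specify others); the initial rate 0.01, the floor 1e−5, the 300 CutMix rounds, and formula (5) come from the text. In PyTorch, the same loss is available as `torch.nn.CrossEntropyLoss(label_smoothing=...)` and the decay as `torch.optim.lr_scheduler.ExponentialLR`, although the latter has no built-in floor.

```python
import math


def label_smoothing_ce(logits, target, epsilon=0.1):
    """Cross-entropy between softmax(logits) and a label-smoothed target.

    epsilon = 0.1 is a hypothetical default, not a value from the paper.
    """
    k = len(logits)
    m = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    # Smoothed target: (1 - eps) on the true class plus eps / k on every class.
    smoothed = [epsilon / k + (1.0 - epsilon if j == target else 0.0) for j in range(k)]
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


def exp_decay_lr(epoch, lr0=0.01, gamma=0.98, lr_min=1e-5):
    """Exponentially decayed learning rate, never below lr_min.

    lr0 and lr_min follow the text; gamma = 0.98 is a hypothetical decay factor.
    """
    return max(lr0 * gamma ** epoch, lr_min)


def use_cutmix(epoch, cutmix_epochs=300):
    """CutMix is on for the first 300 of the 500 rounds (epochs counted from 0)."""
    return epoch < cutmix_epochs


def accuracy(predictions, labels):
    """Formula (5): num_R / num_A."""
    num_a = len(labels)
    num_r = sum(p == t for p, t in zip(predictions, labels))
    return num_r / num_a
```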

Experimental Results.
The improved classification training network was used to run experiments on the rock image dataset. Figure 7 shows how the training accuracy and loss change for the SE-ResNet-50, ViT, and EfficientNet-B0 models using the image cutting method. As iteration progresses, the losses of the SE-ResNet-50, EfficientNet-B0, and ViT networks decrease steadily, and the classification accuracy on the training set increases steadily. After 300 iterations, the loss of each network is still declining; the training accuracy of SE-ResNet-50 and EfficientNet-B0 is close to 100%, and that of the ViT network has reached 60%. Table 5 compares the results of the ablation experiments for the different algorithms. Among them, the classification accuracy rates of the SE-ResNet-50, ViT, and

Discussions
Experiments were also performed on the rock image dataset using the conventional classification training network; Figure 8 shows the training-set accuracy and loss curves of the SE-ResNet-50, ViT, and EfficientNet-B0 models without image cutting. As Figure 8 shows, as the number of iterations increases, the loss of SE-ResNet-50 and EfficientNet-B0 decreases with oscillation and their training accuracy increases with oscillation, whereas the loss of the ViT network drops rapidly and then fluctuates violently after the rapid improvement. At around 300 iterations, the training loss and accuracy of all three networks change significantly. From rounds 300 to 500, the loss and accuracy of SE-ResNet-50 and EfficientNet-B0 stabilize, with training accuracy of about 90% for both, while the training accuracy of the ViT network reaches only about 45%. The training accuracy and loss curves of every model in Figures 7 and 8 change sharply after round 300 because CutMix data augmentation is switched off at that point [26]; the high training accuracy indicates that the deep learning networks can fit the rock images of the training set. Compared with Figure 8, the curves in Figure 7 are smoother: after 300 iterations the training loss is still declining and reaches lower values, and the training accuracy is higher, with SE-ResNet-50 and EfficientNet-B0 reaching almost 100% and ViT reaching 60%. This shows that the image cutting method proposed in this paper improves the training-set accuracy of the network models to a certain extent.
The results in Table 5 show that, with the image cutting method alone, the accuracy of the SE-ResNet-50 and ViT models decreased by 12.5 and 6.25 percentage points, respectively, and that of the EfficientNet-B0 model decreased by 3.125 percentage points, indicating that using image cutting alone to enlarge the dataset does not improve the classification accuracy of small-sample fine-grained images. With the SBV algorithm alone, the accuracy of the SE-ResNet-50, ViT, and EfficientNet-B0 models improved by 31.25, 6.25, and 43.75 percentage points, respectively, indicating that the SBV algorithm gathers more information to aid classification and effectively improves the classification accuracy of fine-grained images with few samples. With both the image cutting method and the SBV algorithm, the accuracies of the SE-ResNet-50, ViT, and EfficientNet-B0 models were 75%, 68.75%, and 75%, respectively, improvements of 34.375, 18.75, and 43.75 percentage points over the models without either method, demonstrating the effectiveness and robustness of the algorithm.
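Because the test set has 32 images (per the abstract and conclusions), every accuracy is a multiple of 1/32 = 3.125%, so each reported change corresponds to a whole number of images. As a consistency check on the combined results, using the baseline accuracies given in the Conclusions (40.625%, 50.0%, and 31.25%), with pp denoting percentage points:

```latex
\begin{aligned}
\text{SE-ResNet-50:}\quad & \tfrac{24}{32} - \tfrac{13}{32} = 75\% - 40.625\% = 34.375\ \text{pp} \\
\text{ViT:}\quad & \tfrac{22}{32} - \tfrac{16}{32} = 68.75\% - 50\% = 18.75\ \text{pp} \\
\text{EfficientNet-B0:}\quad & \tfrac{24}{32} - \tfrac{10}{32} = 75\% - 31.25\% = 43.75\ \text{pp}
\end{aligned}
```

Likewise, the 12.5-point drop for SE-ResNet-50 with image cutting alone corresponds to 4 test images, and the 43.75-point gain for EfficientNet-B0 to 14 images.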
To further analyze the rock image classification performance of the proposed algorithm, it is compared on the rock image dataset of this paper with commonly used algorithms, including VGG (Bai, 2019) [27], AlexNet [28], PCA-SVM (Tian, 2019) [10], and Bilinear CNN (Lin T, 2015) [29].
The classification results are shown in Table 6, from which it can be seen that the proposed algorithm achieves higher accuracy on the rock image dataset. Its image classification accuracy is 75%, which is 28.125, 53.125, 34.375, and 18.75 percentage points higher than that of the VGG, AlexNet, PCA-SVM, and Bilinear CNN algorithms, respectively, again showing that the proposed algorithm performs well.
Testing the application of deep learning to the classification of a small number of fine-grained rock images led to the following findings: (1) The test-set classification results show that our method effectively alleviates the problem that insufficient data makes it difficult for the model to extract sufficient feature information, and it avoids the shortcomings of traditional classification based on professional experience, such as low efficiency and susceptibility to subjective factors. (2) Comparing the SE-ResNet-50, ViT, and EfficientNet-B0 models before and after modification shows that using the image cutting method alone sometimes degrades model performance, but using it together with the SBV algorithm greatly improves performance. (3) The proposed method can be used for rock classification in interstellar exploration, as follows. First, a camera captures rock images in real time; then a microprocessor preprocesses each image and inputs it into the trained model for prediction; finally, the classification result is transmitted back directly through the communication system. No physical sample collection is required, so this scheme can enable artificial intelligence-based interstellar exploration.

Conclusions and Future Work
To address fine-grained rock classification with few samples, this paper proposes a data augmentation method that alleviates the lack of sample data by cutting images without adding extra data samples, together with an SBV algorithm built on a deep learning network, which form a new classification prediction network structure. The experimental results show that, on the rock image dataset of this paper, the classification accuracies of the SE-ResNet-50, ViT, and EfficientNet-B0 networks without either method are 40.625%, 50.0%, and 31.25%, respectively. With the image cutting method and the SBV algorithm proposed in this paper, the classification accuracies on the 32 test images are 75%, 68.75%, and 75%. These results show that the proposed method significantly improves image classification accuracy.
Compared with the original algorithms, accuracy increased by 34.375, 18.75, and 43.75 percentage points, respectively. The results also show that neural networks have good application value as a replacement for traditional rock sample identification methods and are of research value for rock identification and classification. Deep learning usually requires large datasets and clear images; the method proposed in this paper can greatly improve accuracy when only a small dataset is available. However, two research directions remain for future work: (1) deep learning is a black-box model, and its interpretability needs further research; (2) open-source rock datasets are lacking, so the robustness of the method on other datasets is unknown, and establishing open-source fine-grained rock datasets is an important research task.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.