Identifying Capsule Defect Based on an Improved Convolutional Neural Network

Capsules are commonly used as containers for most pharmaceuticals, and capsule quality is closely related to human health. Given the actual demand for capsule production, this study proposes a capsule defect detection and recognition method based on an improved convolutional neural network (CNN) algorithm. The algorithm is used for defect detection and classification in capsule production. Images of defective and qualified capsules from actual production are collected as samples. Then, a deep learning model based on the improved CNN is designed, trained, and tested on the capsule image dataset to identify defective capsules. The improved CNN algorithm is based on regularization and the Adam optimizer (RACNN): a dropout layer and L2_regularization are added between the fully connected and output layers to address overfitting, and the Adam optimizer is introduced to accelerate model training and improve model convergence. Cross entropy is used as the loss function to measure the prediction performance of the model. By comparing the results of RACNN with different parameters, a detection method based on the optimal parameters of the RACNN model is finally selected. Results show a 97.56% recognition accuracy for the proposed method. Hence, this method could be used for the automatic identification and classification of defective capsules.


Introduction
In the medical industry, the quality of drugs is important and closely related to human health. Medicine and healthcare products in capsule form play an important role in people's lives. However, capsule products usually feature various defects due to the limitations of capsule production technology [1-3]. Capsule defects include the presence of air bubbles, capsule shrinkage, deformation, and holes. These defects cause unqualified capsule dosage, affect capsule efficacy, and directly lead to poor air tightness, which affects the effectiveness and lifespan of the medicine. Appearance defects may not affect the efficacy of drugs, but they harm the image of enterprises in the minds of consumers. At present, little equipment is available that can detect capsule defects, and detection mainly depends on human visual inspection or sampling inspection. However, these methods have slow detection speed and low detection accuracy and are vulnerable to individual subjective factors [4,5]. Therefore, the automatic identification of defective capsules is needed during the production process. In recent years, numerous online capsule detection technologies based on computer vision have been applied [3,6,7]. For example, Qi and Jiang proposed a capsule defect detection method based on the hierarchical support vector machine to extract capsule defects in a real-time capsule defect recognition system [2]. Wang et al. used a backpropagation neural network to identify capsule defects [1]. However, these traditional detection methods cannot meet the growing production requirements; thus, more efficient and reliable detection technology is required.
As an important part of artificial intelligence, the convolutional neural network (CNN) has been successfully applied to processing and feature extraction for image information of complex dimensions [8]. The CNN is a multilayered supervised learning neural network that evolved from the multilayer perceptron. Given its multilevel structure, the CNN model is suitable for image processing problems [9]. Since the 1990s, CNNs have been widely used in face [10,11], handwriting [12,13], audio [14], defect [8,15-18], and fiber recognition [19]. For example, Basheera and Satya Sai Ram proposed a method for classifying brain tumors using the deep features extracted by the CNN [20]. They used the experimental results to show that the CNN is an important machine learning technology for medical image segmentation and classification. Zhang et al. proposed a method using a CNN model to recognize faces from near-infrared images [21]. Experimental results showed that the proposed CNN structure features a higher recognition rate compared with the traditional binary code, Zernike moment, and Hermite kernel recognition methods. Huang et al. applied a CNN to the automatic recognition of Chinese font characters and designed an inception font network with two additional CNN structural elements [13]. In addition, Soukup and Huber-Mörk applied a CNN to steel surface defect recognition [16]. They experimented with a classical CNN trained using a purely supervised method, discussed the influence of regularization methods, and achieved a high recognition rate for steel surface defects. Zhang et al. proposed a novel dual-channel CNN framework for an accurate spectral-spatial classification of hyperspectral images [22]. One- and two-dimensional CNNs were used to extract hierarchical spectral and space-related features, respectively, and the expected classification results were obtained. Kumar et al. presented a framework that uses deep CNNs to classify multiple defects in sewer closed circuit television inspection images and achieved high classification accuracy [23]. These experimental results demonstrated the effectiveness of CNNs in defect recognition.
Given the methods used in the literature, the CNN architecture features remarkable success and speed in classification compared with other methods. In addition, compared with the general neural network, the CNN exhibits notable advantages [24-26]: (a) a shared convolution kernel, which removes the burden of high-dimensional data processing; (b) no manual selection of features, because the features are obtained through the trained weights; and (c) a low number of network connections and training parameters, and a simple network structure. However, CNNs are rarely used for the identification of defects in capsule detection. Therefore, this paper proposes a capsule defect detection method based on an improved CNN algorithm (RACNN). The dropout layer and L2_regularization are added between the fully connected and output layers, and the Adam optimizer is introduced to accelerate model training and improve model convergence.
The detection method based on the optimal parameters of the RACNN model is finally selected through parameter optimization. The proposed method achieves high recognition accuracy in capsule defect detection and provides a new method for classifying defective capsules.

CNN.
The most basic component of CNNs is the neuron model. Two layers of neurons constitute the perceptron model, which can achieve the logical operation and weight learning of a complex task. Adding a hidden layer between the input and output layers to form a multilayer functional neuron model can solve multiclassification problems [16,17]. The CNN is an artificial neural network based on deep learning theory. The core part of CNNs is the hidden layer, which is composed of convolution, pooling, and fully connected layers. The convolution layer extracts the main features of the input image data and contains multiple convolution kernels, which play a role similar to the neurons of a feedforward neural network [24]. The convolution kernels in the convolutional layer can extract deep information from the data and local features of the image. The convolution process is described by the following formula:
$$u^{l(i,j)} = w_i^{l} * x^{l(j)}$$
where $*$ denotes the dot product of the kernel and the local regions, $w_i^{l}$ denotes the weights of the $i$th filter kernel in layer $l$, and $x^{l(j)}$ denotes the $j$th local region in convolutional layer $l$.
The activation function is essential after the convolution operation. The activation function adds nonlinear factors to the neural network to solve complex problems. In recent years, ReLU has been widely used as an activation unit of CNNs [27]. Compared with the common activation functions, sigmoid and tanh, ReLU offers advantages, such as a low calculation cost. ReLU sets the output value of certain neurons to zero, thereby resulting in the sparseness of the network and reduction in the interdependence of parameters; as such, the problem of overfitting is alleviated. The ReLU activation function is described by the following formula:
$$a^{l(i,j)} = f\left(u^{l(i,j)}\right) = \max\left\{0,\ u^{l(i,j)}\right\}$$
where $u^{l(i,j)}$ is the output value of the previous operation. The pooling layer is used to compress the feature matrix while retaining useful information; it can accelerate the calculation and prevent overfitting [28]. Pooling includes two methods: maximum and average. The CNN designed in this study uses maximum pooling. The maximum pooling transformation is described by the following formula:
$$p^{l(i,j)} = \max_{(j-1)W+1 \le n \le jW} \left\{ a^{l(i,n)} \right\}$$
where $a^{l(i,n)}$ denotes the value of the $n$th neuron in the $i$th frame of layer $l$ and $W$ is the width of the pooling region. In most cases, the fully connected layer is located at the end of the network. This layer performs regression or classification on the features extracted through the preceding layer-by-layer transformations and mappings.
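The three operations described above (convolution, ReLU, and maximum pooling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the 5 x 5 toy image, the 2 x 2 kernel, and the function names `conv2d_valid`, `relu`, and `max_pool` are assumptions chosen only to make the arithmetic easy to follow.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is the dot product
    of the kernel with one local region (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Set every negative value to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, width=2):
    """Non-overlapping max pooling with a square window (stride = width)."""
    rows = len(feature_map) // width
    cols = len(feature_map[0]) // width
    return [
        [
            max(feature_map[r * width + a][c * width + b]
                for a in range(width) for b in range(width))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy input and kernel (hypothetical values, not capsule data).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 1, 0],
    [2, 1, 0, 0, 1],
    [1, 0, 2, 3, 1],
    [0, 1, 1, 0, 2],
]
kernel = [[1, 0], [0, -1]]

u = conv2d_valid(image, kernel)  # 4 x 4 convolution output
a = relu(u)                      # negatives replaced by zero
p = max_pool(a, width=2)         # 2 x 2 pooled feature map
print(p)                         # [[1, 3], [2, 2]]
```

The pooled map is a quarter of the size of the activation map, which is the compression effect the pooling layer is used for.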

RACNN for Capsule Identification.
On the basis of the basic CNN model, the proposed model is improved in three respects: preprocessing, optimizer, and model structure.
The improved CNN algorithm (RACNN) is constructed with regularization and the Adam optimizer. The RACNN structural model for capsule recognition is shown in Figure 1. The main steps are as follows:
Step 1: in the image preprocessing stage, the resize function is introduced to process the capsule image.
Step 2: the preprocessed image dataset is input for training, and the weights w_i and biases b_i of the network are initialized.
Step 3: the RACNN structure is designed with four convolution-pooling layers and three fully connected layers. The first-layer convolutional feature matrix X_1 is calculated and merged into one column vector as the neuron input to the next layer. The weights w_i and biases b_i are updated to obtain the feature matrix X_2, and the subsequent layers are then executed in sequence. The specific parameters of each layer are described in the following sections.
Step 4: the dropout layer is added between the fully connected and the output layers to improve the generalization ability of the network. In addition to the dropout layer, L2_regularization is introduced to prevent overfitting. The difference is that dropout works by modifying the neural network itself, whereas L2_regularization works by modifying the loss function. That is, L2_regularization adds a regularization term to the original loss function. The principle is shown by the following formula:
$$J = J_0 + C \sum_{W} \lVert W \rVert_2^2$$
where $J_0$ represents the original loss function, $C$ represents the L2_regularization parameter, and $W$ is the weight matrix of each layer in the trained network.
Step 5: in the model training stage, the Adam optimizer is introduced to update and calculate the network parameters, which affect the model output value and make it close to the optimal value. As a result, the model's convergence speed is improved, and the model error is reduced.
Step 6: finally, a SoftMax classifier is used to map the extracted features to the 10 types of capsule images. The output of each layer is fused and input to the SoftMax classifier to further improve the recognition accuracy of capsules.
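Steps 4 and 5 can be illustrated with a small pure-Python sketch of an L2-regularized loss minimized by Adam updates. This is only a toy under stated assumptions, not the RACNN training code: the penalty is taken as C times the sum of squared weights, the original loss J0 is a hypothetical quadratic, the moment coefficients `beta1`, `beta2`, and `eps` are the commonly used Adam defaults (the paper does not list them), and the learning rate 0.01 is chosen for the toy problem rather than taken from the experiments.

```python
import math


def l2_penalty(weights, c):
    """Assumed form of the L2 term: C times the sum of squared weights."""
    return c * sum(w * w for w in weights)


def adam_step(params, grads, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. Returns the new parameters and moment estimates."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first moment (mean of gradients)
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second moment (mean of squares)
        m_hat = m_i / (1 - beta1 ** t)             # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy problem (hypothetical): J0(w) = sum((w - target)^2), J = J0 + C * sum(w^2).
target = [1.0, -2.0]
c = 0.01


def loss(w):
    j0 = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    return j0 + l2_penalty(w, c)


def grad(w):
    return [2 * (wi - ti) + 2 * c * wi for wi, ti in zip(w, target)]


w = [0.0, 0.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
initial_loss = loss(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, grad(w), m, v, t, lr=0.01)
final_loss = loss(w)
```

Because of the penalty, the weights settle near target / (1 + C) rather than at the target itself, which shows how the regularization term pulls the weights toward zero.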

Experimental Design and Parameter Setting
In the production process of the capsule, 10 types of capsule images, including nine types of defective capsules and one type of normal capsule, were acquired. Image enhancement technology was used for image processing to enrich the image training set, extract image features, and improve the model. Two image enhancement techniques, namely, image rotation and noise addition, were used in the experiment; salt noise and Gaussian noise were added. Figure 2 shows the physical images of the capsules. The types of defective capsules included holes, concave heads, uncut bodies, oil stains, short bodies, insertion, shrivel, locking, and nesting. Table 1 lists the number of each type of capsule image processed by image enhancement technology for subsequent experiments. Given that the CNN classification of this study belongs to supervised learning, the red number indicates the label set for each capsule category. The CNN structure used in the experiment comprised four convolution and pooling layers, followed by fully connected hidden and SoftMax layers. The size of the convolution kernel of the first two layers was 5 * 5, and that of the latter two layers was 3 * 3. Maximum pooling was used, and the activation function was ReLU. Table 2 provides the detailed parameters of the convolutional and pooling layers. The experiments were implemented using the TensorFlow toolbox. In the training process of the RACNN model, the k-fold cross validation method was used to evaluate the model, and k was set to 10. The capsule image dataset was randomly divided into 10 folds; nine of them were used for training each time, and the remaining one was used for testing (i.e., 1413 and 157 samples were used for training and validation, respectively). The process was repeated 10 times, and the fold used for testing was different each time. The specific implementation was completed by calling KFold in the sklearn.model_selection module of the Scikit-learn library.
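The 10-fold split described above can be sketched in pure Python as follows. The paper reports using Scikit-learn for this step; the hand-written helper `k_fold_indices` below is a hypothetical stand-in that only illustrates the partitioning logic, and the seed value is an arbitrary assumption.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division of the dataset
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 1413 + 157 = 1570 images in total, split into 10 folds.
folds = list(k_fold_indices(1570, 10))
train_idx, test_idx = folds[0]
print(len(train_idx), len(test_idx))  # 1413 157
```

Each of the 10 rounds trains on nine folds and evaluates on the remaining one, so every image serves as a test sample exactly once.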
The sample images acquired from the production site were input into the RACNN network. Each sample image was preprocessed into a 100 * 100 pixel image as the input dataset by using the resize function in the Skimage library. The size of the convolution kernel in the first layer was 5 * 5, and the convolution layer C1 obtained 32 feature images with 100 * 100 pixels. The downsampling coefficient was 2, that is, the step length of the pooling layer P2 was 2. The pooling layer P2 obtained 32 feature images with 50 * 50 pixels. The same process was executed on the next layers. Specific parameter settings are shown in Table 2. As the last layer of RACNN, the SoftMax layer classified the data and outputted a vector of 10 * 1. Each value in the vector represented the probability that the sample belonged to the corresponding class. Cross entropy was used as the loss function, which increased as the predicted probability diverged from the actual label. Thus, the model aimed to minimize cross entropy. After establishing the RACNN network as described above, training data were used to train the network, and the trained RACNN network was then fixed. Finally, the test data were identified and classified by the trained RACNN network.
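The size arithmetic and the SoftMax and cross entropy computation described above can be sketched as follows. The helper names are hypothetical, the logits are made-up values, and the rounding rule for odd sizes (floor division) is an assumption, since only the first two sizes (100 and 50) are stated in the text; Table 2 remains the authoritative source for the actual layer sizes.

```python
import math


def pooled_sizes(size, n_pool_layers, stride=2):
    """Spatial size after each stride-2 pooling layer, assuming the convolutions
    preserve the size and odd sizes are rounded down (an assumption)."""
    sizes = [size]
    for _ in range(n_pool_layers):
        sizes.append(sizes[-1] // stride)
    return sizes


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_label):
    """Negative log-probability assigned to the true class (0-based label)."""
    return -math.log(probs[true_label])


print(pooled_sizes(100, 4))  # [100, 50, 25, 12, 6]

# Hypothetical scores for the 10 capsule classes; class 3 is strongly favored.
logits = [0.1] * 10
logits[3] = 4.0
probs = softmax(logits)
loss_if_correct = cross_entropy(probs, 3)  # true class matches the prediction
loss_if_wrong = cross_entropy(probs, 7)    # true class differs from the prediction
```

The loss is small when the favored class is the true one and large otherwise, which is the behavior the training procedure exploits when it minimizes cross entropy.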

Selection of the Learning Rate and Batch_size.
The learning rate and batch_size are important parameters of CNNs. The learning rate considerably affects the test performance of the network. An extremely high or low learning rate reduces recognition accuracy. The learning rate is the step size of each gradient descent update, which determines how far the weights move in the gradient direction. When the learning rate is large, the convergence speed of the model is faster in the early stage. However, a large learning rate makes it difficult for the model to converge; that is, the model oscillates near the extreme point and never reaches the best point. When the learning rate is small, the convergence rate will be extremely slow. Therefore, in the experiment, a large learning rate is set in the early stage to make the gradient drop rapidly, and then the learning rate is gradually reduced so that the model gradually reaches the optimum. At the same time, dropout and regularization were used to prevent overfitting. Different learning rates are used for training to examine the relationship between the learning rate and recognition accuracy in the experiment. The results are shown in Table 3. As the learning rate decreases, the test accuracy gradually increases, but when the learning rate is too small, the test accuracy decreases. When the learning rate is approximately 0.0007 in the experiment, the recognition accuracy is the highest, and the model is considered optimal.
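The idea of starting with a larger learning rate and reducing it gradually can be sketched as follows. The text does not state which decay rule was used, so the exponential form, the decay rate of 0.9 per epoch, and the function name `decayed_learning_rate` are all assumptions made for illustration.

```python
def decayed_learning_rate(initial_lr, epoch, decay_rate=0.9):
    """Exponential decay (assumed rule): lr = initial_lr * decay_rate ** epoch."""
    return initial_lr * decay_rate ** epoch


# Schedule over 20 epochs, starting from the learning rate reported as best.
schedule = [decayed_learning_rate(0.0007, epoch) for epoch in range(20)]
for epoch in (0, 5, 10, 19):
    print(epoch, schedule[epoch])
```

Early epochs take larger steps, and later epochs take smaller ones, which reduces the oscillation near the extreme point described above.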
Batch_size is the number of images that the model reads in each batch during training or testing. For the classification experiments of different targets, setting the appropriate batch_size plays an important role in improving the training efficiency and recognition performance of the network model. When the batch_size is extremely small, the network convergence is unstable, and the convergence speed is slow. Hence, the loss value fluctuates back and forth. When the batch_size is extremely large, the amount of calculation is large, memory consumption is excessive, and the model may become trapped in a local optimum. Therefore, in this experiment, the optimal parameters are obtained after adjustment. The learning rate is 0.0007; the maximum number of training epochs is 20; and the batch_sizes are set to 8, 16, 32, and 64 for comparison. As shown in Table 4, when the batch_size is 16, the smallest loss is obtained, and the test set achieves the highest accuracy.
Figures 3 and 4 show the influence of the learning rate on the experimental results. Figures 3(a) and 3(b) display the accuracy curves of the training and test sets under different learning rates, respectively. When the learning rates are 0.0007 and 0.0004, the accuracy is higher under the same number of iterations compared with other learning rates. Moreover, the accuracy improves gradually as the number of training iterations increases. When the number of iterations reaches approximately 11, the recognition accuracy increases slowly and remains stable. Figures 4(a) and 4(b) show the loss curves of the training and test sets, respectively, at different learning rates. As shown in the figures, after 20 iterations, the network loss is negligible at the learning rates of 0.0007 and 0.0004. With the increase in iterations, the loss value continuously decreases and approaches zero.
Figures 5 and 6 show the influence of the batch_size on the experimental results. Figure 6 shows that when the batch_size is large, the initial loss value is also large. Figure 5 shows that when the batch_size is small, the accuracy curve fails to rise smoothly, indicating the instability of the network. Therefore, the network can achieve an accurate and rapid convergence effect only when an appropriate batch_size is selected for our capsule sample set.
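To make the trade-off concrete, the number of weight updates per epoch for each compared batch_size can be computed from the 1413 training samples of one cross-validation fold. This is a simple illustration with hypothetical helper names, not part of the original experiments.

```python
import math


def steps_per_epoch(n_train, batch_size):
    """Number of mini-batches needed to see every training sample once."""
    return math.ceil(n_train / batch_size)


def iter_batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


n_train = 1413
for batch_size in (8, 16, 32, 64):
    print(batch_size, steps_per_epoch(n_train, batch_size))
# 8 177
# 16 89
# 32 45
# 64 23

batches = list(iter_batches(list(range(n_train)), 16))
```

A smaller batch_size gives more, but noisier, updates per epoch, whereas a larger batch_size gives fewer updates that each cost more memory and computation.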

Efficiency Evaluation.
As described above, the optimal parameters of the RACNN model in the experiment are as follows: batch_size, 16; learning rate, 0.0007; and maximum training epoch, 20. The recognition accuracy of the test set obtained by the optimal parameter model is 97.56%. The confusion matrix of the test accuracy is drawn after normalization of the capsule sample data to explain the unbalanced classification results of each kind of defective capsule in detail (Figure 7). The matrix provides a visual presentation of the algorithm performance. The labels of 10 kinds of capsules (order: hole, concave head, uncut body, oil stain, short body, insertion, shriveled appearance, locking, nesting, and normal) are defined as numbers from 1 to 10 in order. The results show that most of the capsules are correctly identified. In the experiment, 14% of the oil-stained capsules are mistaken for uncut body capsules, and 11% of the shriveled capsules are incorrectly identified as hole capsules.
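A row-normalized confusion matrix of the kind described above can be computed as in the following sketch. The labels are hypothetical values for three classes, not data from the paper, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so that entries are per-class fractions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical labels for three classes (0-based), not data from the paper.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, 3)
cm_norm = normalize_rows(cm)
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
print(cm)        # [[3, 1, 0], [0, 3, 0], [1, 0, 2]]
print(accuracy)  # 0.8
```

Normalizing by row makes classes with different sample counts comparable: an off-diagonal entry such as 0.25 in row 0 reads directly as the fraction of that class that was misidentified.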
For comparison, RACNN, CNN, support vector machine (SVM), and k-nearest neighbor (KNN) are applied to analyze the same image dataset. The identification accuracy for the four methods is shown in Figure 8. RACNN obtains the best performance among the four methods. Although the identification accuracy of SVM is relatively high, SVM is not suitable for processing large-scale training samples. Since the number of capsules in actual production is far greater than that used in the experiments, SVM becomes more complicated when applied to the actual production process of a capsule. In contrast, the CNN handles high-dimensional data without difficulty and is more suitable for processing large-scale sample data. The experiments show that RACNN has higher identification accuracy and that the classifier is simple and easy to operate, which is convenient for application in actual capsule production.

Conclusion
An efficient method for capsule defect detection is crucial to improve capsule quality. Therefore, in this study, a CNN model was introduced to recognize defective capsules. Sample images, including one kind of normal capsule and nine kinds of defective capsules, were collected. An improved CNN algorithm based on regularization and the Adam optimizer was designed. The RACNN model was trained on the capsule images and used to identify them. In addition, the effects of two important network parameters (batch_size and learning rate) on the experimental results were analyzed in detail. By adjusting the parameters, the best parameter combination was obtained, and the classification accuracy was as high as 97.56%. The experimental results showed that the proposed method is concise and efficient, and the RACNN structure model is suitable for the identification of defective capsules.

Data Availability
Data can be obtained from the National Engineering Laboratory of Energy-Saving Motor and Control Technology, College of Electrical Engineering and Automation, Anhui University.

Conflicts of Interest
The authors declare that they have no conflicts of interest.