On Using XMC R-CNN Model for Contraband Detection within X-Ray Baggage Security Images

We present an X-ray material classifier region-based convolutional neural network (XMC R-CNN) model for detecting typical guns and typical knives in X-ray baggage images. The XMC R-CNN model addresses contraband detection in overlapped X-ray baggage images by combining the X-ray material classifier algorithm with the organic stripping and inorganic stripping algorithm, which improves both the detection rate and the miss rate. The detection rates of guns and knives are 96.5% and 95.8%, and the miss rates of guns and knives are 2.2% and 4.2%. The contraband detection technology based on the XMC R-CNN model is applied to X-ray baggage images of security inspection. According to user needs, safe X-ray baggage images can be automatically filtered in some specific fields, which reduces the number of X-ray baggage images that security inspectors need to screen. The efficiency of security inspection is improved, and the labor intensity of security inspection is reduced. In addition, the security inspector can screen X-ray baggage images according to the automatically detected boxes, which can improve the effect of security inspection.


Introduction
In recent years, as terrorist activities have become more serious, countries around the world have paid more and more attention to the safety of air transport. At present, security inspection has three pain points that need to be solved urgently: first, how to improve the effect of security inspection; second, how to improve the efficiency of security inspection; and third, how to reduce the labor intensity of security inspection. Contraband detection technology is not used in the traditional technical solution, because baggage contents are complex and highly varied. Accurately identifying the contraband in an X-ray baggage image is the most important and most difficult challenge for human operators. In addition, during peak periods, human operators have limited time to screen images.
As the most popular machine learning method, deep learning has achieved excellent results in object classification and detection. For the task of X-ray image classification, previous work relied on traditional machine learning methods. In this paper, the XMC R-CNN method based on deep learning is used to detect contraband within X-ray baggage security images.
In the ImageNet Large-Scale Visual Recognition Challenge 2012 (ILSVRC12), Hinton's team won the championship with the AlexNet model built from convolutional neural networks (CNNs), which ignited the enthusiasm of academia and industry for deep learning. In order to use deep learning to detect contraband in X-ray baggage security images, the deep learning framework, the backbone model, and the detection model should be understood, as shown in Figure 1.
With the upsurge of deep learning research, various open-source deep learning frameworks have emerged one after another. The main frameworks, such as TensorFlow, Caffe, MXNet, Keras, CNTK, Torch, and Theano, are introduced in [1][2][3][4]. Different deep learning frameworks usually differ in model design, interface, deployment, performance, and architecture design, so their advantages and disadvantages are different. The classic backbone models mainly include LeNet [5], AlexNet [6], ZFNet [7], VGGNet [8], GoogleNet [9], ResNet [10], and DetNet [11]. Although object detection algorithms differ, convolutional neural networks are usually used to process the input image and generate the feature map, and various algorithms then complete the region generation and loss calculation. The convolutional neural networks are the backbone of the whole detection algorithm. The basic components of the backbone include the convolution layer, activation function layer, pooling layer, dropout layer, batch normalization layer, and fully connected layer. The backbone models introduced above are usually used for object classification. Contraband detection, however, requires both classification and detection; classification alone is not enough.
There is much contraband in X-ray baggage security images, so it is necessary not only to classify different contraband but also to determine the locations and sizes of the contraband. The classic classification and detection models include R-CNN [12], Fast R-CNN [13], Faster R-CNN [14], Mask-CNN [15], SSD [16], YOLO [17], and R-FCN [18]. These models have developed from two stages to one stage, from bottom-up only to top-down, and from the single-scale network to the feature pyramid network, and many algorithms have achieved excellent results on the ImageNet dataset.
Computer-aided screening (CAS) [19] has also been widely used in the automatic detection of contraband in X-ray baggage images; however, this largely remains an unsolved problem. In contraband detection based on multiview X-ray images, detection is carried out in each X-ray image, and the constraint between the multiview images is then used to improve the detection accuracy [20][21][22][23][24][25][26][27]. Contraband detection in computed tomography (CT) images extends the detection method for a single 2D X-ray image to 3D images [28][29][30][31][32][33][34][35]. Both contraband detection in multiview images and contraband detection in CT images are traditional technologies. Contraband detection by deep learning is proposed in [19][36][37][38], where the accuracy of contraband detection is improved through deep learning methods. At present, there are still some problems in using these techniques to detect contraband in overlapped X-ray baggage images. For example, when a knife or a gun is covered by other objects in the baggage, its outline is not clear in the original X-ray image and its color is shown as orange (not a metal color), so it cannot be correctly detected by the prior detection algorithms. Likewise, when an explosive is hidden under several steel plates in the baggage, it cannot be seen at all in the original image, and it is difficult for the security inspector to determine that there is a dangerous object in this area.
The XMC R-CNN model is used to solve the problem of contraband detection in overlapped X-ray baggage images by the X-ray material classifier algorithm and the organic stripping and inorganic stripping algorithm.
In the next section, we review related work on the dual-energy X-ray material classifier and the contraband detection based on dual-energy X-ray data. In Section 3, we discuss the contraband detection technology based on the XMC R-CNN model, including the detection model design and the algorithm implementation. In Section 4, we introduce the training and test results of the model. We summarize the work of this paper in Section 5.

Related Work
We now summarize the related work on the dual-energy X-ray material classifier and the contraband detection based on dual-energy X-ray data.

X-Ray Material Classifier.
The dual-energy material classification method has been widely used in X-ray security inspection systems; it measures the difference in the attenuation coefficients of different materials for high- and low-energy X-rays [39][40][41][42][43][44][45][46]. The physical principle of dual-energy X-ray radiography is based on the exponential law of photon radiation attenuation. When an X-ray beam passes through an object, the detector signal can be described by the Beer-Lambert law:

$$I(E) = I_0(E)\, e^{-\mu(E, Z)\,\tau} \qquad (1)$$

where E is the photon energy, I_0(E) is the initial energy intensity of photons emitted from the source, μ(E, Z) is the attenuation coefficient of a material with atomic number Z for impinging photons with energy E, and τ is the material thickness.
In previous work on the material classifier for X-ray baggage images, Ogorodnikov and Petrunin [47] define the log-ratio R as the vertical axis and 1/Hi as the horizontal axis, where Hi is the high-energy transparency and Lo is the low-energy transparency. By formulas (1) and (2), for a given material with atomic number Z, R is a value unique to the material and does not depend on the thickness. Therefore, R can be used to discriminate materials. The α-curve method [48][49][50] computes the features of the X-ray material, where α_2 is defined as the vertical axis and α_1 is defined as the horizontal axis. Two materials can be easily differentiated if their α-curves are well separated. Conversely, the materials cannot be easily differentiated if their α-curves are too close.
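The thickness independence of R can be checked directly from the Beer-Lambert law. The sketch below is illustrative only: it assumes monoenergetic high- and low-energy beams with energies E_H and E_L, and it assumes the log-ratio is written as R = ln(Lo)/ln(Hi). The exact form used in [47] is not reproduced in this text, so the orientation of the ratio is an assumption.

```latex
\begin{align*}
\mathrm{Hi} &= \frac{I(E_H)}{I_0(E_H)} = e^{-\mu(E_H, Z)\,\tau}, &
\mathrm{Lo} &= \frac{I(E_L)}{I_0(E_L)} = e^{-\mu(E_L, Z)\,\tau}, \\
R &= \frac{\ln \mathrm{Lo}}{\ln \mathrm{Hi}}
   = \frac{-\mu(E_L, Z)\,\tau}{-\mu(E_H, Z)\,\tau}
   = \frac{\mu(E_L, Z)}{\mu(E_H, Z)}.
\end{align*}
```

Under these assumptions the thickness τ cancels, so R depends only on the material. For a real polychromatic source the cancellation is only approximate, which is one reason measured material curves are sampled over a range of thicknesses.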

Contraband Detection.
Most contraband detection algorithms based on deep learning use RGB images and do not really use the X-ray high- and low-energy data. Therefore, the material information is not fully used by the detection algorithm. Contraband detection within X-ray baggage images has not been well explored by the machine vision community because of the lack of publicly available X-ray image datasets. Since contraband detection within X-ray baggage images is a challenging problem, detection models that use the X-ray high- and low-energy data are significantly more limited in the literature. The dual-energy images, which provide material information about the objects, are used for contraband detection in [21,51]. The authors show that using multiple local color and texture features improves classification and detection performance. They first present an extensive evaluation of standard local features for object detection on a large X-ray image dataset in a structured learning framework. Then, they propose two dense sampling methods as a key-point detector for textureless objects and extend the SPIN (the image generation process can be visualized as a sheet spinning about the normal of a point) color descriptor to utilize the material information. Finally, they propose a multiview branch-and-bound search algorithm for multiview object detection. The two-channel dual-energy networks and the four-channel dual-energy networks are described in [37]. For each input, the high- and low-energy images are transformed into the feature space of a given method. The final convolutional layer feeds into the fully connected (FC) layers. The networks used in that work consist of 16 convolutional layers and 3 fully connected layers. The author experimented with the four-channel dual-energy networks, and the four-channel network with inputs −log H, Δ, −log L, Σ achieves the best performance metrics, where Σ = H + L and Δ = H − L.

Detection Model Design
In this paper, the Faster R-CNN model is combined with the characteristics of the X-ray baggage image, and the XMC R-CNN model is designed for the automatic detection of typical guns and typical knives in the X-ray baggage image. The framework of the system is Caffe, and the backbone model of the system is VGG-16 [8].

XMC R-CNN Detection Model.
The XMC R-CNN model is composed of two modules. The first module is the X-ray material classifier (XMC), which strips the organic and inorganic materials, and the second module is the Faster R-CNN detector, which uses the proposed regions.
The XMC R-CNN algorithm model mainly includes the following steps, as shown in Figure 2.

X-Ray Material Classifier.
High-energy data and low-energy data are input, and the material values of organic, mixture, and inorganic are output through the material classification algorithm.

Organic Stripping and Inorganic Stripping.
High-energy data, low-energy data, and material values are input, and the material value of the organic, the gray value of the organic, the material value of the inorganic, and the gray value of the inorganic are output through the organic stripping and inorganic stripping algorithm.

Convolutional Layer Features.
The convolutional feature layers adopt the Simonyan and Zisserman model [8] (VGG-16), which has 13 shareable convolutional layers. The organic, mixture, and inorganic images, whose sizes may differ, are input. The feature maps of the input image are output and are used by the region proposal network (RPN) layers and the fully connected layers.

Region Proposal Network.
The RPN [14] is mainly used to generate region proposals. First, anchor boxes of multiple scales and aspect ratios are designed. The SoftMax layer then determines whether each anchor box belongs to the foreground or the background. Finally, the anchor boxes are refined by bounding box regression, and more accurate region proposals are obtained.
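The anchor design step can be illustrated with a minimal sketch. The scales (128, 256, 512) and aspect ratios (0.5, 1, 2) below are the defaults described for Faster R-CNN in [14]; the text does not state which values the XMC R-CNN model uses, so they are assumptions here, as is the helper name `make_anchors`.

```python
from itertools import product


def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes centred on one image location.

    scales and ratios are assumed defaults, not values taken from the paper.
    """
    anchors = []
    for scale, ratio in product(scales, ratios):
        # ratio is height / width; the box area stays close to scale ** 2.
        width = scale / ratio ** 0.5
        height = scale * ratio ** 0.5
        anchors.append((center_x - width / 2, center_y - height / 2,
                        center_x + width / 2, center_y + height / 2))
    return anchors


anchors = make_anchors(400, 300)
print(len(anchors))
```

Each location therefore carries one anchor per (scale, ratio) pair; the RPN scores every anchor as foreground or background and regresses offsets for the foreground ones.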

Region of Interest Pooling.
The RoI (Region of Interest) pooling layer in Faster R-CNN is adopted. The feature map from the last layer of VGG-16 and the region proposals from the RPN are input, and a proposal feature map of fixed size 7 × 7 is output.
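A minimal sketch of how a variable-size region can be reduced to a fixed 7 × 7 map by max pooling is given below. It works on a single-channel feature map stored as nested lists, and the function name `roi_max_pool` and the bin-splitting rule (floor for the start of a bin, ceiling for its end) are illustrative assumptions rather than the Caffe implementation.

```python
def roi_max_pool(feature_map, roi, output_size=7):
    """Max-pool the region roi = (x1, y1, x2, y2), inclusive and in
    feature-map coordinates, into an output_size x output_size grid."""
    x1, y1, x2, y2 = roi
    roi_w = x2 - x1 + 1
    roi_h = y2 - y1 + 1
    pooled = []
    for i in range(output_size):
        # Bin i covers rows [r0, r1); floor/ceil keeps every bin non-empty.
        r0 = y1 + (i * roi_h) // output_size
        r1 = y1 + -((-(i + 1) * roi_h) // output_size)
        row = []
        for j in range(output_size):
            c0 = x1 + (j * roi_w) // output_size
            c1 = x1 + -((-(j + 1) * roi_w) // output_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 14 x 14 feature map whose value encodes its position.
feature_map = [[r * 14 + c for c in range(14)] for r in range(14)]
pooled_full = roi_max_pool(feature_map, (0, 0, 13, 13))
pooled_small = roi_max_pool(feature_map, (2, 2, 4, 4))
```

Both a 14 × 14 region and a 3 × 3 region produce a 7 × 7 output, which is what lets the following fully connected layers accept proposals of any size.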

Contraband Detection.
The proposal feature maps are input. Through the fully connected layers and the SoftMax layer, we obtain the classification of each region proposal. At the same time, we obtain the locations of the detection boxes by bounding box regression. Finally, the classifications and the locations of the typical guns and the typical knives are output.
The detailed implementation of Steps (3), (4), (5), and (6) can be found in Faster R-CNN [14] and VGG-16 [8]. The X-ray material classifier algorithm of Step (1) and the organic stripping and inorganic stripping algorithm of Step (2) are described in the following sections.
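The two outputs of this step can be sketched as follows. The softmax converts class scores into probabilities, and the box decoding applies regression offsets (dx, dy, dw, dh) to a proposal using the parameterization described for R-CNN-style detectors in [12][14]. The class list, the scores, and the offsets are made-up illustrative values, not outputs of the XMC R-CNN model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_box(proposal, deltas):
    """Apply (dx, dy, dw, dh) offsets to a proposal given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    width, height = x2 - x1, y2 - y1
    cx, cy = x1 + width / 2, y1 + height / 2
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * width, cy + dy * height
    new_w, new_h = width * math.exp(dw), height * math.exp(dh)
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)


CLASSES = ("background", "gun", "knife")  # hypothetical label order
probs = softmax([0.2, 3.1, 0.5])          # hypothetical class scores
label = CLASSES[probs.index(max(probs))]
box = decode_box((100, 100, 200, 180), (0.1, 0.0, 0.0, 0.0))
print(label, box)
```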

X-Ray Material Classifier.
In this paper, the organic glass (chemical formula: C5H8O2) is selected to represent the organic material in Table 1, the aluminum (model: 2A12) is selected to represent the mixture material in Table 2, and the carbon steel (model: 45#) is selected to represent the inorganic material in Table 3.
Data are collected for the three materials over different thickness ranges, and the curves are drawn as shown in Figure 3. The horizontal axis is the high-energy value (Hi) of the dual-energy X-ray, and the vertical axis is the material value (Mat) of the dual-energy X-ray. The log-ratio R is defined as the material value. A further 122 curves are evenly inserted between the three measured curves, which represent the organic, mixture, and inorganic materials, respectively. Through these 125 curves, the material value of any point in the space is calculated, and the material table of the dual-energy X-ray is generated. The whole material space is divided into five regions, as shown in Figure 3: the low-gray unrecognizable region, where the image shows red; the high-gray unrecognizable region, where the image shows gray; the organic region, where the image shows orange; the mixture region, where the image shows green; and the inorganic region, where the image shows blue. The material value of the low-gray unrecognizable region is defined as 125, the material value of the high-gray unrecognizable region is defined as 0, the material value of the organic region is defined as 1, and the material value of the inorganic region is defined as 124. The material value of the mixture region, which is the transition region from the inorganic region to the organic region, needs to be calculated. Three material curves in the mixture region have been obtained. According to the X-ray material classifier model in Figure 3, the low- and high-energy data of the other 122 curves at different thicknesses are calculated. With 125 material curves and 100 data sampling points of different material thickness per curve, the mixture region is divided into 124 × 99 = 12276 grid cells. As shown in Figure 3, each grid cell is composed of four points (A, B, C, and D). Using the linear interpolation algorithm, we can calculate the material value of any point in the quadrilateral (A, B, C, and D).
Finally, the material values of all points in the whole material space can be calculated and exported to the material table.
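The grid construction and the per-cell interpolation can be sketched as below. The cell count follows from the numbers in the text (125 curves, 100 sampling points). The interpolation is a standard bilinear form on local coordinates (u, v) in [0, 1]; the corner ordering A, B, C, D and the mapping from a real quadrilateral to these local coordinates are assumptions, since the text does not specify them.

```python
N_CURVES = 125   # material curves, from the text
N_SAMPLES = 100  # thickness sampling points per curve, from the text


def grid_cell_count(n_curves, n_samples):
    """Cells formed between adjacent curves and adjacent sampling points."""
    return (n_curves - 1) * (n_samples - 1)


def bilinear(value_a, value_b, value_c, value_d, u, v):
    """Interpolate inside a cell with corners A(u=0, v=0), B(1, 0), C(1, 1), D(0, 1).

    u and v are hypothetical local coordinates in [0, 1].
    """
    bottom = value_a * (1 - u) + value_b * u
    top = value_d * (1 - u) + value_c * u
    return bottom * (1 - v) + top * v


cells = grid_cell_count(N_CURVES, N_SAMPLES)
# A point halfway between curve 10 (corners A, B) and curve 11 (corners C, D).
material_value = bilinear(10, 10, 11, 11, u=0.3, v=0.5)
print(cells, material_value)
```

Evaluating the interpolation for every (high-energy, low-energy) pair once and storing the result gives a lookup table, so no interpolation is needed at scan time.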

Organic Stripping and Inorganic Stripping.
By means of physical model design, experimental data sampling, the grid cell method, and the linear interpolation algorithm, the organic or the inorganic material can be screened separately from the overlapped X-ray baggage image. The organic glass (chemical formula: C5H8O2) is selected to represent the organic material, the carbon steel (model: 45#) is selected to represent the inorganic material, and overlapped simulants of the organic glass and the carbon steel with different thicknesses are selected to represent the mixture material. The effective atomic number Z_eff is computed as given in [52] from m_i, Z_i, and A_i, where m_i is the mass percentage of element i, Z_i is its atomic number, and A_i is its atomic weight. According to formulas (1) and (2), the effective atomic number Z_fe of the carbon steel is equal to 25.91, and the effective atomic number Z_gl of the organic glass is equal to 6.56. A material with effective atomic number Z_n (7∼25) can be obtained by overlapping the organic glass and the carbon steel with different thicknesses. As shown in Figure 4, the horizontal axis is the high-energy value (Hi) of the dual-energy X-ray and the vertical axis is the material value (Mat) of the dual-energy X-ray. The log-ratio R is defined as the material value. The orange curve is the organic glass curve, which represents the organic material. The blue curve is the carbon steel curve, which represents the inorganic material.

Mathematical Problems in Engineering

The curves between the orange curve and the blue curve are the curves of the organic glass and the carbon steel overlapped at different thicknesses, which represent the mixture material. By overlaying the organic glass (Z_gl = 6.5) and the carbon steel (Z_fe = 26), the effective atomic number Z_n (7∼25) is obtained. For example, to calculate the material with Z_n = 7, we substitute Z_eff = 7 into formulas (1) and (2) and obtain the mass ratio of the organic glass and the carbon steel (m_gl:m_fe = 0.997572383:0.002427617). The length and width of the organic glass and the carbon steel models are the same in the test process. Since the density of the carbon steel is 7.82 g/cm³ and the density of the organic glass is 1.18 g/cm³, the volume ratio of the organic glass and the carbon steel can be calculated and then converted to the thickness ratio (T_gl:T_fe = 0.999632928:0.000367072). Similarly, we can calculate the materials with Z_n = 8-25, and the corresponding thicknesses of the organic glass and the carbon steel are shown in Table 4.
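The two calculations in this passage can be sketched numerically. The first function uses a common power-law form of the effective atomic number, weighted by electron fraction (m_i Z_i / A_i) with an exponent of 3.5. Both the form and the exponent are assumptions, because the formula from [52] is not reproduced in this text; they are chosen because they reproduce the value of about 6.56 quoted for organic glass. The second function converts a mass ratio to a thickness ratio for slabs of equal length and width, using the densities given in the text.

```python
EXPONENT = 3.5  # assumed power-law exponent, not stated in the text


def effective_atomic_number(elements, exponent=EXPONENT):
    """elements: list of (mass_fraction, Z, atomic_weight) tuples.

    Assumed form: electron-fraction-weighted power mean of Z.
    """
    weights = [m * z / a for m, z, a in elements]
    total = sum(weights)
    mean_power = sum(w * z ** exponent
                     for w, (m, z, a) in zip(weights, elements)) / total
    return mean_power ** (1 / exponent)


def mass_to_thickness_ratio(mass_a, mass_b, density_a, density_b):
    """Normalised thickness ratio of two slabs with equal length and width."""
    vol_a, vol_b = mass_a / density_a, mass_b / density_b
    total = vol_a + vol_b
    return vol_a / total, vol_b / total


# Organic glass C5H8O2, with standard atomic weights for C, H, and O.
C, H, O = 12.011, 1.008, 15.999
molar_mass = 5 * C + 8 * H + 2 * O
organic_glass = [(5 * C / molar_mass, 6, C),
                 (8 * H / molar_mass, 1, H),
                 (2 * O / molar_mass, 8, O)]
z_gl = effective_atomic_number(organic_glass)

# Mass ratio for Z_n = 7 and densities (g/cm^3) as given in the text.
t_gl, t_fe = mass_to_thickness_ratio(0.997572383, 0.002427617, 1.18, 7.82)
print(round(z_gl, 2), t_gl, t_fe)
```

The thickness conversion depends only on the stated densities, so it reproduces the quoted ratio regardless of which effective atomic number formula is used.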
According to Table 4, a material with effective atomic number Z_n = 7∼25 can be obtained by overlapping the organic glass and the carbon steel with different thicknesses. Nine sampling points are designed for each of the 19 materials, giving 171 sampling points in total. Six groups of data are collected at each sampling point, giving a total of 1026 sampling data points.
For example, the thickness of the organic glass is 32.7 mm, and the thickness of the carbon steel is 0.012 mm. The density of the carbon steel is 7.82 g/cm³, and the density of the organic glass is 1.18 g/cm³. In this way, we can calculate the mass ratio of the organic glass and the carbon steel (m_gl:m_fe = 38.568:0.09384) and substitute m_gl and m_fe into formulas (1) and (2), so that we get Z_eff = 7.00039831652385 ≈ 7 (i.e., row 7-6 in Table 5). Similarly, it can be calculated that each point in Table 5 meets the corresponding effective atomic number. Table 5 lists the experimental data of the designed sampling points, where Z-n is the effective atomic number and the serial number of the sampling point, T_gl is the thickness of the organic glass, T_fe is the thickness of the carbon steel, G_m is the high- and low-energy value of the mixture material, G_gl is the high- and low-energy value of the organic glass, and G_fe is the high- and low-energy value of the carbon steel.
An organic stripping and inorganic stripping table is generated from the data in Table 5. As shown in Figure 4, there are four points: A, B, C, and D. From Table 5, we can get A (G_m, G_gl, G_fe), B (G_m, G_gl, G_fe), C (G_m, G_gl, G_fe), and D (G_m, G_gl, G_fe). Through the bilinear interpolation algorithm, we can calculate the (G_m, G_gl, G_fe) value of any point in the quadrilateral (A, B, C, and D), and hence the (G_m, G_gl, G_fe) value of any point in the coordinate system. Finally, we can realize the organic stripping and inorganic stripping of dual-energy X-ray security images.
In Figure 5(a), a knife is covered by some objects. In the original image, the outline of the knife is not clear, and its color is shown as orange, so it cannot be judged as a dangerous metal. In Figure 5(b), the knife is clearly visible and is shown in a metal color (blue), because the organic stripping algorithm is used. The outline of this knife is clear in the organic stripping image, so it is easily detected by the XMC R-CNN model.
In Figure 6(a), an explosive is hidden under several steel plates. In the original image, it is difficult for the security inspector to determine that there is a dangerous object in this area. In Figure 6(b), the explosive is clearly visible and is shown in an explosive color (orange), because the inorganic stripping algorithm is used. This is a good basis for detecting more kinds of contraband in X-ray baggage images in the future.

Evaluation
Training and testing are performed with Caffe [53], a deep learning tool designed and developed by the Berkeley Vision and Learning Center. The framework of the system is Caffe, the backbone model of the system is VGG-16, and the algorithm model of the system is XMC R-CNN.

Dataset Design.
The experimental samples of guns of different materials are shown in Table 6, and the experimental samples of knives of different types are shown in Table 7.

Training and Testing.
Thirty pieces of baggage of different sizes and materials are filled with different objects; six of them are complex, eighteen are of medium complexity, and six are simple. Each piece of baggage is divided into three layers with the top facing up, and each layer is divided into nine areas. As shown in Figure 7, 1-9, 11-19, and 21-29 are used to represent the nine areas of each layer, so the internal space of the baggage is divided into 27 areas for placing contraband samples.
Each piece of luggage enters the X-ray equipment channel in four directions, as shown in Figure 8: with the top facing up; with the top facing up at an angle of about 45 degrees; with the bottom facing up; and with the bottom facing up at an angle of about 45 degrees. The gun samples in Table 6 and the knife samples in Table 7 are placed in the 27 space areas within each piece of luggage, and the sample data are collected in the four directions shown in Figure 8. The total number of collected samples is (10 + 20) × 30 × 27 × 4 = 97200. As shown in Figure 9, 80000 samples are selected as the training dataset, and 17200 samples are selected as the test dataset.

The classic detection models are first compared on the ILSVRC dataset and the PASCAL VOC dataset, and then two classic models are selected for comparative testing on the X-ray baggage image dataset. By comparing the speed and accuracy of the various classical models in [12][13][14][15][16][17][18], the basic conclusion is that the accuracy of the Faster R-CNN model is higher, while the processing speeds of the SSD model and the YOLO model are faster. Therefore, the Faster R-CNN model and the SSD model are selected for comparative testing on the X-ray baggage image dataset. The experimental environment is built as follows.
The experiments run on a high-performance graphics workstation. The Faster R-CNN model and the SSD model are tested on the X-ray baggage image dataset. The Caffe deep learning framework is used for the Faster R-CNN model, the Torch deep learning framework is used for the SSD model, and both network models use the VGG-16 backbone. IoU (intersection-over-union) is a concept used in target detection; it is the overlap rate between the generated candidate bounding box and the ground truth bounding box. Here, an IoU threshold of 0.5 is used to test the X-ray baggage image dataset. The performance of the proposed method and the prior work is evaluated by comparing the following indicators. True positive (TP%): a positive sample is predicted to be positive; it can be called true accuracy. True negative (TN%): a negative sample is predicted to be negative; it can be called false accuracy. False positive (FP%): a negative sample is predicted to be positive; it can be called the false alarm rate. False negative (FN%): a positive sample is predicted to be negative; it can be called the miss rate. Precision is also used as an indicator.
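The IoU test and the count-based indicators can be sketched as follows. The IoU function and the 0.5 threshold follow the description above; the precision and accuracy functions use the standard textbook definitions, which are assumed here because the text does not give its own formulas. All boxes and counts are made-up illustrative values.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


def precision(tp, fp):
    """Standard definition: share of predicted positives that are correct."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Standard definition: share of all samples predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))      # hypothetical boxes
matched = is_match((0, 0, 10, 10), (5, 0, 15, 10))
prec = precision(tp=90, fp=10)                      # hypothetical counts
acc = accuracy(tp=90, tn=85, fp=10, fn=15)
print(overlap, matched, prec, acc)
```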
For the Faster R-CNN model, the framework of the system is Caffe, and the backbone model of the system is VGG-16. The SSD model is a one-stage object detection algorithm; its framework is Torch, and its backbone model is VGG-16. Table 9 shows the test results on the PASCAL VOC dataset and the X-ray baggage image dataset. The mAP (VOC) and the FPS (VOC) are obtained on the PASCAL VOC dataset, and the mAP (X-ray) and the FPS (X-ray) are obtained on the X-ray baggage image dataset. The accuracy of the Caffe + VGG-16 + Faster R-CNN model is lower on the PASCAL VOC dataset but higher on the X-ray baggage image dataset. The processing speed of the Torch + VGG-16 + SSD512 model is faster on both the X-ray baggage image dataset and the PASCAL VOC dataset. Both the Caffe + VGG-16 + Faster R-CNN model and the Torch + VGG-16 + SSD512 model run faster on the PASCAL VOC dataset than on the X-ray baggage image dataset, because the image size of the PASCAL VOC dataset is 500 × 375, whereas the image size of the X-ray baggage image dataset is 1024 × 700.
The accuracy of the Faster R-CNN model is higher than that of the SSD model, and the processing speed of the Faster R-CNN model also meets the current application need, in which it takes 2 seconds to collect an X-ray image. Therefore, the XMC R-CNN model was designed based on the Faster R-CNN model, combined with the characteristics of the X-ray baggage images, for the automatic detection of typical guns and typical knives in the X-ray baggage images. Table 10 shows the test results of the XMC R-CNN model. The detection rates (TP%) of guns and knives are 96.5% and 95.8%, the miss rates (FN%) of guns and knives are 2.2% and 4.2%, and the accuracies (ACC) of guns and knives are 97.1% and 93.1%. The data show that the X-ray image features of guns are distinctive: all types of guns include barrels, butts, triggers, and other components, and these features are clearly different from the features of common objects in passenger baggage, so the detection rate of guns is relatively high and the miss rate of guns is relatively low. In contrast, the X-ray image features of knives are weak, and the features of different types of knives differ significantly. In addition, these features are similar to the features of objects in passenger baggage, such as baggage handles, baggage locks, umbrellas, and electronic equipment. Therefore, the detection rate of knives is relatively low and the miss rate of knives is relatively high. At present, the maximum processing speed of the XMC R-CNN model is 250 milliseconds per image, which meets the requirement of 2 seconds per collected X-ray image. In Figure 10, the knives in the images are covered by some objects. These knives are not detected in the X-ray baggage image dataset without the material classifier, but they are detected by the XMC R-CNN model with the material classifier.

Conclusion and Future Work
In this work, the XMC R-CNN model is explored for the tasks of classification and detection within X-ray baggage images. The main contribution of the XMC R-CNN model is to solve the problem of contraband detection in overlapped X-ray baggage images by the X-ray material classifier algorithm and the organic stripping and inorganic stripping algorithm; with the deep learning method, it achieves a detection rate and a miss rate that meet the requirements of on-site screening. The detection rate is greater than 95%, and the miss rate is less than 5%. In some applications, it has exceeded the level of security inspectors. The automatic contraband detection technology based on the XMC R-CNN model is applied to X-ray baggage security images. According to user needs, safe X-ray baggage images can be automatically filtered in some specific fields, which reduces the number of X-ray baggage images that security inspectors need to screen. The efficiency of security inspection is improved, and the labor intensity of security inspection is reduced. In addition, the security inspector can screen the X-ray baggage image according to the automatically detected boxes, which can improve the effect of security inspection.
Future work will explore threat image projection (TIP) in order to enlarge the training dataset, which can improve the accuracy of the automatic detection algorithm. Future work will also consider extending the XMC R-CNN model to safe objects in order to reduce the false alarm rate. These safe objects have marked features and are common in passenger baggage, such as baggage handles, baggage locks, umbrellas, watches, power banks, and electronic equipment. If the detection rate of safe objects is improved, the false alarm rate for contraband can be reduced.
Data Availability
The datasets used in this work were provided by FISCAN Systems. Because the dataset involves security, it cannot be shared.

Conflicts of Interest
The authors declare that they have no conflicts of interest.