Research on the Basketball Goal Recognition Method Based on Improved MobileNet

Moving target detection is involved in many engineering projects, but it is difficult because the target's speed varies strongly over time and its path is uncertain. Goal recognition is the key technology of automatic basketball goal testing, and accurate and timely judgment of basketball goals has important practical value. Therefore, a basketball goal recognition method based on an improved lightweight deep learning network model (L-MobileNet) is proposed. First, the basket is detected by the Hough circle transform algorithm. Then, to further improve the detection speed of basketball goals, an improved lightweight network (L-MobileNet) is proposed on the basis of the lightweight network MobileNet. For the depthwise separable convolution, channel compression and group convolution reduce the parameters and computational complexity of the module. Because group convolution hinders the information exchange between feature channels, an improved channel shuffling method, IShuffle, is introduced. Then, a residual structure is added to improve the generalization ability of the network, yielding the RLDWS module. Finally, the more lightweight network L-MobileNet is constructed from the RLDWS module. The experimental results show that the proposed method can effectively judge basketball goals and that the judgment accuracy is improved by 8.35%. At the same time, the number of parameters and the amount of computation are only 29.7% and 53.2% of those of the original MobileNet, and the method also has certain advantages over other lightweight networks.


Introduction
The NBA (National Basketball Association) and CBA (Chinese Basketball Association) are popular basketball leagues. An examination of the NBA and CBA shows that they combine manual work and intelligent devices to realize timing and scoring. There is a camera on the backboard in the NBA, which automatically takes photos whenever players are under the basket or making a layup. When the basketball falls through the basket, it touches the net, which triggers the sensor. The controller receives the goal information, and the referee updates the score and the time. Timing and scoring in NBA and CBA games are thus dominated by manpower and supplemented by equipment.
At present, the physical education professional examination includes special items and supplementary items.
Special items and supplementary items must be completed in a short time, and there are many examination items. Therefore, the task for invigilators is very heavy. However, the fairness and justice of the college entrance examination must be upheld by every examinee and invigilator. At present, many places still score basketball exams manually, and the excessive workload can lead to unfair results.
This method is now slowly being replaced by smart devices. Nowadays, infrared detection [1][2][3][4][5] is used in many exams to judge basketball goals. Although the two intelligent detection methods, infrared detection and microswitches, have effectively solved some problems of traditional manual scoring, they also have shortcomings, such as being easily damaged and costly to maintain. Therefore, finding a better alternative is an urgent research task.
With the rapid development of image processing technology, the detection of moving objects in video has become more widely used [6][7][8][9][10]. At the same time, more and more requirements for video processing applications have been put forward, for example, the fixed-point shooting test in the college entrance examination for physical education. Moving object detection is a particularly important branch of vision technology. Because of its strong scientific research value, researchers have devoted a great deal of effort to it and have achieved good results.

Target Detection Analysis Based on the Deep Learning Network
At present, the target detection methods based on deep learning are mainly divided into two categories [11][12][13][14]: the two-stage detection framework and the single-stage detection framework. The two-stage detection framework divides the detection task into two stages. First, candidate regions are generated, and then the regions are regressed and classified by a deep network model. This class of detection algorithms has high detection accuracy, but the detection speed is slow. Girshick et al. [15] applied RCNN to target detection and raised the mean average precision (mAP) to 53.3% on the Pascal VOC 2012 dataset [16], which is far superior to traditional target detection algorithms. However, RCNN also has problems such as tedious training steps, slow detection speed, and the need for fixed-size input images. He et al. [17] proposed the SPPNet algorithm to remove the requirement that a fixed-size image be input for the CNN to extract features. The innovation of this algorithm is that a Spatial Pyramid Pooling (SPP) layer is added between the convolution layers and the fully connected layers. The single-stage detection framework classifies and regresses the targets in the image at the same time, without generating candidate regions.
This class of detection algorithms is fast, but its detection accuracy is usually not as good as that of the two-stage detection algorithms. Redmon et al. [18] proposed the classic YOLO algorithm. In 2016, Liu et al. [19] proposed the SSD algorithm, which combines the fast speed of YOLO with the accurate positioning of RPN [20] so as to detect targets at different scales. Compared with YOLO, the SSD algorithm can predict more candidate regions and detects better, but it is slower than YOLO. The lightweight network model MobileNet [21] proposed by Google targets devices with limited resources, such as mobile or embedded devices, and aims to maximize classification accuracy on them. The main innovation of this network lies in the Depth-Wise Convolution (DWC) module, which reduces the parameters and computation, and in the effective compromise between classification accuracy and speed obtained with two hyperparameters, the width multiplier and the resolution multiplier.
Therefore, to better solve the problems of easy damage, high cost, and misjudgment in traditional fixed-point shooting devices, this paper proposes a basketball goal recognition method based on image analysis and uses deep learning technology to solve the abovementioned problems. First, the problems of the existing lightweight network model MobileNet are analyzed theoretically, improvement strategies are put forward to solve these problems, and the RLDWS module is constructed step by step. Then, the improved L-MobileNet model is constructed by using this module in place of the original depthwise separable convolution. Finally, a comparative experiment with other lightweight networks is carried out to further verify the rationality and effectiveness of the proposed improved network.

Detection of Basket.
To detect the basket, the original color image is converted into a grayscale image, the grayscale image is subjected to median filtering and mathematical morphology processing, and then the Hough circle transform algorithm [22] is used to extract the basket. Finally, the extracted basket circle is overlaid on the original color image, and the position and size of the basket are marked. The essence of the Hough circle transform is to map points from the image coordinate plane into a parameter space in which the result is easier to identify and detect. The general equation of a circle is as follows:
$$(x - a)^2 + (y - b)^2 = r^2, \tag{1}$$
where $(a, b)$ is the center of the circle and $r$ is the radius of the circle. When a circle on the X-Y plane in the image space is transformed into the a-b-r parameter space, each point $(x, y)$ on the circle corresponds to a three-dimensional cone in the parameter space. The Hough transform principle is shown in Figure 1. The parameters of the circle can be obtained from the detected points so as to determine the circle. The parameter image of the circle is shown in Figure 2.
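The voting idea behind the Hough circle transform can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the pipeline used in this paper: the function names, the synthetic edge points, and the candidate radius range are all hypothetical, and the grayscale conversion, median filtering, and morphology steps are omitted.

```python
import math
from collections import Counter

def hough_circle_votes(edge_points, radii, n_angles=360):
    """Accumulate votes in (a, b, r) parameter space.

    For a candidate radius r, every edge point (x, y) votes once for each
    integer centre (a, b) that lies on the circle of radius r around it.
    """
    votes = Counter()
    for x, y in edge_points:
        for r in radii:
            centres = set()
            for k in range(n_angles):
                theta = 2.0 * math.pi * k / n_angles
                centres.add((round(x - r * math.cos(theta)),
                             round(y - r * math.sin(theta))))
            for a, b in centres:
                votes[(a, b, r)] += 1
    return votes

def synthetic_circle(a, b, r, n_points=90):
    """Integer pixel coordinates sampled along a circle (stand-in for edge pixels)."""
    return sorted({(round(a + r * math.cos(2.0 * math.pi * k / n_points)),
                    round(b + r * math.sin(2.0 * math.pi * k / n_points)))
                   for k in range(n_points)})

# Hypothetical test image: one circle centred at (40, 30) with radius 12.
edge_points = synthetic_circle(a=40, b=30, r=12)
votes = hough_circle_votes(edge_points, radii=range(8, 17))
(best_a, best_b, best_r), best_count = votes.most_common(1)[0]
print("estimated centre and radius:", best_a, best_b, best_r)
```

The accumulator cell with the most votes is taken as the circle estimate. A practical implementation would restrict the radius range to the expected basket size to keep the three-dimensional accumulator small.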

Problem Analysis of the MobileNet Model.
MobileNet has the following three problems [23][24][25]: (1) The 1 × 1 convolution has a large amount of computation: by deriving the computation and parameters of the network structure layer type by layer type, it is found that they are mainly concentrated in the 1 × 1 pointwise convolution operation, which accounts for about 95% of the computation and 75% of the parameters of the whole network, as shown in Table 1.
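In the depthwise separable convolution operation, the calculation amount $N_{DW}$ of the depthwise convolution and the calculation amount $N_{PW}$ of the pointwise convolution are shown in equations (2) and (3), respectively:
$$N_{DW} = D_K \times D_K \times M \times H \times W, \tag{2}$$
$$N_{PW} = M \times N \times H \times W, \tag{3}$$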

Scientific Programming
where $D_K$ is the size of the convolution kernel, $H \times W$ is the size of the input feature map, $M$ is the number of input channels, and $N$ is the number of output channels. It can be seen that the calculation amount of the pointwise convolution is positively correlated with $N$, and the value of $N$ gradually increases as the network gets deeper, so the proportion of computation spent in the pointwise convolution also increases. This problem is addressed later mainly by improving the depthwise separable convolution structure.
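As a worked illustration of why the pointwise share grows with $N$, assume the standard MobileNet cost expressions $N_{DW} = D_K^2 M H W$ and $N_{PW} = M N H W$ with stride 1, so that both operate on the same $H \times W$ map. The channel counts below are illustrative values, not figures taken from Table 1.

$$\frac{N_{PW}}{N_{DW} + N_{PW}} = \frac{M N H W}{D_K^2 M H W + M N H W} = \frac{N}{D_K^2 + N}$$

$$D_K = 3:\qquad N = 64 \Rightarrow \frac{64}{73} \approx 87.7\%,\qquad N = 512 \Rightarrow \frac{512}{521} \approx 98.3\%,\qquad N = 1024 \Rightarrow \frac{1024}{1033} \approx 99.1\%$$

Under these assumptions the share depends only on $D_K$ and $N$, which is consistent with the pointwise convolution dominating the deeper, wider layers.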
(2) Low-dimensional data collapse caused by ReLU: when low-dimensional (n-dimensional) data are mapped to a higher dimension (m dimensions) by a random matrix, passed through ReLU, and then mapped back to the original dimension by the generalized inverse matrix, some information is lost, and the smaller m is, the more information is lost. To solve this problem, the subsequent module design uses the Mish activation function [26,27] instead of ReLU after feature maps with few channels; otherwise, the information would be destroyed.
(3) No feature reuse: MobileNet has a very simple straight, cylinder-like structure. In the training process of the network model, if the weight of a convolution node becomes 0, the output of the node will be 0 for any input. Because the gradient of ReLU at a value of 0 is 0, the value of this node will not recover no matter how many iterations are run. A residual module is added later to address this.

Improved MobileNet Model.
In response to problems 1 and 2, and in order to make the most of group convolution, we propose an improved channel shuffle (IShuffle), as shown in Figure 3. As in ordinary channel shuffling, the features of the different groups are still recombined uniformly, but one group of the recombined features is different: it is obtained by fusing the recombined features across the groups. Specifically, it is assumed that the number of input features m is 9 and the number of groups is 3. The first six features are the same as those in channel shuffling. The remaining three features are obtained by intergroup feature fusion of the nine feature channels. The six shuffled feature channels are then concatenated with these three fused features along the channel dimension to obtain the final output features. As can be seen from Figure 3, after the uniform recombination of features, IShuffle fuses features between groups to improve the information exchange between groups.
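The m = 9, three-group example above can be traced with a small sketch. The first function is the standard channel shuffle; the second is only a hypothetical reading of IShuffle, since the exact fusion operator is defined by Figure 3 and is not reproduced here. The function names are illustrative, and each fused channel is represented by a tuple naming the channels it would combine.

```python
def channel_shuffle(channels, groups):
    """Standard channel shuffle: view the channels as a (groups, n) grid,
    transpose it, and flatten, so neighbouring outputs come from different groups."""
    n = len(channels) // groups
    if n * groups != len(channels):
        raise ValueError("channel count must be divisible by the number of groups")
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

def ishuffle_sketch(channels, groups):
    """Hypothetical IShuffle-like variant (assumption, not the paper's exact operator):
    keep the first m - n shuffled channels and append n cross-group fusions."""
    n = len(channels) // groups
    shuffled = channel_shuffle(channels, groups)
    kept = shuffled[: len(channels) - n]
    # Each fused channel combines the i-th channel of every group.
    fused = [tuple(channels[g * n + i] for g in range(groups)) for i in range(n)]
    return kept + fused

channels = list(range(9))          # three groups: [0, 1, 2], [3, 4, 5], [6, 7, 8]
print(channel_shuffle(channels, groups=3))
print(ishuffle_sketch(channels, groups=3))
```

In a real network the tuples would be replaced by an actual fusion of feature maps, for example an element-wise sum, which is again an assumption here.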
In addition, because ReLU is prone to data collapse in low dimensions, we change the ReLU nonlinearity to the Mish activation function after the 1 × 1 convolution that compresses the channel dimension. The expression and graph of Mish are shown in Figure 4.
Compared with ReLU, the Mish activation function is smoother. For negative inputs it still allows a small gradient to flow, which lets information penetrate the network better and thus improves the accuracy and generalization ability of the network, while the function remains unbounded above.
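The difference for negative inputs can be checked numerically. The sketch below uses the standard definition Mish(x) = x · tanh(ln(1 + e^x)); the helper names and the sample inputs are illustrative, and the gradient is estimated by a central difference rather than derived analytically.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

def relu(x):
    return max(x, 0.0)

def central_difference(f, x, h=1e-5):
    """Numerical estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -1.0, -0.1, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}  "
          f"d_relu={central_difference(relu, x):.4f}  "
          f"d_mish={central_difference(mish, x):+.4f}")
```

For negative inputs the ReLU gradient is exactly zero, whereas Mish keeps a small nonzero gradient, which is the property the text relies on for low-dimensional feature maps.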
Because group convolution reduces the information flow between channels, the modified IShuffle strategy is subsequently added. Finally, the outputs from the two parts are concatenated to obtain the final output features. As shown in Figure 5, the module is named the "LDWS module."
The LDWS module is mainly composed of a compression layer and an expansion layer. The compression factor $t$ represents the dimension reduction ratio of the compression layer and is calculated as
$$t = \frac{s_1}{m},$$
where $m$ represents the number of input channels of the LDWS module and $s_1$ represents the number of convolution kernels of the compression layer. In the expression for the calculation amount of the LDWS module, $e_1$ represents the number of convolution kernels of the 1 × 1 part of the expansion layer and $e_2$ represents the number of convolution kernels of the depthwise separable convolution part of the expansion layer. In the comparison between the calculation amount of the LDWS module and that of depthwise separable convolution, $g_1$ represents the number of groups in the pointwise convolution. It can be seen that reducing the compression factor $t$ and increasing the number of groups in the group convolution greatly reduce the number of parameters and the amount of computation. In the experiment, the value of $t$ is 0.125.
In view of problem 3, the RLDWS module is designed by introducing a residual connection into the LDWS module, as shown in Figure 6, in which the blue and red parts are LDWS1 modules and the purple part is an improved residual structure. Before ResNet appeared, a deeper network was often built by simply stacking layers in order to improve the recognition accuracy of the neural network model.
However, because of the back propagation of the gradient, a deeper network may leave the parameters of the shallow layers unable to be updated as the gradient vanishes, which leads to saturation or even degradation of the network performance. Therefore, a residual connection structure was proposed in ResNet.
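The effect of the compression factor and of the number of groups discussed for the LDWS module can be illustrated with a simple parameter count. This is a rough sketch under stated assumptions: it takes $s_1 = t \cdot m$, counts only the weights of a grouped 1 × 1 compression layer (bias ignored), and uses a hypothetical input width of 512 channels. It does not reproduce the full LDWS cost expression.

```python
def compressed_channels(m, t):
    """Number of compression-layer kernels s_1 for m input channels, s_1 = t * m."""
    return max(1, round(t * m))

def pointwise_params(c_in, c_out, groups=1):
    """Weight count of a 1x1 convolution split into `groups` groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    return (c_in // groups) * (c_out // groups) * groups

m = 512                                   # hypothetical number of input channels
s1 = compressed_channels(m, t=0.125)      # t = 0.125 as in the experiment
print("s_1 =", s1)
for g in (1, 2, 4, 8, 16):
    print(f"g={g:2d}  compression-layer weights={pointwise_params(m, s1, g)}")
```

Each doubling of the group count halves the weight count of the layer, which is why the number of groups has to be balanced against the loss of information exchange between channels.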
By using the RLDWS module and the LDWS module to replace some DWS modules in MobileNet and by increasing the number of RLDWS modules, the final structure of L-MobileNet is constructed, as shown in Table 2, where s is the stride of the depthwise convolution, c is the number of output channels, and k is the number of final categories.
L-MobileNet is mainly composed of one standard convolution, five DWS modules, and nine RLDWS modules.
Like MobileNet, in L-MobileNet, the standard convolution operation of 3 × 3 is the first step, followed by five DWS modules.
Then, the remaining eight DWS modules are replaced with the improved RLDWS module proposed in this paper, and an additional RLDWS module is added after the last RLDWS module. Finally, the output of the network is obtained through 7 × 7 average pooling, a fully connected layer, and multiclass classification with softmax.

System Construction.
The basketball goal recognition system consists of a hardware environment and a software environment.
The hardware environment provides the video image data and the running environment for the system and mainly includes cameras and PCs. The software mainly processes the video image data. The camera model is M30A, and the frame rate is 60 frames per second at a resolution of 640 × 480. The image data format is YUV422. The data transmission protocol is USB 2.0. The CPU of the PC is an i5-7200U @ 2.50 GHz, the memory is 8 GB, and the graphics card is a GTX 2060s. The operating system is Windows 10 (64-bit), and the software is MATLAB 2019b.
FLOPs are used to measure the complexity of the model, specifically the number of multiply-add operations. The smaller the value of this index, the less computation the model requires and, in general, the faster it runs.
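Counting multiply-adds for a single layer can be sketched as follows. The formulas are the usual ones for a standard convolution and for a depthwise separable convolution with stride 1 and unchanged spatial size; the layer shape is a hypothetical example, not a row of Table 2, and the function names are illustrative.

```python
def standard_conv_madds(k, m, n, h, w):
    """Multiply-adds of a k x k standard convolution, m -> n channels, h x w output."""
    return k * k * m * n * h * w

def depthwise_separable_madds(k, m, n, h, w):
    """Multiply-adds of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * m * h * w
    pointwise = m * n * h * w
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 256 -> 256 channels, 14 x 14 feature map.
k, m, n, h, w = 3, 256, 256, 14, 14
standard = standard_conv_madds(k, m, n, h, w)
separable = depthwise_separable_madds(k, m, n, h, w)
ratio = separable / standard
print(f"standard: {standard}  separable: {separable}  ratio: {ratio:.4f}")
```

Under these assumptions the ratio equals 1/n + 1/k², so for 3 × 3 kernels the separable form needs roughly one ninth of the multiply-adds of the standard convolution.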

Selection of Grouping Quantity.
FLOPs can be greatly reduced by using the group convolution operation. However, a reduction in FLOPs does not necessarily mean that the speed increases. Using too many groups increases the memory access cost, which also reduces the speed. Therefore, the number of groups has to be chosen by weighing these effects experimentally. Table 3 compares the effects of different group convolution settings on L-MobileNet, where g represents the number of groups.
As the number of groups increases from 1 to 16, the number of parameters and the amount of computation of the network decrease, while the judgment accuracy first increases and then decreases. Therefore, a tradeoff has to be made between judgment accuracy and speed, and the number of groups is set to 4 for the subsequent experiments. Table 4 shows the influence of adding the residual structure to the LDWS module on the judgment accuracy and speed of the network. When the residual structure is added, the error of the network is reduced by 6.84%.

Analysis of the Experimental Process.
The basket test results are shown in Figure 7. In addition, through the comprehensive analysis of three simulated test videos collected from the left, middle, and right shots, the basketball was detected from the video sequence frame images. The test sequence of shooting the video is shown in Figure 8.
After testing 30 groups of data (10 groups each for the left, middle, and right positions), the test results are as shown in Tables 5-7, respectively.
In this experiment, the performance of L-MobileNet is compared with the commonly used lightweight models MobileNet, MobileNet V2 [28], SqueezeNet [29], and ShuffleNet [30]. Table 8 shows the performance comparison of the network models. The L-MobileNet model compares favorably with the other network models on nearly every index. Compared with MobileNet, the judgment accuracy of L-MobileNet is improved by 8.35%, and the number of parameters and the amount of computation are only 29.7% and 53.2% of the original ones. SqueezeNet greatly reduces the number of parameters through parameter compression, but its amount of computation is still very large. L-MobileNet is superior to SqueezeNet except that it has slightly more parameters. Compared with ShuffleNet, L-MobileNet has almost the same judgment accuracy and number of parameters, but its amount of computation is almost half that of ShuffleNet. The change curve of the judgment accuracy of each network model during training is shown in Figure 9.

Conclusions
In this paper, an improved lightweight neural network model (L-MobileNet) based on MobileNet is proposed and applied to basketball goal recognition. Through the network construction experiments, an improved network that offers a compromise between accuracy and speed is selected. Compared with MobileNet, the judgment accuracy of the improved network is increased by 8.35%, while the number of parameters and the amount of computation are only 29.7% and 53.2% of the original ones. Finally, the improved network is compared with some commonly used lightweight network models, and it is concluded that the improved network performs well in both the accuracy and the speed of goal judgment. In future work, the improved RLDWS module will be combined with VGGNet so as to further reduce the complexity of the model. At the same time, neural architecture search will be explored as a data-driven, intelligent way to automatically build a better network.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares no conflicts of interest regarding the present study.