Sensor Fusion Basketball Shooting Posture Recognition System Based on CNN

In recent years, with the development of wearable sensor devices, research on sports monitoring using inertial measurement units has received increasing attention; however, a specific system for identifying various basketball shooting postures does not exist thus far. In this study, we designed a sensor fusion basketball shooting posture recognition system based on convolutional neural networks. The system, using the sensor fusion framework, collected the basketball shooting posture data of the players’ main force hand and main force foot for sensor fusion and used a deep learning model based on convolutional neural networks for recognition. We collected 12,177 sensor fusion basketball shooting posture data entries of 13 Chinese adult male subjects aged 18–40 years and with at least 2 years of basketball experience without professional training. We then trained and tested the shooting posture data using the classic visual geometry group network 16 deep learning model. The intratest achieved a 98.6% average recall rate, 98.6% average precision rate, and 98.6% accuracy rate. The intertest achieved an average recall rate of 89.8%, an average precision rate of 91.1%, and an accuracy rate of 89.9%.


Introduction
Basketball is one of the most popular sports with a large fan base worldwide. As a competitive sport, basketball requires two teams of players to use various technical actions to compete with each other. A basketball game includes various technical statistics, such as score, rebound, assist, block, and steal, among which score is the central aspect that decides which team is the winner and loser [1][2][3]. The score in a basketball match is accomplished through players shooting the ball into the basket, and, thus, shooting plays an important tactical role in this game.
In recent years, with the rapid development of wearable sensor technology and the increased demand for basketball worldwide, many researchers have used wearable devices integrated with an inertial measurement unit (IMU) to study basketball shooting [4][5][6]. Bai et al. [7] used Microsoft Band and weSport systems to collect data from two basketball players on both attack and defense, and they used a support vector machine (SVM) to effectively distinguish shooting and defense in basketball games. Aacikmese et al. [8], using an IMU placed on the arm, classified six technical movements (forward-backward dribbling, left-right dribbling, regular dribbling, two-handed dribbling, shooting, and lay-up) in basketball with an SVM. Zhao et al. [9] used four IMUs placed on the left and right upper arms and forearms to collect basketball technical movement data, using an SVM to identify dribbling, passing, catching, and shooting. These studies, which use sensors on the arms, address basic shooting postures but ignore composite shooting postures. Composite shooting postures are shooting postures that consist of a series of hand and foot movements [10]. Arm sensor data alone are therefore not sufficient for studying composite shooting postures.
Shooting is a technical movement that requires physical coordination. In the shooting process, the movement of the feet is as important as the movement of the arms [1,10]. Shi et al. [11] used smart insoles integrated with IMUs to distinguish between dribbling, jumping, and turning around during basketball. Peng et al. [12] also used smart insoles to study the sideslip, back, cross, jab, and jump steps in basketball. These studies, which concern footstep movement in basketball using smart insoles integrated with IMUs, provide a basis for us to carry out research on composite shooting postures.
After more than 100 years of development, basketball has developed many complex and delicate technical movements, such as shooting. Recognizing these technical movements requires powerful recognition tools. SVM is a simple and robust algorithm and is widely used in basketball technical movement recognition [7][8][9]. However, SVM requires feature extraction and cannot be applied to large-scale training samples. Convolutional neural networks (CNNs) have solved these problems. A convolutional neural network is one of the representative algorithms of deep learning. Research on CNNs started in the 1980s and 1990s. In the twenty-first century, with the introduction of deep learning and the development of computer hardware capable of supporting deep learning, there has been a rapid development of CNNs, and these CNNs have worked well in the fields of computer vision and natural language processing [13][14][15]. With the development of CNNs, many researchers have studied sports using multiple IMUs combined with a CNN model of deep learning. Lee et al. [16] used the CNN long short-term memory model of deep learning to classify six squat positions (one correct and five incorrect); Kautz et al. [17] used a deep CNN to classify 10 types of beach volleyball technical movements. The effectiveness of these studies illustrates the great potential of deep learning models based on CNNs in the field of sports technical movement recognition.
Although there are many sports monitoring systems based on IMUs [6,18,19], there is still no system based on deep learning models to recognize a variety of basketball shooting postures. This study proposes a sensor fusion basketball shooting posture recognition system based on a CNN to recognize multiple types of basketball shooting postures. The main features of this study are as follows: (1) A sensor fusion framework dedicated to basketball shooting postures is designed to collect shooting posture data and perform sensor data fusion. The remainder of this paper is organized as follows. Section 2 briefly summarizes the system framework and methods of data collection, fusion, and classification. Section 3 presents the experiments and results. Section 4 discusses the results of this study. Finally, conclusions are presented in Section 5.

System
Hardware and Software Design. The sensor fusion basketball shooting posture recognition system consists of two independent wireless sensor modules, a USB dongle, and a laptop computer, as shown in Figure 1. Each wireless sensor module is composed of an IMU (MPU-9250, including an accelerometer and a gyroscope) and a microcontroller unit (MCU, Nordic nRF52832, including Bluetooth functionality). The module is powered by an external 500 mA 3.3 V lithium battery. The IMU is responsible for collecting the players' raw shooting posture data, including accelerometer data and gyroscope data. The sampling rate was 100 Hz, and the collected data were transmitted to the MCU through I2C. The MCU is the core component of the wireless sensor module and is responsible for transmitting the raw shooting posture data from the IMU to the USB dongle via Bluetooth. The USB dongle includes Bluetooth and USB human interface device (HID) functions; it is responsible for transmitting the raw shooting posture data received from the wireless sensor modules to the laptop through USB HID. The laptop runs data-processing software developed in MATLAB, which is responsible for receiving, displaying, and fusing the raw shooting posture data to form the sensor fusion basketball shooting posture datasets.

Sensor Fusion Framework.
Consider right-handed players as an example. When a right-handed basketball player shoots, the right hand is the main force hand and the left foot is the main force foot. The main force hand and main force foot perform the main tasks in basketball shooting. They make the shooting postures stable and best reflect the characteristics of the shooting postures [10]. Therefore, the posture data of the right hand and left foot of right-handed players are the key data. Correspondingly, the key data of left-handed players are generated from the left hand and right foot. Modern basketball has several types of shooting postures. For basic shooting postures, the main force hand sensor data reflect the characteristics of the shooting postures. For composite shooting postures, such as stop jump shots and gather step shots, the main force hand sensor data cannot fully reflect the characteristics of the postures. However, the main force hand sensor data fused with the main force foot sensor data can fully reflect the characteristics of composite shooting postures. Therefore, this study proposes a sensor fusion framework for basketball shooting postures. This framework collects accelerometer and gyroscope data using wireless sensor modules placed on the main force hand and the main force foot of the player. Then, the data are fused to form the input of the deep learning model for shooting posture classification. The sensor fusion framework proposed in this study can classify a variety of complex shooting postures without increasing the number of sensors.
As shown in Figure 2, the sensor fusion framework proposed in this study consists of three steps: (1) shooting posture data collection, (2) data alignment and mergence, and (3) data segmentation and exclusion. First, we synchronized the two independent wireless sensor modules and placed them on the player's main force hand and main force foot.

Journal of Sensors
Then, we collected the shooting posture data, which contain timestamps, and transmitted them to the laptop computer, where they were stored in two separate files. Because our sampling rate is 100 Hz, the timestamp is in units of 10 ms. Second, the data in the two data files are aligned according to the timestamps and merged into a sensor fusion data file, as shown in Algorithm 1. Here, file_H, file_F, and file_M represent the main force hand sensor data file, main force foot sensor data file, and sensor fusion data file, respectively. Owing to data loss in the wireless sensor modules and other reasons, the shooting posture data of the main force hand and main force foot did not always match, and hence, the sensor fusion data frequency suffered a loss of 1.17%. However, the frequency reduction did not affect the recognition of shooting postures. Finally, we divided the sensor fusion data file into independent shooting posture data entries, removed the erroneous posture data, and stored them in the sensor fusion basketball shooting posture dataset, as shown in Algorithm 2, where matrix(i) represents the ith shooting posture data matrix. We marked the data generated due to sensor misplacement or incorrect shooting posture in the experimental stage and deleted them in this stage. Thereafter, the sensor fusion basketball shooting posture dataset was formed.
Because the VGG16 deep learning model showed excellent performance in image classification, and the VGG16 model with one-dimensional convolution kernels has been used to classify the one-dimensional data obtained from accelerometers and gyroscopes [26], in this study we used the VGG16 deep learning model with one-dimensional convolution kernels to classify sensor fusion basketball shooting postures.
The structure of the VGG16 deep learning model mainly includes convolutional layers, max pooling layers, and fully connected layers, as shown in Figure 3. The function of a convolutional layer, which consists of several convolutional units, is to extract different features of the input data. Adding more convolutional layers allows more complex features to be extracted. The working mode of the convolutional layer can be expressed by Equations (1), (2), and (3), where S represents the vector of sample data, C represents the vector of the convolution kernel, m is the number of sample data, n is the number of input features, l is the length of the convolution kernel, and k is the number of convolution kernels. The result of the convolution operation of the ith convolution kernel is shown in Equation (4).
where i = 1, 2, 3, ⋯, k. Because we use padding, the width of the vector after the convolution operation is n.
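The padded convolution described above can be illustrated with a minimal sketch. It is not the paper's Equations (1) to (4); it assumes a single one-dimensional sample of width n, zero padding, a stride of 1, and the sliding dot product (cross-correlation) that deep learning frameworks commonly call convolution. The function name `conv1d_same` and the toy values are hypothetical.

```python
from typing import List, Sequence


def conv1d_same(sample: Sequence[float],
                kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply each kernel to the sample with zero padding so that every
    output keeps the input width n (assumed stride of 1)."""
    n = len(sample)
    outputs = []
    for kernel in kernels:          # one output row per kernel, i = 1..k
        l = len(kernel)             # kernel length
        left = (l - 1) // 2         # zeros added before the sample
        right = l - 1 - left        # zeros added after the sample
        padded = [0.0] * left + list(sample) + [0.0] * right
        outputs.append([
            sum(padded[j + t] * kernel[t] for t in range(l))
            for j in range(n)
        ])
    return outputs


# Toy input: n = 5 features, k = 2 kernels of length l = 3.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
kernels = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]
feature_maps = conv1d_same(sample, kernels)
```

Each of the k rows of `feature_maps` has the same width as the input, which is the effect of padding noted in the text.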
The max pooling layer is mainly used for reducing feature dimensionality, compressing the amount of data and the number of parameters, reducing overfitting, and improving the fault tolerance of the model. The working mode of the max pooling layer can be expressed by Equations (5) and (6), where a is the stride.
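As a rough illustration of how max pooling reduces the feature dimension, the sketch below slides a window over a one-dimensional feature vector and keeps the maximum of each window. The window size `pool_size` is an assumption introduced here (the text names only the stride a), and the function name is hypothetical.

```python
from typing import List, Sequence


def max_pool1d(values: Sequence[float], pool_size: int, stride: int) -> List[float]:
    """Keep the maximum of each window; windows start every `stride` steps
    and incomplete windows at the end are dropped."""
    return [
        max(values[start:start + pool_size])
        for start in range(0, len(values) - pool_size + 1, stride)
    ]


features = [1, 5, 2, 8, 3, 7]
pooled = max_pool1d(features, pool_size=2, stride=2)        # halves the width
overlapping = max_pool1d(features, pool_size=2, stride=1)   # overlapping windows
```

With a stride equal to the window size, the output is half as wide as the input, which is how pooling compresses the data passed to later layers.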
The fully connected layer mainly plays the role of classification, which is used to integrate and map the distributed feature representation extracted by the convolutional layer and the max pooling layer to the sample label space. The output of the fully connected layer is the final classification result.
The weight initialization of the convolutional layers and fully connected layers uses the Kaiming method [27], which can accelerate the convergence of the model. The Z-score method [28], which converts data measured on different scales into a unified Z-score scale for comparison, is adopted for data standardization. The model uses the Adam optimizer [29], which has the advantages of simple implementation, high computational efficiency, and low memory requirements, and it is suitable for scenarios with large-scale data and parameters. Adam is a widely used extension of stochastic gradient descent (SGD). The minibatch gradient descent algorithm adopted for the model combines the speed of the SGD algorithm with the stability of the batch algorithm, which makes it suitable for deep learning models that need to process large amounts of data [25]; in the proposed model, the batch size is set to 200. The specific parameter settings are listed in Table 1.

Algorithm 1
    Open file_H, file_F, and file_M
    Send the first record of file_H to record_H and the first record of file_F to record_F
    while (file_H NOT end) AND (file_F NOT end) do
        if record_H.timestamp == record_F.timestamp then
            Merge record_H and record_F into file_M
            Send the next record of file_H to record_H and the next record of file_F to record_F
        else if record_H.timestamp > record_F.timestamp then
            Delete record_F and send the next record of file_F to record_F
        else if record_H.timestamp < record_F.timestamp then
            Delete record_H and send the next record of file_H to record_H
        end if
    end while
    Close file_H, file_F, and file_M
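The timestamp alignment and merging of Algorithm 1 can be sketched in Python as a two-pointer walk over the two record streams. This is an illustration rather than the authors' MATLAB implementation: records are assumed to be in-memory `(timestamp, values)` tuples sorted by timestamp instead of file records, and the function name and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

# (timestamp in 10 ms ticks, sensor channel values); the layout is assumed.
Record = Tuple[int, List[float]]


def merge_by_timestamp(hand: Sequence[Record], foot: Sequence[Record]) -> List[Record]:
    """Keep only timestamps present in both streams and concatenate
    the hand and foot channels of each matching pair."""
    merged: List[Record] = []
    i = j = 0
    while i < len(hand) and j < len(foot):
        t_hand, v_hand = hand[i]
        t_foot, v_foot = foot[j]
        if t_hand == t_foot:
            merged.append((t_hand, v_hand + v_foot))  # fused record
            i += 1
            j += 1
        elif t_hand > t_foot:
            j += 1  # foot record has no hand partner: drop it
        else:
            i += 1  # hand record has no foot partner: drop it
    return merged


# Toy streams: the hand module lost tick 3, the foot module lost tick 1.
hand = [(0, [0.1, 0.2]), (1, [0.3, 0.4]), (2, [0.5, 0.6]), (4, [0.7, 0.8]), (5, [0.9, 1.0])]
foot = [(0, [1.1]), (2, [1.2]), (3, [1.3]), (4, [1.4]), (5, [1.5])]
fused = merge_by_timestamp(hand, foot)
fused_timestamps = [t for t, _ in fused]
```

Because unmatched records are discarded on both sides, the fused stream contains fewer records than either input, which corresponds to the small frequency loss reported for the sensor fusion data.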

Experimental Method.
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the IEC for Clinical Research of [...]. A total of 13 Chinese male adults (age 28.5 ± 9.5 years, height 179 ± 14 cm, weight 80.5 ± 21.5 kg) were selected as subjects. Although they had basketball experience (12 ± 9 years), they were not professional players and had no professional training, as reported in Table 2. All subjects were right-handed players. There were two centers (C), two power forwards (PF), three small forwards (SF), three shooting guards (SG), and three point guards (PG). Among them, five subjects had been trained as part of a college team. All subjects gave their informed consent for inclusion before they participated in the study. The subjects were verbally informed of the experiment process and precautions to be taken before the start of the experiment.
We chose 10 types of basketball shooting postures [30][31][32][33] for the experiment, as summarized in Table 3. These postures included five basic shooting postures: hook shot, free throw, inside shot, lay-up, and jump shot. In addition, we chose five types of composite shooting postures: gather step shots, stop jump shots, pump fakes, jettison throws, and spin jumpers. The stop jump shot, pump fake, and spin jumper are frequently used in basketball. The gather step shot and jettison throw are newer additions to the sport that have become increasingly popular in recent years.
The experiment was conducted on the basketball court of the Nanjing University of Information Science and Technology. The subjects repeated each of the 10 types of basketball shooting postures listed in Table 3 50-150 times. Each shooting posture was divided into 1-4 groups according to the physical strength of the subjects, with 25-150 shooting posture cycles in each group. At the beginning of each shooting posture cycle, the subjects held the ball without moving for 3 s and then performed the corresponding shooting posture. When the shooting posture was finished, they did not move until after 3 s had passed. Immediately after the shooting posture was completed, the staff picked up the ball and passed it to the subject after the subject moved. If the testing of the group was not complete, the next shooting posture cycle was started after the subject received the ball. If the testing of the group was complete, the data-processing software stored the raw shooting posture data, which contained 25-150 shooting posture cycles for sensor data fusion. The shooting posture test process is presented in Figure 4.
Finally, 10 types of sensor fusion basketball shooting posture datasets of 13 subjects, a total of 12,210 shooting posture data entries, including 12,177 valid data entries, were formed. The datasets included 1,210 gather step shots, 1,228 hook shots, 1,209 free throws, 1,223 stop jump shots, 1,221 pump fakes, 1,225 inside shots, 1,218 jettison throws, 1,216 lay-

Classification.
In this study, intra- and intertraining and testing methods were used for the sensor fusion basketball shooting posture datasets. Both methods were carried out on a computer configured with a Core i5-9400 CPU, 32 GB memory, and a GeForce GT730 graphics card. The operating system was Windows 10 Home, and the model was implemented using the MATLAB 2019b Deep Learning Toolbox.
3.2.1. Intratraining and Testing. All 12,177 sensor fusion basketball shooting posture data entries were randomly arranged, and the training and test datasets were designed with an 8:2 ratio, including 9,741 data entries in the training dataset and 2,436 data entries in the test dataset. The training dataset was used to train the model, and the test dataset was used to test the model. Figure 6 presents a comparison between the loss rate and accuracy rate of the intratraining process. As the loss rate decreases, the accuracy rate continuously increases, demonstrating the continuous improvement of the training model. Figure 7 presents the confusion matrix of the 10 types of sensor fusion basketball shooting postures in the test dataset, as classified by the intratest. The row summaries of the matrix give the recall rate and false negative rate, and the column summaries give the precision rate and false discovery rate.
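The random 8:2 split can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes that the training size is obtained by rounding 80% of the entries down, which reproduces the 9,741 and 2,436 entries stated above, and the function name and seed are hypothetical.

```python
import random
from typing import List, Tuple


def random_split(num_entries: int, train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle entry indices and cut them into training and test index lists."""
    indices = list(range(num_entries))
    random.Random(seed).shuffle(indices)        # random arrangement of all entries
    num_train = int(num_entries * train_ratio)  # assumed: rounded down
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = random_split(12177)
split_sizes = (len(train_idx), len(test_idx))
```

Because entries from every subject can land in both parts, a model evaluated this way has already seen each test subject's shooting style during training.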

3.2.2. Intertraining and Testing. The sensor fusion basketball shooting posture data of 13 subjects were randomly arranged; the data of 11 subjects were used to form the training dataset, and the data of two subjects were used to form the test dataset. The total number of training data entries was 10,126, and the total number of test data entries was 2,051. The training dataset was used to train the model, and the test dataset was used to test the model. Figure 8 shows a comparison between the loss rate and the accuracy rate of the intertraining process. Figure 9 presents the confusion matrix of the 10 types of sensor fusion basketball shooting postures in the test dataset, as classified by the intertest.
Figure 10 depicts the t-SNE diagram of the intratest dataset. The t-SNE diagram shows the distribution characteristics of the data intuitively by reducing high-dimensional data to two-dimensional data. From the t-SNE diagram of the intratest dataset, it can be observed that the lay-up, jettison throw, and stop jump shot are easily confused, as well as the hook shot, inside shot, and free throw. Table 4 summarizes the recall rate, precision rate, average recall rate, and average precision rate of the intratest classification results. The classification results of the intratest reveal that the average recall rate was 98.6%, and the maximum recall rate was 100% for the jump shot and spin jumper. The minimum recall rate was 96% for the gather step shot. The average precision rate was 98.6%; the maximum precision rate was 100% for the hook shot, and the minimum precision rate was 97.2% for the jump shot. The above data indicate that although the t-SNE diagram shows that there are easily confused shooting postures in the intratest dataset, the sensor fusion basketball shooting posture recognition system still performed well in the intratest, as a result of the selection method of the intratraining and intratest datasets.
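The difference between the two dataset selection methods can be made concrete with a sketch of a subject-wise split, in which all entries of the held-out subjects go to the test set. The entry layout `(subject_id, posture)`, the function name, and the toy data are hypothetical and only illustrate the idea of the intertest.

```python
from typing import List, Sequence, Set, Tuple

# (hypothetical subject ID, posture label)
Entry = Tuple[int, str]


def split_by_subject(entries: Sequence[Entry],
                     test_subjects: Set[int]) -> Tuple[List[Entry], List[Entry]]:
    """Put every entry of the held-out subjects in the test set and
    every other entry in the training set."""
    train = [e for e in entries if e[0] not in test_subjects]
    test = [e for e in entries if e[0] in test_subjects]
    return train, test


entries = [
    (1, "hook shot"), (1, "free throw"),
    (2, "lay-up"),
    (3, "jump shot"), (3, "hook shot"),
]
train_set, test_set = split_by_subject(entries, test_subjects={3})
train_subjects = {subject for subject, _ in train_set}
```

Here no subject contributes to both sets, so the test measures how well the model generalizes to players it has never seen, which is a harder setting than the random split used for the intratest.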
Figure 11 presents the t-SNE diagram of the intertest dataset. The t-SNE diagram of the intertest dataset shows that the inside shot, free throw, hook shot, and jump shot are easily confused, as well as the stop jump shot and jettison throw. Table 5 reports the recall rate, precision rate, average recall rate, and average precision rate of the intertest classification results. The classification results of the intertest show that the average recall rate was 89.8%, and the maximum recall rate was 100% for the gather step shot, pump fake, inside shot, and spin jumper. The minimum recall rate was 71% for the jettison throw. The average precision rate was 91.1%; the maximum precision rate was 100% for the hook shot and spin jumper, and the minimum precision rate was 65.9% for the inside shot. As per Figure 9, 20 free throws and 74 hook shots were identified as inside shots. This is because free throws and inside shots are similar in action, differing only slightly in the release angle and speed. In addition, as we did not have wireless sensor modules on both the left and right wrists, the ability to discriminate between single-handed and double-handed shooting postures is limited. Thus, some of the one-hand shots, such as hook shots, were identified as double-hand shots, such as inside shots. Thirty-five jettison throws were identified as stop jump shots for the same reason. Thirty-six jump shots were identified as free throws owing to the similar shooting distance and shooting angle between free throws and jump shots. Furthermore, no barometer data were collected in this study; thus, there is no clear distinction between jump and nonjump shooting postures.
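The recall and precision rates discussed above follow directly from a confusion matrix. The sketch below assumes that rows hold the true class and columns the predicted class, in line with the row and column summaries of the confusion matrices in this paper; the 3-class matrix is invented toy data, not a result of this study, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def per_class_metrics(confusion: Sequence[Sequence[int]]
                      ) -> Tuple[List[float], List[float], float]:
    """Return per-class recall, per-class precision, and overall accuracy
    for a square matrix with true classes as rows and predictions as columns."""
    num_classes = len(confusion)
    recalls, precisions = [], []
    for i in range(num_classes):
        row_total = sum(confusion[i])                                # true class i
        col_total = sum(confusion[r][i] for r in range(num_classes))  # predicted i
        recalls.append(confusion[i][i] / row_total if row_total else 0.0)
        precisions.append(confusion[i][i] / col_total if col_total else 0.0)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(num_classes))
    accuracy = correct / total if total else 0.0
    return recalls, precisions, accuracy


# Toy matrix: class 1 attracts misclassified samples from classes 0 and 2.
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
recalls, precisions, accuracy = per_class_metrics(toy_confusion)
average_recall = sum(recalls) / len(recalls)
average_precision = sum(precisions) / len(precisions)
```

A class that absorbs misclassified samples from other classes keeps a high recall but loses precision, which is the pattern described for the inside shot in the intertest.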
In addition, the subjects in this experiment covered five different playing positions (i.e., C, PF, SF, SG, and PG) and differed in height and weight, so they performed the same shooting posture in very different ways; moreover, because the subjects were not professional players, their shooting postures varied considerably. In addition, stability is poor when physical strength is insufficient [34]. These two points also explain the aforementioned low recognition rate of the shooting postures. Finally, the small number of subjects contributes to the low recognition rate. From the classification results, the VGG16 deep learning model achieved good classification in 10 types of sensor fusion basketball shooting posture recognition experiments.

Results and Analysis.
Reference [35] developed a deep learning model around a one-dimensional convolutional network (1D-CNN) architecture and verified it on the public dataset UTD-MHAD, which contains 27 types of activities. In [36], a CNN model was used to identify six types of pedestrian modes. In [37], a hybrid deep learning model based on the fusion of multiple spatiotemporal networks (FMS-Net) was proposed and used to detect the four phases of walking. Because all of the above results were obtained only for intratests, we compared them with the intratest results of the VGG16 model used in this study and found that the classification results of VGG16 were better than those of the other three classification models. The comparison results are shown in Table 6.
To verify the accuracy and effectiveness of the proposed system, it was compared with references [18,19,38]. Reference [18] established a real-time wearable assist system for upper extremity throwing action based on accelerometers, which used the longest common subsequence (LCS) algorithm to recognize the six phases of baseball throwing posture. In [19], an activity assessment chain for evaluating human activity was established using machine learning (ML) to classify six types of indoor rowing stroke postures (one correct and five incorrect). Reference [38] used a wearable and wireless system based on SVM to recognize overhead passes, chest passes, and shooting in basketball. As shown in Table 7, similar to the three systems above, the system proposed in this paper uses a small number of sensors to recognize a number of postures. This system has certain advantages in terms of average accuracy compared with the systems proposed in references [18,19]. Although the average accuracy is slightly lower than that of the system proposed in reference [38], the system proposed in this paper recognizes more postures and achieves good recognition, even for easily confused postures. In addition, compared with the ML and SVM models used in references [19,38], the deep learning model used in this study has greater development potential. Based on the above analysis, the system proposed in this study exhibits certain advantages compared with the other three systems.

Discussion
Shooting is an important aspect in basketball matches and training. Correctly distinguishing the shooting posture used by basketball players in a match and during training can help in making a correct evaluation of the technical characteristics of the players, which in turn could prove helpful in carrying out targeted guidance and practice sessions for players. This study proposes a sensor fusion framework for basketball shooting posture. It fuses the sensor data of the main force hand and main force foot to identify and classify the basic shooting postures and composite shooting postures in basketball. The framework proposed here shows a novel development direction for wearable devices in basketball, which is beyond the conventional framework of IMUs placed only on the arms. Although this sensor fusion framework can recognize more composite shooting postures without integrating more sensors, it still has limitations, which are as follows: (1) The amount of limb data is still limited, which can pose challenges for accurately reflecting the subjects' posture information. (2) While using the proposed method, it is necessary to consider the problem of sensor synchronization and how to align the data of the two sensors if one sensor loses data. (3) The use of two sensors makes our proposed method more costly compared with a method that uses a single sensor.
Many shooting postures in basketball have certain similarities, and nonprofessional basketball fans' shooting postures are generally nonstandard and less stable; hence, their shooting postures are easily confused.
The basketball shooting posture recognition system proposed in this paper selects nonprofessional basketball fans as subjects, uses a deep learning model based on CNN, classifies 10 types of easily confused shooting postures, and obtains a good classification effect, proving the feasibility of the deep learning model for basketball shooting posture recognition and demonstrating the robustness of the proposed system. Moreover, compared with ML models such as SVM, the deep learning model used in this paper has strong development potential in the future, and it is possible to integrate it into a low-cost integrated circuit in the future to reduce the cost of corresponding smart devices. Therefore, the results of this study can be used in the future for the development of low-cost wearable intelligent basketball motion recognition devices for nonprofessional basketball players.
Basketball shooting, especially composite shooting postures, can be divided into a series of decomposition actions. The time-series and attention of each decomposition action are different [10]. Although the VGG16 deep learning model used in this study has achieved good results in classifying 10 types of sensor fusion basketball shooting postures, there remain shooting postures with lower classification accuracy, such as stop jump shots and inside shots. If time-series and attention judgments are added to the deep learning model, the recognition effect could be further improved. Because researchers have used deep learning models that combine time-series and attention judgments for classification [39], we can also add time-series and attention judgments to deep learning models to improve classification accuracy in our future work. Furthermore, lightweight deep learning models such as MobileNet [40] and SqueezeNet [41] will be adopted to ensure the corresponding time and space efficiency after adding time-series and attention judgments.
Reference [42] studied walking and trotting in equestrian sports by calibrating the sensor data accuracy of four coordinate systems. Similarly, the accuracy of sensor data is also important for recognizing basketball shooting postures. Compared with reference [42], this study still needs to be strengthened in sensor data accuracy calibration. In general, the sensor will suffer from sensor drift after a period of time, which affects the accuracy of the collected data. In addition, sensor misplacement, as the sensor is not firmly fixed on the limb, can also lead to other accidents. In [43], the authors proposed a method combining zero velocity update (ZUPT) to reduce the sensor drift error. A rotation matrix method was also proposed in [44] that obtained good performance in dealing with sensor misplacement. In future works, to improve the data precision of sensor fusion for long-term data collection, we will attempt to fix sensor misplacement and sensor drift through software calibration, as was performed in references [42][43][44]. Furthermore, we will enhance the binding of wireless sensor modules and add a software filter to decrease the effect of sensor misplacement and sensor drift and improve the accuracy of the sensor fusion data.

Conclusion
In this study, a sensor fusion basketball shooting posture recognition system based on a CNN was designed. The system used a sensor fusion framework to collect the shooting posture data of the players' main force hand and main force foot and performed sensor data fusion. Subsequently, a CNN-based deep learning model was used for classification. A total of 12,177 sensor fusion basketball shooting posture data entries of the right hand and left foot were collected using this system for 13 Chinese adult male subjects aged 18-40 years with at least 2 years of basketball experience but without any professional basketball training. The shooting posture data entries were trained and tested using the classic VGG16 deep learning model based on CNN through intra- and intertraining/testing methods, achieving satisfactory classification results. These classification results are substantially better than those of similar systems, demonstrating the effectiveness and future development potential of the system.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflict of interest.