E-Sports Training System Based on Intelligent Gesture Recognition

In order to improve the effect of e-sports training, this paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through the recognition of players' gestures. Moreover, this paper studies commonly used feature extraction algorithms and proposes an improved SLC-Harris feature extraction algorithm, whose feasibility is verified by experimental results on the EuRoC data set. In addition, this paper uses the KLT optical flow algorithm to track the extracted feature points and calculates the pure visual pose through epipolar geometry, triangulation, and PnP algorithms. The experimental results show that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.


Introduction
The reason why e-sports can become a sports competition is that it is closely related to the progress of society, the development of science and technology, and the spiritual and cultural needs of the people. Although countless people enjoy this high-tech intellectual sport, public opinion nevertheless instills a harmful view of e-sports in people, intentionally or unintentionally. Some media reported extensively that some students were addicted to games and could not extricate themselves, wasting their youth and studies, which turned e-sports into the "electronic heroin" that everyone denounces. The huge pressure of public opinion puts e-sports under severe survival pressure, and it is difficult for enterprises to enter this market justifiably. Moreover, athletes can only be called "players," and their treatment cannot be compared with that of ordinary athletes. At the same time, most fans can only engage in e-sports secretly. In addition, in the face of huge pressure from public opinion, it is difficult for the government to guide and supervise confidently, and sometimes it has to resort to prohibition. The ban on television broadcasting of e-sports competitions and the current social discrimination against e-sports can be described as huge obstacles to the normal development of the e-sports industry.
Generally speaking, the development of e-sports is not yet mature and is still in its infancy [1], which is manifested in many aspects: public recognition is insufficient, related large-scale events are few, there is no professional-scale operation, there is little research in this area, and so on [2]. Especially on college campuses, although students have more time at their own discretion than before, schools do not pay enough attention to e-sports, and there is no relatively formal organization and management of participants, which has led to a great waste of human resources [3].
In order to follow the trend of e-sports development, vigorously develop the e-sports business, improve the overall level of e-sports, and enable e-sports activities to develop well in colleges and universities, the current primary task is to deepen the understanding of the characteristics of students participating in e-sports activities [4]. Among them, the analysis and research of the current situation, development trend, and significance of participation in e-sports in colleges and universities are particularly important in order to discover the problems existing in the development of campus e-sports and put forward reasonable suggestions for its development [5].
As an emerging sport, e-sports is mainly participated in by the younger generation, and its participants are increasingly young. E-sports can exercise people's thinking ability, resistance to psychological pressure, unity and cooperation, hand-eye coordination, and so on. It can also give the younger generation an awareness of abiding by the rules in the process of participating in e-sports [6]; trained participants develop a fair and open attitude, a refusal to admit defeat, and the pursuit of a stronger competitive spirit, which has a positive impact on their lives. Many colleges and universities have successively opened e-sports-related majors. Although e-sports is popular around the world, research and guiding theories on how to cultivate e-sports talents are rare [7].
Different scholars have different views on the attributes and characteristics of e-sports. Literature [8] proposed that "e-sports include three basic characteristics: the first is electronics, the second is competitive sports, and the third is confrontation between people. At the same time, e-sports are divided into virtualized e-sports and fictionalized sports." Literature [9] pointed out that "the most fundamental characteristics of video games that distinguish them from other artificial games are a virtual environment, the absence of the body, and artificial intelligence," emphasizing the central position of electronic communication technology in e-sports. The scholar Yang Fang believes that "e-sports should return to the essence of games, and the evolution from games to competitive sports follows the trend of play, then game, then competitive sport," and, based on the development process of traditional competitive sports, puts forward a plan for the development of e-sports. Jia Peng and Yao Jiaxin believe that e-sports has several distinct characteristics: the diversity of functional structure requirements, the full expansion of self-awareness, the complexity of sports information pattern recognition, the agility of information processing, and the accuracy of intuitive thinking and decision-making; their work analyzes and clarifies the various attributes of e-sports from many aspects [10]. The discussion on the attributes of e-sports is still going on. Based on the current research, it can be determined that the two essential attributes of e-sports are electronic interaction and confrontational competition. Without electronic interaction, e-sports becomes traditional competitive sport; without confrontational competition, it becomes a mere video game; so the two are interdependent and indispensable. With the development of electronic interaction technology, various forms of e-sports have emerged [11].
Event services mainly involve e-sports referees, coaches, club operation and management, game commentary, data and tactical analysis, and so on; practitioners need data analysis, management, and commentary capabilities. The production and broadcast of events include content production and external dissemination, mainly involving the design of live content and promotion plans, venue layout, equipment debugging, video data collection, postprocessing, background data analysis, and so on. These practitioners should have journalism, communication, broadcasting, TV technology, and other related professional abilities [12].
Since the e-sports industry is an emerging industry, most employees are not from e-sports majors and have not received a complete and systematic e-sports theoretical education, yet nearly 90% of employees believe that the e-sports industry needs prejob training [13]. Judging from the current state of development of the entire industry, working for game manufacturers is undoubtedly the most attractive option, but it is difficult for game manufacturers to absorb more human resources without major business adjustments. Therefore, the need to train practitioners in the support organizations around e-sports events becomes more obvious [14]. For example, training content production capabilities (reporters, screenwriters, copywriters, and anchors) requires a professional background in journalism and communication; training event support capabilities (coaches, data analysts, nutritionists, and brokers) requires a professional background in sports and information technology; and training public relations and marketing capabilities (product, business, brand marketing, and media) requires a professional background in marketing and management [15].
E-sports self-media is still a form of media, and its operators must be able to report news or dig deep into a vertical field, such as specializing in video commentary on games, in game clearance strategies, or in sharing game skills; after all, hot spots bring traffic. Self-media is personalized media with social attributes; it communicates with users and has its own distinct character orientation [16]. A self-media operator should also have strong analytical skills and be able to interpret a topic or special event from a unique or professional perspective.

Current E-Sports Professional Ability Training Pathways.
Most training institutions in society position themselves as training professional players but basically lack training resources. These institutions have no coaches, data analysts, or club managers, and it is difficult for the people they train to find a suitable position in the e-sports circle. Rather than cultivating professional skills, they make money from e-sports hot spots and have neither the intention nor the ability to contribute to the development of the e-sports industry [17].
At present, e-sports talents are mainly cultivated by e-sports companies and e-sports clubs. The clubs mainly train professional players, coaches, and data analysts in order to achieve better results in the leagues. Game companies train referees, game developers, commentators, and other related talents to ensure the healthy development of the e-sports industry [18]. An analysis of the revenue structure of the e-sports industry can help us see the industry more transparently. The truly profitable institutions are still the game manufacturers, which continuously create market value through development and operation. In the context of the continuous development and popularization of the video game industry, competition has become a starting point for expanding influence and creating new commercial value.
The comprehensive development of competitive value is inseparable from the promotion of surrounding formats, and new jobs such as video production, live broadcasting, and commentary emerge in an endless stream [19].
This paper combines intelligent gesture recognition technology to construct an e-sports training system and uses the recognition of players' gestures to judge the players' training effect, so as to improve the effect of e-sports training.

Gesture Intelligent Positioning.
The structural framework of the gesture autonomous localization algorithm is shown in Figure 1.
Monocular visual-inertial odometry uses a pure camera in the front end for motion estimation. The algorithm first extracts features from the image information collected by the camera, then uses the optical flow method to track the feature points, and finally uses PnP (Perspective-n-Point) to perform motion estimation on the tracked feature points. Then, the algorithm eliminates mismatched point pairs through random sample consensus (RANSAC) and uses nonlinear optimization to optimize the pose. The front-end process is shown in Figure 2.

SLC-Harris Feature Extraction.
The feature is the digital expression of the object in the image, and the image can be quantitatively analyzed by extracting features. Commonly used feature extraction methods mainly include the SIFT, SURF, and ORB algorithms. The traditional Harris algorithm calculates the corner responsivity based on the weighted summation of the squared and multiplied gradients of all pixels in the window:

R = det(M) − k·(trace(M))², (1)

where

det(M) = λ1·λ2, trace(M) = λ1 + λ2. (2)

In formula (1), k is a constant ranging from 0.04 to 0.06, and λ1 and λ2 in formula (2) represent the two eigenvalues of the gradient autocorrelation matrix M.
For a grayscale image, the value ii(x, y) of any point (x, y) in the integral image refers to the sum of all grayscale values in the rectangular region from the upper left corner of the image to this point, as shown in Figure 3:

ii(x, y) = Σ_{x′≤x, y′≤y} I(x′, y′). (3)

The sum of the pixels in a rectangular window [x0, x1] × [y0, y1] can then be calculated with only four array references:

S = ii(x1, y1) − ii(x0 − 1, y1) − ii(x1, y0 − 1) + ii(x0 − 1, y0 − 1). (4)

The most complex calculation in the Harris algorithm is that of the corner responsivity. The original calculation method causes overlapping computation between the pixels in the integration window, resulting in high computational complexity. For this reason, integral images are built over the gradient products gx², gy², and gx·gy to speed up the calculation of the corner responsivity. Efficient nonmaximum suppression (E-NMS) is then used to efficiently extract a unique feature location for each corner region, and the region thresholds are compared using image patches instead of individual pixels. The principle is shown in Figure 4.
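As an illustration of the integral-image idea described above, the following NumPy sketch computes the Harris corner responsivity using box sums obtained from an integral image. This is a minimal sketch, not the authors' exact SLC-Harris implementation (which is not fully specified here); the window radius `r` and the constant `k` are illustrative choices.

```python
import numpy as np

def box_sum(img, r):
    """Sum of values in a (2r+1)x(2r+1) window around each pixel,
    obtained in O(1) per pixel from an integral image (cumulative sums)."""
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    H, W = img.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            y0, y1 = max(y - r, 0), min(y + r + 1, H)
            x0, x1 = max(x - r, 0), min(x + r + 1, W)
            # Four references into the integral image give the window sum.
            out[y, x] = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return out

def harris_response(img, k=0.04, r=2):
    """Harris responsivity R = det(M) - k*trace(M)^2, with the window sums
    of gx^2, gy^2, gx*gy computed via integral images."""
    Iy, Ix = np.gradient(img.astype(float))
    A = box_sum(Ix * Ix, r)   # sum of gx^2 over the window
    B = box_sum(Iy * Iy, r)   # sum of gy^2 over the window
    C = box_sum(Ix * Iy, r)   # sum of gx*gy over the window
    return (A * B - C * C) - k * (A + B) ** 2
```

With a bright square on a dark background, the response is strongly positive at the square's corners, negative along its edges, and near zero in flat regions; this per-region behavior is what E-NMS then exploits to keep a single feature per corner region.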

KLT Optical Flow Tracking.
After the key points are extracted, the optical flow method is used to minimize the photometric error by establishing an error model. This method does not need to calculate descriptors or match feature points, which greatly reduces the amount of calculation.
The basic idea of LK optical flow is to assume that the optical flow in the local neighborhood of a pixel is invariant and, based on this assumption, to construct a least-squares problem over the optical flow of the neighborhood pixels.
First, it is assumed that the light intensity of a pixel is constant across image frames. Accordingly, for the pixel located at (x, y) at time t and moving to (x + dx, y + dy) at time t + dt, there is

I(x, y, t) = I(x + dx, y + dy, t + dt). (6)

Then, according to another basic assumption of LK optical flow, the displacement of pixels between adjacent images is small, and the first-order Taylor expansion of formula (6) is

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt. (7)

Combining the above formulas and dividing both sides by dt, we get

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) = −∂I/∂t, (8)

where dx/dt represents the motion speed of the pixel on the x-axis and dy/dt the motion speed on the y-axis; the two speeds are recorded as u and v, respectively. At the same time, ∂I/∂x and ∂I/∂y represent the gradient values of the image in the x-axis and y-axis directions at the pixel, and ∂I/∂t represents the derivative of the image in the t direction; they are denoted as Ix, Iy, and It, respectively. Therefore, formula (8) can be written in matrix form as

[Ix Iy][u v]ᵀ = −It. (9)

Finally, according to the third basic assumption of the LK optical flow method, adjacent pixels in the same image plane have similar motion. A w × w window is defined; since all pixels in the window share the same motion, w² formulas of the form (9) can be listed, an overdetermined system can be constructed, and the motion parameters of the center point can be obtained by the least-squares method:

[u v]ᵀ = (AᵀA)⁻¹Aᵀb, where A stacks the rows [Ix Iy] of the w² window pixels and b stacks the corresponding values of −It. (10)

To handle larger displacements, each image frame is downsampled by pyramid layering and a multilevel Gaussian pyramid is established, where L represents the Lth layer image. The algorithm computes the flow from the top layer of the pyramid down to the bottom layer and treats the pixel values near the image border specially. The camera motion pose is then estimated using SfM in the vision front end.
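The windowed least-squares step of the LK method above can be sketched as follows. This is a single-window, single-level illustration without the pyramid; the function name and the window radius `r` are hypothetical.

```python
import numpy as np

def lk_flow_at(I0, I1, x, y, r=3):
    """Solve the LK least-squares system for the flow (u, v) at pixel
    (x, y), assuming all pixels in the (2r+1)^2 window share that flow."""
    Iy, Ix = np.gradient(I0.astype(float))       # spatial gradients of frame 0
    It = I1.astype(float) - I0.astype(float)     # temporal derivative
    win = np.s_[y - r:y + r + 1, x - r:x + r + 1]
    # Stack one row [Ix Iy] per window pixel: an overdetermined w^2 x 2 system.
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

For a smooth image translated by one pixel in x, the recovered flow is close to (1, 0); larger displacements violate the small-motion assumption, which is what the pyramid layering is for.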
For a monocular camera, the camera pose can be estimated from the geometric relationship between a point in three-dimensional space and its projected points on the imaging planes at two different camera positions. As shown in Figure 5, P is any point in three-dimensional space with coordinates [X, Y, Z]ᵀ; O1 and O2 are the optical centers of the camera at the two positions; and p1 and p2 are the projection points of P on the imaging planes I1 and I2, respectively. According to the pixel positions of the matched point pair p1 and p2, the essential matrix E and the fundamental matrix F can be obtained.
According to the camera imaging model, we assume that K is the camera intrinsic parameter matrix and that R and t represent the rotation matrix and translation vector from plane I1 to plane I2; the following formula can be obtained:

s1·p1 = KP, s2·p2 = K(RP + t). (13)

Through homogeneous coordinate transformation and normalization between 2D and 3D, we can get

x1 = K⁻¹p1, x2 = K⁻¹p2, (14)

where x1 and x2 represent the coordinates of pixels p1 and p2 in the normalized plane, respectively. The algorithm combines formulas (13) and (14) and left-multiplies by x2ᵀ·t∧ to obtain the essential matrix E and the fundamental matrix F, which can be sorted out as

x2ᵀ·t∧·R·x1 = 0, E = t∧R, F = K⁻ᵀEK⁻¹, (15)

where t∧ represents the antisymmetric (skew-symmetric) matrix of t. When there are more than eight sets of point pairs such as p1 and p2, the eight-point method can be used to construct a linear system from the simplified formula, and then the solution for R and t can be obtained.
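The eight-point method mentioned above can be sketched in NumPy as follows. This is a minimal noise-free illustration on normalized coordinates; the function names are hypothetical, and the final projection onto two equal singular values enforces the standard essential-matrix constraint.

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix t^ such that (t^)v = t x v."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def eight_point(x1, x2):
    """Estimate E (up to scale) from N >= 8 normalized point pairs by
    solving x2^T E x1 = 0 as a homogeneous linear system."""
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # null vector of A, reshaped row-major
    # Project onto the essential-matrix manifold: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

On exact synthetic correspondences generated from a known (R, t), the recovered E satisfies the epipolar constraint x2ᵀEx1 = 0 up to numerical precision; R and t would then be recovered from E by decomposition as in the text.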
When the monocular camera recovers the pose through the epipolar geometric relationship, the obtained translation is only a normalized value whose scale has no practical significance.
In order to obtain the depth information of feature points, triangulation needs to be introduced. We assume that s1 and s2 represent the depths of the feature point in the two views; we can get

s2·x2 = s1·R·x1 + t. (17)

Left-multiplying both sides of formula (17) by the antisymmetric matrix x2∧ eliminates s2 and gives

0 = s1·x2∧·R·x1 + x2∧·t, (18)

from which the depth s1 can be solved, and s2 then follows directly from formula (17). When the positions of multiple points in space are known, the camera pose can be estimated by the PnP algorithm. Common PnP algorithms include P3P, DLT, and BA optimization. Among them, the P3P algorithm is the most common method. The algorithm needs to know at least three points and their projection points on the camera imaging plane. Then, the camera pose can be estimated by solving the relationship between the point pairs according to the similar-triangle principle and the cosine theorem. A schematic diagram of the P3P relationship is shown in Figure 6.
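Triangulation can equally be carried out with the standard linear (DLT) formulation, sketched below under the same convention (camera 1 at the origin, camera 2 at (R, t), normalized observations). This is an illustrative variant, not necessarily the exact solve used in the paper.

```python
import numpy as np

def triangulate(x1, x2, R, t):
    """Linear (DLT) triangulation of one 3D point from two normalized
    observations x1, x2; camera 1 is at the origin, camera 2 at (R, t)."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])   # projection of camera 1
    P2 = np.hstack([R, t.reshape(3, 1)])            # projection of camera 2
    # Each observation contributes two linear constraints on homogeneous X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                                      # null vector of A
    return X[:3] / X[3]                             # dehomogenize
```

For noise-free observations the recovered point matches the ground truth exactly; with noisy tracks the SVD gives the least-squares solution.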
The coordinate system convention is as follows: the world coordinate system is represented by (·)w, and (·)b and (·)c represent the IMU coordinate system and the camera coordinate system, respectively. The relationship between the coordinate systems is shown in Figure 7. (·)v represents the visual reference frame in the sliding window, which is independent of the IMU measurements and can represent any frame in the visual structure. (·)w_b represents the transformation from the IMU coordinate system to the world coordinate system; b_k represents the IMU frame of the kth image; (·)v_c represents the transformation from the camera coordinate system to the visual reference frame; and c_k represents the camera frame of the kth image. The hat accent (ˆ) marks the measured values and parameter estimates of the sensors, and the bar accent (¯) marks the latest scale parameter of the sliding window; rotation can be represented either by the rotation matrix R or by the quaternion q. g_w = [0, 0, g]ᵀ represents the gravity vector in the world coordinate system, and g_v represents the gravity vector in the visual reference coordinate system.

IMU Preintegration.
The sampling frequency of the camera used in this paper is 20 Hz, and the sampling frequency of the IMU is 200 Hz; the frequency of the IMU is thus much higher than that of the images. In order to avoid repeatedly reintegrating the high-rate IMU samples every time the visual frame states change during optimization, a preintegration technique is applied to all IMU samples between two image key frames. In this way, the inertial measurements between adjacent image key frames are aggregated into a single relative motion constraint. The principle of preintegration is shown in Figure 8.
In Figure 8, from top to bottom are the time scale line, the image frames, the image key frames, the IMU samples, and the IMU preintegration values. The measurement error of the system is mainly affected by the bias random walk b and the white noise η; other errors, such as Markov-process errors, are ignored. Then, the measurement model of the accelerometer and gyroscope in the IMU can be expressed as

ω̂_b(t) = ω_b(t) + b_ω(t) + η_ω(t),
â_b(t) = R_bw(t)(a_w(t) + g_w) + b_a(t) + η_a(t),

where ω̂_b(t), â_b(t) and ω_b(t), a_w(t) represent the measured values and the real values of the angular velocity and acceleration, respectively; b_ω, b_a and η_ω, η_a represent the random walk noise and the measurement white noise of the angular velocity and acceleration, respectively; and R_bw is the rotation matrix from the world coordinate system to the IMU coordinate system.
White noise obeys a Gaussian distribution, that is, η ~ N(0, σ²). The derivative of the random walk noise also obeys a Gaussian distribution, that is, ḃ ~ N(0, σ_b²). The differential kinematic formulas for P, V, and Q (representing the position, velocity, and rotation expressed in quaternions, respectively) versus time can be written as

ṗ = v, v̇ = a, q̇ = q ⊗ [0, ω/2]ᵀ,

where ⊗ represents quaternion multiplication. Through the above derivative relationship, the position, velocity, and rotation at time k + 1 can be obtained from those at time k by integrating the measured values of the IMU over the time Δt_k. The continuous integration formula for PVQ is

p_{k+1} = p_k + v_k·Δt_k + ∬_{[t_k, t_{k+1}]} (R_t^w(â_t − b_a) − g_w) dt²,
v_{k+1} = v_k + ∫_{[t_k, t_{k+1}]} (R_t^w(â_t − b_a) − g_w) dt,
q_{k+1} = q_k ⊗ ∫_{[t_k, t_{k+1}]} (1/2)Ω(ω̂_t − b_ω)·q_t dt,

where â_t and ω̂_t represent the acceleration and angular velocity measured in the IMU coordinate system, respectively, and Δt_k represents the time difference from the kth frame to the (k + 1)th frame. R_t^w represents the rotation matrix from the IMU coordinate system at time t to the world coordinate system; because the measured â_t and ω̂_t belong to the IMU coordinate system, the rotation matrix must be left-multiplied in order to transform the IMU measurements to the world coordinate system. Ω(ω) denotes the quaternion right-multiplication matrix, and ω∧ denotes the antisymmetric matrix appearing in quaternion multiplication. Assuming that the quaternion is q = [ω_q, s]ᵀ, with imaginary part ω_q = [x, y, z]ᵀ and real part s, we have

Ω(ω) = [−ω∧ ω; −ωᵀ 0].

By observing the continuous integration formula of PVQ, it can be seen that the current state is recursively obtained from the state of the previous moment, and the estimated state changes constantly during optimization. This causes the IMU measurements to be repropagated, so that the velocity and rotation must be reintegrated after each nonlinear optimization iteration, resulting in a high computational cost.
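The continuous PVQ integration above can be illustrated with a minimal Euler-integration step. This sketch assumes the common convention that the accelerometer measures specific force including gravity; quaternions are stored as [w, x, y, z], and all names are illustrative.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_R(q):
    """Rotation matrix of the unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([[1-2*(y*y+z*z), 2*(x*y-w*z),   2*(x*z+w*y)],
                     [2*(x*y+w*z),   1-2*(x*x+z*z), 2*(y*z-w*x)],
                     [2*(x*z-w*y),   2*(y*z+w*x),   1-2*(x*x+y*y)]])

def imu_step(p, v, q, a_m, w_m, b_a, b_w, g, dt):
    """One Euler step of the PVQ kinematics: bias-corrected body-frame
    measurements are rotated to the world frame, gravity is removed,
    and the orientation is updated by a small-angle quaternion."""
    a_w = quat_to_R(q) @ (a_m - b_a) - g           # world-frame acceleration
    p = p + v * dt + 0.5 * a_w * dt * dt
    v = v + a_w * dt
    dq = np.concatenate([[1.0], 0.5 * (w_m - b_w) * dt])
    q = quat_mul(q, dq)
    return p, v, q / np.linalg.norm(q)             # renormalize quaternion
```

For a stationary IMU the integrated position and velocity stay at zero, and a constant angular rate integrates to the expected rotation. In the actual system this integration is carried out in the local frame b_k as preintegration, precisely so that it need not be repeated when the estimated states change.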
Therefore, the optimization variables are separated from the IMU preintegration terms of the two key frames, and the rotation matrix R^{bk}_w from the world coordinate system to the IMU coordinate system is left-multiplied on both sides of the continuous PVQ integration formula (23). Given the image frames b_k and b_{k+1} of two consecutive moments, the linear acceleration and angular velocity are preintegrated in the local coordinate system b_k to obtain the preintegration terms α^{bk}_{bk+1}, β^{bk}_{bk+1}, and γ^{bk}_{bk+1}, which represent the relative position, velocity, and rotation constraints, respectively, and are also the relative motion from b_{k+1} to b_k. It can be seen that they are related only to â_t and ω̂_t between b_k and b_{k+1} and have nothing to do with the initial position and velocity of coordinate system b_k. Therefore, the preintegration formula is rediscussed: the preintegration terms depend on â_t and ω̂_t of the IMU, and the biases are also variables that need to be optimized. When the bias change is small, the preintegration terms are adjusted according to their first-order approximations with respect to the bias.
where J^α_{ba} and J^α_{bω} are the corresponding block matrices of the preintegration Jacobian J_{bk+1}. The derivation is based on the derivative of the error-term kinematic formula. First, two concepts are introduced: true and nominal, where true represents the real measurement value containing noise, nominal represents the theoretical value without noise, and δ represents the measurement error between the two. Combining the above formulas with the noise and bias models, and following the derivation of δθ̇ given in the literature, the derivative of the IMU measurement error term at time t can be written in the compact linear form

δż_t = F_t·δz_t + G_t·η_t,

where δz_t collects the position, velocity, rotation, and bias errors and η_t collects the noise terms. According to the definition of the derivative, the prediction formula for the mean is

δz_{t+δt} = (I + F_t·δt)·δz_t + (G_t·δt)·η_t;

that is, from the error value at the current moment, the mean and covariance at the next moment can be predicted. The prediction formula for the covariance is

P^{bk}_{t+δt} = (I + F_t·δt)·P^{bk}_t·(I + F_t·δt)ᵀ + (G_t·δt)·Q·(G_t·δt)ᵀ,

where P^{bk}_t represents the initial value of the iteration (zero at the start) and Q represents the diagonal covariance matrix of the noise terms. Accordingly, the iterative formula of the error-term Jacobian is

J_{t+δt} = (I + F_t·δt)·J_t,

where the iterative initial value of the Jacobian matrix J_t is I.

Sliding Window Initialization.
When the camera-IMU extrinsic parameters are known, the pose (q^v_{ck}, p^v_{ck}) obtained by the initialization of the monocular camera is transformed from the visual coordinate system to the IMU coordinate system to obtain (q^v_{bk}, p^v_{bk}), where s is the translation scale factor obtained by visual initialization, which carries no real metric information.
The pure visual initialization method lacks absolute scale information.
Therefore, the values estimated by the visual SFM are aligned with the preintegrated IMU values to estimate the true scale. Visual-inertial alignment initialization mainly solves the following problems: the initialization of the gyroscope bias, and the initialization of the velocity, gravitational acceleration, and scale. The first is to initialize the gyroscope bias, which can be obtained from consecutive key frames with known orientations. Considering two consecutive frames b_k and b_{k+1} in the sliding window, q^v_{bk} and q^v_{bk+1} represent the rotations obtained from the pure visual sliding-window optimization. The IMU preintegration term is linearized with respect to the gyroscope bias, and the following function is minimized:

min_{δbω} Σ_{k∈B} ‖ (q^v_{bk+1})⁻¹ ⊗ q^v_{bk} ⊗ γ^{bk}_{bk+1} ‖², (40)

where B represents all the frames in the window. The product of the quaternions expresses the rotation of the camera from the kth frame to the (k + 1)th frame composed with the gyroscope rotation from the (k + 1)th frame back to the kth frame; ideally this product is the identity rotation, and the optimization drives it there. Substituting the first-order approximation of γ^{bk}_{bk+1} with respect to δbω into formula (40), left-multiplying both sides by the inverse of the preintegrated relative rotation, and solving by Cholesky decomposition (multiplying both sides by the transpose of the Jacobian J^γ_{bω}), the initial calibration value of the gyroscope bias bω can be estimated, and the IMU preintegration terms are then repropagated with the corrected bias. The second is the initialization of the velocity, gravitational acceleration, and scale. The initialized state vector is

χ_I = [v^v_{b0}, v^v_{b1}, …, v^v_{bn}, g^v, s]ᵀ,

where v^v_{bk} represents the velocity of the kth image frame in the visual coordinate system, g^v represents the gravity vector in the visual coordinate system, and s is the estimated scale parameter. To sum up, the dimension of χ_I is 3(n + 1) + 3 + 1. The constraint relationship between the scale parameter and the velocities is then obtained from the preintegrated position and velocity terms of the visual SFM.
The optimization variables of the visual residual include the inverse depth λ_l, which represents the inverse depth value when the landmark point l is first observed by the jth image. The inverse depth is used as the optimization variable because it satisfies a Gaussian distribution and reduces the number of parameter variables in the actual optimization process. On this basis, this paper combines the finger joints and the sensors in the data glove to demarcate the movement of the finger joints.
This paper mainly considers the distal phalanx (TDP) and proximal phalanx (TPP) of the thumb, as shown in Figure 10, and the changes in the middle phalanges (MP) and proximal phalanges (PP) of the remaining four fingers.
This paper combines the algorithms described in the second part to construct the e-sports training system, and the overall framework of the system is shown in Figure 11. The system proposed in this paper is simulated on the MATLAB platform, and the gesture recognition effect of the system and its application effect in e-sports training are evaluated; the obtained results are shown in Tables 1 and 2. It can be seen from the above research that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

Conclusion
As an emerging sport, e-sports is mainly participated in by the younger generation, and its participants are increasingly young. E-sports can exercise people's thinking ability, resistance to psychological pressure, unity and cooperation, hand-eye coordination, and so on. Moreover, in the process of participating in e-sports, the younger generation develops an awareness of abiding by the rules, and participants are cultivated to be fair and open, to never admit defeat, and to pursue a stronger competitive spirit, which has a positive impact on their lives. This paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through gesture recognition. The research shows that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.
Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.