Performance Optimization Mechanism of Adolescent Physical Training Based on Reinforcement Learning and Markov Model

When teenagers fail to obtain sufficient physical exercise during the growth and development stage, the related central nervous system is prone to degeneration and physical fitness gradually declines. By monitoring the exercise process in real time and quantifying the exercise data, adolescent physical training can be conducted effectively. This process involves two issues, i.e., real-time data monitoring and quantitative data evaluation. Therefore, this paper proposes a novel method based on Reinforcement Learning (RL) and a Markov model to monitor and evaluate the physical training effect. Specifically, RL is used to optimize the adaptive bit rate of the surveillance video and support real-time data monitoring, while the Markov model is employed to evaluate the health condition during physical training. Finally, we develop a real-time exercise-data monitoring system and compare the proposed mechanism with state-of-the-art mechanisms on this system platform. The experimental results indicate that the proposed performance optimization mechanism conducts physical training more efficiently. In particular, the average evaluation deviation rate based on the Markov model is controlled within 0.16%.


Introduction
The physical fitness of teenagers has long attracted global attention because it has a considerably important influence on the rise and fall of each country. In fact, when teenagers fail to obtain sufficient physical exercise during the growth and development stage, the related central nervous system gradually starts to degenerate and physical fitness declines with it. Furthermore, sudden death events among high school and college students are frequently reported, which has turned public attention to the fitness problem during the process of physical training [1]. In short, adolescent physical training is of great importance, while the exercise (whether public/private or individual/population) must be done properly. Research shows that the generated physical training data not only reflects the real trajectories of exercisers but also implies abundant and valuable information related to the whole exercise process [2]. Specifically, this information includes time, speed, acceleration, steps, and energy consumption. Among them, energy consumption is an important metric that releases two key signals, i.e., the amount of exercise and the exercise intensity. Given this, the amount-of-exercise and exercise-intensity information can be obtained easily by monitoring the consumed energy, and then the physical training can be conducted and adjusted, which constitutes healthy and reasonable exercise. In addition, according to the monitored energy consumption, unforeseen circumstances due to overtraining can be discovered in a timely manner, avoiding worse tragedies as much as possible.
According to the above statements, we can observe that video surveillance plays a nonnegligible role during the process of monitoring physical training data. However, traditional video surveillance shows some limitations. On the one hand, the growth of computation speed is unable to keep pace with the increase of application data; on the other hand, the inherent transmission overhead is very large, and the provided network bandwidth does not always match the transmission of sliced segments [3]. Therefore, it is necessary to optimize the Adaptive Bit Rate (ABR) [4] to guarantee obtaining real-time monitoring data. A typical ABR algorithm includes a caching stage and a stable stage [5]. In the first stage, the ABR algorithm usually tends to fill up the cache as quickly as possible; in the second stage, it tries its best to improve the quality of the video segments and prevent cache overflow. At present, there have been some ABR optimization algorithms, including traditional ones and Artificial Intelligence-(AI-) based ones. To the best of our knowledge, the traditional ABR optimization algorithms cannot obtain the real-time network status to adapt to the dynamic network environment. On the contrary, AI-based ABR optimization algorithms can adaptively adjust the network parameters and obtain a relatively optimal video transmission [6]. In terms of AI, Reinforcement Learning (RL) [7] is the most popular representative. RL can retrieve the demanded data through information exchange between the intelligent agent and the external environment, without preparing additional training datasets beforehand. Compared with other RL-based ABR optimization algorithms, the Q-learning-based ABR optimization algorithms achieve better quality of experience.
However, the current Q-learning-based ABR optimization algorithms fail to encode continuous state values and cannot achieve fast convergence over a large state space. Consequently, this paper improves Q-learning to optimize the ABR algorithm.
In addition to data monitoring, the quantitative evaluation of physical training data is also very significant. Specifically, physical training features can be extracted by analyzing the monitored exercise data [8]; on this basis, the embedded laws of physical training can be explored and the corresponding physical fitness evaluation models can be built, covering, e.g., exercise effect, exercise tolerance, and improvement situation. Based on the evaluation model, differentiated physical training programs can be developed effectively.
With the above considerations, this paper proposes a novel method based on RL and a Markov model to monitor and evaluate the adolescent physical training effect. The major contributions are summarized as follows. (i) Q-learning-based RL is exploited to optimize the ABR of the surveillance video by combining the nearest neighbor algorithm. (ii) The Markov model is used to evaluate the health condition during physical training by considering the energy consumption metric. (iii) A real-time exercise-data monitoring system is implemented, and the performance optimization effect on adolescent physical training is demonstrated on the system platform. The rest of the paper is organized as follows. Section 2 reviews the related work. The improved Q-learning-based ABR optimization is proposed in Section 3. Section 4 gives the physical fitness evaluation model. The experimental results are shown in Section 5. Section 6 concludes this paper.

Related Work
Physical training has long been a concern, and some cutting-edge works have been developed. Buckinx et al. evaluated the effect of citrulline supplementation combined with high-intensity interval training on physical performance in healthy older adults [9]. Ana et al. proposed a multicomponent exercise training method combined with nutritional counselling to improve physical education [10]. Konstantinos et al. presented a study comparing the effectiveness of virtual and physical training for teaching a bimanual assembly task in a novel approach, where task complexity was introduced as an indicator of assembly errors during final assembly [11]. Roland Van Den et al. studied the training order of explosive strength and plyometrics training on different physical abilities in adolescent handball players [12]. Rodrigo et al. investigated plyometric training on soccer players' physical fitness by considering muscle power, speed, and change-of-direction speed tasks [13]. Simpson et al. enhanced physical performance in professional rugby league players via optimised force-velocity training [14]. Unquestionably, the above references show professional research on physical training. In spite of this, they did not provide a networked physical training mode, i.e., they disregarded the transmission of data generated from video surveillance. To this end, Sun and Zou concentrated on video transmission and improved the performance of extended training by using mobile edge computing [3]. However, [3,[9][10][11][12][13][14] still did not pay attention to ABR optimization during the transmission of physical training data. ABR optimization plays an important role in the networked physical training mode. Traditional optimization algorithms are usually heuristics. Cicco et al. announced two policies to optimize ABR, i.e., gradually increasing but rapidly decreasing the bit rate [15].
Heuristic ABR optimization suffers from suboptimality and oscillation of transmission quality, which has a considerable influence on the quality of experience. As a result, Mok et al. paid attention to the improvement of quality of experience [16], which could guarantee that the transmission quality stayed at a stable level. Furthermore, the traditional ABR optimization algorithms also had a limitation, i.e., they could not build predictable and describable mathematical models for the concrete problems. To this end, some researchers used control theory to optimize the ABR, where the controller was responsible for handling the input parameters. For example, Xiong et al. proposed an adaptive control model based on fuzzy logic, which could effectively cope with dynamic network change [17]. Besides, Vergados et al. used fuzzy logic to design an adaptive policy by inputting the varying caching information [18], preventing cache overflow. Although [17,18] achieved a good effect on ABR optimization, they could not obtain the real-time network status to adapt to the dynamic network environment. To this end, some AI-based ABR optimization algorithms were proposed. For example, Chien et al. mapped the feature values related to network bandwidth to the bit rate of the video by using a random forest classification decision tree [19]. Basso et al. [20] trained a classification model and estimated the bit rate based on it. In fact, ABR optimization algorithms like [19,20] needed a ready-made dataset for training. Instead, RL-based ABR optimization algorithms can easily obtain the physical training data without preparing additional datasets. Among them, the Q-learning-based ABR optimization algorithms obtain the relatively best quality of experience [21].
In spite of this, it is very difficult for the current Q-learning-based ABR optimization algorithms to encode continuous state values and realize fast convergence in the case of a large state space. Building an evaluation model for physical training data is very significant because it can effectively conduct and adjust the physical training. ElSamahy et al. presented a computer-based system for safe physical fitness evaluation of subjects undergoing aerobic physical stress, in which a proportional-integral fuzzy controller was applied to control the applied physical stress and ensure that the predefined target heart rate was not exceeded, so as to satisfy safety [22]. Zhong and Hu designed a WebGIS-based interactive platform to collect and analyze national physical fitness-related indicators, including realizing the seven functional modules [23]. Heldens et al. studied a care data evaluation model to address the association between performance parameters of physical fitness and postoperative outcomes in patients undergoing colorectal surgery [24]. Ma proposed a multilevel estimation and fuzzy evaluation of the physical fitness and health effect of college students in regular institutions of higher learning based on the classification and regression tree algorithm [25]. Qu et al. considered physical fitness evaluation in children with congenital heart diseases versus a healthy population [26]. Although the above references built evaluation models for physical fitness, they did not address adolescent physical training. Regarding this, Guo et al. proposed a machine learning-based physical fitness evaluation model oriented to wearable running monitoring for teenagers, in which a variant of the gradient boosting machine combined with advanced feature selection and Bayesian hyperparameter optimization was employed to build the model [27].
In spite of this, [27] did not concentrate on ABR optimization, and thus cannot complete the performance optimization for adolescent physical training.

Q-Learning-Based RL for ABR Optimization
The RL-based ABR optimization algorithms show a trade-off between state space division and convergence speed. To be specific, if the state space is divided in a more fine-grained way, more adequate states are generated and the system behaviors can be described more precisely, but this causes a slow convergence problem. On the contrary, if the division granularity is relatively large, the number of states becomes small and the convergence speed can be accelerated. Besides, the states in the ABR optimization problem are usually continuous, and the current Q-learning-based ABR optimization algorithms only apply simple discretization to these states. Therefore, this section combines the nearest neighbor algorithm to address the abovementioned problems.

ABR Decision Model.
Suppose that each code rate involves N video segments, denoted by seg_1, seg_2, . . . , seg_N, respectively. The client can select the corresponding segment at some code rate according to the network status information, such as network bandwidth, caching condition, and so on. In fact, the video segment selection can be regarded as a sequential decision process, and the decision objective is to guarantee stable video display at a high code rate while the network bandwidth keeps changing dynamically. Given this, this paper assumes that there is an intelligent agent which determines how to download the video segments. Mathematically, for any seg_i, we can observe information such as the network bandwidth (denoted by BD_i), the caching state (denoted by CH_i), and the previous segment's quality (denoted by q_{i-1}), and the corresponding environment state is defined as s_i = (BD_i, CH_i, q_{i-1}). The intelligent agent selects the specific video segment from the different code rates according to the perceived information. The selection behavior regarding seg_i is called an intelligent action, denoted by a_i. After a_i is completed, CH_i is updated by referring to BD_i and the selected code rate (denoted by br(a_i)). Let T_i denote the duration time of seg_i; the corresponding download time is then

dt_i = (T_i × br(a_i)) / BD_i. (1)

Furthermore, for seg_{i+1}, the corresponding cache is

CH_{i+1} = max(CH_i − dt_i, 0) + T_i, (2)

which suggests that the segment lag time equals the caching consumption time. To the best of our knowledge, the strategy conducted by the intelligent agent usually is not optimal. Thus, we give a reward function (denoted by R_i) to guide the intelligent agent towards the optimal level. The design of the reward function should follow the requirements of service quality, which mainly include the following three factors: the quality of the video segment, the quality variation between two consecutive segments, and the risk coefficient with respect to cache overflow.
Therein, a higher evaluation of service quality comes from a higher first factor and smaller second/third factors. With the above three factors considered, R_i is defined as

R_i = q_i − c · |q_i − q_{i-1}| − φ_i, (3)

where c is a regulatory parameter to weight the difference between q_i and q_{i-1}; φ_i measures the risk coefficient with respect to cache overflow, which has two functions: guaranteeing that the caching keeps a safe level and avoiding actions that cause low caching, since these easily trigger duplicate caching events. Accordingly, φ_i is composed of two parts (formula (4)) that represent the safe caching level and the duplicate caching event, respectively; α and β are two regulatory parameters that determine, respectively, whether the safe caching level and the duplicate caching event are considered.
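To make the three-factor reward concrete, the following Python sketch computes R_i under stated assumptions: the quality term, a scaled quality-variation penalty, and a risk term φ_i built from an illustrative safe-level term and a duplicate-caching penalty. The thresholds and the exact form of φ_i are hypothetical, not the paper's exact definitions.

```python
def risk_coefficient(cache_s, alpha=0.6, beta=0.4, safe_level=10.0, low_level=4.0):
    """Hypothetical cache-risk term phi_i: alpha weights the overshoot
    above a safe caching level, beta adds a fixed penalty when the cache
    is low enough to trigger duplicate caching events. Both terms are
    assumptions, not the paper's exact formula (4)."""
    safe_term = alpha * max(0.0, cache_s - safe_level)       # overflow risk
    dup_term = beta * (1.0 if cache_s < low_level else 0.0)  # duplicate-caching risk
    return safe_term + dup_term


def reward(q_i, q_prev, cache_s, c=0.45):
    """Instant reward R_i: favour high segment quality, penalise quality
    jumps between consecutive segments and risky cache states."""
    return q_i - c * abs(q_i - q_prev) - risk_coefficient(cache_s)
```

A steady high-quality segment with a healthy cache scores higher than one with a quality jump and a near-empty cache, which is the behaviour the three factors are meant to encourage.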
In fact, the ABR optimization problem based on the sequential segment selections of the intelligent agent can be modelled as a Markov decision process [28], which is expressed by four attributes, i.e., state space, action space, conditional transition probability, and instant reward function, denoted by S, A, P, and R, respectively. As mentioned earlier, the state space includes the network bandwidth, the caching state, and the previous segment's quality. The action space is the set of all available code rates. In particular, the strategy conducted by the intelligent agent is the mapping relationship from state space to action space, i.e., F: S ⟶ A. Furthermore, when a_i is finished, the corresponding s_i changes with it and is converted into s_{i+1}; this is described by the conditional transition probability, denoted by P(s_{i+1}|s_i, a_i). Moreover, the intelligent agent's decision objective is to obtain the optimal long-term benefits under some strategy F, and the long-term benefits function is defined as

V_F(s_0) = E[ Σ_{i≥0} λ^i · R_i | s_0, F ], (5)

where s_0 is the initial state and λ is the discount parameter, between 0 and 1. When λ = 0, the action only pays attention to the current benefits irrespective of the expected future benefits. As λ increases, the action pays more attention to the expected future benefits. Suppose that F* is the optimal strategy guaranteeing that the long-term benefits are maximal; then we have

F* = argmax_F V_F(s_0). (6)
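The role of the discount parameter λ can be illustrated with a short finite-horizon sketch of the long-term benefits function; the function name is ours, and the truncation to the observed rewards is an assumption.

```python
def long_term_benefit(rewards, lam=0.9):
    """Finite-horizon discounted return: sum of lam**i * R_i over the
    observed instant rewards. lam = 0 keeps only the immediate reward;
    lam close to 1 emphasises expected future rewards."""
    return sum((lam ** i) * r for i, r in enumerate(rewards))
```

With lam=0 only the first reward survives, matching the "current benefits only" case described above.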

K-Nearest Neighbor Proposal.
In order to obtain the optimal strategy of formula (6), the long-term benefits function is rewritten in the recursive form

V(s_i) = max_{a_i ∈ A} [ R_i + λ · Σ_{s_{i+1} ∈ S} P(s_{i+1}|s_i, a_i) · V(s_{i+1}) ]. (7)

As we know, formula (7) can be solved by dynamic programming to obtain the optimal strategy. However, dynamic programming has high computation complexity, and thus Kröse [29] uses the Q-learning method to obtain the optimal strategy with relatively low computation complexity. Q-learning maintains a Q-table which includes the entries mapping from states to actions. As mentioned above, Q-learning has two limitations; thus, this paper uses the K-nearest neighbor algorithm to optimize it. The K-nearest neighbor algorithm was first proposed by Cover and Hart [30] in 1967, and it belongs to the instance-based learning methods. In the K-nearest neighbor algorithm, the Euclidean distance between two samples is usually used to measure similarity, where a larger Euclidean distance means lower similarity. For example, suppose that X = (x_1, x_2, . . . , x_m) and Y = (y_1, y_2, . . . , y_m) are two data samples in an m-dimensional space; then the Euclidean distance between X and Y is defined as

d(X, Y) = sqrt( Σ_{j=1}^{m} (x_j − y_j)² ). (8)
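The similarity measure of formula (8) is straightforward to state in code; this is a plain implementation of the Euclidean distance for m-dimensional samples.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two m-dimensional samples; a larger
    distance means lower similarity between the samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```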

ABR Optimization.
We employ Q-learning based on the K-nearest neighbor algorithm to optimize the ABR, and the corresponding state division is shown in Figure 1. We can observe that when and only when the middle value of each interval is determined, the state is found. Under such a condition, the state's Q-value can be obtained by referring to the neighbouring Q-value(s). Suppose that d_i is the Euclidean distance between two states; it is defined as

d_i = sqrt( (d_BD_i)² + (d_CH_i)² + (d_q_i)² ), (9)

where d_BD_i, d_CH_i, and d_q_i are the Euclidean distances regarding the two network bandwidths, the two cache states, and the two previous segments' qualities, respectively, and their computations are similar to formula (8). After obtaining the Euclidean distances between s_i and each state in the Q-table, the first K nearest distances are selected and their corresponding Q-values are used to compute the array of Q-values for s_i:

Q(s_i, :) = Σ_{s_i′} w_i · Q(s_i′, :), (10)

where s_i′ is a neighbouring state of s_i and w_i is the Q-value proportion of s_i′. The larger the corresponding d_i, the smaller w_i is set. Given an intermediate variable ρ_i = 1/d_i, w_i is defined as

w_i = ρ_i / Σ_j ρ_j. (11)

Furthermore, after the action is finished, the Q-table is updated according to the returned instant reward and the new state. If the current state is found in the state division table, i.e., s_i ∈ S, the Q-value is updated with the standard Q-learning rule

Q(s_i, a_i) ← (1 − η) · Q(s_i, a_i) + η · [R_i + λ · max_a Q(s_{i+1}, a)], (12)

where η is the learning rate. On the contrary, if the current state cannot be found in the state division table, i.e., s_i ∉ S, the Q-value update is distributed over the K nearest states according to their weights (formula (13)), where Υ is an intermediate variable, ξ is a regularity parameter, and s_i″ is the next K state after s_i′. According to the above statements, the pseudocode of the ABR optimization based on Q-learning with consideration of the K-nearest neighbor algorithm is described in Algorithm 1.
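The distance-weighted interpolation of Q-values over the K nearest tabulated states (formulas (9)-(11)) can be sketched as follows. The Q-table layout (state tuple mapped to a list of Q-values, one per candidate code rate) and the epsilon guard against zero distances are implementation assumptions.

```python
import math


def knn_q_values(state, q_table, k=6):
    """Estimate the Q-value array of an unseen continuous state by
    weighting the Q-values of its K nearest tabulated states with
    w_i = rho_i / sum(rho), where rho_i = 1/d_i (inverse Euclidean
    distance), so nearer states contribute more."""
    scored = []
    for s, qvals in q_table.items():
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(state, s)))
        scored.append((d, qvals))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    rhos = [1.0 / (d + 1e-9) for d, _ in nearest]  # guard against d = 0
    total = sum(rhos)
    n_actions = len(nearest[0][1])
    return [sum(rho * q[a] for rho, (_, q) in zip(rhos, nearest)) / total
            for a in range(n_actions)]
```

For two equidistant neighbours the estimate reduces to their plain average, which matches the intent of the proportional weighting.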

Physical Fitness Evaluation
The quantitative evaluation of physical training data is also very significant. In fact, the physical training process is intricate and has strong randomness, which makes quantitative evaluation difficult. The traditional physical evaluation models (e.g., [22][23][24][25][26]) usually consider relatively simple factors with subjectivity, which brings a few limitations. Thus, a proper model is required to evaluate the physical training.

Thought Incubation.
The physical training process has the feature of randomness; that is, the subsequent exercise state only depends on the current exercise state and has no connection with the historical exercise states. This conforms to a Markov process; thus, this paper uses the Markov model to simulate the physical training process and further presents the related evaluation models, including individual exercise modelling and population exercise modelling.
The main ideas are summarized as follows: (i) Individual Exercise Modelling. (a) The generated physical training data is given a rank; (b) the matrix of transition probabilities is obtained from the varying data ranks; (c) the vector of stability probabilities is computed by referring to the stability of the Markov process, to predict the stable state; and (d) the subsequent physical training is conducted based on the exercise limit.
(ii) Population Exercise Modelling. The first two steps are similar to those in the individual exercise modelling. The third step is to compare the generated population data and adjust the improvement degree to adapt to the whole physical training effect.
Furthermore, the energy generated during the physical training process usually reflects the situation of physical fitness; thus, this paper regards the energy consumption as the evaluation metric. Particularly for adolescent physical training, the consumed energy gradually increases at the initial stage; after reaching a relatively stable level, it begins to decrease rapidly until the teenagers lose their strength. In terms of the individual exercise modelling, we adopt the energy consumption rate as the evaluation metric, which is defined as

cr_t = EE_t / ET_t, (15)

where t ∈ [1, t_max] indexes the time periods used for data collection and t_max is the maximal number of periods; EE_t is the consumed energy of the t-th data collection; and ET_t is the time spent to complete EE_t. In terms of the population exercise modelling, we adopt the improvement degree with respect to the energy consumption transition as the evaluation metric (formula (16)), which is determined by the transition probability ep_ij, the energy consumption span difference ed_ij (computable by formula (8)), and the regularity parameter c.
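The individual metric of formula (15) is simply the ratio of consumed energy to elapsed time per collection period; a minimal sketch:

```python
def consumption_rates(EE, ET):
    """Energy consumption rate per collection period: cr_t = EE_t / ET_t,
    where EE_t is the consumed energy of period t and ET_t the time
    spent to complete it."""
    return [ee / et for ee, et in zip(EE, ET)]
```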

Individual Modelling.
Based on the Markov model and the ABR optimization algorithm (see Section 3), the consumed energy data can be monitored and computed easily. After that, the sequence of energy consumption rates can be obtained by formula (15). For this sequence, the individual evaluation model based on the Markov process is described as follows. (i) State Interval Division. The maximal value and the minimal value in the sequence are found, denoted by cr_max and cr_min, respectively. Suppose that there are θ divided state intervals; the length of each interval is then (cr_max − cr_min)/θ, and the divided intervals are obtained on this basis. (ii) Transition Matrix Computation. The transition probability matrix eP is estimated from the interval transitions between consecutive collection periods. (iii) Stable-State Vector Determination. When t_max energy consumption rates are completed, we give an initial state vector denoted by eS_{t_max}, whose components sum to 1. According to the stability of the Markov chain, we can obtain a state vector eS* = (es_1, es_2, . . . , es_θ) satisfying eS* = eS_{t_max} · eP, where eS* is called the stable-state vector. (iv) Limited Energy Consumption Rate Computation. For the θ state intervals, their maximal values are selected, where er_max_i denotes the maximal value of the ith interval, and the limited energy consumption rate is defined as

cr_lim = Σ_{i=1}^{θ} es_i · er_max_i. (19)

To sum up, if cr_i is larger than cr_lim, the current physical training is dangerous and the system notifies the teenagers to slow down the physical training.
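The individual evaluation pipeline above (interval division, transition matrix estimation, stable-state vector, limited rate) can be sketched end to end in Python. The uniform initial state vector, the fixed-point iteration used to approximate the stable-state vector, and the handling of empty intervals are our assumptions.

```python
def individual_limit(rates, theta=4, iters=200):
    """Individual evaluation model sketch: divide the energy consumption
    rates into theta equal intervals, estimate the transition matrix eP
    from consecutive interval pairs, iterate towards the stable-state
    vector eS*, and combine it with each interval's maximum rate into
    the limited rate cr_lim."""
    cr_min, cr_max = min(rates), max(rates)
    length = (cr_max - cr_min) / theta            # interval length

    def rank(r):                                  # interval index 0..theta-1
        if length == 0:
            return 0
        return min(int((r - cr_min) / length), theta - 1)

    ranks = [rank(r) for r in rates]
    # transition probability matrix eP from consecutive rank pairs
    counts = [[0.0] * theta for _ in range(theta)]
    for a, b in zip(ranks, ranks[1:]):
        counts[a][b] += 1.0
    eP = []
    for row in counts:
        s = sum(row)
        eP.append([c / s for c in row] if s else [1.0 / theta] * theta)
    # approximate the stable-state vector eS* by repeating eS <- eS . eP
    eS = [1.0 / theta] * theta                    # assumed uniform start
    for _ in range(iters):
        eS = [sum(eS[i] * eP[i][j] for i in range(theta))
              for j in range(theta)]
    # limited rate: stable weights times each interval's maximal rate
    er_max = [max((r for r in rates if rank(r) == i), default=cr_min)
              for i in range(theta)]
    return sum(es * er for es, er in zip(eS, er_max))
```

A measured rate above the returned cr_lim would mark the current training as dangerous, triggering the slow-down notification.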

Population Modelling.
Suppose that one population includes Nump teenager individuals; the average energy consumption for the population is defined as

EP_ave = (1/Nump) · Σ_{i=1}^{Nump} Ei_i, (20)

where Ei_i is the energy consumed by the ith teenager individual in the population. In a similar way, the sequence of average energy consumptions can be obtained by formula (20), denoted by EP¹_ave, EP²_ave, . . . , EP^{t_max}_ave; that is, there are t_max collection periods. For this sequence, the population evaluation model based on the Markov process is analogous to the individual one. In total, if eK_ij is larger than the average Σ_{j′} eK_{ij′}/(θ − 1), the physical training effect has been improved and the system notifies the population to enhance the physical training.
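The population metric of formula (20) averages the energy consumed across the Nump individuals in one collection period; a minimal sketch:

```python
def population_average(energies):
    """Average energy consumption EP_ave over one collection period:
    the mean of the Nump individual energy values."""
    return sum(energies) / len(energies)
```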

Simulation Results
In this section, we pay attention to the simulation experiments. First, we develop the data monitoring system. Then, we test the physical training evaluation models. Finally, the whole performance optimization of the adolescent physical training is verified. The last two parts are based on the developed system platform. Regarding the simulation settings, we run different simulations and select one proper combination.

System Implementation.
The real-time data monitoring system depends on computer technology, communication technology, and sports science, and provides real-time exercise monitoring services by collecting data information with respect to physical training. For adolescent physical training, the data monitoring system platform architecture is shown in Figure 2. We observe that the system platform includes four modules, i.e., data collection, data receiving and sending, data analysis and handling, and data display. Among them, the last module can directly provide the reference for adolescent physical training according to the monitored data.

Input: State space and action space
Output: Q-value
(1) Initialize Q-table;
(2) for each state s_i do
(3)   Compute Q(s_i, :) with formula (10);
(4)   Confirm br(a_i) by Q(s_i, :);
(5)   Request to download seg_i;
(6)   Update CH_i with formula (2);
(7)   Compute R_i with formula (3);
(8)   if s_i ∈ S then
(9)     Update Q-value with formula (12);
(10)  else
(11)    Update Q-value with formula (13);
(12) endfor
ALGORITHM 1: ABR optimization algorithm.

Model Evaluation.
This section evaluates the two models, i.e., the individual evaluation model and the population evaluation model. The involved parameters are set as follows: c = 3, t_max = 300, θ = 24, and a time period of 30 s. In addition, we use the deviation rate to measure whether the evaluation models are acceptable. For the individual evaluation model, we test 1000 teenagers in 12 experiments, with a frequency of once a day. The experimental results on the conducted physical training conditions are shown in Table 1. For the population evaluation model, we also test 1000 populations in 12 experiments, where one population includes 20 teenagers and the frequency is once a day. The corresponding results on the population physical training conditions are shown in Table 2. Here, the deviation rate is defined as the ratio of the number of improper conducts to the total number of experiments.
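The deviation rate reported in Tables 1 and 2 is the share of improper conducts among all experiments, expressed as a percentage; a minimal sketch of how such values can be computed:

```python
def deviation_rate(improper, total):
    """Deviation rate in percent: the number of improper conducts
    divided by the total number of experiments."""
    return 100.0 * improper / total
```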
As seen from Tables 1 and 2, the deviation rate for each group experiment is always smaller than 0.3%. To be specific, the average deviation rate of the individual evaluation model is 0.158% and that of the population evaluation model is 0.092%; both values are controlled within 0.16%, which implies that it is efficient to use the Markov model to evaluate the adolescent physical training.
Furthermore, this implies that the Markov model has a better evaluation effect for the population physical training situation because 0.092% < 0.158%.

Performance Verification.
This section verifies the optimization performance of the adolescent physical training by comparing with two benchmarks, i.e., [3,27], published in Internet Technology Letters (ITL) and Computer Networks (CN), respectively. The whole transmission time and the packet loss rate are adopted as the two performance verification metrics. The involved parameters are set as follows: c = 0.45, α = 0.6, β = 0.4, λ = 0.9, K = 6, η = 0.35, and ξ = 0.4. In addition, the number of simulations is set to 10. The experimental results on the whole transmission time and the packet loss rate are shown in Tables 3 and 4, respectively.
It can be seen from Tables 3 and 4 that the proposed mechanism always consumes the smallest whole transmission time and achieves the lowest packet loss rate. This implies that the proposed mechanism has the optimal performance for the adolescent physical training. This is because it uses RL to obtain the relatively optimal solution and uses the Markov model to obtain the relatively accurate training effect. In addition, regarding the two metrics, we show the corresponding dispersion coefficients to evaluate the stability, as shown in Figure 3. We observe that the proposed mechanism always has the smallest dispersion coefficient due to the stability guarantee of using RL, which implies that the performance mechanism is the most stable.

Conclusions
The physical fitness of teenagers has long attracted global attention because it has a considerably important influence on the rise and fall of each country. This paper proposes to optimize the adolescent physical training based on RL and a Markov model. Because the RL-based ABR optimization algorithms show a trade-off between state space division and convergence speed, this paper improves Q-learning by using the K-nearest neighbor algorithm. In addition, we also present the evaluation models on physical fitness, including individual exercise modelling and population exercise modelling, based on the Markov model. Moreover, we conduct simulation experiments based on the developed data monitoring system platform, and the results demonstrate that the proposed mechanism always has the optimal optimization performance for the adolescent physical training with the most stable state. In the future, we will deploy more functions in our system platform, such as adaptive recognition and falling warning. Besides, we will also conduct large-scale experiments on a real testbed instead of the system platform.