Predictive Modeling in Race Walking

This paper presents the use of linear and nonlinear multivariable models as tools to support the training process of race walkers. The models are built using data collected from race walkers' training events and are used to predict the result over a 3 km race based on training loads. The material consists of 122 training plans for 21 athletes. The best model is chosen using the leave-one-out cross-validation method. The main contribution of the paper is to propose nonlinear modifications of linear models in order to achieve a smaller prediction error. It is shown that the best model is a modified LASSO regression with quadratic terms in the nonlinear part. This model has the smallest prediction error and a simplified structure, as it eliminates some of the predictors.


Introduction
Today's high-performance sport is characterized by a very high and very even level of competition. Both coaches and competitors are forced to search for new and sometimes innovative solutions in the process of sports training [1]. One solution supporting this process is the application of various types of regression models.
Prediction in sport concerns many aspects including the prediction of performance results [2,3] or predicting sporting talent [4,5]. Models predicting results in sport, taking into account the seasonal statistics of each team, were also constructed [6]. The application of predictive models in athletics was described by Maszczyk et al. [2], where the regression was used to predict results in a javelin throw. These models were applied to support the choice and selection of prospective javelin throwers.
Prediction of sports results using linear regression was also presented in the work by Przednowek and Wiktorowicz [7]. A linear predictive model, implemented by ridge regression, was applied to predict the outcomes of a walking race after the immediate preparation phase. As input for the model, the basic somatic features (height and weight) and the training loads (training components) for each day of training were provided, and the output was the result expected over a distance of 5 km. In addition to the linear models, artificial neural networks, whose parameters were specified in cross-validation, were also used for this task.
In the paper by Drake and James [8], regressions estimating the results over distances of 5, 10, 20, and 50 km from the levels of selected physiological parameters (e.g., VO2max) were presented. The regressions applied were classical linear models, and the R2 criterion was chosen for the quality evaluation. The study included 23 women and 45 men. The amount of collected data differed depending on the task and ranged from 21 to 68 records.
A nonlinear regression equation to predict the maximum aerobic capacity of footballers was proposed by Chatterjee et al. [9]. The data came from 35 young players aged from 14 to 16. The experiment was to verify the use of the 20 m MST (Multistage Shuttle Run Test) in evaluating VO2max. The talent of young hockey players was identified by Roczniok et al. [5] using a regression equation. The research involved 60 boys aged between 15 and 16, who attended selection camps. The applied regression model classified candidates for future training based on selected parameters of the players. Logistic regression was used in the model as the classification method.

Computational Intelligence and Neuroscience
The nonlinear predictive models used in sport are also based on selected methods of "data mining" [10]. Among them, an important role is played by fuzzy logic expert systems. Papić et al. [4] described a practical application of such a system. The proposed system was based on the knowledge of experts in the field of sport, as well as on data obtained from motor tests. The model suggested the most suitable sport and was designed to search for prospective sports talents.
The application of fuzzy modeling techniques in sports prediction was also presented by Mężyk and Unold [11]. The goal of their paper was to find rules that can express a swimmer's feelings the day after in-water training. The data was collected over two months among competitors practicing swimming. The swimmers were characterized by a good level of sports attainment (2nd sport class). The material obtained consisted of 12 attributes, and the total number of models was 480, out of which 136 were used in the final stage. The authors showed that their method had better predictive ability than traditional methods of classification.
Other papers also concern the use of artificial neural networks in sports prediction [6]. Neural models are used to analyze the effectiveness of the training of swimmers, to identify handball players' tactics, or to predict sporting talent [12]. Many studies present the application of neural networks in various aspects of sports training [13][14][15]. These models support the planning of training loads, practice control, or the selection of sports.
An approach developed by the authors is the construction of models performing the task of predicting the results achieved by a competitor under a proposed sports training. This allows for the proper selection of training components and thus supports the achievement of the desired result. The aim of this study is to determine the effectiveness of selected linear and nonlinear models in predicting the outcome of a 3 km walking race for a proposed training. The research hypothesis of the paper is stated as follows: the prediction error of the 3 km result in race walking can be smaller for nonlinear models than for linear models.
The paper is organized as follows. In Section 2, the training data of the race walkers recorded during the annual training cycle is described. Section 3 contains the methods used to build the linear and nonlinear predictive models, including ordinary least squares regression, the regularized methods, that is, ridge, LASSO, and elastic net regressions, nonlinear least squares regression, and artificial neural networks in the form of the multilayer perceptron and the radial basis function network. In Section 3, the criterion used to evaluate the performance of the models, calculated using the mean square error in the process of cross-validation, is also defined. Section 4 describes the procedures used for building the models and their evaluation in the R language and in STATISTICA software. The obtained results are analyzed and discussed in Section 5. Finally, in Section 6, the performed work is concluded.

Material
The predictive models were built using the training data of athletes practising race walking. The analysis involved a group of colts and juniors from Poland. Among the competitors were finalists in the Polish Junior Indoor Championships and the Polish Junior Championships. The data of the race walkers was recorded during the 2011-2012 season in the form of training means and training loads. The training mean is the type of work performed, while the training load is the amount of work at a particular intensity done by an athlete during exercise [1]. In the collected material, 11 training means were distinguished. The material was drawn from the annual training cycle for the following four phases: transition, general preparation, special preparation, and starting phase. The training data has the form of sums of training loads completed in one month of the chosen training phase. The material included 122 training patterns made by 21 race walkers.
Control of the training process in race walking requires different tests of physical fitness at every training level. Because this research concerns competitors in the colt and junior categories, the result of a 3000 m race walk was used as a unified criterion of the level of training. The choice of the 3000 m distance is valid because this is the indoor walking competition distance.
The description of the variables under consideration and their basic statistics are presented in Table 1. The statistics are as follows: the arithmetic mean x̄, the minimum value x_min, the maximum value x_max, the standard deviation SD, and the coefficient of variation V = SD/x̄ · 100%. The qualitative variables are x_1, x_2, x_3, x_4, which take their values from the set {0, 1}. The other variables, that is, x_5, ..., x_18, are quantitative. If the value at inputs x_1, x_2, x_3 is 0, the transition phase is considered; setting the value 1 on one of the inputs x_1, x_2, x_3 selects the corresponding training phase. The variable x_4 represents the gender of the competitor, where the value 0 denotes a female and the value 1 denotes a male, and the age is represented by x_5. Basic somatic features of the race walkers, such as weight and height, are represented in the form of the BMI (x_6), expressed by the formula BMI = m/h^2, where m is the body weight [kg] and h is the body height [m]. The variable x_7 denotes the current result over 3 km in seconds. Training loads are characterized by the following variables: running exercises (x_8), walking with different levels of intensity (x_9, x_10, x_11), exercises forming different types of endurance (x_12, x_13, x_14), exercises forming technique (x_15), exercises forming muscle strength (x_16), exercises forming general fitness (x_17), and warm-up exercises (x_18).
An example of the data used for building the model is the vector x_5, which represents a 23-year-old race walker with BMI = 22.09 kg/m^2, who completes training in the special preparation phase. The result both before and after the training was the same and equal to 800 s.

Methods
In this study, two approaches were considered. The first approach was based on white box models realized by modern regularized methods. These models are interpretable because their structure and parameters are known. The second approach was based on black box models realized by artificial neural networks.
Figure 1: A diagram of a system with multiple inputs and one output.

Constructing Regression Models.
Consider a multivariable regression model with the inputs (predictors or regressors) x_j, j = 1, ..., p, and one output (response) y, shown in Figure 1. We assume that the model is linear and has the form

ŷ = w_0 + w_1 x_1 + ... + w_p x_p,

where ŷ is the estimated response and w_0, w_1, ..., w_p are the unknown weights of the model. The weight w_0 is called the constant term or intercept. Furthermore, we assume that the data is standardized and centered, so the model can be simplified to the form (see, e.g., [16])

ŷ = w_1 x_1 + ... + w_p x_p.

Observations are written as pairs (x_i, y_i), i = 1, ..., n, where x_i = [x_i1, ..., x_ip]^T, x_ij is the value of the jth predictor in the ith observation, and y_i is the value of the response in the ith observation. Based on formula (4), the ith observation can be expressed as ŷ_i = x_i^T w. Collecting the observations in the matrix X, whose rows are x_i^T, formula (5) can be written as ŷ = Xw, where ŷ = [ŷ_1, ..., ŷ_n]^T.

In order to construct the regression models, an error (residual) is introduced as the difference between the real value y_i and the estimated value ŷ_i, in the form of e_i = y_i − ŷ_i. Using matrix form (6), the error can be written as e = y − ŷ = y − Xw, where e = [e_1, ..., e_n]^T and y = [y_1, ..., y_n]^T.
Denoting by J(w, ·) the cost function, the problem of finding the optimal estimator can be formulated as minimizing the function J(w, ·), which means solving the problem

ŵ = argmin_w J(w, ·),

where ŵ is the vector of solutions. Depending on the function J(w, ·), different regression models can be obtained. In this paper, the following models are considered: ordinary least squares regression (OLS), ridge regression, LASSO (least absolute shrinkage and selection operator), elastic net regression (ENET), and nonlinear least squares regression (NLS).

Linear Regressions.
In OLS regression (see, e.g., [16-18]), the model is calculated by minimizing the sum of squared errors

J(w) = e^T e = ‖y − Xw‖_2^2,

where ‖·‖_2 denotes the Euclidean norm (L2). Minimizing the cost function (11), which is a quadratic function of w, we get the solution

ŵ = (X^T X)^(−1) X^T y.

It should be noted that solution (12) does not exist if the matrix X^T X is singular (due to correlated predictors or if p > n). In this case, different methods of regularization, including the previously mentioned ridge, LASSO, and elastic net regressions, can be used.
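As an illustration, the closed-form OLS solution can be computed directly from the normal equations. The sketch below uses NumPy and synthetic data (not the race-walking data), which are assumptions for the example.

```python
import numpy as np

# Minimal sketch of the OLS estimate w_hat = (X^T X)^(-1) X^T y.
# Noise-free synthetic data is assumed, so OLS recovers the true weights.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Solving the normal equations is numerically safer than forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_hat, w_true))  # True
```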
In ridge regression by Hoerl and Kennard [19], the cost function includes a penalty and has the form

J(w, λ) = e^T e + λ w^T w.

The parameter λ ≥ 0 determines the size of the penalty: for λ > 0 the model is penalized, and for λ = 0 ridge regression reduces to OLS regression. Solving problem (10) for ridge regression, we get

ŵ = (X^T X + λI)^(−1) X^T y,

where I is the identity matrix of size p × p. Because the diagonal of the matrix X^T X is increased by a positive constant, the matrix X^T X + λI is invertible and the problem becomes nonsingular. In LASSO regression by Tibshirani [20], similarly to ridge regression, a penalty is added to the cost function, but here the L1-norm (the sum of absolute values) is used:

J(w, λ) = e^T e + λ ‖w‖_1 = e^T e + λ z^T w,

where z = [z_1, ..., z_p]^T, z_j = sgn(w_j), and ‖·‖_1 denotes the Manhattan norm (L1). Because problem (10) is not linear in relation to y (due to the use of the L1-norm), the solution cannot be obtained in a closed form as in ridge regression. The most popular algorithm used in this case is the LARS (least angle regression) algorithm by Efron et al. [21].
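The ridge solution also has a closed form, sketched below on synthetic data; the value λ = 1 is an arbitrary illustrative choice.

```python
import numpy as np

# Minimal sketch of the ridge estimate w_hat = (X^T X + lam*I)^(-1) X^T y.
rng = np.random.default_rng(1)
n, p = 40, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 1.0  # illustrative penalty size

w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The L2 penalty shrinks the weights toward zero relative to OLS.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```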
In elastic net regression by Zou and Hastie [22], the features of ridge and LASSO regressions are combined. The cost function in the so-called naive elastic net has the form

J(w, λ1, λ2) = e^T e + λ1 z^T w + λ2 w^T w = (y − Xw)^T (y − Xw) + λ1 z^T w + λ2 w^T w.

To solve the problem, Zou and Hastie [22] proposed the LARS-EN algorithm, which is based on the LARS algorithm developed for LASSO regression. They used the fact that elastic net regression reduces to LASSO regression for an augmented data set (X*, y*).
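The augmentation idea can be sketched numerically: appending sqrt(λ2)·I to X and zeros to y turns the quadratic part of the penalty into ordinary squared error, which is the fact LARS-EN exploits. The NumPy sketch below, on synthetic data, verifies this for the ridge part alone (λ1 = 0).

```python
import numpy as np

# Sketch of the data-augmentation trick: OLS on the augmented set
# X* = [X; sqrt(lam2)*I], y* = [y; 0] reproduces the ridge solution,
# because X*^T X* = X^T X + lam2*I and X*^T y* = X^T y.
rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam2 = 0.5  # illustrative quadratic penalty

X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

w_aug = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_aug)
w_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)
print(np.allclose(w_aug, w_ridge))  # True
```

With λ1 > 0, running LASSO instead of OLS on the same augmented set yields the naive elastic net.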

Nonlinear Regressions.
To take into account the nonlinearity in the models, we can apply the transformation of predictors or use nonlinear regression. In this paper, the latter solution is applied.
In OLS regression, the model is described by formula (5), while in the more general nonlinear regression the relationship between the output and the predictors is expressed by a certain nonlinear function f(·) in the form of

ŷ = f(x, w).

In this case, the cost function J(w) is formulated as

J(w) = Σ_{i=1}^{n} (y_i − f(x_i, w))^2.

Since the minimization of function (18) is associated with solving nonlinear equations, numerical optimization is used in this case. The main problem connected with the construction of nonlinear models is the choice of an appropriate function f(·).
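The numerical minimization of such a cost function can be sketched with SciPy's least-squares optimizer; the function f chosen below is an arbitrary illustrative example, not the model used in the paper.

```python
import numpy as np
from scipy.optimize import least_squares

# Fit y = f(x, w) = w0 * (1 - exp(-w1 * x)) by numerically minimizing the
# sum of squared residuals; the data and f are illustrative assumptions.
def residuals(w, x, y):
    return y - w[0] * (1.0 - np.exp(-w[1] * x))

x = np.linspace(0.1, 5.0, 40)
w_true = np.array([2.0, 1.3])
y = w_true[0] * (1.0 - np.exp(-w_true[1] * x))  # noise-free target

fit = least_squares(residuals, x0=np.array([1.0, 1.0]), args=(x, y))
print(fit.x)  # recovered weights, close to w_true
```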

Artificial Neural Networks.
Artificial neural networks (ANNs) were also used for building the predictive models. Two types of ANNs were implemented: the multilayer perceptron (MLP) and networks with radial basis functions (RBF) [18]. The MLP network is the most common type of neural model. The calculation of the output in a 3-layer multiple-input-one-output network is performed in a feed-forward manner. In the first step, linear combinations, the so-called activations, of the input variables are constructed as

a_j = Σ_i w_ji^(1) x_i, j = 1, ..., M,

where w_ji^(1) denotes the weights of the first layer. From the activations a_j, using a nonlinear activation function h(·), the hidden variables are calculated as

z_j = h(a_j).

The function h(·) is usually chosen as the logistic or "tanh" function. The hidden variables are next used to calculate the output activation

a = Σ_j w_j^(2) z_j,

where w_j^(2) are the weights of the second layer. Finally, the output of the network is calculated using an activation function σ(·) in the form of y = σ(a).
For regression problems, the function σ(·) is chosen as the identity function, so we obtain y = a. The MLP network is trained by iterative supervised learning known as error backpropagation, which is based on gradient descent applied to the sum-of-squares error function. To avoid overtraining the network, the number of hidden neurons, which is a free parameter, should be chosen so as to give the best predictive performance.
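The forward pass described above can be sketched in a few lines. The 18-1-1 shape and the random weights below are illustrative assumptions, and no training is performed.

```python
import numpy as np

# Sketch of the forward pass of a 3-layer regression MLP with identity
# output activation: a = W1 @ x, z = tanh(a), y = W2 @ z.
rng = np.random.default_rng(3)
n_in, n_hidden = 18, 1  # mirrors an 18-1-1 architecture

W1 = rng.normal(size=(n_hidden, n_in))  # first-layer weights w^(1)
W2 = rng.normal(size=(1, n_hidden))     # second-layer weights w^(2)

def mlp_forward(x):
    a = W1 @ x          # input activations
    z = np.tanh(a)      # nonlinear hidden units
    return (W2 @ z)[0]  # identity output activation for regression

x = rng.normal(size=n_in)
y_out = mlp_forward(x)  # a single real-valued prediction
```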
In the RBF network, the concept of radial basis functions is used. Linear regression (5) is extended by a linear combination of nonlinear functions of the inputs in the form of

y(x) = Σ_j w_j φ_j(x),

where φ(x) = [φ_1(x), ..., φ_M(x)]^T is a vector of basis functions. Using nonlinear basis functions, we get a nonlinear model, which is, however, a linear function of the parameters w_j. In the RBF network, each hidden neuron computes a radial basis function whose value depends on the distance from a selected center c_j:

φ_j(x) = φ(‖x − c_j‖),

where ‖·‖ is usually the Euclidean norm. There are many possible choices for the basis functions, but the most popular is the Gaussian function. It is known that an RBF network can exactly interpolate any continuous function; that is, the fitted function passes exactly through every data point. In this case, the number of hidden neurons is equal to the number of observations, and the values of the coefficients are found by a simple standard inversion technique. Such a network matches the data exactly, but it has poor predictive ability because it is overtrained.
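The exact-interpolation property can be demonstrated with a small sketch: one Gaussian basis function per data point and the standard inversion technique give a fit that passes through every observation. The 1-D data and the basis width are illustrative assumptions.

```python
import numpy as np

# Sketch of exact RBF interpolation: with one center per data point, the
# interpolant matches every observation, illustrating why such a network
# fits perfectly but overtrains.
n = 10
x = np.linspace(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x)

def gaussian(r, width=0.1):
    return np.exp(-(r / width) ** 2)

Phi = gaussian(np.abs(x[:, None] - x[None, :]))  # n x n design matrix
w = np.linalg.solve(Phi, y)                      # standard inversion technique

y_hat = Phi @ w
print(np.allclose(y_hat, y))  # True: exact fit at the training points
```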

Choosing the Model.
In this paper, the best predictive model is chosen using the leave-one-out cross-validation (LOOCV) method [23], in which the number of tests is equal to the number of data points and a single pair (x_i, y_i) forms the testing set. The quality of the model is evaluated by the square root of the mean square error,

RMSE_CV = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i^(−i))^2 ),

where ŷ_i^(−i) denotes the output of the model built in the ith step of the validation process using a data set that does not contain the testing pair (x_i, y_i), and MSE_CV is the mean square error. In order to describe how well the model fits the training data, the root mean square error of training,

RMSE_T = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2 ),

is also considered, where ŷ_i denotes the output of the model built using the full data set and MSE_T is the mean square error of training.
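A sketch of the LOOCV computation of RMSE_CV for a simple OLS model follows; the synthetic data and the OLS refit are illustrative assumptions, not the paper's models.

```python
import numpy as np

# Leave-one-out cross-validation: each observation is held out once, the
# model is refit on the remaining n-1 rows, and RMSE_CV is the root of the
# averaged squared test errors.
rng = np.random.default_rng(5)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

sq_errors = []
for i in range(n):
    keep = np.arange(n) != i
    w = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    y_pred = X[i] @ w  # prediction for the held-out pair (x_i, y_i)
    sq_errors.append((y[i] - y_pred) ** 2)

rmse_cv = np.sqrt(np.mean(sq_errors))
print(rmse_cv)  # cross-validation error of the sketch model
```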

Implementation of the Predictive Models
All the regression models were calculated using the R language with additional packages [24]. The lm.ridge function from the "MASS" package [25] was used for calculating OLS regression (λ = 0) and ridge regression (λ > 0). With the enet function included in the "elasticnet" package [26], LASSO regression and elastic net regression were calculated. The parameters of the enet function are s ∈ [0, 1] and λ ≥ 0, where s is a fraction of the L1-norm and λ denotes λ2 in formula (16). The parameterization of elastic net regression using the pair (s, λ) instead of (λ1, λ2) in formula (16) is possible because elastic net regression can be treated as LASSO regression for an augmented data set (X*, y*) [22]. Assuming that λ = 0, we get LASSO regression with the single parameter s for the original data (X, y).
All the nonlinear regression models were calculated using the nls function from the "stats" package [27]. It calculates the parameters of the model using the nonlinear least squares method. One of the arguments of the nls function is a formula that specifies the function f(·) in model (18). To calculate the weights, the Gauss-Newton algorithm, selected by default in the nls function, was used. In all the calculations, it was assumed that the initial values of the weights are equal to zero.
For the implementation of artificial neural networks, StatSoft STATISTICA [28] software was used. The learning of MLP networks was implemented using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm [18]. While calculating the RBF network, the parameters of the basis functions were automatically set by the learning procedure.
The parameters in all models were selected using leave-one-out cross-validation. In the case of the regularized regressions, the penalty coefficients were calculated, while in the case of the neural networks, the number of neurons in the hidden layer was determined. The primary performance criterion of a model was the RMSE_CV error. The cross-validation functions in the STATISTICA program were implemented using the Visual Basic language.

Results and Discussion
From a coach's point of view, the prediction of results is very important in the process of sports training. A coach using a previously constructed model can predict how the training loads will influence the sport outcome. The presented models can be used for predictions based on a proposed monthly training, introduced as the sum of the training loads of each type implemented in a given month.
The results of the research are presented in Table 2; the selected regressions are described in the following paragraphs. The linear models, that is, OLS, ridge, and LASSO regressions, were calculated by the authors in work [3] and are only briefly described here. The nonlinear models implemented using nonlinear regression and artificial neural networks are discussed in greater detail. All the methods are compared taking into account the accuracy of the prediction. In the second column of Table 2, the weights w_0 and w_j of the OLS model are presented.
The search for the ridge regression model consists in finding the parameter λ for which the model achieves the smallest prediction error. In this paper, ridge regression models for λ changing from 0 to 2 with a step of 0.1 were analyzed. Based on the results, it was found that the best ridge model is achieved for λ_opt = 1. The prediction error RMSE_CV = 26.76 s was smaller than in the OLS model, while the training error RMSE_T = 22.82 s was greater (Table 2). The obtained ridge regression thus improved the predictive ability of the model. It is seen from Table 2 that, as in the case of OLS regression, all weights are nonzero and all the input variables are used in computing the output of the model. The LASSO regression model was calculated using the LARS-EN algorithm, in which the penalty is associated with the parameter s changing from 0 to 1 with a step of 0.01. It was found that the optimal LASSO regression is obtained for s_opt = 0.78. The best LASSO model generates the error RMSE_CV = 26.20 s, which improves on the results of the OLS and ridge models. However, it should be noted that this model is characterized by the worst data fit, with the greatest training error RMSE_T = 22.89 s. The LASSO method also determines an optimal set of input variables. It can be seen in the fourth column of Table 2 that the LASSO regression eliminated five input variables (x_4, x_6, x_14, x_15, and x_17), which makes the model simpler than the OLS and ridge regressions.
The use of the elastic net regression model did not improve the prediction error. The best model was obtained for the pair of parameters s_opt = 0.78 and λ_opt = 0. Because the parameter λ is zero, this model is identical to the LASSO regression (fourth column of Table 2).

Nonlinear Regressions.
Nonlinear regression models were obtained using various functions f(·) in formula (18). It was assumed that the function f(·) consists of two components: a linear part, in which the weights w_j are calculated as in OLS regression, and a nonlinear part containing higher-order expressions in the form of a quadratic function of selected predictors:

f(x) = Σ_j w_j x_j + Σ_j v_j x_j^2.

Four models were considered (Table 3), wherein none of them takes into account the squares of the qualitative variables x_1, x_2, x_3, and x_4 (v_1 = v_2 = v_3 = v_4 = 0):
(i) NLS1: both the weights w_j of the linear part and the weights v_5, ..., v_18 of the nonlinear part are calculated.
(ii) NLS2: the weights of the linear part are constant, and their values come from the OLS regression (the second column of Table 2); the weights v_5, ..., v_18 of the nonlinear part are calculated (the third column of Table 3).
(iii) NLS3: the weights of the linear part are constant, and their values come from the ridge regression (the third column of Table 2); the weights v_5, ..., v_18 of the nonlinear part are calculated (the fourth column of Table 3).
(iv) NLS4: the weights of the linear part are constant, and their values come from the LASSO regression (the fourth column of Table 2); the weights v_5, v_7, ..., v_13, v_16, v_18 of the nonlinear part are calculated (the fifth column of Table 3).
Based on the results shown in Table 3, the best nonlinear regression model is the NLS4 model, that is, the modified LASSO regression. This model is characterized by the smallest prediction error and the reduced number of predictors.
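The two-stage structure of NLS2-NLS4 can be sketched as follows. Since the quadratic part is linear in the weights v_j, fitting them with the linear part frozen reduces to a least-squares problem on the squared predictors; the synthetic data and the OLS linear part below are illustrative assumptions.

```python
import numpy as np

# Two-stage sketch: freeze the linear weights w (here from a plain OLS fit),
# then estimate the quadratic weights v by least squares on the residual.
rng = np.random.default_rng(6)
n, p = 60, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + 0.3 * (X[:, 1] ** 2)

w_fixed = np.linalg.solve(X.T @ X, X.T @ y)  # linear part, then frozen
residual = y - X @ w_fixed
X_sq = X ** 2
v = np.linalg.solve(X_sq.T @ X_sq, X_sq.T @ residual)  # quadratic part

# Adding the quadratic correction cannot increase the training error.
y_hat = X @ w_fixed + X_sq @ v
print(np.mean((y - y_hat) ** 2) <= np.mean(residual ** 2))  # True
```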

Neural Networks.
In order to select the best structure of the neural networks, the number of neurons K ∈ [1, 10] in the hidden layer was analyzed. In Figures 2, 3, and 4, the relationships between the cross-validation error and the number of hidden neurons are presented. The smallest cross-validation errors for the MLP(tanh) and MLP(exp) networks were obtained for one hidden neuron (18-1-1 architecture) and were, respectively, 29.89 s and 30.02 s (Table 4). For the RBF network, the best architecture was the one with four neurons in the hidden layer (18-4-1), and the cross-validation error in this case was 55.71 s. Comparing the results, it is seen that the best neural model is the MLP(tanh) network with the 18-1-1 architecture. However, it is worse than the best regression model, NLS4 (Table 3), by more than 5 seconds.

Conclusions
This paper presents linear and nonlinear models used to predict sports results for race walkers. By introducing a monthly training schedule for a selected phase of the annual cycle, a decline in physical performance may be predicted from the generated results, which makes it possible to introduce changes in the scheduled training in advance. The novelty of this research is the use of nonlinear models, including modifications of linear regressions and artificial neural networks, in order to reduce the prediction error generated by linear models. The best model was the nonlinear modification of LASSO regression, for which the error was 24.6 seconds. In addition, this method simplified the structure of the model by eliminating 9 out of 32 predictors. The research hypothesis was thus confirmed. A comparison with other results is difficult because of the lack of publications concerning predictive models in race walking. Experts in the fields of sports theory and training were consulted during the construction of the models in order to maintain the theoretical and practical principles of sports training. The importance of the work is that practitioners (coaches) can use the predictive models for planning training loads in race walking.