A multilayer perceptron (MLP) with the back-propagation learning rule is adopted to predict the winning rates of two teams according to their official statistical data from the previous stages of the 2006 World Cup Football Game. There are training samples from three classes: win, draw, and loss. At each new stage, new training samples are selected from the previous stages and added to the training set, and the neural network is retrained; this is a type of on-line learning. The 8 features are selected by ad hoc choice. We use the theorem of Mirchandani and Cao to determine the number of hidden nodes, and after testing the learning convergence, the MLP is fixed as an 8-2-3 model. The learning rate and momentum coefficient are determined by cross-learning. The prediction accuracy reaches 75% if the draw games are excluded.
Neural network methods have been used in the analysis of sport data with good performance. Purucker employed supervised and unsupervised neural networks to analyze and predict the winning rate of the National Football League (NFL), and found that the multilayer perceptron (MLP) with supervised learning performed better than Kohonen's self-organizing map with unsupervised learning [
According to the schedule of 2006 WCFG, shown in Figure
A total of 64 matches over 5 stages for 32 teams in the schedule of 2006 WCFG.
Stage 1 is the group stage, a round-robin tournament with no extra time after the 90 minutes of regulation time. In this stage, 32 teams are divided into 8 groups (Groups A–H); each group has 4 teams, and each team plays 3 matches. There are 6 matches in each of the 8 groups, so there are 48 matches (Matches 1–48) in stage 1 in total. The scoring criterion is that a win earns 3 points, a draw 1 point, and a loss 0 points. After stage 1, the two teams with the higher points in each group enter the next stage. Table
The competition rule for stages 2–5 is a single-elimination tournament: a penalty shoot-out is required if the two teams are still tied after regulation time (90 minutes) and extra time (30 minutes). The winner enters the next stage and the loser is eliminated from the competition. Stage 2 is the round of 16, with 8 matches (Matches 49–56) for 16 teams. Stage 3 is the quarter-finals, with 4 matches (Matches 57–60) for 8 teams. Stage 4 is the semifinals, with 2 matches (Matches 61-62) for 4 teams. Stage 5 is the finals, in which the same 4 teams as in stage 4 play 2 games: the third place game (Match 63) and the final game (Match 64).
Score table after the 48 matches of stage 1 are finished.
| Group | Team | Win | Draw | Loss | Play | Point |
|---|---|---|---|---|---|---|
| A | *Germany* | 3 | 0 | 0 | 3 | 9 |
| A | Ecuador | 2 | 0 | 1 | 3 | 6 |
| A | Poland | 1 | 0 | 2 | 3 | 3 |
| A | *Costa Rica* | 0 | 0 | 3 | 3 | 0 |
| B | England | 2 | 1 | 0 | 3 | 7 |
| B | Sweden | 1 | 2 | 0 | 3 | 5 |
| B | Paraguay | 1 | 0 | 2 | 3 | 3 |
| B | Trinidad and Tobago | 0 | 1 | 2 | 3 | 1 |
| C | Argentina | 2 | 1 | 0 | 3 | 7 |
| C | Netherlands | 2 | 1 | 0 | 3 | 7 |
| C | Côte d'Ivoire | 1 | 0 | 2 | 3 | 3 |
| C | *Serbia and Montenegro* | 0 | 0 | 3 | 3 | 0 |
| D | *Portugal* | 3 | 0 | 0 | 3 | 9 |
| D | Mexico | 1 | 1 | 1 | 3 | 4 |
| D | Angola | 0 | 2 | 1 | 3 | 2 |
| D | Iran | 0 | 1 | 2 | 3 | 1 |
| E | Italy | 2 | 1 | 0 | 3 | 7 |
| E | Ghana | 2 | 0 | 1 | 3 | 6 |
| E | Czech Republic | 1 | 0 | 2 | 3 | 3 |
| E | USA | 0 | 1 | 2 | 3 | 1 |
| F | *Brazil* | 3 | 0 | 0 | 3 | 9 |
| F | Australia | 1 | 1 | 1 | 3 | 4 |
| F | Croatia | 0 | 2 | 1 | 3 | 2 |
| F | Japan | 0 | 1 | 2 | 3 | 1 |
| G | Switzerland | 2 | 1 | 0 | 3 | 7 |
| G | France | 1 | 2 | 0 | 3 | 5 |
| G | Korea Republic | 1 | 1 | 1 | 3 | 4 |
| G | *Togo* | 0 | 0 | 3 | 3 | 0 |
| H | *Spain* | 3 | 0 | 0 | 3 | 9 |
| H | Ukraine | 2 | 0 | 1 | 3 | 6 |
| H | Tunisia | 0 | 1 | 2 | 3 | 1 |
| H | Saudi Arabia | 0 | 1 | 2 | 3 | 1 |

Italicized teams won or lost all three of their group games and are selected as training samples (see below).
From the website of 2006 WCFG held in Germany [
Supervised prediction system.
We obtain 64 match reports from [
We consider the relative ratio as the input feature. Training samples and prediction samples are normalized to relative ratios as follows:

$$r_A = \frac{x_A}{x_A + x_B}, \qquad r_B = \frac{x_B}{x_A + x_B},$$

where $x_A$ and $x_B$ are the two teams' raw records of the same feature in a match; when $x_A + x_B = 0$ (as for IDFKG below), both ratios are set to 0.5.
The input features of the match GER versus CRC, before and after normalization, are listed in Table
Data normalization of GER versus CRC.
| Feature name | Before normalization: Team A (GER) | Before normalization: Team B (CRC) | After normalization: Team A (GER) | After normalization: Team B (CRC) |
|---|---|---|---|---|
| GF | 4 | 2 | 0.6666 | 0.3333 |
| S | 21 | 4 | 0.84 | 0.16 |
| SOG | 10 | 2 | 0.8333 | 0.1666 |
| CK | 7 | 3 | 0.7 | 0.3 |
| DFKG | 1 | 0 | 1 | 0 |
| IDFKG | 0 | 0 | 0.5 | 0.5 |
| BP | 63% | 37% | 0.63 | 0.37 |
| FS | 12 | 11 | 0.5217 | 0.4782 |
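As a concrete illustration, the short Python sketch below reproduces this relative-ratio normalization on the GER-versus-CRC records; the function name is ours, and the 0.5 value returned for a zero feature sum follows the IDFKG row above.

```python
# Relative-ratio normalization of one match's statistics, as described above.

def relative_ratio(x_a: float, x_b: float) -> tuple[float, float]:
    """Normalize one feature of two teams into relative ratios in [0, 1]."""
    total = x_a + x_b
    if total == 0:                      # e.g., IDFKG = 0 for both teams
        return 0.5, 0.5
    return x_a / total, x_b / total

# GER versus CRC raw records (BP percentages are used directly as 63 and 37).
features = {
    "GF": (4, 2), "S": (21, 4), "SOG": (10, 2), "CK": (7, 3),
    "DFKG": (1, 0), "IDFKG": (0, 0), "BP": (63, 37), "FS": (12, 11),
}

for name, (ger, crc) in features.items():
    r_ger, r_crc = relative_ratio(ger, crc)
    print(f"{name}: GER {r_ger:.4f}, CRC {r_crc:.4f}")
```

Running this reproduces the "after normalization" columns of the table up to rounding (e.g., GF gives 0.6667 and 0.3333).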
The MLP model with the BP learning algorithm has been important since 1986 [
MLP network used in predicting the winning rate of 2006 WCFG.
At stage 1, we select the teams that win or lose all three of their games as training samples; their data are representative of the win and loss classes, although this selection is also ad hoc. These teams are italicized in Table
From stage 2 onward, a winning team's record is added to the training samples for all subsequent stages' training only if that team won all three of its games at stage 1.
The selected training teams (background color is gray in Figure ) are determined as follows:

- Training samples for predicting the round of 16 (stage 2): we select the teams whose stage-1 score is either 9 points (3 wins) or 0 points (3 losses) as representatives of winning and losing teams; teams in draw games are also selected as training samples. The selected games' records are then normalized to relative ratios as training data. There are 4 teams with 3 wins (GER, POR, BRA, and ESP) and 3 teams with 3 losses (CRC, SCG, and TOG), so we obtain 21 samples from the 20 matches of these 7 teams; GER and CRC played one match against each other, and their records from it are selected as two training samples. Besides, there are 11 draw games (Matches 4, 13, 16, 23, 25, 28, 29, 35, 37, 40, and 44) in the group matches, giving 22 training samples for draw games. In total, there are 43 (= 21 + 22) training samples.
- Training samples for predicting the quarter-finals (stage 3): besides the 43 training samples from stage 1, we add the stage-2 match data of the teams that are not only winners at stage 2 but also had all 3 wins at stage 1. We find 3 such teams' records (GER, POR, and BRA) at stage 2. Also, one game (SUI versus UKR) was decided by penalty kicks, and we consider it a draw game, adding both teams' records. Therefore, we currently have 48 (= 43 + 3 + 2) training samples.
- Training samples for predicting the semifinals (stage 4): besides the above 48 training samples, we add 4 samples (GER, ARG, ENG, and POR) from the 2 draw games that needed penalty kicks at stage 3. Therefore, we have 52 (= 48 + 4) training samples in total.
- Training samples for predicting the finals (stage 5): because the two teams ITA and FRA did not have all 3 wins at stage 1, they are not selected as training samples. The training samples at stage 5 are the same as those at stage 4 (52 training samples).
We do not have to predict the game results at stage 1, but the records at stage 1 are extracted in order to predict the game results at the next stages. The input data used to predict the game results at stages 2–5 are therefore prepared as follows:

- Input data for predicting the round of 16 (stage 2): we take the average value of the 3 stage-1 match records of each team as its input data. In total, there are 16 input team data for the 8 games to be predicted in the round of 16.
- Input data for predicting the quarter-finals (stage 3): the input data are taken from the records of stages 1-2. We take the average value of the 4 match records (3 games from stage 1 and 1 game from stage 2) of each team as its input data. In total, there are 8 input team data for the 4 games to be predicted in the quarter-finals.
- Input data for predicting the semifinals (stage 4): the input data are taken from stages 1–3. We take the average value of the 5 match records of each team as its input data. Therefore, there are 4 input team data for the 2 semifinal games to be predicted.
- Input data for predicting the finals (stage 5): the input data are taken from stages 1–4. We take the average value of the 6 match records of each team that has entered the final stage as its input data. Therefore, 4 input team data are ready for the last two final games we want to predict.
Input data used to predict the winning rate of GER versus SWE at stage 2.
| Features | Team A (GER): Rec. 1 | Rec. 2 | Rec. 3 | Average | Team B (SWE): Rec. 1 | Rec. 2 | Rec. 3 | Average |
|---|---|---|---|---|---|---|---|---|
| GF | 0.6664 | 1 | 1 | 0.8889 | 0.5 | 1 | 0.5 | 0.6667 |
| S | 0.84 | 0.7619 | 0.6818 | 0.7929 | 0.75 | 0.7692 | 0.4286 | 0.6493 |
| SOG | 0.8333 | 0.7619 | 0.8182 | 0.7612 | 0.75 | 0.5152 | 0.3913 | 0.5522 |
| CK | 0.7 | 0.7143 | 0.45 | 0.6214 | 0.4737 | 0.6667 | 0.6667 | 0.6023 |
| DFKG | 1 | 0.5 | 0 | 0.5 | 1 | 0.5 | 0.5 | 0.6667 |
| IDFKG | 0.5 | 0.5 | 0 | 0.3333 | 0.5 | 0.5 | 0.5 | 0.5 |
| BP | 0.63 | 0.58 | 0.43 | 0.5467 | 0.6 | 0.57 | 0.37 | 0.54 |
| FS | 0.5217 | 0.5526 | 0.538 | 0.5374 | 0 | 0.4412 | 0.4194 | 0.2868 |
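The averaging step can be sketched in a few lines of Python. We assume a plain arithmetic mean over the normalized per-match records, as the text describes; minor deviations from the Average column above may come from rounding in the source.

```python
# Build one team's 8-dimensional MLP input by averaging its normalized
# per-match records feature by feature (GER's three stage-1 records above).

FEATURES = ["GF", "S", "SOG", "CK", "DFKG", "IDFKG", "BP", "FS"]

def average_records(records: list[list[float]]) -> list[float]:
    """Average per-match relative-ratio records into one input vector."""
    n = len(records)
    return [sum(match[i] for match in records) / n for i in range(len(FEATURES))]

ger_records = [
    [0.6664, 0.84, 0.8333, 0.7, 1.0, 0.5, 0.63, 0.5217],    # Rec. 1
    [1.0, 0.7619, 0.7619, 0.7143, 0.5, 0.5, 0.58, 0.5526],  # Rec. 2
    [1.0, 0.6818, 0.8182, 0.45, 0.0, 0.0, 0.43, 0.538],     # Rec. 3
]
print(average_records(ger_records))  # the 8 averaged features fed to the MLP
```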
Mirchandani and Cao [ ] showed that the maximum number of separable regions $M$ that an MLP with $h$ hidden nodes can form in a $d$-dimensional input space is $M(h, d) = \sum_{k=0}^{d} \binom{h}{k}$, with $\binom{h}{k} = 0$ for $h < k$. For the 52 training samples in 8 input dimensions used here, $h = 6$ hidden nodes suffice in theory, since $M(6, 8) = 2^6 = 64 \geq 52$ while $M(5, 8) = 32 < 52$.
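The bound can be checked computationally; the following sketch (names ours) finds the smallest number of hidden nodes whose region count covers the training set.

```python
# Smallest h with M(h, d) = sum_{k=0}^{d} C(h, k) >= number of samples.

from math import comb

def max_regions(h: int, d: int) -> int:
    """Maximum number of separable regions with h hidden nodes in d dimensions."""
    return sum(comb(h, k) for k in range(min(h, d) + 1))

samples, dim = 52, 8
h = 1
while max_regions(h, dim) < samples:
    h += 1
print(h, max_regions(h, dim))  # -> 6 64: six hidden nodes suffice in theory
```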
Kecman has recommended ranges for the learning rate and the momentum coefficient.
Procedure for determining the parameters of the BP learning rule.
| Parameter to determine | Fixed conditions | Variable conditions | Observed items |
|---|---|---|---|
| Momentum coefficient | (1) Hidden nodes; (2) learning rate | Momentum coefficient from 0.5 to 0.9 | Average convergent iterations and standard deviation of convergent iterations |
| Learning rate | (1) Hidden nodes; (2) momentum coefficient | Learning rate from 0.1 to 0.9 | Average convergent iterations |
To determine the momentum coefficient, we fix the other conditions and vary it from 0.5 to 0.9 over 40 tests; the resulting average and standard deviation of the convergent iterations are listed in Table
Average iterations and standard deviation of 40 tests under different settings of the momentum coefficient.
| Momentum coefficient | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|
| Average convergent iterations | 821.3 | 758.4 | 702.9 | 667.5 | 629.7 |
| Standard deviation of convergent iterations | 51.27 | 42.58 | 35.28 | 27.24 | 35.79 |
After setting the momentum coefficient, we then test different learning rates and observe the average convergent iterations, as listed in Table
Average convergent iterations under different settings of the learning rate.
| Learning rate | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|
| Average convergent iterations | 667 | 219 | 133.97 | 97.9 | 77.9 |
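For reference, the two parameters tuned above enter the standard BP weight update with momentum:

$$\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha\,\Delta w_{ij}(t-1),$$

where $\eta$ is the learning rate, $\alpha$ is the momentum coefficient, and $E$ is the output error. A larger $\alpha$ reuses more of the previous update, which is consistent with the faster convergence observed in the two tables above.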
From the previous theorem, 6 hidden nodes are needed for convergence with 52 training samples of 8 input dimensions. In practice, however, the number of required hidden nodes may be smaller than the theoretical value because of the data distribution, so it is worth checking how many hidden nodes the MLP needs to converge in this application. Tests are made from 6 hidden nodes down to 1 hidden node with all 52 training samples at stage 4, decreasing the number of hidden nodes by one for each learning convergence test. Figure
Plots of MSE versus iterations under 6 different hidden-node MLP models.
From the previous analysis, the final prediction model used in this study is an 8-2-3 MLP with BP learning, with the learning rate and momentum coefficient set to the values determined above.
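A minimal training sketch of such an 8-2-3 MLP with BP-plus-momentum learning is given below. The architecture and the use of momentum follow the text; the concrete $\eta$ and $\alpha$ values, the weight initialization, and the random training data are placeholders, since the exact settings are not recoverable here.

```python
# Sketch of an 8-2-3 MLP (8 inputs, 2 hidden nodes, 3 outputs: win/draw/loss)
# trained by back-propagation with a momentum term.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.uniform(-0.5, 0.5, (2, 9))   # hidden layer: 2 nodes, 8 inputs + bias
W2 = rng.uniform(-0.5, 0.5, (3, 3))   # output layer: 3 nodes, 2 hidden + bias
dW1 = np.zeros_like(W1)               # previous updates, for the momentum term
dW2 = np.zeros_like(W2)

eta, alpha = 0.5, 0.8                 # placeholder learning rate and momentum

# Placeholder training set: 52 samples of 8 relative-ratio features with
# one-hot targets for the three classes (win, draw, loss).
X = rng.uniform(0.0, 1.0, (52, 8))
T = np.eye(3)[rng.integers(0, 3, 52)]

for epoch in range(1000):
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)                    # input plus bias
        h = sigmoid(W1 @ xb)
        hb = np.append(h, 1.0)                    # hidden plus bias
        y = sigmoid(W2 @ hb)

        # Back-propagate the squared error through both sigmoid layers.
        delta_out = (y - t) * y * (1.0 - y)
        delta_hid = (W2[:, :2].T @ delta_out) * h * (1.0 - h)

        # Gradient step plus momentum (previous update scaled by alpha).
        dW2 = -eta * np.outer(delta_out, hb) + alpha * dW2
        dW1 = -eta * np.outer(delta_hid, xb) + alpha * dW1
        W2 += dW2
        W1 += dW1

y0 = sigmoid(W2 @ np.append(sigmoid(W1 @ np.append(X[0], 1.0)), 1.0))
print(f"sample-0 MSE after training: {np.mean((y0 - T[0]) ** 2):.4f}")
```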
We input the average data of each team into the well-trained MLP, and then compare the values of the first output node for the two teams in a game to determine the win or loss. The team with the larger first-output-node value, indicating the greater ability to win the game, is predicted as the winner. The winning-rate prediction results for the two teams in each game from stage 2 to stage 5 are listed in Table
Prediction results of winning rates from stage 2 to stage 5.
| Stage | Match | Team | MLP output: Node 1 (win) | Node 2 (draw) | Node 3 (lose) | Prediction result | Game result | Prediction correct | Prediction accuracy | Odds |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 49 | GER | 0.9914 | 0.2010 | 0 | W | W | Y | 85.7% (6/7) | 0.79 |
| | | SWE | 0.7866 | 0.3230 | 0.0001 | L | L | | | 1.26 |
| | 50 | ARG | 0.9604 | 0.0799 | 0 | W | W | Y | | 0.04 |
| | | MEX | 0.0338 | 0.9631 | 0.0102 | L | L | | | 28.4 |
| | 51 | ENG | 0.8325 | 0.2620 | 0.0001 | W | W | Y | | 0.88 |
| | | ECU | 0.7348 | 0.3779 | 0.0001 | L | L | | | 1.13 |
| | 52 | POR | 0.9909 | 0.0221 | 0 | W | W | Y | | 0.97 |
| | | NED | 0.9570 | 0.0864 | 0 | L | L | | | 1.04 |
| | 53 | ITA | 0.9856 | 0.0330 | 0 | W | W | Y | | 0.01 |
| | | AUS | 0.0129 | 0.9786 | 0.0329 | L | L | | | 76.5 |
| | 54 | SUI | 0.9843 | 0.0353 | 0 | W | D | N/A | | 0.79 |
| | | UKR | 0.7823 | 0.3176 | 0.0001 | L | D | | | 1.26 |
| | 55 | BRA | 0.9912 | 0.0212 | 0 | W | W | Y | | 0.22 |
| | | GHA | 0.2219 | 0.8161 | 0.0013 | L | L | | | 4.47 |
| | 56 | ESP | 0.9914 | 0.0209 | 0 | W | L | N | | 0.89 |
| | | FRA | 0.8848 | 0.1954 | 0.0001 | L | W | | | 1.12 |
| 3 | 57 | GER | 0.9878 | 0.0169 | 0.0001 | W | D | N/A | 50% (1/2) | 0.91 |
| | | ARG | 0.9017 | 0.1382 | 0.0003 | L | D | | | 1.10 |
| | 58 | ITA | 0.9841 | 0.0220 | 0.0001 | W | W | Y | | 0.33 |
| | | UKR | 0.3225 | 0.7406 | 0.0031 | L | L | | | 3.05 |
| | 59 | ENG | 0.9546 | 0.0639 | 0.0002 | L | D | N/A | | 1.03 |
| | | POR | 0.9868 | 0.0182 | 0.0001 | W | D | | | 0.97 |
| | 60 | BRA | 0.9874 | 0.0175 | 0.0001 | W | L | N | | 0.88 |
| | | FRA | 0.8703 | 0.1784 | 0.0005 | L | W | | | 1.13 |
| 4 | 61 | GER | 0.9864 | 0.0252 | 0 | L | L | Y | 50% (1/2) | 1.0008 |
| | | ITA | 0.9872 | 0.0238 | 0 | W | W | | | 0.9992 |
| | 62 | POR | 0.9843 | 0.0289 | 0 | W | L | N | | 0.98 |
| | | FRA | 0.9653 | 0.0608 | 0 | L | W | | | 1.02 |
| 5 | 63 | GER | 0.9372 | 0.1354 | 0 | W | W | Y | 100% (1/1) | 0.98 |
| | | POR | 0.9221 | 0.1628 | 0 | L | L | | | 1.02 |
| | 64 | ITA | 0.9917 | 0.0257 | 0 | W | D | N/A | | 0.99 |
| | | FRA | 0.9846 | 0.0433 | 0 | L | D | | | 1.01 |
The prediction results are compared with the real game results. The symbol "Y" means that the prediction is correct, the symbol "N" means that the prediction is wrong, and the symbol "N/A" means that the prediction is not counted because the two teams drew.
From Table
Prediction for football games is not easy because the players control the ball with their feet; too many factors and situations can change, so the game results are often unpredictable. Scoring is also difficult in football, so there are many draw games in the records, while most of the time the neural network can only predict a winner and a loser. In fact, it is hard for the first output nodes of the MLP to give two teams equal winning rates, and thus hard to predict a draw, so we exclude the draw games from the calculation of prediction accuracy. Excluding the four draw games (Matches 54, 57, 59, and 64) and averaging over the other 12 games from stage 2 to stage 5, the prediction accuracy is 75% (9/12).
The odds can be calculated from the MLP outputs for betting reference. For two teams A and B with first-output-node (win) values $o_A$ and $o_B$, the odds are defined as follows:

$$\text{odds}_A = \frac{o_B}{o_A}, \qquad \text{odds}_B = \frac{o_A}{o_B}.$$

For example, in Match 49, $\text{odds}_{\mathrm{GER}} = 0.7866 / 0.9914 \approx 0.79$ and $\text{odds}_{\mathrm{SWE}} = 0.9914 / 0.7866 \approx 1.26$, as listed in Table
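In code, the decision rule and the odds computation reduce to a few lines. The sketch below is ours; the odds expression is inferred from the tabulated values rather than stated explicitly in the text.

```python
# Predict win/loss and compute odds from the two teams' node-1 (win) outputs.

def predict_and_odds(o1_a: float, o1_b: float):
    """Return (result_a, result_b, odds_a, odds_b) from the node-1 outputs."""
    result_a, result_b = ("W", "L") if o1_a > o1_b else ("L", "W")
    return result_a, result_b, o1_b / o1_a, o1_a / o1_b

# Match 49: GER 0.9914 versus SWE 0.7866 gives ('W', 'L', 0.79, 1.26) after
# rounding, matching the prediction and odds columns of the table.
print(predict_and_odds(0.9914, 0.7866))
```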
In this study, we adopt a multilayer perceptron with back-propagation learning to predict the winning rates of 2006 WCFG. We select 8 significant statistical records from the 17 official records of 2006 WCFG, and the 8 records of each team are transformed into relative ratios with respect to the opposing team. The average ratio values of each team over the previous stages are then fed into the 8-2-3 MLP to predict the win and loss. The teams with 3 wins, 3 losses, and draws at stage 1 are selected as training samples. The 8 records and the training samples are selected by ad hoc choice, which is a common practice in the real application of an algorithm. New training samples are added to the training set of the previous stages, and the neural network is then retrained; this is a type of on-line learning. The learning rate and the momentum coefficient are determined by the smaller average and standard deviation of convergent iterations in cross-learning.
We use the theorem of Mirchandani and Cao to determine the number of hidden nodes; it gives 6, corresponding to an 8-6-3 MLP model. After testing the learning convergence, however, the MLP is reduced to an 8-2-3 model. We can infer that some training samples are grouped together, so all 6 hidden nodes are not needed.
If the draw games are excluded, the prediction accuracy reaches 75% (9/12).
The 8 features are selected by ad hoc choice. If we wanted to select the 8 best features, however, we would have to analyze C(17,8) = 24310 combinations, selecting the feature set with the smallest error or the maximum distance measure. Commonly used distance measures for feature selection include the divergence, the Bhattacharyya distance, the Matusita distance, and the Kolmogorov distance [
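As an illustration of such a search, the following sketch scores every C(17,8) subset by the Bhattacharyya distance between the win and loss classes under a Gaussian assumption; the data are random placeholders, and the exhaustive search is feasible only because 24310 subsets is a small number.

```python
# Exhaustive 8-of-17 feature selection by maximum Bhattacharyya distance
# between two classes, assuming Gaussian class-conditional densities.

from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, SUBSET = 17, 8

def bhattacharyya(m1, c1, m2, c2):
    """Bhattacharyya distance between two Gaussians (mean, covariance)."""
    c = (c1 + c2) / 2.0
    dm = (m1 - m2).reshape(-1, 1)
    term1 = 0.125 * float(dm.T @ np.linalg.inv(c) @ dm)
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return term1 + term2

# Placeholder win/loss samples over all 17 candidate features.
win = rng.normal(0.6, 0.1, (40, N_FEATURES))
loss = rng.normal(0.4, 0.1, (40, N_FEATURES))

def score(idx):
    sel_w, sel_l = win[:, idx], loss[:, idx]
    return bhattacharyya(sel_w.mean(0), np.cov(sel_w, rowvar=False),
                         sel_l.mean(0), np.cov(sel_l, rowvar=False))

best = max(combinations(range(N_FEATURES), SUBSET), key=score)
print(best)  # indices of the highest-scoring 8-feature subset
```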
There are other methods, such as the conjugate gradient method, the Levenberg-Marquardt method, simulated annealing, and genetic algorithms, that can be used in the learning [
From the pattern recognition point of view, the three-class training sets (win, draw, and loss) can give more reliable prediction accuracy than the two-class training sets (win and loss), because the decision regions and boundaries after training can be more precise.
The authors would like to thank the reviewer for the suggestion of including teams from draw games in the training sets to improve the prediction accuracy. The authors also thank Mr. Wen-Lung Chang for his collection of data.