Forecasting Tennis Match Results Using the Bradley-Terry Model

Forecasting plays an important role in many fields, e.g., in managerial decision-making, in predicting uncertain events within an organization, in weather forecasting, and in flood forecasting. Stakeholders in the betting market benefit from tennis forecasting directly or indirectly. Winning probabilities calculated using forecasting models help bettors decide whether to place a bet. In view of the importance of tennis forecasting, the Bradley-Terry model is used here to model men’s professional tennis and to predict the outcomes of men’s singles matches. Model coefficients are estimated using data on 3439 matches played from January 2019 to September 2020. Ratings for each player are calculated from the model coefficients, and player rankings are then derived from these ratings. The model rankings compare satisfactorily with the ATP rankings. The winning probability for each player is calculated using the model coefficients and ratings. These probability predictions are evaluated against four measures of performance. The results reveal that the surface on which a match is played contributes significantly to a player’s performance. Because of this surface effect, our model ranks certain players considerably higher than the official ATP rankings do. According to most of the performance measures, the model performs best for clay court data, closely followed by hard court data. To calculate return on investment, model results are compared with the bookmakers’ average odds and best available odds. Return on investment for the fitted model is highest for clay court data, against both the bookmakers’ average odds and the best available odds.


Introduction
Forecasting plays an important role in many fields. In recent years, it has also gained importance in sports, e.g., golf, cricket, soccer, and tennis. The ability to predict the outcome of a sporting event accurately fascinates the sporting world [1][2][3][4][5]. The application of statistical analysis to sports forecasting has evolved rapidly to meet demand from gamblers, coaches, and the media. For example, media channels can provide more insightful coverage of a sporting tournament with the help of predictions [6]. Modeling and predicting the results of tennis matches in particular have gained significant attention in recent years. Prediction models reveal interesting characteristics of various playing styles, which makes them suitable for coaching purposes [7,8]. The growth of online betting markets, such as Betfair, offers an additional and extremely competitive motivation for research on prediction models [9]. Tennis is an attractive research candidate because of the large amount of freely available historical data.
Most research on modeling tennis matches focuses on prematch prediction, in which the aim is to predict the winning probability of either player before the match starts. Prematch prediction contrasts with in-play prediction, in which winning probabilities are predicted during a match [10,11]. Our focus in this research is solely on the prediction of winning probability before the match starts.
In this study, paired comparisons are used to forecast tennis match results. Several methods are available to forecast match outcomes in tennis. For example, a Bradley-Terry-type model was employed to forecast match results for the top tier of men's professional tennis, the ATP tour [12]. Similarly, a high-dimensional dynamic model was used to forecast tennis match results [13]. Furthermore, tennis match outcomes were forecast using within-match betting markets [14]. Kovalchik compared the predictive performance of different models for forecasting tennis outcomes [15].
The paired comparison method, also known as pairwise comparison, is a decision-making tool that compares items with each other, and it is particularly useful when no objective data are available [16]. In pairwise comparison, the items are judged in pairs to determine which one is preferred and to rank the items being compared [17,18]. The main reason for applying pairwise comparison methods is that judging two items is simpler than judging several items at once. The method can be used in various situations, e.g., when evaluation criteria are subjective or when priorities are unclear. Examples include modeling competitive ability in sports and modeling choice behavior, such as the preference for the Democratic presidential candidate over the Republican candidate or the preference for one soft drink over another.

Rating.
A rating is a numerical value assigned to each team or player based on their performance, while a ranking is the ordinal position based on the ratings [19]. Ratings play an important role in forecasting and prediction. They indicate the strength of items or objects relative to each other and are used in almost every field. A player rating system rates players individually. Rating can be done domestically or internationally, and each team or player is ranked according to their rating or individual performance. The focus of this research is on using player rankings to predict match outcomes [20,21].

Betting Odds and Betting Strategies.
Tennis bets are mainly placed in two kinds of markets: bookmakers or betting exchanges. In the former, odds are offered by a bookmaker, and customers who accept these odds place money directly against the bookmaker [22]. In a betting exchange, odds are offered by customers and bets are matched against each other; for each matched bet, the exchange simply takes a small commission. Odds are generally more favorable in exchange markets, but because historical data on such odds are limited, models are mostly compared against traditional bookmakers' odds in research. The challenge in betting is to find bets for which the estimated probability of an outcome is greater than the probability implied by the bookmakers' odds, so that the expected return is positive. A statistical model capable of accurately predicting the probabilities of tennis match results can form the basis of a profitable betting strategy. Bets can be placed on different events relating to various aspects of a tennis match, both before and during the match. However, only bets placed before the match starts are considered here [23,24].
The purpose of a betting strategy is to exploit cases in which an outcome is undervalued by the odds, i.e., cases in which the actual probability of the outcome is greater than the implied probability calculated from the odds. If a predictive model is available, a simple betting strategy is to place a unit bet whenever the model indicates that an outcome is undervalued.
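The unit-bet rule described above can be illustrated with a minimal sketch. It assumes decimal odds and ignores the bookmaker's margin when converting odds to an implied probability; the function names and the numbers are hypothetical and are not taken from the dataset.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds (bookmaker margin ignored)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def expected_return(model_prob: float, decimal_odds: float) -> float:
    """Expected profit of a unit bet: win (odds - 1) with probability p, lose 1 otherwise."""
    return model_prob * decimal_odds - 1.0


def should_bet(model_prob: float, decimal_odds: float) -> bool:
    """Place a unit bet only when the model probability exceeds the implied probability."""
    return model_prob > implied_probability(decimal_odds)


# Hypothetical example: the model gives a player a 55% chance, the bookmaker offers 2.10.
odds = 2.10
p_model = 0.55
print("implied probability:", round(implied_probability(odds), 4))
print("expected return per unit staked:", round(expected_return(p_model, odds), 4))
print("place bet:", should_bet(p_model, odds))
```

Under these assumptions, a bet has positive expected return exactly when the model probability exceeds the implied probability, which is why the two conditions are interchangeable in the rule.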

Data and Model
For the current study, data on men's singles tennis matches have been obtained from tennis-data.co.uk. The dataset consists of 3439 men's singles matches played from January 2019 to September 2020. It contains the tournament name; date of match; location; ATP rankings; names of the winning and losing players; surface (carpet, clay, hard court, or grass); the results in sets, games, and matches; and the series. "Series" relates to the importance of the tournament in terms of the prize money and ranking points available; it is divided into ATP250, ATP500, Grand Slam, Masters 1000, and Masters Cup. ATP rankings are derived from the results of the previous 52 weeks of tournament competition. Points are awarded based on the prestige of the tournament and on the player's progress in it. Bookmakers' odds for each match are also available on the same website and are used to measure forecasting performance.
The purpose of the study is to forecast the result of the tennis matches between two competing players. The available data has been analyzed under Bradley-Terry modeling framework using the BradleyTerry2 package in R to obtain ranking of tennis players for the future.
2.1. Model. The Bradley-Terry model is a simple, popular, and widely used method for handling data on paired comparisons which is used to find the probabilities of the potential outcomes when individuals or objects are judged against each other in the form of pairs. This model is used to predict matches and tournaments and for testing the efficiency of betting markets. The default modeling approach to the statistical analysis of tennis matches is also based on the Bradley-Terry model. Forecasting tennis is further complicated by the surface effect because some players play better on some surfaces than on others, and the influence of the surface on match outcomes is also assessed by the Bradley-Terry model [25,26].
Suppose a set of K elements is compared pairwise. For two elements i and j taken from this set, Bradley and Terry (1952) suggested the following model for the probability that i is preferred to j:

$$P(i \succ j) = \frac{\lambda_i}{\lambda_i + \lambda_j},$$

where the parameter $\lambda_l > 0$ belongs to element $l \in \{1, 2, \ldots, K\}$ and represents its rating. On the log scale, the rating is denoted by $\alpha_l = \log \lambda_l$, the ability parameter used in the remainder of this paper.

International Journal of Photoenergy

This model has many applications in different fields, as mentioned by Hunter; an extensive bibliography on the method of paired comparisons published by Davidson and Farquhar (1976) also covers a considerable amount of work [27,28]. For example, the model has been adopted by the European Go Federation and the World Chess Federation to rank players, and Hastie and Tibshirani suggested it as a standard approach for constructing multiclass classifiers from the results of binary classifiers [29]. Different extensions of the model have been suggested to handle draws, home advantage, team comparisons, and multiple comparisons [30,31]. In particular, the Plackett-Luce model is a popular extension for multiple comparisons; it has been used to rank multiple objects, to define a prior distribution over permutations, and in choice models [32]. The monographs of David [33] and Diaconis [34] provide a detailed discussion of the statistical foundations of these models.
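To illustrate how ratings can be estimated from match results, the following minimal sketch fits the Bradley-Terry model with the minorization-maximization (MM) iteration commonly used for this model. It is not the BradleyTerry2 routine used in this study; the players "A", "B", and "C" and their results are hypothetical, and the function names are chosen here for illustration only.

```python
import math
from collections import defaultdict


def bt_win_probability(lambda_i: float, lambda_j: float) -> float:
    """Bradley-Terry probability that the player rated lambda_i beats lambda_j."""
    return lambda_i / (lambda_i + lambda_j)


def fit_bradley_terry(matches, n_iter=1000, tol=1e-10):
    """Estimate ratings from a list of (winner, loser) pairs by MM iteration.

    Assumes every player has at least one win and one loss within a connected
    set of opponents; otherwise the maximum likelihood estimate does not exist.
    """
    wins = defaultdict(int)
    meetings = defaultdict(lambda: defaultdict(int))
    for winner, loser in matches:
        wins[winner] += 1
        meetings[winner][loser] += 1
        meetings[loser][winner] += 1
    players = sorted(meetings)
    lam = {p: 1.0 for p in players}
    for _ in range(n_iter):
        new = {}
        for i in players:
            denom = sum(n_ij / (lam[i] + lam[j]) for j, n_ij in meetings[i].items())
            new[i] = wins[i] / denom
        # Ratings are identified only up to scale, so fix their mean at 1.
        scale = len(players) / sum(new.values())
        new = {p: v * scale for p, v in new.items()}
        change = max(abs(new[p] - lam[p]) for p in players)
        lam = new
        if change < tol:
            break
    return lam


# Hypothetical results: (winner, loser)
toy_matches = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 2 + [("C", "B")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
)
ratings = fit_bradley_terry(toy_matches)
for rank, (player, lam) in enumerate(sorted(ratings.items(), key=lambda kv: -kv[1]), 1):
    print(rank, player, "rating", round(lam, 3), "ability", round(math.log(lam), 3))
print("P(A beats B):", round(bt_win_probability(ratings["A"], ratings["B"]), 3))
```

Sorting the fitted ratings gives the ranking, and any pair of ratings gives a winning probability, which mirrors the rating, ranking, and prediction steps followed in the sections below.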

Ranking of Tennis Players on All Surfaces
Given the scope of the current study, it is of great interest to use the abilities of players $\{\alpha_i\}$ to build a new ranking system as an alternative to the official rankings published by the ATP and to compare the two ranking systems. Model parameters have been estimated using data for the period from January 2019 to September 2020 to obtain a ranking of tennis players, assigning equal weights to all matches regardless of surface.
Rankings produced by the model at the end of September 2020 for the top fifteen players are shown in Table 1 along with the ATP rankings. The model rankings and the ATP rankings are more or less the same for the top 7 players, with a few discrepancies such as Thiem D. and Federer R., whose ATP rankings are 3 and 4, respectively, whereas our model ranks them 4 and 3, respectively. This change in the ranking of Thiem D. and Federer R. is due to the difference in the performance of these players on different surfaces; i.e., Thiem D. performs well on clay, whereas clay is Federer's least favored surface and he has managed to win only one title at Roland Garros (French Open). Moreover, Federer R. has played on all surfaces, i.e., hard court, clay court, and grass court, whereas Thiem D. never won a match on grass court in our dataset. Thiem D. has won more matches in our dataset (63) than Federer R. (58). Although Federer R. has won fewer matches than Thiem D., he has played on all surfaces, which contributed to his better overall performance. It is also a historical fact that Federer R. has won more Wimbledon (grass court) titles than any other player. Therefore, it is quite evident that grass is his most preferred playing surface.
Similarly, Berrettini M. is ranked 7 by our model, whereas he is at number 8 according to the official ATP rankings. Zverev A., who is at number 7 in the official ATP rankings, is ranked 19 by our model. This is because Zverev A. has managed to win only two matches on grass court. Although Zverev A. has won more matches (53 in total) than Berrettini M. (49), Berrettini M. played equally well on all surfaces, which accounts for his better ranking position. For Dimitrov G., our model produces the same ranking as that awarded by the ATP (Figure 1).
Beyond the top 7 players in our model ranking, there are some discrepancies. These arise because the official ATP ranking system depends on how frequently a player has participated in events rather than on absolute performance alone, whereas the model used in this study ranks players purely on the basis of past match results, drawing no distinction based on how frequently a player has participated in events.
The abilities of players can also be compared using comparison intervals calculated from quasi-standard errors, as depicted in Figure 2. This method allows comparison not only with the reference player but also between any pair of competing players.
3.1. Winning Probabilities on All Surfaces. Using the estimates from the model fit given in Table 2 and Figure 3, without considering the surface on which a match is played, the estimated winning probabilities of competing players at the end of 2020 can be calculated using the following equation:

$$\hat{P}(i \text{ beats } j) = \frac{\exp(\hat{\alpha}_i)}{\exp(\hat{\alpha}_i) + \exp(\hat{\alpha}_j)} = \frac{1}{1 + \exp\{-(\hat{\alpha}_i - \hat{\alpha}_j)\}}.$$

To check the statistical significance of the difference between Djokovic N. and Nadal R., the difference of the estimates ($\hat{\alpha}_1 - \hat{\alpha}_2 = 0.2725$) is compared with its standard error (SE = 0.5181), giving a p value of 0.5990, which indicates that the difference between the performance of Djokovic N. and Nadal R. is not statistically significant.
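As a worked check of the figures reported above, the following minimal sketch converts a difference in estimated abilities into a winning probability and tests the difference against its standard error. It assumes that abilities are on the log scale and that the p value comes from a two-sided normal (Wald) test; neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
import math


def win_probability(alpha_i: float, alpha_j: float) -> float:
    """Winning probability of player i over player j from log-scale abilities."""
    return 1.0 / (1.0 + math.exp(-(alpha_i - alpha_j)))


def wald_p_value(difference: float, standard_error: float) -> float:
    """Two-sided p value of a normal test of difference = 0."""
    z = difference / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Reported all-surface difference in abilities (Djokovic N. minus Nadal R.) and its SE.
diff, se = 0.2725, 0.5181
p_win = win_probability(diff, 0.0)  # only the difference in abilities matters
p_val = wald_p_value(diff, se)
print("estimated P(Djokovic N. beats Nadal R.):", round(p_win, 4))
print("two-sided p value:", round(p_val, 4))
```

Under these assumptions, the computed p value agrees with the reported 0.5990 to about three decimal places, which suggests that a normal test of this kind underlies the reported significance results.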

Ranking of Tennis Players on Hard Court
In our model ranking for all surfaces, we saw some discrepancies that were due to differences in performance on different surfaces. Some players perform well on hard court, some on clay court, and some on grass court. In view of this, a separate analysis has been carried out for hard court and clay court competitions. This separate analysis shows that surface has a very strong impact on a player's performance. In particular, we found that performance on hard court best reflects a player's true ability. The results from our model for hard court are presented below.
Rankings produced by the model at the end of September 2020 for the top fifteen players on hard court are shown in Table 3 along with the ATP rankings (Figure 4). The model ranking and the ATP ranking are the same for the first six players, which shows that the model produces good results when the surface effect is taken into consideration. Raonic M., whose ATP ranking is 21, is ranked 7 by the model. This is because Raonic M. has played and won more matches on hard court than on any other surface. On clay court, he has played and won only 1 match, and on grass court, he managed to win 8 matches. Similarly, De Minaur A. has won 35 matches on hard court out of a total of 38 wins; his remaining three wins came on other surfaces. For this reason, he is ranked 8 by our model. Likewise, Opelka R., whose ATP rank is 36, is ranked 9 by our model because he has played and won more matches on hard court than on any other surface. Opelka R. has won 32 matches on hard court out of a total of 36 wins. He has won only two matches on clay court and two on grass court.
The abilities of players can also be compared using comparison intervals calculated from quasi-standard errors, as depicted in Figure 5. This method allows comparison not only with the reference player but also between any pair of competing players.

Winning Probabilities on Hard Court. Using the estimates from the model fit given in Table 4 and Figure 6, the estimated winning probabilities of competing players can be calculated using the following equation:

$$\hat{P}(i \text{ beats } j) = \frac{\exp(\hat{\alpha}_i)}{\exp(\hat{\alpha}_i) + \exp(\hat{\alpha}_j)} = \frac{1}{1 + \exp\{-(\hat{\alpha}_i - \hat{\alpha}_j)\}}.$$

The estimated winning probability of Djokovic N. vs. Nadal R. is obtained by substituting the difference of their estimated abilities into this equation. To check the statistical significance of the difference between Djokovic N. and Nadal R., the difference of the estimates ($\hat{\alpha}_1 - \hat{\alpha}_2 = 0.1416$) is compared with its standard error (SE = 0.6918), giving a p value of 0.8378, which indicates that the difference between the performance of Djokovic N. and Nadal R. is not statistically significant.

Ranking of Tennis Players on Clay Court
Rankings produced by the model at the end of September 2020 for the top fifteen players on clay court are shown in Table 5 along with the ATP rankings (Figure 7). The model ranking and the ATP ranking are the same for the first four players, which shows that the model produces good results when the surface effect is taken into consideration. It is pertinent to note that Wawrinka S., who was ranked 11 and 15 in the all-surface model and the hard-court model, respectively, is now ranked 5 in the clay court model, whereas Garin C., who was ranked 26 and 42 in the all-surface model and the hard-court model, respectively, is now ranked 6 in the clay court model. This significant upward change in ranking is due to the fact that Wawrinka S. and Garin C. have performed better on clay court than on hard court. Wawrinka S. has won 8 matches out of 14 on clay court, while Garin C. has won 32 matches out of 43 on clay court. The abilities of players can also be compared using comparison intervals calculated from quasi-standard errors, as depicted in Figure 8. This method allows comparison not only with the reference player but also between any pair of competing players.

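Winning Probabilities on Clay Court. Using the estimates from the model fit given in Table 6 and Figure 9, the estimated winning probabilities of competing players on clay court can be calculated using the following equation:

$$\hat{P}(i \text{ beats } j) = \frac{\exp(\hat{\alpha}_i)}{\exp(\hat{\alpha}_i) + \exp(\hat{\alpha}_j)} = \frac{1}{1 + \exp\{-(\hat{\alpha}_i - \hat{\alpha}_j)\}}.$$

The estimated winning probability of Djokovic N. vs. Nadal R. is obtained by substituting the difference of their estimated abilities into this equation. To check the statistical significance of the difference between Djokovic N. and Nadal R., the difference of the estimates ($\hat{\alpha}_1 - \hat{\alpha}_2 = 0.1119$) is compared with its standard error (SE = 0.8803), giving a p value of 0.8988, which indicates that the difference between the performance of Djokovic N. and Nadal R. is not statistically significant.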

Measures of Performance
Table 7 presents four measures of predictive performance. According to all of them, the model fits well overall, and it performs best for clay court matches, for which the model accuracy is 73.27%, higher than for any other model fitted to our data. Moreover, average probability and average log probability, two important performance criteria, also show that the model fits clay court data well. Similarly, return on investment against the bookmakers' average odds is highest for clay court matches. Table 7 also shows that, when the effect of surface is considered, the model provides a good return on investment, as model predictions have been compared against the bookmakers' average odds and best available odds.
Figure 10 shows that the model fits well overall on all measures of predictive performance. The fitted model gives the best results for clay court, closely followed by hard court. It can therefore be concluded that surface has a significant impact on the performance of the players and ultimately on the outcome of the match. Table 8 presents the measures of predictive performance for the bookmakers' average odds. In this case, classification accuracy for "all surface" data is higher than in the fitted model case and is closely followed by classification accuracy for hard court data.
Average probability and average log probability are highest for clay court data. Return on investment is highest when the model is applied to data from all surface types. The same holds when the bookmakers' best available odds are used for comparison, as shown in Table 9.
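The four measures discussed in this section can be illustrated with a minimal sketch. The exact definitions used to produce Tables 7 to 9 are not spelled out in the text, so the definitions below are assumptions: accuracy is the share of matches in which the player given more than 50% by the model wins, average probability and average log probability refer to the probability assigned to the actual winner, and return on investment uses unit bets placed whenever the model probability exceeds the probability implied by decimal odds. The forecasts and odds are hypothetical.

```python
import math


def evaluate_forecasts(forecasts):
    """Compute four performance measures from a list of match forecasts.

    Each forecast is a dict with the model probability that player A wins
    ("p_a"), the outcome ("a_won"), and decimal odds for both players.
    """
    correct = 0
    prob_sum = 0.0
    log_prob_sum = 0.0
    staked = 0.0
    profit = 0.0
    for f in forecasts:
        p_a, a_won = f["p_a"], f["a_won"]
        p_b = 1.0 - p_a
        p_winner = p_a if a_won else p_b
        correct += (p_a > 0.5) == a_won
        prob_sum += p_winner
        log_prob_sum += math.log(p_winner)
        # Assumed staking rule: unit bet on any player the odds undervalue.
        for p, odds, won in ((p_a, f["odds_a"], a_won), (p_b, f["odds_b"], not a_won)):
            if p > 1.0 / odds:
                staked += 1.0
                profit += (odds - 1.0) if won else -1.0
    n = len(forecasts)
    return {
        "accuracy": correct / n,
        "avg_prob": prob_sum / n,
        "avg_log_prob": log_prob_sum / n,
        "roi": profit / staked if staked > 0 else 0.0,
    }


# Hypothetical forecasts, not taken from the dataset.
toy_forecasts = [
    {"p_a": 0.70, "a_won": True, "odds_a": 1.50, "odds_b": 2.60},
    {"p_a": 0.60, "a_won": False, "odds_a": 1.80, "odds_b": 2.00},
    {"p_a": 0.40, "a_won": False, "odds_a": 2.20, "odds_b": 1.65},
    {"p_a": 0.55, "a_won": True, "odds_a": 2.10, "odds_b": 1.70},
]
metrics = evaluate_forecasts(toy_forecasts)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Passing average odds or best available odds as the odds inputs gives the two return-on-investment comparisons described above, while the other three measures depend only on the model probabilities and outcomes.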

Closing Remarks
The objective of this study was to introduce a model for predicting match outcomes in men's professional tennis, using ratings obtained from the fitted model to rank players while accounting for the effect of different surfaces. In previous studies on this topic, the modeling approach was restricted to models that used information on official rankings. In the current study, the Bradley-Terry approach has been applied to historical match results to obtain match forecasts that are more accurate according to several criteria. This approach is motivated by evidence from soccer, tennis, and golf suggesting that the official rankings of teams and players, although useful predictors, do not contain all the information required for forecasting results. To achieve the objectives of the study and implement the Bradley-Terry approach, historical data on 3439 men's singles matches played from January 2019 to September 2020 were obtained. The dataset was divided into three categories with respect to surface type: all surfaces (consisting of match data on hard court, clay court, and grass court), hard court, and clay court. The Bradley-Terry model was applied to the dataset for each surface category. In the analysis of the all-surface dataset, equal weight has been given to hard court, clay court, and grass court matches. The analyses for hard court and clay court have been carried out separately for each surface.
The analysis revealed that the surface of the match has a significant impact on the performance of the players. Because of this surface effect, our model ranked certain players considerably higher than the official ATP rankings did. Using the model coefficients, ratings of players were found for each surface category. These ratings were then used to calculate the winning probabilities of players. To check the adequacy of the model, four performance criteria were used to measure the performance of the model predictions. According to most of the performance measures, the model has shown good results for clay court data, closely followed by hard court data. To calculate return on investment, model results were compared with the bookmakers' average odds and best available odds. It was found that return on investment was highest for clay court data when compared with both the bookmakers' average odds and the best available odds.

Data Availability
The data used in the article is given therein.

Conflicts of Interest
No conflict of interest was declared by the authors.