Rating Batters in Test Cricket

Department of Mathematics and Statistics, The University of Haripur, Haripur, Khyber Pakhtunkhwa, Pakistan
Department of Statistics, Government College University Lahore, Lahore, Pakistan
EIAS, Data Science and Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
Department of Computer Sciences, Bahria University, Karachi Campus, Karachi, Pakistan


Introduction
Test cricket is a game of patience [1,2], and its history is filled with renowned batsmen who have set extremely high standards [3]. A batsman has nearly limitless time to settle and play each ball individually. It is a ball-and-bat duel that is not defined by the number of deliveries [4]. Despite this, batsmen have struggled to stay at the crease, as conditions and lapses in focus result in the loss of their wicket, particularly in the modern game. In general, traditional measures such as batting average are used to evaluate a batsman in test cricket [5,6]. Players are awarded points based on the number of runs they score during the game. This time-honored method has several drawbacks [5]. The context of the contest in which the runs are scored is not revealed by the runs scored in the match or the average runs in the series. For instance, scoring 100 runs in a low-scoring contest is not the same as scoring 100 runs in a high-scoring match. Likewise, scoring 100 runs in the first innings is not the same as scoring 100 runs in the fourth innings, because the match circumstances differ: as the pitch deteriorates, the pitch of the fourth innings is radically different from that of the first. Traditional measures such as batting average overlook such differences. In the same way, playing in Melbourne is not the same as playing in Qaddafi Stadium. Performing in one's own country or overseas, as well as other factors such as bowling first or second and winning or losing the toss, are all said to have an impact on cricket results [7]. In any sport, there are various approaches for determining who the top player or team is [8,9]. Points can be awarded based on which team performs best. There are both technical and nontechnical approaches for evaluating players or teams. Differentiating points are used to determine the winner in several sports [10]. It is possible to model point difference, but we do not do so here.
We aim to develop a new statistical measure that allows us to assess a batsman's performance in the context of the contest in which the runs are scored. We concentrate on the outcomes of Test Cricket matches. We explore the extent to which a variety of characteristics, such as playing at home or away, batting or fielding first, and pitch condition, influence match outcomes. We first forecast Test match outcomes using a multinomial logistic regression model. These forecasts are subsequently used to evaluate the batter's performance.
A large number of studies have examined Test Cricket players in various ways and within distinct theoretical frameworks [5, 11-17]. Recent research on cricket has highlighted the need for examining and understanding prematch indicators such as the toss, ground effects, home ground, and the ratings of both participating teams, among others [18-22]. Kimber and Hansford studied cricket batting strategy at various levels [23]. They showed how scoring rate, opposition bowling strength, and pitch condition can be integrated with runs scored to create an overall picture of batsmen's relative attributes.
The Test match results were studied by Allsopp and Clarke [24]. They concluded that a team's first-innings bowling and batting strength, first-innings batting order lead, and home advantage are all good indicators of a winning Test match outcome. Barooah and Mangan looked into some of the problems in evaluating batsmen for Test matches [25]. They found that batters in cricket are mostly valued according to their average score: in Test matches, an average of 50 or more provides a rule of thumb for distinguishing exceptional players from the merely good. Singh et al. assessed cricket players' batting performance and calculated the impact of their performance on the ICC ranking system [26]. Male and female Test cricket batters were ranked by Rohde [12], who proposed a straightforward approach for ranking batters based on their performance. Mukherjee used a diffusion-based PageRank algorithm on the networks to rate the importance of teams and captains [27]. In Test Cricket, Akhtar and Scarf predicted match results session by session [28]. They looked at how the match result probabilities (win, draw, and loss) and their consequences differed from one session to the next. Daud and Muhammad assembled a dataset of Test matches [29]. They proposed a new ranking system for Test Cricket teams based on the number of runs scored and wickets taken. They suggested that a standard accuracy index be developed to determine the relevance of the discrepancy between a researcher's proposed rating system and the ICC rating system. Akhtar et al. developed a new rating system for players [5]. They determined the criteria for the best player in Test cricket. Shah and Patel applied principal component analysis and a weighted average method to rate the 29 captains included in their study and identify the best among them. Brewer and Stevenson suggested a survival analysis to forecast batting abilities in Test Cricket matches [30].
They developed a model in two stages, the first for individual players to assess their initial and balanced batting talents, as well as the rate of change in both. They matched and identified the cricketers who open the batting, which has a positive impact on the batting order. Hussain et al. utilized the International Cricket Council's ad hoc point system to assess cricket teams; it is based exclusively on the number of wins and losses in cricket matches [31]. They compared their findings to those of the ICC. Boys and Philipson used an additive log-linear model to model run scores [13]. They looked at how an individual batsman's innings-by-innings variation in runs becomes a source of uncertainty in his ranking position. Stevenson and Brewer developed a Bayesian parametric model that uses a Gaussian process to estimate how international cricketers' batting ability changes across innings [32]. They identified which batsmen are struggling or improving their batting skills, which has a real-world influence on player evaluation, talent identification, and team selection strategy. Researchers have long hypothesized that the batter's performance influences the outcome of Test matches [11]. In cricket, the concept of a player's rating appears to have always piqued the interest of sports analysts. The research on batsmen's performance also shows the importance of home ground, which can have a substantial impact on the outcome of a match.

Forecasting Test Matches' Results
All Test cricket matches played between January 1, 2017, and December 31, 2019, are considered. Session-by-session data are obtained from the cricket website ESPNcricinfo (https://www.stats.cricinfo.com/ci/content/records/307847.html/). Rain-affected contests and those with poor lighting are excluded. A Test Cricket match lasts five days, with each day consisting of three sessions (ending at lunch, tea, and the close of play). The study only included nine (out of ten) recent ICC (International Cricket Council) sanctioned Test Cricket playing countries. Afghanistan has been removed due to its recent status as a Test-playing nation and, as a result, its participation in a disproportionately small number of ICC-sanctioned matches. Outcomes are measured over three years since it is assumed that, for the most part, the core playing group has stayed consistent throughout this time frame. At the end of each session, a nominal multinomial logistic regression is fitted to forecast the Test match outcome probabilities. The response is multinomial (win, draw, and loss): Y records the match result as 1, 0, or −1, corresponding to a victory, a draw, or a defeat. The reference category is draw (0). To examine the model fit, we employed the Akaike information criterion (AIC) (Sakamoto, Ishiguro, and Kitagawa [33]), which is formulated as

$$\mathrm{AIC} = 2k - 2\ln L,$$

where k is the number of estimated parameters in the model and ln L is the log-likelihood, and Nagelkerke's R square (Nagelkerke [34]), which is given as

$$R^2_N = \frac{1 - (L_0/L)^{2/n}}{1 - L_0^{2/n}},$$

where L_0 is the likelihood of the intercept-only model, L is the likelihood of the fitted model, and n is the number of observations. We modeled the match outcome session by session (Table 1) and forecasted the Test match outcome probabilities for each session of each day. We then used those probabilities to assess each batter's contribution to both teams. To compare our suggested rating system to the existing batting average approach, three distinct Test match series (7 matches) were included.
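The two fit measures named above can be computed directly from log-likelihoods. The following is a minimal sketch using the standard definitions of AIC and Nagelkerke's R square; the function names and the numerical inputs are hypothetical and are not taken from the fitted models in this study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def nagelkerke_r2(ll_null: float, ll_model: float, n_obs: int) -> float:
    """Nagelkerke's R square from the null and fitted log-likelihoods.

    Cox-Snell R square is 1 - (L0 / L)^(2/n); Nagelkerke rescales it by
    its maximum attainable value, 1 - L0^(2/n).
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / max_cox_snell


if __name__ == "__main__":
    # Hypothetical values for illustration only (not results from the paper).
    ll_null, ll_model, n_obs, n_params = -100.0, -80.0, 100, 12
    print("AIC:", aic(ll_model, n_params))
    print("Nagelkerke R^2:", round(nagelkerke_r2(ll_null, ll_model, n_obs), 3))
```

A lower AIC indicates a better trade-off between fit and number of parameters, whereas Nagelkerke's R square lies between 0 and 1, with larger values indicating a better fit relative to the intercept-only model.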
We display the rating points for covariates such as ground effect, no ground effect, home advantage, and no home advantage. We use comparisons to see how these prematch factors affect batters' ratings.

Measuring Batters' Contribution
To determine a batter's contribution, we must first obtain the probabilities of the Test match's outcomes. Nominal multinomial logistic regression is used to calculate the match outcome probabilities (Sohail and Scarf, 2012). Here, P(Y) denotes the probability of the outcome (win = 1, draw = 0, or loss = −1) at the end of each session t (t = 1st, 2nd, 3rd, ..., 15th), l denotes the lead until session t, w_1 denotes the first team's wickets, w_2 denotes the second team's wickets, g denotes the ground effect, and h denotes home advantage. The model assumes that Y has a multinomial distribution, that is, Y follows MN(p_win, p_draw, p_loss). We forecast match outcomes based on the abovementioned explanatory variables for each session of the Test match. The potential position of both the reference team and the opposing team is also defined, as is the hypothetical position of the batsmen at the end of each session. After computing the batters' points, we assess their contributions to determine the best batsman in the Test Cricket matches.
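To illustrate how a nominal multinomial logistic model turns the covariates into outcome probabilities, the sketch below uses the standard baseline-category form with draw as the reference category, so that log(p_win / p_draw) and log(p_loss / p_draw) are each linear in l, w_1, w_2, g, and h. The coefficient values, the function names, and the assumption that the covariates enter without interaction terms are hypothetical; they are not the fitted model of this study.

```python
import math


def linear_predictor(coefs, lead, w1, w2, ground, home):
    """Linear predictor b0 + b_l*l + b_w1*w1 + b_w2*w2 + b_g*g + b_h*h."""
    b0, b_l, b_w1, b_w2, b_g, b_h = coefs
    return b0 + b_l * lead + b_w1 * w1 + b_w2 * w2 + b_g * ground + b_h * home


def outcome_probabilities(eta_win: float, eta_loss: float) -> dict:
    """Baseline-category logit: draw is the reference (linear predictor 0)."""
    exp_win, exp_loss = math.exp(eta_win), math.exp(eta_loss)
    denom = 1.0 + exp_win + exp_loss
    return {
        "win": exp_win / denom,
        "draw": 1.0 / denom,
        "loss": exp_loss / denom,
    }


if __name__ == "__main__":
    # Hypothetical coefficients for one session (illustration only).
    coefs_win = (0.2, 0.004, -0.05, 0.08, 0.1, 0.3)
    coefs_loss = (0.1, -0.004, 0.06, -0.07, -0.1, -0.2)
    # Hypothetical session state: lead of 120 runs, 4 and 10 wickets fallen.
    state = dict(lead=120, w1=4, w2=10, ground=1, home=1)
    probs = outcome_probabilities(
        linear_predictor(coefs_win, **state),
        linear_predictor(coefs_loss, **state),
    )
    print(probs)
```

Because draw is the reference category, its linear predictor is fixed at zero, and the three probabilities always sum to one.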

Example 1.
Consider an Australia-New Zealand test match at the Perth Cricket Stadium in Australia. When we fit a model at the commencement of a test match, the likelihood of the reference team (Australia) winning, drawing, and losing is 0.67, 0.03, and 0.30, respectively. Table 2 contains session-level data. Australia wins the match by 296 runs.
We consider the Trans-Tasman series played in Australia between New Zealand and Australia. Australia won this series 3-0, with the benefit of playing at home. The batters' rating points throughout the series are shown in Table 3. Table 4 shows the results of our proposed methodology when both ground effect and home advantage are removed from the model. Table 5 shows the batters' rankings based on traditional batting averages. Labuschagne of Australia had the highest average, 91.50. Our criteria, by contrast, assign a score to batters based on the probability of the Test match result in each session. A batter who performs well in a critical situation receives additional rating points compared with the same contribution in a typical position. As it stands, batters who perform well against highly rated teams earn more points than batters who do well against average teams. M Labuschagne received the batsman of the series award in the Trans-Tasman series under the traditional ratings; he scored the most runs with the highest average. The outcome would be different if the batsman of the series award were given using our proposed criteria. Finally, we found a correlation between our proposed rating system and the ICC rating system, with r = 0.636 and p-value < 0.001.
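A correlation of this kind between two sets of batter ratings can be computed as sketched below. The sketch assumes the Pearson product-moment coefficient (the text does not state which coefficient was used), and the rating values are hypothetical rather than taken from the tables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical ratings for five batters (illustration only).
    proposed_points = [0.21, 0.17, 0.12, 0.09, 0.05]
    traditional_avg = [91.5, 60.0, 48.3, 52.1, 20.4]
    print("r =", round(pearson_r(proposed_points, traditional_avg), 3))
```

A value of r close to 1 would indicate that the two systems rank batters similarly; the sketch does not compute the associated p-value.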

Example 2.
Consider the 2019 Test match between Pakistan (batting first) and Australia (batting second) at Brisbane, Australia. We used the coefficients of several covariates to fit a model using sessional data. Pakistan's chances of winning, drawing, and losing at the start of the match are 0.53, 0.13, and 0.34, respectively. Table 6 contains session-by-session data. Australia won this match. We also consider the Test Cricket series between Australia and Pakistan that took place in Australia in 2019, which Australia won 2-0. Table 7 shows the results of the analysis when all predictors are considered, whereas Table 8 shows the results when home advantage and ground effect are not considered. All batters who had at least one chance to bat are included in Table 7. In Tables 7, 8, and 9, DA Warner, an Australian batter, was ranked best among batters from both teams. His scores differ between Tables 7 and 8: when the ground effect and home factor are removed from the model, he loses some rating points in Table 8. According to Table 9, DA Warner remained the top batter, with the highest batting average under the ICC's basic average criteria. DA Warner was named batsman of the series. We conclude that there is a correlation between our proposed criteria and the ICC criteria, with r = 0.835 and p-value = 0.001.

Example 3.
Take, for example, a Test match played at Chattogram in 2018 between Bangladesh (reference team) and Sri Lanka. We used coefficients for different explanatory variables to fit the model on session-by-session data. For the reference team (Bangladesh), the chances of winning, drawing, and losing are 0.73, 0.12, and 0.15, respectively. The match ended in a draw. Table 10 contains session-by-session lead data. To further investigate the proposed criteria, consider the two-match Test Cricket series played in Bangladesh in 2018 between Sri Lanka and Bangladesh. The series was won by Sri Lanka with a score of 2-1. Players' batting performance in the series is described in Table 11, which is produced when all predictors are included. According to the results, Sri Lankan batter BKG Mendis received the most points (0.173) and was ranked first among all batters. When the covariate home advantage was not taken into account, BKG Mendis came in second with 0.201 points in Table 12.
Table 12 is generated when the covariates home factor and ground factor are removed from the set of predictors. Table 13 rates the players' batting performances in the Test series using traditional averages. In Tables 11 and 13, the same batter takes first place: according to Table 13, Sri Lankan batsman BKG Mendis ranks first.

Session-level wickets (table fragment):
      0   5  10  10  10  10  10  10  13  15  16  20
W2    0   0   0   0   0   1   3   5  10  10  10  10

Mathematical Problems in Engineering

Discussion
The results of the analysis revealed that each predictor had a varied impact at various stages of a Test Cricket match. Explanatory variables such as home factor, ground effect, and team strength have an effect on outcomes at the start of a Test match, but this effect fades as the match progresses. Lead has a minor impact at the start of a Test match, but it grows in importance as the match develops. The number of wickets is also significant during the match. A Test Cricket match is made up of five days, each of which has three sessions, for a total of fifteen sessions. The effect of the predictors on match results fluctuates over the course of the five-day match; therefore, we modeled these sessions one by one to anticipate the outcome at each phase, making it easier for forecasters to forecast at a specific stage of the match.
Through statistical analysis, this study presents a rating system for batters in Test Cricket matches. Multinomial logistic regression is used to calculate the Test match outcome probabilities. To extend the scope of this study, a larger data set with additional explanatory variables can be used.
There is fluctuation in our suggested rating system at the start and end of the Test match. A larger dataset can be used to tackle this problem. In our rating method, batters' contributions are judged by the difference between the hypothetical probability and the observed probabilities for the first innings and the difference between the hypothetical probability and the observed probabilities for the second innings. Researchers can utilize a variety of ways to overcome issues relating to the batter's contributions in a low-scoring game. The methodology used to rate batters is fairly practical because the proposed rating system is based on session probabilities, which assess a batter's performance in relation to his contribution to the match outcome. We found a correlation of 0.883 (p-value = 0.001) between the proposed criteria and the traditional criteria introduced by the ICC.

Data Availability
Datasets are derived from a public website (http://www.espncricinfo.com) and made available with the article.

Conflicts of Interest
The authors declare no conflicts of interest.