Accurately predicting results from sports competition statistics is of great significance for participant research, commercial cooperation, advertising, and betting. Because the PageRank page-ranking algorithm is prone to topic drift, this paper introduces the category similarity between pages into the PageRank algorithm. In the PR value formula, a factor W(u, v) between pages replaces the original N_u (the number of outbound links of page u), so that the content categories of pages are taken into account and topic drift is reduced. Drawing on the time feedback factor of the PageRank-time algorithm, a time feedback factor is then added to this first improved PR formula. Using statistics from the 1230 games of the 2018-2019 NBA regular season, this paper ranks team strength with the improved PageRank algorithm and compares the results with the regular-season points ranking and the playoff results. In the Eastern Conference, the improved PageRank ranking agrees with the regular-season points ranking; in the Western Conference, it differs in second place. Among the top four teams in the playoffs, it correctly predicts three.

Many factors affect the outcome of a competitive game, and all of them must be considered when forecasting. Predicting team competitions is even more complicated: besides individual ability and on-the-day performance, the result also depends on collective capabilities such as teamwork. Game prediction is therefore a highly specialized problem. The NBA quantifies its games to a remarkable degree; the league has always relied on cutting-edge technology and provides a large amount of data for game prediction and analysis. Because the strength gap between teams is small and every game is highly unpredictable, predicting results is both challenging and meaningful.

The PageRank algorithm is based on link analysis of the hyperlinks between web pages. Its basic idea is as follows. First, PageRank judges the importance of a web page by the number of pages that link to it. The Phoenix.com homepage is clearly more important than a personal blog page, and this difference is measured by the number of pages linking to each: more pages link to the Phoenix.com homepage than to the blog, so the homepage is more important. However, to raise their apparent importance, some sites not only improve their own content but also create additional pages, many of them spam, that link to themselves. These links inflate the importance score even though the pages are not actually important. To avoid this weakness of simple link counting, PageRank also weights each incoming link by the importance of the page it comes from. For example, if the pages linking to a page include pages from well-known sites such as Google, that page is rated even more important.

Evaluating the strength of competitors and predicting results on that basis is highly meaningful. Zak et al. (1979) calculated the offensive and defensive strength of each team from a statistical analysis of the technical characteristics of NBA games, ranked the teams' overall strength, and predicted game results [

Research on ranking algorithms began earlier abroad than in China. The PageRank and HITS algorithms are two representative ranking algorithms [

The HITS algorithm differs from PageRank in that it identifies certain web pages as Authority pages and Hub pages. Traditional PageRank is computed from hyperlinks, but it cannot assign each link its own value; it simply divides a page's score evenly among its outbound links. HITS addresses this problem and is one of the classic link-analysis algorithms; the search engine Teoma used it as its link-analysis algorithm. After HITS receives a user's query, it submits the query to an existing search engine (or its own retrieval system) and takes the top pages from the returned results as an initial set of pages highly relevant to the query. This set is called the Root Set. HITS then expands the root set by adding the pages that link directly to, or are linked directly from, pages in the root set, forming an expanded page collection. Within this expanded collection, HITS searches for good “Hub” pages and good “Authority” pages. Whereas PageRank produces a single value per page, HITS produces two scores for each page: an Authority score and a Hub score. The Authority score is especially useful for search engines.
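The hub/authority iteration described above can be sketched in a few lines of Python. This is a minimal illustration on a hand-made link matrix, not Teoma's implementation: the root-set and expansion steps are omitted, and the function name `hits` and its parameters are our own. Each round sets a page's authority score to the sum of the hub scores of the pages linking to it, sets its hub score to the sum of the authority scores of the pages it links to, and then rescales both vectors.

```python
import numpy as np


def hits(adj, iters=100, tol=1e-10):
    """Return (authority, hub) score vectors for adjacency matrix `adj`.

    adj[i, j] = 1 if page i links to page j.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    auth = np.ones(n)
    hub = np.ones(n)
    for _ in range(iters):
        # Authority of j: sum of hub scores of the pages that link to j.
        new_auth = adj.T @ hub
        # Hub of i: sum of authority scores of the pages that i links to.
        new_hub = adj @ new_auth
        # Rescale to unit length so the scores do not grow without bound.
        new_auth /= np.linalg.norm(new_auth)
        new_hub /= np.linalg.norm(new_hub)
        converged = np.allclose(new_auth, auth, atol=tol) and np.allclose(new_hub, hub, atol=tol)
        auth, hub = new_auth, new_hub
        if converged:
            break
    return auth, hub


links = [[0, 0, 1, 1],   # page 0 links to pages 2 and 3
         [0, 0, 1, 1],   # page 1 links to pages 2 and 3
         [0, 0, 0, 1],   # page 2 links to page 3
         [0, 0, 0, 0]]   # page 3 has no outbound links
authority, hub = hits(links)
```

In this toy graph, page 3 (linked from three pages) receives the highest authority score, while pages 0 and 1 (each pointing to both 2 and 3) receive the highest hub scores.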

In general, the evaluation methods commonly used for competitive games are models with multiparameter input and single-result output. They can reflect the strong-weak relationship in individual matches, but not the overall characteristics and interactions of the whole data set, which are the key basis for ranking teams. To address this shortcoming, this paper constructs a new PageRank algorithm based on weight transfer between the research objects, applies it to NBA games, and compares the predictions with the earlier points-ranking data. The results show that the method is effective for predicting the results of competitive games.

This paper applies the method that Google's search engine uses to rank page importance to the prediction of NBA playoff results. First, a weight transfer matrix is constructed from the score relationships between teams; then, iterative calculation is carried out with the improved PageRank matrix. Finally, game results are predicted from the resulting team strengths.

After accounting for topic identification, keyword identification, and other factors, Google sorts the search results returned to users according to each page's PageRank value, so more important or classic pages rank higher. This ordering has been widely accepted by Google users. Specifically, Google grades web pages on a PageRank scale from 0 to 10, with 10 the highest level. A PageRank of 1 or 2 indicates that a page is not very popular, whereas a value above 7 means the page is highly important and widely recognized by Internet users. Pages with a PageRank of 4 or higher are generally of good quality. Google's own PageRank is 10, showing that the site is very well received and frequently used.

The PageRank algorithm evaluates web pages based on their links: the higher the quality and quantity of the links pointing to a page, the higher its PageRank value and the more important the page. The idea is that each link into a page counts as a vote for that page; the more often a page is linked, the more votes it receives. This gives rise to “link popularity”: when other websites are willing to link to your website, your website is more popular, so link popularity can be used to evaluate its popularity. The concept resembles the impact factor of academic journals: the more often a journal's articles are cited by others, the greater the journal's influence.

Google computes PageRank values with its own system. The displayed PageRank value ranges from 0 to a maximum of 10, and the levels do not increase in equal steps but follow a nonlinear relationship: the difference between PageRank 6 and PageRank 5 is much larger than that between PageRank 5 and PageRank 4, and the higher the level, the larger the gap.
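The nonlinearity can be illustrated with a logarithmic mapping from an underlying score to the 0 to 10 display level. Google never published its mapping, so the base-6 logarithm and the helper names below are purely hypothetical; the sketch only shows why equal steps in the displayed level correspond to increasingly large gaps in the underlying score.

```python
def toolbar_level(raw_score: float, base: float = 6.0) -> int:
    """Map a raw score to a 0-10 level; level k needs raw_score >= base**k.

    The logarithmic form and base=6 are illustrative assumptions only.
    """
    level = 0
    while level < 10 and raw_score >= base ** (level + 1):
        level += 1
    return level


def level_gap(k: int, base: float = 6.0) -> float:
    """Raw-score distance between the thresholds of level k and level k + 1."""
    return base ** (k + 1) - base ** k


print(toolbar_level(7776.0))        # 6**5 is the threshold of level 5; prints 5
print(level_gap(5) / level_gap(4))  # each step up is `base` times wider; prints 6.0
```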

Because the PageRank value determines a page's position in search results and is calculated from the links into and out of pages, web links have attracted great interest. In past years, some people tried to increase the number of links to their websites, even through link exchanges and purchases, which had adverse effects and led Google to change its PageRank ranking system. Certain types of links were then ignored. For example, pages on “link farm” websites, which exist only to provide links and have no substantive content, are not assigned a PageRank value; likewise, when a website is linked from a site with a high PageRank value but the two sites are essentially unrelated (for example, the website of a famous TV show linking to a page on the basic principles of chemical engineering), the link passes no PageRank value. Google also lengthened the interval between PageRank updates, which made it easier for users to monitor the rankings.

The PageRank algorithm, invented by Google founders Larry Page and Sergey Brin, is used by the Google search engine to rank the importance of web pages. It determines a page's importance from the interconnections among all pages on the Internet. If page A links to page B, page A passes part of its importance to page B, and Google computes the importance of a new page from the quantity and quality of its incoming links. In general, the more links a page receives, the more important it is; and the more important the pages that link to it, the more importance it receives. This transfer of importance through link quantity and quality also applies to the transfer of importance between teams in competitive sports.

A classic PageRank model is as follows:
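For reference, the commonly cited original form is $PR(v) = (1-d) + d\sum_{u \in B_v} PR(u)/N_u$, where $B_v$ is the set of pages linking to $v$, $N_u$ is the number of outbound links of page $u$, and $d$ is the damping coefficient. The minimal Python sketch below iterates this formula from an initial value of 1 per page; the value d = 0.85 (the conventional choice), the function name, and the assumption that every page has at least one outbound link are ours.

```python
import numpy as np


def pagerank(out_links, n, d=0.85, tol=1e-10, max_iter=1000):
    """Classic PageRank: PR(v) = (1 - d) + d * sum(PR(u) / N_u for u linking to v).

    out_links: dict mapping each page u to the non-empty list of pages it links to,
               so len(out_links[u]) is N_u. Dangling pages are not handled.
    """
    pr = np.ones(n)  # initial value 1 for every page
    for _ in range(max_iter):
        new = np.full(n, 1.0 - d)
        for u, targets in out_links.items():
            share = pr[u] / len(targets)  # PR(u) / N_u, split evenly over u's links
            for v in targets:
                new[v] += d * share
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new
    return pr


ranks = pagerank({0: [1, 2], 1: [2], 2: [0]}, n=3)
```

With no dangling pages, the values of this form sum to the number of pages, so their average is 1, the same scale as the PR values reported later in this paper.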

PageRank is a classic search engine algorithm that has long attracted researchers' attention, but it still needs improvement. The PageRank formula shows that a page's weight depends heavily on the number of links pointing to it. Newly published pages have had little time to attract links, so their PageRank values are low, whereas older pages have accumulated many links and have high values. As a result, the latest information a user needs is often ranked low in the query results, which does not meet users' actual needs. In addition, PageRank relies mainly on the number of links into and out of pages and cannot tell whether a linked page is related to the content being searched, which can cause the search results to drift from the topic. For example, Sina and Sohu are well-known websites with many pages linking to them, so their PageRank values are high. If a user searches with Sina or Sohu as a keyword or part of one, their pages will usually appear near the top of the results, even though the user may not need them.

Given these defects, PageRank can serve as a base algorithm for improvement. For example, factors from a page's HTML markup can be added to PageRank. Topic-based variants classify all pages by topic and compute a separate PageRank for each topic from the classification results, so each page receives a score for each topic and its importance is reflected per topic. Similarly, because the importance of web information declines the longer it has been published, a time feedback factor can be added to PageRank to account for the effect of publication time on search rankings; pages with the same content then receive different PageRank values depending on when they were published. The rankings produced by this latter algorithm meet the expectations of most users and effectively improve search engine efficiency.
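The time feedback idea can be sketched by scaling the rank flowing into each page with a decay factor based on the page's age. The exact factor used by PageRank-time is not given here, so the exponential form exp(-lam * age), the value of `lam`, and the function name are assumptions made for illustration only.

```python
import numpy as np


def time_weighted_pagerank(out_links, ages, d=0.85, lam=0.1, tol=1e-10, max_iter=1000):
    """Hypothetical PageRank-time variant.

    out_links: dict mapping page -> non-empty list of pages it links to.
    ages:      age of each page (e.g. in days); older pages gain rank more slowly
               because every contribution into page v is scaled by exp(-lam * age_v).
    """
    n = len(ages)
    factor = np.exp(-lam * np.asarray(ages, dtype=float))  # assumed time feedback factor
    pr = np.ones(n)
    for _ in range(max_iter):
        new = np.full(n, 1.0 - d)
        for u, targets in out_links.items():
            share = pr[u] / len(targets)          # PR(u) / N_u
            for v in targets:
                new[v] += d * factor[v] * share   # damped, time-scaled contribution
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new
    return pr


# Pages 0 and 1 have identical links; page 1 is 10 time units older.
pr = time_weighted_pagerank({0: [2], 1: [2], 2: [0, 1]}, ages=[0, 10, 0])
```

Page 0 ends with the higher value even though its links are identical to page 1's, which is the effect the paragraph describes: same content, different publication times, different PageRank values.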

As can be seen from formula (

A probability function

Because the object of this paper is a closed system in which every team has played every other team, no team lacks outbound links, so the damping coefficient can be ignored here. Formula (

Formula (
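Dropping the damping coefficient is safe only when the transfer matrix has no dangling teams (every column sums to 1) and every team can reach every other through the transfer graph. A hypothetical helper that checks both conditions, assuming the convention that M[v, u] is the share of team u's PR passed to team v:

```python
import numpy as np


def is_closed_system(M, atol=1e-9):
    """Check whether damping can be dropped for transfer matrix M.

    Convention (assumed): M[v, u] is the share of team u's PR passed to team v.
    Requires (1) non-negative entries with every column summing to 1, i.e. no
    team is a dangling node, and (2) strong connectivity of the transfer graph.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if (M < 0).any() or not np.allclose(M.sum(axis=0), 1.0, atol=atol):
        return False
    # step[u, v] is True when u passes PR directly to v (self-loops added for reachability).
    step = (M.T > 0) | np.eye(n, dtype=bool)
    reach = step.copy()
    for _ in range(n):
        # Extend every known path by one more transfer.
        reach = (reach.astype(int) @ step.astype(int)) > 0
    return bool(reach.all())
```

Plain power iteration additionally needs the graph to be aperiodic; this holds when at least three teams have all played one another, because the graph then contains cycles of length 2 and 3.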

Web page segmentation based on VIPS is an iterative process with three main steps. The first is page block extraction: the HTML DOM tree of the current page is obtained from the browser, and the visual information of the DOM tree is used to divide it into semantic blocks. The second is separator detection: horizontal and vertical separators are located in the page from its visual information. The third is semantic block reconstruction: the page hierarchy is rebuilt on the basis of the separators found in step 2, and some blocks are merged to form new semantic blocks.

After the page is divided into blocks, it is cleaned by filtering out noise blocks such as advertisements and navigation bars. At the same time, the category of the page can be determined from the ratio of link text to nonlink text in each block and the ratio of the number of words to the number of pictures. Content from pages in irrelevant categories can be filtered out directly.
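The cleaning step can be sketched with the link-text ratio alone. The sketch assumes the page has already been segmented into blocks (the VIPS segmentation itself is not shown); the `Block` type, the function names, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str       # all visible text in the block
    link_text: str  # the part of that text that sits inside hyperlinks


def link_text_ratio(block: Block) -> float:
    """Fraction of the block's characters that are hyperlink text."""
    total = len(block.text.strip())
    return len(block.link_text.strip()) / total if total else 1.0


def filter_noise_blocks(blocks, max_ratio=0.5):
    """Keep blocks whose link-text ratio is at most max_ratio.

    Navigation bars and ad lists are mostly links, so they score close to 1.
    Empty blocks are treated as noise.
    """
    return [b for b in blocks if link_text_ratio(b) <= max_ratio]


blocks = [
    Block("Home News Sports NBA", "Home News Sports NBA"),            # navigation bar
    Block("The Bucks beat the Raptors at home on Sunday.", "Bucks"),  # article text
]
kept = filter_noise_blocks(blocks)
```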

There are two ways to determine the weight W(u, v) of PageRank value transfer between teams. The first measures weight transfer by the win-loss relationship between teams. Taking team A and team B as an example, this paper finds all games between them and divides the number of games won by team A by the total number of games between the two teams to obtain the weight

The second method adds up the scores from all games between team A and team B and divides team A's total score by the two teams' combined score to obtain the weight of team A against team B. This paper uses the second method.
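Both weighting schemes can be computed from a list of game results. In this sketch, each game is a hypothetical `(team_x, team_y, pts_x, pts_y)` tuple and the function name is our own; `win_w` implements the first method and `score_w` the second, which this paper adopts. The sample games are made up for illustration.

```python
from collections import defaultdict


def pair_weights(games):
    """Compute both inter-team weights from game results.

    games: iterable of (team_x, team_y, pts_x, pts_y) tuples.
    Returns (win_w, score_w), both keyed by ordered pairs (a, b):
      win_w[(a, b)]   = games a won against b / games between a and b          (method 1)
      score_w[(a, b)] = points a scored against b / all points in those games  (method 2)
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    points = defaultdict(int)
    for x, y, px, py in games:
        # Record each game from both teams' points of view.
        for a, b, pa, pb in ((x, y, px, py), (y, x, py, px)):
            played[(a, b)] += 1
            points[(a, b)] += pa
            if pa > pb:  # basketball games cannot end in a draw
                wins[(a, b)] += 1
    win_w = {k: wins[k] / n for k, n in played.items()}
    score_w = {k: points[k] / (points[k] + points[(k[1], k[0])]) for k in played}
    return win_w, score_w


games = [("Bucks", "Raptors", 110, 100),
         ("Raptors", "Bucks", 105, 95),
         ("Bucks", "Raptors", 120, 100)]
win_w, score_w = pair_weights(games)
```

Here the Bucks won two of the three meetings, so method 1 gives 2/3, while method 2 gives 325/630 (about 0.516), a milder weight that reflects how close the scores were.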

The main NBA season (Summer League games do not count toward team records) is divided into two stages: the regular season and the playoffs. The regular season, which ends in April each year, determines the 16 playoff teams: 8 from the East and 8 from the West. In the middle of the regular season there is also a special event, the All-Star exhibition game, held every February. The trade deadline falls on the Thursday of the 16th week of the regular season; after it, each team must play its remaining regular-season and playoff games that year with its existing players. The trade deadline usually falls around the All-Star game.

NBA games are divided into home and away games. Owing to factors such as familiarity with the home court, crowd support, and refereeing, a team usually performs better at home than away against the same opponent. This paper therefore classifies and counts each team's results by home and away games and computes each team's home and away win rates. To account for roster stability and the players' adjustment period, this paper uses the season data before the All-Star game to predict each team's wins and losses after the All-Star game. Each team's results are tallied from the start of the regular season.
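A minimal sketch of the home/away tally, assuming each game record is a `(game_date, home, away, home_pts, away_pts)` tuple and that the All-Star game date is passed in as `cutoff`; the record layout, the function name, and the sample games are hypothetical.

```python
from collections import defaultdict
from datetime import date


def home_away_win_rates(games, cutoff):
    """Home and away win rates from games played before `cutoff`.

    games:  iterable of (game_date, home, away, home_pts, away_pts) tuples.
    cutoff: a datetime.date, e.g. the All-Star game date; later games are ignored.
    Returns {team: (home_win_rate, away_win_rate)}, with None where no games exist.
    """
    # [home wins, home games, away wins, away games] per team
    rec = defaultdict(lambda: [0, 0, 0, 0])
    for day, home, away, home_pts, away_pts in games:
        if day >= cutoff:
            continue
        rec[home][1] += 1
        rec[away][3] += 1
        if home_pts > away_pts:
            rec[home][0] += 1
        else:
            rec[away][2] += 1
    return {
        team: (r[0] / r[1] if r[1] else None, r[2] / r[3] if r[3] else None)
        for team, r in rec.items()
    }


games = [  # made-up sample games
    (date(2019, 1, 10), "Jazz", "Suns", 120, 100),
    (date(2019, 1, 20), "Suns", "Jazz", 110, 105),
    (date(2019, 3, 1), "Jazz", "Suns", 90, 100),  # after the cutoff: ignored
]
rates = home_away_win_rates(games, cutoff=date(2019, 2, 15))
```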

Compared with football and volleyball, the other two of the “three major ball sports,” games between two evenly matched basketball teams have a more uncertain outcome. Basketball does not allow draws, so under the rules, when two evenly matched teams are tied at the end of regulation, overtime is played. Likewise, games between evenly matched teams often end with a buzzer-beater, in which a team scores at the very end of the game to turn the score around and win. To prevent chance factors such as overtime or buzzer-beaters from exaggerating the winning probability of one of two equally strong teams, this paper excludes these highly random games from the original data set, so that the retained results measure team strength more accurately.
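The exclusion rule can be expressed as a simple filter. A final score does not record whether the winning basket came at the buzzer, so this sketch uses a final margin of at most `close_margin` points as a rough stand-in for such finishes; the margin, the dict keys, and the function name are assumptions, not the paper's exact criterion.

```python
def filter_random_games(games, close_margin=3):
    """Drop games whose result is likely dominated by chance.

    games: iterable of dicts with keys "home_pts", "away_pts" and "overtime" (bool).
    Overtime games are removed, and so are games decided by at most
    `close_margin` points, used here as a rough proxy for last-second finishes.
    """
    return [
        g for g in games
        if not g["overtime"] and abs(g["home_pts"] - g["away_pts"]) > close_margin
    ]


sample = [
    {"home_pts": 118, "away_pts": 115, "overtime": True},   # went to overtime
    {"home_pts": 101, "away_pts": 99, "overtime": False},   # two-point finish
    {"home_pts": 112, "away_pts": 98, "overtime": False},   # comfortable win
]
kept = filter_random_games(sample)
```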

The two basic assumptions of PageRank (regarding the quantity and the quality of incoming links) make the algorithm insensitive to the initial values assigned before calculation: the result is determined by the topology and transfer relationships of the nodes in the network. The algorithm repeatedly recomputes the PageRank score of each node until it reaches a stable state, and PageRank derives page importance from that state. In this paper, the improved PageRank algorithm is used to calculate the importance of each team, and this value is used as the team's weight: the larger the PageRank value, the stronger the team.
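The insensitivity to initial values can be checked numerically. The sketch runs the undamped iteration pr ← M·pr on a small, hypothetical 3-team column-stochastic matrix from two different starting vectors. Because the undamped iteration preserves the total, the starting vectors must have the same sum (for example, 1 per team); only how that total is spread across teams is forgotten.

```python
import numpy as np


def iterate(M, pr0, tol=1e-12, max_iter=10_000):
    """Undamped power iteration pr <- M @ pr for a column-stochastic matrix M."""
    pr = np.asarray(pr0, dtype=float)
    for _ in range(max_iter):
        new = M @ pr
        if np.abs(new - pr).sum() < tol:
            return new
        pr = new
    return pr


# Hypothetical transfer matrix: column u lists how team u splits its PR.
M = np.array([[0.0, 0.6, 0.3],
              [0.5, 0.0, 0.7],
              [0.5, 0.4, 0.0]])

a = iterate(M, [1.0, 1.0, 1.0])
b = iterate(M, [2.5, 0.3, 0.2])
print(np.allclose(a, b))  # True
```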

In this paper, the improved PageRank algorithm is implemented in Python. The flow of the experiment is shown in Figure

Algorithm flow chart.

This paper takes the game data of the 2018-2019 NBA regular season as its research object, for two reasons. First, compared with other competitive sports, the NBA offers more samples: a regular season has 1230 games, more than most other competitions. Second, every pair of NBA teams meets at least twice, so no two teams fail to meet. If some teams never met, no importance could be transferred between them, and the sample would not suit this method. Figure

NBA playoff ranking prediction big data platform.

In this paper, the weight of team A relative to team B is calculated from the summed scores of their games, for every pair among the 30 teams. Arranging these weights yields a 30 × 30 weight matrix of losers.

By normalizing the rows of the matrix and transposing it, formula (

Because the improved PageRank algorithm is not sensitive to the initial values of the evaluation objects, the initial PageRank value of each team is set to 1 and the values are calculated with formula (
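The whole procedure (score-share weights, row normalization, transposition, and undamped iteration from 1) can be sketched as follows. This is not the authors' code: the game-tuple layout and the function name are hypothetical, and the orientation of the “matrix of losers” is our reading, namely that team u passes PR to opponent v in proportion to v's share of the points in their meetings, so an outscored team effectively votes for its opponent.

```python
import numpy as np


def team_pagerank(teams, games, tol=1e-12, max_iter=10_000):
    """Score-weighted, undamped PageRank for a closed league.

    teams: list of team names; every team is assumed to have played at least one game.
    games: iterable of (team_x, team_y, pts_x, pts_y) tuples.
    Assumed orientation: team u passes PR to opponent v in proportion to v's share
    of the points in their meetings, so an outscored team "votes" for its opponent.
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    pts = np.zeros((n, n))  # pts[a, b] = total points a scored against b
    for x, y, px, py in games:
        pts[idx[x], idx[y]] += px
        pts[idx[y], idx[x]] += py
    total = pts + pts.T
    # W[u, v] = v's share of the points in u-vs-v games (0 on the diagonal).
    W = np.divide(pts.T, total, out=np.zeros((n, n)), where=total > 0)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalize: each team hands out all its PR
    M = P.T                               # transpose: column u holds u's outgoing shares
    pr = np.ones(n)                       # initial PR value of 1 for every team
    for _ in range(max_iter):
        new = M @ pr                      # no damping term: the league is a closed system
        converged = np.abs(new - pr).sum() < tol
        pr = new
        if converged:
            break
    return dict(zip(teams, pr))


toy_games = [("A", "B", 110, 90), ("A", "C", 120, 80), ("B", "C", 105, 95)]
print(team_pagerank(["A", "B", "C"], toy_games))  # A > B > C, values sum to 3
```

On this toy season, team A outscores both opponents and ends with the highest value. Because the undamped iteration preserves the total, the values sum to the number of teams (3 here, 30 for the full league), so they are centered on 1.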

Improved PR index of NBA teams.

Rank | Team | PR value |
---|---|---|
1 | Bucks | 1.0347 |
2 | Warriors | 1.0276 |
3 | Raptors | 1.0254 |
4 | Jazz | 1.0225 |
5 | Blazers | 1.0214 |
6 | Rockets | 1.0194 |
7 | Nuggets | 1.0186 |
8 | Thunder | 1.0178 |
9 | Pacers | 1.0164 |
10 | Celtics | 1.0139 |
11 | 76ers | 1.0105 |
12 | Spurs | 1.0093 |
13 | Clippers | 1.0055 |
14 | Nets | 1.0012 |
15 | Kings | 1.0001 |
16 | Heat | 0.9990 |
17 | Pistons | 0.9987 |
18 | Magic | 0.9986 |
19 | Mavericks | 0.9974 |
20 | Timberwolves | 0.9944 |
21 | Pelicans | 0.9916 |
22 | Lakers | 0.9914 |
23 | Hornets | 0.9899 |
24 | Grizzlies | 0.9873 |
25 | Wizards | 0.9866 |
26 | Hawks | 0.9771 |
27 | Suns | 0.9644 |
28 | Bulls | 0.9612 |
29 | Knicks | 0.9590 |
30 | Cavaliers | 0.9579 |

By definition, a larger PR value means that more PR value is transferred to the team from the other teams. The higher a team ranks in the league, the stronger its overall strength.

To verify the soundness of the experiment, this paper compares the team ranking from the improved PageRank algorithm with the regular-season ranking by number of wins, as shown in Figure

Teams' PR values and numbers of wins.

Under the NBA competition system, the league's teams are divided into the Eastern and Western Conferences of 15 teams each. At the end of the regular season, 16 teams (8 from each conference) qualify for the playoffs, where stronger and weaker teams are paired according to regular-season points. The teams are therefore sorted within each conference, both by improved PageRank value and by number of wins. The results are shown in Figures

Teams' PR values and numbers of wins in the Eastern Conference.

Teams' PR values and numbers of wins in the Western Conference.
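Sorting within each conference can be reproduced directly from the values in the table of improved PR values. The conference membership below is the standard 2018-2019 NBA alignment; the helper name `conference_ranking` is hypothetical.

```python
# Improved PR values of the 30 teams, copied from the table of improved PR values.
PR = {
    "Bucks": 1.0347, "Warriors": 1.0276, "Raptors": 1.0254, "Jazz": 1.0225,  # Warriors = Golden State
    "Blazers": 1.0214, "Rockets": 1.0194, "Nuggets": 1.0186, "Thunder": 1.0178,
    "Pacers": 1.0164, "Celtics": 1.0139, "76ers": 1.0105, "Spurs": 1.0093,
    "Clippers": 1.0055, "Nets": 1.0012, "Kings": 1.0001, "Heat": 0.9990,
    "Pistons": 0.9987, "Magic": 0.9986, "Mavericks": 0.9974, "Timberwolves": 0.9944,
    "Pelicans": 0.9916, "Lakers": 0.9914, "Hornets": 0.9899, "Grizzlies": 0.9873,
    "Wizards": 0.9866, "Hawks": 0.9771, "Suns": 0.9644, "Bulls": 0.9612,
    "Knicks": 0.9590, "Cavaliers": 0.9579,
}

# 2018-2019 Eastern Conference; every other team is in the Western Conference.
EAST = {
    "Bucks", "Raptors", "Pacers", "Celtics", "76ers", "Nets", "Heat", "Pistons",
    "Magic", "Hornets", "Wizards", "Hawks", "Bulls", "Knicks", "Cavaliers",
}
WEST = set(PR) - EAST


def conference_ranking(pr, members, top=8):
    """Return the `top` teams of one conference, ordered by descending PR value."""
    return sorted((t for t in pr if t in members), key=pr.get, reverse=True)[:top]


print(conference_ranking(PR, EAST, top=2))  # ['Bucks', 'Raptors']
print(conference_ranking(PR, WEST, top=2))  # ['Warriors', 'Jazz']
```

The top two teams in each conference match the improved-PR row of the comparison table that follows.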

Dividing the league into conferences and pairing stronger teams with weaker ones prevents strong teams from meeting too early, which would eliminate strong teams prematurely. Moreover, apart from the strongest Eastern teams, the improved PageRank values of Eastern teams are below the equilibrium value of 1, so these teams would be at a disadvantage if they had to compete directly with Western teams. The conference structure thus helps broaden the audience and make games more watchable.

As can be seen from Figures

Comparison of the two methods with the actual results.

Method | Eastern Conference 1st | Eastern Conference 2nd | Western Conference 1st | Western Conference 2nd |
---|---|---|---|---|
Accumulated points | Bucks | Raptors | Warriors | Nuggets |
Improved PR algorithm | Bucks | Raptors | Warriors | Jazz |
Final result | Raptors | Bucks | Warriors | Blazers |

According to Table

Playoff qualification predictions for Eastern Conference teams.

Playoff qualification predictions for Western Conference teams.

The traditional evaluation model with multiparameter input and single-result output reflects only the strong-weak relationship in individual matches and cannot capture the season as a whole. Meanwhile, ranking team strength by accumulated points is fairly prone to chance, since a slight difference in points can change the overall result. This paper uses an improved PageRank algorithm to rank team strength from the perspective of the season's game data as a whole and predicts the playoff results. The experimental results show that the prediction accuracy is comparable to that of the points method and closer to the actual results in some respects. This paper uses only the 2018-2019 season data and no other historical data. NBA games are also affected by factors such as playing style, sudden injuries to key players, and the inherent randomness of competitive sports, which cause the predictions to deviate to some extent. The method's performance on other historical data remains to be verified.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

This work was supported by Soft Science Research Program of Shaanxi Province (2019KRM101) and Scientific Research Foundation for doctors of Xi’an Polytechnic University (3100401016).