With the NBA Championship Series underway, the question on everyone’s mind is: who will win? Will LeBron finally deliver a championship to Cleveland? Or will Golden State go all the way in the postseason after boasting the best regular season record for 2014-2015?
There are a number of factors to consider here, such as LeBron’s supreme talents and his amazing postseason statistics. On the other hand, Golden State’s record speaks for itself, built on a well-rounded roster that includes MVP Steph Curry. However, these factors can be subjective and difficult to quantify, so I attempted to predict the outcome of NBA matches using a linear regression model.
Put simply, the model predicts the number of points by which the home team will beat the away team. If the regression model output is 8 for a particular game, it predicts the home team to win by 8 points; if the output is -4, it predicts the away team to win by 4 points.
I used the previous NBA season as my dataset to calculate the linear regression formula. The fact that there are over 1200 games in an NBA season gave me a large enough dataset without having to use multiple seasons of data, which can distort the results as prior seasons are less relevant. To put this in perspective, an NRL season contains around 200 games, and therefore you would need 6 seasons of NRL to obtain the same dataset size as 1 NBA season.
The factors I tested and found to be statistically significant in predicting the outcome of NBA games were as follows:
Each player is assigned a “win-share” from the previous season and the top 12 players on each roster have their win-shares added to form a team rating.
The input into the regression equation is the difference in team rating between the home and away teams.
The Warriors have a team rating of 66, whereas the Cavaliers have a team rating of 57. If the Warriors are playing at home the input for the model would be 66-57 = 9 as the model is based on home – away. The input would be -9 for a Cavaliers home game.
This is essentially the most important factor in the model.
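A minimal sketch of the team rating input described above. The function names and the sample roster are illustrative assumptions, not the code used for the article; only the ratings of 66 and 57 come from the text.

```python
def team_rating(win_shares, top_n=12):
    """Sum the top_n largest win-shares on a roster."""
    return sum(sorted(win_shares, reverse=True)[:top_n])


def rating_input(home_rating, away_rating):
    """Model input: home team rating minus away team rating."""
    return home_rating - away_rating


# Hypothetical 14-player roster; only the best 12 win-shares count.
sample_roster = [15, 11, 9, 8, 6, 5, 4, 3, 2, 2, 1, 1, 0, 0]
sample_rating = team_rating(sample_roster)

# Ratings quoted in the article.
warriors, cavaliers = 66, 57
print(rating_input(warriors, cavaliers))   # Warriors at home: 9
print(rating_input(cavaliers, warriors))   # Cavaliers at home: -9
```

Because the input is always home minus away, swapping the venue simply flips the sign.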
Each week ESPN posts a power ranking for each NBA team. This provides an indication of form.
At the end of the NBA season the Warriors were ranked 1st and the Cavaliers 3rd. Thus, for a regular season game hosted by the Warriors, the input would be 1-3 = -2. However, as ESPN does not release a power ranking list for the postseason, this input was set to 0 for the Championship Series.
Playing on back-to-back nights is one measure of fatigue, a factor that significantly influences the probability of a team winning.
The input into the formula is binary, 1 for a team that played the night before and 0 for a team that did not play the night before. For the Championship series, this input was set to 0 as both teams will have played the same number of back-to-back games.
Games played in last 5 days
This is another measure of fatigue, but captures fatigue over the week rather than purely the previous night. This second measure is necessary as it is possible for a team to play back-to-back, have a day’s rest and then play another back-to-back, thus playing 4 games in 5 days.
The input is calculated as the number of games the home team has played in the past 5 days minus the number of games the away team has played in the past 5 days. For the Championship series this input was set to 0 as both teams will have played the same number of games in the last 5 days.
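The two fatigue inputs can be sketched as follows. This assumes each team's schedule is available as a list of game dates; the function names and the sample dates are hypothetical. The article does not say how the home and away back-to-back flags are combined, so the sketch keeps that flag per team.

```python
from datetime import date, timedelta


def played_previous_night(game_dates, today):
    """Binary back-to-back flag: 1 if the team played yesterday, else 0."""
    return int(today - timedelta(days=1) in game_dates)


def games_in_last_days(game_dates, today, window=5):
    """Count games played in the `window` days before today."""
    start = today - timedelta(days=window)
    return sum(1 for d in game_dates if start <= d < today)


def games_played_input(home_dates, away_dates, today, window=5):
    """Model input: home count minus away count over the window."""
    return (games_in_last_days(home_dates, today, window)
            - games_in_last_days(away_dates, today, window))


# Hypothetical schedules: the home team has played two back-to-backs
# separated by one rest day, i.e. 4 games in 5 days.
today = date(2015, 3, 6)
home_dates = [date(2015, 3, 1), date(2015, 3, 2),
              date(2015, 3, 4), date(2015, 3, 5)]
away_dates = [date(2015, 3, 3)]

print(played_previous_night(home_dates, today))           # 1
print(played_previous_night(away_dates, today))           # 0
print(games_played_input(home_dates, away_dates, today))  # 4 - 1 = 3
```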
Home Court Advantage
This accounts for the fact that most teams play better at home than on the road.
This is the constant in the linear regression model: as the model is based on home – away, the home team always has the home court advantage.
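To illustrate why the constant plays the role of home court advantage, here is a minimal sketch that fits a one-variable least squares line to made-up data. The numbers are invented purely for illustration, and the actual model uses several inputs rather than the single rating difference shown here.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data: team rating difference (home - away) and the
# resulting home margin in points. Not taken from any real season.
rating_diff = [-10, -5, 0, 5, 10]
home_margin = [-2, 0, 3, 6, 8]

intercept, slope = fit_simple_ols(rating_diff, home_margin)

# The intercept is the predicted margin when the input is zero,
# i.e. when two equally rated teams meet: the home court advantage.
print("home court advantage (points):", intercept)
print("points per unit of rating difference:", slope)
```

With every input defined as home minus away, two evenly matched teams produce inputs of zero, so whatever margin remains is attributed to playing at home.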
There are many other factors that would be expected to influence the outcome of an NBA game that are not allowed for in the model. For example, the conference the team belongs to and the team’s playoff experience.
While it is generally accepted that the Western Conference is more competitive than the Eastern Conference, I found that this was not statistically significant. However, this is likely because the difference in the quality of each team across both conferences is already captured in the team rating and power ranking.
Further, while a team’s playoff experience is not significant for the regular season, this could prove to be crucial in the dying stages of games during the Championship Series. The Cavaliers have the upper hand in this regard, with LeBron James and James Jones playing in their 5th Championship Series in 5 seasons. To make some allowance for this, I increased the Cavaliers’ team rating to serve as a proxy.
What is Win-Share?
The definition of win-share is easiest to explain with an example. If the Miami Heat won 60 games in the previous season and LeBron has a win-share of 15, this means that LeBron contributed 15 of the Heat’s 60 wins. It is important to note that when the Heat win a game, the entire win is not attributed to a single player. Rather, the win is split proportionally amongst the Heat players according to their influence on the game. LeBron had the highest individual win-share of any player on the Miami Heat roster last season, meaning he had the largest influence of all Miami Heat players. Whilst the concept is easy to understand, the actual calculation of win-shares is quite involved and relates to the offensive and defensive efficiency of a player.
I started with the win-shares at the end of the previous season from a basketball statistics website. I then made minor adjustments to individual players’ win-shares where I believed changes needed to be made. An example of this is Pau Gasol, a quality player with a win-share of just 3, who played for the hapless Lakers last season but joined the Bulls this season. As Gasol now finds himself on a more talented roster, I believed that he would be worthy of a win-share greater than 3. I gave him an adjusted win-share of 7.5 for the current season. His actual win-share at the end of the current regular season was 10.4, meaning he had more of an influence than I had predicted.
I also made adjustments for players who were injured for a significant proportion of the 2014 season and therefore had not played enough games to be credited with a win-share that truly reflected their playing ability. This was done using win-shares for pre-2014 seasons.
As the season progressed, it was important to keep the model up to date by adjusting the win-share values for current season performance, injuries and other roster changes. As each player’s win-share was initially based on last season, I updated the win-shares halfway through the season by doubling each player’s win-share accumulated so far in the current season, with some further adjustments if a particular team had favourable fixtures up to that point. It was also necessary to allow for injuries and other roster changes. This was done by subtracting the win-share of the injured or replaced player from the team total and adding the win-share of the replacement player.
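The two mechanical updates described above, doubling a half-season win-share and swapping a player in the team total, might look like this. The function names and all numbers are hypothetical, and the judgment-based adjustments for favourable fixtures are deliberately left out.

```python
def annualise_half_season(half_season_win_share):
    """Scale a win-share earned over half a season to a full season."""
    return 2 * half_season_win_share


def swap_player(team_total, outgoing_win_share, incoming_win_share):
    """Replace an injured or departed player in the team rating."""
    return team_total - outgoing_win_share + incoming_win_share


# Hypothetical player with 4.5 win-shares at the halfway mark.
print(annualise_half_season(4.5))  # 9.0

# Hypothetical team rated 50 loses a player worth 8 win-shares
# and replaces him with one worth 2.
print(swap_player(50, 8, 2))       # 44
```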
And the 2015 NBA Champion will be…
With these observed values, the linear regression model can be used to predict the outcome of a particular game in the Championship Series. Combining this with the various permutations (for example, one permutation would be that the Warriors win in 4 games), I calculated the probability of each team winning the Championship as follows:
|Probability of Winning Championship
|Golden State Warriors
Thus the model has the Golden State Warriors as the team to take out the 2015 NBA Championship. Interestingly, the most likely outcome is for the Warriors to win in 7 games, which would no doubt make for a memorable Championship Series!
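The permutation step can be sketched by enumerating every way a best-of-7 series can unfold. The sketch assumes a 2-2-1-1-1 home court pattern and takes the per-game win probabilities as given; the article does not say how the predicted margin is converted to a win probability, so the values passed in below are placeholders and not the model's outputs.

```python
# True where the team with home court advantage ("A") hosts the game,
# assuming a 2-2-1-1-1 format.
HOME_PATTERN = (True, True, False, False, True, False, True)


def series_outcomes(p_a_at_home, p_a_away):
    """Return {(winner, games_played): probability} for a best-of-7.

    p_a_at_home: probability team A wins a game it hosts.
    p_a_away:    probability team A wins a game hosted by team B.
    """
    outcomes = {}

    def play(games_played, wins_a, wins_b, prob):
        if wins_a == 4 or wins_b == 4:
            key = ("A" if wins_a == 4 else "B", games_played)
            outcomes[key] = outcomes.get(key, 0.0) + prob
            return
        p = p_a_at_home if HOME_PATTERN[games_played] else p_a_away
        play(games_played + 1, wins_a + 1, wins_b, prob * p)
        play(games_played + 1, wins_a, wins_b + 1, prob * (1 - p))

    play(0, 0, 0, 1.0)
    return outcomes


def series_win_probability(outcomes, team):
    """Total probability that `team` wins the series."""
    return sum(p for (winner, _), p in outcomes.items() if winner == team)


# Placeholder probabilities for illustration only.
example = series_outcomes(p_a_at_home=0.7, p_a_away=0.5)
for (winner, games), prob in sorted(example.items()):
    print(f"{winner} wins in {games}: {prob:.3f}")
print(f"A wins the series: {series_win_probability(example, 'A'):.3f}")
```

Keeping the result keyed by winner and series length makes it easy to read off both the overall favourite and the single most likely outcome.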