In just 13 short days, the 2023 NCAA D-I/II Men’s Volleyball season will get underway. Opening day, January 4th, will see the Railsplitters of Lincoln Memorial traveling to Malibu to face the Pepperdine Waves. The predictions aren’t out yet, but I took a sneak peek and this looks to be a tight matchup. LMU is coming off its fourth consecutive IVA (Independent) championship, while Pepperdine is coming off a surprise MPSF championship as the #3 seed in 2022. This is a fantastic matchup to open the season!
In the days leading up to tip-off (serve-off?), I will look at the 8 conferences[1] and where VBelo has each of them at the start of the season. There’s a lot to talk about, but that is for another day.
Today, I want to cover the new addition to the VBelo model this season: Roster Retention!
Background
Every year, student-athletes graduate, transfer, step away from the sport, or (apparently) turn pro. The model had no way to capture these offseason departures. This became super apparent at the start of the 2022 season, when the preseason top rankings basically just mirrored the teams that made the national tournaments.
Therefore, I wanted a way to adjust every team’s elo rating based on the players who did not return for the next season. This is essentially what commentators, coaches, and fans do when they put together preseason rankings. If elo has a better idea of a team’s strength before the season starts, the predictions will be that much more accurate.
At first, I set out to create a metric measuring the overall value of every player at their position (something like WAR in baseball). This proved to be…difficult. Beyond the complexity of building a separate metric for each position, offensive and defensive data for every player who touched the court in a given season was hard to gather. Instead, I turned to more accessible data and created a relative estimate of a player’s contribution to a team’s elo, which I’m calling an experience score. So, now for the fun (math) part.
The Math
Warning: It’s going to get really nerdy, really quick.
The first step was finding data that was both obtainable and useful. I landed on two basic data points: the number of matches and sets played by every player in NCAA D-I/II.[2] Then I looked at what percent of their team’s matches and sets each player appeared in. For example, Davide Gardini (BYU) appeared in 94 sets and 24 matches in 2022. BYU, as a team, played 102 sets and 25 matches. Therefore, Gardini played in 92% of sets (94/102) and 96% of matches (24/25).
The next step was to decide how to weight these two percentages. Neither is a perfect measure of how much a player contributed to a team’s success, but both are decent estimates. The assumption was that simply appearing in a match matters less than how many sets a player spent on the court. So, to gauge how much experience a player had in a season, sets were weighted 3:1 over matches (i.e., 75% sets played, 25% matches played). Back to our example, Davide Gardini’s weighted experience score would be 93.12 out of 100.[3]
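To make the weighting concrete, here is a minimal Python sketch of the experience score, assuming the 75/25 split and the 0 to 100 scale described above. The function and parameter names are hypothetical illustrations, not the actual VBelo code.

```python
def experience_score(sets_played: int, team_sets: int,
                     matches_played: int, team_matches: int,
                     set_weight: float = 0.75) -> float:
    """Return a player's experience score on a 0-100 scale.

    Hypothetical helper: sets played are weighted 3:1 over matches played
    (set_weight=0.75, so matches get the remaining 0.25).
    """
    set_share = sets_played / team_sets            # fraction of the team's sets
    match_share = matches_played / team_matches    # fraction of the team's matches
    return 100 * (set_weight * set_share + (1 - set_weight) * match_share)


# Davide Gardini (BYU, 2022): 94 of 102 sets, 24 of 25 matches
gardini = experience_score(94, 102, 24, 25)
print(f"{gardini:.2f}")  # 93.12
```

Keeping the weight as a parameter makes it easy to test other splits if the 3:1 assumption ever gets revisited.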
Now that we have an experience score for each player, we just need to know whether each player returned for the next season. If a player left (for whatever reason), their score is added to the team’s total experience lost. Since Gardini graduated, his 93.12 experience points count toward BYU’s total. Add in the 3 other players who saw the court at least once for BYU and are not on the 2023 roster, and BYU lost a total of 226.97 experience points.
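Tallying the loss is just a filtered sum. A minimal sketch, assuming scores are stored in a dictionary keyed by player name and next season's roster is a set of names; the players and scores below are hypothetical, not real BYU data.

```python
def experience_lost(scores: dict[str, float], next_roster: set[str]) -> float:
    """Sum the experience scores of every player missing from next season's roster."""
    return sum(score for player, score in scores.items() if player not in next_roster)


# Hypothetical team: names and scores are made up for illustration
scores = {"Player A": 93.12, "Player B": 60.0, "Player C": 15.5, "Player D": 88.0}
returning = {"Player D"}  # only Player D is back next season
print(f"{experience_lost(scores, returning):.2f}")  # 168.62
```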
Next, I scaled this total to a number that typically falls between 0 and 1 (for ease of use and because it’s fun). A player with a score of 100 would have played in every set of every match, so the simple choice was to multiply that by 6 (one for each player on the court). A team losing 600 points of experience is essentially losing an entire starting six (which is pretty significant). Dividing BYU’s 226.97 by 600 gives a retention metric of 0.38.[4]
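The scaling step divides by a full starting six (6 × 100 = 600 points). A small sketch using BYU’s total from above; the names are hypothetical, and nothing caps the result at 1, which is how a team can land above it.

```python
FULL_LINEUP_POINTS = 6 * 100  # six players on court, each with a maximum score of 100


def retention_metric(points_lost: float) -> float:
    """Scale experience lost so that losing a full starting six equals 1.0.

    The result is not clipped, so a team can land above 1.
    """
    return points_lost / FULL_LINEUP_POINTS


byu_metric = retention_metric(226.97)
print(f"{byu_metric:.2f}")  # 0.38
```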
The last part is actually applying this metric to an elo rating. After some testing, I found that a retention metric of 1 (losing 600 points of experience) would equate to a 5% reduction in a team’s elo. BYU ended its season with a VBelo of 1596. Given its roster loss (0.38 × 5% ≈ a 1.9% reduction), its rating decreases by about 30 points.
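Putting the final step into code, assuming the penalty scales linearly with the retention metric (a metric of 1 removes 5%) and is applied to the end-of-season rating. Linear scaling is an assumption that happens to match the BYU example above, not a confirmed detail of the VBelo model; the names are hypothetical.

```python
MAX_PENALTY = 0.05  # a retention metric of 1.0 removes 5% of a team's rating


def preseason_elo(end_of_season_elo: float, metric: float) -> float:
    """Shrink a rating in proportion to the retention metric.

    Assumes the penalty scales linearly, so a metric of 0.5 removes 2.5%.
    """
    return end_of_season_elo * (1 - MAX_PENALTY * metric)


byu_metric = 226.97 / 600  # BYU's retention metric, about 0.38
byu_start = preseason_elo(1596, byu_metric)
print(f"{byu_start:.1f} ({1596 - byu_start:.1f} points lost)")  # 1565.8 (30.2 points lost)
```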
The Data
If you are still reading, thank you; you must really like math (or be skimming). I don’t want to spend too much time here, but I do want to mention the data.
For the game data, I manually verify every score and outcome. For the player data, there is just way too much to check manually. Thankfully, I was able to scrape it from NCAA Stats. Huge shoutout to Dr. Dwight Wynne for the base code to scrape this data!
If you have spent any time in NCAA Stats, then you know that its accuracy for Men’s Volleyball is not as strong as for other sports. In two cases, I had to alter the data because NCAA had a duplicated game in its records. By and large, this data has not been cross-checked for accuracy, but spot checks look good, so I mostly trust it.
There is a very strong chance that this data is not 100% accurate, but that is just the nature of the beast. All of the data comes from the same source and runs through the same code, so there is some comfort in that consistency. Sometimes you just have to use what you have available.
Some Highlights
When I talk about each conference (coming soon), I will go into more detail on how many elo points each team lost in the offseason. For now, here are some interesting highlights from the data.
Biggest Loser: The UCSD Tritons are the only team with a retention metric greater than one (1.02). This means they lost the equivalent of slightly more than 6 players who played in every set.
Two Out of Three Ain’t Bad: The average retention metric was 0.36. This means that the average team lost about 2 full-time starters (0.36 × 6 ≈ 2.2).
Gotta Keep ’Em All: Five teams returned everyone! Four of these teams were first-year programs, but one has been around since 2013: Erskine. Super impressive.
Almost Kept ’Em All: Stanford had the lowest non-zero metric (0.01) after losing a total of 7 experience points. Better watch out for the Cardinal.
Can’t Stop, Won’t Stop: 49 players appeared in every set and match for their teams.
1k: There are 1059 NCAA D-I/II Men’s Volleyball players for the 2023 season!
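Several of these highlights convert a retention metric into full-time starters lost. Because a metric of 1 corresponds to 600 points (six players who played every set), multiplying by 6 gives that figure. A quick sketch using numbers from this post; the helper name is hypothetical, and the Stanford value is computed from its 7 lost points rather than the rounded 0.01.

```python
STARTERS_ON_COURT = 6  # a retention metric of 1.0 equals six full-time players


def starter_equivalents(metric: float) -> float:
    """Convert a retention metric into full-time starters lost."""
    return metric * STARTERS_ON_COURT


for team, metric in [("UCSD", 1.02), ("Average team", 0.36), ("Stanford", 7 / 600)]:
    print(f"{team}: {starter_equivalents(metric):.2f} full-time starters lost")
```

This prints 6.12, 2.16, and 0.07 starter-equivalents respectively.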
This is just the tip of the iceberg for this data. I hope you enjoyed it and would love to hear what you think. Leave a comment, find me on Twitter, or send me an email. (I’m nice, I promise.)
[1] Big West, Conf Carolinas, EIVA, Independent, MIVA, MPSF, NEC, and SIAC.
[2] Yes, this is a lot of data.
[3] The basic formula is (sets played / sets possible) × 0.75 + (matches played / matches possible) × 0.25. Then multiply the result by 100 to put it on a 100-point scale.
[4] Another way of looking at this number: BYU lost about 38% of its playing time.