Arbitration salaries totaled about $867MM in 2011, and within a few years they will top a billion dollars across the league. Yet the arbitration process is poorly understood and has been studied far less than free agent salaries. With the help of Tim Dierkes, Ben Nicholson-Smith, and other friends of MLBTR, I have fine-tuned a model for predicting arbitration salaries. By incorporating arbitration earnings from the last five years, the model predicts each salary from a range of related players. Its predictions have a correlation of roughly .98 with actual salaries and land within $170K of actual earnings for more than half of players.
How good is the model? Well, it works well when it already knows what all the players made and can try to fit the data perfectly. A fairer test is to rebuild the model without one year's data and then predict that year's salaries from the other years. So I used 2007, 2008, 2009, and 2010 statistics and salaries to predict 2011 salaries, then 2007, 2008, 2009, and 2011 statistics and salaries to predict 2010 salaries, and so on. The result was still a very strong prediction: it was within $320K half the time. Even the most sophisticated model using service time, career wins above replacement, and single-season WAR (and remember that WAR is an actual one-size-fits-all estimate of player value) could only get within $700K half the time. For the average player, even a simplified version of my model cuts the error by more than half!
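For readers who want to see the shape of that test, here is a minimal sketch of leave-one-year-out validation. It is not the actual model: the one-variable line fit, the `ROWS` data, and the function names are all invented for illustration. The "within $X half the time" figures above correspond to the median absolute error it computes for each held-out season.

```python
from statistics import mean, median

# Toy rows: (season, plate_appearances, salary_in_$K). All values are invented.
ROWS = [
    (2007, 300, 900), (2007, 600, 2100),
    (2008, 350, 1100), (2008, 550, 1900),
    (2009, 400, 1300), (2009, 650, 2300),
    (2010, 450, 1500), (2010, 500, 1700),
    (2011, 320, 1000), (2011, 620, 2200),
]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def leave_one_year_out(rows):
    """Return {season: median absolute error} with that season held out."""
    results = {}
    for held_out in sorted({season for season, _, _ in rows}):
        # Fit only on the other seasons, then score the held-out season.
        train = [(pa, sal) for season, pa, sal in rows if season != held_out]
        test = [(pa, sal) for season, pa, sal in rows if season == held_out]
        intercept, slope = fit_line(train)
        errors = [abs(intercept + slope * pa - sal) for pa, sal in test]
        results[held_out] = median(errors)
    return results

errors_by_year = leave_one_year_out(ROWS)
for season, err in errors_by_year.items():
    print(f"{season}: median miss of ${err:.0f}K")
```

The point of holding a season out is that the model never sees the salaries it is being graded on, so the error is an honest estimate of how it would do on a new class of players.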
The salaries of arbitration-eligible players are determined by arbitration panels or by contracts signed under the shadow of potential panel decisions. That covers a lot of players. Only about a third of playing time goes to free agents, and another third goes to players not yet eligible for arbitration. The remaining third of playing time, along with about 25% of payroll, goes to players whose salaries will be determined by an arbitration panel unless they reach an agreement first.
In contrast to the free agent market, which now incorporates a modern understanding of baseball, arbitration still relies on simple statistics such as pitcher wins and runs batted in. When advanced statistics became available, teams incorporated them into their free agent bids and stopped paying much attention to old-school statistics. Arbitration panels, meanwhile, determine a player's salary based on "comparables," players with similar basic statistics and service time. The salaries that the model produces aren't far from what an educated fan might guess, but the subtle differences are important.
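To make the idea of "comparables" concrete, here is a rough sketch of how one might look them up: standardize a few basic statistics, find the past players closest to the target, and average their salaries. This is an illustration only, not how panels or the model actually work. The players, numbers, choice of statistics, and the `comparable_salary` function are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

# Invented past hitters: (name, service_years, home_runs, rbi, salary_in_$K)
PAST = [
    ("Hitter A", 3, 25, 90, 3200),
    ("Hitter B", 3, 10, 45, 1400),
    ("Hitter C", 4, 30, 100, 5500),
    ("Hitter D", 4, 12, 50, 2100),
    ("Hitter E", 3, 22, 85, 3000),
]

def comparable_salary(target, past, k=2):
    """Average the salaries of the k past players most similar to `target`.

    `target` is (service_years, home_runs, rbi). Each statistic is converted
    to a z-score so that no single one dominates the distance.
    """
    n_stats = len(target)
    columns = [[row[1 + i] for row in past] for i in range(n_stats)]
    means = [mean(col) for col in columns]
    scales = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread

    def z_scores(values):
        return [(v - m) / s for v, m, s in zip(values, means, scales)]

    z_target = z_scores(target)

    def distance(row):
        z_row = z_scores(row[1:1 + n_stats])
        return sqrt(sum((a - b) ** 2 for a, b in zip(z_row, z_target)))

    nearest = sorted(past, key=distance)[:k]
    return [row[0] for row in nearest], mean(row[-1] for row in nearest)

# A hypothetical third-year hitter with 24 home runs and 88 RBI.
names, estimate = comparable_salary((3, 24, 88), PAST)
print(names, estimate)
```

Note that the distance uses only old-school counting stats and service time, which is the contrast with the free agent market described above.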
In his arbitration series, Tim Dierkes has been giving rough salary estimates for players based on in-season projections, and we will be releasing the model’s official salary projections for the 2012 season shortly. The most influential factor for both hitters and pitchers is playing time: more plate appearances and innings pitched make a huge difference. For batters, unsurprisingly, home runs and runs batted in matter most to arbitration panels and to our model, while stolen bases and batting average also play important roles. For starting pitchers, wins and ERA are the most important, while relief pitchers get paid mostly based on saves and holds, with a dash of ERA as well. This week, I will post one article on hitters and another on pitchers explaining the importance of these statistics for certain players in more detail, and I will highlight a couple of unique cases for the 2012 season. Will the model miss by a lot for some players? It absolutely will. But it’s going to hit a lot more than it’s going to miss, and it can provide guidance on players who are harder to understand.