Which statistics do I prefer to see when evaluating players?
Introduction
It’s difficult to grow up as a baseball fan without at least a passing familiarity with Kevin Costner’s baseball filmography. Of course, Field of Dreams is both the best-known and arguably the best in the canon, but Bull Durham deserves at least an honorable mention (although my parents must have skipped through the sex scenes when I was a kid). While there are many enduring quotes from his baseball movies, one of my favorites has always been the exchange in For the Love of the Game between Costner’s Billy Chapel and Kelly Preston’s Jane Aubrey:
Jane Aubrey: Do you lose very much?
Billy Chapel: I lose. I’ve lost 134 times.
Jane Aubrey: You count them?
Billy Chapel: We count everything.
It is an excellent encapsulation of baseball’s long-standing obsession with quantifying as much of the game as possible, and that same obsession is what led me to my own passion for data analytics. My very first statistical analysis correlated a variety of pitching statistics (ERA, WHIP, etc.) with winning percentage; unsurprisingly, ERA was by far the strongest predictor, despite my hypotheses to the contrary. That exchange in the movie also presaged the sport’s increasing focus on statistics and sabermetrics, which has since become standard across the league and among its fans. I now regularly hear fans debating a player’s OPS the way they might have debated batting average 20 years ago.
But now that the available statistics have reached a dizzying level of complexity, how are fans supposed to navigate them? Personally, I lean heavily on statistics that can be contextualized, i.e., statistics indexed against the rest of the player’s competition in that specific category. That’s certainly not a unique approach, or even the most correct one, but I thought I would share my reasoning for batters here and return in a future installment to focus on pitchers.
Baseball Reference
I wish this site had been as readily available when I was completing that aforementioned analysis: it would have made the grunt work significantly easier and launched my obsession with it that much earlier. Today, embarrassingly, it is among my ten most-visited websites, especially during the season, because the amount of information available there is truly overwhelming. When it comes to batters specifically, I focus on three areas: OPS+, situational splits, and BABIP (batting average on balls in play). As a reminder, OPS+ is exactly the kind of indexed statistic I prefer: it is scaled so that 100 represents league average for that year, so a number below 100 is below average and a number above 100 is above average. I also appreciate that it is indexed to that specific year, which accounts for offensive changes that might affect a given season (e.g., a COVID-shortened season or “juiced” baseballs).
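To make the indexing idea concrete, here is a minimal Python sketch of the commonly cited formulas behind OPS+ and BABIP. It assumes the simplified OPS+ formula, 100 × (OBP / league OBP + SLG / league SLG − 1), and leaves out the ballpark adjustment Baseball Reference applies, so it will not exactly reproduce the site’s published numbers. The function names, league averages, and stat lines below are hypothetical and only for illustration.

```python
def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """Simplified OPS+: 100 means exactly league average.

    Assumption: no park adjustment, unlike Baseball Reference's published OPS+.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


# Hypothetical league averages for one season (illustration only)
LEAGUE_OBP, LEAGUE_SLG = 0.320, 0.410

# A league-average line scores exactly 100
print(ops_plus(0.320, 0.410, LEAGUE_OBP, LEAGUE_SLG))  # 100.0
# A hypothetical .360 OBP / .500 SLG hitter is roughly 34% better than average
print(round(ops_plus(0.360, 0.500, LEAGUE_OBP, LEAGUE_SLG)))  # 134
# Hypothetical season: 150 H, 25 HR, 550 AB, 120 K, 5 SF
print(round(babip(150, 25, 550, 120, 5), 3))  # 0.305
```

Because both ratios are divided by the same season’s league averages, the same raw OPS can produce a different OPS+ in a high-offense year than in a low-offense one, which is the year-to-year context described above.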
Baseball Savant/Statcast
Much like Baseball Reference, Statcast has gone from a niche interest to a space packed with an overwhelming amount of information and data. And just as with Baseball Reference, I prefer statistics that show how a player ranks across the league over the counting statistics that dominated earlier conversations. In particular, I look at the percentile rankings, whiff%, and barrel%, which together give a good sense of how a player approaches their plate appearances, how well they execute their swings, and how they compare to their peers.
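The rate stats and percentile idea can be sketched the same way. This minimal Python example assumes whiff% is whiffs per swing and barrel% is barrels per batted-ball event; the percentile function is a hypothetical simple method (the share of peers a player beats) and is not Baseball Savant’s exact calculation, which also limits the pool to qualified hitters. All names and numbers are made up for illustration.

```python
def pct(numerator: int, denominator: int) -> float:
    """Rate stat expressed as a percentage; 0.0 if there is no sample."""
    return 100 * numerator / denominator if denominator else 0.0


def percentile_rank(value: float, peers: list[float], higher_is_better: bool = True) -> float:
    """Hypothetical simple percentile: share of peers this value beats, on a 0-100 scale."""
    if higher_is_better:
        beaten = sum(1 for p in peers if p < value)
    else:
        # For whiff%, fewer whiffs is better, so count peers who whiff more often
        beaten = sum(1 for p in peers if p > value)
    return 100 * beaten / len(peers)


# Hypothetical hitter: 180 whiffs on 800 swings, 40 barrels on 400 batted-ball events
whiff_pct = pct(180, 800)   # 22.5
barrel_pct = pct(40, 400)   # 10.0

# Hypothetical peer groups (illustration only)
peer_whiff = [18.0, 21.0, 24.0, 27.0, 30.0]
peer_barrel = [4.0, 6.5, 8.0, 9.5, 12.0, 15.0]

print(percentile_rank(whiff_pct, peer_whiff, higher_is_better=False))  # 60.0
print(round(percentile_rank(barrel_pct, peer_barrel)))  # 67
```

The `higher_is_better` flag matters because a lower whiff% is the better outcome, so a good contact hitter should land near the top of the percentile scale rather than the bottom.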
Obviously, this discussion barely scratches the surface of the broader conversation on batter and player evaluation. There are professionals, and entire industries, that devote all of their time to using these numbers to evaluate batters and position players along any number of axes. But given the time constraints we all face as ordinary baseball fans, I hope this brief discussion is somewhat helpful as we approach the one-month mark before pitchers and catchers report for Spring Training.