Summary: | We present a multidisciplinary approach for learning, visualizing, and assessing a model for the intrinsic value of a batted ball in baseball. The new methodology addresses one of the most fundamental problems in baseball analytics. Traditional outcome-based statistics for representing player skill on batted balls have been shown to have a low degree of repeatability due to the effects of multiple confounding variables, such as the defense, weather, and ballpark. New sensors have created the opportunity to define batted-ball descriptors that are invariant to these variables. We exploit this opportunity by using a Bayesian model to construct a continuous mapping from a vector of batted-ball parameters to an intrinsic value defined using a linear weights representation for run value. A kernel method is used to learn nonparametric estimates for the component probability density functions in Bayes' theorem from a set of over 100,000 batted-ball measurements, while cross-validation enables the model to adapt to the size and structure of the data. Properties of the mapping are visualized by considering reduced-dimension subsets of the batted-ball parameter space. The approach separates the intrinsic value of a batted ball at contact from its outcome and, as a result, allows the definition of batted-ball statistics for batters and pitchers that are less subject to systematic bias and random variation than traditional statistics. We use Cronbach's alpha to show that statistics derived from batted-ball intrinsic values have higher reliability than the traditional outcome-based statistics and that this leads to more accurate estimates of player talent level that can be used for performance forecasting.
|
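The mapping described in the summary can be illustrated with a minimal sketch, under several simplifying assumptions that are not taken from the paper: a single batted-ball parameter (an exit speed in mph) stands in for the full parameter vector, the kernel is Gaussian with a hand-picked bandwidth rather than one chosen by cross-validation, and the linear weights and toy observations are hypothetical placeholders. The names `gaussian_kde`, `fit_intrinsic_value`, and `LINEAR_WEIGHTS` are invented for this illustration.

```python
import math

# Hypothetical run values per outcome; placeholders, not the paper's weights.
LINEAR_WEIGHTS = {"out": 0.0, "single": 0.9, "double": 1.25, "home_run": 2.0}


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the 1-D density p(x) from samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def fit_intrinsic_value(observations, bandwidth):
    """Fit from (x, outcome) pairs; return (posterior, intrinsic_value)."""
    by_outcome = {}
    for x, outcome in observations:
        by_outcome.setdefault(outcome, []).append(x)

    total = len(observations)
    # Prior P(outcome) from relative frequencies.
    priors = {o: len(xs) / total for o, xs in by_outcome.items()}
    # Likelihood p(x | outcome) from one kernel density estimate per outcome.
    likelihoods = {o: gaussian_kde(xs, bandwidth) for o, xs in by_outcome.items()}

    def posterior(x):
        # Bayes' theorem: P(outcome | x) is proportional to p(x | outcome) P(outcome).
        joint = {o: likelihoods[o](x) * priors[o] for o in priors}
        evidence = sum(joint.values())
        if evidence == 0.0:
            return dict(priors)  # far from all data: fall back to the priors
        return {o: j / evidence for o, j in joint.items()}

    def intrinsic_value(x):
        # Expected run value: linear weights averaged over the posterior.
        return sum(LINEAR_WEIGHTS[o] * p for o, p in posterior(x).items())

    return posterior, intrinsic_value


# Toy observations (exit speed in mph, outcome); invented for illustration.
toy = [
    (70.0, "out"), (75.0, "out"), (80.0, "out"), (82.0, "out"),
    (85.0, "single"), (90.0, "single"), (95.0, "double"),
    (100.0, "home_run"), (105.0, "home_run"),
]
posterior, intrinsic_value = fit_intrinsic_value(toy, bandwidth=5.0)
print(intrinsic_value(72.0), intrinsic_value(105.0))
```

Because the value is computed from the posterior over outcomes rather than from the single outcome that occurred, two balls struck identically receive the same value regardless of what the defense, weather, or ballpark did with them.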