Why Your Player’s MLB Stats Are Just The Starting Point
Fantasy baseball players trying Box Baseball for the first time may notice that a player’s Box game stats differ from his MLB game stats, for pitchers and hitters alike.
Users should understand that MLB stats are just the starting point in Box Baseball’s SIM engine and that there are a number of other factors that will contribute to our dynamic results. These factors include:
• Smoothing Data (stats)
• Ball Park Factors
• Defensive Ratings
• Platoon Splits
• “Boosters” for unlucky players
• Pitcher – Batter Matchup
• Sample size variances
Box Baseball does not use any team- or context-dependent stats in the SIM algorithm. These include, but are not limited to, Runs, RBIs, Earned Runs (and ERA), Unearned Runs, Wins, Losses and Saves. These stats often reflect environmental game factors that are out of any individual player’s control.
We frequently get emails from customers throughout the season. Some are compliments about how much they love Box while others express concerns and questions about how results are generated. And while we do love our compliments, we want to make sure we properly address the concerns and questions you may have. A typical question runs along these lines:
“My pitcher pitched 7 innings last night, gave up 5 hits and 0 runs. And then in Box, he went 4 innings, gave up 5 runs and 8 hits. What gives? How can my player be so far off his real stats?”
“My hitter went 4 for 4 last night with two homers in real life. This morning I look in my Game Summary and he went 0 for 5 with 2 strikeouts. Whaaaat?”
Let us first say that if you have primarily played rotisserie (Roto) fantasy baseball, you are probably dumbfounded. You’re saying to yourself, “Where are the hits? Where are my home runs? Where is my pitcher’s win?” If you’ve come from the simulation world, where games provide batch game results and primarily “fit to ERA”, then you might be saying, “How is this possible? The ERAs don’t match at all. How could my pitcher do so poorly in Box when he went 7 shutout innings in real life?”
These are legitimate questions. There is nothing wrong with those types of questions and there is certainly nothing wrong with Box. For those of you new to simulation-style fantasy baseball, it takes a bit of time to understand how our game works and why the results you get might land outside your expectations. It is our job to provide as much info as possible to help your understanding, and hopefully, contribute to an enjoyable fantasy baseball experience.
We believe simulation is a great game and, most of all, realistic. There truly is nothing like it. Assembling real-life rosters, thinking about how to balance hitting, pitching and defense, and fitting your team to your home ballpark (which, as we know, can dramatically affect the production and value of pitchers and hitters alike; ask any San Diego pitcher or Colorado hitter whether they like playing where they play): all of that comes into play, and it is infinitely more interesting than counting statistics.
To that end, when we built Box, our main objective was to make it as challenging and real as possible for the user. We try to build into the game as many factors as we can that influence a game result. Each factor has to be measurable so that a Box user can put some kind of internal value on it and weigh it relative to the other factors available.
We start off with real MLB stats.
This is the starting point on a long journey of stats. We collect, among many other things, innings pitched, hits, home runs allowed, home runs hit, walks, and all the “typical” stats that batters and pitchers generate. We convert those stats, or “events” as we call them, into probabilities per plate appearance. So, for instance, a pitcher who faced 27 batters and struck out 8 of them in an outing would initially have a 29.6% chance (8 ÷ 27) of striking out each batter who steps to the plate against him in your Box fantasy league game.
This first step is very, very important to understand. We don’t necessarily care that he had 8 strikeouts total per se, but that in this particular game, he successfully achieved a 29.6% strikeout rate. We’ve moved from an integer count (8) to a rate count (29.6%). And every “event” like a strikeout or hit, or walk or home run has its own rate number that gets converted and stored in our stats database.
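The conversion from counts to rates is simple division by batters faced. A minimal Python sketch, assuming an outing is stored as a dictionary of event counts; the `to_rates` helper and every count other than the 8 strikeouts are hypothetical, added only for illustration:

```python
def to_rates(event_counts: dict[str, int], batters_faced: int) -> dict[str, float]:
    """Convert one outing's raw event counts into per-plate-appearance rates."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return {event: count / batters_faced for event, count in event_counts.items()}

# The outing from the text: 27 batters faced, 8 strikeouts.
# The walk, single and home run counts are hypothetical.
outing = {"strikeout": 8, "walk": 2, "single": 4, "home_run": 1}
rates = to_rates(outing, batters_faced=27)
print(f"{rates['strikeout']:.1%}")  # 29.6%
```

Every event gets its own rate this way, which is what lets later steps scale or blend rates without caring about the raw totals.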
In addition, since we want to “smooth out” performances while still giving appropriate weight to a player’s most recent appearance(s), we apply what we call “smoothers”. The smoothers bring in stats from trailing games (past games the player has already played) to bring more balance to his profile. Without them, we would have a very uneven distribution of events, which would make things more difficult later on when the pitcher faces the batter in the SIM (as you’ll read later in this document). With the smoother applied, the pitcher above might see his strikeout rate move to, say, 27.2%.
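Box doesn’t spell out how its smoothers weight trailing games, so the following is only one plausible approach: a weighted average in which the latest outing’s batters count extra. The `smooth_rate` function, the `recency_weight` of 1.5 and the trailing-game numbers are all hypothetical.

```python
def smooth_rate(recent_events: int, recent_pa: int,
                trailing_events: int, trailing_pa: int,
                recency_weight: float = 1.5) -> float:
    """Blend the latest outing with trailing games, giving the latest outing extra weight.

    recency_weight is a hypothetical tuning knob: 1.0 treats every plate
    appearance equally; larger values lean harder on the most recent game.
    """
    weighted_events = recency_weight * recent_events + trailing_events
    weighted_pa = recency_weight * recent_pa + trailing_pa
    return weighted_events / weighted_pa

# Latest outing: 8 K in 27 batters (29.6%).
# Trailing games (hypothetical): 39 K in 150 batters (26.0%).
k_rate = smooth_rate(8, 27, 39, 150)
print(f"{k_rate:.1%}")  # 26.8%
```

With these made-up inputs the smoothed rate lands between the single-game and trailing rates; a different weight or a different set of trailing games would land it elsewhere.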
We don’t stop there.
We apply ballpark factors, which increase or decrease the event rates: a park may make doubles dramatically more likely, triples significantly less likely, or something in between. Let’s compare the factors of two parks, Angel Stadium and Wrigley Field, and see how they could affect a game’s results.
Angel Stadium, as in real life, is poised to reduce offense considerably: singles by over 5%, and doubles and home runs by over 17% each. Hypothetically, if one team played every game of a season at Angel Stadium, the stadium could single-handedly reduce that team’s season totals by about 50 singles, 49 doubles, 16 triples and 26 home runs. That is a lot of offense taken away. But it can happen. Just ask anybody who has tried to hit a home run in San Diego.
Wrigley Field, however, is hitter-friendly. By the same measure, if a single team played all of its games there over the course of a season, singles would increase by about 107, doubles by around 21, triples by about 8 and home runs by about 20.
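Applying park factors amounts to multiplying each event rate by a per-park multiplier. In the sketch below, the Angel Stadium multipliers are rough readings of the percentages quoted above; the hitter’s rates and the `apply_park` helper are hypothetical, and this is not Box’s actual park table.

```python
# Approximate Angel Stadium multipliers, read off the percentages quoted above
# ("over 5%" fewer singles, "over 17%" fewer doubles and home runs). No percentage
# is given for triples, so that factor is left neutral (1.0) here.
ANGEL_STADIUM = {"single": 0.95, "double": 0.83, "triple": 1.00, "home_run": 0.83}

def apply_park(rates: dict[str, float], park: dict[str, float]) -> dict[str, float]:
    """Scale each event rate by the park's multiplier; events the park doesn't list are unchanged."""
    return {event: rate * park.get(event, 1.0) for event, rate in rates.items()}

# Hypothetical per-plate-appearance rates for one hitter.
hitter = {"single": 0.150, "double": 0.045, "triple": 0.005, "home_run": 0.025, "walk": 0.080}
at_angel = apply_park(hitter, ANGEL_STADIUM)
for event, rate in at_angel.items():
    print(f"{event:>8}: {hitter[event]:.4f} -> {rate:.4f}")
```

A complete model would also need to put the probability removed from hits somewhere (for example, into outs) so that the outcome probabilities still sum to 1; the sketch leaves that step out.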
What is the end result? Ballparks matter in Box. And they can dramatically affect your results in any at bat or any given game.
Wait! There’s still more!
Before each half inning, we look at the defense on the field (accounting for defensive substitutions too) and apply your team’s defensive rating. Defense can greatly affect your team’s performance by, again, raising or lowering the chances of hits on balls in play. In 2011, your defensive rating can shift those chances by a maximum of 7% across the board, affecting singles, doubles and triples.
So let’s say your team defensive rating is +25. Let’s apply the math. The average National League team in 2010 produced OBP = .322 and SLG = .397, for an OPS = .719. The +25 defense reduces that production to OBP = .314 and SLG = .388, for an OPS = .703. The difference? About 16 OPS points.
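A minimal sketch of the defensive adjustment, assuming it scales only the ball-in-play hit rates (singles, doubles, triples) and respects the 7% cap. How Box turns a rating such as +25 into a percentage isn’t described, so `apply_defense` takes that percentage directly as a hypothetical `shift` input.

```python
MAX_DEFENSE_SHIFT = 0.07                        # the 2011 cap described above: up to 7%
BALL_IN_PLAY_HITS = {"single", "double", "triple"}

def apply_defense(rates: dict[str, float], shift: float) -> dict[str, float]:
    """Scale ball-in-play hit rates by (1 - shift), with shift clamped to +/-7%.

    A positive shift models a good defense (fewer hits); a negative one a poor
    defense. Home runs and walks never reach the fielders, so they are untouched.
    """
    shift = max(-MAX_DEFENSE_SHIFT, min(MAX_DEFENSE_SHIFT, shift))
    return {e: (r * (1 - shift) if e in BALL_IN_PLAY_HITS else r) for e, r in rates.items()}

# Hypothetical per-plate-appearance rates for one hitter.
hitter = {"single": 0.150, "double": 0.045, "triple": 0.005, "home_run": 0.025, "walk": 0.080}
good_glove = apply_defense(hitter, 0.03)   # hypothetical shift for a strong defense
capped = apply_defense(hitter, 0.25)       # anything beyond 7% is clamped
print(good_glove["single"], capped["single"], capped["home_run"])
```

Clamping inside the function keeps a single extreme rating from wiping out offense entirely, which matches the idea of a hard ceiling on defense’s influence.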
What is the end result? That’s right. Defense matters.
Lastly, just before each and every at-bat, the SIM grabs every player’s platoon splits. Depending on the player, you will see some dramatic shifts in his hit, walk and home run rate profiles. Take a look at this slugger’s platoon profile (which we are using for our games in 2011):
The green numbers are good: they mean this batter has a platoon advantage against right-handed pitchers. The red numbers mean he is rendered mortal against left-handed pitchers. Against lefties, this slugger’s OBP goes down by 5.0 percentage points, so an OBP of .350 drops to about .300. His SLG, roughly speaking, decreases by a little less than 10%. If this slugger were slugging .550, he’d be only about a .490 slugger against lefties.
Who’s the slugger? Read to the end and we’ll tell ya!
Now, let’s take a look at a pitcher with some dramatic splits:
For pitchers, the red numbers are good: they mean the pitcher is able to reduce the chances of those events. You can probably guess from this profile that this is a right-handed pitcher who is tough on right-handed batters. But lefties like what they see from him: their OBP goes up by 8.4 percentage points. Think about that. A left-handed batter with a neutral platoon split will see his OBP against this pitcher go from, say, .340 to about .420! That’s Pujols-esque!
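In the examples above, the split percentages behave like shifts in rate points (a 5.0% drop takes a .350 OBP to .300, and an 8.4% rise takes .340 to about .420), so this sketch applies them additively. Whether Box adds or multiplies its splits is an assumption here, and `apply_platoon` is a hypothetical helper.

```python
def apply_platoon(base_rate: float, shift_points: float) -> float:
    """Apply a platoon split given as an additive shift in rate points, kept within [0, 1]."""
    return min(1.0, max(0.0, base_rate + shift_points))

# The slugger against left-handed pitching: OBP down 5.0 percentage points.
print(round(apply_platoon(0.350, -0.050), 3))  # 0.3
# A left-handed batter against the right-hander: OBP up 8.4 percentage points.
print(round(apply_platoon(0.340, 0.084), 3))   # 0.424
```

The clamp matters only at the extremes, but it guarantees a split can never produce a negative or greater-than-certain probability.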
Suffice it to say, you really want to make sure this pitcher is appropriately ranked and used wisely. The pitcher? The answer is at the end of this document.
Each at-bat is a head-to-head battle.
Picture this: Albert Pujols hits a homer off Carlos Silva in real life. That’s great. But what if, in the Box simulation, he faces Roy Halladay, and Roy has just pitched a gem? Will Pujols still hit a homer? Possibly. Does Halladay get the key out with the bases loaded in the bottom of the seventh of a 2-1 game? Does he retire Pujols? Possibly. But it’s not a slam dunk either way, and that at-bat may be the crucial turning point that decides whether it’s the Pujols owner or the Halladay owner who ends up sending Box an email of concern about the accuracy of the SIM.
This is a key component of the simulation and why we think simulation is fun: we’re actively matching the real-life stats of both the pitcher and the hitter to come up with an outcome. Invariably, those who voice concern to us about so-and-so pitcher not getting his complete-game shutout are looking at the result solely from their own point of view. This is key, and something all owners need to realize: look at it from a “Pujols owner” perspective, but also from a “Halladay owner” perspective. People often say “my guy got screwed and should have done better”, but they have to factor in the variables on both sides of the equation.
So finally, when the pitcher faces the hitter, we combine both profiles with the environmental factors, like ballpark and defense, and run the at-bat. Many users forget that at this stage, after all that, your player (on whichever side of the matchup) brings only 50% of the profile to the at-bat. So your hot pitcher? Sure, he’s hot. But you’ve now seen all the factors that could significantly change his rate profile from the initial MLB stat line, and, we hope you see, he still needs to face this batter, who is also hot. Then we roll the dice.
(This is where the smoothers discussed earlier become necessary. What if a pitcher who threw a 1-hitter faces a batter who went 4 for 4 with two homers? What happens? The smoothing data makes sure we smooth out the extreme performances on both sides.)
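A minimal sketch of the final step, assuming the 50/50 split described above is a straight average of the two adjusted profiles followed by one weighted random draw. The function names and both profiles are hypothetical; park, defense and platoon adjustments are assumed to have been applied to the profiles already.

```python
import random

def combine_profiles(batter: dict[str, float], pitcher: dict[str, float]) -> dict[str, float]:
    """Blend batter and pitcher event rates 50/50, then renormalize so they sum to 1."""
    events = batter.keys() | pitcher.keys()
    blended = {e: 0.5 * batter.get(e, 0.0) + 0.5 * pitcher.get(e, 0.0) for e in events}
    total = sum(blended.values())
    return {e: p / total for e, p in blended.items()}

def roll_at_bat(profile: dict[str, float], rng: random.Random) -> str:
    """Pick one outcome at random, weighted by the blended rates."""
    events = sorted(profile)  # fixed order so a seeded rng is reproducible
    return rng.choices(events, weights=[profile[e] for e in events], k=1)[0]

# Hypothetical per-plate-appearance profiles for a hot batter and a hot pitcher.
hot_batter = {"strikeout": 0.20, "walk": 0.12, "single": 0.15,
              "double": 0.05, "home_run": 0.06, "out": 0.42}
hot_pitcher = {"strikeout": 0.28, "walk": 0.05, "single": 0.13,
               "double": 0.04, "home_run": 0.02, "out": 0.48}

matchup = combine_profiles(hot_batter, hot_pitcher)
print(roll_at_bat(matchup, random.Random(7)))
```

Straight averaging is just one way to blend two profiles; some simulations use odds-ratio approaches such as Bill James’s log5 method instead, and nothing here says which one Box uses.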
What is the end result? That’s right. Matchups matter.
Although short-term results can be skewed, over the longer term and over large numbers, things will gravitate toward where “they should be”, or something reasonably close. Obviously, if Pujols has a great season with 40 homers and a 1.100 OPS, then over the course of 500-600 at-bats, Box expects that he will have a good year. But over one game of 5 at-bats, we have no expectation that he will generate a 1.100 OPS in that game, in one series or even in a week. He may be colder or hotter than in real life. But again, over large numbers with repeated trials, the math will work out.
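A quick simulation makes the small-sample point concrete. This sketch gives a hypothetical hitter a fixed .400 chance of reaching base on every plate appearance (OBP stands in for OPS to keep things short) and compares the spread of 5-PA games with the spread of 600-PA seasons. It illustrates the statistics only and is not Box’s engine.

```python
import random
import statistics

def simulate_obp(true_obp: float, plate_appearances: int, rng: random.Random) -> float:
    """Simulate one stretch of plate appearances and return the observed on-base rate."""
    times_on = sum(rng.random() < true_obp for _ in range(plate_appearances))
    return times_on / plate_appearances

rng = random.Random(2011)
TRUE_OBP = 0.400  # hypothetical hitter whose underlying on-base chance never changes

one_game = [simulate_obp(TRUE_OBP, 5, rng) for _ in range(1000)]
full_season = [simulate_obp(TRUE_OBP, 600, rng) for _ in range(1000)]

print(f"5-PA games:     mean {statistics.mean(one_game):.3f}, spread {statistics.pstdev(one_game):.3f}")
print(f"600-PA seasons: mean {statistics.mean(full_season):.3f}, spread {statistics.pstdev(full_season):.3f}")
```

The binomial standard deviation, √(p(1 − p)/n), predicts roughly .219 for a 5-PA game and .020 for a 600-PA season: the same hitter with the same talent can look wildly different from one game to the next and still finish the season right where he “should” be.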
We like to call it “random probability”, but whatever term you fancy, it plays a part in the SIM. No matter how stacked your player’s profile is with favorable events (home runs, hits or walks for a hitter; strikeouts or outs for a pitcher), the SIM doesn’t play favorites, has no biases and bears no ill will. It just picks an event. And sometimes your player CAN be unlucky, whatever his profile.
So if you were to look at any one Box game over the year and compare it to what you believe is the corresponding MLB game for any given player(s), there may be a small to sizeable discrepancy. Note that the SIM logs this in its memory and will later try to help out “unlucky” players with small boosts to events. The key is that the SIM cares less about game-to-game results, because of the small-sample-size effect, than about whether, at the end of the year, the player’s Box profile looks reasonable and acceptable compared to his MLB stat profile. By reasonable and acceptable, we mean within about 15% for the various stat counts, assuming equal playing time in Box and MLB and all other things being equal (such as user managerial settings, which we haven’t touched on in this piece but which can affect your players’ performances).
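The booster logic isn’t documented in detail, so the following is a hypothetical sketch: a small, capped multiplier for players running behind their expected event counts, plus a check for the roughly 15% season-end tolerance mentioned above. `MAX_BOOST`, `BOOST_PER_SHORTFALL` and both functions are illustrative assumptions, not Box’s actual parameters.

```python
MAX_BOOST = 0.03             # hypothetical cap: never more than +3% to an event rate
BOOST_PER_SHORTFALL = 0.001  # hypothetical: +0.1% per event the player is "owed"

def luck_boost(expected_events: float, actual_events: float) -> float:
    """Return a rate multiplier >= 1.0 for a player running behind his expected event count."""
    shortfall = expected_events - actual_events
    if shortfall <= 0:
        return 1.0
    return 1.0 + min(MAX_BOOST, BOOST_PER_SHORTFALL * shortfall)

def within_tolerance(box_total: float, mlb_total: float, tolerance: float = 0.15) -> bool:
    """Check whether a Box season total lands within +/- tolerance of the MLB total."""
    if mlb_total == 0:
        return box_total == 0
    return abs(box_total - mlb_total) / mlb_total <= tolerance

# A hitter "owed" 20 hits gets a small bump; one running ahead of expectation gets none.
print(luck_boost(expected_events=100, actual_events=80))
print(luck_boost(expected_events=50, actual_events=60))
print(within_tolerance(box_total=35, mlb_total=40))  # True: 12.5% off
print(within_tolerance(box_total=30, mlb_total=40))  # False: 25% off
```

Keeping the boost small and capped is the design point: it nudges long-run totals back toward the MLB profile without letting any single game be decided by the correction.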
So now you know why MLB Stats are just the starting point.
You may think that we’ve now understated the value of MLB stats. That is not our intention. You, as the GM of your roster, have the most impact on how your team does, starting on Draft Day when you pull out projection spreadsheets and make those crucial decisions on the clock. It continues as you trawl the waiver wire looking for gems that may be floating around, and with each and every trade you make. MLB stats are the starting point for each and every player in our database, and they are by far the most important element; your responsibility as the GM is simply to “put the best team on the field”.
But along the way, those stats get pushed, pulled and transformed by myriad other factors that are meant to model what happens in real-life baseball. After all, we are trying to simulate an actual baseball game! It’s all part of the challenge and fun of Box Baseball. Unlike games that use linear weights or multi-game batch results, when you run single-game simulations you will see variance, sometimes deeper and wider than anticipated on a game-by-game basis. Our advice to you is simple: keep going after the best players with the best rates, pick them up via the Waiver and Trader, and play your best players all the time. Things will balance out, and you’ll be sitting fine by the end of the year! It’s a mathematical (near) certainty!
And finally, why ERA (and runs) don’t matter (that much) to us.
We would be remiss if we didn’t explain why Box doesn’t use runs at all in our SIM engine. Nowhere in the algorithm will you find any reference to Runs, RBIs, Earned Runs, ERA, Unearned Runs, Wins, Losses, Saves or similar stats.
There have been a number of studies by Voros McCracken and others, along with new statistical measures such as DIPS (Defense Independent Pitching Stats), which posit that most pitchers, more or less, control only strikeouts, walks and home runs. Without diving into too much detail here, a pitcher’s results on events that take place in the field of play (i.e., ground outs vs. ground-ball hits; fly outs vs. doubles landing in the gap) are greatly influenced by the defense behind him, ballpark factors and plain random probability. That is why we’ve included both defense and ballparks in our SIM: to account for the impact of these factors on our game.
So we accept that the pitcher largely controls home runs, and we’ve accounted for his HR rate in our SIM. But does the pitcher control whether it’s a solo home run or a three-run home run? We believe not, and until studies show that pitchers actually pitch “better and bear down” when runners are on base, or that they “pitch to the score”, we will ignore whether the pitcher gives up a solo shot or a three-run shot in our game.
Ultimately, what is important to understand is that a pitcher’s ERA is an output of his performance, not an input. He (largely) doesn’t control his balls in play, we’ve never seen studies that quantify a pitcher pitching better with runners on base, and he certainly doesn’t control whether he gives up homers with runners on. Simplistically, what we do know is the rate at which a pitcher gives up hits, walks and home runs and records strikeouts (all of which are affected to an extent by defense, ballparks and randomness), and we can all identify the best pitchers (the ones we most covet in fantasy baseball) as those with the best rates in each of those categories.
Sometimes those hits, walks, strikeouts and home runs are strung together in a Box game, leading to 5 runs and an early shower, and sometimes they are spread across 8 strong innings. But no matter what, I am sure that if I offered you Joe Blanton for Cliff Lee (who may have gotten unlucky in a Box start or two), you’d laugh me out of the room.
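To see why runs are an output rather than an input, here is a sketch that scores the same set of events in two different orders. It assumes deliberately simplified baserunning (hypothetical rules spelled out in the docstring, not the Box SIM’s): three singles, a walk, a home run and nine outs produce 4 runs when clustered and 1 run when spread across three innings.

```python
def runs_scored(events: list[str]) -> int:
    """Count runs for a sequence of plate-appearance outcomes under simplified baserunning.

    Hypothetical simplifications: a single moves every runner up exactly one base,
    a walk forces runners only when needed, a home run clears the bases, and an
    out leaves runners where they are. Bases reset after three outs.
    """
    runs, outs = 0, 0
    first = second = third = False
    for event in events:
        if event == "out":
            outs += 1
            if outs == 3:
                outs = 0
                first = second = third = False
        elif event == "single":
            runs += int(third)
            first, second, third = True, first, second
        elif event == "walk":
            if first and second and third:
                runs += 1          # bases loaded: the runner on third is forced home
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        elif event == "home_run":
            runs += 1 + int(first) + int(second) + int(third)
            first = second = third = False
        else:
            raise ValueError(f"unknown event: {event}")
    return runs

clustered = ["single", "single", "walk", "home_run", "single", "out", "out", "out"] + ["out"] * 6
spread = ["home_run", "out", "out", "out",
          "single", "walk", "out", "out", "out",
          "single", "out", "single", "out", "out"]
print(runs_scored(clustered))  # 4
print(runs_scored(spread))     # 1
```

Same pitcher, same events, same rates: only the order changed, and the ERA for those three innings swings from 12.00 to 3.00.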
Answer to the earlier question on platoon splits – The slugger, you guessed it, was Ryan Howard. The pitcher with the extreme righty/lefty splits is right-hander Brad Ziegler. Kinda makes you want to re-think your usage strategy for all your players now, huh?