Why The Wild Variance in xERA in Forecaster vs. Projections?

  • Why The Wild Variance in xERA in Forecaster vs. Projections?

    In going over my draft prep, I often tweak a few of HQ's numbers based on other data and analysis. While doing so, I always refer back to the Forecaster to see what the leading indicators are that might be influencing HQ's projections and whether I might see anything different (however rarely that may be).

    One thing that's really stood out to me this year is the wild variance of the xERA numbers found in the Forecaster and shown in the daily projections on the website.

    Could someone please explain this to me? In other words, the basic leading indicators (as far as I know) haven't changed since the publication in December of the Forecaster. Last year's stats and the years before that haven't changed. HQ often preaches to ignore spring numbers, to concentrate on skills rather than roles.

    So why, for argument's sake, has Ubaldo Jimenez's xERA shot up from 4.18 in the Forecaster to 4.50 in the projections? Or Hiroki Kuroda gone from 3.80 in the Forecaster to 4.09 in the projections? Especially when the only thing that's changed between December and now is that the Dodgers infield defense improved considerably with the signing of Orlando Hudson?

    These are just two examples that jumped out at me, but there are many others and it actually seems to be more the rule than the exception.

    Don't get me wrong -- I love that HQ is constantly making adjustments to more accurately predict this year's stats, but I'm really confused by this. Especially since, in most cases, the result is the xERA growing worse, not better.

    What's the M.O. here? I would think that a number like xERA would be far more static than it seems to be here. I've been told before by HQ staffers, in fact, that it's the number that bears more watching than ERA.

    Anyway, you get the idea. What's the story behind these numbers and their changes?

    I'll appreciate any and all input here.

  • #2
    It's the web site value you want to look at. In all seriousness, the difference between the values comes from a stray parenthesis in the book's xERA formula. Book values are still fine for player-to-player comparison, year-over-year trends, etc., but for xERA/ERA gap, look at the web site.


    • #3
      Ray, I'm not sure what you mean about the stray parenthesis. Are you saying the book xERAs are wrong somehow or arrived at by an error?

      I do know that the web site is the information considered most accurate and up-to-date, BUT....

      I'm still unclear on the basic question I've asked, which is:

      Why do so many of the xERAs published in December change so drastically come March? What is driving this change? Why do most seem to shoot higher than in the book?

      I'm just curious what drives the changes because it would seem once those numbers are calculated they wouldn't be subject to all that much change and, yet, they are.


      • #4
        Originally posted by Stat Boy:
        "Ray, I'm not sure what you mean about the stray parenthesis. Are you saying the book xERAs are wrong somehow or arrived at by an error?"
        Yes, I'm saying that a stray parenthesis in the book formula caused the book values to run a little too high. The formula we're using on the web site right now is the correct one. That accounts for (most of) the difference you're asking about.


        • #5
          Yikes! Who knew a stray parenthesis could cause such mayhem?

          But your explanation is actually the opposite of my observation. I'm finding the vast majority of xERAs in the book are running considerably lower (i.e., better) than the ones in the daily projections.

          My experience with HQ and xERAs is kind of a 50/50 deal -- some xERAs are higher, some lower. But on the website, at least, there are very few xERAs that aren't higher than the ERA numbers.

          Is something going on this year? I often like to tweak my pitching numbers before the draft to reflect more of the xERA influence (when I'd asked Ron about this a few years ago, he said xERA was probably the more accurate predictor). But I'd hate to think I'm not using the most accurate information.

          Bottom line, is there a reason why this year's xERAs are, in the vast majority of cases, higher than the ERAs? This isn't normally the case.


          • #6
            No, I don't have an explanation for what you're observing.


            • #7
              I just did a quick perusal of the 3/31 projections. Of some 320 pitchers, only 74 have better xERAs than ERAs. Fewer than 50 of those with better xERAs have a variance of more than .20. And of those 50, only 15 are expected to have more than 50 IP.

              My point in all this is that it seems significantly different from years past. I have found xERA a very useful tool to help uncover discrepancies between expectations and results and, ultimately, hidden draft day value.

              I'm not finding that to be the case this year and it strikes me that something might be off in the numbers. In fact, the book's numbers, stray parenthesis and all, seem to be more historically in line.

              Derek Lowe, for example, almost always has an xERA lower than his actual ERA. This has been the case, I believe, for 5 years running. In the Forecaster, this is the case again: he's listed with an ERA of 3.50 and an xERA of 3.19. In the projections, though, that xERA is now 3.63, the first time in 6 years that it's higher than his predicted ERA. Now, in this particular case, it might be a function of Lowe moving from LA to Atlanta. But I'm finding this to be the case with almost all pitchers.

              I'm actually now thinking that something might be off on the website's xERA numbers simply because they seem out of line with projections from years past. And the book -- which you say had xERAs that were in error -- actually seems more in line with the historical numbers.

              Could you please look into this? My draft's fast approaching and I'd really like to make sure another parenthesis hasn't inadvertently strayed along the way...


              • #8
                I'm confident the web site formula is right, Stat Boy, as we did a thorough check of it when we found the book error. But the formula is in the glossary; feel free to test it yourself.

                Note that (I think it was) last year, we added a normalizing factor to xERA, precisely because too many pitchers had xERAs lower than actual ERAs. This better level-sets xERA against the league-wide ERA standard. So comparing prior year projections to what we're doing now isn't telling you anything.


                • #9
                  Ray, pardon my ignorance because I'm definitely not a mathematician...

                  What is a "normalizing factor"?

                  And why is it viewed as a negative to have too many pitchers with xERAs lower than actual ERAs?

                  In the past (which I know you're telling me to disregard), I found that larger variance to be an extremely useful gauge to predict better performances -- much as we use that variance in-season to predict improvement.

                  Have there been any real-world comparisons to see how the old system did vis-a-vis the new system in predicting ERA in the coming year? If the ERA and xERA are virtually the same, I'm not sure, ultimately, what the real value of xERA is (except, of course, for the handful of pitchers for whom the gap is extremely wide).


                  • #10
                    You're hitting on a key distinction, Stat Boy... it's a very different thing to look at the ERA/xERA gap in-season, as opposed to looking at it in projected data. In projections, why should ERA and xERA differ at all? The main reasons are things like park factors, team support, the occasional Javy Vazquez type who has consistently low strand rates. But we base our projections on base skills... if you told me you spotted a bunch of pitchers who had differences of 0.75 of a run or a full run between projected xERA and projected ERA, that would be something that would make me wonder if we had a problem with the projection model or the ERA formula.

                    Trust me, starting next week, you're going to see plenty of guys with wide ERA/xERA differences in their actual (not projected) stats for you to exploit.

                    As for the normalizing factor... if we ran a league-wide calculation and the league xERA came out to, say, 4.20, while the actual projected ERA came out to 4.50... you "fix" the xERA to level-set with the actual ERA by using a normalizing factor. It's just a straight multiplier, something around 1.10, that we use to add about 10%, or whatever the actual factor is, to xERA.
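A rough sketch of the level-setting idea described above, in Python. This is an illustration only, not HQ's actual formula: the function names are made up, the league numbers are the hypothetical 4.20 and 4.50 from the post, and the 3.80 raw xERA is an arbitrary example value. Note that those hypothetical league numbers work out to a factor of roughly 1.07, not the "around 1.10" mentioned for the real factor.

```python
def normalizing_factor(league_era: float, league_xera: float) -> float:
    """Straight multiplier that scales league xERA up (or down) to league ERA.

    Assumption: the factor is simply the ratio of the two league-wide values.
    """
    return league_era / league_xera


def normalize_xera(raw_xera: float, factor: float) -> float:
    """Apply the multiplier to one pitcher's raw (un-normalized) xERA."""
    return raw_xera * factor


# Hypothetical league-wide values taken from the post's example.
factor = normalizing_factor(league_era=4.50, league_xera=4.20)
print(round(factor, 3))  # 1.071

# A hypothetical pitcher with a raw xERA of 3.80.
print(round(normalize_xera(3.80, factor), 2))  # 4.07
```

Because every pitcher gets the same multiplier, rankings by xERA don't change; only the overall level shifts, which is why the gap between projected ERA and projected xERA narrows across the board.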


                    • #11
                      Ray, thanks for some of the clarification.

                      I guess I was always under the impression (possibly mistaken) that the ERA prediction was a little more beholden to actual stats and the xERA prediction would go to those more speculative places, based on trends (like past xERAs, low hit rates/strand rates, etc.) that had yet to be reflected in the actual ERA.

                      But you're right, the difference between predictive ERA and predictive xERA has always befuddled me a bit. When I asked Ron a few years back, he said they both were valid, but if I had to bet on one, bet on the predictive xERA.

                      What I've done since is usually split the difference between the two or maybe weight the xERA a little more than the others if I wanted to tweak predictions that didn't feel quite right to me or based on my own observations and stat trends.
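For anyone who wants to do the same kind of tweak, here is a minimal sketch of "splitting the difference" as a weighted average. The function name and the 0.6 weight are just illustrations, not anything HQ publishes; the sample inputs are Derek Lowe's Forecaster numbers quoted earlier in the thread (3.50 ERA, 3.19 xERA).

```python
def blend_era(era: float, xera: float, xera_weight: float = 0.5) -> float:
    """Weighted average of projected ERA and projected xERA.

    xera_weight=0.5 splits the difference evenly; a higher value leans
    more on xERA. The weight is a personal judgment call (assumption).
    """
    if not 0.0 <= xera_weight <= 1.0:
        raise ValueError("xera_weight must be between 0 and 1")
    return (1.0 - xera_weight) * era + xera_weight * xera


# Lowe's Forecaster line from the thread: 3.50 ERA, 3.19 xERA.
even_split = blend_era(3.50, 3.19)                    # midpoint of the two
lean_xera = blend_era(3.50, 3.19, xera_weight=0.6)    # weights xERA a bit more

print(f"{even_split:.3f} {lean_xera:.3f}")
```

When the two projections are only a few hundredths apart, any weighting gives essentially the same answer, which matches the point that the tweak matters only for the few pitchers with a wide gap.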

                      This year, it's proven somewhat futile to do that because A) the numbers are so close and B) so few of the gaps are insightful enough to warrant a change from what is already listed as the ERA. If, say, Baseball Prospectus has a wildly different prediction, I might weight ERA or xERA more to account for that.

                      One thing I'm curious about -- since you seem to have knowledge of these things (I'm assuming you're Ray Murphy, no?) -- are 2 quirky discrepancies I've noticed between HQ and Baseball Prospectus.

                      1) BP seems to allow for PH appearances when it calculates playing time, something that, at least from what I can tell, HQ does not. If so, why not?

                      2) BP's predictive ERA numbers for Relief Pitchers are almost always significantly lower than HQ's. Conversely, HQ's ERA numbers for Starters and, especially, young or rookie pitchers, are almost always significantly lower than BP's. (Also, for all pitchers, HQ tends to have a lower WHIP predicted.)

                      Any thoughts on this?


                      • #12
                        (Yes, Ray Murphy)

                        You were using projected xERA/ERA the right way... treat them as a range of outcomes. If you're focusing in on "we're projecting Pitcher X for 3.65 ERA, but xERA says 3.80"... you're getting too bogged down in the numbers anyway. Both of those are equally possible outcomes.

                        As for BP, I'm not really familiar with what they're doing... I've got enough to worry about here. But I'll take a stab anyway:

                        1. I don't understand why projected PH appearances would have any notable impact on playing time. We just do playing time as a straight percentage; if someone gets, what, 20-40 PH ABs a year, that's like 1%. Once you get down to this level, I think you're trying to attach more accuracy to playing time projections than you can ever realistically achieve.

                        2. No idea if what you're saying is true, let alone why (not that I doubt you). Obviously, you're aware that there are some significant differences between the projection systems.


                        • #13
                          Ray,

                          While we're on "X" categories... I'm assuming xBA and BA are equivalent ideas to ERA and xERA. My question is, what is the number next to the xBA -- is that the probability of the xBA likelihood? The higher the number, the more likely the xBA?

                          Thanks!


                          • #14
                            It's % probability that the player's 2009 BA will exceed 2008's.
