a 2003 comparison of HQ and other forecasters

  • a 2003 comparison of HQ and other forecasters

    Nate Silver at Baseball Prospectus takes a shot at comparing his PECOTA projections to six other forecasters, including HQ. It's way too complicated for me to understand (hell, I'm not sure I even understand the PECOTA projections), so I don't know how legit his conclusions are. Some of you stat-heads may understand it better and give us your opinion:



    Steve

  • #2
    PECOTA wins the comparison, but the test was designed by a guy with a vested interest. It looks like a reasonable study, though. His PECOTA system did particularly well with pitching.
    - - - - - - - - - - - - - - - -
    'Put Marvin Miller in the Hall of Fame!'



    • #3
      Silver emphasizes that you can't conclude much from one year of data. Of course, he could go back and compare retroactive PECOTA projections for past years, but he may not have the other six sources' data to compare them with.

      I know enough stats to know that the quantitative data he presents (if accurate) supports his conclusion.



      • #4
        More on evaluating projections (moving from the "Customer Service" thread)...

        Further to the discussion that got fragmented (somewhat out of hand) on another thread, I took a little deeper dive into the 2003 projections and results.
        Here's what I found, looking only at the 362 hitters who were projected to earn $0 or more (the assumption being that nobody would bid on a negative-dollar player):
        1. The correlation of all hitters' actual R$ (4x4) to their projections was 0.70 (where "0" is no correlation and "1" is perfect correlation).
        2. The median error was -$1, which means the median player produced $1 less than his projection.
        3. The biggest overprojection was Jermaine Dye, who earned -$5 on a $26 projection, for a -$31 difference; the biggest underprojection was Javy Lopez, at +$27 ($34 - $7).
        4. Not surprisingly, playing time was a critical factor: +/- variations in Plate Appearances (PA) from projections showed a 0.68 correlation with variations in R$ from projections.
        5. As the projected PA neared the actual PA, correlations went up:
        Within 50% (+/-): 0.74 correlation
        Within 20%: 0.76
        Within 10%: 0.82
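
For anyone who wants to rerun this kind of check on their own numbers, here is a minimal Python sketch of the three measurements in the list above: the projection-to-actual correlation, the median error, and the correlation within a playing-time band. The field names (`proj_r`, `act_r`, `proj_pa`, `act_pa`) and the five sample players are invented for illustration; they are not the 362-hitter data set. It also assumes that "within 10%" means actual PA within 10% of projected PA, which the post does not spell out.

```python
from math import sqrt
from statistics import median


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0 = none, 1 = perfect)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def within_pa_band(players, tolerance):
    """Keep players whose actual PA is within +/- tolerance of projected PA.

    Assumption: the band is measured relative to the projected PA.
    """
    return [
        p for p in players
        if abs(p["act_pa"] - p["proj_pa"]) <= tolerance * p["proj_pa"]
    ]


def summarize(players):
    """Correlation of projected vs. actual R$, plus the median error (actual - projected)."""
    proj = [p["proj_r"] for p in players]
    act = [p["act_r"] for p in players]
    errors = [a - b for a, b in zip(act, proj)]
    return {
        "n": len(players),
        "corr": pearson(proj, act),
        "median_error": median(errors),
    }


# Invented sample data, NOT real 2003 projections.
players = [
    {"name": "A", "proj_r": 20, "act_r": 18, "proj_pa": 600, "act_pa": 580},
    {"name": "B", "proj_r": 10, "act_r": 2, "proj_pa": 500, "act_pa": 200},
    {"name": "C", "proj_r": 5, "act_r": 12, "proj_pa": 300, "act_pa": 520},
    {"name": "D", "proj_r": 15, "act_r": 14, "proj_pa": 550, "act_pa": 560},
    {"name": "E", "proj_r": 0, "act_r": 1, "proj_pa": 250, "act_pa": 240},
]

overall = summarize(players)
close_pt = summarize(within_pa_band(players, 0.10))

# Item 4 in the list: correlate the PA miss with the R$ miss.
pa_miss = [p["act_pa"] - p["proj_pa"] for p in players]
r_miss = [p["act_r"] - p["proj_r"] for p in players]
pt_vs_value = pearson(pa_miss, r_miss)

print(overall, close_pt, round(pt_vs_value, 2))
```

Comparing `overall["corr"]` with `close_pt["corr"]` is the same comparison as the 0.70 versus 0.82 figures above, just on toy numbers.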

        What does it all mean? Well, if you follow John Burnson's logic in his recent essay "The Rotisserie Paradox", we need to do some speculating on players. These data suggest that you would be well-served to speculate based on potential PT gains, which often occur, rather than hoping for sudden surges in production that would be out of context with the player's underlying skills.
        That means targeting backups with demonstrated skills (including MLEs) playing behind injury-prone or inconsistent regulars, or players whose 2003 values were depressed by sudden PT losses (like Brad Fullmer) rather than by skill declines.

        Hope this helps. I didn't do pitchers yet because I have actual work to do!
        - - - - - - - - - - - - - - - -
        'Put Marvin Miller in the Hall of Fame!'



        • #5
          One thing that hasn't been pointed out so far is this: it's possible for a projection to underestimate a player in one Rotiss. category and overestimate him in another category, leading to a wash when you look at $R. For example we could project 20 HR and 10 SB for a value of $XX, but the player actually produces 12 HR and 18 SB for a value of (within $1 of $XX). [Or however the $R works out... I didn't actually crunch the numbers here.]

          Maybe a better example is a pitcher projected for a great ERA/WHIP. He produces a crappy ERA/WHIP but somehow ends up with 10 Saves... that will skew his $R up to what it "should have been" if he'd performed to ERA/WHIP expectations.

          If you wanted to be ruthless about so-called projection accuracy, you would need to scrutinize each player across all category dimensions, and figure out how much $R was supposed to be derived from each projected category. Then compare this to the actual $R derived from the player's actual production in each category.

          I think this would be very time consuming. It might be interesting to know if some categories are "less predictable" than others... although I suspect the numbers would just reinforce what we already know (e.g., that SB depend partly on managerial tendencies, Saves are highly manager dependent).
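
To make the "wash" concrete, here is a rough Python sketch of the per-category comparison, using the 20 HR / 10 SB versus 12 HR / 18 SB example from above. The dollar weights in `DOLLARS_PER_UNIT` are purely hypothetical placeholders: real $R valuation is not a flat per-unit rate, so treat this as an illustration of the bookkeeping only.

```python
# Hypothetical flat dollar value per unit of each category.
# Real Rotisserie valuation formulas are more involved than this.
DOLLARS_PER_UNIT = {"HR": 0.5, "SB": 0.5}


def category_dollars(stats):
    """Split a stat line into dollars earned per category."""
    return {cat: stats[cat] * rate for cat, rate in DOLLARS_PER_UNIT.items()}


def compare(projected, actual):
    """Per-category dollar error (actual - projected), plus two totals."""
    proj = category_dollars(projected)
    act = category_dollars(actual)
    by_category = {cat: act[cat] - proj[cat] for cat in proj}
    return {
        "by_category": by_category,
        # Offsetting misses cancel here, so the total can look perfect.
        "total_error": sum(by_category.values()),
        # Absolute errors do not cancel, so the misses stay visible.
        "total_abs_error": sum(abs(v) for v in by_category.values()),
    }


result = compare({"HR": 20, "SB": 10}, {"HR": 12, "SB": 18})
print(result)
```

With these placeholder weights the total error is zero even though each category missed by $4, which is the scenario where total-$R accuracy hides what actually happened.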



          • #6
            Thanks for the work. Playing time is always the issue and Aaron Boone is a prime example. He won't be hitting his projections but someone will now have an opportunity to jump in and earn some money.



            • #7
              Originally posted by Randall@HQ
              One thing that hasn't been pointed out so far is this: it's possible for a projection to underestimate a player in one Rotiss. category and overestimate him in another category...
              It's possible, but not too probable.
              We could project 20 HR and 10 SB for a value of $XX, but the player actually produces 12 HR and 18 SB for a value of (within $1 of $XX).
              It's likelier with BA than with HRs and SBs. Those straight-count categories correlate pretty strongly (0.70+) with their projections. The chances of a player simultaneously overperforming in one of these categories while underperforming in another have to be pretty small.
              If you wanted to be ruthless about so-called projection accuracy, you would need to scrutinize each player across all category dimensions, and figure out how much $R was supposed to be derived from each projected category. Then compare this to the actual $R derived from the player's actual production in each category.
              I'm not sure I follow this--if you apply some formula to derive an R$ value for each category, how is that any different from just looking at the production itself when the formula is a constant?
              I suspect a way to measure competing projections systems might be to go category-by-category to determine average error in each, then score each individual projection according to its variance from the mean error.
              I'm not going to do it, by the way.
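
For anyone who does want to try it, one possible reading of that category-by-category idea is sketched below in Python: compute each system's mean absolute error per category, average those across systems to get a baseline, and score each system by how far it sits from the baseline. The system names (`sys_a`, `sys_b`), players, and numbers are all made up, and scoring at the system level rather than per individual projection is a simplifying assumption.

```python
from statistics import mean


def category_errors(projections, actuals):
    """Mean absolute error per category for each system.

    projections: {system: {player: {category: projected value}}}
    actuals:     {player: {category: actual value}}
    """
    errors = {}
    for system, by_player in projections.items():
        per_cat = {}
        for player, proj in by_player.items():
            for cat, value in proj.items():
                per_cat.setdefault(cat, []).append(abs(value - actuals[player][cat]))
        errors[system] = {cat: mean(vals) for cat, vals in per_cat.items()}
    return errors


def score_systems(projections, actuals):
    """Each system's error minus the cross-system mean error, per category.

    Negative scores mean the system beat the average in that category.
    Assumes every system projects the same set of categories.
    """
    errors = category_errors(projections, actuals)
    categories = next(iter(errors.values())).keys()
    baseline = {cat: mean(errors[s][cat] for s in errors) for cat in categories}
    return {
        s: {cat: errors[s][cat] - baseline[cat] for cat in categories}
        for s in errors
    }


# Invented numbers for two hypothetical systems and two hypothetical players.
actuals = {"p1": {"HR": 20, "SB": 10}, "p2": {"HR": 10, "SB": 30}}
projections = {
    "sys_a": {"p1": {"HR": 24, "SB": 10}, "p2": {"HR": 12, "SB": 20}},
    "sys_b": {"p1": {"HR": 21, "SB": 14}, "p2": {"HR": 9, "SB": 28}},
}

scores = score_systems(projections, actuals)
print(scores)
```

Because the scores are kept per category, this would also show whether some categories are simply harder for everyone to project.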
              - - - - - - - - - - - - - - - -
              'Put Marvin Miller in the Hall of Fame!'



              • #8
                Echoing the comment from Patrick about speculating for PT gains: One of the passages that I cut from my essay made exactly this recommendation (though Patrick states it more neatly than I did). If I were speculating on batters, I might begin by sorting batters by $R/AB, and then picking the candidates who are behind fragile, unsettled, or aging stars.
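
A quick Python sketch of that sort-and-filter step, in case it helps. Everything here is an assumption for illustration: the field names, the sample batters, the `starter_is_risky` flag (standing in for "behind a fragile, unsettled, or aging star", which is a judgment call), and the `min_ab` cutoff, which is added only to keep tiny samples from topping the list.

```python
def dollars_per_ab(batter):
    """R$ earned per at-bat."""
    return batter["r_dollars"] / batter["ab"]


def speculation_candidates(batters, min_ab=100):
    """Backups behind risky starters, sorted by R$/AB, best first.

    min_ab is a hypothetical cutoff to avoid ranking on tiny samples.
    """
    pool = [
        b for b in batters
        if b["is_backup"] and b["starter_is_risky"] and b["ab"] >= min_ab
    ]
    return sorted(pool, key=dollars_per_ab, reverse=True)


# Invented batters, not real players.
batters = [
    {"name": "Backup A", "r_dollars": 6, "ab": 200, "is_backup": True, "starter_is_risky": True},
    {"name": "Backup B", "r_dollars": 4, "ab": 100, "is_backup": True, "starter_is_risky": True},
    {"name": "Backup C", "r_dollars": 9, "ab": 150, "is_backup": True, "starter_is_risky": False},
    {"name": "Regular D", "r_dollars": 25, "ab": 550, "is_backup": False, "starter_is_risky": False},
    {"name": "Backup E", "r_dollars": 3, "ab": 40, "is_backup": True, "starter_is_risky": True},
]

candidates = speculation_candidates(batters)
print([b["name"] for b in candidates])
```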



                • #9
                  Originally posted by DAVITT@HQ
                  I'm not sure I follow this--if you apply some formula to derive an R$ value for each category, how is that any different from just looking at the production itself when the formula is a constant?
                  I guess it is the same. My point was, by just looking at total $R to determine projection "accuracy", you might miss the (admittedly unlikely) scenario in which different components of a player's actual production (vs. projected production) led to that $R. Alternatively, if you just look at projected vs. actual production, you miss the point of Rotiss., which is to maximize your return on a fixed investment. Well, actually the point is to win, but usually maximizing your return will accomplish that.

                  I suspect a way to measure competing projections systems might be to go category-by-category to determine average error in each, then score each individual projection according to its variance from the mean error. I'm not going to do it, by the way.
                  Aww... not even in your Copious Free Time (tm)?
