Forecasters Are Paid to Be Wrong

Say what? Wait a minute, hoss…forecasters are paid to get it right! Well, yes, from a half-empty-glass perspective that’s true. But since forecasters can’t be 100% right, all the time, they are really being paid to be wrong by less than matters to you, the user of the forecast.

How is this possible? If you really want to know, you’ll have to stick with me ’til the end of this long discussion, and you’ll have the viewpoint of filling the half-full glass.

The chart below is not just a forecast from an ensemble of models, it’s also a metaphor for our capabilities as human forecasters.

I’ll explain this chart for the unfamiliar. Each colored line is a different forecast from a start time into the future, valid at a spot. The thick black line is the average of all the colored lines (the “consensus” forecast, if you will). You can think of it two ways: literal and figurative. Understand it both ways and you’ll get this.

LITERAL: There’s a lot of information here, so follow along with me. Each line is a running total of a different model’s snow accumulation at Islip, NY, starting at 06Z the 27th (1 a.m. EST). Yes, this is after the storm already started in reality–and this is important to remember! Also important: this isn’t even on the edge of the storm, where you know you can get either a lot or nothing–this is nearly smack in the middle!

Dots are every 3 hours. Vertical lines are every 6 hours. White background is day, gray is night (roughly). Numbers at left are in inches. Times and dates are in Z time at the bottom–take 5 hours off to get Eastern Time. The models actually started running from three hours before the start of the forecast (10 p.m., 03Z), but since these are forecasts, the first accumulation doesn’t appear until 1 a.m. (06Z).
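If the Z-time arithmetic is a stumbling block, here is a minimal sketch of the "take 5 hours off" rule. The function name z_to_eastern is made up for this illustration, and it deliberately ignores month boundaries, which is fine for reading a single chart.

```python
def z_to_eastern(day, hour_z, offset_hours=5):
    """Convert (day of month, hour) in Z time to Eastern by subtracting the offset.

    Illustrative helper only: it does not handle month or year boundaries.
    """
    total_hours = day * 24 + hour_z - offset_hours
    local_day, local_hour = divmod(total_hours, 24)
    return local_day, local_hour


# The three times mentioned in the text: model start, first forecast, snow ending.
for day, hour in [(27, 3), (27, 6), (28, 0)]:
    local_day, local_hour = z_to_eastern(day, hour)
    print(f"{hour:02d}Z on the {day}th -> {local_hour:02d}:00 Eastern on the {local_day}th")
```

Note that 03Z on the 27th lands on the evening of the 26th locally, which is why the day has to be carried along with the hour.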

Again this was while the storm was going on, and notice how different they all are already! There’s a nearly 6-inch disagreement from largest to smallest prediction at the first forecast time! By the time the models all agree that the snow has stopped, at 0Z on the 28th (7 p.m. EST the 27th), it’s less than a day into the future. One model says just an inch comes down. Another says 30 inches. Some specific amount will fall, but how much? Only one of those models can be right in the end–and it’s quite possible that none of them are right. The forecaster can either offer a wild-ass guess (WAG), or give a range and express somehow that even that isn’t fool-proof. Which do you think is the more honest approach?
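For the numerically inclined, here is a rough sketch of what "consensus" and "spread" mean. The storm totals below are invented for illustration: only the 1-inch and 30-inch extremes come from the discussion above, and none of the other values were read off the chart.

```python
from statistics import mean

# HYPOTHETICAL final storm totals (inches), one per model member.
# Only the extremes (1 and 30) are taken from the text; the rest are made up.
final_totals = [1.0, 8.0, 11.0, 13.0, 15.0, 16.0, 18.0, 20.0, 22.0, 30.0]


def summarize(members):
    """Return the lowest, highest, spread, and average (consensus) of the members."""
    low, high = min(members), max(members)
    return {
        "low": low,
        "high": high,
        "spread": high - low,        # distance between the extremes
        "consensus": mean(members),  # the thick black line on a plume chart
    }


summary = summarize(final_totals)
for name, value in summary.items():
    print(f"{name:>9}: {value:.1f} in.")
```

The consensus is a single number, so by itself it hides the spread. That is exactly why quoting only the average amounts to picking one more model.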

METAPHORICAL: Don’t worry about the exact spot or exact times. Pretend it’s where you are, right now, if that helps. Ignore the numbers on the graph if you want and make the forecast precipitation whatever you want it to be–rain or snow. Pretend that the lines represent not just computer models, but different human forecasts from media, the NWS, and private weather companies. Swap in whatever names of people, government forecast offices, companies or TV stations you want on the color legend.

[Understand this: media often get forecast information from either NWS or private forecasters and tweak it. The sum total of it all makes the number of possible human-modified forecasts that are available to you greater than this, and far more numerous than the number of computer models in existence!]

This speaks to the dilemma faced not just by forecasters but by the users of forecasts. What do you do with such a disparity of possibilities, none of which may end up correct?

Well, look at this: the event’s already underway, there’s snow on all sides, and the forecasts are still all over the place. You can view it the easy (but wrong) way: “These forecasters are a bunch of idiots, how can they be so far apart? Just gimme a damn answer!” We call that “deterministic” thinking. That’s a ten-dollar word that just means, “way more exact than what’s really possible”. Whoever takes that attitude is looking at forecasting in a really lazy and ignorant way and does not want to understand how it works. I cannot help them and neither can any forecaster, so don’t bother. Such folks will just have to be left to like, you know, keep up with, like, the Kardashians and Justin Bieber, or, like, something. Fortunately I think anyone reading this cares and thinks a little more deeply than that.

The alternative to simple-minded dismissiveness and unreasonable demands is to put yourself in the forecaster’s shoes and try to make some sense of the mixed messages they’re getting. Recall the forecaster’s dilemma: 1) offer a WAG that follows a particular model of choice (the average of a bunch of models is actually a kind of model too) or 2) suggest a range of possibilities with uncertainty somehow stated.

Picking one “model of the day”–even the one in the middle–has a much bigger chance of embarrassing failure than offering a most-likely range and saying how likely. The model that was closest to correct last time can fail badly this time, since there really never has been a last time. Every situation is different–even those that look very similar.

Now you be the forecaster. Look at the same chart again (above). What do you tell your road crews, TV and radio stations, airports, school districts, law enforcement, your soccer-mom neighbor, and the governor to prepare for? They’re all breathing down your neck, demanding exact answers and amounts, or at best a really narrow range, but this is nuts–you can’t give them a specific answer with a clear conscience as a scientist. The possibilities are just too far apart.

You’re uncertain–very uncertain–because the input you’re getting is so wildly disparate. How do you tell the road crews, the governor, school superintendents, the aviators, the chief of police, Mrs. Soccermom, and the TV and radio stations about this uncertainty? How do you tell them that you don’t have an exact, specific answer and can’t give one–that the best you can do is offer a pretty big range (say, 7 inches either side of an average 15, since that’s where the bulk of the models fall) and still could be wrong? See the dilemma that real forecasters face here?

Now imagine this is 48 hours (two days) out, and not while the event is going on. Often, the spread is even bigger in advance than after it’s already started.

Real forecasters have lots of clues to look at besides models to narrow things down a little, but only a little. Surface observations, radar, satellite, and 12-hourly balloon launches, all put together, only measure a fraction of the air in the storm, and only in incomplete ways. Forecasters know this. It might help them to see that the 1-inch forecast is garbage and throw that one out, but the 20-incher (while not looking likely) is still possible!

The language of uncertainty is probabilities. Since forecasters are always at least a little uncertain–even those who pretend otherwise–the most honest approach is to assign a chance of each outcome. It starts as a percentage but doesn’t have to look like a percentage to the reader, necessarily…it can be translated into a range of colors if you think visually. But even a colored forecast map still is a translation of the probabilities (and uncertainty) behind it. This is why severe-weather outlooks offer probabilities expressed as numbers, colors or words on a map–you get to choose based on how you can relate best. But they’re all rooted in probability–the language of uncertainty.
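Here is one simple way to turn a pile of model forecasts into chances: count the fraction of members at or above an amount. This is only a sketch. The member totals are invented, the word cutoffs in in_words are arbitrary choices made for this example, and real outlooks are not built by raw counting alone.

```python
# HYPOTHETICAL final storm totals (inches), one per model member (made up).
final_totals = [1.0, 8.0, 11.0, 13.0, 15.0, 16.0, 18.0, 20.0, 22.0, 30.0]


def chance_of_at_least(members, threshold):
    """Fraction of members forecasting at least `threshold` inches."""
    hits = sum(1 for amount in members if amount >= threshold)
    return hits / len(members)


def in_words(probability):
    """Translate a probability into a word. The cutoffs are arbitrary assumptions."""
    if probability >= 0.7:
        return "likely"
    if probability >= 0.3:
        return "possible"
    return "unlikely"


for threshold in (6, 12, 18, 24):
    p = chance_of_at_least(final_totals, threshold)
    print(f"At least {threshold:2d} in.: {p:.0%} ({in_words(p)})")
```

The number, the word, and any color you might assign are all the same information underneath. Counting members treats every model as equally trustworthy, which a real forecaster would not assume.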

In the end, it should be clear that no forecast (or forecaster) ever can be completely right, any time or all the time. We only can hope to be less wrong (more accurate) than the last time, and less wrong (more accurate) than most of the models. Forecasting is the art and science of being only a little wrong–but within acceptable tolerances. We express that truthfully by giving probabilities. That’s not hedging or hiding anything–it’s simply being honest.

Do you want your favorite forecaster to be honest or to lie? Because if he or she is telling you a specific snow or rain amount that’s going to fall tomorrow, that’s completely overselling his/her own knowledge. In short, that forecaster is lying. Nobody is that good, except by complete accident, and nobody is consistently that good! If a forecast is within your tolerance, you’ll consider it a good forecast. The question is: is your tolerance a reasonable expectation?

You see, it’s true: forecasters are paid to be wrong, to some extent. That’s because being totally right (perfect) is just not possible. The science just is not there yet. That’s the brutally honest truth. Accept it. We simply strive to be less and less wrong, and to provide the best information we can give to help the governor, the airports, the road crews, the cops, the soccer mom, and everybody else to make the best decisions for their own vastly different purposes. We realize, too, that we just can’t please everybody.

ADDENDUM

Were you wondering how much snow really fell? At Islip Airport, the final storm total was 25 inches (rounded up from 24.8, which is overly precise for blowing and drifting conditions). Right when the 10 p.m. (03Z) forecast package started, Islip Airport’s observer reported 5 inches already on the ground. Even subtracting that out, as we should to verify this forecast, that yields 20 inches. Regardless of what happened in Philly or NYC, the model consensus and most of the individual members actually under-predicted the big snowstorm at Islip, in the middle of Long Island. None of them got it precisely right. Two were within an inch. Even with those two, we must ask: were they nearly right for the right reasons, or was it a busted-clock type of accidental accuracy (with all those models firing, somebody had to be the closest to the target)? The same concerns exist for human forecasters. These are the things we study when we look back at forecasts to try to improve.
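The verification arithmetic above can be sketched in a few lines. The observed total and the 5 inches already on the ground come from the text. The member totals are invented for illustration, so the count of members within an inch will not match the two real ones.

```python
from statistics import mean

observed_total = 24.8     # inches, final storm total at Islip (from the text)
already_on_ground = 5.0   # inches reported when the forecast package started

# Snow that fell during the forecast period: what the models should be graded on.
verifying_amount = observed_total - already_on_ground

# HYPOTHETICAL member forecasts (inches); these are NOT the real model values.
final_totals = [1.0, 8.0, 11.0, 13.0, 15.0, 16.0, 18.0, 20.0, 22.0, 30.0]

# Negative error means the forecast was too low (under-predicted).
errors = [forecast - verifying_amount for forecast in final_totals]
consensus_error = mean(final_totals) - verifying_amount
within_an_inch = [f for f, e in zip(final_totals, errors) if abs(e) <= 1.0]

print(f"Verifying amount: {verifying_amount:.1f} in.")
print(f"Consensus error:  {consensus_error:+.1f} in.")
print(f"Members within an inch: {within_an_inch}")
```

A small error says nothing about whether a member was close for the right reasons. That question takes a look at the meteorology, not just the totals.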

RELATED LINKS

A general yet concise discussion by Doswell on uncertainty concepts in forecasting such events.

[ADDITION] Just found that Cliff Mass has posted a nice synopsis and discussion of the forecast scenario, with several examples and recommendations.

Lee Grenci (Ret. PSU) discusses communicating uncertainty for this event.

A specific 2011 BLOG entry on similar topics: Forecasting the Ensemble Outlier.

A general 2005 BLOG discussion: Ensemble Forecasting: Threat or Benefit? — including a definition of the “Obsolescence Point” for human forecasting (some links probably have expired)

SREF plume diagrams (the source and type of the chart used above).
