Dangers of Forecasting by Model Consensus

Numerical ensemble forecasting has proven to be a wonderful development in many (probably all) facets of operational meteorology, including predicting those patterns which yield the right ingredients for severe storms from hours to days into the future. I love using model ensembles, and they have become a very helpful, indeed almost indispensable, part of my toolbox once due diagnostics have been completed. [In the interest of space, I’ll spare my usual rant about having deepest possible understanding of the current state of the atmosphere before proceeding to model guidance of any sort!]

Refer to an earlier BLOG entry for a lengthy discourse about ensembles and the future of human forecasters. Now we’ll look at an example or two to illustrate an important point: Beware the *incorrect* mean or consensus, and know when the outlier is best!

The May 2006 NOAA Atlantic seasonal hurricane forecast provides lessons for all in using ensemble guidance. The initial prediction from May was for 13-16 TCs, 8-10 hurricanes and 4-6 major hurricanes, with an 80% probability of an above-normal season. This was a gross overforecast, partly because the seasonal CPC ENSO forecast underpredicted the warming. Even the revised prediction (issued in August, after El Niño began and three storms into the season) was for 12-15, 7-9 and 3-4 respectively, with a 75% chance of an above-normal season…still overdone.

The forecast of warm Atlantic SST (consistent with our temporal position in the AMO cycle) apparently was fine, but the rapid transition into an El Niño SST regime in June, and its associated upper tropospheric kinematic anomalies, wrecked the potential for an active season. The climatic ENSO forecast models blew chunks, the consensus forecast being near neutral (a.k.a. “La Nada”). Importantly, the only verifying member was on the far upper end of the spaghetti plot.

There’s a message here for all forecasters who use any kind of ensemble guidance: Outliers sometimes are the most correct, i.e., sometimes the solution on the far left or right, or high or low, really does verify much better than the middle or average.
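To make that concrete, here's a toy sketch with made-up numbers (not real guidance): a temperature ensemble whose mean busts badly while a lone outlier member verifies almost perfectly.

```python
# Illustrative sketch only -- hypothetical ensemble of high-temperature
# forecasts (deg F). The verifying observation lands near the one warm
# outlier, not near the ensemble mean.
members = [58, 60, 61, 62, 63, 64, 75]   # one outlier member at 75
observed = 74                             # what actually happened

# Ensemble mean ("consensus") and its absolute error
mean_fc = sum(members) / len(members)
mean_err = abs(mean_fc - observed)

# The member that verified best -- here, the extreme outlier
best_member = min(members, key=lambda m: abs(m - observed))
best_err = abs(best_member - observed)

print(f"ensemble mean: {mean_fc:.1f}  (error {mean_err:.1f})")
print(f"best member:   {best_member}    (error {best_err})")
```

In this contrived case the consensus misses by more than ten degrees while the outlier misses by one, which is exactly the situation where blind trust in the mean gets you burned.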

Think back, for example, to college forecast contests where even the most aggressive operational model prog of a frontal position still failed to surge it far enough south, and your temp forecast got blasted to smithereens, your error scores as bloated as I get after eating too many bean burritos. An ensemble extreme in that case probably would have been the best model forecast to use as guidance in making your real forecast — not the ensemble mean or “consensus.” Clearly the extreme outlier verified as the highest-probability forecast…because it happened!

Whatever the scale of the forecast, from mesoscale to SREF to seasonal and beyond, the nagging question is: How do you know when, and especially why, a far-flung string of prognostic spaghetti will be the one to kick butt? Therein lies a crucial role of the human forecaster in the ensemble era!

As I once said in the aforementioned BLOG essay: “Pretend for a minute that ensembles of 20 years from now will provide a range of forecasts for specific supercells and tornadoes. The next 3 April 1974 type outbreak is forecast by a gross outlier…and happens! If one of the lowest-probability and most ridiculous looking ensemble extremes for that day had been a forecast of over 140 tornadoes, it would have been highly unwise to discard it!”

One apparently safe path is to forecast the consensus all the time and simply swallow or ignore the missed extremes. Your scores will be great, on average, no matter how many people were inconvenienced or even killed, or how many hundreds of millions of dollars in damage occurred in the few missed events. Does your scientific and moral conscience permit this to be acceptable? Most managers probably will accept it, at least if the missed event didn’t hit a politically sensitive target or cause mass casualties.

As long as your forecasts didn’t happen to include the fatal events — a fluke of happenstance — managerial bean-counters who place too much stock in objective verification numbers as indicators of forecast value (and there are many, unfortunately) will love you. You may even get rewards for what is termed “superior performance.” Such “success” is hollow — performance without understanding, style without substance, an emperor without clothes. And it will come back to haunt you when the biggest day of your career hits and you fail to nail it because

  1. It is too far from the ensemble consensus and
  2. You don’t have the physical and conceptual understanding that would give you the confidence to deviate from that erroneous consensus.
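The scoring trap above can be sketched with another toy example (all numbers hypothetical): forecast the consensus every single day, and your average error looks fine on paper while the few extreme days produce enormous, career-defining misses.

```python
import random

random.seed(0)  # make the hypothetical sample reproducible

# 100 forecast days: 97 near-normal, 3 extreme-event days (made-up values)
days = [random.gauss(70, 3) for _ in range(97)] + [95, 96, 98]

consensus = 70.0  # lazily forecast the consensus every day, no exceptions

errors = [abs(consensus - obs) for obs in days]
mae = sum(errors) / len(errors)   # average score -- looks respectable
worst = max(errors)               # the miss on the "big one"

print(f"mean absolute error: {mae:.1f}")
print(f"worst single-day miss: {worst:.1f}")
```

The average error stays small because the bulk of days are near-normal, so objective verification rewards the strategy; the worst-day miss, the one that actually matters to people in harm's way, is many times larger.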

[Herein also lie the pitfalls of human “collaboration” in forecasting. More on this in another stream of spew, coming soon!]

Maybe the really devastating event is just a once- or twice-a-career occurrence. Isn’t that what we’re paid as forecasters to get right when it counts the most, though — the rare and extreme event that can affect many lives? It is, after all, the firehouse analog: the legendary 7-alarm chemical fire in a populated area. If we put out everyday house fires well when nobody’s home, that’s good, but only until we totally drop the ball on the “big one.”

My dad once said that, no matter how tough you are, there’s somebody out there who can whip you. Same goes for forecasting and the events we predict. Every forecaster, just like every fighter, is going to get his @ss kicked sometime if he keeps at it long enough. It’s the nature of the beast. Ask any old-timer forecaster, especially at a national center, and you’ll get a horror story of the biggest miss of their career. There are a few who won’t be available for asking because they changed careers after such a disaster. This is about doing one’s best to deliver the butt whipping to the big bad foe instead of taking it.

Finally, quoting from the previous essay, I’ll reiterate another reason for knowing when to buck the consensus trend, then having the balls to do so: the inevitable automation of “routine” forecasting. “If you want to stay relevant as a forecaster in the next decade or so, get good at predicting extremes and rare anomalies.” The best means to that end is scientific understanding of the processes behind those anomalous but career-making (or -destroying!) events.

Are you a forecaster? If yes, do you want a meaningful job in the future? Then heed these warnings. Consensus guidance and ensemble solutions are your friends, to be sure, but as we know, friends sometimes can screw up and do terrible things that hurt us. Treat ensemble progs wisely, with due vigilance for the potentially correct outlier, instead of as the meteorological equivalent of a brain-damaging sedative.

