Ensemble Forecasting: Threat or Benefit?

My answer is: both — and as much of each as we choose to make it. If you have patience for a somewhat rambling but most assuredly sincere discussion on the subject, read on. This is a compilation of both new thoughts and some musings I’ve made on this topic in private correspondence with other scientists over the last few years.

The Coming Era Has Come

Much weeping and gnashing of teeth has taken place over the past few decades in the weather prediction community about automated forecasts, and the increasing accuracy and efficiency of computer models in generating them. As mainframe computing power accelerates, so does the angst in many human forecasters.

Military forecasters at the Joint Typhoon Warning Center (JTWC) have been relying heavily on such automated guidance for a few years, with apparently good success, using an ensemble approach called Systematic and Integrated Approach as a Forecast Aid (SAFA). Forecasters are allowed to toss away outliers at their judgment, but only if the outliers follow pre-identified model error patterns from a checklist. This leaves behind a numerical forecast called Selective Consensus (SCON) that the forecaster must use. In the long haul, it is almost impossible to beat the average track errors from these ensemble techniques.
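For readers who like to see the idea in code, here is a minimal sketch of a selective consensus: average the positions from every model the forecaster has not discarded. This is not JTWC's actual procedure. The model names, the positions and the function name selective_consensus are all made up for illustration, and the plain latitude/longitude average ignores spherical geometry and the date line.

```python
from statistics import mean

def selective_consensus(positions, flagged):
    """Average the forecast positions of all members not flagged as outliers.

    positions: dict mapping member name -> (lat, lon) in degrees
    flagged:   set of member names removed by the forecaster's checklist
    """
    kept = {name: pos for name, pos in positions.items() if name not in flagged}
    if not kept:
        raise ValueError("every member was discarded; no consensus is possible")
    lat = mean(p[0] for p in kept.values())
    lon = mean(p[1] for p in kept.values())
    return lat, lon

# Hypothetical 72-hour positions (deg N, deg E) from four made-up models.
positions = {
    "model_a": (20.0, 130.0),
    "model_b": (21.0, 131.0),
    "model_c": (22.0, 129.0),
    "model_d": (30.0, 140.0),  # the member a checklist might flag
}

full = selective_consensus(positions, flagged=set())        # all four members
scon = selective_consensus(positions, flagged={"model_d"})  # outlier removed
print("full consensus:", full)
print("selective consensus:", scon)
```

With these invented numbers, dropping the single flagged member moves the consensus position by more than two degrees in both latitude and longitude, which is why the rules about what may be discarded matter so much.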

NHC is relying more heavily than ever on composite (ensemble) track guidance, and with good reason; it works pretty darn well, averaged over a season. [2005 was their best season ever for overall track forecast error through 72 hours, partly because of use of ensemble composite guidance and partly because so many storms were “well behaved” compared to climatology and persistence (CLIPER).] SAFA-style forecast regulation may be the future in TC prediction stateside, after some years of ultimately futile inertial/political resistance.

Severe storms forecasters (me included) have been using Short Range Ensemble Forecasting (SREF) guidance more and more during the last few years. Though some deterministic (single solution) modeling produces high-resolution output that looks like realistic radar echoes of supercells and squall lines, it’s still wrong quite often. Further, such output still has a long way to go before being robust and timely enough to integrate fully in SREFs that give reliable output by the time forecasts are due. But the writing is on the wall!

Human (Ir)Relevance

Are human forecasters being made obsolete by machines? After all, a truly “accurate” forecast (say, a better prediction of tomorrow’s high temperature, wind speed or cloud cover) probably is, or soon will be, more useful to more customers than one of lesser accuracy that has a “human touch.” By and large, in a fast-paced and market-driven economy, results matter far more than the means by which those results are achieved.

If the slick-haired, smooth talking dude on “4-Warn” is telling Aunt Mildred there could be a hard freeze tomorrow night, and it happens, she’ll be glad she responded appropriately by moving in her potted plants. She probably won’t give a flip whether the forecast ultimately originated from a computer or a person. It was a good forecast and that’s all that matters.

Let’s briefly consider the powerful input-output problem: that forecasts only are as good as the input observational data. I totally agree! Ensembles don’t solve that problem; otherwise they would tightly cluster about a specific solution no matter the density or quality of input. They do, however, mitigate this factor somewhat, given surface and upper air data densities we now have over the U.S., by integrating numerous possible solutions (given the same input or even multiple sets of simultaneous input) and providing ranges of possibilities for forecasters to choose from.

We’re sliding into ensemble land and there is no going back. That much we know. Thus, could it get to the point where we’re required to “adhere to rules” to a great degree in probability forecasting because we will be shown to be outperformed otherwise? If so, when? If not, why?

Consensus Forecasting vs. Extreme Event “Outliers”

In ensemble-based forecasting, most human prognosticators will go for the most common solutions and throw out the extremes (especially if required to by rule, as at JTWC). But the wise forecaster, who is allowed to do so, also considers the extremes for what they are — low probability possibilities — and examines them further for conceptual validity before heaving them into digital Hades. After all, the model “outlier” makes its forecast for a reason…something was there which caused it to integrate such a seemingly whacked out solution, and there is a (low but not ignorable) probability that it may sometimes be right! What happens to the ensemble consensus approach every 5 or 10 or 20 years when a major landfalling typhoon or hurricane just kicks the stuffing out of those ensemble means and does something from way out in left field of the ensemble ballpark?

Pretend for a minute that ensembles of 20 years from now will provide a range of forecasts for specific supercells and tornadoes. The next 3 April 1974 type outbreak is forecast by a gross outlier…and happens! If one of the lowest-probability and most ridiculous looking ensemble extremes for that day had been a forecast of over 140 tornadoes, it would have been highly unwise to discard it!
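To put rough numbers on that thought experiment, here is a small sketch using an entirely invented 20-member ensemble of tornado counts. The counts, the threshold of 100 and the function name exceedance_probability are assumptions for illustration only; no real ensemble system produces this output today.

```python
from statistics import mean, median

def exceedance_probability(members, threshold):
    """Fraction of ensemble members forecasting at least `threshold`."""
    return sum(1 for m in members if m >= threshold) / len(members)

# Hypothetical tornado counts from a made-up 20-member ensemble.
tornado_counts = [2, 3, 3, 4, 4, 5, 5, 5, 6, 6,
                  6, 7, 7, 8, 8, 9, 10, 12, 15, 145]

ens_mean = mean(tornado_counts)      # pulled upward by the single extreme member
ens_median = median(tornado_counts)  # the "consensus" day most members describe
p_major = exceedance_probability(tornado_counts, 100)

print(f"mean={ens_mean}, median={ens_median}, P(>=100 tornadoes)={p_major:.2f}")
```

Nineteen of the twenty invented members describe a fairly ordinary severe weather day, and a forecaster who throws out the twentieth has quietly turned a 1-in-20 chance of a historic outbreak into zero.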

JTWC forecasts often were hugely in error before SAFA techniques were put into place. Now they’re not, a much greater majority of the time. But what of the inevitable “fluke” event that defies SAFA error rules — thereby defying the expungement of outliers? Forecasters truly earn their pay by nailing the most deadly and dangerous: quite often, the rare extreme that comes “out of nowhere” to cause great damage and potential harm to people and the economy.

I dearly hope rigid rules for forecasting don’t allow the next 3 April 1974, or the next Labor-Day-1935 hurricane, to go badly forecast just because it is so extreme as to seem in error.

Yes, the human must have a sound physical and conceptual understanding of the atmosphere and of the processes relevant to his realm of forecasting (in my case, severe storms) to have any idea when one of those low-probability outliers actually could strike!

Those humans who simply regurgitate the ensemble mean, or any of the individual high-probability solutions, will watch the most important forecast of their careers roll around the bowl and down the hole, at the cost of great embarrassment (minimum consequence) and massive loss of human life (maximum). That’s the price any forecaster risks paying for overdependence on computer generated ensemble consensus, over his/her own understanding of the atmosphere.

Over the Next Several Years

This asymmetric penalty function actually can be made to work to the benefit of the human in the forecast process — at least, for the humans who scientifically educate themselves enough to keep up with understanding all possible aspects of atmospheric extremes that lie just beyond the reach of ensemble consensus forecasts.
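One common way to express an asymmetric penalty is the simple cost-loss decision model, which I am borrowing here as an illustration rather than as anything from the discussion above: a user pays a small cost C to protect, or risks a large loss L if the event hits unprotected, and protecting pays off whenever the event probability exceeds C/L. The numbers and function names below are hypothetical.

```python
def expected_expense(p_event, cost_protect, loss_unprotected, protect):
    """Expected expense of one decision under the simple cost-loss model."""
    return cost_protect if protect else p_event * loss_unprotected

def should_protect(p_event, cost_protect, loss_unprotected):
    """Protecting is worthwhile when the probability exceeds the cost/loss ratio."""
    return p_event > cost_protect / loss_unprotected

# Hypothetical numbers: protecting costs 1 unit, an unprotected hit costs 100.
p_outlier = 0.05  # e.g. one ensemble member in twenty
cost, loss = 1.0, 100.0

act = should_protect(p_outlier, cost, loss)
print("protect?", act)
print("expected expense if protecting:", expected_expense(p_outlier, cost, loss, True))
print("expected expense if ignoring:  ", expected_expense(p_outlier, cost, loss, False))
```

Under these made-up numbers a 5% outlier is well worth acting on, because ignoring it costs five times as much on average as protecting against it. The larger the loss relative to the cost of protection, the smaller the probability that deserves attention.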

In the years and maybe decades until the “Obsolescence Point” (more below) is reached for extreme-event forecasting (such as hurricanes, tornadoes, highly anomalous winter weather and so forth), I have no doubt at all that we will have important roles in:

  1. “Filling the gap” in numerical prognostic capabilities, where it still lags the human; and
  2. While still allowed, sniffing out exceptional events which defy MOS-like climatologically adjusted probabilities, or which verify out nearer to the fringe members of the ensemble instead of the means.

In the TC realm, the temporary reprieve from the prospect of utter model dominance over humans is in intensity forecasting (where models and humans both suck…humans just suck less) and in TC rainfall (likewise). Perhaps those will become the emphases of the human TC predictor as the models take over track prediction (and the latter becomes more of a zero sum game for the human versus the ensemble). [According to Chris Landsea (personal communication), the state of TC intensity forecasting is about where track prediction was in 1980.]

I can see this happening with some aspects of severe local storms (SLS) forecasting as well — say, overall outlook areas may be lost first, followed by tornado, then hail, then wind probabilities, then the mesoscale areas, then watch probabilities for each event type.

Each automation step takes several years and progressively more near-term superiority in accuracy — perhaps an asymptotically more difficult climb for the models to make to match the human. So maybe…just maybe…there still would be something left for me and my colleagues of same and lesser age to forecast when we are nearing retirement age.

So if you want to stay relevant as a forecaster in the next decade or so, get good at predicting extremes and rare anomalies. Tropical cyclones (TCs) and severe local storms (SLS) are relatively safe havens in forecasting for this reason; daily highs and lows are not!

But for how long?

The Obsolescence Point

At what point do the “painful exceptions” become so few that the bureaucrats become confident enough to risk them and put humans out of the daily forecast business, when human forecasters, given their cost/benefit ratio (salaries, benefits, etc.), are deemed expendable, even if we humans are still a little better than some “SREF-severe-probability-MOS”? Eventually even SLS and TC prediction increasingly will be “taken over” by automation. The machine may not forecast a few extremes well now and then, but it also doesn’t keel over from avian flu. Supercomputers don’t get messy divorces, file union grievances, go home and go to sleep, or take annual leave to go storm chasing.

I think the Obsolescence Point for humans in forecasting, to some extent, is coming in my natural lifespan. Chuck Doswell has touched on the latter across many of his writings (including this, and this, and this); now a manifestation of the phenomenon looms.

So where does this leave us? Even in that world, I have reason for optimism — pragmatically too, not in a Pollyanna fantasyland of wishes and magical fairy dust.

The Great Hope (Scientific Understanding and Communication)

The prospect of model ensembles taking over some forecast functions is not as scary once we set aside our innate ego and territoriality, and take a more pragmatic look at where we may be more useful in the future.

What matters in a forecast ultimately is its closeness to reality (the results, as often expressed by ranges or probabilities to reflect uncertainty). There will be no useful purpose, therefore, in sentimentally clinging to such a (by then) outmoded value system as “the human touch.” Our focus may have to shift to being “interpreters” rather than “predictors.”

Therein lies the importance of keeping up with scientific advances in understanding related to what we do (at least). Even when we reach the point at which we no longer predict much of it better than the machine, we will need physical understanding of the numerically superior predicted phenomenon, and practical understanding of its impacts, to translate it to publicly and industrially palatable forms.

A fair question Erik Rasmussen once asked me is this: “Is it good for society, for the human spirit, for mankind, to have our labors replaced by machines?”

No! At least, not unless we adjust in ways that give us reasons as scientists and as humans, intellectually and (glad he mentioned this oft-neglected aspect!) spiritually, to stay motivated and stimulated. It sure helps to recognize one’s limitations and adapt accordingly. In both TC and SLS forecasting, maintaining optimal scientific understanding will help…I hope.

Science-minded operational meteorologists will be doing something 20 years from now. It just may not be forecasting in any form resembling today’s. Savvy forecasters might aim toward new niches in applied forecasting, where we guide the customers of those forecasts in responding appropriately, or translate a high-confidence numerical forecast to give emergency managers (for example) the likeliest time window of a tornado hitting Dallas this afternoon. As both human and mechanized forecasts get more accurate, the issue of communicating and interpreting them will get extremely important.

What to Do?

My message to fellow forecasters is this: Be prepared, not scared. Stand tall in the face of these challenges. Understand the science of extreme and deadly weather, to the greatest possible extent. Make yourself too valuable, too knowledgeable, to be eliminated. Instead of “fight” or “flight” (the two most natural and instinctive responses), “adaptation” could be the best way to respond most of the time.

Still, we must not sell out, turn to inauthenticity or otherwise compromise personal and scientific integrity. Expect to experience subplots of “fight” (against pseudoscientific crap or overselling of technology) or sometimes even “flight” (i.e., from Sisyphus-like futile struggles against superior mechanized forecasting on the longer timescales). As an old gamblin’ song once advised, “Know when to hold ’em, know when to fold ’em!”

So maybe I’ll eventually go for my CCM certificate after all, in case my unnamed workplace is rendered wholly robotic by 2020, or deemed too cost-inefficient by a scientifically obtuse bureaucratic hierarchy to keep as a distinct entity sooner than that. Then I can be ready to earn my keep in translating the new stuff into forms folks can use while remaining plugged in to the science.

In the meantime, and for as long as possible, I will continue to strive to maximize the tax dollars that pay my salary by outperforming the automatons on exceptional and deadly events — beginning where all great human forecasts do: with colored pencils applied to surface and upper air charts!


3 Responses to “Ensemble Forecasting: Threat or Benefit?”

  1. Rob Dale on December 16th, 2005 4:23 am

    How do we train forecasters on ensembles other than a 2-hr teletraining session after they’ve been hired by NWS? Is there any hope in a dramatic turnaround for the educational process when most college courses probably still use 10:1 and 540 lines as the core of winter weather?

    And if we do teach people about ensembles – how do we keep them aware of the April ’74 outlier and actually consider it as an option instead of ignoring?

  2. Gilbert S. on December 31st, 2005 1:08 am


    Great discussion, but you missed two parts and left me out in the cold as a result: application and input.

    Forecasts by humans and computers are getting better and better, and yes, we are approaching the limits of forecasting given the current input of data we have going into the models. We can improve models, and forecasters. But say we hit the limits of both, as you describe above. Are we any better? Let me dare to tell you…NO!

    Privately, I’ve been telling you about the “StormReady” certification Northern Illinois University has from the severe weather and emergency preparedness program I learned how to put together. When the NWS issues a tornado warning and people go outside to look for it or to go chase after it (without a proper background to do so safely), the glaring problem of wisely applying the information to people rears its ugly head. The roof of my townhome, unfortunately, is mandated by city code to only sustain 60 MPH winds. That is, get any winds higher than that…and bye-bye shingles. My in-house study shows we get that once every 3 years, and in 2005, a few months after I moved in to my townhome, a squall line damaged some of the shingles on my house, which were replaced by the builder. We did a bit better job warning what would happen if a Katrina hit New Orleans, but people swept that information under the rug. A hurricane hitting south Florida causing much damage? Preposterous, until Andrew took that and blew it and hundreds of thousands of homes INTO the swampwater of the Everglades. And of course, in 2004, that ugly scenario was repeated several times.

    What I mean is this. Forecasters deal with the “during” of an event, not the “before” and “after”. I envision that the future forecaster will be consulted with “what should I do with my new football stadium to make it so that 30,000 people can go to a safe place during bad weather”, and “what can we do next time”, or even, “should I even rebuild here?”.

    My dealing with your accurate assessment of where forecasting is headed has provided me here at NIU an opportunity to glimpse into the future of forecasters’ roles in society. Some will become weather safety officers, as I am, and deal with “what if” scenarios. Others will become “what now” meteorologists, dealing with the aftermath and laying out future rebuilding plans, code designs to better withstand extreme weather, and so forth, in addition to, or even instead of, their forecast duties.

    And, there’s more. I also recommend when to buy and sell energy on the markets here. If we think there will be a cold winter or hot summer coming up, we’ll buy our gas and electricity ahead of time, using a fixed price. When prices spike, our price stays the same. We save millions of dollars at NIU doing this.

    It’s exciting, rewarding, and perhaps more so than what an NWS or private sector forecaster gets to do on an average day. IMO, interpretation of forecasts will always be high priority, but when the fur hits the fan, it will be your cumulative forecasts, preparation for extreme events, and how well you handle the aftermath of said events which will define the forecaster roles in the future. I’m already there, and sometimes, it is nail-biting. But it’s rarely dull, even when it’s sunny outside!

  3. tornado on December 31st, 2005 8:06 pm

    I did touch on applied services and forecasting, but only in a generic sense as one of the key roles for future human meteorologists. Gilbert then went into more specific terms with it, which is a good thing. In fact his own example is an excellent one of the future demand for meteorological services that we all (as meteorologists) may need to adopt in some form.

    In a way, therefore, Gilbert’s work *is* the future of human forecasting. The same holds true for someone like Paul Janish, a former SPC forecaster who now works the energy usage prediction angle almost full time for a big energy company.

    Public meteorologists, by comparison, are expressly prohibited from providing customized services to private individuals and companies — but *are* allowed and often encouraged to do so for federal, state and local governmental needs.

    One example of this is in the Storm Ready program itself ( http://www.stormready.noaa.gov/ ), where NWS declares communities, universities, states, Indian tribes and other local/regional governmental entities as “Storm Ready” when they reach certain milestones of storm preparedness.

    Gilbert’s university (Northern Illinois) is one of the very few colleges so certified. [OU, right here in the heart of “Tornado Alley,” is not!] Gilbert, who is publicly employed by the university, has done absolutely extraordinary preparedness work up there!

    Other roles for public forecasters in the future may include energy-related prediction for government consumption (something currently not done), as well as extensions of the incident-hazard services now performed (e.g., HAZMAT wind and hydrology prediction, other EM support, safety consulting for publicly funded operations, flood preparation and warning for potential dam breaks and IMET type functions in fire situations).
