In March, estimates of how many Americans would die from Covid-19 in the coming months ranged from 81,000 to 2.2 million.
That disparity—which arguably produced as much confusion as insight—prompted one independent data scientist to build his own model in hopes of arriving at a better forecast. (He did.)
Now, as many as 50 different research groups make predictions, but one of the most accurate assembles all of the individual models, calculates the median value and looks no more than four weeks into the future.
The ensemble forecast was created by the Reich Lab at the University of Massachusetts, Amherst, in collaboration with the Centers for Disease Control and Prevention and is based in part on models previously developed to forecast influenza and other infectious diseases.
In the next four weeks, it predicts the total number of deaths attributed to the new coronavirus will surpass 240,000—adding roughly 17,000 deaths to the current tally.
Such projections help policy makers and health-care officials decide how to manage resources and implement or relax interventions intended to curb the spread of the disease.
“It gives you guideposts,” said Jeffrey Shaman, an infectious-disease modeler at the Mailman School of Public Health at Columbia University. “Are you going in a bad or good direction? Is it under control?”
But even now, the projections of individual models sometimes differ substantially, according to Michael Johansson, co-leader of the CDC’s Covid-19 modeling team. In June, one model predicted that by July, the death toll would top 263,000. Another anticipated that it would be less than 120,000.
By focusing on the median, the ensemble forecast has consistently provided reliable four-week projections. In this case, its projection was 130,558, and the actual figure was 130,089.
“That’s been one of the big advantages,” Dr. Johansson said. “By leveraging all the information from all the model approaches, we get forecasts that are demonstrably robust.”
The Covid-19 Forecast Hub is the central repository for individual forecasts. Those included in the ensemble, currently more than 40, are probabilistic forecasts that look four weeks into the future; provide a point estimate of the expected number of deaths in the period; and include the ranges within which the actual number will likely fall, expressed as prediction intervals of 95% and 50%.
The ranges for each forecast are divided into 23 quantiles. For each quantile, the Hub calculates the median across the individual models, and together those medians make up the ensemble forecast.
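To make the median approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Hub's actual code: the three models and their numbers are invented, the `toy_forecast` helper is hypothetical, and the 23 quantile levels listed are assumed to be the ones in use. The idea is to take the median across models at each quantile level, then read the 95% and 50% prediction intervals off the result.

```python
from statistics import median

# Assumed set of 23 quantile levels: 0.01, 0.025, 0.05, 0.10, ..., 0.95, 0.975, 0.99.
QUANTILE_LEVELS = (
    [0.01, 0.025]
    + [round(0.05 * i, 2) for i in range(1, 20)]
    + [0.975, 0.99]
)


def toy_forecast(center, spread):
    """Hypothetical helper: build a made-up forecast as {quantile level: deaths}."""
    return {q: center + spread * (q - 0.5) for q in QUANTILE_LEVELS}


def ensemble(forecasts):
    """Take the median across models at each quantile level."""
    return {
        q: median(model[q] for model in forecasts.values())
        for q in QUANTILE_LEVELS
    }


# Invented numbers, for illustration only.
forecasts = {
    "model_a": toy_forecast(120_000, 20_000),
    "model_b": toy_forecast(131_000, 10_000),
    "model_c": toy_forecast(160_000, 40_000),
}

ens = ensemble(forecasts)

point_estimate = ens[0.5]                # median of the ensemble
interval_95 = (ens[0.025], ens[0.975])   # 95% prediction interval
interval_50 = (ens[0.25], ens[0.75])     # 50% prediction interval

print("point estimate:", point_estimate)
print("95% interval:", interval_95)
print("50% interval:", interval_50)
```

Because the median ignores how far the extremes sit from the middle, one unusually high forecast such as the hypothetical `model_c` does not pull the ensemble upward the way it would in a simple average.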
“Because of previous work, we know you don’t want to exclude models unless there is a clear reason to do so,” Dr. Johansson said. “They capture different things.”
Confining the forecasts to the short term also makes changes in policy or public behavior less likely to upset the projections.
In contrast, the earlier models tried to predict the number of deaths that would occur months later.
“Forecasts beyond a relatively short prediction window are not going to be very robust,” Dr. Johansson said. “You can model trajectories beyond that, but it’s hard to have confidence in what will happen when so many things are changing.”
The sprawling range of the early estimates is also explained in part by their different assumptions.
A high projection by Imperial College London depended on the absence of any interventions that might mitigate the spread of the disease, while a low estimate by the University of Washington Institute for Health Metrics and Evaluation depended on widespread interventions.
“We assumed everybody would have a problem, and those having a problem would do something about it,” said Ali Mokdad, a professor of health metrics sciences at IHME.
Reality fell somewhere in between: Some states implemented protective measures, such as stay-at-home orders and mask mandates, and some didn’t.
Seeing the disparate estimates early on, Youyang Gu, an independent data scientist, attempted to improve on the projections by developing his own model. Over time, it came to be regarded by many as the best individual forecast, according to Jarad Niemi, an associate professor in the statistics department at Iowa State University who works on the ensemble forecast.
But now, given the number and quality of available models, Mr. Gu has decided to stop making projections.
“When I started, there were around five models,” he said. “Now there are over 40. A lot of others are doing great work. The ensemble is the one I recommend the most. It’s unquestionably the best.”
Because the ensemble forecast includes so many different models, losing one—even a very good one—shouldn’t diminish its general performance, according to Mr. Gu, Dr. Niemi and the CDC.
“Having many, many groups helps make the system withstand changes,” said Matthew Biggerstaff, co-leader of the CDC’s Covid-19 modeling team. “The whole system won’t collapse.”
Write to Jo Craven McGinty at [email protected]
Copyright ©2020 Dow Jones & Company, Inc. All Rights Reserved.