Friday, April 10, 2020

Upon Further Review ...

The patient is dying.

The patient in question is the Institute for Health Metrics and Evaluation (IHME) COVID-19 model, which I discussed at length in my last post. And what is the model dying from?

Starvation. And malnutrition.

You may recall the theme of my last post: Data informs models. And I noted that, as more data becomes available, it is fed into the model, and the model gains predictive value. As with us humans, if you don't feed a model sufficiently, it may eventually starve to death. At a minimum, it will become so malnourished that it can't do its job effectively.

And with the IHME model, that's where we are today.

Before I explain why the model is starving, let me explain how I came to the realization that it is. As I said in the previous post, I analyzed the projections from the first run of the model that was made available for public consumption. That was on April 1. I updated the analysis on April 5, when IHME released the next revision. I updated it again on April 8, when a subsequent revision was released. And in looking at how the data changed, how it moved, from one release to another, just a few days apart, I saw behavior that you'd never see from a "healthy patient" in the modeling world. In other words, I saw symptoms of starvation, and of malnutrition on a major scale.

You've probably seen in the news how projected total deaths have dropped significantly with each update of the model. The initial count was over 95,000. After being fed a few days of additional data, the count dropped by 14,000, to about 81,000. And just a few days later, it dropped to about 60,000. (The media has tried to mislead you regarding why, but more on that later.)

But it's what happened state-by-state that really revealed just how malnourished the model is. Some states are seeing their projected death totals drop by half or more with each update. Dates for the curves to peak and flatten have also changed markedly. The projected date for the curve to flatten in Virginia was originally July 15; in the latest update, it's June 1. Nearly all states are now projected to peak and flatten earlier than in the original release, and nearly all within the April-May timeframe. However, a few states inexplicably have seen their projected deaths and time to peak/flatten increase. And there's little rhyme or reason as to why.

Or is there? The states that appear to be displaying the most counter-intuitive movements in the projections tend to be lower in population than the ones whose trends appear logical. Thus the model appears to be suffering from the flip side of the law of large numbers: when the counts are small, a handful of new data points can swing the projections wildly.
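To make the small-numbers problem concrete, here is a minimal sketch. It assumes, purely for illustration, that a count of deaths behaves like a Poisson random variable (a textbook simplification, not how the IHME model actually works). Under that assumption, a count with expected value mu has a standard deviation of sqrt(mu), so its relative noise is 1/sqrt(mu). The 12 deaths are the four-day Kansas increment discussed below; the 1,200 is a hypothetical larger-state figure used only for comparison.

```python
import math


def poisson_relative_noise(expected_count: float) -> float:
    """Relative standard deviation of a Poisson count: sqrt(mu) / mu = 1 / sqrt(mu)."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)


# 12 deaths: the Kansas four-day increment between the first two model releases.
# 1,200 deaths: a hypothetical larger-state increment, for comparison only.
for deaths in (12, 1200):
    print(f"{deaths:>5} deaths -> +/- {poisson_relative_noise(deaths):.0%} noise")
```

On those inputs, a count a hundred times larger is about ten times less noisy, roughly 29% versus 3%, which is why a dozen deaths can move a projection so much.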

Let's look at my home state of Kansas, which ranks 35th among all 50 states in terms of its population (about 2.9 million people, or about 15% of the population of the New York City Metropolitan Statistical Area, spread out over 82,277 square miles). The original release of the model projected 640 deaths in Kansas by Aug. 4 (the end date for the model's projections, by which time all curves have long since completely flattened). That's about 220 deaths per 1 million (1M) residents. The curve was projected to peak on May 3 and flatten on June 10.

At the time, Kansas had 10 reported deaths, so the model was projecting another 630 reported deaths by the time the curve was projected to flatten on June 10. The first death in Kansas was recorded on March 12, so by the first model release, the state had experienced an average of 0.5 deaths per day. The model projected that average to increase to about 9 per day - an 18-fold increase - through June 10. (Through April 9, the highest number of daily deaths in Kansas has been 5, so that average would be pretty hard to attain.)

The next update projected 265 deaths in Kansas. That's a reduction of 375 projected deaths, or almost 60%, from the original projection. The new date for the curve to peak was April 25 - 8 days earlier than originally projected - and the new date for the curve to flatten was May 23 - 18 days earlier than originally projected. (Some states, like Virginia, saw those dates move up by more than a month.) So now, the model was projecting an average of about 5 deaths per day - half the original projection - in a state that to that point had still seen an average of less than one per day.

Those dramatic changes were the result of the model being fed four more days of data. During those four days, Kansas experienced just 264 new reported cases, and 12 new reported deaths. That's a big change in the projections produced by a very small number of data points.
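The Kansas arithmetic above can be reproduced with a few lines of Python. This sketch uses only the figures quoted in this post (the 2.9 million population, the release and curve dates, and the death counts). One input is an assumption: deaths to date at the second release are taken as the original 10 plus the 12 reported over the following four days.

```python
from datetime import date

KANSAS_POP_MILLIONS = 2.9  # Kansas population, in millions, as stated above

# First release (April 1)
first_death = date(2020, 3, 12)
release_1 = date(2020, 4, 1)
peak_1, flatten_1 = date(2020, 5, 3), date(2020, 6, 10)
projected_1 = 640
deaths_at_release_1 = 10

# Second release (April 5)
release_2 = date(2020, 4, 5)
peak_2, flatten_2 = date(2020, 4, 25), date(2020, 5, 23)
projected_2 = 265
deaths_at_release_2 = deaths_at_release_1 + 12  # assumption: 10 plus the 12 reported since

per_million = projected_1 / KANSAS_POP_MILLIONS
pace_to_date = deaths_at_release_1 / (release_1 - first_death).days
pace_projected_1 = (projected_1 - deaths_at_release_1) / (flatten_1 - release_1).days
pace_projected_2 = (projected_2 - deaths_at_release_2) / (flatten_2 - release_2).days
drop_pct = (projected_1 - projected_2) / projected_1 * 100

print(f"first release: {per_million:.0f} deaths per 1M residents")
print(f"pace to date {pace_to_date:.1f}/day vs. projected {pace_projected_1:.1f}/day "
      f"({pace_projected_1 / pace_to_date:.0f}x)")
print(f"second release: {drop_pct:.0f}% lower, peak {(peak_1 - peak_2).days} days earlier, "
      f"flatten {(flatten_1 - flatten_2).days} days earlier, "
      f"projected pace {pace_projected_2:.1f}/day")
```

On these inputs, the first release implied about 9 deaths per day going forward versus 0.5 per day to date, an 18-fold jump; the second release cut the projection by about 59% and implied about 5 deaths per day.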

Are you beginning to feel the model's hunger pangs?

In the next release, the model forecast an increase in total deaths in Kansas, to 299 from 265, an increase of about 13%. And it projected that the curves would peak and flatten one day later than in the previous update. Yet Kansas was still averaging just 1.26 deaths per day. This time, the change was driven by 12 new deaths and fewer than 300 new cases. Again, not a lot of data to produce reliable results. Those projections appear to have been skewed by the fact that, on the last day of data that was fed into the model at that point, Kansas saw a peak in daily cases at 123 (.004% of the state's population) and 5 deaths.

Wyoming is also interesting: There have been zero deaths to date in the state, out of 230 cases at this writing (about the same number of cases per 1M as Kansas or Iowa), but the model projected 67 deaths between the April 8 update and the May 22 flattening of the curve. On what basis? As Wyoming's governor said, "We've been social distancing for the entire 130 years we've been a state."

**Note: the model has been updated again as of April 10. The latest projections are as nonsensical as the earlier ones. I'm not even going to bother updating my analysis anymore. Suffice it to say that projected deaths in Kansas are now back up to 426, an increase of 127, or 42%. Why? Because there were 8 deaths on April 9.**
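Laid side by side, the release-to-release swings are easier to see. A short sketch using the projected Kansas totals quoted in this post (640, 265, 299 and 426) that computes the change between consecutive releases:

```python
# Projected total Kansas deaths in each model release discussed in this post.
releases = [
    ("April 1", 640),
    ("April 5", 265),
    ("April 8", 299),
    ("April 10", 426),
]


def revisions(releases):
    """Return (label, absolute change, percent change) between consecutive releases."""
    result = []
    for (prev_label, prev), (label, curr) in zip(releases, releases[1:]):
        change = curr - prev
        result.append((f"{prev_label} -> {label}", change, change / prev * 100))
    return result


for label, change, pct in revisions(releases):
    print(f"{label}: {change:+d} deaths ({pct:+.1f}%)")
```

This prints changes of -375 (-58.6%), +34 (+12.8%) and +127 (+42.5%), each produced by only a few days of additional data.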

More broadly, here's the first thing that's wrong with the IHME model, and any other model making projections about COVID-19 cases or deaths: the data is simply insufficient in quantity to produce statistically significant results.

Let's put it into perspective. To date, there have been about 490,000 cases of the virus in the U.S., and about 18,000 deaths. That's an incredibly small number relative to the U.S. population. In the last flu season, the CDC reported more than 35 million cases, nearly 500,000 hospitalizations, and over 34,000 deaths. So last flu season, about as many people were hospitalized with the flu as have been reported to have COVID-19 (and remember, 96% of cases are mild), and nearly twice as many people died of the flu as have been reported to have died from COVID-19. The flu data is a far richer dataset to feed into a model.

The average number of auto accidents in the U.S. each year is about six million. Average injuries are about three million, and average deaths are about 33,000. Again, a far richer dataset that would produce a more robust model.

Remember my discussion of mortgage prepayment models in the last post? In a year when rates are falling and more people are refinancing their mortgages, millions of mortgage loans will prepay, resulting in billions of dollars of prepayments. In a year. And during the Great Recession, billions of dollars in mortgages defaulted, resulting in full prepayment. So the mortgage prepayment models are far more robust than a model predicting auto accidents or influenza would be. And all of those models are far more robust than any of the COVID models at this point in time.

But the problem with the IHME and other COVID models runs far, far deeper. It is not an insufficient amount of data alone that is producing such inaccurate results and wild swings in results based on tiny amounts of additional data.

The quality of the data is horrible.

When a mortgage loan prepays, we know for certain that it has prepaid. And we know exactly why, whether it is from refinancing, default, death, or any of the other factors that drive prepayments.

When there's an auto accident, we know it (unless it isn't reported and nobody gets hurt or dies). We know exactly how many people are injured in auto accidents (again, unless the injury is very minor and it's not reported), and we know how many people die from auto accidents. (We also don't state cause of death as "auto accident" if the decedent had a fatal heart attack that then resulted in the car veering off the road.)

We know less about the flu, because some people probably get mild cases and don't go to the doctor, and thus are not tested or diagnosed. Doctors can't report data they don't have. But there are a lot more data points, so the data and the modeling are more reliable.

It's even worse with COVID. What are the data points needed to feed the model? Number of cases, number of deaths, and number of recoveries (daily and total for each).

We have no earthly clue how many cases there have been, or how many active cases there are, for two reasons. One, since 96% of cases are mild, there are probably a huge number of people who have had COVID-19 during this cold and flu season who didn't know it. Their symptoms were mild. They might have thought they had a cold or the flu. I know a number of people who believe they may have had it in December or January, though I am always cautious about self-diagnosis, especially with something like this.

My not at all curmudgeonly wife and I went on a cruise in late January. We flew to Tampa, spent the night in a hotel, went out for dinner, then went to the cruise port to board the ship. We cruised to several Western Caribbean ports: Belize, Cozumel, Costa Maya and Roatan. We got off the ship at each port. We ate lunch in Costa Maya and Belize, and had a day pass to a resort on Cozumel, where there were numerous other vacationers from all over the world. We shopped. We touched things. We washed our hands, as we always do, and used hand sanitizer when entering the ship's dining room. Of course, we saw the usual random foul louts who walked out of a bathroom stall without washing their hands, didn't use tongs in the buffet line, etc.

We disembarked and flew home from Tampa on Feb. 1. I flew to San Francisco for business on Feb. 4. The next day, I developed a dry cough. I may have had a fever; I didn't check it until after I returned home. On Feb. 10 I went to the doctor, and the PA I saw said that it looked like I had "this virus we've been seeing going around." She asked about all the symptoms that are now associated with COVID-19, including shortness of breath. I didn't have all of them, and I also tested positive for Influenza A (despite getting the vaccine last October). On that basis, I do not believe that I had COVID. But I may have.

So there are people who likely had it that we don't know about, but even if they self-reported, a significant number of them would probably be wrong.

The other reason we don't know the number of cases or active cases is that reported cases may be overstated due to cases being reported on the basis of a diagnosis of symptoms, without a positive test. Since there aren't enough tests yet for every suspected case, health care providers are reserving the tests for those with the most severe symptoms. (You may recall my story from the last post about my friend who was diagnosed based on symptoms, and sent home to self-quarantine.) Some of those diagnosed but untested cases may have been flu, a cold or some other virus.

So some cases are probably being over-reported, and vast numbers are under-reported. We won't really know the number of cases until every man, woman and child in this country has had the antibody test to see whether they've ever had the virus. That won't happen this year. So if and when there is a second season, we won't know if the people who've had it contracted it this year or next. We will never have reliable case-count data for this season.

We do know that a group of researchers from MIT has been testing sewage from 10 U.S. cities for traces of the virus. And the amounts they've found suggest far more cases than have been reported. Far more, as in about 258 times as many. That could mean that more than a third of the U.S. population has had the virus, which would further indicate a recovery rate of roughly 99.99%, making the mortality rate a fraction of what's been estimated.
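The arithmetic behind that paragraph is a chain of multiplications and divisions. A sketch, with two loud assumptions: that the 258x multiplier from limited sewage sampling applies nationwide, and that the U.S. population is roughly 330 million (an approximate figure supplied here, not one from the post). It also divides all reported deaths by all implied infections, ignoring the lag between infection and death.

```python
US_POPULATION = 330_000_000  # assumption: approximate 2020 U.S. population, not from the post

reported_cases = 490_000    # U.S. reported cases, per the post
reported_deaths = 18_000    # U.S. reported deaths, per the post
sewage_multiplier = 258     # undercount suggested by the sewage testing

implied_infections = reported_cases * sewage_multiplier
share_infected = implied_infections / US_POPULATION
# Crude: treats every reported death as coming from this pool of infections
# and ignores the lag between infection and death.
implied_fatality = reported_deaths / implied_infections

print(f"implied infections: {implied_infections:,}")
print(f"share of U.S. population: {share_infected:.0%}")
print(f"implied fatality rate: {implied_fatality:.3%}, recovery rate: {1 - implied_fatality:.3%}")
```

On these inputs, implied infections come to about 126 million, or roughly 38% of the population, and the implied fatality rate is about 0.014%, a recovery rate of about 99.986%.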

What about deaths? Well, Dr. Birx admitted that they are erring on the side of citing COVID-19 as primary cause of death (PCOD), even when there are one or more co-morbidities, and regardless of the patient's age. We know that 83% of coronavirus deaths in Italy are among patients over the age of 70, and the majority of them had three co-morbidity factors. Three.

If you're over 70 and have three co-morbidity factors already, your chances of surviving anything - COVID-19, the flu, or a common cold - are dicey, I would think.

So if somebody has heart disease, and they die of a heart attack, but test positive for COVID-19, PCOD is COVID-19. Heart disease may or may not be listed as an underlying cause of death (UCOD).

That first death in Kansas on March 12? It was a gentleman in his 70s living in a long-term care facility who had a "heart condition."

The COVID diagnosis on that patient was made post-mortem. The patient had already died from the heart condition, then COVID was diagnosed and listed as PCOD.

The deaths reported by generally reliable sites such as worldometers.info, and reported in the news media and in public health briefings, are far higher than the number of deaths reported on the CDC's own website - about four times as high. According to the CDC site, the reason is that the more widely reported death count includes cases that are "presumptive positives," meaning that a lab has recorded a positive test, but the completed death certificate has not yet been submitted to the CDC's National Center for Health Statistics (NCHS) and processed for reporting purposes. In other words, there's a lag due to government bureaucracy.

The CDC's statistics by age clearly show that the risk of death is infinitesimally small for anyone under the age of 55 - and I am rather generously defining "infinitesimally" as less than .001% of the population of Americans under the age of 55. If you're 55-65, it's still only .0012%. Even for those 85 and older, the death rate is .0176%. Those numbers will, of course, increase, as there will be more deaths. But even if we account for the fact that the more widely reported deaths are four times what the CDC reports as "confirmed," the death rate for those 85 and older is only about .07%. The overall mortality rate for people in that age group, from all causes, is over 38%. Suffice it to say that, if I'm lucky enough to celebrate my 85th birthday, I won't be buying unripe bananas.

Here's the link to the CDC site I'm referring to, for your own perusal. Note in particular the footnotes and the table broken down by age, and pay special attention to the Technical Notes at the bottom related to cause of death reporting: https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm.

Now, back to the model. What if we hit the 60,000 or so deaths it currently projects? That would put the death rate for those 85 and older at .26%, if the distribution by age holds statistically. If we hit the 95,000 deaths the model originally projected? It would be .41%. To even get to a 1% rate, there would have to be more than 231,000 total deaths in the U.S., a 14-fold increase.
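The scaling in the previous paragraph is linear: if the age distribution of deaths holds, the rate for those 85 and older grows in proportion to total deaths. A sketch follows. The .0176% rate and the 4x adjustment come from the post; the baseline of roughly 16,300 total U.S. deaths is an assumption, backed out from the post's own .26%, .41% and 231,000 figures, because the post doesn't state which death total the CDC rates correspond to.

```python
CDC_RATE_85_PLUS = 0.0176   # percent, CDC-confirmed deaths, per the post
UNDERCOUNT_FACTOR = 4       # widely reported deaths run ~4x CDC-confirmed deaths
BASELINE_DEATHS = 16_300    # assumption: backed out from the post's figures, not stated in it

adjusted_rate = CDC_RATE_85_PLUS * UNDERCOUNT_FACTOR  # about .07 percent


def rate_85_plus(total_deaths: float) -> float:
    """Death rate (percent) for Americans 85 and older, scaled linearly with total
    U.S. deaths on the assumption that the age distribution of deaths holds."""
    return adjusted_rate * total_deaths / BASELINE_DEATHS


def deaths_for_rate(target_rate: float) -> float:
    """Total U.S. deaths at which the 85-and-older rate would reach target_rate (percent)."""
    return target_rate / adjusted_rate * BASELINE_DEATHS


for deaths in (60_000, 95_000):
    print(f"{deaths:,} deaths -> {rate_85_plus(deaths):.2f}% of Americans 85 and older")

needed = deaths_for_rate(1.0)
print(f"a 1% rate needs about {needed:,.0f} deaths, {needed / BASELINE_DEATHS:.1f}x the baseline")
```

With that baseline, 60,000 deaths gives about .26%, 95,000 gives about .41%, and a 1% rate would take roughly 231,500 deaths, about 14 times the baseline.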

The final data point we need for accurate modeling is recoveries. And the reporting there is laughable. I've written previously about the lag in confirming a recovery (at least 14 days) and confirming a death (as little as two days, typically no more than five). That's not the issue.

Recoveries are, by and large, simply not being reported. The worldometers.info site originally reported them by state for the U.S., but after several days, it took that column out of its table altogether. (I don't attribute this to some conspiracy theory. I'll explain below.)

Recoveries are, however, reported at the country level. In most countries, the ratio of recoveries to deaths is steadily climbing. In Italy and Spain, it was up from April 8 to April 9 by 0.1 (to 1.6 and 3.4, respectively). In Germany, it was up from 16.5 to 18.9. The most reliable "mature" data is probably from South Korea, where the outbreak hit early and now appears to be largely contained. There, the ratio of recoveries to deaths is 34.2. In the U.S., the ratio isn't advancing at all. The UK updates cases and deaths daily, but hasn't updated recoveries for two weeks.

This isn't because there aren't recoveries in the U.S., the UK and the individual states. It's because they aren't being reported.

Why aren't recoveries being reported? Again, I don't see a conspiracy here. I see a lack of data, and a lack of reliability of any data there is. Since we have no idea how many cases are out there, any reported recoveries will be vastly understated. All of those people who likely had it in December and January and didn't know it have recovered, but they'll never be counted until we've all had that antibody test.

Let's recap: we have no idea how many cases there have been, or how many active cases there are now. We have no idea how many deaths are directly and solely attributable to COVID-19. And we have no idea how many people have recovered. If we ever have reliable case and death data, recoveries are easy: cases minus deaths, minus any cases that are still active. That won't happen soon.
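The bookkeeping identity is simple: every case is either still active, ended in death, or ended in recovery, so recoveries are cases minus deaths minus active cases, which reduces to cases minus deaths once no cases remain active. A sketch with hypothetical numbers (the class and field names are illustrative, not from any real data source), which also computes the recovery-to-death ratio used in the country comparisons above:

```python
from dataclasses import dataclass


@dataclass
class CaseCounts:
    """Hypothetical cumulative counts for one country or state."""
    cases: int
    deaths: int
    active: int

    def recoveries(self) -> int:
        # Every case is active, fatal, or recovered; recoveries are the remainder.
        return self.cases - self.deaths - self.active

    def recovery_to_death_ratio(self) -> float:
        if self.deaths == 0:
            raise ValueError("ratio is undefined with zero deaths")
        return self.recoveries() / self.deaths


# Hypothetical numbers, for illustration only.
midstream = CaseCounts(cases=10_000, deaths=200, active=3_000)
resolved = CaseCounts(cases=10_000, deaths=200, active=0)

print(midstream.recoveries(), midstream.recovery_to_death_ratio())  # 6800 34.0
print(resolved.recoveries(), resolved.recovery_to_death_ratio())    # 9800 49.0
```

The second line shows that the simple "cases minus deaths" shortcut only holds once no cases remain active.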

So the model is starving from a lack of sufficient data, and it's malnourished from poor quality data. Imagine trying to survive on one moldy Ho-Ho a day, and you understand the model's extreme weakness.

A final point about the model: many media outlets and people in general are pointing to the sharp decline in projected deaths as evidence that social distancing is working. To be polite, those people are being stupid. They believe what they hear and read, without going to the source. They haven't looked at the model's website. If they had, they would see, at the top of the page, in large letters, these words:

COVID-19 projections assuming full social distancing through May 2020


Assuming full social distancing through May. No re-opening of businesses and churches. No eating out. Limits on the number of shoppers in a store at any time. Grocery store aisles marked one-way. One customer per cart. "Non-essential" items like electronics being removed from store shelves. People ticketed for being too close to each other. For another seven weeks.

(For what it's worth, I don't see that happening. Nor, if it doesn't happen, do I see the death toll rising to the horror movie levels that the models were projecting "if we do nothing." I expect some reasonable happy medium that carries no more risk than the risk of the flu. We'll see.)

So it's clear that the decline in the model's projected deaths is not at all related to some notion that social distancing is "working." (I'm sure it is - if we followed these guidelines every flu season, we'd hardly see any flu cases or deaths either. But a hell of a lot of Americans aren't working, which also bears a cost.) In any event, the decline in projected deaths is solely attributable to new data, which is still insufficient and not reliable enough to produce realistic projections. It has nothing to do with social distancing's effectiveness. So stop kidding yourselves regarding that myth.

**Note: in the April 10 model update, projected total U.S. deaths went up by about 1,000, further debunking the myth that the projections are influenced by the "success" of social distancing.**

The upshot of all of this is that any decisions made on the basis of these models are foolhardy. Much of the American economy has been destroyed, hopefully not permanently. The stock market has been moving higher this week. Other countries are beginning to ease mitigation measures, including allowing some "non-essential" businesses to re-open, with some social distancing requirements. I've even seen a couple of local restaurant locations that initially closed, re-open for curbside and delivery service. As Red said near the end of The Shawshank Redemption, "I hope."

However, here are some real numbers for you: 30,000 U.S. restaurants have closed permanently. That number is projected to hit 110,000 by the end of May if this continues. That's about 10% of all U.S. restaurants - a much higher "mortality rate" than COVID-19's. U.S. restaurants employed more than 15 million people before this shutdown. And 70% of U.S. restaurants are single-unit operations. I won't even mention the downstream effects on farmers. Or the similar decimation of the hotel, leisure and airline industries. And the government can't spend our way out of this, because it's our money that is ultimately being spent.

The fact is that ALL businesses are essential to the people who own them and work for them.

A couple of final points. The panic that has, in part, led us to where we are today was fueled in no small part by the news media's reprehensibly sensationalist and inaccurate reporting. Sadly, they will never be held to account. They never are.

But another group used a different medium - social media - to foment panic. They posted wildly inaccurate articles based on bad math. When called on it, they tried to sanctimoniously scold those who tried to be a voice of calm reason. That group can, and should, be held to account, by their friends. They and the news media should apologize to everyone who lost a job, experienced investment losses, and had to go to several different stores just to find items that were heretofore commonly available. The overreaction was born of panic that led local governments - cities, counties, and states, not the federal government - to issue mandatory shut-down orders.

I'm all for saving lives, but there may have been better ways to go about this. There will be plenty of time for recriminations, Monday-morning quarterbacking, and situation analysis later. But perhaps the concept behind a familiar phrase should be broadened, and applied to local government, the news media, and the panic-mongers:

First, do no harm.
