Friday, April 3, 2020

They're Trying to Scare You. This Time, Let Them.

Okay, calm down - I'm not talking about the media, and I'm not talking about economic or financial data. I'll explain a little later in this post who I'm talking about, how they're trying to scare you, and why you should let them.

But first, let's introduce the topic of this post: models.

Not runway or swimsuit models. (I know, you're disappointed.) But mathematical models. Don't worry, I'm not going to get technical. And - long post alert. So if you know how mathematical models work, or don't care, scroll down to the dashed line and read from there.

Now first, a caveat: I am not an epidemiologist or an infectious disease specialist. I'm not a doctor, and I don't play one on TV.

However, I do know mathematical models, and I know data. I look at numbers in a spreadsheet and see them graphically. Data is my drug. I've never built a pandemic model, but I've built econometric models, and I've built stochastic mortgage prepayment models. So I know a bit about modeling data. (This explains why I'm not popular.)

Here's a truth: all models are wrong. Let's consider a model that predicts the price of a bond. Not to bore you, but such a model will be based on the projected interest and principal cash flows of that bond, discounted at current and projected interest rates. The theoretical, or modeled, price of the bond is the sum of the discounted principal and interest cash flows. Don't worry if you don't understand that; it's not important for this discussion.
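
For readers who do want to see the arithmetic, here is a minimal sketch of that sum of discounted cash flows. It is an illustration only: the function name, the annual payments, and the single flat discount rate are simplifying assumptions (a real pricing model would discount at current and projected rates), and the example bond is hypothetical.

```python
def bond_price(face, coupon_rate, discount_rate, years):
    """Modeled price = discounted interest payments + discounted principal.

    Assumes annual coupons and one flat discount rate (a simplification).
    """
    coupon = face * coupon_rate
    # Present value of each annual interest payment
    pv_interest = sum(coupon / (1 + discount_rate) ** t
                      for t in range(1, years + 1))
    # Present value of the principal repaid at maturity
    pv_principal = face / (1 + discount_rate) ** years
    return pv_interest + pv_principal


# Hypothetical 10-year bond paying a 5% coupon on $1,000 of face value
par_price = bond_price(face=1000, coupon_rate=0.05, discount_rate=0.05, years=10)
discount_price = bond_price(face=1000, coupon_rate=0.05, discount_rate=0.08, years=10)

print(f"Discounted at 5%: {par_price:,.2f}")
print(f"Discounted at 8%: {discount_price:,.2f}")
```

When the discount rate equals the coupon rate, the modeled price comes out at face value; when rates are higher, the same cash flows are worth less today.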

When clients used to ask me about the modeled price of a bond, I would reply, "No model ever bought a bond." The real market value of a bond is what the next buyer will pay you for it. Thus the modeled price isn't the market price. So what value do models bring, and how are they built?

The value proposition of models is that they may give us some idea of what is likely to come. The key words are italicized. How well do they do that?

The first thing to consider is that data informs models. Let me say that again: data informs models. By the same token, a model is only as good as the data that informs it.

Let's look at mortgage prepayment models as an example, because I know them intimately (don't worry, I'm not going to bore you with a lot of math). Bear with me through this: it will help you understand how the COVID models work.

Mortgage prepayment models forecast how likely a group of mortgage loans is to pay off early, based on the mortgage interest rate, prevailing interest rates, how long the mortgage has been outstanding, geography, and other factors. They do this using historical data that captures those variables. The data informs the model.

This is important to mortgage lenders and investors in pools of mortgages in forecasting cash flows. If rates fall, more people will refinance (i.e., prepay the entire mortgage), and I'll get my money back sooner than expected - then I'll have to re-lend or re-invest it at now-lower rates.

Back in 1990, we thought the mortgage prepayment models were pretty darn reliable. Then, in 1993, the Fed cut interest rates to what was then an all-time low, and in the following year it raised rates by about 3 percentage points, effectively doubling them. We called that a "whipsaw." As a result, a lot of people refinanced their mortgages at record-low rates in 1993, so prepayments exceeded what the models - informed by historical data - projected. (Prepayments happen for a number of reasons, but the biggest driver is refinancing when rates have fallen.)

Then, when rates rose sharply the next year, people prepaid their mortgages more slowly than the models - again informed by historical data - projected. So cash flows slowed as people just paid the minimum monthly mortgage payment, instead of further reducing principal. Why should I pay early on a 5% (at that time) mortgage, when the prevailing rate is 8%, especially when I get to write off my mortgage interest on my taxes, so that my after-tax mortgage rate is even less? Now, lenders and investors had to wait longer to receive the amount of prepaid principal they expected, when they'd rather have had it right away to reinvest at higher rates.

Thus the prepayment models totally missed the mark in the 1993-94 whipsaw. The data from previous periods that had been incorporated into the models to inform them didn't capture how readily people would refinance a mortgage if rates fell by a certain amount. That's because people used to want more of an incentive to refinance than they want now. And that's because the process of getting a mortgage is more streamlined today, and the fees are lower. Thus mortgages that we thought wouldn't prepay in 1993 did refinance, and mortgages that we thought would prepay in 1994 didn't. Investors took a beating by betting wrong based on the models, which were based on historical data.

The silver lining is that after that whipsaw, all of the actual prepayment data from 1993-94 was fed into the models, making them more robust. Data informs models. With more data capturing more unique circumstances, including changes in borrower behavior, the model is made more robust.

So, in 2005, we thought the mortgage prepayment models were really good. But we'd never seen subprime mortgage loans before. Housing bubbles had always been local; by 2007 they were widespread. Credit ratings on packaged mortgage loan pools were being gamed.

And the housing market came crashing down. Mortgage defaults reached unprecedented levels. Borrower behavior was different, too. My Dad, a Depression kid and a WWII vet, taught me that the last thing you ever miss a payment on is your mortgage, because that's your family's home. By 2008, the family home was seen as an investment, and if the value of that investment had fallen to less than the mortgage balance, you initiated a "strategic default." In other words, you walked away from the property and defaulted, even if that meant you wrecked your credit in the process. Unwise, yes. But it became common.

Those shifts destroyed the efficacy of the prepayment models. Drastic changes in borrower behavior, interest rates, mortgage structures, default protections, and other factors resulted in the actual prepayment experience being vastly different from what the models projected. A lot of people just walked away from their homes, as described above. A lot of other people had to default because they lost their jobs. (A default and subsequent foreclosure counts as a prepayment, because the loan goes away at that point - due to charge-off by the lender, not payoff by the borrower.)

Once again, the models, informed by historical data, did not capture how high prepayments would go on a mortgage with a given interest rate. But as before, after the carnage, all of that data was plugged into the prepayment models to inform them, and thus today they are more robust than ever before (though they're still not prepared for the next thing we haven't seen yet). We have more data, covering a far wider range of scenarios. Data informs models.

-----------------------------------------------------------------------------------------------------------

Okay, so what does all of this have to do with the current situation? Well, you've probably heard Drs. Fauci and Birx refer to a coronavirus model that predicts when the curve will peak and flatten, and how many deaths are likely. That model was developed by researchers at the University of Washington's Institute for Health Metrics and Evaluation (IHME), which is funded in large part by the Bill & Melinda Gates Foundation (thank you, Bill).

You may have also heard that, in a matter of just a few days, total U.S. deaths projected by the model jumped by about 10,000. You may have wondered why the big change - is the virus more deadly than previously thought? More contagious? Are containment measures failing?

None of those are reasons for the change in the model. The reason is that more data was fed into the model, and that changed the model's forecasts. What kind of data does the model incorporate? We'll get to that in a minute.

Since the COVID-19 outbreak got serious, I've been following the actual data (cases, deaths, recoveries) daily. I track it state by state, and for a number of countries (China, S. Korea, Iran, several European countries, and the U.S.). I'm looking at cases and deaths per capita, because denominators matter. I'm looking at maturity of the outbreak in those locations (time since first death, because time since first reported case isn't always available). I'm looking at the ratio of recoveries to deaths, which is encouraging. It improves with maturity. The problem is that it's clear from the data that in some locations (including the U.S. and the U.K.) recoveries are under-reported, and they aren't reported at all for the individual states.

Sidebar: lest you think this is morbid, please know that, as with the unemployment numbers, I recognize that each data point is a human life, someone who is sick, or a loved one lost. In fact, that's why I'm writing this post. Read on.

Also, since the IHME model data was made public, I have been tracking that as well, and comparing it to the actual data. I update it every few days, because it changes. As noted above, projected total deaths jumped between the first time the model's projection was mentioned and just a few days later. Since then, the projection has come down by a few hundred. I'm tracking the model's projections for total deaths, when the curve peaks, when it flattens (no more new deaths, or several days with just one per day), per capita data (you know why), days to curve peak, days to curve flattening, and actual to projected deaths. I'm looking at this for the U.S. and state by state. And for the days to curve peak or flattening, I'm looking at averages, minimums, and maximums.

Let's first get to the question of what kind of data the model uses, then we'll look at some examples from the model. It doesn't use some scientific chemical formula related to a possible treatment or vaccine. It just uses math - complex math, but math. The variables it considers are number of cases to date, number of deaths to date, whether there are mandated containment measures, and how stringent they are. (In states that have strong stay-at-home orders, the curve is expected to peak and flatten sooner. That's what Drs. Fauci and Birx have been telling us.) At the state level it is also influenced by the population of the individual state. Finally, the model appears to assume that this virus is seasonal.

Some quick examples. For the U.S., using April 1 model data and an April 2 date, projected days to the curve peaking are 13 - therefore the projected peak for the country as a whole is April 15. The minimum is 8 days, in NY and NJ, where the outbreak and containment measures started early. The maximum is 56 days, in MO, where the governor has yet to issue a stay-at-home order. In my home state of KS, the curve peaks in 30 days. And the curve flattens for the U.S. as a whole in 60 days (June 1). The earliest state to flatten is Delaware, in 27 days. The latest are MO and VA, 104 days (this may be why the VA governor recently issued a stay-at-home order that doesn't expire until June 10). In KS, it's 64 days (June 5).
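
The calendar dates above are just the April 2 starting date plus the projected number of days. Here is a small sketch of that conversion, using only the day counts quoted in the paragraph above; the dictionary labels are made up for the illustration.

```python
from datetime import date, timedelta

as_of = date(2020, 4, 2)  # the "April 2 date" used in the text

# Days-to-event figures quoted in the text (April 1 model data)
days_to = {
    "US peak": 13,
    "NY/NJ peak": 8,
    "MO peak": 56,
    "KS peak": 30,
    "US flatten": 60,
    "DE flatten": 27,
    "MO/VA flatten": 104,
    "KS flatten": 64,
}

# Add each day count to the starting date to get a calendar date
projected = {label: as_of + timedelta(days=n) for label, n in days_to.items()}

for label, when in projected.items():
    print(f"{label}: {when:%B %d}")
```

Running it reproduces the dates in the text (April 15 for the U.S. peak, June 1 and June 5 for the U.S. and KS flattening) and fills in the ones not spelled out.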

So, how and why have the model's projections been changing? I first started analyzing the model's projections on Monday, March 30. I updated the data on Wednesday, April 1. In those two days, the projected total U.S. deaths jumped by about 10,000. In some states they doubled. In others, they fell by half. Also, in some states, the time to the curve peaking or flattening extended by as much as two weeks. In other states, the time shortened by as much as two weeks.

Why the big changes? Data. Data informs models. The model was initially way off, because the data was sparse. As more data comes in - more cases, more deaths, more mandated containment measures - the model's predictive value will increase. So it will continue to change. My guess is that the time to peak/flattening will remain more or less constant for the U.S. as a whole - about April 15 to peak, and late May to flatten. This is consistent with the season for influenza and for other coronaviruses, such as those that cause the common cold.
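
To see why sparse data makes early projections jumpy, here is a toy sketch. It is emphatically not the IHME model: the "true" outbreak curve is a made-up logistic curve, the forecaster is a naive straight-line fit to the logarithm of the counts, and every number is hypothetical. The only point is that the same method gives very different answers depending on how many days of data it has seen.

```python
import math


def true_curve(day, cap=1000.0, rate=0.3, midpoint=20.0):
    """Hypothetical cumulative count that grows fast, then levels off."""
    return cap / (1.0 + math.exp(-rate * (day - midpoint)))


def fit_log_linear(counts):
    """Least-squares line through (day, ln count) for day = 0..len(counts)-1."""
    n = len(counts)
    days = range(n)
    logs = [math.log(c) for c in counts]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    slope = (sum((d - day_mean) * (l - log_mean) for d, l in zip(days, logs))
             / sum((d - day_mean) ** 2 for d in days))
    intercept = log_mean - slope * day_mean
    return intercept, slope


def project(counts, target_day):
    """Extrapolate the fitted growth trend out to target_day."""
    intercept, slope = fit_log_linear(counts)
    return math.exp(intercept + slope * target_day)


observed = [true_curve(day) for day in range(28)]

# Re-run the same forecast as more days of data become available
projections = {n: project(observed[:n], target_day=30) for n in (7, 14, 21, 28)}

for n, value in projections.items():
    print(f"{n:2d} days of data -> projected day-30 total: {value:,.0f}")
print(f"Actual day-30 total on the made-up curve: {true_curve(30):,.0f}")
```

In this toy setup the earliest forecast overshoots the most, and each batch of new data pulls the projection closer to what actually happens. Real projections can move in either direction, as the state-level numbers above did.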

However, I expect the projected total deaths will come down, and I don't think the actual data will reach the 93,000 or so currently projected. Before I explain why, I'll repeat my disclaimer: I'm not an epidemiologist, an infectious disease specialist, or a doctor. I'm just a data guy.

Why are we seeing big increases in the number of cases? More testing. A lot of people who were sick this winter probably had COVID-19 and didn't know it, because their symptoms were mild. Francis Suarez, the mayor of Miami, tested positive, and posted daily videos on his Twitter account describing his symptoms, which were mild. The CDC reports that 95% of cases are mild. In Iceland, which has tested 5% of its total population, 50% of cases had no symptoms at all. I know someone who presented with symptoms, was diagnosed without a test, and sent home to self-quarantine. Maybe he had COVID, maybe he had something else with similar mild symptoms. We're not going to know this until we have a test to see if people have ever had it - which is coming.

As the U.S. is now testing more than 100,000 people every day, we're going to see more cases, so the case total will go up. The IHME model doesn't project total cases, at least not that I've been able to find, but it probably incorporates them. So why are projected deaths going up?

Let's go back to my friend who presented with symptoms. Why did they diagnose him without testing him? Because his symptoms were mild, he's under 60, and in good health. And even though we're testing very large numbers of people daily, there still aren't enough tests to test everyone who has symptoms. The U.S. has tested 1.3 million people, but that's less than half a percent of the population (because people are failing to follow the guidelines, not because the government response is failing). So they're reserving the tests for the most at-risk population - the elderly, those with complicating health issues, and those who present with severe symptoms.

The sad reality is that a larger proportion of those people will die after they're diagnosed, usually due to those complicating health issues, but brought on by the virus. Remember the CDC said that 95% of cases are mild? Of closed cases, 80% have recovered.
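
Here is a rough sketch of how reserving tests for the sickest patients can inflate the death rate seen in the reported numbers. Only the 95% mild share comes from the figure cited above; the fatality rates, the testing rates, the population size, and the function itself are hypothetical values chosen purely for illustration, and the sketch assumes deaths are counted only among confirmed cases.

```python
def fatality_rates(infected, mild_share, mild_fatality, severe_fatality,
                   mild_test_rate, severe_test_rate):
    """Return (true rate among all infected, rate seen among confirmed cases)."""
    mild = infected * mild_share
    severe = infected - mild
    deaths_mild = mild * mild_fatality
    deaths_severe = severe * severe_fatality

    # Only tested people become confirmed cases
    confirmed = mild * mild_test_rate + severe * severe_test_rate
    # Assumption: a death is recorded only if the person was a confirmed case
    confirmed_deaths = (deaths_mild * mild_test_rate
                        + deaths_severe * severe_test_rate)

    true_rate = (deaths_mild + deaths_severe) / infected
    observed_rate = confirmed_deaths / confirmed
    return true_rate, observed_rate


# Hypothetical: tests go mostly to severe cases
true_rate, observed_rate = fatality_rates(
    infected=100_000, mild_share=0.95,
    mild_fatality=0.001, severe_fatality=0.10,   # made-up rates
    mild_test_rate=0.10, severe_test_rate=0.90,  # made-up rates
)
print(f"True rate among everyone infected: {true_rate:.2%}")
print(f"Rate seen among confirmed cases:   {observed_rate:.2%}")

# Hypothetical: everyone is equally likely to be tested
even_true, even_observed = fatality_rates(
    infected=100_000, mild_share=0.95,
    mild_fatality=0.001, severe_fatality=0.10,
    mild_test_rate=0.50, severe_test_rate=0.50,
)
```

With testing skewed toward severe cases, the rate seen among confirmed cases is several times the rate among everyone infected. When testing is even across both groups, the two rates match.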

So just as more testing leads to more diagnosed cases, which feeds into the model, more testing of critical or at-risk cases will lead to more deaths, which will also be fed into the model. And that's why, based purely on math and data alone, I don't think we'll see 93,000 deaths in the U.S. I hope and pray we don't.

Now, here's the point of this post, beyond helping you understand how the model works, what feeds it, and why it changes daily as more data informs it.

Drs. Birx and Fauci cite the model projections to keep people following the guidelines and social distancing. It's critical. If we stop doing it because we think the model numbers are too high, then we will indeed reach the model numbers - remember, part of the data informing the model is the presence of mandated mitigation strategies.

So even though I think that, based on math and data, the projected numbers may be on the high side, I'm not hanging out with friends and family. I'm still washing my hands thoroughly and frequently. We have a routine for handling groceries and take-out food and we're following it to the letter. I wipe down everything I touch in my car every time I come home from the grocery store. I use a credit card at Target instead of their Red Card or another debit card so I don't have to touch the PIN pad. We all need to do that, to do our part.

Looking to the future, here's my fear. Let's say total deaths in the U.S. come in much, much lower than 90,000. (I'm not saying they will - I think they'll be lower; I can't even guess by how much.) You know how every time there's a hurricane approaching the U.S., and officials issue evacuation orders, and lots of people ignore them, thinking, "They said Hurricane XYZ was gonna be really bad, and it wasn't, so I'm not going to let them scare me into leaving my house"? Then, when it is bad, first responders get overwhelmed having to evacuate those people using boats or helicopters, when the people could have just driven to safety had they heeded the warnings?

You guessed it. If the total casualties from this seasonal round of the coronavirus are far below the modeled projections, the next season - and there will be one, whether it's in the fall or next spring or both - people will take the social distancing guidelines lightly. They'll ignore the hygiene protocols. They'll sneeze into the air, or on their hands and then touch public surfaces like airplane seatbacks and handrails and shopping carts. And when there's a vaccine available, they won't get it because they don't think the risk warrants getting jabbed in the arm.

And they may be okay. But they'll pass the virus to me. To you. To your child, or your grandparent. And the health care system will be overwhelmed, unless the government shuts down the economy again because people are too selfish to do what they need to do to protect other people.

The model isn't bad. It's well-constructed by people who know what they're doing. But any model is only as good as the data, and even though the numbers reported sound tragically staggering, it's not enough data as a percent of the U.S. population for the model to be considered robust yet. Its predictive value improves every day as new data is added, and the projections are going to continue to change.

So the doctors on the task force aren't misrepresenting the model or the projections. They are pointing to those numbers because they want to make sure we continue to comply with the guidelines, to avoid ever reaching those numbers. I don't know whether they believe the numbers will reach the model projections if we all do our part. But they are trying to scare you, and for good reason.

Let them. We need to be scared - scared of what would happen if we drop our guard. If not for ourselves, for our grandparents, our parents, our kids, our nurses and doctors, our police and firefighters, our grocery store clerks, our restaurant take-out workers - all of the people who are on the front line, every day, putting themselves at risk so people's health is cared for, we're safe, we have food. Be scared for them. They're counting on you.
