Multiple Chronic Conditions in Research for Emerging Investigators

Introduction to Risk Prediction Models

AGS/AGING LEARNING Collaborative Season 1 Episode 17

Join Karen Bandeen-Roche, PhD, Johns Hopkins Bloomberg School of Public Health, and Terrence E. Murphy, PhD, MS, Pennsylvania State University College of Medicine, as they discuss predictive risk modeling in the context of multimorbidity: choosing a model suited to the situation, internal and external validation, statistical versus machine-learning approaches, and the definition of outcome events.


Karen Bandeen-Roche, PhD: Well, hello, I'm Karen Bandeen-Roche. I am a professor of biostatistics at Johns Hopkins, but I have also been deeply engaged in research on aging for more than 30 years. I'm here with Professor Terry Murphy, who is a professor of biostatistics and bioinformatics at the Penn State College of Medicine, where he's been since 2022.

Today we'll be discussing the key points from the Predictive Risk Modeling segment of the Useful Analytic Approaches module of the AGS/AGING LEARNING Collaborative Curriculum. And it really is a pleasure and a privilege to be here with Terry. Terry has also been deeply engaged in aging research for many years: from 2005 to 2022 he was a member of the biostatistics staff at the Yale Program on Aging, and in addition to teaching and serving [01:00] as an academic advisor of MPH students, he collaborates on a wide range of research projects through the CTSI of the Penn State College of Medicine. I can certainly attest that Terry is one of the founders and real proponents of gerontological statistics, an approach that has spread throughout the country but was particularly associated with Yale while he was still there.

So Terry, welcome. It's great to have you today. 

Terrence E. Murphy, PhD, MS: Thank you so much for your kind introduction, Karen.

Karen Bandeen-Roche, PhD: You know, you created this wonderful module on predictive risk modeling in the context of multimorbidity. And so I wonder if you could begin by giving your own high-level overview of the statistical task that you address in that module, as well as its implications for the study of multimorbidity.

Terrence E. Murphy, PhD, MS: Sure. I [02:00] think this particular topic isn't quite as direct a fit as, say, the methods for clustering chronic conditions that I recently had the privilege of interviewing you about; it's a little less directly applicable. Generally, risk prediction models are fairly simple. They calculate the probability of a binary event, a zero or one, such as death, hospitalization, or readmission within a specific time frame.

And most often we're interested in the associations of either specific chronic conditions or a count of chronic conditions, the most famous weighted counts being the Charlson and the Elixhauser. Even so, in modeling a number of outcomes I have [03:00] found that sometimes just the raw count, especially among older folks with high numbers of comorbidities, tends to demonstrate fairly nice linearity. But anyway, these risk prediction models are trying to stratify risk so that providers and patients can jointly decide: am I at high risk or at low risk? And if this worries me, am I motivated enough to do something about it, and what might I do? I thought this was a useful thing to present because risk prediction models have become very fashionable over the last 10 years; I'm being asked to review them on a weekly basis. So we're covering here just some of the fundamentals of what they're [04:00] about and the most important building blocks.
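To ground the discussion, here is a minimal Python sketch of the kind of model being described: a logistic regression that converts age and a raw comorbidity count into the probability of a binary event within a fixed time frame. The variable names and simulated data are illustrative assumptions, not taken from any study mentioned here.

```python
# A minimal risk prediction model: logistic regression turning age and a
# raw count of chronic conditions into a probability of a binary event.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(65, 95, n).astype(float)
condition_count = rng.poisson(3, n).astype(float)      # raw comorbidity count
logit = -7 + 0.05 * age + 0.30 * condition_count       # assumed true model
death_1yr = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # binary outcome (0/1)

X = np.column_stack([age, condition_count])
model = LogisticRegression().fit(X, death_1yr)

# Predicted risk for an 80-year-old with five chronic conditions.
p = model.predict_proba([[80.0, 5.0]])[0, 1]
print(f"Predicted 1-year risk: {p:.2f}")
```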

Karen Bandeen-Roche, PhD: Well, thank you so much. And I actually think you were being humble in the first part of your statement: there is incredible potential impact and relevance in having algorithms to identify people at high risk for incident multimorbidity, or to quantify the implications of multimorbidity for subsequent outcomes.

And so, thank you for this very important module. Getting further down into the specifics that you so usefully discussed: many people are familiar with regression, and so one might wonder, why don't I just fit a regression model and use whatever comes out of it as a prediction, either of multimorbidity or of the subsequent outcome?

But your module had some very lovely insights about the importance of model [05:00] selection. Maybe you could tell our listeners a little about what to keep most in mind as they approach prediction in the multimorbidity context.

Terrence E. Murphy, PhD, MS: Yes, model selection, multivariable model selection.

So we have an outcome, and we generally have many plausible candidate predictors, often including chronic conditions. Say you have a big set of different chronic conditions: how do you decide which subset of those plausible predictors best balances bias, in other words accuracy, against variance, that is, how much noise there is around the estimate?

And there's no strong statistical [06:00] consensus, but I think one of the evolving themes has been that two components are of interest. The first is an increasingly sophisticated appreciation for model uncertainty. There was a group of folks at the University of Washington in the 90s, Adrian Raftery and David Madigan and others, who with Bayesian model averaging really developed this idea that the best models are ones that take into account the uncertainty of the model selection process itself, and Bayesian model averaging is a very nice example of that.
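As a concrete illustration of the flavor of Bayesian model averaging, the sketch below uses the common BIC approximation, in which each candidate model receives a weight proportional to exp(-BIC/2) and predictions are averaged across all candidates. This is a simplification: full BMA as developed by Raftery, Madigan, and colleagues manages the model space more carefully, and the data and predictors here are simulated assumptions.

```python
# Approximate Bayesian model averaging for a logistic model: fit every
# subset of candidate predictors, weight each model by exp(-BIC/2),
# and average the predictions. Data are simulated.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))  # four hypothetical candidate predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.8 * X[:, 1]))))

fits, bics = [], []
for k in range(1, 5):
    for subset in itertools.combinations(range(4), k):
        design = sm.add_constant(X[:, list(subset)])
        fit = sm.Logit(y, design).fit(disp=0)
        fits.append((subset, fit))
        bics.append(fit.bic)

bics = np.array(bics)
weights = np.exp(-(bics - bics.min()) / 2)  # subtract min to avoid underflow
weights /= weights.sum()

# The model-averaged risk for one new person blends every candidate model,
# weighting each by its approximate posterior probability.
x_new = np.array([0.5, 1.0, -0.2, 0.3])
p_avg = sum(
    w * fit.predict(np.concatenate(([1.0], x_new[list(s)]))[None, :])[0]
    for w, (s, fit) in zip(weights, fits)
)
print(f"Model-averaged predicted risk: {p_avg:.3f}")
```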

I think a second aspect of model selection is this idea of regularization: when you have many predictors in a model, how do you winnow them down to the best group? [07:00] There's been a lot of work showing that our traditional stepwise practices, whether forward or backward, don't tend to replicate very well.

They tend to pick different models in different situations. So the LASSO approach, spelled just like a cowboy's lasso but standing for least absolute shrinkage and selection operator, has become increasingly popular because it considers even a very large number of plausible predictors all at once, without forming nested models.

Automated selection procedures start either with one predictor or with all of them and then work forwards or backwards, exclusively on nested models, meaning that once something is included or thrown out, it sticks. LASSO instead takes [08:00] everybody all at once and, through a penalty term, shrinks the coefficient of each term, effectively selecting at the same time by shrinking many of those coefficients exactly to zero.

And so, for these two aspects of model selection, both the uncertainty and the need for regularization, I think we're really getting a much sharper appreciation of how important they are.
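To make the LASSO mechanics concrete, here is a minimal sketch on simulated data, assuming scikit-learn: the L1 penalty shrinks all coefficients simultaneously and sets many exactly to zero, so selection happens in one pass rather than over nested models, and cross-validation chooses the penalty strength.

```python
# LASSO-penalized logistic regression: many candidate predictors enter at
# once, and the L1 penalty zeroes out most of them. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(2)
n, p = 400, 30  # e.g., 30 candidate chronic conditions
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:4] = [0.9, 0.7, -0.6, 0.5]  # only four truly matter
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta - 0.5))))

# Cross-validation picks the penalty strength; the saga solver supports L1.
lasso = LogisticRegressionCV(
    Cs=20, cv=5, penalty="l1", solver="saga", max_iter=5000
).fit(X, y)

kept = np.flatnonzero(lasso.coef_[0])
print(f"{kept.size} of {p} predictors kept: {kept}")
```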

Karen Bandeen-Roche, PhD: Indeed, those were, you know, two very cutting-edge aspects of your module, you know, well beyond, I think, what most practitioners in the medical literature apply even yet, but so important.

And so, have you found that your collaborators are open to considering these methodologies and implementing them in their prediction efforts?

Terrence E. Murphy, PhD, MS: It's a great [09:00] question. I think our clinical colleagues are coming along, but the clinical publishing industry is quite conservative, generally. I remember when I first started, I was using a Bayesian spatial model, and I was advised by someone to leave out the word Bayesian.

You know, and of course, in the last 20 years, Bayesian techniques have really come into their own. Right. But something like Bayesian model averaging, while very popular among a lot of researchers, is almost unheard of by our clinical colleagues. At the end of the day, I think if we can get it published, especially in a good journal, that tends to do more than anything else to bring our clinical colleagues along.

So I've been fortunate to get a few things published with both Bayesian model averaging and LASSO, and once our colleagues see that it works [10:00] and is accepted, they seem to be fine. But there are always jitters. That's why I think so many of our older techniques go on uninterrupted, even when there are clearly better alternatives, just because of the conservatism of the industry.

Karen Bandeen-Roche, PhD: Well, I'm so glad that you raised these innovative techniques for those who will use the AGS modules. They have such an opportunity to make the resulting research truly cutting edge and to increase its reproducibility. And so I definitely join you in encouraging people to try them out and to work with statisticians to implement them, because they can be well worth it.

You know, the other really major challenge of prediction that goes beyond just plain old regression has to do with model validation, which is particularly important in the area of prediction.

And you talked very nicely about both [11:00] internal validation and external validation. What sorts of things can people expect to see along those lines as they study your module with respect to validation?

Terrence E. Murphy, PhD, MS: That's a great question. I think we all have some intuition about optimism. You take a certain set of data and fit a model to it with something like maximum likelihood, right? That's an optimization scheme: it picks the parameters that best explain this particular data set, so the model's apparent performance on that same data is optimistic. Internal validation is therefore a must, and I think it's the most critically important.

There are several flavors. The simplest is a split sample, much like in machine learning, where you have a training data set, then a validation data set, and sometimes a third test set. [12:00] That's the simplest and least sophisticated approach.

Much more interesting is cross-validation. You take the entire data set and ask: how can I extract the estimates that will hold up best in external samples that theoretically come from the same underlying population? Cross-validation selectively leaves out a proportion of the sample, fits the model on the rest, predicts the part that was left out, and does that iteratively until it arrives at an optimal set of coefficients. That's probably one of the best approaches.

A second, very strong approach is bootstrapping, where [13:00] you fit the model to the original sample, then replicate the model selection process in bootstrap samples, take the coefficients newly fit to each bootstrap sample, and use those to see how well they predict the original data set, doing this repetitively.

Frank Harrell has advocated this forever, and it's so nicely described in his book. This is a way of ensuring that you're doing everything you possibly can with your original data to come up with a set of coefficients that are much more likely to generalize and work well in separate data sets.

So cross validation and bootstrapping are probably among the best. 
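The two internal-validation strategies just described can be sketched briefly on simulated data: cross-validation repeatedly scores a left-out portion, while a Harrell-style bootstrap estimates the optimism of the apparent C-statistic and subtracts it. This is a minimal sketch, not the exact procedure from any study discussed here.

```python
# Internal validation two ways: 5-fold cross-validation of the C-statistic,
# and bootstrap optimism correction of the apparent C-statistic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 10))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

# Cross-validation: leave part of the sample out, fit on the rest, and
# score the left-out part, iterating over the folds.
cv_auc = cross_val_score(LogisticRegression(), X, y,
                         cv=5, scoring="roc_auc").mean()

# Bootstrap: refit in each resample, then test those coefficients back on
# the original data; the average gap is the optimism to subtract.
model = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])
gaps = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
    gaps.append(auc_boot - auc_orig)

print(f"Cross-validated C-statistic: {cv_auc:.2f}")
print(f"Optimism-corrected C-statistic: {apparent - np.mean(gaps):.2f}")
```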

Now, [14:00] external validation is wonderful, but it comes with its own set of headaches. Often it's hard to ensure that, say, a population in a different country really does come from the same underlying population. I had an example of this where we fit a model of readmission to hospital after acute myocardial infarction in the States, and then we tried something similar in Canada, and it was amazingly different. Our model from the States did not hold up well at all, and we realized that so many things, like social determinants of health, public health insurance, and even racial disparities, differed in how they were associated with readmission.

So external validation is in some ways this [15:00] very high level of rigor, and if you can achieve it, that's much stronger evidence that you really are getting to the truth of the matter. But it comes with so many challenges that it's often not easy or practical to do. That's why internal validation is critically important.

Karen Bandeen-Roche, PhD: I couldn't agree more. In grant review, I'm often astonished at how often internal validation is ignored. If nothing else, it's certainly a recipe for a grant being marked down in its evaluation of merit, but it's also so important for the reproducibility of results.

I'm so glad you brought up that latter example about the differences even between Canada and the US, which we would believe to be such similar contexts, and the importance of considering contextual factors in prediction, [16:00] and the fact that predictions may not simply be portable between different contexts.

Such an important insight. 

Terrence E. Murphy, PhD, MS: Yes, a related note there is the nature of the outcome. There are some outcomes, such as death, on which almost everyone will agree: the heart stops, brain activity stops, you're dead, right? We all agree on that around the world. Those very hard outcomes tend to have much better model performance.

Outcomes with more subjective, policy, and social elements, such as readmission, do not perform nearly as well and are much harder to reproduce. There are all of these human and subjective elements that cannot be well captured by a small set of measured predictors.

Karen Bandeen-Roche, PhD: Such important insights, and it leads a little bit [17:00] into the very nice section of your module in which you worked through detailed examples to exemplify what you had developed in the earlier part.

And I'm wondering if there are any insights from one or both of the examples, medically serious falls and six-month mortality, that you'd really like those who listen to particularly notice.

Terrence E. Murphy, PhD, MS: A wonderful question. So, there's increasing interest in machine learning. For one of these models we published, I remember a high school student wrote in and said: I used machine learning at home and I was able to get a C-statistic of 90 percent; why are you guys publishing a model of readmission that only has a C-statistic of 68 percent? And it's [18:00] really not that easy to answer, except that Frank Harrell, again, has on his blog a very nice discussion of the relative merits of machine learning and statistical modeling.

To me, one of the take-home points is that statistical modeling is often better when there's interest in the specific associations between predictors and the outcome, and machine learning may be better in cases where you don't really care about those associations; you just want it to operate as a black box and predict as well as possible.
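A small sketch of that division of labor, with hypothetical variable names and simulated data: the statistical model yields an interpretable odds ratio for a specific predictor, while the machine-learning model is judged purely on its held-out discrimination.

```python
# Statistical model for interpretable associations; black-box model judged
# only by its held-out C-statistic. Data and names are illustrative.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 800
opioid_use = rng.binomial(1, 0.2, n)  # hypothetical predictor
age = rng.normal(60, 8, n)
fall = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.8 * opioid_use + 0.03 * age))))
X = np.column_stack([opioid_use, age])

# Statistical model: the coefficient on opioid use is directly readable.
fit = sm.Logit(fall, sm.add_constant(X)).fit(disp=0)
print(f"Odds ratio for opioid use: {np.exp(fit.params[1]):.2f}")

# Black-box model: report discrimination on held-out data, because the
# apparent (training) C-statistic is optimistic.
X_tr, X_te, y_tr, y_te = train_test_split(X, fall, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"Held-out C-statistic: {auc:.2f}")
```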

In our model of serious fall injury among middle-aged veterans, we had this wonderfully rich veterans data set, and we used a statistical model for prediction. But for identifying fall-related injury, which traditionally relies on E-codes [19:00] reported at hospitals, we developed a machine learning algorithm, a text-based support vector machine. We unleashed it on the radiology reports, and it essentially tripled the number of outcome events that we were able to include in our analysis. So, to me, what's striking about that particular example is that we used machine learning for what it's often best at, which is image recognition or, in this case, recognizing and pulling out specific text, things that most providers would reasonably attribute to a fall.

In this sense, we were able to greatly augment our harvest of outcome events, which gave us much greater power. And then we used [20:00] multivariable logistic regression with selection based on the Bayesian information criterion. What I love about that example is that we used the statistical modeling part for what it does best, because the providers really are interested in knowing, oh, so opiate usage in the previous six months really is associated with a higher risk of falling. Makes sense.

And at the same time, we used the machine learning to triple the number of our outcome events. In this sense, I thought we were using both approaches according to their best abilities. Machine learning, of course, has huge potential to help us in so many areas, but as practicing methodologists we need to be very mindful of what it does best and what statistical modeling does best, and use them appropriately. [21:00]
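In the spirit of the text-based support vector machine described above, here is a minimal sketch that flags radiology reports suggestive of a fall so they can be counted as outcome events. The tiny hand-labeled corpus is an illustrative assumption, standing in for the large annotated set a real project would require.

```python
# Text classification of radiology reports: TF-IDF features feed a linear
# support vector machine that labels each report fall-related or not.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reports = [
    "wrist fracture after patient slipped and fell at home",
    "hip fracture, mechanical fall from standing height",
    "routine chest x-ray, no acute findings",
    "screening mammogram, benign",
    "skull contusion sustained in fall down stairs",
    "degenerative changes of the lumbar spine, no trauma",
]
is_fall = [1, 1, 0, 0, 1, 0]  # hand labels: 1 = fall-related

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(reports, is_fall)

# Each newly flagged report becomes an additional outcome event.
print(clf.predict(["femur fracture after fall from ladder"]))
```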

Karen Bandeen-Roche, PhD: That's such an important insight. And I appreciate so much that your module highlights, for those who will study it, that prediction sometimes is just to identify people, but other times it is meant to lead to doing something: intervening.

And in those cases, interpretability is important. So, absolutely, that's extremely valuable. I think that brings us almost to the end of talking about your module. Were there any concluding thoughts you wanted to highlight as people now go off to study and then, hopefully, implement the valuable things you've described?

Terrence E. Murphy, PhD, MS: So, risk prediction modeling, like most research, is a team sport. We must have good statistical methods, and here we've emphasized internal validation as being [22:00] key, along with model selection that properly considers a wide array of potential predictors as well as the uncertainty of the selection process. But we must also reconcile everything the statistics tells us with our content experts, and try to come up with something useful and practical that clinicians can use to help improve quality of life as we age and help us stay independent as long as possible.

As you know, Karen, you and I have both worked for Pepper Centers for many years, and I think that's just a wonderful ideal or goal: let's all stay as independent and healthy as we can for as long as we can. And hopefully the use of statistics and good content knowledge can help us get there.

Karen Bandeen-Roche, PhD: Hear, [23:00] hear. 

Well, thank you so much, Terry. It's been such a pleasure to share this time with you, and I so greatly appreciate all your many contributions to the health of older adults, you know, through the insightful use of statistics that you bring to studies. Thank you so much. 

Terrence E. Murphy, PhD, MS: Thank you, Karen.