Multiple Chronic Conditions in Research for Emerging Investigators

Observed and Latent Cluster Analysis

AGS/AGING LEARNING Collaborative Season 1 Episode 19

Join Terrence E. Murphy, PhD, MS, Pennsylvania State University College of Medicine, and Karen Bandeen-Roche, PhD, Johns Hopkins Bloomberg School of Public Health, as they discuss usefulness of cluster analysis for the study of multimorbidity.

Terrence E. Murphy, PhD, MS: Hello, my name is Terry Murphy, and I'm a professor of biostatistics at the Penn State College of Medicine. I am very fortunate to be here today with Dr. Karen Bandeen-Roche, who has been a professor of biostats at the Johns Hopkins Bloomberg School of Public Health since 1990, and she has also been the Hurley Dorrier Chair of Biostats from 2008 through June of this year.

In her time there, Karen has also served as the co-director of the Epidemiology and Biostatistics of Aging Training Program from 1996 through 2008, at which time she became its director, and I believe continues directing to this day. She also has been leading the Johns Hopkins [01:00] Older Americans Independence Center, which some of us know as the Pepper Center, dating back to 2008, and also participates on the leadership team for the National Older Americans Independence Center Network. On a related note, Karen is a very preeminent practitioner of an art form called gerontologic biostatistics, and so I have been happy to collaborate with her in this form for some time.

Now, our module is called Useful Analytic Approaches, and Karen's module is specifically Observed and Latent Cluster Analysis. Now, what we're all about here is to study multimorbidity, and this is most commonly defined as the co-occurrence [02:00] of two or more chronic diseases.

Karen, in her module, has shown us two ways of evaluating binary indicators of specific chronic diseases. The first type is hierarchical cluster analysis, which corresponds to the observed cluster analysis, and you can think of this as variable focused, and it deals with the research question of which diseases tend to travel together among persons. 

The second approach that she talks about is latent class analysis, which is person focused and hypothesizes about different subpopulations, and it deals with the question of whether a reasonably small number [03:00] of separate subpopulations can be identified. And she motivates both these approaches with a nice example from the Women's Health and Aging Study. And we'll talk through that a little bit.

The hierarchical cluster analysis that Karen presents builds a tree that connects diseases from those with the closest association, or comparably the least dissimilarity, and then talks about some rules needed to guide decisions about how to connect subsequent branches, and these include single linkage and complete linkage. And she demonstrates this for us nicely and [04:00] discusses some of the differences in these approaches.
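As a minimal sketch of the tree-building just described, the Python below clusters four binary disease indicators under single and complete linkage. The disease names, the eight person records, and the choice of one minus the Jaccard similarity as the dissimilarity are all assumptions made for illustration; they are not the Women's Health and Aging Study data or necessarily the measure used in the module.

```python
from itertools import combinations

# Hypothetical data: each row is a person, each column a binary disease indicator.
DISEASES = ["diabetes", "heart_disease", "arthritis", "osteoporosis"]
PERSONS = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
    (1, 1, 1, 0),
]


def jaccard_dissimilarity(i, j):
    """1 - (persons with both diseases) / (persons with either disease)."""
    both = sum(1 for p in PERSONS if p[i] and p[j])
    either = sum(1 for p in PERSONS if p[i] or p[j])
    return 1.0 - both / either if either else 1.0


def cluster_distance(a, b, linkage):
    """Single linkage uses the closest pair across clusters, complete the farthest."""
    pairwise = [jaccard_dissimilarity(i, j) for i in a for j in b]
    return min(pairwise) if linkage == "single" else max(pairwise)


def agglomerate(linkage):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(DISEASES))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        height = cluster_distance(a, b, linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append(
            (
                sorted(DISEASES[i] for i in a),
                sorted(DISEASES[i] for i in b),
                round(height, 3),
            )
        )
    return merges


if __name__ == "__main__":
    for rule in ("single", "complete"):
        print(rule)
        for left, right, height in agglomerate(rule):
            print("  merge", left, "+", right, "at height", height)
```

In this toy data both rules join the same branches in the same order and differ only in the height of the final join, because single linkage measures the closest pair of diseases across two branches while complete linkage measures the farthest pair. With real data the two rules can also produce different trees.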

And Karen, I like the way that you start with this because it's so concrete. I think all of us can think of indicators of specific diseases as the most simple and intuitive way to think about this. I wanted to ask you if you could share with us, you know, in your own work, maybe one of your favorite examples of when you've used a dendrogram like this. Later in your notes here, you show us a more sophisticated example, I think, where you look at a bunch of conditions that seem to cluster into one group that you refer to as cardio-metabolic and a second group that you describe as a connective tissue disease cluster.

I haven't worked [05:00] actively with these, and I wondered if you could just share some of your experience or your observation, in either this example or another, where you felt that this use of the hierarchical cluster yielding a dendrogram has really brought you some notable insight.

Karen Bandeen-Roche, PhD: Well, Terry, first of all, thank you for your very kind comments at the start. You know, it's been a pleasure to be a part of this effort and then for sure to be affiliated with you over the years. 

Actually, one of my very favorite examples is the more complicated one in the lecture, you know, where there were a number of these conditions that were studied just as you have described just now. And, you know, we went into the work suspecting that diseases would, so to speak, co-travel together in a hypothetically reasonable way, thinking of things like shared [06:00] etiology and the way that some conditions occur downstream of others. But we were honestly surprised at how beautifully the dendrogram matched, you know, what we would have expected.

You know, with certainly the cardiometabolic diseases fairly compellingly clustering together and then the various connective tissue ones traveling together, and that motivated us in later work to use these clusters of diseases to study determinants of multimorbidity in the Women's Health and Aging Study.

And so I, I think that it's a pretty good example of when this sort of analysis can be useful, you know, when there's reason to think that various indicators should co-occur together and to see this as a way of really [07:00] exploring that hypothesis in one's data. 

Terrence E. Murphy, PhD, MS: You know, there's a certain level of judgment and interpretation that has to take place as you're building these, you know, as you're, you're navigating the possibilities of should I use a single linkage? Should I use a complete linkage? 

What's your experience with your clinical colleagues? As you work through this, are they generally receptive, or do they find this a little intimidating because, you know, it's slightly more abstract?

Karen Bandeen-Roche, PhD: It's a great question. I've been very fortunate here among my collaborators, whom I've had for many years, that they've been generally very receptive to this type of thing.

And indeed, eager to dive in with their insights as they need to, you know, and I hope that perhaps clinicians who are reviewing our series will similarly become excited and [08:00] recognize how important their knowledge and their insights are to ground these analyses where there really are decisions that have to be made.

So I at least personally advise that it's always useful to come to analyses like these cluster analyses with a specific hypothesis in mind, you know, that can help guide, you know, what may be reasonable, what may not be reasonable, but also the, the sorts of choices, hopefully a priori that make the most sense clinically.

I think as we talked through this example, it made sense to everyone involved that single linkage probably was the most sensible one, you know, because you wouldn't want two conditions that clearly have a shared etiology, you know, to be excluded from being seen as co-clustering because one happens to [09:00] co-travel also with something else that is more distantly related. You know, for other purposes, that sort of exclusion might make sense, but for identifying diseases with potentially a shared etiology or other shared factors, it made less sense to us.

So that's a long winded way of saying that, you know, I've, I've found my colleagues to be very receptive and indeed pretty excited. 

Terrence E. Murphy, PhD, MS: One thing I've noticed in working with clinical colleagues over the years is there is kind of a default to want to dichotomize conditions and outcomes, and many of us folks who are methodologists, we kind of cringe at that.

But I think here, you've given a very nice example where the dichotomous expression of presence or absence of a disease can be combined with this kind of sophisticated approach to try to give us some real insight about, as you [10:00] say, groupings of chronic conditions that travel together. 

Karen Bandeen-Roche, PhD: So, so definitely I agree with what you've just said, but I'm glad you brought it up because one shouldn't dichotomize just for the sake of it, or just to be able to use a method. 

And towards the end of the module I did try to point out that other sorts of data having to do with multimorbidity might be amenable to other approaches than were discussed at very much length in this module. One example being measures of disease severity rather than yes/no measures of disease. Those you could much more imagine plotting against each other and, you know, creating discernible clouds, you know, of measures that tend to go together. And in those cases, you know, either a different hierarchical [11:00] approach with different metrics or even a non-hierarchical approach like k-means might make sense with those sorts of data.

Terrence E. Murphy, PhD, MS: Yes, and I note at least when I look at these dendrograms that they remind me that they're certainly built around these binary indicators, but they look an awful lot like classification and regression trees, and they also look like random forests. Right? The latter two not based on dichotomous form. So you're starting us here with these nice, simple to interpret dichotomous forms and combining it here.

Okay. 

The second approach that Karen presents to us is latent cluster analysis. And in latent cluster analysis you're not focusing on diseases and how they travel together. Rather, you're thinking about subpopulations: groupings of persons [12:00] that tend to be similar in terms of the prevalences of chronic conditions within the clusters. And those tend to be different between clusters. And I like to think of latent class analysis, as Karen states in her notes, as a sort of a binary form of, say, factor analysis, where you're trying to find these underlying subpopulations.

I like very much that you present an illustration where you show kind of the mixed model of the clusters and how they are latent variables that in turn drive the probability of persons within the clusters having the specific outcomes of interest, and they are characterized by, by the prevalence of the chronic [13:00] diseases within each.
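The structure just described can be written down as a standard latent class model for binary indicators. The notation here is assumed for illustration and is not taken from the module: $K$ classes, $J$ chronic conditions, $\pi_k$ the share of persons in class $k$, and $p_{jk}$ the prevalence of condition $j$ within class $k$. Under the usual assumption that conditions are independent within a class, the probability of a person's observed disease pattern $y = (y_1, \dots, y_J)$ is:

```latex
P(Y = y) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} p_{jk}^{\,y_j}\,\bigl(1 - p_{jk}\bigr)^{1 - y_j},
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad y_j \in \{0, 1\}.
```

The class membership is the latent variable, and the $p_{jk}$ are the within-class prevalences that characterize each class.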

Karen mentions a few of the software tools. These tend to be a little more sophisticated and they include Mplus which is a special package with lots of bells and whistles in terms of all kinds of structural equation models. Karen points out that latent class analysis can be thought of as one special case of structural equation models. 

But a lot of times that connection is not drawn. Karen applies latent class analysis to the same nice example of four diseases from the Women's Health and Aging Study. And she comes out with two clusters, and she shows that in latent cluster analysis you usually pick out the number of clusters and it fits that model. And then you look at several possibilities [14:00] and decide which fits the best.

In this particular example, Karen decides that a two cluster model does not fit very well to these four binary conditions. But again, she goes on to show a more sophisticated one, where she takes six different, kind of cardio metabolic diseases. And she clusters them into three clusters and examines those and provides some intuition.
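The workflow of picking a number of classes, fitting that model, and then comparing several candidates can be sketched in plain Python. This is only an illustration under stated assumptions: the response-pattern counts are invented, the fit uses a basic EM algorithm with a fixed random start, and BIC is used as the comparison criterion. Dedicated software such as Mplus offers far more, and none of this reproduces the analyses in the module.

```python
import math
import random

# Hypothetical counts of disease patterns across four binary conditions.
PATTERNS = {
    (1, 1, 1, 1): 20,
    (0, 0, 0, 0): 40,
    (1, 1, 0, 0): 10,
    (0, 0, 1, 1): 10,
    (1, 0, 0, 0): 10,
    (0, 1, 1, 1): 10,
}
DATA = [pattern for pattern, count in PATTERNS.items() for _ in range(count)]


def class_joint(y, weights, probs):
    """Joint probability of pattern y and membership in each class."""
    return [
        w * math.prod(p if v else 1.0 - p for p, v in zip(row, y))
        for w, row in zip(weights, probs)
    ]


def log_likelihood(data, weights, probs):
    return sum(math.log(sum(class_joint(y, weights, probs))) for y in data)


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model by EM; return (weights, probs, loglik trace)."""
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [
        [rng.uniform(0.25, 0.75) for _ in range(n_items)]
        for _ in range(n_classes)
    ]
    trace = [log_likelihood(data, weights, probs)]
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each person.
        resp = []
        for y in data:
            joint = class_joint(y, weights, probs)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update class sizes and within-class prevalences.
        for k in range(n_classes):
            mass = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = mass / len(data)
            probs[k] = [
                sum(r[k] * y[j] for r, y in zip(resp, data)) / mass
                for j in range(n_items)
            ]
        trace.append(log_likelihood(data, weights, probs))
    return weights, probs, trace


def bic(loglik, n_classes, n_items, n_persons):
    """BIC with (K - 1) class sizes plus K * J prevalences as free parameters."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + n_params * math.log(n_persons)


if __name__ == "__main__":
    for k in (1, 2, 3):
        weights, probs, trace = fit_lca(DATA, k)
        score = bic(trace[-1], k, len(DATA[0]), len(DATA))
        print(f"{k} classes: loglik={trace[-1]:.2f}, BIC={score:.2f}")
```

A lower BIC suggests a better trade-off between fit and complexity, but EM can settle on a local optimum, so in practice one would try several starting values and, as the discussion here stresses, weigh the result against clinical theory rather than rely on a fit index alone.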

So again, Karen, coming back to this, you've, you've shown us in the first part of the lecture that we can just look at the diseases and how they travel together. Here, you're showing us that you can identify these underlying subpopulations, which we think of as latent variables that are somehow driving the [15:00] manifest variables that we measure.

I wondered again in your work, a lot of times latent class analysis, in my experience, comes out with less than satisfactory solutions in that there's so much uncertainty around the final clusters, that I think this may be less than satisfying to our clinical colleagues.

On the other hand, my favorite paper of yours, Karen, is your 2006 paper in Journal of Gerontology Medical Sciences called Phenotype of Frailty: Characterization in the Women's Health and Aging Studies. And I want to encourage everyone who has the slightest interest in latent class analysis to read this paper. It's about four and a half pages long. For a stats epi paper, it's been cited more than 1300 times, which means that this is rock star material [16:00] because, you know, stats papers, nobody knows who we are, you know.

But in here, Karen applies latent class analysis to the five criteria of the Fried frailty phenotype. She does this in the Women's Health and Aging Study, and it's really just a lovely illustration of that which is measured, and that which is latent, which is the frailty. And she shows that both the two class solution (meaning frail and non-frail) and the three class solution (non-frail, intermediate, and frail) work quite well.

So Karen, do you have any other examples that even come close to your success of latent class analysis as you found working with the Fried criteria? 

Karen Bandeen-Roche, PhD: Yeah, so again, thank you so much, Terry, for all the kind comments.

You know, I [17:00] think that it exemplifies my recommendation for best uses of latent class analysis, which is not so much to discover underlying subpopulations, but to come to a problem with a theory of, you know, what are the underlying subpopulations that one hypothesizes? And if one's science or hypotheses are correct, then how should the data manifest? And then use latent class analysis as a means of testing that hypothesis.

And so you referred to in the frailty paper that the specific hypothesis there was manifestation as a syndrome. And without getting into the weeds of that, that really implied something both about the number of classes and the patterns of prevalences, in that case, of each frailty criterion within each group: non-frail, pre-frail and frail; what those should [18:00] look like. And the manifestation in the analysis that we fit actually fit the theory very well. And so indeed, we found that to be a great success also.

Another example actually was not from aging. One of my favorite examples, it was a study of post traumatic stress disorder or PTSD.

And so there is a set of diagnostic criteria that is established, you know, by the Diagnostic and Statistical Manual, the standard in the psychiatric field. There are a few different domains, and based on those domains, you know, there's a very strong hypothesis about, you know, what a disordered or a non-disordered or maybe an intermediately disordered population should look like in terms of the constellation of criteria that one would manifest.

And we also had data on the type of trauma. [19:00] And so, you know, the use of latent class analysis in that case was first in a broad sense to again adhere to the predictions that one fully would make based on knowledge of the disorder and, you know, the way that the Diagnostic and Statistical Manual criteria are put together. But also to be able to evaluate that some of the assumptions of that analysis were actually strongly violated in terms of some types of traumas being much more strongly connected to certain criteria.

You know, regardless of whether otherwise people were exhibiting PTSD or not, allowing us to make some recommendations about examining, you know, future criteria in a way that is, you know, more uniformly applicable across different types of traumas. 

So I think that's another use that that method also [20:00] can serve, which is to identify places where one's expectations or theory actually are violated so that one can then go and do something about it.

Terrence E. Murphy, PhD, MS: Kind of in summarizing, Karen, what I'm hearing you say is that, even as we have these sophisticated methods, hierarchical and non-hierarchical clustering, I hear you harking back always to the importance of having a theoretical basis to start with, and that we're always reconciling what these sophisticated methods tell us, and how that relates, or doesn't relate, or contradicts the pre-existing theory.

And this is science, right? This is this never ending iterative process that continually sharpens our understanding and expands our knowledge. 

Do you have any final closing suggestions or [21:00] advice for our gerontologic researcher friends who may be considering the use of either hierarchical clustering or latent class analysis in their own studies?

Karen Bandeen-Roche, PhD: First of all, I, you said what you just said beautifully. I, I couldn't more strongly agree with it. 

And then I think it would just be to recap, you know that these can be very useful methodologies. You know, I think of them as shining, as putting a lens on a problem. You know, and in some cases, the lens works beautifully to clarify what one is studying.

In the case of hierarchical cluster analysis, connections between diseases. In the case of latent class analyses, identifying subpopulations with similar disease constellations or profiles who then, you know, could be the subject of further study, whether it be for determinants of, you know, that [22:00] typology of disease presentation or perhaps implications for future adverse events or whatever. And so don't be afraid of them. I would absolutely try them. But indeed be mindful that they really do need to be paired with a very solid scientific understanding in partnership between statisticians and other scientists in aging, you know, to adjudicate both whether one's theory is manifested in the data and also whether these methods are really deployable in a reliable way in one's sets of data. And hopefully the module talked about both of those things.

Terrence E. Murphy, PhD, MS: I believe it's a very nice and gentle introduction to these very sophisticated tools. And I thank you for your many contributions to the field and for this module in particular.

Thank you so much, Karen.

Karen Bandeen-Roche, PhD: Thank you, [23:00] Terry. It's been a pleasure.