Cluster Analysis Methods to Support Population Health Improvement among U.S. Counties

Elizabeth Pollock, Ronald Gangnon, Keith Gennuso, Marjory Givens

Jun 2024


Abstract

Context: Population health rankings can be a catalyst for the improvement of health by drawing attention to areas in need of relative improvement and summarizing complex information in a manner understood by almost everyone. However, ranks also have unintended consequences, such as being interpreted as “hard truths” even where the differences between them may not be significant. There is a need to better communicate uncertainty in ranks and to support their accurate interpretation. The most common solutions discussed in the literature have included modeling approaches to minimize statistical noise or borrow strength from covariates. However, the use of complex models can limit communication and implementation, especially for broad audiences.
Objectives: Explore data-informed grouping (cluster analysis) as an easier-to-understand, empirical technique to account for rank imprecision that can be effectively communicated both numerically and visually.
Design: Cluster analysis, specifically k-means clustering with Wasserstein (earth mover’s) distance, was explored as an approach to identify natural and meaningful groupings and gaps in the data distribution for the County Health Rankings’ (CHR) health outcomes ranks.
Setting: County-level health outcomes from the 2022 CHR.
Participants: 3,082 counties that were ranked in the 2022 CHR.
Main Outcome Measure: Data-informed health groups.
Results: Cluster analysis identified 30 health groupings among counties nationwide, with cluster size ranging from 9 to 184 counties. On average, states had 16 identified clusters, ranging from 3 in Delaware and Hawaii to 27 in Virginia. The number of clusters per state was associated with the number of counties per state and the population of the state. The method helped address many of the issues that arise from providing rank estimates alone.
Conclusions: Public health practitioners can use this information to understand uncertainty in ranks, visualize distances between county ranks, have context around which counties are not meaningfully different from one another, and compare county performance to peer counties.
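
Illustrative sketch (not the authors' implementation): the abstract names k-means clustering with Wasserstein distance but gives no implementation details, so the following is only a minimal sketch of how such a grouping could work. It assumes each unit (for example, a county) is represented by a one-dimensional sample of plausible values, and it uses the 2-Wasserstein distance, which in one dimension reduces to the Euclidean distance between quantile functions. The data, the function names quantile_embedding and wasserstein_kmeans, the quantile grid, and the choice of k are all hypothetical.

```python
import numpy as np

def quantile_embedding(samples, n_quantiles=20):
    """Represent each unit's 1-D sample by its empirical quantiles.

    In one dimension, the squared 2-Wasserstein distance between two
    distributions is the integrated squared difference of their quantile
    functions, so Euclidean distance on these vectors approximates it.
    """
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(samples, probs, axis=1).T  # (n_units, n_quantiles)

def wasserstein_kmeans(points, k, n_iter=100):
    """Lloyd's k-means on quantile vectors (assumed setup, not the paper's)."""
    n = len(points)
    # Deterministic start: order units by their mean quantile and take
    # evenly spaced units as the initial centers.
    order = np.argsort(points.mean(axis=1))
    centers = points[order[((np.arange(k) + 0.5) * n / k).astype(int)]]
    for _ in range(n_iter):
        # Squared Euclidean distance from every unit to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # The mean of quantile vectors is the 1-D Wasserstein barycenter.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: 30 hypothetical units in three well-separated groups,
# each unit described by 200 simulated values.
rng = np.random.default_rng(42)
true_groups = np.repeat([0, 1, 2], 10)
samples = rng.normal(loc=true_groups[:, None] * 10.0, scale=1.0, size=(30, 200))

points = quantile_embedding(samples)
labels, centers = wasserstein_kmeans(points, k=3)
```

In this sketch, units whose value distributions overlap heavily end up in the same group, which is the sense in which grouped units are "not meaningfully different" from one another. How the number of groups was chosen in the study is not stated in the abstract and is not addressed here.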

Type

Journal article

Publication

Journal of Public Health Management and Practice