Resilient Communities are the Foundations of a Resilient America.

Deep Diving into Community Data

In a post just about a year ago, I admitted to being a data geek (well, actually, I sort of gloried in it).  For data geeks interested in “Whole Community” resilience, these are glorious times.  Sites such as Community Commons and that of the Hazards and Vulnerability Research Institute offer loads of opportunities to wallow in data on many facets of community life.  However, the maps on these sites only tell us how a jurisdiction (city, county, state) is faring relative to others.  They don’t really tell us anything about the underlying relationships among the datasets.  And if we understand how important facets of community resilience relate to each other, we have greater leverage to move our communities toward greater resilience.

In that same post, I wondered whether relationships among higher-level datasets (e.g., country or state) might carry over into the community context.  With that in mind, I started out looking at whether relationships between state datasets would translate to the county level.  In general, I found that datasets were less correlated at the community level than at the state level (e.g., median household income and poverty, or male obesity and male life expectancy).

But what I found really interesting was what could be learned by actually looking at the plots themselves rather than relying on the statistics.  In several cases, the graphs suggested different ways of looking at common community problems and solutions.  I want to share a few of these with you.

In the following, I’ll be using federal county-level data on education, health, and demographics (my appreciation to Opportunity Nation for allowing me to use their compilations for many of the datasets).  I must quickly point out that there are over 3,000 counties (or their equivalents) in the US, but many of the datasets do not have data from every one of them; the smallest dataset has about 1,540 data points.  I have also normalized all of the data so that each point represents its relative position between the dataset’s minimum and maximum (e.g., if the maximum of the dataset is 25 and the minimum is 5, then a value of 10 would be plotted as (10-5)/(25-5) = 0.25).
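A minimal Python sketch of this min-max normalization, using the worked example from the text.  The function name `min_max_normalize` is illustrative only and is not taken from the original analysis.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] by their position between the min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so a relative position is undefined.
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# The example from the text: minimum 5, maximum 25, and the value 10.
sample = [5, 10, 25]
normalized = min_max_normalize(sample)
print(normalized)  # [0.0, 0.25, 1.0]
```

One consequence worth keeping in mind: because the scaling depends on the extremes, a single unusually high or low county stretches the scale for every other point in that dataset.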

In the figure below, I’ve plotted the fraction of the population with incomes below the poverty line against the median household income (MHI).  Statistically, there is a reasonable linear correlation (R² ~ 0.6), which matches our intuition that the higher the median household income, the lower the fraction of the population living in poverty.  However, the plot shows something a little bit different: the relationship is non-linear.  It appears that the proportion living in poverty drops off quite quickly up to a normalized MHI of ~0.25, and then falls off much more slowly.

Figure: Poverty vs. MHI
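To see how a respectable R² can coexist with a clearly non-linear relationship, here is a small Python sketch.  The data are synthetic and invented purely for illustration (a steep drop up to x = 0.25, then a slow decline); they are not the county data, and `linear_r_squared` is a hypothetical helper written for this example.

```python
def linear_r_squared(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic, made-up data: steep decline up to x = 0.25, shallow decline after.
xs = [i / 20 for i in range(21)]
ys = [0.8 - 2.0 * x if x <= 0.25 else 0.3 - 0.2 * (x - 0.25) for x in xs]

slope, intercept, r2 = linear_r_squared(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
# Long runs of same-signed residuals are a hint that a straight line
# is the wrong shape, even when R^2 looks acceptable.
print("".join("+" if r > 0 else "-" for r in residuals))
```

The summary statistic alone cannot distinguish a noisy straight line from a clean curve; the residual pattern (or simply the plot) can.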

This same behavior is seen even more dramatically in the plot of “Disconnected Youth” vs MHI.  Here “disconnected youth” refers to the fraction of people aged 16-24 who are neither in school nor working.  Again, our intuition might suggest that as the MHI increases, the fraction of disconnected youth would decrease.  Statistics bears this out to some extent: there is a linear relationship with an R² of ~0.3 (the black line on the plot indicates the linear curve fit).  The plot tells a different story, though; the fraction of disconnected youth is independent of MHI when the normalized MHI is above ~0.5.  In other words, increasing median household income does reduce youth disconnection up to a point, but does nothing to reduce its rate in higher-income communities.

Figure: Disconnected youth vs. MHI
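One simple way to check for a plateau like this without relying on a fitted line is to average the y-values within bins of the x-axis.  The Python sketch below assumes both variables are already normalized to [0, 1]; the data are synthetic (a decline that flattens at x = 0.5), not the county data, and `binned_means` is a hypothetical helper.

```python
def binned_means(xs, ys, n_bins=4):
    """Average y within equal-width bins of x, assuming x lies in [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        idx = min(int(x * n_bins), n_bins - 1)  # x == 1.0 falls in the last bin
        sums[idx] += y
        counts[idx] += 1
    # Empty bins are reported as None rather than as zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Synthetic, made-up data: y falls with x until x = 0.5, then stays flat at 0.1.
xs = [i / 20 for i in range(21)]
ys = [max(0.1, 0.5 - 0.8 * x) for x in xs]

means = binned_means(xs, ys)
for i, m in enumerate(means):
    print(f"bin {i}: mean y = {m:.2f}")
```

If the bin averages stop changing above some x, that is the plateau; a single straight-line fit would instead report one slope for the whole range.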

The relationship between heavy drinking and on-time high school graduation (the portion of the population who graduated from high school on time) is more surprising.  In this case, there is no statistical correlation.  And yet the plot reveals that on-time graduation effectively sets an upper bound on heavy drinking, as indicated by the black line.  In fuzzy math terms, there is a very high probability that on-time high school graduation limits heavy drinking by the population; i.e., the education system might offer a pathway to limit heavy drinking later in life, along with all of its social baggage.  All of the outlier counties are predominantly rural (and their combined population is under 300,000).  While the relationship might be coincidental, it may be worth some graduate student time to explore this further.

Figure: Heavy drinking vs. on-time high school graduation
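An upper bound of this kind can be made a little more concrete by counting how many points fall on or under a candidate line.  The Python sketch below is only an illustration: the points are invented, the line (intercept 1.0, slope -1.0) is an arbitrary assumption rather than the black line from the plot, and `fraction_under_line` is a hypothetical helper.

```python
def fraction_under_line(xs, ys, intercept, slope):
    """Share of points lying on or below the line y = intercept + slope * x."""
    below = sum(1 for x, y in zip(xs, ys) if y <= intercept + slope * x)
    return below / len(xs)


# Synthetic, made-up points scattered at various heights beneath y = 1 - x ...
xs, ys = [], []
for i in range(10):
    x = i / 10
    for share in (0.2, 0.5, 0.9):
        xs.append(x)
        ys.append(share * (1 - x))

# ... plus two invented outliers that sit above the line.
xs += [0.5, 0.8]
ys += [0.9, 0.6]

coverage = fraction_under_line(xs, ys, intercept=1.0, slope=-1.0)
print(coverage)  # 0.9375 (30 of the 32 points are on or under the line)
```

Note that data like these can show little or no linear correlation overall while still respecting a clear ceiling, which is why the plot is more informative than the correlation statistic here.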

I have a few more plots to present, but I’ll tack them onto future posts.  To me, the takeaway from this is that while the data points are important, they are a means to an end.  Ultimately, we collect data to assist in making decisions (what Datnow and Park call “data-informed decisions”) about actions that bolster a community’s resilience.  Action without data relies on the gut feel of the decision-makers to get things right.  Data can inform action by pointing it in the right direction.  Action informed by data validates the value of collecting it; but without action, data gathering is merely an interesting academic exercise.