At Civis, we’re always looking for new ways to use existing data. Even though I primarily spend my time working with clients to collect new data through surveys, we know that some of the best data for solving our clients’ problems may already exist, often in unlikely places. A good case in point comes from my days before Civis, as a grad student in Political Science. At the time, I wasn’t just short on beer money: many students who do quantitative social science are data-poor, looking for ways to squeeze the most insight possible out of whatever data someone else has already collected.
The US Census Bureau does the mother of all data collection projects every ten years. It’s in the Constitution: the federal government needs a headcount of every resident in the country. It’s a massive undertaking, mostly done by mail. But not everyone responds in a timely manner, so the Bureau must allocate its resources to make sure it can reach everyone, even those who are hardest to count. As part of this effort, the Bureau publishes a Planning Database that compiles past response rates for each geographic area (that is, the number of people who return a census form by their deadline out of all those who received one). These numbers vary quite a bit across neighborhoods within cities and across the country as a whole, and they serve as a good predictor of which areas will require more follow-up in future censuses.
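The rate itself is simple arithmetic: forms returned by the deadline divided by forms delivered. As a minimal sketch (the tract IDs, field names, and counts below are invented for illustration, not the Planning Database’s actual schema):

```python
# Hypothetical illustration of per-area mail response rates.
# All tract IDs and counts are made up, not real Planning Database records.
areas = [
    {"tract": "17031010100", "forms_delivered": 1200, "forms_returned": 912},
    {"tract": "17031010200", "forms_delivered": 950, "forms_returned": 589},
]

for area in areas:
    rate = area["forms_returned"] / area["forms_delivered"]
    print(f"{area['tract']}: {rate:.1%}")  # e.g. 76.0% for the first tract
```

Aggregated over every tract in the country, numbers like these are what let the Bureau rank areas by how hard they will be to count.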
For the Census Bureau, this solves an operational problem: using past data to determine where to allocate resources the next time around. For a data-poor grad student, the Planning Database was a massive trove of data that I happened upon and had never seen anyone use – I just needed to find a problem to solve with it!
My idea (which resulted in a peer-reviewed article with my then-colleague Benjamin Newman) was to think about the behavior of responding to the census not just as a means to solving an immediate organizational problem, but as an indicator of a broader phenomenon.
It’s worth thinking about what a survey response actually entails: to collect one, you first have to find the person you want to talk to, and when you do, they have to be both able and willing to respond. People without fixed residences may be hard to track down in the first place, and people who work long hours may be less inclined to put the energy into responding. But survey response often comes down to motivation: the willingness to do something for no clear reward other than curiosity, social obligation, or (more in this case than elsewhere) civic duty. What if Census response data could be used as a way to measure the inclination of a community’s members to contribute to the collective good?
Harvard scholar Robert Putnam (among others) has argued that a well-functioning democracy isn’t just a product of having regular elections, a well-designed Constitution, and good laws: it requires a culture of civic engagement and community-orientation in its citizens, a quality Putnam calls social capital. Using survey data collected through Putnam’s research, we were able to show that residents of areas with higher census response rates were significantly more likely to report that they trust their neighbors, that they interact with their neighbors more often, that they believe their neighbors will cooperate on common goals, and that they can have an impact in bettering their community. Even after controlling for common correlates of social capital (aggregate measures of age distribution, income, family structure, and ethnic composition), places where people respond to the Census more are communities where people are more socially and civically engaged.
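The idea behind “controlling for” those correlates can be sketched with ordinary least squares on synthetic data. Everything below is invented for illustration (the actual study used Putnam’s survey data and a richer set of controls); the point is just that response rate can carry its own signal even with a demographic control held fixed:

```python
import numpy as np

# Synthetic illustration of controlling for a demographic correlate:
# generate fake tract-level data where "trust" depends on both the
# census response rate and median income, then recover both effects
# with OLS. All coefficients and variable names here are made up.
rng = np.random.default_rng(0)
n = 500
response_rate = rng.uniform(0.4, 0.9, n)   # share of forms returned
median_income = rng.uniform(30, 120, n)    # in $1,000s (invented)

# Invented data-generating process: trust rises with both predictors.
trust = 0.2 + 1.5 * response_rate + 0.004 * median_income

# Design matrix with an intercept column; fit by least squares.
X = np.column_stack([np.ones(n), response_rate, median_income])
coef, *_ = np.linalg.lstsq(X, trust, rcond=None)
print(coef)  # recovers ~[0.2, 1.5, 0.004]
```

In the noiseless toy case the coefficients are recovered exactly; the middle one says that response rate predicts trust even with income held constant, which is the shape of the claim in the paper.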
A measurement like this can be powerful well beyond the purpose for which the data were originally collected. Without administrative data like this, it can be cost-prohibitive to field conventional surveys that measure these concepts at a level of detail fine enough to distinguish individual neighborhoods and communities. And if we take the work of scholars of social capital seriously, then these response rates become a map of the health of communities and neighborhoods, distinct from the measures we commonly use. If areas with a surplus of social capital are places where neighbors trust one another and are able and willing to cooperate to solve their problems collectively, then areas with low social capital are those least able to resolve their collective problems or organize effectively to seek government intervention. For sociologists and political scientists, such a map can help clarify how communities engage in collective action where mutual trust and civic engagement are rare. While it’s sometimes accepted as a truism that racially diverse communities have less social capital, with detailed data at this level of aggregation, researchers could assess how (and whether) some communities escape this fate. And by studying the ways in which communities with low social capital respond to crisis (a natural disaster, for example), we may better understand the degree to which intangible forces like social capital make communities more resilient. Perhaps best of all for the researcher, it’s rich data that’s already paid for, free for anyone to put to another use.
A few things have changed since my grad school days: the beer budget isn’t quite as tight as it used to be, and at Civis I have access to cutting-edge analytics tools, colleagues who keep raising the bar on existing techniques, and a wealth of data far beyond what I had as a student. With all these assets, we’re able to address this kind of research question with even greater precision and depth of insight. That said, we know there’s always more data out there just waiting to be put to work on the world’s biggest problems, sometimes where you’d least expect it.