What is Progressive Analytics?
Progressive analytics in the United States describes a particular set of tools and methodologies used to optimize resource allocation for a set of organizations working in domestic progressive politics. The label "progressive politics" encompasses a wide variety of groups and individuals that are trying to promote progressive ideas, values, laws, and candidates. The term progressive often describes organizations aligned with the Democratic Party, but this doesn’t necessarily have to be the case. Types of organizations active in progressive politics include candidate campaigns, party committees, labor unions, advocacy organizations, and consulting firms. What these organizations have in common is that they are engaging in a set of actions to promote a policy, a set of policies, or a candidate. Some campaigns only last for an election cycle (e.g., a presidential campaign) while others will last for decades (e.g., organized labor).
(Before I get any further, I'm going to reiterate my disclaimer that the world of progressive politics is large, and I don't have a comprehensive picture of all of it. This section is meant to give you a basic overview of what happens and how analytics fits in.)
There are lots of parts to a campaign (which, remember, can be related to an election or not): fundraising, policy, communications, member mobilization, field organizing, opposition research, digital strategy, etc. All of these parts serve the campaign’s goals, whether that is electing a candidate, getting a legislator to vote a particular way on a bill, or raising money to fund a voter registration drive. Progressive political campaigns generally have a variety of possible tactics at their disposal, from buying television ads to engaging on social media to sending volunteers out to knock on doors to remind supporters to vote. The goal of analytics work is to use data to make particular tactics as effective as possible.
Relative to other components of campaigns, analytics is a recent addition, arising in either the late 1990s or early 2000s, depending on whom you ask.1 Analytics is related to yet distinct from polling, which has been a feature of American politics since the 1950s, but we’ll get back to this point later.
The basic assumption of analytics is that we can observe things about the world, and that those observations should inform our decisions. Analytics practitioners are responsible for producing the ground truth, so that campaign leadership can make sound decisions around resource allocations: what geographic areas should receive more funding? What tactics should a campaign spend its money on? Which individuals should campaigns try to mobilize or persuade? How many volunteers do we need? How do we use their time? The goal of analytics is optimization: maximizing impact given a limited set of resources.
One of the biggest functions of progressive analytics is to optimize voter outreach, and at the heart of that effort is the voter file, a list of every registered voter in the United States along with their name, address, phone number, and previous vote history (whether or not they voted, not whom they voted for). This data is actually publicly available, sometimes for free and sometimes for a fee.2 Sometimes, this data is appended with commercial datasets, which add records of unregistered voters. Data providers will also add columns to this voter file from other sources, ranging from Census data to datasets of commercial activity. In the end, we have dozens to hundreds of characteristics (often called features) on any given individual.3 For advocacy organizations, the general voter file might not be as relevant. Instead they'll focus on their membership, but even then, that file is often also matched to the voter file to take advantage of the many features offered there.
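To make this concrete, here is a minimal sketch of what one voter file row might look like after features are appended. Every field name and value here is hypothetical, not a real vendor's schema:

```python
# A hypothetical voter file record. Core fields come from state
# registration records; the later fields are appended features from
# Census and commercial data. All names and values are invented.
voter = {
    # Core registration fields
    "voter_id": "PA-00123456",
    "name": "Jane Doe",
    "address": "123 Main St, Pittsburgh, PA",
    "phone": "412-555-0100",
    # Vote history: whether they voted, never whom they voted for
    "voted_2020_general": True,
    "voted_2021_municipal": False,
    "voted_2022_general": True,
    # Appended features
    "census_tract_median_income": 54000,
    "est_age": 34,
    "magazine_subscriber": False,
}

# Campaigns filter on these features to build target universes,
# e.g., general-election voters who skipped the last municipal election.
def lapsed_municipal_voter(v):
    return v["voted_2022_general"] and not v["voted_2021_municipal"]

print(lapsed_municipal_voter(voter))
```

The filter function is the simplest kind of targeting; the modeling described below replaces hand-written rules like this with predicted scores.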
Within an organization, there are two types of people who will interact with the data: on one end of the spectrum are data managers, whose primary role is to make sure that the data is clean and up-to-date, and on the other end are analysts, whose primary role is to actually use statistical analysis to draw insights from the data. In practice, many junior analysts and data managers will have overlapping duties and roles.4 Both of these groups can drive campaign strategy: data managers will often be more "on the ground", interacting directly with field organizers who are using the data to identify and target voters and reporting on their results, while analysts tend to be clustered at headquarters, where they may do more abstract analyses like analyzing polls or building the tools used by data managers for targeting.
The goal of using this data is to optimize resources. I'm going to use an example from a campaign context, but you can imagine how this also applies to advocacy work. Let's say that we want to convince a group of voters to vote in an upcoming municipal election. We probably have a limited number of volunteers, so we want to make sure that we send those volunteers to the voters for whom they will be most impactful. In the world before voter files, we'd probably look at precinct- or county-level turnout, provided by the state's Secretary of State. Then, we might look at precincts where fewer people turned out in the last municipal election than in the last general election and send volunteers to those precincts. But even then, some of the voters that the volunteers contact will vote in the municipal election regardless of whether they were contacted, and some of them won't vote even if they are contacted because they actually never vote. This means that you're wasting the volunteers' time by sending them to doors where they won't be effective.
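The pre-voter-file approach described above amounts to ranking precincts by turnout drop-off. A sketch, with invented precinct names and counts:

```python
# Hypothetical precinct-level turnout counts. In the pre-voter-file
# world, an analyst might rank precincts by drop-off between the
# general and municipal elections and send volunteers to the worst.
precincts = {
    "Ward 1": {"general_turnout": 900, "municipal_turnout": 300},
    "Ward 2": {"general_turnout": 800, "municipal_turnout": 650},
    "Ward 3": {"general_turnout": 700, "municipal_turnout": 250},
}

def dropoff(p):
    # Share of general-election voters who skipped the municipal election
    return 1 - p["municipal_turnout"] / p["general_turnout"]

# Rank precincts by drop-off, highest first
ranked = sorted(precincts, key=lambda name: dropoff(precincts[name]),
                reverse=True)
print(ranked)
```

Note that this ranking says nothing about *which* doors in Ward 1 are worth knocking, which is exactly the inefficiency that individual-level modeling addresses.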
In order to increase efficiency, analytics practitioners will often try to optimize resources at the individual rather than county or precinct level. The way that they do this is through modeling, which uses the voter file to generate individual-level scores ranging from 0-100 (called models) to predict things like the likelihood of turning out to vote in a particular election or the likelihood of supporting a particular candidate or policy. Then, they can use these scores to decide which voters to encourage to vote (an effort often called get out the vote, or GOTV); which ones to target for persuasion efforts; or which ones to mobilize in specific ways, like calling legislators or showing up for a protest. By using these individual-level scores, organizations don't have to target particular counties or precincts: they can target only the individuals of interest. Modeling can also include election forecasting, taking in public and private polling to project out election results.
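The scoring step can be sketched as a toy logistic model; the coefficients and feature names below are entirely made up for illustration and don't reflect any real campaign's model:

```python
import math

# Invented coefficients for a toy turnout model over binary
# voter-file features. Real models are trained on historical data;
# these weights are purely illustrative.
WEIGHTS = {"voted_last_general": 2.0,
           "voted_last_municipal": 1.5,
           "age_over_50": 0.5}
INTERCEPT = -1.0

def turnout_score(features):
    """Return a 0-100 turnout score from binary features."""
    z = INTERCEPT + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    prob = 1 / (1 + math.exp(-z))  # logistic function
    return round(100 * prob)

frequent = {"voted_last_general": 1, "voted_last_municipal": 1,
            "age_over_50": 1}
rare = {"voted_last_general": 0, "voted_last_municipal": 0,
        "age_over_50": 0}

print(turnout_score(frequent), turnout_score(rare))
```

A field organizer pulling a GOTV list would then filter on these scores, e.g., taking likely supporters whose turnout score falls in some middle band, since very high scorers will vote anyway and very low scorers likely won't.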
How do analytics practitioners know which people to target? In some cases, they can look at past behaviors, like voting. One of the biggest predictors of whether or not someone will vote in the future is whether or not they’ve voted in the past. When it comes to predicting support for a candidate or policy, though, analytics will often rely on polling.
Polling has been a central component of campaigns since the 1950s, although polling has typically referred to strategy polling rather than analytics polling. Strategy polling refers to the tradition of pollsters being viewed as strategic partners on campaigns, guiding the development of messaging and, to a much lesser degree, targeting. Strategy pollsters typically use long surveys on small samples (i.e., 400-1000 people) to perform deep dives on the electorate. This approach enables simulating the campaign environment, complete with overlapping messages that are pro- and con- towards both the candidate and the opponent.5 The results of these polls are collected in toplines, which represent the main results, and banner books, which present cross-tabulations of the results by demographic group. Together, these tools allow pollsters to present macro-level advice about the direction of the campaign and how it should move forward.
Analytics polling is similar to but distinct from strategy polling. Typically, analytics polls involve shorter polls and larger sample sizes.6 These poll results are often appended to the voter file, and using these results, analysts can build predictive models of which candidates or policies individuals are likely to support. Analytics polls are performed at regular intervals throughout the campaign and used in modeling, election forecasting, and tracking. You’ll generally see fewer analytics polls than strategy polls reported on in the media, both because there are fewer analytics firms and because most of their work is purchased by campaigns and IE (independent expenditure) groups directly rather than produced for public consumption. In contrast to strategy polls, analytics polls will rarely try to simulate a campaign environment, both to reduce the length of the poll (which increases representativeness and reduces costs) and because, philosophically, it’s questionable whether phone polls can actually simulate a campaign environment and accurately measure changes to it.
Obviously, the question of how to effectively change voters’ beliefs and behaviors is one of the most important questions of any campaign: how would voters respond if they received some type of intervention, e.g., seeing a particular TV ad or listening to a particular message, and how many resources should a campaign allocate to those actions? But simply asking people in a phone poll which message is most persuasive often isn’t enough. The things that people identify as the most persuasive or resonant may not actually be what persuades them, both because people lie and because persuasion can be more subtle than individuals can identify.
As a result, when analytics practitioners try to answer these types of questions about the effect of campaign actions, they’ll typically run large experiments on actual voters. Just like in a medical trial, they’ll randomly assign individuals to receive certain types of interventions and then measure the impact of those interventions in order to provide strategic guidance about what types of tactics to use. These tests often take place "in the field", which is to say that rather than exposing people to a message on the phone, analysts might actually send mail pieces to people’s doors and then run a phone survey on people assigned and not assigned to receive mail.
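The logic of such an experiment can be sketched with simulated data. Everything here is invented: the universe size, the baseline turnout rate, and the five-point treatment effect exist only to illustrate random assignment and a difference-in-means estimate:

```python
import random

random.seed(0)  # deterministic simulation for illustration

# A simulated universe of 10,000 voters; half randomly assigned
# to receive a (hypothetical) mail piece.
voters = list(range(10000))
treatment = set(random.sample(voters, 5000))

def voted(voter_id):
    # Simulated outcome: 40% baseline turnout, +5 points if treated.
    # In a real experiment this would come from the voter file after
    # the election, not from a simulation.
    base = 0.40 + (0.05 if voter_id in treatment else 0.0)
    return random.random() < base

outcomes = {v: voted(v) for v in voters}

treat_rate = sum(outcomes[v] for v in treatment) / len(treatment)
control = [v for v in voters if v not in treatment]
ctrl_rate = sum(outcomes[v] for v in control) / len(control)

# Because assignment was random, the difference in turnout rates
# estimates the average effect of the mail.
print(f"estimated effect: {treat_rate - ctrl_rate:+.3f}")
```

In practice the analysis also involves standard errors, pre-registration, and cost-per-vote calculations, but random assignment plus a comparison of group means is the core of the design.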
Historically, analytics has mainly fallen into two big camps: digital and field. Digital analytics typically involves optimizing fundraising and messaging; it encompasses everything from online petition signing to the e-mail list to the Twitter feeds. Field analysts look at, well, everything else, ranging from volunteer recruitment to voter outreach. Digital analysts often perform a lot of testing on things like messaging and layout in order to optimize behavior. Compared to other practitioners, digital analysts often see a faster turnaround time on the things that they’re analyzing or testing. Field analysts, by contrast, are often guiding member, volunteer, or voter outreach behavior in the "real world". As a result, any tests that they run will take weeks rather than days, and they may never be able to fully measure the output of their results. The item that falls between these two groups is often media spending (think TV budgets), which can be handled by either team or sometimes another team entirely. At any rate, these are mostly organizational issues: analysts will often move back and forth to different types of analyses, but the core analytical skills will remain the same.
In large part because the field is so new, analytics is constantly evolving and growing. A good analytics team that is used well will touch almost every part of a campaign’s operations. As a result, analysts rely on the work of many other teams, ranging from the technology team, which writes software to manage databases and other workflows, to the field organizing team, which uses the models that analysts produce in order to create lists of voters (often called pulling lists) for targeting. The role of an analyst is to set up systems through a combination of programming and statistics to answer a campaign’s key questions quickly, reliably, and at scale.
1. There is a fashionable, but incorrect, belief that analytics started with the 2008 or 2012 Obama for America campaign. While this might have been one of the best-funded analytics operations, it certainly wasn’t the first. FiveThirtyEight has an excellent overview on the use of data in political campaigns. ↩
2. You can buy the Pennsylvania voter file, for example, in about 15 minutes with a credit card and twenty bucks. The fee for other states ranges from free to tens of thousands of dollars. ↩
3. Okay, that was actually a fairly dramatic oversimplification of the long, messy, incredibly complicated topic of identity resolution. Voter files are actually sort of absurdly annoying to compile. Think about it: every single state has its own way of compiling information and uses different standards in doing so. People move across states all the time, and resolving duplicated entries and tracking people across time is a decidedly non-trivial task. Luckily, most analysts will never have to deal with this process, leaving it to the professionals at organizations specializing in voter files, like Catalist, TargetSmart, and the DNC. ↩
4. This document is more focused on the analysts’ roles in campaigns, although much of the advice applies to data managers, especially at the more junior levels. ↩
5. Most of the polls that you find in the media are from strategy polling firms. Typically, if a poll has more than 10 questions on it, you can be fairly certain that it’s from a strategy polling firm. ↩
6. Methodologically, both analytics and strategy polling use voter files to draw samples, although strategy pollsters will occasionally also use random digit dialing. In general, strategy polling tends to hew more closely to traditional, academic survey research, whereas analytics pollsters tend to use more modern techniques, such as drawing samples on the basis of turnout propensity (rather than using a likely voter screen) and in inverse proportion to response likelihood. ↩