When Nobody Picks Up…

Wherein I describe the challenges presented by having polling response rates in the gutter.

It’s an election year, which means we’re all looking at polling to gauge what, exactly, is going on. I’m not even going to go there.

But what I do want to write about is a situation the polls find themselves in: their response rates are terrible. I mean terrible. ’Round about the Obama/McCain election in 2008, pollsters were fretting that their response rates were down around 30%. This year, for the pollsters that are actually bothering to publish such numbers, response rates seem to be around or under 1%.

For various reasons, people just don’t respond to polls anymore. Smartphones are really good at flagging spam calls and texts, and basically nobody picks up a call from an unknown number anymore because of all the scams and spam floating around. Some polling firms have gone so far as to advertise in mobile games, paying players to fill out polls. And this has a profound impact on what, exactly, the polls are measuring.

Any sort of polling is meant to figure out what, among several options, the population is going to pick. Pepsi vs Coke. McD’s vs Burger King. Toyota vs Ford. Basically, what is the probability of a random member of the population picking Candidate 1 vs Candidate 2 (or more, but for the sake of not making this a big ugly mathematical exercise that obfuscates rather than elucidates the point, we’ll stick to two and only two choices).

Thus, we are trying to infer $P(C1)$ and $P(C2)$ by sampling from the population.

Now, when we inquire about something, whether it’s an email to a customer, or a product review, or political polling, we can only measure the responses that we get. So if we made 100 calls, 100 people picked up, and 50 said $C1$ and 50 said $C2$, we can probably have pretty good confidence in those numbers, insofar as we are confident that our 100 calls are a representative sample.

However, usually we don’t get 100 people to pick up. We get 30. Or, with this year’s political polling, we get 1. So what we are actually measuring are the probabilities of selecting Candidate 1 or Candidate 2 given that someone responded to the poll. That is to say, in probability terms, we are measuring $P(C1~ | ~R)$ and $P(C2~ | ~R)$.

Bayes’ Theorem tells us a little something about this, in particular:

$$P(C1~ | ~R) = \frac{P(R~ | ~C1) P(C1)}{P(R)}$$

We want to know the probability of support for a given candidate, $P(C1)$, but what we can measure is the probability of support for a candidate given a response, $P(C1~ | ~R)$, and the overall probability of a response, $P(R)$. The missing ingredient is $P(R~ | ~C1)$: how likely someone is to respond given their candidate preference.

Now, as $P(R) \rightarrow 1$, $P(R~ | ~C1) \rightarrow 1$, because all these parameters are constrained by the Law of Total Probability:

$$P(R) = P(R~ | ~C1) P(C1) + P(R~ | ~C2) P(C2)$$
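To make this concrete, here’s a minimal numeric sketch in Python (the support levels and response rates are made-up illustrative values, not real polling data):

```python
# Toy example: true support and hypothetical per-candidate response rates.
p_c1, p_c2 = 0.52, 0.48          # true support (assumed; this is what we can't observe)
p_r_c1, p_r_c2 = 0.011, 0.009    # response rates by candidate preference (assumed)

# Law of Total Probability: the overall response rate.
p_r = p_r_c1 * p_c1 + p_r_c2 * p_c2

# Bayes' Theorem: the shares the poll actually measures.
p_c1_r = p_r_c1 * p_c1 / p_r
p_c2_r = p_r_c2 * p_c2 / p_r

print(f"True margin:     {p_c1 - p_c2:+.3f}")      # +0.040
print(f"Measured margin: {p_c1_r - p_c2_r:+.3f}")  # ~ +0.139, wildly exaggerated
```

Even a tiny difference in the two response rates (0.2 points here) blows the measured margin far past the true margin, because the overall response rate in the denominator is so small.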

Now, if $P(R~ | ~C1) = P(R~ | ~C2)$, the Law of Total Probability forces both to equal $P(R)$, Bayes’ Theorem reduces to $P(C1~ | ~R) = P(C1)$, and we can figure out $P(C1)$ and $P(C2)$ pretty easily. But what if there’s a difference in response rates based on support for Candidate 1 versus Candidate 2?

Let’s say that the conditional response probabilities are close enough to the overall response rate that we can write them as

$$P(R~ | ~C1) = P(R) + \varepsilon_1$$

and similarly $P(R~ | ~C2) = P(R) + \varepsilon_2$ for Candidate 2. Then the measured polling margin will take the form

$$P(C1~ | ~R) - P(C2~ | ~R) = P(C1) - P(C2) + \frac{\varepsilon_1 P(C1) - \varepsilon_2 P(C2)}{P(R)}$$

and furthermore, because these asymmetries have to be consistent with the response rate, we have that

$$\varepsilon_1 P(C1) + \varepsilon_2 P(C2) = 0$$

(I leave this as an Exercise to the Reader™). Substituting $\varepsilon_2 P(C2) = -\varepsilon_1 P(C1)$ into the expression above, the measured polling margin between two candidates is as follows:

$$P(C1~ | ~R) - P(C2~ | ~R) = P(C1) - P(C2) + \frac{2 \varepsilon_1 P(C1)}{P(R)}$$

To put some numbers to this, in a dead-even poll ($P(C1) = P(C2) = 0.5$) the bias term reduces to $\varepsilon_1 / P(R)$, so a 1% response rate will show a 1 point margin for every 0.01% of response-rate asymmetry $\varepsilon_1$. By contrast, a 30% response rate will show a 1 point margin only for every 0.3% of asymmetry, and is thus much less sensitive to any difference in response rates between the two groups.
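Here’s a quick sketch of that sensitivity, sweeping a few made-up asymmetry values through the margin formula above at both response rates:

```python
# Measured margin in a dead-even race (P(C1) = P(C2) = 0.5), using the
# bias term from above: P(C1|R) - P(C2|R) = 2 * eps1 * P(C1) / P(R),
# since the true margin is zero. The eps1 values are illustrative.
def measured_margin(eps1: float, p_r: float, p_c1: float = 0.5) -> float:
    return 2 * eps1 * p_c1 / p_r

for p_r in (0.30, 0.01):
    for eps1 in (0.0001, 0.001, 0.003):
        print(f"P(R)={p_r:.0%}  eps1={eps1:.2%}  "
              f"measured margin = {measured_margin(eps1, p_r):+.1%}")
```

At a 30% response rate the measured margin barely budges; at 1%, the same asymmetries produce margins of 1, 10, and 30 points in a race that is actually tied.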

So the problem with a low response rate becomes very clear.

The polling margin can be measured pretty reliably so long as the asymmetry in polling response rate is small compared to the overall polling response rate. As the polling response rate drops, we are increasingly measuring the asymmetry in response rates, not the actual margins.

This is actually pretty well-known from on-line reviews, where the so-called “J-shaped distribution” is extremely common — product reviews tend to either be 1 star or 5 star, with very little in the middle. It’s very hard to get around this because there is such a strong under-reporting bias for people who thought a product was “fine” versus “terrible” or “the best thing evar”. So is your product any good? It’s hard to say when you’re relying on customer reviews that are voluntarily left by people who would usually rather do something else.

The problem here is that the conditional response rates are pretty much impossible to measure, because to measure them you have to have an unbiased way of measuring whether someone would reply to your poll, which is probably another poll that will show the same response biases as any other poll. This is why empirical scientists spend so much time calibrating their equipment against well-established data, so they can deal with these sorts of systematic biases in their measurement systems.

For polling or consumer sentiment, there’s no real solution to this, so it’s just an important thing to keep in mind when interpreting low-response-rate polling data. Perhaps a reasonable question to answer is “what response bias would be consistent with these results returning no difference among the candidates?”
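Inverting the margin formula gives one way to ask that question. A sketch (again with made-up example numbers):

```python
# Given an observed margin and a response rate, how big a response-rate
# asymmetry eps1 would fully account for that margin in a truly tied race?
# From  margin = 2 * eps1 * P(C1) / P(R)  with P(C1) = 0.5:
def tie_consistent_eps1(observed_margin: float, p_r: float) -> float:
    return observed_margin * p_r

# Example: a 3-point lead at a 1% response rate...
print(f"{tie_consistent_eps1(0.03, 0.01):.4%}")  # 0.0300% -- tiny
# ...versus the same 3-point lead at a 30% response rate.
print(f"{tie_consistent_eps1(0.03, 0.30):.4%}")  # 0.9000% -- much larger
```

At a 1% response rate, an asymmetry of three-hundredths of a percentage point is enough to manufacture a 3-point lead out of a dead heat; at 30%, it would take an asymmetry thirty times larger.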

It’s important to keep in mind that any biases that appear in response rates to any sort of polling can create issues with interpreting preference, since you’re inferring a population-level sentiment with a sample that may have a response bias. This problem gets worse as the response rate gets smaller, as small asymmetries in response rate can dominate the measurement.
