Monday, July 11, 2016

Checking in on the polls: why you shouldn't ignore the biased polls

Well, time to check in on the polls. Over at RealClearPolitics, HRC still seems to have around a 4 to 5 point lead.  That seems to be pretty stable.   It seems to me I said that this would be a stable campaign long ago.   Like, back in April.  Maybe I actually know what I'm talking about.

Anyway, remember my standard advice.  Look at the polling averages, not any one poll.  Still, let's take a moment to look at the variation.  Scroll down, and you will see a lot!  You will see a bunch with Trump in the lead.  Now, take a moment to notice which organizations did those polls.  Most of the time, the organizations whose polls put Trump ahead are either Fox or Rasmussen.

Take a moment to deal with the shock that Fox's polls are overly optimistic for Republicans.  Breathe.  I know your world view is collapsing, but it will be OK.  I promise.

Now who the fuck is Rasmussen?  That Matt Frewer character from Star Trek: The Next Generation?  Yeah, my sci-fi geekery digs deep.  You probably haven't heard of them before.  I have.  Why?  Because I've been obsessed with this polling shit for years.  And for years, I've seen their polls be overly optimistic towards Republicans.  So let's briefly explore how polls get biased results.

There are two main ways a poll can pick up a bias:  sampling bias and the "likely voter screen."

Sampling bias is almost what it sounds like.  We would always like a random sample.  We can never actually get it.  Cell phones, low response rates, all of that crap can make a survey organization's job miserable.  So, they get nonrandom samples, and have to find a way to correct for the bias.  The answer is to weight the sample.  If they oversample one population, then they give that population less weight in the analysis.  If they undersample another, then they give it more weight.  That process is hard, and it is an easy place to insert your own personal biases because there are arguments to be made about the correct weights.
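To make the weighting step concrete, here's a toy sketch with made-up numbers (no real poll works from a sample of four, and the population shares are invented for illustration). The idea: if a group is 20% of the electorate but only 10% of your sample, each respondent in that group gets weight 0.20/0.10 = 2.

```python
# Toy post-stratification weighting sketch. All numbers are hypothetical.
# Each respondent gets weight = population share / sample share for their group.

population_share = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}  # assumed targets

sample = [
    {"age": "18-29", "candidate": "Clinton"},
    {"age": "30-64", "candidate": "Trump"},
    {"age": "30-64", "candidate": "Clinton"},
    {"age": "65+",   "candidate": "Trump"},
]

# Shares of each age group in the raw sample.
counts = {}
for r in sample:
    counts[r["age"]] = counts.get(r["age"], 0) + 1
sample_share = {g: c / len(sample) for g, c in counts.items()}

# Weight: groups we oversampled get weight < 1, undersampled groups get > 1.
weights = {g: population_share[g] / sample_share[g] for g in counts}

# Weighted candidate totals and shares.
totals = {}
for r in sample:
    totals[r["candidate"]] = totals.get(r["candidate"], 0) + weights[r["age"]]
share = {c: t / sum(totals.values()) for c, t in totals.items()}
```

Notice that the whole result hinges on those `population_share` targets, which the pollster has to choose. Argue for different targets and you get a different horse race from the very same interviews.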

The even harder issue is the likely voter screen.  How do you figure out who is likely to vote?  Stated vote intentions?  Past voting patterns?  Demographics?  Your own assumptions about campaign mobilization?  You begin to see the problems, and how biases can work in.
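Here's a toy version of what a screen amounts to (the scoring rule and cutoff are invented, not any real pollster's model): score each respondent on stated intent and past turnout, keep the ones above a threshold.

```python
# Toy likely-voter screen. The scoring and cutoff are hypothetical.

def likely_voter(resp, cutoff=2):
    """Keep a respondent if their 'likelihood' score clears the cutoff."""
    score = 0
    if resp["says_will_vote"]:
        score += 1
    # How many of the last two elections they actually voted in (0-2).
    score += resp["past_elections_voted"]
    return score >= cutoff

respondents = [
    {"says_will_vote": True,  "past_elections_voted": 2, "candidate": "Clinton"},
    {"says_will_vote": True,  "past_elections_voted": 0, "candidate": "Trump"},
    {"says_will_vote": False, "past_elections_voted": 2, "candidate": "Trump"},
]

screened = [r for r in respondents if likely_voter(r)]
# Raise the cutoff, or weight stated intent more heavily than past turnout,
# and a different set of respondents survives the screen.
```

Every knob in that function is an assumption about who actually shows up, and turning any one of them changes the topline number.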

Different survey organizations have their own assumptions about the correct weighting procedures and the correct likely voter screen.  Those lead to different results, and those results are driven by the organizations' own biases.

So who is right?

WE HAVE NO FUCKING CLUE!  That's why we look at the polling average.  And that's why you don't get to ignore Fox, Rasmussen, or any other poll that you don't like, nor focus on them just because you like them.

One of the most important rules of science is that you don't get to throw out data points just because you don't like them.

I don't care how much you don't like them.

Quit whining.

Data are sacred.

Yes, I mean "are."  Data:  plural, as in, "many."  Datum: singular.

Every polling organization uses different models.  By taking an average, the biases cancel out.  That's the point.  If you throw out the ones whose biases you don't like, you are missing the point.  Yes, Fox and Rasmussen have biased polls.  But, that Reuters poll with Clinton up by 11 is biased too.  The point is that the Rasmussen poll with Trump up by 2 cancels out the Reuters poll with Clinton up by 11, bringing us back to the stable average.
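As plain arithmetic, using the two margins above plus some made-up fillers (positive numbers are Clinton leads):

```python
# Clinton-minus-Trump margins. The 11 (Reuters-style) and -2 (Rasmussen-style)
# come from the examples above; the other four are invented for illustration.
margins = [11, -2, 5, 4, 6, 3]

average = sum(margins) / len(margins)  # the outliers pull against each other
```

The +11 and the -2 are both "wrong" on their own, but together they tug the average back toward the middle, which is the whole reason you keep them in.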

Look at the polling average, not any one poll.  And understand that the biased polls are important too.  They are data.  We need those.
