Friday, September 23, 2016

Why you shouldn't get too precise with prognostications these days

Yesterday, I started going through the current state of the electoral map, and talked about the fact that there are a bunch of plausible paths for Clinton to get to the magic number of 270 electoral votes, and implicitly, that there are very few ways for Trump to get that magic number.  What I didn't do was put a probability on Clinton going over the top.  Plenty of people try to compute that.  Anyone doing so is playing fast and loose with the math.  Here's why.

Imagine Clinton has a 75% chance of winning Colorado and a 75% chance of winning New Hampshire (a plausible method of getting a majority in the electoral college, as I showed yesterday).  Well, that's simple, right?  .75*.75=.5625.  Clinton would have a 56.25% chance of winning both, right?  Not so fast.  That computation is based on the premise that winning Colorado and New Hampshire are "independent events."  That would mean winning one's got nothin' to do with the other.  That ain't so.  There are nationwide factors.  But, there are local factors too.  Colorado has more Latinos than New Hampshire.  That will help Clinton more in CO than NH.  So, how do we calculate the "joint probability" of Clinton winning both CO and NH?  We need to know exactly how independent the events are.  Do we?  Fuck no.  Plenty of people have guesses, and through those guesses, they compute probabilities of victory in the electoral college for Clinton and Trump, but those are based on imputed levels of independence for the state contests, and that's all pulled from the statistician's head, which is generally up his ass, so it's actually pulled from the statistician's ass in a whole, rectal haberdashery thing.
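
To see how much the answer hangs on that unknown, here is a minimal Python sketch. It uses the standard identity for two yes/no events, P(both) = pA*pB + rho*sqrt(pA(1-pA)pB(1-pB)), where rho is the correlation between the two outcomes. The 75% figures are the hypothetical ones above; the function name and the list of rho values are made up for illustration, and nobody actually knows the right rho.

```python
import math


def joint_win_probability(p_a, p_b, rho):
    """P(A and B) for two yes/no events with win probabilities p_a and p_b
    and outcome correlation rho (rho is an assumed input, not a known quantity)."""
    joint = p_a * p_b + rho * math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    # A joint probability can never fall outside these bounds, so some
    # values of rho are simply impossible for a given pair of marginals.
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not achievable for these marginal probabilities")
    return joint


# Hypothetical correlations, chosen only to show the spread of answers.
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_both = joint_win_probability(0.75, 0.75, rho)
    print(f"rho={rho:.2f}  P(win both)={p_both:.4f}")
```

With rho at 0 you get the naive 56.25%; with rho at 1 the two states are effectively one contest and you get 75%. Everything in between depends on a number you have to guess.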

Your best guess for any one state is always whoever is ahead in the polling average in that state.  Trying to get more precise than that is a fool's errand at this point, and trying to compute joint probabilities across states is a con.  Don't be a rube.

What can we do?  We can look at scenarios.  That's what I did yesterday.  There are more plausible scenarios for Clinton.  The polls favor Clinton.  If every state goes the way its current polls say, Clinton will win.  Default guess right now, Clinton.

How confident can we be?  Fuck if I know.  This year is nuts.  Just remember:  Trump needs to run the table.  Clinton can lose a bunch of swing states and still take the White House.  That's why the odds favor Clinton by some small amount.  Right now, PredictWise gives Clinton around a 72% chance of winning, which is higher than what the poll aggregators give her.  That is probably based on a run-the-table line of reasoning, but at this point, one is on solid footing by defaulting to uncertainty anyway.

This year is nuts.


  1. It will be a super tight race. Right now, if we go by poll aggregations:

    Florida, Ohio, Nevada, Iowa, North Carolina, and that one EV in Maine go to Trump, putting him at 266 EVs to Clinton's 272.

    Basically, Clinton is in a tight squeeze. She's winning, but the margin is razor thin. If any other state tips red, Trump will be president.

    I knew this would be the case. HRC is a terribly weak candidate. Only foolishness by the Dems made her the candidate, and there is a good chance she will lose.

  2. And I will laugh long and hard if she loses (with a little bitterness tossed in, of course, knowing that Bernie would have won this race easily).

    1. Avinash, we've been through this. The political science is pretty clear on ideological extremism and vote shares. More ideologically extreme candidates get lower vote shares, and Sanders is measurably more extreme than Clinton. As for his supposed likability, nobody has ever campaigned against him. Clinton treated him with kid gloves because she knew she needed his voters in the general, and there is no reason whatsoever to think that he is the one candidate in history who is immune to the normal political forces like negative campaigning. Once Republican attacks started, he would have been Joseph Stalin come back from the grave to destroy all of humanity, and having identified himself as a socialist, he would have given the Republicans all the ammunition they needed. There is no empirical basis for assuming that his favorables would have held up under the negative campaigning that never happened to him. That's like saying I have a twig that won't burn, and you can tell because it hasn't burned yet. If nobody has ever held a flame to it, your claim isn't very convincing. (And see what I did with the "burn" thing?)

  3. We can't run models and figure out what movement is national and what movement is local?
    We can't run models to see what local movements seem to be related? When Dem polls rise in Iowa, they tend to rise almost as much in Indiana. But, polls going up in Florida have nothing to do with polls in NH. OK, so we can run models to predict who moves together; a simple factor analysis would work. Then we'd know that changes in CO are only correlated at .5 with changes in NH, and that the partial correlation is 0 after controlling for national tides.

    It's not impossible. The limited sample size for frequent state-level polling gives you an error problem as well as a lot of missing cases. Those factors might make your errors too big for the exercise to be worthwhile.

    But this post comes off strongly as suggesting that we can't do social science.

    You can impute levels of interdependence using... wait for it, you'll like this word... DATA.

    1. Hardly anti-social science. More anti-Nate Silver. In order to measure interdependence, you need more data than we have. We need good over-time data within and across states to measure how movement in CO tracks with movement in NH. The state-by-state polling is too spotty for that. At best, then, we can do Monte Carlo simulations, but those depend on standard errors within states, and at any fixed point in time, the problem is that those are too big because the polling at any one point in one state is done with a small sample, which is why we aggregate, but those aggregations build in over-time movement, so....

      Lack of data sucks, and I just don't want to pretend that our data are better than they really are. And yes, I'm being extra picky about my grammar because I know it bugs you.
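
For what the Monte Carlo approach mentioned above looks like in miniature, here is a rough Python sketch under loudly stated assumptions. Each state's polling error is modeled as a shared national swing plus an independent local swing, both normal, and `national_share` is the fraction of error variance assigned to the national part. That share is exactly the quantity the spotty state polling cannot pin down, so the values below are placeholders, not estimates. The 75% per-state win probability is the hypothetical from the post, and the function name is invented for illustration.

```python
import random
from statistics import NormalDist


def simulate_both_wins(p_each, national_share, n_sims=100_000, seed=0):
    """Estimate P(win both states) when each state is won with probability
    p_each and errors are split into a shared and a local component.

    national_share is an ASSUMED fraction of error variance that is common
    to both states (0 = fully independent, 1 = fully shared).
    """
    rng = random.Random(seed)
    # Lead, in units of total error SD, that yields a p_each win probability.
    lead = NormalDist().inv_cdf(p_each)
    national_sd = national_share ** 0.5
    local_sd = (1.0 - national_share) ** 0.5
    both = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, national_sd)  # same draw hits both states
        margin_a = lead + national + rng.gauss(0.0, local_sd)
        margin_b = lead + national + rng.gauss(0.0, local_sd)
        if margin_a > 0 and margin_b > 0:
            both += 1
    return both / n_sims


# Placeholder shares, chosen only to show how much the answer moves.
for share in (0.0, 0.5, 1.0):
    print(f"national_share={share:.1f}  P(win both)~{simulate_both_wins(0.75, share):.3f}")
```

The output runs from roughly the naive independent answer up to roughly the single-state probability as the assumed national share goes from 0 to 1. The simulation is only as good as that share and the per-state standard errors fed into it.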