This methodology describes our primary poll tracker, which we will use to follow the state of the Democratic and Republican primaries. The methodology is subject to change if we find a way to fix a flaw or make the tracker a bit less quirky; when that happens, we will let you know. In the interest of full transparency, the code behind the tracker will also be posted and kept up to date on our GitHub. Feedback on this and any other part of the site is always welcome, so please critique (or compliment!) our primary poll tracker in the comments below and we will try to respond. Our methodology proceeds in three steps, which are applied to every candidate in the race.

#### 1. Daily Poll Averages

The first step in our tracker is to determine a polling estimate and its associated uncertainty for each day. For a day with only one poll^{1}, this is easy enough: the estimate is simply the result of the poll^{2}, and the variance, a measure of uncertainty, is p(1 - p)/n, where p is the proportion estimate and n is the sample size^{3}. Things are a touch more complicated when there are multiple polls from the same day; in that case, we use a weighted average and its associated weighted variance as that day's polling estimate and uncertainty. The final case is a day with no polls. Then the variance of our polling estimate is infinite^{4}: no matter what value we give as our polling estimate, it has infinite uncertainty and therefore tells us nothing^{5}.
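
To make this step concrete, here is a minimal sketch in Python. The names `Poll`, `poll_variance` and `daily_estimate` are hypothetical, and so is the weighting: the text above only says "weighted average", so this sketch assumes inverse-variance weights (each poll weighted by n / (p(1 - p))), a common choice under which a single poll reproduces exactly the p(1 - p)/n variance described above. Days with no polls return an infinite variance.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    """Hypothetical record for one poll of one candidate."""
    share: float      # proportion supporting the candidate, e.g. 0.32 for 32%
    sample_size: int  # n, number of respondents


def poll_variance(p: float, n: int) -> float:
    """Sampling variance p(1 - p)/n of a single poll (assumes 0 < p < 1)."""
    return p * (1.0 - p) / n


def daily_estimate(polls: list[Poll]) -> tuple[float, float]:
    """Combine one day's polls into an (estimate, variance) pair.

    Assumption: polls are combined with inverse-variance weights, so more
    precise polls count for more. The tracker's actual weights may differ.
    """
    if not polls:
        # No polls today: the estimate carries no information (infinite variance).
        return 0.0, math.inf
    weights = [1.0 / poll_variance(p.share, p.sample_size) for p in polls]
    total_weight = sum(weights)
    estimate = sum(w * p.share for w, p in zip(weights, polls)) / total_weight
    # Variance of an inverse-variance weighted mean of independent estimates.
    variance = 1.0 / total_weight
    return estimate, variance


# Example: two polls released on the same day.
print(daily_estimate([Poll(0.30, 1000), Poll(0.34, 600)]))
```

With inverse-variance weights, larger polls pull the daily estimate harder and the combined variance is always smaller than that of any single poll, which matches the intuition that two polls tell us more than one.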

#### 2. The Kalman Filter

Obtaining a daily polling average is important, but our view of the current state of the race should also be informed by the state of the race on previous days. This is where the Kalman Filter comes in. The Kalman Filter is a method for estimating a value that is measured repeatedly over time with random noise. In our case, the value is the “true” proportion of support for each candidate, and the measurements are polls, each with an associated uncertainty: the variance determined in the previous step. We apply the Kalman Filter in much the same way as described in this paper. Applying the Kalman Filter gives us an estimate for every day on which there was polling. However, a single day's polls can jerk the estimate around, so the Kalman Filter results can be rather volatile. This is why we apply our next step.
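
As an illustration only, the sketch below is a one-dimensional Kalman filter for a random-walk ("local level") model: each day the true share may drift by a random step with variance `process_var`, and the daily average from step 1 is a noisy measurement of it. The state model, the function name `kalman_filter` and the `process_var`, `init_mean` and `init_var` parameters are assumptions for this sketch, not necessarily what the paper or the tracker uses. A day with infinite variance (no polls) leaves the estimate unchanged while its uncertainty grows.

```python
import math


def kalman_filter(observations, variances, process_var=1e-4,
                  init_mean=0.0, init_var=math.inf):
    """One-dimensional Kalman filter for a random-walk ("local level") model.

    observations[t]: daily polling estimate y_t for one candidate
    variances[t]:    its variance r_t (math.inf on days with no polls)
    process_var:     q, a hypothetical day-to-day drift variance of the true share
    Returns (filtered_means, filtered_variances), one entry per day.
    """
    mean, var = init_mean, init_var
    means, variances_out = [], []
    for y, r in zip(observations, variances):
        # Predict: the true share may have drifted since yesterday.
        var = var + process_var
        if math.isinf(r):
            pass  # no polls today: the prediction stands, with grown variance
        elif math.isinf(var):
            mean, var = y, r  # no prior information yet: adopt today's average
        else:
            # Update: move toward today's average by the Kalman gain.
            gain = var / (var + r)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        means.append(mean)
        variances_out.append(var)
    return means, variances_out


# Example: three days of daily averages, with no polls on the middle day.
est, unc = kalman_filter([0.30, 0.0, 0.34], [2e-4, math.inf, 2e-4])
print(est, unc)
```

The gain `var / (var + r)` is what lets a precise poll move the estimate a lot and a noisy poll move it only a little. It is also why a single day's polls can still jerk the estimate around when the prior variance is large relative to the poll variance.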

#### 3. Smoothing

In our smoothing step, we run the Kalman Filter results through a LOWESS smoothing algorithm^{6}. This is partly for aesthetic reasons (the resulting charts are a lot prettier and more readable), but it also reflects the recognition that public voting intentions are often not that volatile: voters tend to move between candidates in a trickle rather than a flood. Sure, a scandal or huge revelation may change voting intentions quickly, and our tracker may not pick up on those shifts, but more often than not changes will be gradual, which is exactly what smoothing is suited to capture. Smoothing is our final step, after which we obtain our estimates and plot our charts.
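
For readers unfamiliar with LOWESS, here is a simplified pure-Python sketch. At each day it fits a straight line to nearby points, with tricube weights that favour the closest days, and reports that line's value. The function name `lowess` and the span parameter `frac` are illustrative assumptions, and the sketch omits the robustness reweighting iterations of the full algorithm. A production version might instead use the `lowess` function in `statsmodels.nonparametric.smoothers_lowess`, which includes them.

```python
import math


def lowess(xs: list[float], ys: list[float], frac: float = 0.3) -> list[float]:
    """Simplified LOWESS: a tricube-weighted local linear fit at each x.

    frac is the (hypothetical) fraction of points used in each local fit.
    The robustness reweighting iterations of full LOWESS are omitted.
    """
    n = len(xs)
    if n == 0:
        return []
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth h: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        h = max(h, 1e-12)  # guard against duplicate x values
        # Tricube weights: near points count most, points at distance >= h count zero.
        ws = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
              for x in xs]
        sw = sum(ws)
        x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
        y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
        # Weighted least-squares line through the neighbourhood; fall back to
        # the weighted mean if the neighbourhood has no spread in x.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(y_bar + slope * (x0 - x_bar))
    return fitted


# Example: smooth a day-to-day zigzag around 30% support.
days = list(range(20))
zigzag = [0.3 + (0.02 if d % 2 else -0.02) for d in days]
print(lowess(days, zigzag, frac=0.5))
```

A larger `frac` averages over more days and gives a smoother line, at the cost of reacting more slowly to genuine shifts, the trade-off described above.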

