This methodology describes our primary poll tracker, which will be used to track the state of the Democratic and Republican primaries. This methodology is subject to change if we are able to fix a flaw or make it a bit less quirky, and when this occurs, we will be sure to alert you of the change. Additionally, in the interest of full transparency, the code behind this tracker will be posted and updated on our GitHub. Feedback on this and any other part of the site is always welcome, so please critique (or compliment!) our primary poll tracker in the comments below and we will try to respond. Our methodology proceeds in several steps, which are applied to every candidate in the race.
1. Daily Poll Averages
The first step in our tracker is to determine a polling estimate and its associated uncertainty for each day. For a day in which there is only one poll[1], this is easy enough: the estimate is simply the result of the poll[2], and the variance, our measure of uncertainty, is p*(1-p)/n, where p is the proportion estimate and n is the sample size[3]. Things are a touch more complicated if there are multiple polls from the same day. In that case, we use a weighted average and its associated weighted variance as that day's polling estimate and uncertainty. The final case is a day on which we have no polls. With no polls, the variance of our polling estimate is infinite[4], meaning that no matter what we say our polling estimate is, it carries infinite uncertainty and therefore tells us nothing[5].
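As a rough sketch of this step in Python: the post does not state the exact weighting scheme, so weighting each poll by its sample size is an assumption here, as are the function and parameter names.

```python
def daily_estimate(polls, floor_p=0.03):
    """Combine one day's polls into a single (estimate, variance) pair.

    `polls` is a list of (p, n) tuples: proportion of support and
    sample size. Weighting by sample size is an assumption; the
    tracker's actual weights may differ.
    """
    if not polls:
        # No polls: the estimate is undefined and the variance infinite.
        return None, float("inf")
    total_n = sum(n for _, n in polls)
    # Sample-size-weighted average of the day's polls.
    p_bar = sum(p * n for p, n in polls) / total_n
    # Per footnote 3: when p = 0, compute the variance as if p = 0.03.
    p_for_var = max(p_bar, floor_p)
    variance = p_for_var * (1 - p_for_var) / total_n
    return p_bar, variance
```

For a single poll this reduces to the p*(1-p)/n formula above; for a no-poll day it returns an infinite variance, matching footnote 5's description.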
2. The Kalman Filter
Obtaining a daily polling average is important, but our view of the current state of the race should also be informed by the state of the race in previous days. This is where the Kalman Filter comes in. The Kalman Filter is a method for estimating a value that is measured over time with random noise. In our case, the value is the “true” proportion of support for each candidate, and the measurements are polls, each with an associated uncertainty: the variance determined in the previous step. We apply the Kalman Filter in much the same way described in this paper. The filter then gives us an estimate for every day on which there was polling. However, a single day’s polls can jerk the estimate around, so the results of the Kalman Filter can be rather volatile. That is why we apply our next step.
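A one-dimensional filter of this kind can be sketched as follows. The process-noise value `q`, the initial state, and the random-walk model are illustrative assumptions; they are not the tracker's actual parameters.

```python
def kalman_track(measurements, q=1e-4, init_p=0.25, init_var=1.0):
    """Run a 1-D Kalman filter over daily poll estimates.

    `measurements` is a list of (estimate, variance) pairs, one per
    day; days with no polls carry variance = float('inf'). The state
    follows a random walk: each day, process noise `q` is added to
    the state variance before any poll is incorporated.
    """
    x, v = init_p, init_var
    filtered = []
    for z, r in measurements:
        v += q                     # predict: uncertainty grows each day
        if r != float("inf"):      # update only when a poll exists
            k = v / (v + r)        # Kalman gain: trust in the new poll
            x = x + k * (z - x)    # pull the estimate toward the poll
            v = (1 - k) * v        # the update shrinks our uncertainty
        filtered.append(x)
    return filtered
```

Note how an infinite-variance day leaves the estimate untouched, which is the statistically equivalent implementation footnote 5 alludes to.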
3. Smoothing
In our smoothing step, we take the Kalman Filter results and run them through a LOWESS smoothing algorithm[6]. This is partially for aesthetic reasons (the resulting charts are a lot prettier and more readable), but it is also a recognition that public voting intentions are usually not that volatile. Rather, people change between candidates in a trickle rather than a flood. Sure, a scandal or huge revelation may quickly change voting intentions, and our tracker may not pick up on those, but more often than not, changes are gradual, and smoothing reflects that. Smoothing is our final step, after which we obtain our estimates and plot our charts.
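A rough sketch of a LOWESS-style smoother over the 10 nearest estimates, per footnote 6. For brevity this uses a tricube-weighted local mean rather than the local linear regression of classic LOWESS, so it is a simplified illustration of the idea, not the tracker's implementation.

```python
def lowess_smooth(xs, ys, k=10):
    """Smooth (xs, ys) by a tricube-weighted mean of the k nearest points.

    For each point, the k nearest neighbours (by x-distance) are found
    and averaged with tricube weights, so closer points count for more.
    k=10 mirrors footnote 6; classic LOWESS would fit a local line.
    """
    smoothed = []
    for x0 in xs:
        # Indices of the k nearest neighbours of x0 (including itself).
        order = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x0))[:k]
        d_max = max(abs(xs[j] - x0) for j in order) or 1.0
        # Tricube weights: weight falls smoothly to 0 at the farthest point.
        w = {j: (1 - (abs(xs[j] - x0) / d_max) ** 3) ** 3 for j in order}
        total = sum(w.values()) or 1.0
        smoothed.append(sum(w[j] * ys[j] for j in order) / total)
    return smoothed
```

A constant series passes through unchanged, while a one-day spike is pulled down toward its neighbours, which is exactly the volatility-damping behaviour described above.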
Footnotes
1. We consider the date of a poll to be its ending date.
2. The tracker does not currently discern between Registered and Likely Voters when determining uncertainty.
3. The exception is when p=0, which would normally imply a variance of zero, but this is nonsense in the context of polls. Instead, we say the uncertainty is the same as it would be if p=0.03. For example, if a sample of 500 comes out with zero support for candidate X, we say that candidate X's polling estimate is zero, but with a variance of (0.03)*(1-0.03)/500. Right now, this is a little messy, but since 0% is statistically the same as 3% in most polls, it is not entirely naive.
4. This is not quite right because a polling estimate is bounded by 0 and 1, but for the sake of simplicity, we call the variance infinite.
5. This is not exactly how it is implemented in the program, but the implementation is statistically equivalent to this understanding.
6. Our smoothing algorithm considers the closest 10 estimates when smoothing.