BallotCoverage: Finding the Minimal Set of Presidential Candidate Surnames
I recently came across a comment that said:
The 2024 election is the first since 1976 without a Bush, Clinton, or Biden on the ballot
This is interesting but a bit superficial. It counts multiple people with the same surname (like ‘Bush’ and ‘Clinton’) and includes running mates who later ran for president. It got me thinking: what’s the smallest set of surnames that covers every election since earlier years? If it’s just three for 1976-2020, the number for all elections going back to 1789 might be surprisingly small.
This is a bit like the birthday problem, where a small group has a high chance of shared birthdays. Here, it’s less about political dynasties and more about the overlap of surnames across elections.
To explore this, I created a Python script called BallotCoverage.
The Problem
The goal is to find the smallest set of surnames that includes at least one candidate on every ballot since a specific year. At first, it sounds straightforward, but there’s a lot of work to be done. As you go further back in time, new names are added, and you must reconsider all previous elections.
The challenge is that as you include more elections, the number of possible surname combinations skyrockets, making it impossible to brute-force in a reasonable time. That’s where algorithms come into play.
How the Script Works
I started out with just a greedy algorithm. The greedy algorithm takes the “biggest” name first (the one that appears on the most ballots) and keeps adding names until every election is covered.
First, I made a quick dataset that had all of the nominees and running mates for each election (that had them — before 1800, the office of the vice president was just the runner-up to the election).
The script gathers all the candidate surnames from the relevant elections, and for each surname, the script tracks which elections they appeared in by using a bitmask. Each bit in the mask represents an election, and it’s set to 1 if the surname was in that election:
# Create a bitmask for each candidate showing which elections they appeared in
candidate_elections = {candidate: 0 for candidate in all_candidates}
for i, election in enumerate(relevant_elections):
for candidate in election['candidates']:
candidate_elections[candidate] |= (1 << i)
Once we have this bitmask, the greedy algorithm chooses the surname that covers the most uncovered elections. It starts with the surname whose mask has the most 1’s (i.e., covers the most elections), and then it removes those elections from consideration. This continues until all elections are covered:
# Greedy algorithm: pick names that cover the most elections first
greedy_solution = []
uncovered = all_elections_mask
for candidate in sorted_candidates:
if uncovered & candidate_elections[candidate]:
greedy_solution.append(candidate)
uncovered &= ~candidate_elections[candidate]
if uncovered == 0:
break
At each step, the algorithm adds the candidate covering the most uncovered elections, updating the set of uncovered elections until all are covered.
This approach works quickly and often gives a good result, but it’s not guaranteed to find the smallest set of names.
Finding a More Optimal Result
To get the optimal result, I added an exhaustive search that checks every possible combination of candidates. While this guarantees the smallest set, it can be significantly slower.
The script combines each candidate’s elections using a bitmask and checks if they cover all elections. If a combination works, it’s saved as the best solution.
def covers_all_elections(candidate_set):
coverage = 0
for candidate in candidate_set:
coverage |= candidate_elections[candidate]
if coverage == all_elections_mask:
return True
return False
By starting with the greedy solution, the exhaustive search only checks combinations up to its size, making the process more efficient. For older elections, exhaustive searches can take hours, so I added a 30-second timeout to return the best solution found within that limit.
Results and observations
Running the script from 1976 (up to 2024), I found this:
Since 1976, at least one of these surnames has appeared on every presidential ballot (in chronological order of first appearance): Dole, Bush, Biden, or Trump.
This was interesting because Dole isn’t often discussed in this context. The original quote that sparked this project mentioned Clinton instead, but when you include the 2024 candidates, Clinton isn’t an optimal name due to the overlap with Trump in the 2016 election.
For 1789, I get differing solutions for each run, because of the variance in the greedy algorithm. I don’t have the patience to run the exhaustive check, but you could by making minor changes to the script. Here is one of the greedy solutions:
Since 1789, at least one of these surnames has appeared on every presidential ballot (in chronological order of first appearance): Adams, Clinton, King, Tompkins, Clay, Van Buren, Johnson, Harrison, Fillmore, Grant, Wilson, Wheeler, Arthur, Cleveland, Stevenson, Bryan, Roosevelt, Hoover, Truman, Nixon, Mondale, Bush, Biden, or Trump.
Notice that Clinton is back on the list, due to the somewhat forgotten figures George Clinton and DeWitt Clinton in the early 1800s.
In the end, it was a fun script to work on, and it has made me interested in learning more about efficient algorithms. Before this problem, it was easy for me to assume modern processors can brute-force a problem like this in milliseconds. But once you start looping through trillions of combinations, it becomes clear that clever approaches make a big difference. There’s probably an approach that exists that could find the canonical solution to the 1789 lookback in a reasonable time, so I’m interested in learning more to see how it can be improved.
Feel free to download the script from GitHub and run it with different input years to see how the results evolve.