
GitHub - brandtg/trump-data: Collection of data from Donald Trump's 2016 presidential campaign



Each JSON file has three columns:

Tweet data prepared using


First download the data

mkdir -p /tmp/speeches
curl "$SPEECHES" | OUT=/tmp/speeches ./bin/

Then clean it

mkdir -p /tmp/cleaned/speeches
for file in `ls /tmp/speeches`; do
  echo "$file"
  ./bin/ < "/tmp/speeches/$file" > "/tmp/cleaned/speeches/$file"
done
pushd /tmp/cleaned/speeches
rename "s/html/json/" *.html

And repeat for any other candidate / content type you want. See:


After Trump secured the nomination, the campaign relied only on speeches

Whereas Clinton had a more even distribution of speeches, press releases, and statements

Pre-nomination, the Trump campaign showed a more even distribution among these methods (with speeches notably underrepresented), so there must have been some perceived or real advantage to switching entirely to speeches.

This shows the lexical dispersion plot for several phrases in all of Trump's speeches concatenated together
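A lexical dispersion plot marks each position in the token stream where a term occurs. The sketch below is not the code used to produce the plot; it only shows how the underlying offsets could be computed, assuming a simple regex tokenizer and single-word terms. The `sample` text and the helper names `tokenize` and `dispersion_offsets` are hypothetical. With NLTK installed, `nltk.text.Text(tokens).dispersion_plot(words)` draws the same kind of plot directly.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def dispersion_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    wanted = {t.lower() for t in targets}
    offsets = defaultdict(list)
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    # Return every target, including those that never occur.
    return {t.lower(): offsets.get(t.lower(), []) for t in targets}


# Hypothetical stand-in for all speeches concatenated together.
sample = "Jobs and trade. We will bring back jobs. ISIS is a threat. Trade deals cost jobs."
tokens = tokenize(sample)
offsets = dispersion_offsets(tokens, ["jobs", "trade", "isis"])

for term, positions in offsets.items():
    print(term, positions)
```

Plotting each list of positions along a shared x-axis, one row per term, gives the dispersion plot: clusters of marks show bursts, and evenly spaced marks show steady references.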

The bursty, highly focused pattern seen with immigration and ISIS might have helped cement opinions on these topics; the subsequent steady references could then easily recall that intensity through availability bias.

Also, the heavy focus on jobs and trade - which are less abstract than the economy - is interesting, since these things can be felt viscerally (e.g. losing manufacturing jobs to China from outsourcing vs. GDP changing by X%).

Maybe most notably, Clinton is mentioned by name with the highest frequency of any of these terms, which suggests a primarily antagonistic approach.

This is further supported by the fact that the distribution of these terms is notably sparse among tweets, with the exception of the names (or nicknames) of rival politicians

Clinton delivered roughly five times as much content in her speeches (525128 words after cleaning, vs. Trump's 106229), and the distribution of the length of each candidate's chosen words was roughly the same

As word length is at least some measure of complexity, this suggests that both candidates were calibrated to deliver a message to the same general audience.
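One way to compare word-length distributions between two corpora is to normalize the counts, so that a corpus with five times as many words can still be compared on equal footing. The following is a minimal sketch under that assumption, using a plain regex instead of NLTK's tokenizer; `word_length_distribution` and both sample strings are hypothetical.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {word_length: fraction of all words} for the given text."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical stand-ins for each candidate's cleaned speech text.
sample_a = "We will make great deals"
sample_b = "I am so glad to be here"

dist_a = word_length_distribution(sample_a)
dist_b = word_length_distribution(sample_b)

for length in sorted(set(dist_a) | set(dist_b)):
    print(length, round(dist_a.get(length, 0.0), 2), round(dist_b.get(length, 0.0), 2))
```

Because each distribution sums to 1, the two columns can be compared directly regardless of how many words each candidate delivered.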

However, we do see that Trump tended to use longer words (as determined by NLTK) more frequently than Clinton, so he may, in fact, have had the best words.

Using jsvine/markovify, we can also easily build a Tweet generator, e.g.

>> COUNT=10 ./bin/ < data/2016_donald-trump/tweets/donald-trump-tweets.csv 
#ICYMI - I will be an all-time record?Congratulations to Jim Herman, my ass't golf pro at Trump National Doral.
To shop please visit --- it is to play his great father..@CGasparino Good seeing you.
No way they hit Senators Cruz & Rubio.
Tune in tonight at 10:00.
WE WILL MAKE AMERICA GREAT AGAIN!Wow, @CNN is a terrific guy and a conflict for all of Congress from ObamaCare.
What is going home?
He is making people aware of how good they stand for.  at Trump TowerWe pause today to wish everyone A HAPPY AND HEALTHY NEW YEAR.
U.S. should NEVER have made the Jump to Trump.
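To illustrate why the output above reads as stitched-together fragments, here is a minimal first-order Markov chain in plain Python. It is not markovify's API and not the repository's script, only a sketch of the general technique; `build_chain`, `generate`, and the tiny `corpus` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Build a first-order word chain: each word maps to the words seen right after it."""
    chain = defaultdict(list)
    starts = []
    for line in lines:
        words = line.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return starts, dict(chain)


def generate(starts, chain, max_words=20, rng=random):
    """Walk the chain from a random starting word until a dead end or max_words."""
    word = rng.choice(starts)
    output = [word]
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Hypothetical stand-in for the text column of the tweets CSV.
corpus = [
    "Tune in tonight at ten",
    "Tune in tomorrow for the big rally",
    "The rally tonight was a big success",
]

starts, chain = build_chain(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same walk
for _ in range(3):
    print(generate(starts, chain, rng=rng))
```

Each step depends only on the current word, so the generator freely jumps between source tweets wherever they share a word, which produces the locally plausible but globally incoherent sentences seen above.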
