Unpacking the YDAWG’s 2020 US Presidential Election Twitter analysis

In the latest Actuaries Institute Podcast, Kriti Khullar is joined by the Institute’s Young Data Analytics Working Group (YDAWG) members, Henry Ma and Ean Chan, to discuss their recent analysis of the 2020 US Presidential Election Twitter feed. With a focus on incumbent President Donald Trump, Democratic nominee Joe Biden, their respective parties and the public, Henry and Ean discuss the method they created for their analysis and the meaningful insights they extracted, and reflect on how accurate their analysis was.

Listen to the summary podcast below:

Listen to “Unpacking the YDAWG’s 2020 US Presidential Election Twitter analysis” on Spreaker.

Read the transcript

Outlining the method behind their analysis, Henry explains, “we downloaded the tweets from several Twitter accounts we selected and cleaned that data first,” before focusing on “a descriptive analysis on the frequently used words and phrases the candidates and parties used.”
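To make the cleaning and word-frequency step concrete, here is a minimal Python sketch. It is not the YDAWG's actual code: the stop-word list, the cleaning rules and the function names `clean_tweet` and `top_terms` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "we", "for", "on", "our"}


def clean_tweet(text):
    """Lower-case a tweet, strip URLs, @mentions and punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [word for word in text.split() if word not in STOP_WORDS]


def top_terms(tweets, n=5, ngram=1):
    """Return the n most frequent words (ngram=1) or phrases (ngram>1)."""
    counts = Counter()
    for tweet in tweets:
        words = clean_tweet(tweet)
        # Slide a window of length `ngram` across the cleaned words.
        grams = zip(*(words[i:] for i in range(ngram)))
        counts.update(" ".join(gram) for gram in grams)
    return counts.most_common(n)
```

Calling `top_terms(tweets, ngram=2)` on a list of tweet strings would give the most common two-word phrases, which is one simple way to surface frequently used phrases as well as single words.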

Once these tasks were completed, Henry says, the next step was a “high-level sentiment analysis on both the candidates’ own tweets and the tweets by the general public who directly commented or mentioned the two candidates and parties. This allowed us to see what the level of positivity or negativity of those tweets was.”
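The summary does not say which sentiment tool the team used. As a rough illustration of what a high-level positivity or negativity score can look like, the sketch below uses a simple word-list approach; the two word lists and the `sentiment_score` function are hypothetical and far smaller than any real sentiment lexicon.

```python
import re

# Hypothetical mini-lexicons; a real analysis would use a much larger one.
POSITIVE = {"great", "win", "good", "love", "strong"}
NEGATIVE = {"bad", "fake", "lose", "worst", "sad"}


def sentiment_score(text):
    """Score a tweet from -1.0 (all negative hits) to +1.0 (all positive hits).

    Tweets with no lexicon words score 0.0 (neutral).
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [1 for w in words if w in POSITIVE] + [-1 for w in words if w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0
```

Averaging these scores across an account's tweets, or across public tweets mentioning a candidate, gives one coarse measure of overall tone.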

Ean explains that, from this data, the YDAWG decided to have fun by generating fake tweets based on the ~10,000 most recent tweets from each of the candidates’ Twitter accounts.

“We actually kept it simple and used these tweets to set up a Markov chain with which we could generate thousands of fake tweets per second. We were able to replicate the tweets in the Twitter format/layout by changing some of the HTML code on the Twitter site to make them look like actual tweets,” Ean explains.
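A word-level Markov chain of the kind Ean describes can be sketched in a few lines of Python. The details here are assumptions rather than the team's implementation: this version is first-order (each word depends only on the one before it), and the `START`/`END` markers and function names are invented for the example.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking where a tweet begins and ends.
START, END = "<s>", "</s>"


def build_chain(tweets):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for tweet in tweets:
        words = [START] + tweet.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_tweet(chain, rng, max_words=40):
    """Walk the chain from START until END is drawn or max_words is reached.

    Assumes the chain was built from at least one tweet.
    """
    word, output = START, []
    while len(output) < max_words:
        # Followers are stored with repeats, so frequent pairs are drawn more often.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)
```

For example, `generate_tweet(build_chain(tweets), random.Random())` returns one fake tweet. Because each step is a single list lookup and a random choice, generating large numbers of tweets is cheap.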

Another thing Ean and the team made use of during this experiment was the enormous amount of ‘Trumpian’ analysis already online. This included a WaveRNN ‘Trump voice’ tool, which was used to read out the fake Trump tweets, and video manipulation techniques (using generative adversarial networks) to have Trump’s face mirror another video, which the team used to sync up a Trump video to the voice.

Ean notes that through the team’s ability to work agilely, YDAWG could “come up with an idea and then whoever was free and ready could work on it.” This allowed them to include all of these fun new data analytics techniques in the article.

In agreement with Ean, Henry comments that the purpose of creating the article was, in short, to have fun using analytics.

“I think everyone does some type of analytics at work but applying analytics to a different context is definitely fun, especially when we look at what’s going on in the world – there is so much hysteria online and in other platforms regarding the US Presidential Election,” Henry explains.

Ean commented on some other data sources which they had thought about including in the article: “Turnout for each state is publicly available knowledge, so you can actually find some interesting correlations between the turnout for a state in the 2016 election and various metrics like average age, income or rainfall in November. You can either use that as the fun little tidbit it is or try and predict the 2020 turnouts, but being the unprecedented year that it is, I think that would’ve fallen a bit flat.”

As Ean notes, regardless of the analytical results, a main takeaway of this experience was that, “There’s basically a lot of different, rich data sources out there and all of them can be analysed in some way and I guess that’s what we’re trying to do at YDAWG – make people realise that you can analyse anything and it’s all out there, and it’s all ready for us to analyse.”

Find out more about the YDAWG’s analysis of the 2020 US Presidential Election.

CPD: Actuaries Institute Members can claim two CPD points for every podcast listened to.