Star Wars: The Audience Turns to the Dark Side

Here is a guest post from George McIntire.

George McIntire is a data science writer, educator and Thinkful mentor. He’s interested in using the power of data science to educate the public and make better sense of our world.

We’re about a week away from the premiere of the latest installment in the Star Wars saga: The Last Jedi. Almost two years have passed since The Force Awakens stormed the box office, and fans of the franchise, both old and new, are expected to make The Last Jedi the biggest movie event of the year.
With numerous cliffhangers left over from The Force Awakens and new questions stemming from The Last Jedi’s cleverly crafted trailers, millions have flocked to social media to voice their thoughts on the new film. Given all this chatter, we wanted to gain a deeper understanding of how the Star Wars fanbase collectively feels heading into opening weekend. Using Twitter’s API and data science techniques, we found that Star Wars fans overall are dreading what happens in The Last Jedi.
Headed to the Dark Side Audiences Are
Our analysis of roughly 33,000 tweets shows that Star Wars fans are turning to the dark side.

As demonstrated in the scatter plot above, the majority of Star Wars-related tweets contain strong and ominous language. The polarity score is a metric data scientists use to show how negative or positive a piece of text is. The subjectivity score is another data science metric that measures how objective or subjective it is. Nearly three-quarters of Star Wars-related tweets score negative (polarity score below 0), two-thirds have a polarity score below -0.3, and the median polarity score for all tweets is a staggering -0.55. This tweet by @dalordzprince perfectly encapsulates the anxiety captured in our data set:

Dire feelings towards the film also extended to its three most popular characters. Tweets discussing Kylo Ren, Rey, and Luke Skywalker (the three most mentioned characters in our data set) all generated negative polarity scores.

The histogram above shows the distribution of positive and negative tweets concerning Rey, the mysterious heroine from The Force Awakens. An overwhelming majority were negative. While this might reignite memories of the racist and sexist #BoycottStarWars movement, we can partially attribute the negativity to the swirling rumors of Rey’s flirtation with the Dark Side. As @FelicityRidley’s tweet shows, some folks are concerned about this potential plot twist.

Luke Skywalker, seen in the final minutes of The Force Awakens and expected to play a significant role in The Last Jedi, isn’t immune from the negativity either. A majority of Luke-related tweets received a polarity score below zero, with a median polarity score of -0.27.

So what’s gone wrong with the main protagonist from the original series? Many are starting to question whether Luke is in fact the Chosen One, and some, like @PracticallyGeek, are nervous that his storyline will come to an end in The Last Jedi.

Data Science Used We
To understand public sentiment around Star Wars: The Last Jedi, we undertook a three-step process.
First, we compiled a data set of more than 33,000 Star Wars-related tweets posted between November 22nd and December 1st. Pulled using the search function of Twitter’s API, our data set included any tweet that contained one or more of the following terms: Star Wars, Last Jedi, #starwars, #lastjedi, and #maytheforcebewithyou.
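
The inclusion rule can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `matches_search_terms` and the case-insensitive substring check are assumptions, Twitter's search API applies its own matching rules on its side, and the sample tweets are invented for the example.

```python
# Search terms listed in the post.
SEARCH_TERMS = [
    "Star Wars",
    "Last Jedi",
    "#starwars",
    "#lastjedi",
    "#maytheforcebewithyou",
]


def matches_search_terms(tweet_text, terms=SEARCH_TERMS):
    """Return True if the tweet contains at least one search term.

    Assumption: matching is a case-insensitive substring check.
    """
    lowered = tweet_text.lower()
    return any(term.lower() in lowered for term in terms)


# Invented sample tweets, not taken from the data set.
sample_tweets = [
    "Counting the days until The Last Jedi",
    "Great coffee this morning",
    "I have a bad feeling about this #StarWars",
]

# Keep only the tweets that mention at least one term.
kept = [t for t in sample_tweets if matches_search_terms(t)]
```

Here `kept` holds the first and third sample tweets; the coffee tweet is dropped because it mentions none of the terms.
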
Next, we homed in on two data science tools to conduct a sentiment analysis of the tweets in our data set. Sentiment analysis is a natural language processing technique in which a computer assigns a sentiment score to written text. The first tool, the vaderSentiment Python library, allows the computer to derive a polarity score for each tweet on a scale from -1 to 1. Data scientists frequently use vaderSentiment for social media sentiment analysis because of its ability to analyze short pieces of text. The second tool, the TextBlob Python library, is a general-purpose NLP tool that allows the computer to assign a subjectivity score to each tweet on a scale from 0 to 1.






Polarity Score: measures to what degree text is positive or negative, from -1 (most negative) to 1 (most positive)

Subjectivity Score: measures to what degree text is objective or subjective, from 0 (most objective) to 1 (most subjective)
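
For readers who want to try the two libraries, here is a hedged sketch of how the two scores might be computed. It is not the code behind this analysis. The helper names `make_library_scorers` and `score_tweets` are hypothetical, and using VADER's `compound` value as the polarity score is an assumption, since the post does not say which VADER output was used. The library imports sit inside the helper so the rest of the sketch runs even where the packages are not installed.

```python
def make_library_scorers():
    """Build scoring functions backed by vaderSentiment and TextBlob.

    Both packages must be installed for this helper to work.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from textblob import TextBlob

    analyzer = SentimentIntensityAnalyzer()

    def polarity(text):
        # Assumption: VADER's "compound" value (range -1 to 1) is the polarity.
        return analyzer.polarity_scores(text)["compound"]

    def subjectivity(text):
        # TextBlob reports subjectivity on a 0 to 1 scale.
        return TextBlob(text).sentiment.subjectivity

    return polarity, subjectivity


def score_tweets(texts, polarity, subjectivity):
    """Attach a polarity and a subjectivity score to each tweet text."""
    return [
        {
            "text": text,
            "polarity": polarity(text),
            "subjectivity": subjectivity(text),
        }
        for text in texts
    ]
```

With both libraries installed, usage would look like `polarity, subjectivity = make_library_scorers()` followed by `score_tweets(tweet_texts, polarity, subjectivity)`, where `tweet_texts` is a list of tweet strings. Passing the scorers in as arguments also makes it easy to swap in a different sentiment tool.
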

Finally, we scored our tweets using these two sentiment analysis tools. As mentioned earlier, our analysis found that Star Wars fans are feeling anxious leading up to the film’s premiere.
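
The summary figures quoted in this post (share of negative tweets, share below -0.3, median polarity, and the per-character breakdowns) can be computed from scored tweets along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, character mentions are found with a simple case-insensitive substring check (which would also match words such as "grey" for Rey), and the example scores are made up, not values from our data set.

```python
from statistics import median


def summarize_polarity(scores, threshold=-0.3):
    """Summarize a non-empty list of polarity scores in the range -1 to 1."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {
        "share_negative": sum(s < 0 for s in scores) / n,
        "share_below_threshold": sum(s < threshold for s in scores) / n,
        "median": median(scores),
    }


def scores_mentioning(scored_tweets, name):
    """Polarity scores of tweets whose text mentions `name`.

    Assumption: a case-insensitive substring check is good enough here.
    """
    needle = name.lower()
    return [t["polarity"] for t in scored_tweets if needle in t["text"].lower()]


# Made-up tweets and scores for illustration only.
example = [
    {"text": "Worried about Rey", "polarity": -0.6},
    {"text": "Luke is back!", "polarity": 0.5},
    {"text": "Rey and Kylo, I fear the worst", "polarity": -0.4},
    {"text": "Nervous for Luke", "polarity": -0.2},
]

overall = summarize_polarity([t["polarity"] for t in example])
rey = summarize_polarity(scores_mentioning(example, "Rey"))
```

In this toy example, three of the four tweets are negative overall, and both tweets that mention Rey are negative.
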
