Savidu Dias

How I Built A Chrome Extension that Filters out Angry Tweets

Updated: Aug 15

Twitter is the ideal social media platform to discuss hot social issues, but lately it feels like the platform gives a bigger spotlight to the loudest people who come up with the most extreme ideas. Setting aside the fact that these people would accomplish pretty much nothing by constantly screaming into an echo chamber, it’s almost impossible for anyone to have fun anymore.


I joined Twitter back in 2014, mostly to follow topics such as sports, gaming, memes, and tech. Ever since then, the platform has slowly been consumed by people from all sides of the political spectrum trying to shove their ideals down your throat through constant screaming matches.


Despite the community, I still believe that Twitter is the best platform for people to follow their interests. Around two months ago, I'd had enough and asked myself what I could do as a Software Engineer to make things better.


Something to do with Natural Language Processing was the obvious answer. To be more specific, I thought that sentiment analysis would be the way to go.

What is Sentiment Analysis?


The basic idea behind sentiment analysis is determining whether a piece of text is positive, negative, or neutral. We can use sentiment analysis to decide whether a tweet is negative, and if it is, we simply get rid of it.

So the first thing we have to do is figure out whether a piece of text is angry or not. This is what we like to call “classification”: we need to build a classifier that tells us whether a Tweet has a negative sentiment.


How on Earth do we go about doing that? Well, Machine Learning is your friend. There are a lot of ways to go about this. You can use a sentiment analysis system that has been developed, tested, and used by thousands of developers, such as the Perspective API. However, I decided that I hate myself and wanted to spend THREE WHOLE DAYS developing my own model to do this.


Developing a Sentiment Analysis Model


Our job now is to figure out how to teach a computer to understand whether a Tweet has a negative sentiment using Machine Learning. So how do we do this? Think of your computer as a toddler who knows absolutely nothing. Except unlike toddlers, a computer won’t ruin your life.


You start off by going through a bunch of tweets yourself and labeling them as positive or negative depending on how they sound to you. Then you show them to your toddler and your computer, and ask them to go through it.

Now your computer has a good idea of what positive and negative sounding tweets look like. Your toddler, on the other hand, might be too busy figuring out ways to ruin your life.


So if you show them any random Tweet now, they should be able to make a good guess based on what they have learned before.

That is basically a simplified explanation as to how sentiment analysis would work using Machine Learning. So how did I actually do it?


In the real world, you can’t simply teach your computer to correctly identify positive/negative texts from a few samples. Instead, you need to show it a few thousand to really get the message across. Did I say thousand? HOW ABOUT 1.6 MILLION??? Because that is exactly what I did. I found this dataset of 1.6 million tweets on Kaggle that have already been labelled as positive or negative. All I had to do was figure out a way to use this to train my computer. There are several steps involved.


Step 1: Preprocessing


In Machine Learning, preprocessing is usually the first step. This is the process where we remove anything we don’t need from the tweet we are looking at.


For example, some tweets may have URLs in them, and we do not care about those, because a URL does not indicate whether a tweet has a positive or negative sentiment.


Additionally, some words in the English language carry a clear sentiment on their own. For example, if a tweet has the word “horrible”, we can make a good guess and say that it has a negative sentiment.


Likewise, if a tweet has the word “amazing”, it’s a good indicator that the tweet has a positive sentiment. However, words like “I”, “him”, “the”, and “it” are so common that we cannot guess the sentiment of a tweet just by looking at them. These kinds of words are called “stopwords”. Part of preprocessing is removing all stopwords, because they contribute nothing to our sentiment analysis model.


These are a few things that do not contribute to the sentiment analysis process and should therefore be removed. Here is the full list of things removed from the text during preprocessing:

  1. URLs

  2. Stopwords

  3. Usernames

  4. Hashtags

  5. Non-alphabet characters (anything that is not A-Z or a-z)

Additionally, certain words in the English language can appear in different variations. For example, the word “car” can appear as car, cars, car’s, cars’, etc. Because of this, we need to be able to identify different variations of the same word and normalize them to a single word. This is known as Lemmatization.


eg:

car, cars, car's, cars' ⇒ car

the boy's cars are different colors ⇒ the boy car be differ color


Each word that is not removed from a tweet needs to be converted to its lemmatized form.

Once preprocessing is applied to each and every one of the 1.6 million tweets, we are left with a well-organized bunch of tweets.
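The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the exact code from the project: the stopword list here is a tiny hand-picked one (a real pipeline would use a library list, e.g. NLTK’s), and lemmatization is left out for brevity.

```python
import re

# Illustrative stopword list -- a real pipeline uses a much longer library list.
STOPWORDS = {"i", "me", "the", "a", "an", "it", "him", "her", "is", "are", "to", "and"}

def preprocess(tweet: str) -> str:
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # 1. URLs
    text = re.sub(r"@\w+", " ", text)                   # 3. usernames
    text = re.sub(r"#\w+", " ", text)                   # 4. hashtags
    text = re.sub(r"[^a-z]", " ", text)                 # 5. non-alphabet characters
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 2. stopwords
    return " ".join(tokens)

print(preprocess("@user I love the new phone! http://t.co/abc #tech"))
# -> love new phone
```

Note that the URL is stripped before the non-alphabet pass, otherwise its letters would survive as junk tokens.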


Step 2: Training on the Dataset


Now that we have an organized set of tweets labelled as having a positive or negative sentiment, the next step is to train our computer to go over these 1.6 million tweets and learn what makes some tweets negative and others positive.


The first step in training the dataset is splitting our data into two parts called the training set and the test set. I’m going to use 80% of the 1.6 M tweets to train our model, and the other 20% to see how accurate our model is once it’s trained.
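A quick sketch of that 80/20 split, assuming the data is a list of (text, label) pairs. In practice a library helper such as scikit-learn’s train_test_split does the same job:

```python
import random

def split_dataset(data, train_frac=0.8, seed=42):
    """Shuffle the data, then cut it into a training set and a test set."""
    data = data[:]                        # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)     # fixed seed makes the split reproducible
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

# Tiny stand-in for the 1.6 M labelled tweets.
tweets = [("love new phone", "POSITIVE"), ("hate rain", "NEGATIVE")] * 5
train_set, test_set = split_dataset(tweets)
print(len(train_set), len(test_set))  # 8 2
```

Shuffling before cutting matters: the Kaggle dataset stores all negatives before all positives, so a straight slice would give you a training set with only one class.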


Then, I used my training set to train a Word2Vec neural network model, which learns a vector representation for each word. All of this was done using existing deep learning libraries like gensim and Keras.

Step 3: Building the Model


The final step in building the sentiment analyzer is building our model. This basically means setting up a neural network, running our program, and waiting until training finishes.


Once the model is built, we save it to a file so that we can just open it and use the model whenever we want to classify a piece of text. This saves us a bunch of time, because the alternative is waiting hours for the model to rebuild every time.
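As a sketch of the save-once, reload-forever idea — the dictionary below is a hypothetical stand-in for a trained model; with Keras the actual calls are `model.save("model.h5")` and `keras.models.load_model("model.h5")`:

```python
import pickle

# Hypothetical stand-in for a trained model object.
model = {"vocab_size": 1000, "weights": [0.1, 0.2, 0.3]}

# Save once, after the long training run...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...then reload instantly whenever we need to classify a tweet.
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model == model)  # True
```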

If you don’t have a fancy schmancy GPU, the process of building the model using neural networks is going to take you a really long time. I mean a REALLY long time… like 10 hours. I had plenty of time to think about my sad life. Like why do the people I love always leave me?


Once the model finished building, I was able to get an accuracy of 79%, which I think is pretty good.

320000/320000 [==============================] - 113s 352us/step

ACCURACY: 0.791134375
LOSS: 0.4442952796936035
CPU times: user 2min 25s, sys: 16.9 s, total: 2min 42s
Wall time: 1min 52s


Building the Classifier


Now that we have a model capable of identifying positive and negative tweets, it is time to build the classifier. The task of the classifier is to determine the sentiment score of a tweet.


A tweet with a sentiment score closer to 0 is considered to have a negative sentiment, and one with a score closer to 1 is considered to have a positive sentiment. These are the thresholds I set for the sentiment scores:

  • 0.0 - 0.4: NEGATIVE

  • 0.4 - 0.7: NEUTRAL

  • 0.7 - 1.0: POSITIVE

In this case, any tweet with a sentiment score of less than 0.4 would be considered to have a negative sentiment.
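Mapping a raw score to a label is then just a couple of comparisons. A minimal sketch using the thresholds listed above (`decode_sentiment` is my name for it here, not necessarily the one in the project):

```python
def decode_sentiment(score: float) -> str:
    """Map a model score in [0, 1] to a sentiment label."""
    if score < 0.4:
        return "NEGATIVE"
    if score < 0.7:
        return "NEUTRAL"
    return "POSITIVE"

print(decode_sentiment(0.97))  # POSITIVE
print(decode_sentiment(0.55))  # NEUTRAL
print(decode_sentiment(0.01))  # NEGATIVE
```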

We can test our classifier out with different texts and see the sentiment that we get.


predict("I love the music")

{'label': 'POSITIVE',
 'score': 0.9656286239624023,
 'elapsed_time': 0.4439425468444824}

predict("I hate the rain")

{'label': 'NEGATIVE',
 'score': 0.010753681883215904,
 'elapsed_time': 0.26644086837768555}


predict("i don't know what i'm doing")

{'label': 'NEGATIVE',
 'score': 0.2742374837398529,
 'elapsed_time': 0.25728774070739746}

Great! Now we have a fully functioning classifier capable of assigning a sentiment score to a tweet and identifying its sentiment.


Building the Chrome Extension


Once the classifier was built, the next step was to build the Chrome extension that reads the text of each tweet. To be honest, this was the most difficult part for me. For starters, I had no idea how to build a Chrome extension, so it took me a couple of days just to read the documentation and figure out how to do it. The next challenge was that extensions are written in JavaScript, which is something I am not as comfortable working with.


Almost every Chrome extension has three main components:

  1. Content script: reads everything on the page

  2. Background script: performs operations in the background

  3. Popup: the menu shown when you click the extension icon at the top right


In our case, the content script reads the content of the Twitter home page and gets the DOM element of each tweet along with the text associated with it. The content script communicates with the model we built and removes the DOM elements of tweets with a negative sentiment.


We use the popup to display the text of all the tweets that were removed. The background script acts as a bridge between the content script and the popup. Once the content script removes a tweet, it sends the details of the removed tweet over to the background script, which stores them in memory. When the user clicks the extension icon, the popup asks the background script for the details of the tweets that were removed.

Now we have all of the components needed to build our final application. It’s just a matter of putting them all together.

All we have to do now is make our Chrome extension pass the text of each tweet to the classifier. The simplest solution is a web server that bridges these two components, and that is exactly what I did.


I built a web server using Flask for Python. The main reason for this is that my classifier is written in Python, so it is much easier for a Python web server to communicate with it.
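A minimal sketch of such a Flask server, assuming a `predict()` function like the one shown earlier. The `/predict` route name and the stand-in classifier body are my choices for illustration, not necessarily what the project uses:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(text):
    # Stand-in for the real classifier, which returns a score in [0, 1]
    # plus a label, as shown earlier.
    return {"label": "NEGATIVE", "score": 0.01}

@app.route("/predict", methods=["POST"])
def classify():
    # The extension POSTs a tweet's text as JSON; we return the sentiment.
    text = request.get_json()["text"]
    return jsonify(predict(text))

if __name__ == "__main__":
    app.run(port=5000)
```

The content script can then POST each tweet’s text to this endpoint and hide the tweet when the response comes back labelled NEGATIVE.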


Then I sent the text of the tweets loaded into the DOM from the content script in the Chrome extension to the web server using Ajax.



Now that we have put everything together, it’s just a matter of opening up Twitter on our Chrome browser and letting the extension do all of its magic with the classifier and get rid of all the tweets having a negative sentiment.


Now I can enjoy Twitter, because I finally see the content I want to see. If you managed to get this far, pat yourself on the back, because now you can go back to enjoying your life.


If you want to see the full project for yourself, you can check out the project repository on GitHub.



© 2020 by savidude.com
