Sentiment Analysis—extracting emotion through machine learning

September 9, 2021 / by Mahavir Ambekar in Data Science

Our lives have grown considerably easier because of technological advancements such as smartphones, smartwatches, tablets, and other such devices.

We can stay on top of everything going on in the world. “Google, what is the weather like outside?” “Siri, where is the nearest ATM?” However, no gadget or voice assistant can tell us our mood, how we’re feeling today, or our emotions. Advances in sentiment analysis and machine learning are bringing computers closer to answering such questions.

Introduction

Let’s take a look at an example of a product review: “I loved this product.” If we were asked to rate this statement on a scale of 0 to 10, with 0 being the most negative and 10 the most positive, we would all agree that it is a positive statement and give it a score of 9 or 10. If we change the verb from “loved” to “liked” and say, “I liked this product”, it is still a good review, but with a lower score, perhaps 6 or 7. Now imagine the comment reads, “I hated this thing”. This is plainly a negative comment, and it would receive a score of 0 or 1.

Picture credit: KDnuggets

How do these machines learn, you might wonder?


Picture credit: Automate Intellect

Well, machine learning is similar to a mathematical function that takes one or more numbers as input and gives some numbers as output. ML models are often neural networks, which are loosely inspired by the structure of our brains: they learn associations between inputs and outputs from examples and use those associations to make predictions on new inputs. Machines ultimately work only with numbers (binary 1s and 0s), so whatever we feed them (text, images, audio, or video) must first be converted into numbers.
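
As a rough illustration of "converting inputs into numbers", the sketch below turns a short review into numbers in two simple ways. The variable names and the toy vocabulary are assumptions made for this example; real systems use much larger vocabularies and richer encodings.

```python
# Illustrative only: two simple ways to turn text into numbers.
text = "I loved this product"

# 1. Character level: UTF-8 encodes every character as one or more bytes.
byte_values = list(text.encode("utf-8"))

# 2. Word level: give each distinct word an integer ID (a toy vocabulary).
words = text.lower().split()
vocabulary = {word: index for index, word in enumerate(sorted(set(words)))}
word_ids = [vocabulary[word] for word in words]

print(byte_values[:7])  # bytes for "I loved"
print(vocabulary)       # word -> integer ID
print(word_ids)         # the sentence as a list of IDs
```

Note that these IDs are arbitrary: nothing about the numbers says that two words are related in meaning.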

One example of a machine learning model is a classifier that tells dogs from cats. Machines do not understand what cats or dogs look like the way humans do, but they do understand numbers. We train the classification model on a varied set of labeled images of cats and dogs. The model extracts features from those images and learns to associate them with the labels. Then, whenever we feed it a new image, the model outputs the probability that the image shows a dog or a cat, based on the associations it learned from the training images.

How is sentiment analysis different from other machine learning tasks?

We can’t use the same strategy for text that we used for photos, audio, or video. If each word is simply mapped to an arbitrary number, the model cannot discern semantic similarities between words, and a single word might also have multiple meanings. Such an approach, for example, fails to notice that “love” and “like” are close in meaning while “love” and “hated” are far apart. This is where word vectors, one of the most important ideas in semantic analysis, come into play. A word vector represents a word as a list of numbers, so that words with related meanings get similar vectors, which helps in evaluating word relationships. Adding more dimensions to these vectors lets them express more of the relationships between words. Once we have word vectors, we can map each word in a sentence to its vector and so convert the sentence into numbers. We can feed these numbers to a model, which can then predict the sentiment of any given statement.
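
To make the idea of word vectors concrete, here is a minimal sketch that compares words by cosine similarity. The three-dimensional vectors are invented by hand for illustration; real word vectors are learned from large text collections and have many more dimensions.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-picked for illustration.
# Real embeddings are learned from data, not written by hand.
word_vectors = {
    "love": [0.9, 0.8, 0.1],
    "like": [0.7, 0.6, 0.2],
    "hated": [-0.8, -0.7, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


love_like = cosine_similarity(word_vectors["love"], word_vectors["like"])
love_hated = cosine_similarity(word_vectors["love"], word_vectors["hated"])

print(f"love vs like:  {love_like:.2f}")   # close to 1: similar meaning
print(f"love vs hated: {love_hated:.2f}")  # close to -1: opposite meaning
```

With these toy vectors, “love” and “like” point in nearly the same direction, while “love” and “hated” point in nearly opposite directions, which is the kind of relationship a plain word-to-ID mapping cannot express.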

How does it actually work?

Let’s understand it step by step with an example: “The product was great!”. The first step is tokenization.
  • Step 1:
    Tokenization splits a piece of text into individual words and symbols, called tokens. The statement above is broken into:
    – The
    – Product
    – Was
    – Great
    – !
  • Step 2:
    Clean the text by removing special characters such as *, #, !, (, ^, $, and ) from the data.
    In the tokens above:
    – The
    – Product
    – Was
    – Great
    – ! (this is a special character, so we remove it)
  • Step 3:
    Remove stop words such as he/she/the/a, along with any other words that add little value to the analysis. From the tokens above, we can remove the stop words “The” and “Was”:
    – The (stop word, removed)
    – Product
    – Was (stop word, removed)
    – Great
  • Step 4:
    Classification: here the task is to classify each word as positive, negative, or neutral and to assign a score accordingly: a positive score for a positive word, a score around 0 for a neutral word, and a negative score for a negative word. This is where machine learning comes into the picture, because these sentiment scores can be assigned by a machine learning model. We can model the data using a bag-of-words representation or a lexicon, which is simply a pre-classified set of words. Once the model is trained, we test it on statements it has not seen; the higher its accuracy, the better the classification. For the tokens below:
    – Product: this is a neutral word, so the score is 0
    – Great:   this is a positive word, so we assign a sentiment score > 0
  • Step 5:
    Final calculation of the sentiment score for the whole sentence. We sum the sentiment scores of the cleaned, meaningful tokens in the sentence to get its overall sentiment. For the example above, Product (0) + Great (> 0) gives a positive overall score. Because the polarity is greater than zero, we can say that this is a positive sentence.
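
The five steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not a production approach: the lexicon, its scores, the stop-word list, and all function names are assumptions made up for this example, and a hand-written lexicon stands in for the trained model mentioned in Step 4.

```python
import re

# Hypothetical mini-lexicon and stop-word list, for illustration only.
LEXICON = {"great": 1.0, "loved": 1.0, "liked": 0.5, "hated": -1.0}
STOP_WORDS = {"the", "a", "an", "he", "she", "was", "is", "i", "this"}


def tokenize(sentence):
    # Step 1: split into word tokens and single punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)


def clean(tokens):
    # Step 2: drop tokens with no letters or digits, such as "!".
    return [t for t in tokens if any(ch.isalnum() for ch in t)]


def remove_stop_words(tokens):
    # Step 3: drop words that carry little meaning for the analysis.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def score_tokens(tokens):
    # Step 4: look up each token; words not in the lexicon are neutral (0).
    return [(t, LEXICON.get(t.lower(), 0.0)) for t in tokens]


def sentence_sentiment(sentence):
    # Step 5: sum the token scores and map the total to a label.
    tokens = remove_stop_words(clean(tokenize(sentence)))
    total = sum(score for _, score in score_tokens(tokens))
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label


print(tokenize("The product was great!"))
print(sentence_sentiment("The product was great!"))
```

For “The product was great!”, only “product” (0) and “great” (positive) survive the cleaning steps, so the total is above zero and the sentence is labeled positive, matching the walkthrough.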

Several Python packages can help with these steps, from tokenization to computing sentiment scores for a given sentence.

A few such libraries are:

NLTK (Natural Language Toolkit), TextBlob, spaCy, Gensim

Different approaches to sentiment analysis

Sentiment analysis can be performed at different levels of granularity, such as:

  • Sentence-level sentiment analysis
  • Document-level sentiment analysis
  • Entity-level sentiment analysis
  • Phrase-level sentiment analysis
  • Feature-level sentiment analysis
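
As a small illustration of how two of these levels differ, the sketch below scores the same review once per sentence and once as a whole document. The lexicon, the naive sentence splitter, and the function names are assumptions made for this example; the other levels (entity, phrase, feature) need more machinery and are not shown.

```python
import re

# Hypothetical lexicon, for illustration only.
LEXICON = {"great": 1.0, "love": 1.0, "slow": -0.5, "terrible": -1.0}


def score_text(text):
    """Sum the lexicon scores of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def split_sentences(document):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [part for part in parts if part]


review = "The camera is great. The battery is terrible. Delivery was slow."

# Sentence level: one score per sentence.
sentence_scores = [(s, score_text(s)) for s in split_sentences(review)]

# Document level: one score for the whole review.
document_score = score_text(review)

for sentence, score in sentence_scores:
    print(f"{score:+.1f}  {sentence}")
print(f"{document_score:+.1f}  (whole document)")
```

The document-level score is mildly negative, while the sentence-level scores show that the camera is praised and the battery and delivery are criticized, detail that a single document score hides.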

Why is sentiment analysis important?

Finding the relevant text with the right context in a huge body of text is difficult to do manually: it is doable, but extremely inefficient. Emotions and attitudes towards a topic can become actionable pieces of information, useful in numerous areas of business and research. As the technology develops, sentiment analysis will become more accessible and affordable for the public and for smaller companies as well.

Where do we use sentiment analysis?

Sentiment analysis is used to track and analyze social phenomena, to identify potentially problematic situations, and to gauge the mood of the blogosphere. In politics, it is used to keep track of political views and to find consistency and inconsistency between government claims and actions. It can also be used to forecast election results! In marketing, companies use it to shape strategy and to understand customers’ attitudes towards products or brands, how people react to campaigns or new launches, and why consumers don’t buy certain things.

Below are a few domains and applications where sentiment analysis is being used:

  • Society: schools, elections, debates, etc.
  • Security: attacks, threats, disasters
  • Travel: airlines, restaurants, food, hotels
  • Finance: advertising, brands, sales, firms, banks, financial forecasting, software projects, stock prices
  • Medical: diseases, health, patients, healthcare, drugs, suicide, depression
  • Entertainment: books, IMDb (Internet Movie Database), television programs, games, players, newspapers, soccer, fans, box office
  • Others: citation analysis, education, traffic, crowdsourcing

Applications range from product reviews on social media to medication side effects, cyberbullying, and spam detection.


Picture credit: Columbia University
  • 7000 research papers have been published to date, making it one of the most researched areas
  • Good solutions are available, but, as is usual in NLP, every problem requires a custom solution
  • Anger vs. happiness: simple polarity is enough to tell them apart
  • Anger vs. grief: both are negative, so polarity alone cannot separate them. There are more than two emotions, hence sentiment is much more than polarity

Benefits

  • Sorting large volumes of data: It is very difficult to manually sort thousands of tweets, customer reviews, product reviews, or surveys. Sentiment analysis helps businesses process huge amounts of data in an efficient and cost-effective way.
  • Real-time analysis: Sentiment analysis models can help us identify critical issues immediately, in real time.
  • More recent developments:
    • Work on Arabic, Indian languages, Sanskrit, and Chinese is gaining popularity
    • Product reviews and customer reviews were the common data sources earlier; social media, particularly Twitter and Facebook, is common these days

Challenges

Sentiment analysis is one of the hardest tasks in natural language processing because even humans struggle to analyze sentiments accurately.

  • Subjectivity: Texts can be subjective or objective. Subjective texts contain explicit sentiment, whereas objective texts do not.
  • Context: Analyzing sentiment without context is very difficult. A machine cannot infer context on its own unless it is explicitly stated.
  • Sarcasm: Sometimes people express negative sentiment using positive words, which can be difficult for machines to understand.
  • Comparisons: Sometimes it is difficult to analyze comparative sentences like “It is better than the previous one” or “This is second to none”.
  • Emojis: Emojis play an important role in expressing sentiment, and it can be difficult to analyze sentiment from them.
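
To see why some of these cases are hard, the sketch below runs a naive word-counting scorer over three short texts. The lexicon, the scorer, and the example sentences are all assumptions made up for this illustration; the point is only to show where a simple approach breaks down.

```python
import re

# Hypothetical lexicon, for illustration only.
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5, "bad": -0.5}


def naive_score(text):
    """Sum lexicon scores word by word, ignoring context and emojis."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


examples = {
    "sarcasm": "Oh great, my phone died again. I just love that.",
    "context": "The battery is not good.",
    "emoji": "Arrived today 😡",
}

scores = {name: naive_score(text) for name, text in examples.items()}

for name, text in examples.items():
    print(f"{name:8s} {scores[name]:+.1f}  {text}")
```

All three texts express a negative opinion, yet the naive scorer rates the sarcastic one as clearly positive, rates “not good” as positive because it ignores the negation, and rates the emoji-only complaint as neutral because it never looks at the emoji.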

Conclusion

Sentiment analysis is also known as opinion mining or opinion extraction. It can be used to recognize, quantify, and communicate sentiment in a variety of domains. This blog explained what sentiment analysis is, how it works, and why it is important across a variety of areas and sectors, including e-commerce.
