Sentiment Analysis—extracting emotion through machine learning
Our lives have grown considerably easier because of technological advancements such as smartphones, smartwatches, tablets, and other such devices.
We can stay on top of everything going on in the world: “Google, tell me about the weather outside.” “Siri, where is the nearest ATM?” However, no gadget or assistant can tell us our mood, how we’re feeling today, or our emotions. But advances in sentiment analysis and machine learning are bringing these devices closer to answering such questions.
Let’s take a look at an example of a product review: “I loved this product.” If we were asked to rate this statement about a specific product on a scale of 0 to 10, with 0 being negative and 10 being positive, we would all agree that this is a positive statement and give it a score of 9 or 10. If we change the verb from love to like and say, “I liked this product”, it is still a good review, but with a lower score, perhaps 6 or 7. Now imagine the comment reads, “I hated this thing”. This is plainly a negative comment, and the product would receive a rating of 0 or 1.
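To make the 0 to 10 scale concrete, here is a minimal sketch. It assumes each review already has a polarity between -1 (negative) and +1 (positive); the polarity values and the helper name `polarity_to_rating` are hypothetical and were picked by hand to mirror the three reviews above.

```python
def polarity_to_rating(polarity: float) -> float:
    """Map a polarity in [-1, 1] onto a rating in [0, 10]."""
    clipped = max(-1.0, min(1.0, polarity))  # guard against out-of-range input
    return round(5 * (clipped + 1), 1)

# Hypothetical polarities, chosen by hand for illustration only.
example_polarity = {"loved": 0.9, "liked": 0.3, "hated": -0.9}

for verb, polarity in example_polarity.items():
    print(f"I {verb} this product -> {polarity_to_rating(polarity)}/10")
```

With these assumed values, "loved" lands at 9.5, "liked" at 6.5, and "hated" at 0.5, which matches the intuition described above.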
How do these machines learn, you might wonder?
Picture credit: Automate Intellect
Well, machine learning is similar to a mathematical function that takes one or more numbers as input and gives some numbers as output. These ML models are often neural networks, which are loosely modeled on the structure of our brains: they learn associations between inputs and outputs from examples and use them to make predictions about new inputs. Machines understand only binary language (1 and 0), so whatever we feed them (text, images, audio, or video) is first converted into numbers.
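As a rough illustration of text becoming numbers, the sketch below shows two simple encodings: characters as bytes, and words as integer ids. The tiny vocabulary is built from the example sentence itself and is only an assumption for demonstration; real models use much larger vocabularies and richer encodings.

```python
# Illustrative only: two simple ways text can be turned into numbers.
text = "I loved this product"

# 1) Characters as bytes, shown here as binary digits.
as_bytes = text.encode("utf-8")
first_char_bits = format(as_bytes[0], "08b")  # bits of the letter "I"

# 2) Words as integer ids from a small vocabulary built from the text itself.
words = text.lower().split()
vocab = {word: idx for idx, word in enumerate(sorted(set(words)))}
word_ids = [vocab[w] for w in words]

print(first_char_bits)  # 01001001
print(vocab)            # {'i': 0, 'loved': 1, 'product': 2, 'this': 3}
print(word_ids)         # [0, 1, 3, 2]
```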
How is sentiment analysis different from machine learning?
How does it actually work?
- Step 1:
Tokenization divides a statement into its individual words, called tokens. So the above statement will be broken into
- Step 2:
Clean the text by removing special characters such as *, #, !, (, ^, $, ), etc. from the data.
In the above tokens
– ! (This is a special character, so we can remove it)
- Step 3:
Remove stop words such as he/she/the/a, along with any other words that add little value to the analysis. From the above tokens we can remove the stop words below
- Step 4:
Classification: Here our task is to classify the words into positive, negative, and neutral categories. We assign a score based on the classification: a positive score for a positive word, a score around 0 for a neutral word, and a negative score for a negative word. This is where machine learning comes into the picture, because these sentiment scores can be assigned by a machine learning model. We can model our data using a bag of words or a lexicon, which is simply a pre-classified set of words. Once the model is trained, we can evaluate it on test statements; the higher the accuracy score, the better the classification. For the tokens below
– Product: This is a neutral word so here the score would be 0
– Great: This is a positive word so here we can assign a sentiment score > 0
- Step 5:
Final calculation of the sentiment score for the whole sentence. We sum the sentiment scores of the meaningful, cleaned tokens within a sentence to get its overall sentiment. For the above example, adding Product (0) + Great (> 0) gives a positive overall sentiment score. Because the polarity is greater than zero, we can say that this is a positive sentence.
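The five steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production approach: the example sentence, the stop-word set, and the lexicon scores are all invented here, and a hand-written lexicon lookup stands in for the trained model mentioned in Step 4.

```python
import re

# Assumed, hand-made resources for illustration; real ones are far larger.
STOP_WORDS = {"this", "is", "a", "the", "he", "she"}
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def tokenize(sentence):
    """Step 1: split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def remove_special_characters(tokens):
    """Step 2: keep only tokens that contain a letter or digit."""
    return [t for t in tokens if any(ch.isalnum() for ch in t)]

def remove_stop_words(tokens):
    """Step 3: drop stop words (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def score_tokens(tokens):
    """Step 4: look up each token; unknown words count as neutral (0)."""
    return [(t, LEXICON.get(t.lower(), 0.0)) for t in tokens]

def sentence_sentiment(sentence):
    """Step 5: sum the token scores and label the sentence."""
    tokens = tokenize(sentence)
    tokens = remove_special_characters(tokens)
    tokens = remove_stop_words(tokens)
    total = sum(score for _, score in score_tokens(tokens))
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label

print(tokenize("This product is great!"))            # ['This', 'product', 'is', 'great', '!']
print(sentence_sentiment("This product is great!"))  # (1.0, 'positive')
```

Here "product" is not in the lexicon and scores 0, "great" scores 1.0, and the sum is positive, so the sentence is labeled positive.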
We can use different Python packages to get sentiment scores for given sentences.
A few such libraries are:
NLTK (Natural Language Toolkit), TextBlob, spaCy, Gensim
Different approaches to sentiment analysis
Sentiment analysis can be carried out at different levels, such as
- Sentence level sentiment analysis
- Document-level sentiment analysis
- Entity level sentiment analysis
- Phrase level sentiment analysis
- Feature level sentiment analysis
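To show how the level of analysis changes the result, the sketch below contrasts sentence-level and document-level scoring. It assumes a toy lexicon with invented scores and a simple rule that splits sentences at ".", "!" or "?"; the function names are hypothetical.

```python
import re

# Assumed toy lexicon; the scores are invented for illustration.
TOY_LEXICON = {"loved": 1.0, "great": 1.0, "slow": -0.5, "hated": -1.0}

def score_text(text):
    """Sum the lexicon scores of the words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)

def sentence_level(document):
    """Score each sentence separately."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [(s.strip(), score_text(s)) for s in parts if s.strip()]

def document_level(document):
    """Score the document as a single unit."""
    return score_text(document)

review = "I loved the screen. The delivery was slow. I hated the battery."

for sentence, score in sentence_level(review):
    print(f"{score:+.1f}  {sentence}")
print("document:", document_level(review))  # document: -0.5
```

The document-level score is slightly negative overall, while the sentence-level view shows that the reviewer was positive about one part of the product.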
Why is sentiment analysis important?
Finding the relevant text with the right context in a huge body of text is difficult to do manually: it is doable but extremely inefficient. Emotions and attitudes towards a topic can become actionable pieces of information useful in numerous areas of business and research. As the technology develops, sentiment analysis will become more accessible and affordable for the public and for smaller companies as well.
Sentiment analysis is used to track and analyze social phenomena, as well as to identify potentially problematic situations and gauge the mood of the blogosphere. It is used in the political field to keep track of political views and to find consistency and inconsistency between government claims and actions. It can also be used to forecast election results! Companies in the marketing area use it to establish strategies and to analyze customers’ attitudes about products or brands, how people react to campaigns or new launches, and why consumers don’t buy certain things.
Below are a few areas and applications where sentiment analysis is being used:
- Society: Schools, Elections, Debate, etc
- Security: Attacks, threats, disasters
- Travel: Airline, restaurants, food, hotel
- Finance: Advertising, brands, sales, firms, banks, financial forecasting, software projects, stock prices
- Medical: Disease, health, patients, healthcare, drugs, suicide, depression
- Entertainment: Books, IMDb (Internet Movie Database), television programs, games, players, newspapers, soccer, fans, box office
- Others: Citations analysis, Education, traffic, crowdsourcing
Applications range from product reviews on social media to medication side effects, cyberbullying, and spam detection.
Picture credit: Columbia University
- 7000 research papers have been published to date, making it one of the most researched areas
- Good solutions are already available, although, as usual in NLP, every problem requires a custom solution
- Anger and happiness: simple polarity is enough to tell them apart
- Anger and grief: both are negative, so polarity cannot separate them; there are more than two emotions, hence sentiment is much more than polarity
- Sorting large volumes of data: It is very difficult to manually sort thousands of tweets, customer reviews, product reviews, or surveys. Sentiment analysis helps businesses process huge amounts of data in an efficient and cost-effective way.
- Real-time analysis: Sentiment analysis models can help us identify critical issues in real time.
- More recent advantages:
- Arabic, Indian, Sanskrit, and Chinese languages are gaining popularity
- Product reviews and customer reviews were the common sources earlier; social media, particularly Twitter and Facebook, is more common these days
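The point in the list above that sentiment is more than polarity can be illustrated with a small sketch. It assumes an invented word-to-emotion mapping and a hypothetical polarity value per emotion; neither comes from a real lexicon.

```python
from collections import Counter

# Assumed word-to-emotion mapping, invented for illustration.
EMOTION_LEXICON = {
    "furious": "anger",
    "outraged": "anger",
    "heartbroken": "grief",
    "mourning": "grief",
    "delighted": "happiness",
}
# Assumed polarity of each emotion: anger and grief are both negative.
EMOTION_POLARITY = {"anger": -1, "grief": -1, "happiness": 1}

def emotions_in(text):
    """Count the emotion categories whose words appear in the text."""
    words = text.lower().split()
    return Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)

def polarity_of(text):
    """Collapse the emotion counts into a single polarity number."""
    return sum(EMOTION_POLARITY[e] * n for e, n in emotions_in(text).items())

angry = "I am furious about the delay"
grieving = "I am heartbroken about the news"

print(polarity_of(angry), dict(emotions_in(angry)))        # -1 {'anger': 1}
print(polarity_of(grieving), dict(emotions_in(grieving)))  # -1 {'grief': 1}
```

Both sentences get the same polarity of -1, yet the emotion labels differ, which is the information a polarity-only score loses.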
Sentiment analysis is one of the hardest tasks in natural language processing because even humans struggle to analyze sentiments accurately.
- Subjectivity: Texts can be subjective or objective. Subjective texts contain explicit sentiments, whereas objective texts do not.
- Context: Analysing sentiment without context is very difficult. A machine cannot infer context on its own unless it is stated explicitly.
- Sarcasm: Sometimes people express negative sentiments using positive words, which is difficult for machines to detect.
- Comparisons: It is difficult to analyze comparative sentences such as “It is better than the previous one” or “This is second to none”.
- Emojis: Emojis play an important role in sentiment analysis, but it is somewhat difficult to extract sentiment from them.
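Two of these challenges are easy to reproduce with a naive word-level scorer. The sketch below assumes a toy lexicon with invented scores, including two emoji entries; it is meant to show where simple scoring breaks down and is not a recommended method.

```python
# Assumed toy lexicon, including two emoji entries; all scores are invented.
NAIVE_LEXICON = {"great": 1.0, "love": 1.0, "broken": -1.0, "😊": 1.0, "😡": -1.0}

def naive_score(text):
    """Sum lexicon scores word by word, ignoring context and tone."""
    for mark in ",.!?":
        text = text.replace(mark, " ")
    return sum(NAIVE_LEXICON.get(tok, 0.0) for tok in text.lower().split())

sarcastic = "Great, I just love waiting two hours for support"
emoji_only = "Delivery arrived today 😡"

print(naive_score(sarcastic))   # 2.0, although a human reads this as a complaint
print(naive_score(emoji_only))  # -1.0, but only because the emoji is in the lexicon
```

The sarcastic sentence scores as strongly positive because the scorer sees only the words "great" and "love". The second sentence carries its sentiment entirely in the emoji, so a lexicon without emoji entries would score it as neutral.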
Opinion mining and extraction are other terms for sentiment analysis. Sentiment analysis can be used to calculate, recognize, and communicate sentiment in a variety of domains. This blog explains sentiment analysis, how it works, and why it’s important in a variety of areas, sectors, and e-commerce businesses.