Sentiment Analysis with Hugging Face

satish1v
3 min read · May 25, 2021

Hugging Face's Transformers is an NLP library built around deep learning models called Transformers. We will use the library to do sentiment analysis with just a few lines of code.

In this blog post, we will use a pre-trained, off-the-shelf model.

You can install the library using pip, Python's package installer, which is similar to NuGet, npm, and Cargo.

How it Works

Once you have installed the library, you need to create a pipeline. For example, you can create a sentiment analysis pipeline.
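A minimal sketch of creating and calling such a pipeline, assuming the transformers package (pip install transformers, plus a backend such as PyTorch) is available. The helper names load_classifier and classify are illustrative and not part of the library; only pipeline("sentiment-analysis") is the library call.

```python
def load_classifier():
    """Build the default sentiment-analysis pipeline.

    The import lives inside the function so this module loads even when
    transformers is not installed. The first call downloads a
    pre-trained model.
    """
    from transformers import pipeline
    return pipeline("sentiment-analysis")


def classify(text, classifier):
    """Return (label, score) for a single string.

    A sentiment pipeline returns a list with one dict per input,
    for example [{"label": "POSITIVE", "score": 0.99}].
    """
    result = classifier(text)[0]
    return result["label"], result["score"]
```

A typical call would be classifier = load_classifier() followed by label, score = classify("I love this product", classifier).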

Similarly, you can create pipelines for the following tasks:

  • Text generation (in English): provide a prompt, and the model will generate what follows.
  • Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place, etc.)
  • Question answering: provide the model with some context and a question, and extract the answer from the context.
  • Filling masked text: given a text with masked words (e.g., replaced by [MASK]), fill the blanks.
  • Summarization: generate a summary of a long text.
  • Translation: translate a text into another language.
  • Feature extraction: return a tensor representation of the text.

Courtesy: Hugging Face

We are going to run our data through the model. Let us start with the data. You can find out how I collected it in this blog post.

To get a glimpse of the data, we will load it using pandas. It contains two main fields: the review text, which we pulled from Amazon about a product, and the ratings, which are human-annotated.

Transformer Model-Based Sentiment Analysis

Passing a string through the pipeline returns a label and a score. In the same way, we will pass each review in the data frame to the model.
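One way this step could look is sketched below. The function name score_reviews is hypothetical, and classifier stands for any callable that behaves like a sentiment pipeline, taking a list of strings and returning one dict with "label" and "score" per string.

```python
def score_reviews(reviews, classifier):
    """Run every review through the classifier.

    `reviews` can be any iterable of strings, such as a pandas column.
    Returns two parallel lists: the labels and the scores.
    """
    texts = [str(review) for review in reviews]
    if not texts:
        return [], []
    results = classifier(texts)  # one result dict per input string
    labels = [result["label"] for result in results]
    scores = [result["score"] for result in results]
    return labels, scores
```

With a data frame df whose review column is named reviews (an assumed name), the results could be stored with df["label"], df["score"] = score_reviews(df["reviews"], classifier). Reviews longer than the model's input limit may need to be truncated first.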

Once we have the label and score, we can compare them with the human-annotated ratings.

Results

To measure how well our method performs, we need to compare the human ratings (1 to 5) with the model's scores (-1 to 1).

To solve this impedance mismatch, let us find common ground by normalizing both the predicted and the human values to Negative (0), Neutral (1), and Positive (2).

To normalize the predicted value, we will use a function that takes the label and score and determines whether the review is negative, neutral, or positive.
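A possible version of such a function is sketched here. The name normalize_prediction and the neutral_band threshold of 0.75 are assumptions made for illustration. The thresholds used in the original analysis are not stated.

```python
NEGATIVE, NEUTRAL, POSITIVE = 0, 1, 2


def normalize_prediction(label, score, neutral_band=0.75):
    """Map a (label, score) pair to 0 (negative), 1 (neutral), or 2 (positive).

    The score is first signed so that it lies in [-1, 1]: a NEGATIVE
    label turns the score negative. Signed values strictly inside
    (-neutral_band, neutral_band) count as neutral.
    The band width is an assumed threshold, not a library default.
    """
    signed = -score if label.upper() == "NEGATIVE" else score
    if signed <= -neutral_band:
        return NEGATIVE
    if signed >= neutral_band:
        return POSITIVE
    return NEUTRAL
```

A two-class model reports the confidence of the winning label, which is at least 0.5, so the band has to be wider than 0.5 for any review to come out as neutral.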

To normalize the target (human) data, we will use a similar approach.
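For the human ratings, a comparable sketch follows. The cut-offs (below 3 is negative, exactly 3 is neutral, above 3 is positive) are an assumed mapping, and normalize_rating is an illustrative name.

```python
def normalize_rating(rating):
    """Map a 1-to-5 star rating to 0 (negative), 1 (neutral), or 2 (positive).

    Assumed mapping: ratings below 3 are negative, exactly 3 is
    neutral, and ratings above 3 are positive.
    """
    value = float(rating)
    if not 1 <= value <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating!r}")
    if value < 3:
        return 0
    if value > 3:
        return 2
    return 1
```

With both columns mapped to the same three classes, predictions and human ratings can be compared directly.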

Visualize the Results

Once we have normalized the data, we can find out the accuracy of the model. Before that, we need to understand what accuracy is in machine learning: it is the fraction of predictions that match the true labels.
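As a rough illustration, accuracy and a confusion matrix can be computed in plain Python as below. The function names are illustrative, and this is a sketch rather than the function used for the results in this post. Libraries such as scikit-learn offer equivalent functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Count samples per (true class, predicted class) pair.

    Rows are the true classes and columns the predicted classes,
    both in the order Negative (0), Neutral (1), Positive (2).
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

The diagonal of the matrix holds the correct predictions, so accuracy equals the sum of the diagonal divided by the total number of samples.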

The output of this function looks like this.

In a much better visual form.

In the previous blog post, we saw that a dictionary-based sentiment analyzer reached 41% accuracy. Now, with the Transformer-based model, we get 63%.

In an upcoming blog post, we will fine-tune the model to improve its accuracy.
