Every now and then, we benchmark our sentiment analysis against an industry standard, such as Amazon Product Reviews, to validate our results. The large volume and variety of products on Amazon.com provide an excellent benchmark for testing sentiment analysis. We have used machine learning to train sentiment scoring on the text of product reviews. This means that whether your product review is from Amazon or not, the sentiment has been tested and validated with real product reviews. Data Ninja API includes not only the overall sentiment of a review, but also the sentiment of individual entities and keywords. You might use the positive entities and keywords in your next blog, marketing materials, or on your website.
We integrated large datasets of Amazon user reviews for different categories of products to validate and improve the current Smart Sentiment and Smart Content services. We wanted to make sure our sentiment analysis covers a wide range of sentiments and is not heavily weighted in favor of positive, negative, or neutral. For this blog we will look at sample reviews in three categories at Amazon: Clothing, Shoes & Jewelry; Electronics & Computers; and Amazon Echo & Alexa. The scores for Data Ninja reflect the overall sentiment of the review. We converted the Amazon stars to numeric equivalents so that both can be compared on the same scale. One star is -1, two stars are -0.5, three stars are 0, four stars are +0.5, and five stars are +1. The stars are assigned by the reviewers, so their text and sentiment may not exactly match our automated service, but will be in the same range. Because Data Ninja is not limited to five data points (stars), the scoring is more precise. You can tell whether a review leans more positive or more negative than its star value suggests. The matching range around each star value is approximately half the distance between adjacent stars, or 0.25 in either direction.
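The star conversion and the matching range described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this post, not the code behind the service; the names `STAR_TO_SCORE`, `TOLERANCE`, `star_to_score`, and `matches_star_rating` are hypothetical and chosen for this example.

```python
# Numeric equivalents for Amazon stars, as listed in this post.
STAR_TO_SCORE = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}

# Half the 0.5 gap between adjacent star values.
TOLERANCE = 0.25


def star_to_score(stars: int) -> float:
    """Convert a 1-5 star rating to the -1..+1 sentiment scale."""
    return STAR_TO_SCORE[stars]


def matches_star_rating(sentiment_score: float, stars: int,
                        tolerance: float = TOLERANCE) -> bool:
    """Return True if a sentiment score falls within the range around a star value."""
    return abs(sentiment_score - star_to_score(stars)) <= tolerance


if __name__ == "__main__":
    # A score of +0.90 compared against a five-star review (+1).
    print(matches_star_rating(0.90, 5))
```

Treating the edge of the range as a match (`<=` rather than `<`) is an assumption made for this sketch.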
Clothing, Shoes & Jewelry Product Review
We ran the same text in the tutu product review through our Smart Sentiment demo. The tutu product review was very positive for both Amazon and Data Ninja API. Data Ninja Smart Sentiment scores each of the individual entities and keywords, and assigns a sentiment to each. These additional data points create a deeper understanding of which words contributed to the overall positive sentiment of the entire text. The keyword score shows how confidently the keyword is associated with this text. This is an Amazon review, so the “Amazon” keyword score is higher than the other keywords.
The sentiment score is related only to the text in this review. Tutu is described as “great”, “great price”, “affordable”, and “isn’t made poorly”. With more mentions of positive attributes, the keyword “tutu” rises on the positive sentiment scale. The highest possible positive sentiment score is 1, and the 0.97 for tutu is very close. All the keywords are mentioned in the text of the review. All of the keywords found in this text have a positive sentiment and a very positive sentiment score.
Smart Sentiment Entity Extraction
Not all keywords are entities. The entity listing below shows entities that are persons, locations, organizations, or things. A++ is a grade and can be applied to a multitude of entities, so it is not listed as an entity in our service. Apparently, there is not enough information in the categories and concepts to have an entry for a tutu at this time. We are constantly updating the concepts and categories, so this will probably change. There is a single entity in this review: Amazon.com. The keyword “Amazon” resolves to the entity “Amazon.com” and has the exact same sentiment. Countless hours are saved by not having to search for all variations of a keyword and combine them under a single entity. Data Ninja API automatically maps the keywords to entities. The entities do not need to be in the original text, but are extracted based on their relationship to the keywords.
The final overall sentiment score from Data Ninja Smart Sentiment is +0.90. This falls within the five-star rating range for the tutu product review.
Electronics & Computers, Wearable Technology Product Review
We wanted to give an example of a negative review at the other end of the sentiment spectrum. Even though the overall sentiment is -0.90 from Data Ninja Smart Sentiment, there are several positive entities. The entity score shows how important the entity is to the text, with the highest ranking entity being the most important. Samsung VR is mentioned several times in this review and is the top ranking entity in the Data Ninja Sentiment results. You may get a different result for the top ranked entity if the product is not mentioned in the text of the review. On the sentiment side, both the Google Play and Virtual reality entities are positive. Data Ninja API takes the keywords in the text and shows the most common entity name. In this example, Samsung VR (keyword) is related to the entity name “Samsung Gear VR.” The entity name is not necessarily the formal product name, but the name the public uses to identify this entity. The entity name is based on thousands of articles and texts in our knowledge database.
The sentiment of the individual entities and keywords gives clues to words that might be useful in describing your product in advertising or in responding to a negative review. The sentiment scores are derived from the text and depend on the context in which each word or phrase appears. You can get a feel for the general sentiment of the words or phrases from these results.
Amazon Echo & Alexa Product Review
This two-star review reveals two competitors to Amazon Tap: Bose SoundLink and the iPad. The iPad is the reason the Amazon Tap was purchased. The Bose SoundLink is the replacement for the Amazon Tap. The sentiment is based on the content in the text, so in this case the Bose SoundLink has a neutral sentiment. This means the text mentions it with about the same amount of positive and negative sentiment. In this review, only one sentence includes the phrase "Bose SoundLink". If there were more information about the Bose SoundLink, the entity would probably be more positive. The Sentiment Score of -0.02 is within the range of +0.25 and -0.25, which results in a neutral sentiment for Bose SoundLink.
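As a rough illustration of the neutral range, the sketch below turns a sentiment score into a label. Only the neutral band of -0.25 to +0.25 comes from this post; labeling everything above it as positive and everything below it as negative is an assumption, and `NEUTRAL_BAND` and `sentiment_label` are hypothetical names, not part of Data Ninja API.

```python
# Scores between -0.25 and +0.25 are treated as neutral, per this post.
NEUTRAL_BAND = 0.25


def sentiment_label(score: float, band: float = NEUTRAL_BAND) -> str:
    """Map a -1..+1 sentiment score to a coarse label.

    Assumption: scores above the band are positive and scores below it
    are negative; the service's actual cutoffs may differ.
    """
    if score > band:
        return "positive"
    if score < -band:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # The Bose SoundLink entity score from the review above.
    print(sentiment_label(-0.02))
```

With this rule, the Bose SoundLink score of -0.02 is labeled neutral, while the tutu keyword score of 0.97 is labeled positive.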
Additional Domain Areas in Smart Sentiment
Data Ninja API has trained datasets to process sentiment for both News and Social Media in addition to Product Reviews. Each dataset is slightly different to accommodate the variety in the range of sentiments within its domain. The same product review may have a different outcome when processed in a different domain. All of the results in this blog are from our customer demo. The demo shows only a subset of the results, and many more are available through the APIs. Check out the Smart Sentiment trial API and the documentation, or try our customer demo.
The initial product review research and validation was done by our research engineer intern, Long Pei.