Review De-Duplication in Product Insights

What is review de-duplication?

On e-retailer sites such as amazon.com, bestbuy.com, target.com, samsung.com, and others, customers write reviews of the products and services they have purchased, used, or otherwise experienced. These reviews are an important source of customer feedback for understanding which attributes of your products and services users like or dislike.

When brands aggregate reviews across multiple such websites, duplicates arise for the following reasons:

Syndicated Reviews

  • A review posted on one website is syndicated (republished) to another website.

  • For example, a review originally posted on samsung.com might be syndicated to walmart.com.

Variant reviews

  • A review for one variant of a product is also posted under the product's other variants on the same website.

  • For example, a review originally posted for the iPhone 64 GB variant on amazon.com might be duplicated for the iPhone 14 128 GB variant on the same website.

Sprinklr lets users identify and filter out duplicate reviews, so that Share of Voice is measured correctly and the same feedback is not counted more than once because a review was duplicated.

How do I remove duplicate reviews in Sprinklr?

To filter out all duplicate reviews (syndicated & variant) from the feed, use the following filter:

Show Duplicate Reviews → not containing → True

The above field is a combination of two fields that individually identify syndicated and variant reviews:

  • Is review syndicated: When marked ‘True’, this field indicates that the review was originally posted on another website.

  • Is review duplicate: When marked ‘True’, this field indicates that the same review has already been posted for another variant of the product on the same domain and is available in Sprinklr. A review is marked as a duplicate when all of the following properties match exactly:

    • Author Name

    • Review Title

    • Review Text

    • Review Create Time

    • Star Rating
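The matching rule above, together with the combined Show Duplicate Reviews field, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Sprinklr's implementation: the `Review` record, its field names, and the helpers `dedup_key`, `mark_duplicates`, `show_duplicate`, and `filter_feed` are hypothetical, and the sketch assumes the first copy processed for a given key is kept while later copies are marked as duplicates (the article does not say which copy is treated as the original).

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical review record; field names are illustrative, not Sprinklr's schema.
@dataclass
class Review:
    domain: str                  # website the review was collected from
    product: str                 # product variant the review is attached to
    author_name: str
    title: str
    text: str
    create_time: datetime
    star_rating: int
    is_syndicated: bool = False  # assumed to come from the website's own syndication flag
    is_duplicate: bool = False   # computed by mark_duplicates()


def dedup_key(r: Review) -> tuple:
    # Exact match on the five listed properties, scoped to a single domain.
    # The product variant is deliberately NOT part of the key, so copies
    # posted under other variants on the same domain collide.
    return (r.domain, r.author_name, r.title, r.text, r.create_time, r.star_rating)


def mark_duplicates(reviews: list[Review]) -> None:
    # Assumption: the first review seen for a key is kept; later copies are duplicates.
    seen = set()
    for r in reviews:
        key = dedup_key(r)
        r.is_duplicate = key in seen
        seen.add(key)


def show_duplicate(r: Review) -> bool:
    # Combined field: True when either individual field is True.
    return r.is_syndicated or r.is_duplicate


def filter_feed(reviews: list[Review]) -> list[Review]:
    # Mirrors the filter "Show Duplicate Reviews → not containing → True".
    return [r for r in reviews if not show_duplicate(r)]


# Usage: one variant duplicate and one syndicated review (sample data is made up).
t = datetime(2023, 5, 1, 10, 30)
feed = [
    Review("amazon.com", "iPhone 64 GB", "Sam", "Great", "Love it.", t, 5),
    Review("amazon.com", "iPhone 128 GB", "Sam", "Great", "Love it.", t, 5),
    Review("walmart.com", "Galaxy", "Ana", "Solid", "Works well.", t, 4, is_syndicated=True),
]
mark_duplicates(feed)
print([r.product for r in filter_feed(feed)])  # ['iPhone 64 GB']
```

Keeping the product variant out of the key is what lets copies posted under other variants on the same domain match, while including the domain keeps the variant check from reaching across websites; syndication across websites is handled by the separate flag.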

Review de-duplication runs every hour, so every review is marked as a duplicate or non-duplicate within one hour of appearing in the dashboard.

Key Points

While the two fields combined catch most duplicate reviews, there are certain edge cases where similar reviews might not be marked as duplicates:

  • Review content is not an exact match: If one copy of a review contains additional text, it is not identified as a variant review. For a review to be identified as a duplicate, the text must match exactly.

  • Different review properties: The review text may be exactly the same while other properties, such as the title, author name, or star rating, differ. Such reviews are not marked as duplicates.

  • Syndication information is not present on a website: Most websites explicitly declare when a review has been syndicated from another website. Where a website provides no such flag, the Is review syndicated field is marked ‘False’.

  • Additional spaces and paragraphs in the review text: If the review text contains additional spaces or paragraphs, it is not identified as a duplicate review.

  • All copies marked as duplicates: If a review is syndicated from a source website, all of its syndicated copies are marked as duplicates, even when the original website or product URL is not added as a data source in Sprinklr. In that case, filtering out duplicate reviews removes every copy of the review from the feed.
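The exact-match edge cases above can be seen with a small comparison. This sketch assumes a hypothetical `dedup_key` helper that simply bundles the five properties into a tuple. It is illustrative only and does not reproduce Sprinklr's matching logic. Any difference in any property, including a doubled space or trailing blank lines, produces a different key.

```python
from datetime import datetime


def dedup_key(author: str, title: str, text: str, create_time: datetime, rating: int) -> tuple:
    # Hypothetical exact-match key over the five properties; no normalization is applied.
    return (author, title, text, create_time, rating)


t = datetime(2023, 5, 1, 10, 30)
base = dedup_key("Sam", "Great", "Love it.", t, 5)

# Each candidate differs from the base review in one of the ways listed above.
candidates = {
    "additional text": dedup_key("Sam", "Great", "Love it. Fast shipping too.", t, 5),
    "different star rating": dedup_key("Sam", "Great", "Love it.", t, 4),
    "additional space": dedup_key("Sam", "Great", "Love  it.", t, 5),
    "additional paragraph": dedup_key("Sam", "Great", "Love it.\n\n", t, 5),
    "identical": dedup_key("Sam", "Great", "Love it.", t, 5),
}

for label, key in candidates.items():
    verdict = "duplicate" if key == base else "not a duplicate"
    print(f"{label}: {verdict}")
```

Only the "identical" candidate compares equal to the base review; the other four are treated as distinct reviews, which is why such near-copies remain in the feed.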