Sources FAQs

Below are some frequently asked questions about Sources in Media Monitoring & Analytics (MM&A).

Yes. If the customer has both listening and MM&A, the listening data is enriched with MM&A metrics such as Impact, Influence, web shares on Twitter, Facebook, and Reddit, and Similarweb metrics such as Potential Reach and Global Rank, as well as Publication, TV Channel, and Print Source Name.

Media Monitoring & Analytics (MM&A) draws on the global Sprinklr database, which contains data from the Twitter firehose (partial), the News firehose, Facebook (public pages), and Reddit, along with TV and Print mentions.

All Twitter mentions from verified profiles are considered by the story clustering algorithm, whereas the entire Twitter firehose is considered for social shares estimation (that is, a tweet counts toward an article's shares if it contains the news article URL).
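The share estimation described above can be pictured with a minimal sketch. This is not Sprinklr's implementation: the function names, the URL normalization, and the assumption that tweet links are already expanded (rather than shortened) are all hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: treats any http(s) token in the tweet text as a link.
URL_PATTERN = re.compile(r"https?://\S+")


def normalize(url):
    """Strip trailing punctuation and slashes so equivalent links compare equal."""
    return url.rstrip(".,;:!?)").rstrip("/")


def count_shares(tweets, article_urls):
    """Count, per tracked article URL, the tweets that contain that URL."""
    tracked = {normalize(u) for u in article_urls}
    shares = Counter()
    for text in tweets:
        # A tweet counts once per article, even if it repeats the link.
        for url in {normalize(u) for u in URL_PATTERN.findall(text)}:
            if url in tracked:
                shares[url] += 1
    return shares
```

For example, passing a list of tweet texts and a list of tracked article URLs returns a `Counter` keyed by article URL; tweets without a tracked link are ignored.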

In addition to the sources listed above, MM&A offers the following:

  1. Financial Times (Available by default)

  2. 1500+ web sources from NLA (Available by default)

  3. 5000+ print and web sources from Factiva (Available as a separate SKU)

  4. TVEyes (Available as a separate SKU)

No. It is a separate product that can be purchased separately within the Sprinklr Insights product family.

  1. News: Text content as available on the online native site and whatever is legally permitted for crawling.

  2. Print: Text transcripts as printed on the offline print source (physical newspaper or magazine).

  3. TV: Transcripts of the TV Broadcast via voice to text transcription or closed captioning technology.

  4. Radio: Transcripts of the audio recordings via voice to text transcription.

Our data providers deliver data for traditional sources such as Online News with a general latency of less than an hour. More than 80% of news data arrives within the first hour of the article being published. Latency for Print depends on the publication.

  • Whether a mention is categorized as news or as a blog is decided when the data is extracted from the web.

  • Crawlers are written for each section of a site, so all articles extracted from a news domain, for example "cnn.com", are labeled as NEWS.

  • In general, the News content set is kept clean. For example, a WordPress blog containing an individual's personal thoughts would not be eligible for addition to the News content set.

  • Note that News and Blogs are overlapping categories and there is some grey area. Some News sites host blogs, and some blogs have developed into full News sites. If a blog has editorial oversight of its content (a team of writers and editors), it might be eligible as News.

Yes. Because listening and MM&A share the same database, completing the source verification adds the source coverage in MM&A as well. Backfill of a newly added source is not possible in MM&A. For further details, please refer to Source verification FAQs.

We constantly expand the number of web sources that are available for monitoring so that you don’t miss a mention:

  • We continuously add the web sources that are requested by our clients. Once a source is added, we ensure that its data is available to all our clients.

  • A dedicated in-house team works on identifying new domains that are not already covered within Sprinklr. Once identified, these sources are added for coverage.

  • Sprinklr also builds partnerships directly with global data vendors and publications to bring in exclusive paywalled or premium content, for example the Financial Times and NLA feeds.

We obtain a specific list of print sources from our data vendors as part of our vendor partnership contracts. Our data vendors in turn have contracts with print media houses to procure the transcripts of the print data. The availability of new print sources depends on API availability, and onboarding new publishers depends on licensing and other legal aspects. Sprinklr can take requests for new print sources and pass them on to our vendors. However, a source cannot be added immediately or ad hoc, because this depends on the relationship between the vendor and the publisher.

We also work continuously on building partnerships with vendors specialising in print content to expand print coverage.

Several factors go into setting the location for a news or blog site. In most cases, the location is tagged at the parent domain level. A country is assigned to the parent domain based on the points below, in order of priority:

  1. Countries are determined based on ccTLDs (country code top-level domains). For example, if the publication is dailymail.co.uk → Country=United Kingdom

  2. Countries are determined based on the country code present in the URL path. For example, if the article URL is www.cnn.com/uk/this-is-the-article-name.html → Country=United Kingdom

  3. Country can also be tagged based on the publication’s headquarters available on the "About" page. This is performed by human taggers.

  4. Some domains are hard to code despite the above heuristics. In those cases, human taggers use their best judgement to assign the country, so classification errors can occur.

Note also that ambiguous news sites are not assigned any country tag. As a result, filtering by country can return lower mention counts.
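The priority order above can be sketched in code to make the fallback behaviour concrete. This is a minimal illustration under stated assumptions, not Sprinklr's tagging logic: the `CCTLD_TO_COUNTRY` mapping is a tiny illustrative subset, `MANUAL_TAGS` is a hypothetical stand-in for the human-tagged steps, and `tag_country` is an invented name.

```python
from urllib.parse import urlparse

# Illustrative subset only; a complete mapping would cover every ccTLD.
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "de": "Germany",
    "fr": "France",
    "jp": "Japan",
}

# Hypothetical table standing in for countries assigned by human taggers
# (headquarters from the "About" page, or best judgement).
MANUAL_TAGS = {"example-news.com": "Canada"}


def tag_country(url):
    """Return a country for a URL, or None when the site stays untagged."""
    # Accept bare domains such as "dailymail.co.uk" as well as full URLs.
    parsed = urlparse(url if "://" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # 1. ccTLD of the domain itself.
    tld = host.rsplit(".", 1)[-1]
    if tld in CCTLD_TO_COUNTRY:
        return CCTLD_TO_COUNTRY[tld]

    # 2. Country code appearing as the first segment of the URL path.
    segments = [s for s in parsed.path.split("/") if s]
    if segments and segments[0].lower() in CCTLD_TO_COUNTRY:
        return CCTLD_TO_COUNTRY[segments[0].lower()]

    # 3 and 4. Fall back to a manually assigned tag; ambiguous sites get None.
    return MANUAL_TAGS.get(host)
```

Returning `None` for an ambiguous site mirrors the note above: a country filter would exclude those mentions, which is why filtered counts can be lower than unfiltered ones.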