Source verification FAQs

Get answers to the most frequently asked questions about verifying sources supported by Sprinklr.

Source Verification in Sprinklr is the process of determining whether Sprinklr’s Listening can pull data from specific sources.

Source verification is a two-step process.

  • Verify whether a particular list of domains/URLs is already fetching data into Sprinklr’s Listening database.

  • Get the domains and URLs added to the listening coverage if they are not fetching any mentions.

Source verification is not only important for clients but also for Sprinklr Insights in expanding the overall spectrum of Listening sources. Source verification serves the following use cases at different stages of the customer lifecycle –

  1. Pre-sales

    • When a client wants to understand how robust our offering is in comparison to their current tool.

    • When a client/prospect has certain sources they really care about and do not want to miss any data coming in from those sources.

    • When a prospect/client has a paid blogging/influencer marketing program and wants to verify blog coverage.

  2. Enablement: Source verification helps enablement teams when:

    • Clients want to listen to a set of domains for their brand, marketing campaign, or product-related use-cases.

  3. Post-enablement: When a client is expanding the listening rollout to a new internal team and/or a new country. It also applies when specific review threads need to be added to the coverage stream because the client is launching a new product or product line.

  4. When clients complain about low data volume.

Source Verification is a simple two-step process:

  • Open this link, copy this sheet, and add domains, corresponding media types, and relevant fields needed for source coverage and addition.

  • Use this form to submit the request. Do not forget to mention the link to the Google sheet in the form.

Note: It is recommended to make the status field editable for Sprinklr employees so that SV teams can update the status directly on the sheet and avoid unnecessary emails.

Here are the relevant contacts:

  • Harsh Jain – For General SV-related queries/status updates

  • Piyush Pratap Singh – For Escalations

No. The SV process only supports going-forward data in normal listening.

Note: After SV is completed for the requested domain/URL, data is expected to be available from the date of addition (i.e., going-forward data only). This applies to all sources processed via the SV process.

Date of publishing – when raising a coverage request for any media type, it is crucial to ensure that the full date of publication is available on the requested site. The date of publishing is a mandatory field required by our data vendor, and its absence, or an incomplete date, can make coverage impossible. Please thoroughly verify the availability of the publication date on the site before submitting any media-type request.
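Before submitting a request, the publication-date requirement above can often be pre-checked by looking for the common machine-readable date markers on a sample article page. The sketch below is illustrative only (the helper name and sample HTML are made up; the data vendor's actual extraction logic is not public) and checks the usual signals: `article:published_time`-style meta tags and `<time datetime=...>` elements.

```python
from html.parser import HTMLParser

class PubDateFinder(HTMLParser):
    """Hypothetical helper: scan HTML for common publication-date markers."""
    DATE_META = {"article:published_time", "og:published_time", "date", "pubdate"}

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Meta tags may use either 'property' or 'name' for the key.
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            if key in self.DATE_META and attrs.get("content"):
                self.found = attrs["content"]
        elif tag == "time" and attrs.get("datetime"):
            self.found = attrs["datetime"]

def find_publication_date(html: str):
    finder = PubDateFinder()
    finder.feed(html)
    return finder.found

sample = ('<html><head><meta property="article:published_time" '
          'content="2024-05-01T09:30:00Z"></head></html>')
print(find_publication_date(sample))  # 2024-05-01T09:30:00Z
```

If a site exposes no such marker on its article pages, the publication date may still exist in the visible text, but a missing machine-readable date is a warning sign worth flagging in the SV request.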

The Source Verification team runs a script on the provided domains/URLs to check if the submitted domains/URLs are already covered or not. The status on already covered domains is provided within 3-4 working days after the Source Verification request is submitted.

The domains/URLs which are not covered by Listening are sent to our partner data vendor. Our data vendor analyzes the submitted domains and URLs, develops custom parsers, and pushes data to Sprinklr’s Listening database.

No. The site must have a valid media type, i.e., it must contain one of the following –

  • Blogs

  • Forums

  • News

  • Reviews

  • YouTube Channel

  • Keywords for YouTube Search

No. Adding a domain/URL with a valid media type (blogs, forums, news, reviews) depends on many factors, and Sprinklr’s Source Verification process does not guarantee that all submitted sources will be added. Success managers, the delight team, implementation, or solution consultants must set client expectations accordingly.

Source Verification is generally time-consuming. Depending on the volume of domains submitted, it can take anywhere from 2 to 4 weeks to add uncovered domains.

Existing coverage results are generally provided within 4-5 working days after the SV request submission. Uncovered domains and URLs are then submitted to the data vendors for coverage. Sometimes source coverage may take longer than usual due to an unforeseen large number of client requests and limited data vendor capacity.

Therefore, it is always suggested to set the right client expectations around source-verification timelines. It is crucial to raise the requests in due time for any critical upcoming campaigns or product launches to avoid misses in the data during the launch period.

The coverage and latency can vary for sites and media types (forums, blogs, news, reviews, videos). As Sprinklr sources data from multiple data vendors, the crawling capabilities of parsers vary for different vendors. Following are the latency numbers for different sources –

  • News: Instant to 12-24 hours

  • Blogs & Website Comments: 2-6 hours to 24 hours

  • Forums: 5-10 minutes to 24 hours

  • Reviews: less than 24 hours to 2-3 Days

Following is the entire process for Source Verification –

  • SC/SM/MS team members submit the Source Verification Request using this form.

  • The form inputs are then sent to the Asana board for request management purposes.

  • Our SV Team members run an initial RAC to check which domains are already covered with Sprinklr.

  • Initial response (for above) is provided within 3-5 working days.

  • We send the sources not present in Sprinklr to our Data Vendor for coverage.

  • Blogs, forums, reviews, news, and YouTube channels/videos are the valid media types for domains.

  • Once submitted to the vendor, they analyze each domain and write custom parsers to fetch the data. This process is lengthy and time-consuming as the parsing squad has to go through each domain manually and decide which sections to cover based on the requested media type.

  • Requests are processed based on the priority status marked on the domains and the date of submission, as we receive close to 20-25 requests per week for source coverage.

  • Our team keeps a close eye on previously raised requests. As soon as a request has a status on all of its domains, our SV team members reach out to the SM with the coverage status (Added, Already Covered, Can't Be Added, etc.).

  • There are also specific points to note about this process.

    • Not all submitted sites can be covered: some domains block crawlers, some lack a valid media feed (forums, blogs, news, reviews), and some contain only very old content.

    • % Coverage – our vendors do not claim to capture the entire data for covered domains. Due to crawler limitations, there can be misses, and some sections might not fetch data.

No, Facebook page listening does not come under the purview of Source Verification.

Facebook deprecated its broad keyword-based search API long ago, so any Listening topic query based on keyword search criteria will not fetch data from FB – there is no API exposed for it. This is a Facebook channel limitation, not a Sprinklr limitation, and it applies equally to competitor tools in the market.

However, we have a few ways we can still enable our brands to listen to FB posts –

  • Account Data: Data ingested into Sprinklr via the authenticated FB accounts on the Core side. Creating an Account topic brings their "owned" data into Listening.

  • Facebook Profile Listening: This enables you to add a Facebook Page URL or an owned FB Group URL in the "Include URLs" section of the topic and listen to it.

  • Keyword-based Facebook Public Data Listening: We have a vast repository of FB data that we have pulled in across all of our partners for years (and continuing). In addition to this, we can build keyword-based topic queries to pull in mentions.

No, Instagram Hashtag listening does not come under the purview of the Source Verification Process. Users need to register the hashtag (30 per partner) and set up topics containing those hashtags to fetch mentions from public Instagram accounts.

Yes, video posts and comments from YouTube can be added after source-verifying the channels or videos of interest. Keywords of interest can also be submitted for Source Verification. However, please note that in this case the fetched data will be a representative sample of the full native data due to API rate-limit restrictions.

Contact Piyush Pratap Singh to prioritize the request. If the number of domains is <20, we can try to get them added within 7 working days, but if the volume of submissions is high, the SM/Account Executive must inform the client about the realistic timelines.

There could be multiple reasons for a domain not to fetch mentions –

  • There is no new content on the added domain (from the date of addition) – To be checked by SC/ SM/ Client.

  • The domain-based topic is set up incorrectly. It is always recommended to use a substring identifier for URLs to fetch mentions.

  • There are no reviews present on the review website.

  • The vendor has not customized the site properly or pushed the data (raise an issue only after the three checks above are done).
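The substring-identifier advice in the checklist above can be illustrated with a short sketch. This is illustrative only, not Sprinklr's actual matching logic; the URLs and the helper name are made up. The point is that a substring identifier catches all variants of a thread (pagination, query strings), while an exact URL match catches only one.

```python
# Hypothetical mention URLs for a covered forum domain.
mentions = [
    "https://forum.example.com/threads/battery-issue.123/page-2",
    "https://forum.example.com/threads/battery-issue.123/",
    "https://www.example.com/reviews/widget?sort=newest",
]

def match_substring(identifier, urls):
    """Return every mention URL containing the identifier as a substring."""
    return [u for u in urls if identifier in u]

# An exact match on the thread's landing URL misses paginated variants...
exact = [u for u in mentions if u == "https://forum.example.com/threads/battery-issue.123/"]
# ...while a substring identifier captures the whole thread, pages included.
sub = match_substring("forum.example.com/threads/battery-issue.123", mentions)

print(len(exact), len(sub))  # 1 2
```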

No. We receive a fixed set of traditional media sources as a part of our contract with our vendor.

The source addition process is a custom process. We need media types to ensure we cover the right content required by the clients on the websites.

Supported Media Types: News, Blogs, Forums, Reviews, Blog Comments, and YouTube Channel/Video.

Note: Requests without media types won't be considered for Source Verification.

Sprinklr does not support crawling login-based domains because their content is not publicly available data. It would be non-compliant to use a single client login to ingest data and make it available to all users.

Only a few paywalled sources with a direct partnership with our vendor provide some licensed articles to Sprinklr.

Refer to this sheet for commonly requested review domains. We suggest removing the irrelevant parts of the URL, such as trailing segments containing sort or search terms.

It depends. Our vendors try their best to fetch data from sites. However, if they still cannot do it, the client must set expectations accordingly.

Coverage Status

  • Covered: The source/thread is already in our coverage.

  • Added: The requested thread was added to its associated source's prioritized coverage (usually for reviews).

  • Blocked: Manual checks returned an HTTP status code in the 4xx-5xx range – content access is blocked.

  • Completed: The custom parser was completed, and new content is available.

  • Can't Scan: Various errors - requires login/subscription, no content in the last 30 days, etc.

  • In Progress: Request was submitted to the parsing squad for further handling.

  • Pending QA: Custom parser created, pending check that content arrived in our API as expected.

  • Sample Required: Could not locate relevant content to cover. A full path to the relevant content you would like covered is needed.

  • Pending Client: Could not create custom parser due to several issues (broken link, content is in PDF, media type not found, etc.)

  • Done – Auto Parsing: The source is covered by an automatic one-size-fits-all parser, usually an RSS extractor.

  • Already Customized: Source already has a custom parser.

Below are the reasons for the Can't Scan status –

  • No newly published content: No content was published in the last 30 days, so the site cannot be added to our coverage.

  • No access to the date / No pub dates to parse: We only collect content published in the last 30 days, so content without publication dates will not be collected.

  • Source Down/Timed-out: Unable to reach source due to a time-out.

  • Broken link: The link does not reach a website.

  • No content to parse: No News / Blogs / Discussions / Reviews content on the source.

  • Duplicate: An addition request was already raised for the same source/URL.

  • Not allowed by robots: Blocked by Robots.txt.
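The "Not allowed by robots" status above can often be anticipated before submitting a request by inspecting the site's robots.txt. The sketch below uses Python's standard-library `urllib.robotparser`; in practice you would fetch `https://<domain>/robots.txt` first, but here the file body is inlined (with a made-up policy) to keep the example self-contained.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for a site under consideration.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Public review pages are crawlable; the disallowed section is not.
print(parser.can_fetch("*", "https://example.com/reviews/widget"))   # True
print(parser.can_fetch("*", "https://example.com/private/internal")) # False
```

A `Disallow: /` for all user agents is a strong hint that the domain will come back as "Not allowed by robots" and is not worth submitting.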

No. LinkedIn is a channel (unlike web sources), and it does not have APIs for Social Listening.

For the added/covered domains from the Source Verification request –

  • Choose a domain-based topic as a query type in the Topic Builder.

  • Add the covered/ added URLs in the domains field and/or add keywords in the query.

  • Enable fetching for the topic.

  • Backfill from the day of addition.

500+ clients have been using the research cloud for the past five years, and we have been continuously adding sources based on client needs. The list (blogs/websites/news sites) runs into millions and cannot be shared.

Instead, connect with the SV team to validate coverage for the particular set of sources the customer is looking for.

No, general domain crawling is not enough; explicit reviews/forums pages must be source-checked. Regardless of whether the review domain is covered, specific thread URLs (for the forums media type) and specific review URLs (for the reviews media type) must be submitted for continuous crawling of those pages.