Validate an existing model training project for Text Classifier

Updated 

In this article, we will guide you through the process of validating an existing model for Sprinklr's Text Classifier.

Sprinklr's Text Classifier can categorize messages that match a specific query or come from a particular account without the need for any rules or keyword lists. With Text Classifier, you can filter out common terms that are not relevant to your brand, product, or industry.

To validate an existing AI Model, you must create a new project. If you want to create a new training project for the Text Classifier model, please refer to the article Create a new model training project for Text Classifier in AI Studio.

The following diagram illustrates the workflow for validating an existing Text Classifier model.

Enablement note: Use of this feature requires that AI Studio be enabled in your environment. AI Studio is a paid module, available on demand. To learn more about getting this capability enabled in your environment, please work with your Success Manager.

To create a new validation project

  1. Click the New Tab icon. Under Sprinklr Insights, click AI Studio within Learn.

  2. On the AI Projects window, click Create New Project in the top right corner.

  3. On the Create New Project window, enter a name for your project and select Existing Model Validation as the project type.

  4. After selecting the project type, you need to select Text Classifier as your model. You will have two options: Text Classifier (Single Select) and Text Classifier (Multi Select).

    • Choosing the Text Classifier (Single Select) option supports only one classification on each message.

    • Choosing the Text Classifier (Multi Select) option supports more than one classification on each message.

  5. Enter a description of your project. This step is optional.

  6. Select the Start Date and End Date for your project –

    • Start Date: The project will become active at this date.

    • End Date: Validation and Training on project will be disabled after this date. This is optional.

  7. As an optional step, you can either select Project tags from the dropdown or enter new tags you want to add to your project.

  8. After you select the Text Classifier Model in the Select Model option, you will get additional Messages and Classifications related options which you would need to fill.

  9. Under the Select Messages for Model Training section, you can define your selection options for the messages that you will train. For more information, see Data Source of Messages – Terms & Descriptions.

  10. Under the Share Project section, add collaborators who can classify the messages for training and validate the predictions of the AI Model. To do that, select the user(s) and user group(s) as recipients.

  11. Under the Message Custom Properties section, define your custom fields' selection options for the messages that you will train (if any).

  12. After verifying the input for all options, click Create Project in the bottom right corner.

A new Text Classification Validation Project will be created as per the defined criterion. This can be accessed via the AI Projects Record Manager. (If the new project is not reflected, click on the refresh icon).

Note: You must wait for the project to be processed. Once the project status is updated from Processing to Processed, you can start classifying the text messages.

Data Source of Messages – Terms & Descriptions

Data Source

Description

Language

Select the desired language in which you want to train your model. Refer to the list of supported languages.

Message Start Date

Select the start date for your classified messages. The project will only propose messages that were created after this date to classify.

Message End Date

Select the end date for your classified messages. The project will not propose messages that were created after this date to classify.

Classify Sample Size

This is the number of messages that will be fetched to the classify messages form (defaulted to 5000, maximum of 9000 messages).

Sampling Type

The sampling type is always Random.

Sample Size

This is the number of messages that will be fetched to the review messages form (defaulted to 200, minimum of 200).

Filter

First, select the condition, and then select the values. For example, if you want to filter the messages by Topics, select Topics as condition and then select Topic(s) as the values.

The available conditions are – Review Source, Topics, Topic Groups, Topic Tags, Themes, Theme Tags, Domain Lists, Domain List Tags, Keyword Lists, Channels, Account, Account Group, Message Type, Media Type, Post Type, Data Ingestion File Name, and Data Ingestion Import Tag.

Note:

  • You can select multiple conditions at a time.

  • Conditions are consistent across Text Classifier (Single Select) or Text Classifier (Multi Select) models.

Message Custom Properties

Select the custom field values as available in your environment.

Note: You can select multiple values at a time.

List of languages supported in Text Classifier

  • Afrikaans

  • Albanian

  • Amharic

  • Arabic

  • Armenian

  • Azerbaijani

  • Basque

  • Bengali

  • Belarusian

  • Bihari

  • Bosnian

  • Breton

  • Bulgarian

  • Cebuano

  • Catalan

  • Cherokee

  • Chinese

  • Chinese (Traditional)

  • Croatian

  • Czech

  • Danish

  • Dutch

  • English

  • Estonia

  • Finnish

  • French

  • Frisian

  • Galician

  • Ganda

  • Georgian

  • German

  • Greek

  • Gujarati

  • Haitian

  • Creole

  • Hausa

  • Hebrew

  • Hindi

  • Hmong

  • Hungarian

  • Icelandic

  • Indonesian

  • Inuktitut Irish

  • Italian

  • Javanese

  • Japanese

  • Kannada

  • Kazakh

  • Khmer

  • Kinyarwanda

  • Korean

  • Kurdish

  • Kurmanji

  • Kyrgyz

  • Lao

  • Latvian

  • Limbu

  • Lithuanian

  • Macedonian

  • Malagasy

  • Malay

  • Malayalam

  • Maltese

  • Maldivian

  • Marathi

  • Myanmar

  • Nepali

  • Norwegian

  • Oriya

  • Papiamento

  • Persian

  • Polish

  • Portuguese

  • Punjabi

  • Pashto

  • Romanian

  • Russian

  • Scottish Gaelic

  • Serbian

  • Sindhi

  • Sinhalese

  • Slovak

  • Slovene

  • Somali

  • Sorani Kurdish

  • Spanish

  • Swedish

  • Filipino

  • Tamil

  • Telugu

  • Thai

  • Tibetan

  • Turkish

  • Ukrainian

  • Urdu

  • Uyghur

  • Uzbek

  • Vietnamese

  • Welsh

  • Xhosa

  • Yiddish 

  • Zawgyi

Best practices

Here are some recommended best practices to follow – 

  • Utilize the First Party Data Ingestion (FPDI) feature to efficiently classify messages in bulk. 

  • Aim to classify a diverse range of messages that are not too like one another. For instance, messages that contain identical text but different emojis are not considered unique for training purposes. 

  • Keep in mind that AI Studio can only train or predict messages in one language at a time. Therefore, it is advisable to create multiple projects when dealing with large-scale multilingual datasets. 

  • Carefully choose the sample size for your project, as it should not be too close to the total number of classified or ingested messages.  This is to ensure that we have enough messages remaining after the pre-processing* step.

  • When creating a new project, ensure that message-level custom properties are entered correctly. Note that these properties cannot be edited later when modifying project details. 

  • The fields that you can edit are Messages Start Date, Messages End Date, Sample Size, and Source. 

*Pre-processing: It is worth noting that the system performs several pre-processing tasks on the backend to clean and standardize the messages. These tasks typically involve removing hashtags, emojis, multiple spaces, and punctuation marks, among other things.

Once the pre-processing tasks are complete, the system then removes any duplicate messages. This means that two distinct messages that have the same text but different emojis, for example, will eventually be considered duplicates after pre-processing and removed from the dataset. Examples are given below –

  • Saving for a trip to Disney! $AmazonFindOfTheMonth #savings 🥰😘

  • Saving for a trip to Disney! $AmazonFindOfTheMonth #savings 😇😍

It is important to keep this in mind when classifying the data, as the removal of duplicates can impact the final set of messages. However, by removing duplicates, we can ensure that the data is clean and standardized, which is essential for accurate analysis and modelling.