Pre-processing Pipelines for Intent Model


Before You Start

To learn about Intent Models, see How to Create an Intent Model: A Step-by-Step Guide.

Steps for Creating an Intent Model

1. On the Intent Model Creation screen, browse to the Pre-Processing Pipelines dropdown.

2. Choose the preprocessors that best fit your use case. For example, if your training data contains many hashtags and emojis, select preprocessors such as "Remove Hashtags" and "Remove Emojis" to filter these elements out of your data before model training begins.
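Conceptually, a pre-processing pipeline applies each selected preprocessor to the text in turn, passing the output of one step to the next. The Python sketch below illustrates that idea for the hashtag-and-emoji example, plus a space-collapsing step to tidy up the gaps left behind. The function names, regular expressions, and emoji code-point ranges are hypothetical stand-ins, not the platform's actual implementation.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for "Remove Hashtags": "#" followed by word characters.
def remove_hashtags(text: str) -> str:
    return re.sub(r"#\w+", "", text)

# Hypothetical stand-in for emoji removal: a broad, assumed set of
# emoji code-point ranges (not exhaustive).
def remove_emojis(text: str) -> str:
    return re.sub("[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)

# Hypothetical stand-in for "Remove Multiple Spaces".
def remove_multiple_spaces(text: str) -> str:
    return re.sub(r" {2,}", " ", text)

def run_pipeline(text: str, steps: List[Callable[[str], str]]) -> str:
    # Apply each selected preprocessor in order; each step sees the
    # previous step's output.
    for step in steps:
        text = step(text)
    return text

cleaned = run_pipeline(
    "Big deals today #sale 🎉",
    [remove_hashtags, remove_emojis, remove_multiple_spaces],
)
print(repr(cleaned))
```

Because each step sees the previous step's output, the order of the list matters: collapsing spaces only helps if it runs after the removals that create the extra spaces.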

Note:

- If you don't select any preprocessors, a standard preprocessing pipeline still runs on your data.

List of Pre-Processors

Each entry below gives the name of a pre-processor followed by a description of what it does.

Replace Newlines

This preprocessor replaces all newline characters ("\n") in the text with a space.

Remove Email IDs

This preprocessor removes all email addresses from the text. Email addresses are identified using regular expressions.
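The exact regular expression the platform uses is not documented here. As a rough illustration, a simplified email pattern in Python might look like this (both the pattern and the function name are assumptions):

```python
import re

# Simplified, assumed email pattern: local part, "@", domain, dot, and a
# top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def remove_email_ids(text: str) -> str:
    # Delete every substring that looks like an email address.
    return EMAIL_PATTERN.sub("", text)

print(repr(remove_email_ids("Contact support@example.com for help")))
```

Removing a match leaves the surrounding spaces in place, so "Contact support@example.com for help" becomes "Contact  for help" with a double space; the Remove Multiple Spaces preprocessor addresses that.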

Remove Emojis

This preprocessor removes old-style emojis from the text.

Remove Hashtags

This preprocessor removes all hashtags from the text. Hashtags are words or phrases preceded by a "#" character, as used in social media posts.
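A minimal Python sketch of hashtag removal, assuming a hashtag is "#" followed by word characters (the platform's exact rule may differ, and the function name is hypothetical):

```python
import re

def remove_hashtags(text: str) -> str:
    # "#" followed by one or more word characters (letters, digits, underscore).
    return re.sub(r"#\w+", "", text)

print(repr(remove_hashtags("Order arrived late #fail again")))
```

In Python 3, \w is Unicode-aware for str patterns, so hashtags with accented letters such as "#café" are matched in full.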

Remove HTML Tags

This preprocessor removes HTML tags from the text.
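One common way to strip tags is a regular expression that deletes anything between "<" and ">". The Python sketch below assumes that approach; it is an illustration, not the platform's implementation, and the function name is hypothetical.

```python
import re

# Matches an opening, closing, or self-closing tag such as <p>, </b>, or <br/>.
TAG_PATTERN = re.compile(r"<[^>]+>")

def remove_html_tags(text: str) -> str:
    # Delete every tag and keep the text between tags.
    return TAG_PATTERN.sub("", text)

print(remove_html_tags("<p>Reset my <b>password</b></p>"))
```

A pattern this simple does not handle a ">" inside an attribute value, and it leaves HTML entities such as "&amp;" untouched; Python's html.parser module is a more robust option when the input is messy.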

Remove Multiple Spaces

This preprocessor replaces runs of multiple spaces in the text with a single space.
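The Python sketch below assumes "multiple spaces" means two or more consecutive space characters; the function name is hypothetical.

```python
import re

def remove_multiple_spaces(text: str) -> str:
    # Collapse each run of two or more space characters into one space.
    return re.sub(r" {2,}", " ", text)

print(remove_multiple_spaces("Where   is  my order?"))
```

This pattern targets only the space character. A pattern such as \s+ would also fold tabs and newlines into a single space, which overlaps with what Replace Newlines does.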

Remove New Emojis

This preprocessor removes all emojis from the text. Emojis are identified using a predefined emoji pattern.
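The emoji pattern the platform uses is not published here. The sketch below assumes a pattern built from several Unicode blocks that hold most modern emojis; both the ranges and the function name are assumptions.

```python
import re

# Assumed code-point ranges covering most modern emojis; the platform's
# actual pattern may differ.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)

def remove_new_emojis(text: str) -> str:
    # Delete every run of characters that falls inside the listed ranges.
    return EMOJI_PATTERN.sub("", text)

print(repr(remove_new_emojis("Thanks! 😀🚀")))
```

Ranges like these can leave invisible residue from composite emojis: for example, "❤️" is U+2764 followed by the variation selector U+FE0F, and only U+2764 falls inside the listed ranges.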

Remove Numbers

This preprocessor removes all numeric characters from the text.
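A minimal Python sketch, assuming "numeric characters" means decimal digits (the function name is hypothetical). Digits embedded in words are removed as well, so "COVID19" becomes "COVID".

```python
import re

def remove_numbers(text: str) -> str:
    # \d matches any Unicode decimal digit in a str pattern, including
    # digits that appear inside words.
    return re.sub(r"\d", "", text)

print(repr(remove_numbers("Order 4521 for COVID19 kit")))
```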

Remove Special Characters & Punctuations

This preprocessor removes certain special characters and punctuation marks. It also replaces square brackets, round brackets, and single quotes with a space.
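The description names the characters that become spaces but not the full set of characters that are deleted. The Python sketch below follows the description for brackets and single quotes and uses a hypothetical deletion set for everything else; the function name is also hypothetical.

```python
import re

# Replaced with a space, per the description: square brackets,
# round brackets, and single quotes.
TO_SPACE = re.compile(r"[\[\]()']")

# Hypothetical set of other special characters and punctuation to delete;
# the platform's exact set is not documented here.
TO_DELETE = re.compile(r'[!"#$%&*+,./:;<=>?@^_`{|}~\\-]')

def remove_special_chars_and_punctuation(text: str) -> str:
    # Replace brackets and quotes first, then delete the remaining characters.
    text = TO_SPACE.sub(" ", text)
    return TO_DELETE.sub("", text)

print(repr(remove_special_chars_and_punctuation("I can't log in [urgent]!")))
```

Because the single quote becomes a space, contractions are split: "can't" turns into "can t".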

Remove Punctuations

This preprocessor removes all punctuation symbols from the text, such as exclamation marks, question marks, periods, and commas.
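A minimal Python sketch using the standard library's string.punctuation constant, which covers the ASCII punctuation characters. Whether the platform uses the same set is an assumption, and the function name is hypothetical.

```python
import string

# str.maketrans with a third argument maps each listed character to None,
# so str.translate deletes it.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuations(text: str) -> str:
    return text.translate(PUNCT_TABLE)

print(remove_punctuations("Hi, can you help? Thanks!"))
```

string.punctuation contains only ASCII characters, so typographic marks such as curly quotes are not removed by this sketch.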

Remove Special Characters

This preprocessor removes specific special characters and certain punctuation marks from the text.

Remove User IDs

This preprocessor removes user IDs from the text. User IDs are identified as words preceded by an "@" character.
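A minimal Python sketch, assuming a user ID is "@" followed by word characters. The lookbehind is a design choice of this sketch (not necessarily the platform's rule) that stops an "@" inside an email address from being treated as a user ID; the function name is hypothetical.

```python
import re

# "@" plus word characters, matched only when the "@" is not preceded by a
# word character, so "jane@example.com" is left intact.
USER_ID_PATTERN = re.compile(r"(?<!\w)@\w+")

def remove_user_ids(text: str) -> str:
    return USER_ID_PATTERN.sub("", text)

print(repr(remove_user_ids("cc @alex_b please")))
```

Without the lookbehind, a plain @\w+ pattern would turn "jane@example.com" into "jane.com".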

Replace URLs

This preprocessor removes all URLs from the text. URLs are identified using a predefined URL pattern.
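The URL pattern the platform uses is not documented here. The Python sketch below assumes a simplified pattern that matches "http://", "https://", or "www." followed by non-space characters; the pattern and function name are assumptions.

```python
import re

# Simplified, assumed URL pattern: a scheme or "www." prefix, then every
# character up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def replace_urls(text: str) -> str:
    return URL_PATTERN.sub("", text)

print(repr(replace_urls("See https://example.com/help or www.example.org now")))
```

Because \S+ stops only at whitespace, punctuation directly after a URL (for example, a trailing period) is removed along with it.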

Replace URL Placeholders

This preprocessor replaces URL placeholders in the text with a space. URL placeholders are strings such as "#UNIVERSAL_RESOURCE_LOCATOR" that a pre-processing step substitutes for URLs.

Convert to Lowercase

This preprocessor converts all characters in the text to lowercase.