Pre-processing Pipelines for Intent Model
Before You Start
To learn about Intent Models, see How to Create an Intent Model: A Step-by-Step Guide.
Steps for Creating an Intent Model
1. On the Intent Model Creation screen, browse to the Pre-Processing Pipelines dropdown.
2. Choose the preprocessors that best fit your use case. For instance, if your training data contains many hashtags and emojis, consider using preprocessors such as "Remove Hashtags" and "Remove Emojis" to filter out these elements before model training begins.
Note: If you don't select any preprocessors, a standard preprocessing pipeline still runs on your data.
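The idea of a pipeline can be pictured as a list of text-to-text functions applied one after another. The following is a minimal Python sketch, not the product's implementation: the names `run_pipeline` and `STANDARD_STEPS` are hypothetical, the contents of the real standard pipeline are not documented here, and the assumption that the standard steps run first, followed by the selected ones in order, is ours.

```python
from typing import Callable, Iterable, List

# A preprocessor is modeled as a function that takes text and returns text.
Preprocessor = Callable[[str], str]


def replace_newlines(text: str) -> str:
    """Replace newline characters with a space."""
    return text.replace("\n", " ")


def to_lowercase(text: str) -> str:
    """Convert all characters to lowercase."""
    return text.lower()


# Hypothetical stand-in for the standard pipeline; the real one may differ.
STANDARD_STEPS: List[Preprocessor] = [replace_newlines]


def run_pipeline(text: str, selected: Iterable[Preprocessor] = ()) -> str:
    """Apply the standard steps, then any selected steps, in order (assumed)."""
    for step in [*STANDARD_STEPS, *selected]:
        text = step(text)
    return text
```

With no selection, `run_pipeline("Hi\nThere")` applies only the placeholder standard step; passing `[to_lowercase]` adds the selected step on top of it.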
List of Pre-Processors
Pre-processor | Description |
Replace Newlines | This preprocessor replaces all newline characters ("\n") in the text with a space. |
Remove Email IDs | This preprocessor removes all email IDs from the text. Email IDs are extracted using regular expressions. |
Remove Emojis | This preprocessor removes old-style emojis from the text. |
Remove Hashtags | This preprocessor removes all hashtags in the text. Hashtags are words or phrases preceded by the "#" character in social media posts. |
Remove HTML Tags | This preprocessor removes HTML tags from the text. |
Remove Multiple Spaces | This preprocessor replaces multiple consecutive spaces in the text with a single space. |
Remove New Emojis | This preprocessor removes all emojis from the text. Emojis are identified using the emoji pattern defined in the code. |
Remove Numbers | This preprocessor removes all numeric characters from the text. |
Remove Special Characters & Punctuations | This preprocessor removes certain special characters and punctuation marks. It also replaces square brackets, round brackets, and single quotes with a space. |
Remove Punctuations | This preprocessor removes all punctuation symbols from the text, such as exclamation marks, question marks, periods, and commas. |
Remove Special Characters | This preprocessor removes specific special characters and certain punctuation marks from the text. |
Remove User IDs | This preprocessor removes user IDs from the text. User IDs are identified as words or phrases preceded by the "@" character. |
Replace URLs | This preprocessor removes all URLs in the text. URLs are identified using the URL pattern defined in the code. |
Replace URL Placeholders | This preprocessor replaces URL placeholders in the text with a space. URL placeholders are strings like "#UNIVERSAL_RESOURCE_LOCATOR", which are used as a placeholder for URLs in a pre-processing step. |
Convert to Lowercase | This preprocessor converts all characters in the text to lowercase. |
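To make the table more concrete, here is a rough Python sketch of how a few of these preprocessors could behave. The regular expressions and function names are illustrative assumptions, not the patterns the product actually uses, and they are deliberately simple.

```python
import re


def remove_html_tags(text: str) -> str:
    # Assumption: a tag is anything between "<" and the next ">".
    return re.sub(r"<[^>]+>", "", text)


def remove_email_ids(text: str) -> str:
    # Assumption: a simplified email pattern, not a full RFC-compliant one.
    return re.sub(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", "", text)


def remove_user_ids(text: str) -> str:
    # Assumption: a user ID is "@" followed by word characters.
    return re.sub(r"@\w+", "", text)


def remove_hashtags(text: str) -> str:
    # Assumption: a hashtag is "#" followed by word characters.
    return re.sub(r"#\w+", "", text)


def remove_multiple_spaces(text: str) -> str:
    # Collapse runs of two or more spaces into a single space.
    return re.sub(r" {2,}", " ", text)


def to_lowercase(text: str) -> str:
    return text.lower()


def clean(text: str) -> str:
    # Emails are removed before user IDs so "@example" inside an address
    # is not mistaken for a user ID.
    steps = [
        remove_html_tags,
        remove_email_ids,
        remove_user_ids,
        remove_hashtags,
        remove_multiple_spaces,
        to_lowercase,
    ]
    for step in steps:
        text = step(text)
    return text


sample = "<b>Great</b> update @sam #NewRelease!  Mail sam@example.com now"
print(clean(sample))  # great update ! mail now
```

In this sketch the order of the steps matters: running `remove_user_ids` on "sam@example.com" before the email step would leave "sam.com" behind. Whether the product applies its preprocessors in a fixed order is not stated here.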