# Tokenize Text

## How to Access This Feature

There are two ways to access this feature.

### From + (plus) Button

You can access it from the 'Add' (Plus) button.

![](/files/-M4oNI0OZMHDeC-oWttl)

### From Column Header Menu

You can also access it from a column header menu.

![](/files/-M4oNI0UvvhdCDsi2a1O)

## Parameters

![](/files/-M4oNI0WXGATkOkN3EQy)

* Column to Tokenize - The text column you want to split into tokens.
* Tokenize By - The unit of each token. The default is Words.
  * Words
  * Sentences
* Keep Other Columns - Whether to keep the other existing columns in the output. The default is No.
* Keep Original Column - Whether to keep the input column in the output. The default is No.
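* With Sentence ID - Whether to include the sentence ID in the output. The default is Yes.
* Remove Stopwords - Whether to remove stopwords, common words such as "the" and "is" that usually carry little meaning. The default is Yes.
* Language for Stopwords - The language of the stopword list. By default, it is detected automatically from the text.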
* Additional Stopwords - Words to add to the default set of stopwords.
* Exclude from Stopwords - Words to remove from the default set of stopwords so that they are kept in the output.
* Words To Be Treated As One Word - Words or phrases to keep as single tokens. Use this when a word or phrase that should be one token is being split into multiple tokens.
* Remove Punctuations - Whether to remove punctuation marks.
* Remove Numbers - Whether to remove numbers.
* Clean Up Twitter Data - Whether to remove hashtags (words starting with #) and mentions (words starting with @). The default is No.
* Remove Hiragana Only Words - Short Japanese words written only in Hiragana are often meaningless. Select an option here to treat them as stopwords and remove them all at once.
* Column Name for Output Data - The name of the new column that stores the tokenized values. The default is "token".
* Format for Output Data - The letter case of the output tokens. The default is Lowercase.
  * Lowercase
  * Titlecase
  * Uppercase
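
To make the interplay of these options concrete, here is a minimal Python sketch of a tokenizer that mimics several of them on a single text value. It is not how Exploratory implements the feature. The function `tokenize_text`, its parameter names, the tiny `DEFAULT_STOPWORDS` set, and the naive regex sentence splitter are all hypothetical stand-ins chosen for illustration. Language detection and Remove Hiragana Only Words are not modeled. Each output row is a `(sentence_id, token)` pair, similar to the token column plus sentence ID that the feature can produce.

```python
import re

# Hypothetical, deliberately tiny English stopword list; the real feature
# uses a full stopword list for the selected or detected language.
DEFAULT_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}

# Maps the "Format for Output Data" choices to string methods.
CASE_FORMATS = {
    "lowercase": str.lower,
    "titlecase": str.title,
    "uppercase": str.upper,
}


def tokenize_text(
    text,
    tokenize_by="words",
    remove_stopwords=True,
    additional_stopwords=(),
    exclude_from_stopwords=(),
    words_as_one=(),
    remove_punctuation=True,
    remove_numbers=True,
    clean_up_twitter=False,
    output_format="lowercase",
):
    """Return a list of (sentence_id, token) rows for one text value (sketch)."""
    to_case = CASE_FORMATS[output_format]
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if tokenize_by == "sentences":
        return [(i, to_case(s)) for i, s in enumerate(sentences, start=1)]

    # Build the effective stopword set from the defaults plus user edits.
    stopwords = DEFAULT_STOPWORDS | {w.lower() for w in additional_stopwords}
    stopwords -= {w.lower() for w in exclude_from_stopwords}

    rows = []
    for sentence_id, sentence in enumerate(sentences, start=1):
        if clean_up_twitter:
            # Remove hashtags (#...) and mentions (@...).
            sentence = re.sub(r"[#@]\w+", " ", sentence)
        for phrase in words_as_one:
            # Glue the phrase's words with "_" so the split below keeps it whole.
            joined = phrase.replace(" ", "_")
            pattern = r"\b" + re.escape(phrase) + r"\b"
            sentence = re.sub(pattern, lambda _m: joined, sentence, flags=re.IGNORECASE)
        # \w+ keeps only word characters; otherwise punctuation marks become tokens.
        token_pattern = r"\w+" if remove_punctuation else r"\w+|[^\w\s]"
        for token in re.findall(token_pattern, sentence):
            if remove_numbers and token.isdigit():
                continue
            if remove_stopwords and token.lower() in stopwords:
                continue
            # Restore the spaces inside glued phrases, then apply the output case.
            rows.append((sentence_id, to_case(token.replace("_", " "))))
    return rows
```

For example, `tokenize_text("The cat sat on the mat. Machine learning is fun in 2024! #AI @bob", words_as_one=["machine learning"], clean_up_twitter=True)` returns `[(1, "cat"), (1, "sat"), (1, "on"), (1, "mat"), (2, "machine learning"), (2, "fun")]`: "the", "is", and "in" are dropped as stopwords, "2024" as a number, and the hashtag and mention by the Twitter cleanup.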

