# Calculate TF-IDF

## How to Access This Feature

### From + (plus) Button

You can access this feature from the 'Add' (Plus) button by selecting "Text Mining..." -> "Calculate TF-IDF". ![](/files/-M4oNE7gJvG3uj_fk4va)

## How to Use?

![](/files/-M4oNE7i_sKCTiNIbBDZ)

* Select a column as document id - The column that identifies each document. If you ran [do\_tokenize](/main/do_tokenize.md) beforehand, this can be the document\_id column.
* Select a column that has tokenized text - The column that contains the tokens. If the text was tokenized with the [do\_tokenize](/main/do_tokenize.md) function, this is the "token" column.
* TF Weight (Optional) - How the term frequency (TF) of each token in a document is weighted. The default is "raw".
  * "raw" is the number of times the term appears in the document.
  * "binary" is 1 if the term appears in the document and 0 if it does not.
  * "log\_scale" is `1+log(count of a term in a document)`.
* IDF Log Scale Function (Optional) - The default is log. This is the function that dampens the growth of the idf value. Idf is calculated by `log_scale_function((the total number of documents)/(the number of documents which have the token))` and measures how rare the token is across the set of documents. It might be worth trying log2 or log10: log2 produces larger idf values than the default, and log10 produces smaller ones.
* Normalization (Optional) - How to normalize the tf-idf vector of each document. The default is l2.
  * "l2" scales the vector so that its Euclidean (L2) length is 1.
  * "l1" scales the vector so that its Manhattan (L1) length, the sum of its values, is 1.
  * FALSE leaves the result unnormalized.
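To make the options above concrete, here is a minimal Python sketch of the calculation as this page describes it. It is not Exploratory's implementation: the function name `calculate_tfidf` and its parameter names are hypothetical. It assumes the input is one (document id, token) pair per token occurrence, like the long-format output of do\_tokenize, and that `log` in the "log\_scale" TF weight is the natural logarithm.

```python
import math
from collections import Counter, defaultdict


def calculate_tfidf(rows, tf_weight="raw", idf_log=math.log, norm="l2"):
    """Hypothetical sketch of tf-idf as described on this page.

    rows: iterable of (document_id, token) pairs, one per token occurrence.
    tf_weight: "raw", "binary", or "log_scale".
    idf_log: function applied to N / df, e.g. math.log, math.log2, math.log10.
    norm: "l2", "l1", or False for no normalization.
    Returns {document_id: {token: tfidf}}.
    """
    # Count how many times each token occurs in each document.
    counts = defaultdict(Counter)
    for doc_id, token in rows:
        counts[doc_id][token] += 1

    n_docs = len(counts)
    # Document frequency: the number of documents that contain each token.
    df = Counter(token for doc in counts.values() for token in doc)
    idf = {token: idf_log(n_docs / n) for token, n in df.items()}

    result = {}
    for doc_id, doc_counts in counts.items():
        weights = {}
        for token, count in doc_counts.items():
            if tf_weight == "raw":
                tf = count
            elif tf_weight == "binary":
                tf = 1  # only tokens present in the document appear here
            elif tf_weight == "log_scale":
                tf = 1 + math.log(count)  # assumption: natural log
            else:
                raise ValueError(f"unknown tf_weight: {tf_weight}")
            weights[token] = tf * idf[token]

        if norm == "l2":
            total = math.sqrt(sum(w * w for w in weights.values()))
        elif norm == "l1":
            total = sum(abs(w) for w in weights.values())
        elif norm is False:
            total = 1.0
        else:
            raise ValueError(f"unknown norm: {norm}")
        # If every token in the document has idf 0, avoid dividing by zero.
        if total > 0:
            weights = {t: w / total for t, w in weights.items()}
        result[doc_id] = weights
    return result


rows = [(1, "apple"), (1, "apple"), (1, "banana"),
        (2, "banana"), (2, "cherry")]
print(calculate_tfidf(rows, tf_weight="log_scale", idf_log=math.log2, norm="l1"))
```

Under the formula on this page, a token that appears in every document (like "banana" in the example) gets an idf of 0, because the log of 1 is 0, so it contributes nothing to any document's vector.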
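The IDF Log Scale Function option suggests trying log2 or log10. Assuming idf is exactly `log_scale_function(N / df)` as stated above, switching the base multiplies every idf value by the same constant (about 1.443 for log2 and about 0.434 for log10, relative to the natural log). The short Python sketch below illustrates this with hypothetical document counts; the helper names `idf` and `l2_normalize` are made up for illustration.

```python
import math


def idf(n_docs, doc_freq, log_fn):
    # idf as described above: log_fn(total documents / documents with the token)
    return log_fn(n_docs / doc_freq)


def l2_normalize(values):
    # Scale the vector so that its Euclidean length is 1.
    length = math.sqrt(sum(v * v for v in values))
    return [v / length for v in values]


# Hypothetical corpus: 100 documents, and three tokens that appear in
# 1, 10, and 50 of them. Each token occurs once in the scored document.
n_docs = 100
doc_freqs = [1, 10, 50]

for log_fn in (math.log, math.log2, math.log10):
    raw = [idf(n_docs, df, log_fn) for df in doc_freqs]
    # The raw idf values differ by base; the normalized vectors do not.
    print(log_fn.__name__,
          [round(v, 3) for v in raw],
          [round(v, 3) for v in l2_normalize(raw)])
```

Because the constant factor cancels out, "l1" or "l2" normalization gives the same result for every base under this formula, so the choice of log function only changes the output when Normalization is FALSE.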

