
# Calculate tf-idf

## How to Access This Feature

### From + (plus) Button

Click the 'Add' (Plus) button, then select "Text Mining..." -> "Calculate TF-IDF".

![](/files/-M4oNE7gJvG3uj_fk4va)

## How to Use?

![](/files/-M4oNE7i_sKCTiNIbBDZ)

* Select a column as document id - The column that identifies each document. If you ran [do\_tokenize](/main/do_tokenize.md) beforehand, this can be the document\_id column.
* Select a column that has tokenized text - The column that contains the tokens. This is the "token" column if the text was tokenized with the [do\_tokenize](/main/do_tokenize.md) function.
* TF Weight (Optional) - The default is "raw".
  * "raw" is count of a term in a document.
  * "binary" indicates whether the term exists in the document: 1 if it does, 0 if it does not.
  * "log\_scale" is `1+log(count of a term in a document)`.
* IDF Log Scale Function (Optional) - The default is log. This function dampens the growth of the idf value. Idf is calculated as `log_scale_function((the total number of documents)/(the number of documents which have the token))`, so it measures how rare the token is in the set of documents. It might be worth trying log2 or log10: log2 makes the idf value grow faster, and log10 makes it grow more slowly.
* Normalization (Optional) - How to normalize the tf-idf vector of each document. The default is l2.
  * "l2" scales the tf-idf vector of a document so that its Euclidean length (L2 norm) becomes 1.
  * "l1" scales the tf-idf vector of a document so that its Manhattan length (sum of values) becomes 1.
  * FALSE doesn't normalize the result.
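
The options above can be sketched in a few lines of Python to show how they combine. This is only an illustration of the formulas described on this page, not Exploratory's implementation: the names `tf_weight`, `normalize`, and `tfidf` are hypothetical, the `log` in "log\_scale" is assumed to be the natural logarithm, and the sample rows are made up.

```python
import math
from collections import Counter, defaultdict


def tf_weight(count, weight="raw"):
    """Term-frequency weight for a term that occurs `count` times in a document."""
    if weight == "raw":
        return float(count)
    if weight == "binary":
        return 1.0 if count > 0 else 0.0
    if weight == "log_scale":
        # Assumption: natural log, matching `1+log(count of a term in a document)`.
        return 1.0 + math.log(count)
    raise ValueError(f"unknown TF weight: {weight!r}")


def normalize(values, norm="l2"):
    """Scale one document's scores so the chosen norm becomes 1."""
    if norm == "l2":
        length = math.sqrt(sum(v * v for v in values))
    elif norm == "l1":
        length = sum(abs(v) for v in values)
    else:
        # Anything else (for example False) leaves the scores unnormalized.
        return list(values)
    return [v / length if length else 0.0 for v in values]


def tfidf(rows, weight="raw", idf_log=math.log, norm="l2"):
    """rows: iterable of (document_id, token) pairs, one row per token."""
    counts = defaultdict(Counter)
    for doc_id, token in rows:
        counts[doc_id][token] += 1

    n_docs = len(counts)
    doc_freq = Counter()  # number of documents that contain each token
    for token_counts in counts.values():
        doc_freq.update(token_counts.keys())

    result = {}
    for doc_id, token_counts in counts.items():
        tokens = sorted(token_counts)
        scores = [
            tf_weight(token_counts[t], weight) * idf_log(n_docs / doc_freq[t])
            for t in tokens
        ]
        result[doc_id] = dict(zip(tokens, normalize(scores, norm)))
    return result


# Made-up sample input shaped like tokenized output: (document_id, token).
rows = [
    ("d1", "a"), ("d1", "a"), ("d1", "b"),
    ("d2", "a"), ("d2", "c"),
    ("d3", "c"),
]

default_scores = tfidf(rows)                        # raw TF, log IDF, l2
unnormalized = tfidf(rows, norm=False)              # no normalization
log2_scores = tfidf(rows, weight="log_scale", idf_log=math.log2, norm=False)
```

Passing `math.log2` or `math.log10` as `idf_log` mirrors the IDF Log Scale Function option, and `norm=False` mirrors the FALSE setting for Normalization.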


