# Cosine Similarity

## Introduction

Calculates the cosine similarity between each pair of categories. It is often used to measure how similar documents are to each other.
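For reference, this is the standard definition of cosine similarity for two categories represented as vectors $$\mathbf{a}$$ and $$\mathbf{b}$$ over the same $$n$$ dimensions. It is the cosine of the angle between the vectors, so it depends on the direction of the vectors but not on their length. It ranges from -1 to 1, and from 0 to 1 when all measure values are non-negative (for example, counts).

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$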

## How to Access?

You can access it from the 'Add' (Plus) button.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDaJ-SzpC8OiNdjS%2Fsim_add.png?generation=1586795484428050\&alt=media)

## How to Use?

### Calculate Distances Among Categories

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDaL7CxPGB0mkaUc%2Fsim_dialog.png?generation=1586795484519233\&alt=media)

#### Column Selection

The following example shows how the category, dimension, and measure columns relate to each other.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDaNPjridRfO5eqE%2Fskv_origin.png?generation=1586795484528788\&alt=media)

The category column contains the categories whose similarities are calculated. Each category is represented by the values of the measure column, one value for each value of the dimension column.

![](https://2850417076-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M4HLCK3olgduYoe3RVS%2F-M4oMvCUDQwHTJ0eWi_f%2F-M4oNDaSq8iW09ffLg4h%2Fskv_expand.png?generation=1586795484462447\&alt=media)

In this example, the similarities between airline carriers are calculated based on the number of flights. Each carrier is represented as a vector of flight counts with one element per weekday, and the cosine similarity is calculated for each pair of these vectors.
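The carrier example above can be sketched in plain Python as follows. This is only an illustration of the idea, not how the tool computes it internally. The function names (`category_vectors`, `cosine`, `pairwise_cosine`) and the flight counts are hypothetical. The sketch also assumes that each category/dimension combination appears at most once (duplicates are handled by "Aggregate with", described below), that missing combinations count as 0, and that a similarity involving an all-zero vector is NaN.

```python
import math


def category_vectors(rows):
    """Pivot (category, dimension, measure) rows into one vector per category.

    Assumes each (category, dimension) pair appears at most once; missing
    combinations are filled with 0 (an assumption of this sketch).
    """
    dims = sorted({dim for _, dim, _ in rows})
    table = {}
    for cat, dim, value in rows:
        table.setdefault(cat, {})[dim] = value
    vectors = {cat: [vals.get(d, 0) for d in dims] for cat, vals in table.items()}
    return vectors, dims


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (NaN if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


def pairwise_cosine(vectors):
    """Similarity for every ordered pair of categories, including self-pairs."""
    return {(a, b): cosine(vectors[a], vectors[b]) for a in vectors for b in vectors}


# Hypothetical flight counts per carrier and weekday (not real data).
rows = [
    ("AA", "Mon", 3), ("AA", "Tue", 4),
    ("UA", "Mon", 6), ("UA", "Tue", 8),
    ("DL", "Wed", 5),
]
vectors, dims = category_vectors(rows)
sims = pairwise_cosine(vectors)
print(dims)                # ['Mon', 'Tue', 'Wed']
print(sims[("AA", "UA")])  # 1.0  (same weekday profile, different scale)
print(sims[("AA", "DL")])  # 0.0  (no weekdays in common)
```

Note that "AA" and "UA" come out as identical (similarity 1.0) even though "UA" has twice as many flights. Cosine similarity compares the shape of the distribution across weekdays, not its overall size.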

If the same combination of category and dimension appears in more than one row, the measure values are aggregated with the function selected in "Aggregate with".
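Here is a minimal sketch of that aggregation step, assuming the data is a list of (category, dimension, measure) rows. The function name `aggregate` and the sample rows are hypothetical. `sum`, `statistics.mean`, and `len` simply stand in for whatever choices "Aggregate with" offers; the tool's actual options and default may differ.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, agg=sum):
    """Collapse duplicate (category, dimension) rows into one value each.

    `agg` plays the role of the "Aggregate with" choice; sum is only this
    sketch's default, not necessarily the tool's.
    """
    groups = defaultdict(list)
    for cat, dim, value in rows:
        groups[(cat, dim)].append(value)
    # dicts keep insertion order, so output follows first appearance
    return [(cat, dim, agg(values)) for (cat, dim), values in groups.items()]


# Hypothetical rows: two separate "AA"/"Mon" records.
rows = [("AA", "Mon", 1), ("AA", "Mon", 2), ("AA", "Tue", 4)]
print(aggregate(rows))        # [('AA', 'Mon', 3), ('AA', 'Tue', 4)]
print(aggregate(rows, mean))  # [('AA', 'Mon', 1.5), ('AA', 'Tue', 4)]
print(aggregate(rows, len))   # [('AA', 'Mon', 2), ('AA', 'Tue', 1)]  (row counts)
```

Passing `len` counts the rows for each combination. This is one way to get a "count of flights" measure when each row is a single flight.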

### Parameters

* Keep Only Unique Pairs (Optional) - The default is FALSE. Whether each pair should appear only once in the output. If TRUE, each pair appears once; if FALSE, each pair appears twice, once in each order. If you want to filter the pairs by category name, FALSE is the better choice, because every pair that involves a given category can then be found by filtering on a single column.
* Keep Diagonal Pairs (Optional) - The default is FALSE. Whether the output should include the similarity of each category with itself.
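The effect of the two parameters on the output rows can be sketched as follows. The names `similarity_pairs`, `unique_pairs`, `diagonal`, and `placeholder_sim` are hypothetical stand-ins for "Keep Only Unique Pairs" and "Keep Diagonal Pairs". The sketch only shows which pairs are listed and is not the tool's implementation.

```python
from itertools import combinations, permutations


def similarity_pairs(categories, sim, unique_pairs=False, diagonal=False):
    """List (category1, category2, similarity) rows.

    unique_pairs mirrors "Keep Only Unique Pairs": each unordered pair once.
    diagonal mirrors "Keep Diagonal Pairs": also emit (c, c) self-pairs.
    """
    cats = sorted(categories)
    if unique_pairs:
        pairs = list(combinations(cats, 2))   # (A, B) only
    else:
        pairs = list(permutations(cats, 2))   # (A, B) and (B, A)
    if diagonal:
        pairs += [(c, c) for c in cats]
    return [(a, b, sim(a, b)) for a, b in pairs]


def placeholder_sim(a, b):
    """Stand-in similarity function, just for illustration."""
    return 1.0 if a == b else 0.5


cats = ["AA", "DL", "UA"]
print(len(similarity_pairs(cats, placeholder_sim)))                     # 6
print(len(similarity_pairs(cats, placeholder_sim, unique_pairs=True)))  # 3
print(len(similarity_pairs(cats, placeholder_sim, diagonal=True)))      # 9

# Filtering on the first column finds every partner of "UA" only when
# both orders are kept.
print([r[1] for r in similarity_pairs(cats, placeholder_sim) if r[0] == "UA"])
# ['AA', 'DL']
print([r[1] for r in similarity_pairs(cats, placeholder_sim, unique_pairs=True)
       if r[0] == "UA"])
# []
```

With 3 categories, keeping both orders gives 3 × 2 = 6 rows, unique pairs give 3, and adding the diagonal adds one self-pair per category.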
