Correlation
This is a step-by-step guide to running the correlation algorithm in Exploratory to calculate correlations among either multiple columns or categories.
How to Access?
There are two ways to access it. One is from the 'Add' (Plus) button.
The other is from the column header menu.
How to Use?
Calculate Correlations Among Multiple Columns (Variables)
Column Selection
There are many ways to select columns. Select columns of Numeric data type here.
Parameters
Correlation Method (Optional) - The default is "pearson". This can be:
"pearson"
"kendall"
"spearman"
Operation in case of NA (Optional) - The default is "pairwise.complete.obs". This can be:
pairwise.complete.obs
everything
all.obs
complete.obs
na.or.complete
Keep Only Unique Pairs (Optional) - The default is FALSE. Whether each pair should appear only once in the output. If this is TRUE, a pair appears only once; if it's FALSE, a pair appears twice, with the order swapped. If you want to filter the pairs by name, FALSE is the better choice.
Keep Diagonal Pairs (Optional) - The default is FALSE. Whether the output should contain the correlation of each column with itself.
fill (Optional) - The default is 0. This is the value used for missing values in groups.
Take a look at the reference document for the 'cor' function from base R for more details on the parameters.
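To make the effect of 'Keep Only Unique Pairs' and 'Keep Diagonal Pairs' concrete, here is a minimal Python sketch of pairwise Pearson correlation among columns. It is an illustration only, not Exploratory's implementation. The column names and the helper names `pearson` and `correlate_columns` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlate_columns(columns, unique_pairs=False, keep_diagonal=False):
    """Return (column_a, column_b, correlation) rows for the column pairs."""
    names = list(columns)
    rows = []
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i == j and not keep_diagonal:
                continue  # drop the pair of a column with itself
            if unique_pairs and j < i:
                continue  # drop the swapped duplicate of an earlier pair
            rows.append((a, b, pearson(columns[a], columns[b])))
    return rows


# Hypothetical numeric columns.
data = {
    "dep_delay": [1.0, 2.0, 3.0, 4.0],
    "arr_delay": [2.0, 4.0, 6.0, 8.0],
    "distance": [4.0, 3.0, 2.0, 1.0],
}

all_pairs = correlate_columns(data)                         # each pair twice, swapped
unique_pairs = correlate_columns(data, unique_pairs=True)   # each pair once
with_diagonal = correlate_columns(data, keep_diagonal=True) # adds self pairs
```

With three columns, the defaults give six rows, unique pairs give three, and keeping the diagonal gives nine. Keeping both orders of a pair makes it possible to filter on the first name alone and still find every pair that involves that column.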
Calculate Correlations Among Categories
The simplest way to understand this option for calculating the correlation among categories is to picture the data as a line chart.
In this case, we want to understand the correlation among the airline carriers (Color) based on how their average arrival delays (Y-Axis) change over time (X-Axis). If two airline carriers move the same way, meaning that their two lines are 'correlated', then the correlation value for these two carriers is high.
So, in this example, you select the airline carrier column as the Category column, the flight date column as the Dimension column, and the arrival delay column as the Measure, with 'Average' as the aggregate function. This calculates the correlation between all pairs of airline carriers.
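The airline example can be sketched in Python as follows: aggregate the measure per category and dimension value, line up one series per category, then correlate each pair of series. This is a sketch under stated assumptions, not Exploratory's implementation. The flight records and the helper names `pearson` and `correlate_categories` are hypothetical, and filling missing category/dimension combinations with `fill` is an assumption about how that parameter is applied.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlate_categories(rows, fill=0.0):
    """rows are (category, dimension, measure) tuples; the measure is averaged."""
    sums = defaultdict(lambda: [0.0, 0])  # (category, dimension) -> [total, count]
    for category, dimension, measure in rows:
        cell = sums[(category, dimension)]
        cell[0] += measure
        cell[1] += 1
    categories = sorted({c for c, _ in sums})
    dimensions = sorted({d for _, d in sums})
    # One series per category, ordered by dimension; gaps take the fill value.
    series = {
        c: [sums[(c, d)][0] / sums[(c, d)][1] if (c, d) in sums else fill
            for d in dimensions]
        for c in categories
    }
    return {
        (a, b): pearson(series[a], series[b])
        for a in categories for b in categories if a != b
    }


# Hypothetical records: (carrier, flight date, arrival delay in minutes).
flights = [
    ("AA", "2024-01-01", 10.0), ("AA", "2024-01-01", 20.0),
    ("AA", "2024-01-02", 30.0), ("AA", "2024-01-03", 50.0),
    ("UA", "2024-01-01", 5.0), ("UA", "2024-01-02", 10.0),
    ("UA", "2024-01-03", 20.0),
    ("DL", "2024-01-01", 30.0), ("DL", "2024-01-02", 20.0),
    ("DL", "2024-01-03", 5.0),
]

result = correlate_categories(flights)
```

Here the average delays of "AA" and "UA" both rise over the three dates, so their correlation is strongly positive, while "DL" falls over the same dates and correlates negatively with both.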
Column Selection
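Column for Categories - The column whose values are compared with each other (the airline carrier column in the example above).
Dimension - The column along which the categories are compared (the flight date column in the example above).
Measure - The numeric column to aggregate (the arrival delay column in the example above).
Aggregate Function - The function used to aggregate the Measure ('Average' in the example above).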
Replace NA With -
Parameters
Correlation Method (Optional) - The default is "pearson". This can be:
"pearson"
"kendall"
"spearman"
Operation in case of NA (Optional) - The default is "pairwise.complete.obs". This can be:
pairwise.complete.obs
everything
all.obs
complete.obs
na.or.complete
Keep Only Unique Pairs (Optional) - The default is FALSE. Whether each pair should appear only once in the output. If this is TRUE, a pair appears only once; if it's FALSE, a pair appears twice, with the order swapped. If you want to filter the pairs by name, FALSE is the better choice.
Keep Diagonal Pairs (Optional) - The default is FALSE. Whether the output should contain the correlation of each category with itself.
fill (Optional) - The default is 0. This is the value used for missing values in groups.
Take a look at the reference document for the 'cor' function from base R for more details on the parameters.
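The difference between "pairwise.complete.obs" and "complete.obs" is easiest to see on a small example. The Python sketch below follows the behavior described for the `use` argument of base R's 'cor': pairwise deletion drops a row only for the pair being computed, while "complete.obs" first drops every row that has a missing value in any column. The data and the helper names are hypothetical, and `None` stands in for NA.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cor_pairwise(columns, a, b):
    """Use every row where both a and b are present (pairwise deletion)."""
    kept = [(p, q) for p, q in zip(columns[a], columns[b])
            if p is not None and q is not None]
    xs, ys = zip(*kept)
    return pearson(xs, ys), len(kept)


def cor_complete(columns, a, b):
    """Use only rows where every column is present (casewise deletion)."""
    n_rows = len(next(iter(columns.values())))
    keep = [i for i in range(n_rows)
            if all(col[i] is not None for col in columns.values())]
    xs = [columns[a][i] for i in keep]
    ys = [columns[b][i] for i in keep]
    return pearson(xs, ys), len(keep)


# Hypothetical columns with missing values.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, None, 8.0, 10.0],
    "z": [5.0, None, 3.0, 2.0, 2.0],
}

r_pairwise, n_pairwise = cor_pairwise(data, "x", "z")
r_complete, n_complete = cor_complete(data, "x", "z")
```

For the pair "x" and "z", pairwise deletion keeps four rows, because the missing value in "y" does not matter for this pair. "complete.obs" keeps only the three rows with no missing value in any column, so the two settings return slightly different correlations.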