Clusters multiple time series into groups. (This feature is planned for the upcoming release. Stay tuned!)
The input data should be time series data with a category column. Each row should represent one observation with a date/time. There may be multiple rows for the same date/time, in which case they are internally aggregated into one row for that date/time. The data should have the following columns.
Group - A categorical (character or factor) column. The categories specified here are clustered into groups.
Date/Time - A Date or POSIXct column to indicate when the observations took place.
Value (Optional) - A column that stores the observed values. When a category has multiple rows for one date/time, their values are aggregated into a single value by the specified aggregation function, forming the time series to be clustered for that category. If not specified, the number of rows for each date/time is used as the time series to cluster.
Other Columns to Keep (Optional) - Other columns whose values should be kept in the output data. When a category has multiple rows for one date/time, their values are aggregated into a single value by the specified aggregation function and included in the output.
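The aggregation described above can be illustrated with a small sketch. This is not the step's actual implementation; the function name `build_series`, the row layout (dicts with `group` and `date` keys), and the `sales` column are assumptions made up for the example. It shows how multiple rows for one date/time collapse into one value per category, and how the row count is used when no Value column is given.

```python
from collections import defaultdict
from datetime import date

def build_series(rows, value_key=None, agg=sum):
    """Collapse rows into one value per (group, date).

    rows: list of dicts with "group" and "date" keys (hypothetical layout).
    value_key: name of the value column, or None to count rows instead.
    agg: aggregation function applied to the values sharing a date.
    Returns {group: {date: aggregated value}}.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["group"], row["date"])].append(row)

    series = defaultdict(dict)
    for (group, when), members in sorted(buckets.items()):
        if value_key is None:
            # No Value column: the number of rows is the series value.
            series[group][when] = len(members)
        else:
            series[group][when] = agg([m[value_key] for m in members])
    return dict(series)

rows = [
    {"group": "A", "date": date(2024, 1, 1), "sales": 10},
    {"group": "A", "date": date(2024, 1, 1), "sales": 5},
    {"group": "A", "date": date(2024, 1, 2), "sales": 7},
    {"group": "B", "date": date(2024, 1, 1), "sales": 3},
]
by_sum = build_series(rows, value_key="sales", agg=sum)
by_count = build_series(rows)
```

Here the two rows for category A on 2024-01-01 become a single value (15 with `sum`, or 2 when counting rows).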
Number of Clusters - The number of clusters to group the time series data into.
Cluster Center Method - Method used to calculate the cluster center time series (centroid) at each iteration.
DTW Barycenter Averaging
Soft DTW Centroids
Partition around Medoids
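To give a rough idea of the medoid-based option, the sketch below picks the medoid of a set of series: the member whose total distance to all other members is smallest. It is a simplified illustration, not the dtwclust implementation. The helper names `euclidean` and `medoid` are hypothetical, and plain Euclidean distance on equal-length series stands in for whichever Distance Method is actually selected.

```python
import math

def euclidean(a, b):
    # Stand-in distance for the example; assumes equal-length series.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def medoid(series_list, distance=euclidean):
    """Return the index of the series with the smallest summed distance
    to every other series in the list."""
    best_index, best_total = None, math.inf
    for i, candidate in enumerate(series_list):
        total = sum(distance(candidate, other) for other in series_list)
        if total < best_total:
            best_index, best_total = i, total
    return best_index

cluster = [[0, 0], [1, 1], [10, 10]]
center_index = medoid(cluster)
```

Unlike an averaged centroid, a medoid is always one of the original time series in the cluster.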
Distance Method - Method used to calculate the distance between the cluster center time series (centroid) and each time series at each iteration.
DTW with L2 Norm
DTW Guided by Lemire's Lower Bound
Keogh's Lower Bound for DTW
Lemire's Lower Bound for DTW
Global Alignment Kernels
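As background for the DTW-based options, here is a minimal textbook sketch of dynamic time warping for univariate series. It accumulates squared pointwise differences along the cheapest warping path and takes a square root at the end. The function name `dtw_l2` is hypothetical, and the step pattern and any windowing or normalization used by dtwclust may differ, so treat the numbers as illustrative only.

```python
import math

def dtw_l2(x, y):
    """Textbook DTW with squared local costs and a final square root."""
    n, m = len(x), len(y)
    # cost[i][j] = cheapest accumulated cost aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])

same_shape = dtw_l2([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_l2([0, 0], [1, 1])
```

Because the warping path may repeat points, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0 even though their lengths differ. The lower-bound options in the list are cheaper estimates that are commonly used to avoid computing the full table for every pair.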
Random Seed - Random seed set before the clustering, so that the results are reproducible when the same calculation is repeated.
NA Fill Type - How to fill NAs that appear between the first and last non-NA values in a time series.
Fill with Previous Non-NA Value
Fill with 0
NA Fill Type - Beginning - How to fill NAs that appear before the first non-NA value in a time series.
Fill with 0
Fill with First Non-NA Value
NA Fill Type - Ending - How to fill NAs that appear after the last non-NA value in a time series.
Fill with 0
Fill with Last Non-NA Value
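The three NA fill settings above apply to different regions of a series: before the first non-NA value, between the first and last non-NA values, and after the last non-NA value. The following sketch illustrates that split. The function `fill_nas` and its option strings are hypothetical names chosen for the example, NAs are represented as `None`, and the defaults shown are arbitrary rather than the step's actual defaults.

```python
def fill_nas(values, interior="previous", beginning="zero", ending="zero"):
    """Fill None entries by region.

    interior:  "previous" (previous non-NA value) or "zero"
    beginning: "first" (first non-NA value) or "zero"
    ending:    "last" (last non-NA value) or "zero"
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # all NA: nothing to anchor the fill on
    first, last = known[0], known[-1]

    out = list(values)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        if i < first:
            out[i] = values[first] if beginning == "first" else 0
        elif i > last:
            out[i] = values[last] if ending == "last" else 0
        else:
            # out[i - 1] is already filled, so it carries the previous value.
            out[i] = out[i - 1] if interior == "previous" else 0
    return out

raw = [None, 1, None, None, 4, None]
filled_default = fill_nas(raw)
filled_edges = fill_nas(raw, interior="zero", beginning="first", ending="last")
```

With the first call the series becomes `[0, 1, 1, 1, 4, 0]`; with the second it becomes `[1, 1, 0, 0, 4, 4]`.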
Remove Groups with NAs
When NA Ratio Is Greater Than - If the ratio of NAs in the time series for a category is greater than this value, the category is removed from the data before the clustering is performed.
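A minimal sketch of this filter, assuming each category's series is a list with `None` for NA. The function name `drop_sparse_groups` is hypothetical. It also assumes the ratio is computed over all points of the series and that a category exactly at the threshold is kept, since the setting says "greater than".

```python
def drop_sparse_groups(series_by_group, max_na_ratio):
    """Keep only the groups whose NA ratio does not exceed max_na_ratio."""
    kept = {}
    for group, values in series_by_group.items():
        if not values:
            continue  # an empty series has nothing to cluster
        na_ratio = sum(v is None for v in values) / len(values)
        if na_ratio <= max_na_ratio:
            kept[group] = values
    return kept

data = {
    "A": [1, None, 3, 4],           # NA ratio 0.25
    "B": [None, None, None, 4],     # NA ratio 0.75
    "C": [None, 2, None, 4],        # NA ratio 0.50
}
remaining = drop_sparse_groups(data, max_na_ratio=0.5)
```

With a threshold of 0.5, category B is removed while A and C remain.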
Normalize Value - Whether to normalize the aggregated values or not.
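This page does not state which normalization is applied. One common choice for time series clustering is z-normalization of each series (subtract the mean, divide by the standard deviation), so that series are compared by shape rather than by scale. The sketch below assumes that choice purely for illustration; the function name `z_normalize` and the use of the sample standard deviation are assumptions, not a description of the step's behavior.

```python
import statistics

def z_normalize(values):
    """Scale one series to mean 0 and sample standard deviation 1.

    Assumption for illustration only: z-normalization per series.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    if sd == 0:
        # A constant series has no shape to scale; return all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

small = z_normalize([1, 2, 3])
large = z_normalize([100, 200, 300])
```

Both series above normalize to the same values, so a clustering run on normalized data would treat them as having the same shape.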
The Time Series Clustering Step uses the dtwclust R package under the hood.
For details about dtwclust usage in the Exploratory R Package, please refer to the GitHub repository.