Time Series Clustering

Clusters multiple time series into groups.

Input Data

The input should be time series data with a category column. Each row should represent one observation at a date/time. If a date/time has multiple rows, they are internally aggregated into a single row for that date/time. The data should have the following columns.

  • Group - A categorical (character or factor) column. Each category in this column forms one time series, and these series are clustered into groups.

  • Date/Time - A Date or POSIXct column that indicates when each observation took place.

  • Value (Optional) - A column that stores the observed values. When a category has multiple rows for one date/time, their values are aggregated into one value by the specified aggregation function, forming the time series to cluster for that category. If not specified, the number of rows for each date/time is used as the time series to cluster.

  • Other Columns to Keep (Optional) - Other columns whose values should be kept in the output data. When a category has multiple rows for one date/time, their values are aggregated into one value by the specified aggregation function and included in the output.
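The aggregation described above can be illustrated with a minimal Python sketch. It is not Exploratory's implementation: the function name `build_series`, the tuple layout of the rows, and the choice to mark a date/time with no rows for a category as NA (`None`) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_series(rows, agg=None):
    """Aggregate (group, date, value) rows into one time series per group.

    If agg is None, the number of rows per date/time is used instead of
    an aggregated value, as described for a missing Value column.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for group, date, value in rows:
        buckets[group][date].append(value)

    # Shared, sorted time axis across all groups (ISO date strings sort correctly).
    dates = sorted({d for by_date in buckets.values() for d in by_date})

    series = {}
    for group, by_date in buckets.items():
        out = []
        for d in dates:
            vals = by_date.get(d)
            if not vals:
                out.append(None)  # assumption: no rows for this date/time means NA
            elif agg is None:
                out.append(len(vals))  # row count as the series value
            else:
                out.append(agg(vals))  # aggregated value
        series[group] = out
    return dates, series


rows = [
    ("A", "2024-01-01", 10.0),
    ("A", "2024-01-01", 20.0),
    ("A", "2024-01-02", 5.0),
    ("B", "2024-01-02", 7.0),
]
dates, by_sum = build_series(rows, agg=sum)
_, by_count = build_series(rows)
```

Here `by_sum` uses `sum` as the aggregation function, while `by_count` shows the row-count behavior used when no Value column is given.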

Parameters

  • Number of Clusters - The number of clusters to group the time series data into.

  • Cluster Center Method - Method to calculate the cluster center time series (centroid) in each iteration.

    • Mean

    • Median

    • Shape Averaging

    • DTW Barycenter Averaging

    • Soft DTW Centroids

    • Partition around Medoids

  • Include Cluster Center Data - If set to TRUE, the output data includes the calculated cluster center time series data (centroid) for each cluster.

  • Distance Method - Method to calculate the distance between the cluster center time series (centroid) and each time series in each iteration.

    • DTW

    • DTW with L2 Norm

    • DTW Basic

    • DTW Guided by Lemire's Lower Bound

    • Keogh's Lower Bound for DTW

    • Lemire's Lower Bound for DTW

    • Shape-Based Distance

    • Global Alignment Kernels

    • Soft-DTW

  • NA Fill Type - How to fill NAs that appear between the first and last non-NA value in a time series.

    • Fill with Previous Non-NA Value

    • Fill with 0

    • Linear Interpolation

    • Spline Interpolation

  • NA Fill Type - Beginning - How to fill NAs that appear before the first non-NA value in a time series.

    • Fill with 0

    • Fill with First Non-NA Value

  • NA Fill Type - Ending - How to fill NAs that appear after the last non-NA value in a time series.

    • Fill with 0

    • Fill with Last Non-NA Value

  • Remove Groups When NA Ratio Is Greater Than - If the ratio of NAs in the time series for a category is greater than this value, the category is removed from the data before the clustering is performed.

  • Normalize Value - Whether to normalize the aggregated values or not.

  • Random Seed - Random seed set before the clustering, so that the results are the same when the same calculation is repeated.
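As a rough illustration of how the NA-related parameters and Normalize Value might combine, here is a minimal Python sketch. It is not Exploratory's implementation. The function names, the option strings (`"previous"`, `"zero"`, `"linear"`, `"first"`, `"last"`), and the order of the steps are hypothetical, and z-score normalization with the population standard deviation is an assumption, since the exact normalization is not specified above. Spline interpolation is omitted.

```python
from statistics import mean, pstdev


def fill_na(series, inner="previous", begin="zero", end="zero"):
    """Fill None entries; option names are hypothetical stand-ins."""
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return values
    first, last = known[0], known[-1]
    # NAs before the first non-NA value
    for i in range(first):
        values[i] = 0.0 if begin == "zero" else series[first]
    # NAs after the last non-NA value
    for i in range(last + 1, len(values)):
        values[i] = 0.0 if end == "zero" else series[last]
    # NAs between the first and last non-NA value
    prev = first
    for i in range(first + 1, last + 1):
        if series[i] is not None:
            prev = i
            continue
        if inner == "zero":
            values[i] = 0.0
        elif inner == "previous":
            values[i] = series[prev]
        else:  # "linear": interpolate between the neighboring known points
            nxt = next(j for j in known if j > i)
            w = (i - prev) / (nxt - prev)
            values[i] = series[prev] + w * (series[nxt] - series[prev])
    return values


def na_ratio(series):
    """Fraction of entries that are NA (None)."""
    return sum(v is None for v in series) / len(series)


def z_normalize(series):
    """Assumed normalization: subtract the mean, divide by the std. deviation."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sd for v in series]


def preprocess(series_by_group, max_na_ratio=0.5, normalize=True):
    """Drop groups with too many NAs, fill the rest, optionally normalize."""
    out = {}
    for group, s in series_by_group.items():
        if na_ratio(s) > max_na_ratio or all(v is None for v in s):
            continue  # group removed before clustering
        filled = fill_na(s)
        out[group] = z_normalize(filled) if normalize else filled
    return out


filled = fill_na([None, 1.0, None, None, 4.0, None],
                 inner="linear", begin="first", end="last")
kept = preprocess({"A": [1.0, None, 3.0], "B": [None, None, 2.0]},
                  max_na_ratio=0.5, normalize=False)
```

In this example, group `B` has two NAs out of three values, which exceeds the 0.5 threshold, so only group `A` remains in `kept`.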

R Package

The Time Series Clustering step uses the dtwclust R package under the hood.
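Several of the Distance Method options are based on dynamic time warping (DTW). The following Python sketch shows the textbook dynamic-programming form of DTW with an absolute-difference local cost. It is only an illustration of the idea: it is not dtwclust's code, and the step pattern, norm, and window constraints that dtwclust applies may differ. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Textbook DTW: minimum cumulative |x[i] - y[j]| over all warping paths."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Because DTW may repeat points to align the two series, `same_shape` is zero even though the series have different lengths, which is why DTW-based distances suit series that share a shape but are stretched or shifted in time.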

Exploratory R Package

For details about how dtwclust is used in the Exploratory R package, please refer to the GitHub repository.
