
Anomaly Detection

Introduction

Step-by-Step Tutorial with Access Log data.

Anomaly Detection finds anomalies in a time series data frame. It uses an algorithm called Seasonal Hybrid ESD (S-H-ESD), which can detect both global and local anomalies in time series data by taking seasonality and trend into account. The algorithm was built by a team at Twitter to monitor their own traffic.
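The idea behind S-H-ESD can be sketched in a few lines of Python. This is a simplified illustration, not the Twitter implementation: it assumes a per-phase median as a crude stand-in for the seasonal decomposition, and it approximates the Student's t quantile in the ESD critical value with a normal quantile so that only the standard library is needed. The function names `seasonal_residuals` and `robust_esd` are made up for this example.

```python
from statistics import NormalDist, median

def seasonal_residuals(values, period):
    """Subtract the per-phase median (assumption: a crude stand-in for STL)."""
    phase_medians = [median(values[k::period]) for k in range(period)]
    return [v - phase_medians[i % period] for i, v in enumerate(values)]

def robust_esd(residuals, max_anoms=0.1, alpha=0.05):
    """Generalized ESD test using median and MAD instead of mean and std dev."""
    n = len(residuals)
    remaining = list(enumerate(residuals))
    candidates = []
    n_anoms = 0
    for i in range(1, int(n * max_anoms) + 1):
        vals = [v for _, v in remaining]
        med = median(vals)
        mad = 1.4826 * median(abs(v - med) for v in vals)
        if mad == 0:
            break  # no spread left, the test statistic is undefined
        # Remove the point that deviates most from the median.
        idx, val = max(remaining, key=lambda p: abs(p[1] - med))
        remaining.remove((idx, val))
        candidates.append(idx)
        # Critical value; normal quantile used as an approximation of t.
        p = 1 - alpha / (2 * (n - i + 1))
        t = NormalDist().inv_cdf(p)
        critical = (n - i) * t / ((n - i - 1 + t * t) * (n - i + 1)) ** 0.5
        if abs(val - med) / mad > critical:
            n_anoms = i  # anomalies = the first i removed points
    return sorted(candidates[:n_anoms])

# Synthetic series: period-4 pattern plus small noise, one spike at index 17.
base = [10, 20, 30, 20]
noise = [-2, 0, 2, -1, 1]
values = [base[i % 4] + noise[i % 5] for i in range(40)]
values[17] = 100

anomalies = robust_esd(seasonal_residuals(values, period=4))
print(anomalies)
```

Using the median and MAD rather than the mean and standard deviation keeps a few extreme points from masking each other, which is the "hybrid" part of the name.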

How to Access?

How to Configure?

Column Selection

  • Date/Time Column - Select a Date or POSIXct column that holds date/time information.
    • Aggregation Level - When the data type is Date, data is aggregated (e.g. summed or averaged) by day. When the data type is POSIXct, the aggregation level can be day, hour, minute, or second.
  • Value Column - Select either 'Number of Rows' or a numeric column in which you want to detect anomalies.
    • Aggregation Function - Select an aggregate function such as 'sum' or 'mean' to aggregate the values.
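To make the column selection concrete, here is a minimal Python sketch of what aggregating a value column by date/time level amounts to. It is an illustration under assumptions, not Exploratory's implementation: the `aggregate` helper, the `LEVEL_FORMATS` table, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Truncation format for each aggregation level (illustrative names).
LEVEL_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
    "second": "%Y-%m-%d %H:%M:%S",
}

def aggregate(rows, level="day", func=sum):
    """Group (timestamp, value) pairs by truncated timestamp, then aggregate."""
    fmt = LEVEL_FORMATS[level]
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.strftime(fmt)].append(value)
    return {key: func(vals) for key, vals in sorted(buckets.items())}

# Made-up sample rows: (timestamp, numeric value).
rows = [
    (datetime(2024, 1, 1, 9, 15), 120),
    (datetime(2024, 1, 1, 9, 45), 80),
    (datetime(2024, 1, 1, 10, 5), 50),
    (datetime(2024, 1, 2, 9, 0), 200),
]

daily_sum = aggregate(rows, level="day", func=sum)
hourly_mean = aggregate(rows, level="hour", func=mean)
hourly_rows = aggregate(rows, level="hour", func=len)  # like 'Number of Rows'
```

Passing `len` as the function mirrors the 'Number of Rows' choice: each bucket is reduced to a count rather than to a statistic of a numeric column.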

Parameters

  • How to Fill NA - The algorithm requires missing values (NA) to be filled. The default is Fill with Previous Value. This can be...
    • Fill with Previous Value
    • Fill with Zero
    • Linear Interpolation
    • Spline Interpolation
  • Direction of Anomaly (Optional) - The default is "both". The direction of anomalies to detect. This can be...
    • "both" - Both positive and negative direction.
    • "pos" - Only positive direction.
    • "neg" - Only negative direction.
  • With Expected Values (Optional) - The default is TRUE. Whether the expected_value column should be returned.
  • Maximum Ratio of Anomaly Data (Optional) - The default is 0.1. The maximum number of anomalies to detect, as a ratio of the total number of data points.
  • Alpha (Sensitivity to Anomaly Data) (Optional) - The default is 0.05. The larger the value, the more data points are flagged as anomalies.
  • Report Only Last Values within (Optional) - The default is NULL. Report only anomalies that fall within the last day or hour of the time series. This can be...
    • NULL - Report all anomalies.
    • "day" - Report only anomalies within the last day.
    • "hr" - Report only anomalies within the last hour.
  • Threshold of Positive Anomaly (Optional) - The default is 'None'. If this is specified, only positive anomalies above the threshold are reported. This can be...
    • 'None' - No threshold.
    • 'med_max' - Median of daily max values.
    • 'p95' - 95th percentile of the daily max values.
    • 'p99' - 99th percentile of the daily max values.
  • Longer Time Span than a Month (Optional) - The default is FALSE. Set this to TRUE if the time series spans more than a month.
  • Piecewise Median Time Window (Optional) - The default is 2. The size of the piecewise median time window (span of seasons), in weeks.
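The NA-filling options listed under How to Fill NA can be illustrated with a small Python sketch. It shows what three of the strategies do to a series with gaps, assuming missing values are represented as `None`. The helper names are hypothetical, and spline interpolation is left out because it needs a numerical library.

```python
def fill_previous(values):
    """Replace each None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # a leading None stays None
    return filled

def fill_zero(values):
    """Replace each None with 0."""
    return [0 if v is None else v for v in values]

def fill_linear(values):
    """Linearly interpolate runs of None between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # leading or trailing None values are left as-is

series = [10, None, None, 16, None, 20]

print(fill_previous(series))  # [10, 10, 10, 16, 16, 20]
print(fill_zero(series))      # [10, 0, 0, 16, 0, 20]
print(fill_linear(series))    # [10, 12.0, 14.0, 16, 18.0, 20]
```

Filling with zero can itself look like a negative anomaly when the surrounding values are large, so the choice of fill method affects what gets detected.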

How to Read the Result?

  • Date / Time Column
  • Value Column
  • pos_anomaly - TRUE for each row where an anomaly is detected in the positive direction.
  • pos_value - The anomaly values in the positive direction.
  • neg_anomaly - TRUE for each row where an anomaly is detected in the negative direction.
  • neg_value - The anomaly values in the negative direction.
  • expected_value - The values that the model would have expected based on the underlying trend.
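As a rough illustration of how these columns fit together, the Python sketch below filters a result down to the flagged rows and measures how far each one is from its expected value. The rows are made-up sample data, the date column name `date` is hypothetical, and the sketch assumes pos_value and neg_value are empty (`None`) on rows that are not anomalies.

```python
# Made-up result rows shaped like the output columns described above.
result = [
    {"date": "2024-01-01", "value": 100, "pos_anomaly": False, "pos_value": None,
     "neg_anomaly": False, "neg_value": None, "expected_value": 98},
    {"date": "2024-01-02", "value": 340, "pos_anomaly": True, "pos_value": 340,
     "neg_anomaly": False, "neg_value": None, "expected_value": 105},
    {"date": "2024-01-03", "value": 12, "pos_anomaly": False, "pos_value": None,
     "neg_anomaly": True, "neg_value": 12, "expected_value": 101},
]

# Keep only rows flagged in either direction.
flagged = [r for r in result if r["pos_anomaly"] or r["neg_anomaly"]]

# Deviation from the expected value: positive for spikes, negative for drops.
deviations = {r["date"]: r["value"] - r["expected_value"] for r in flagged}
print(deviations)  # {'2024-01-02': 235, '2024-01-03': -89}
```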

Underlying R Package
