calculate_timeseries_properties_operator

Operator to calculate timeseries properties (features) of a value column

class tasrif.processing_pipeline.kats.calculate_timeseries_properties_operator.CalculateTimeseriesPropertiesOperator(date_feature_name='time', value_column='value', method='kats', **kwargs)

This operator extracts timeseries features from the passed DataFrame.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> from tasrif.processing_pipeline.kats.calculate_timeseries_properties_operator import CalculateTimeseriesPropertiesOperator
>>>
>>> dates = pd.date_range('2016-12-31', '2020-01-08', freq='D').to_series()
>>> df = pd.DataFrame()
>>> df["Date"] = dates
>>> df['Steps'] = np.random.randint(1000, 25000, size=len(df))
>>> df['Calories'] = np.random.randint(1800, 3000, size=len(df))
>>> df
                 Date  Steps  Calories
2016-12-31 2016-12-31  14648      2926
2017-01-01 2017-01-01   9320      2190
2017-01-02 2017-01-02   2798      2521
2017-01-03 2017-01-03  11050      2330
2017-01-04 2017-01-04   6536      2172
...               ...    ...       ...
2020-01-04 2020-01-04  22739      2365
2020-01-05 2020-01-05   4845      1849
2020-01-06 2020-01-06   1143      2420
2020-01-07 2020-01-07   5577      2821
2020-01-08 2020-01-08  10435      1830
>>> operator = CalculateTimeseriesPropertiesOperator(date_feature_name="Date", value_column='Steps')
>>> features = operator.process(df)[0]
>>> features
{'length': 1104,
 'mean': 13024.617753623188,
 'var': 49311921.9535254,
 'entropy': 0.9344372604411008,
 'lumpiness': 98331798003649.39,
 'stability': 2856199.257016417,
 'flat_spots': 1,
 'hurst': 0.008804860201927526,
 'std1st_der': 4942.8445155096715,
 'crossing_points': 559,
 'binarize_mean': 0.5,
 'unitroot_kpss': 0.040225392515162994,
 'heterogeneity': 12.514530846049983,
 'histogram_mode': 1008.0,
 'linearity': 0.0010899360162310767,
 'trend_strength': 0.254325167656127,
 'seasonality_strength': 0.3490641758024736,
 'spikiness': 757835496.8906168,
 'peak': 5,
 'trough': 2,
 'level_shift_idx': 526,
 'level_shift_size': 1180.800000000001,
 'y_acf1': 0.04351456010311684,
 'y_acf5': 0.004305213630472914,
 'diff1y_acf1': -0.4827096027299026,
 'diff1y_acf5': 0.23486910790170704,
 'diff2y_acf1': -0.657077984628258,
 'diff2y_acf5': 0.45760076640469927,
 'y_pacf5': 0.004035237116844455,
 'diff1y_pacf5': 0.46635744288898123,
 'diff2y_pacf5': 1.0429819204003172,
 'seas_acf1': 0.0074998374601003975,
 'seas_pacf1': 0.010035485775513319,
 'firstmin_ac': 3,
 'firstzero_ac': 6,
 'holt_alpha': 0.21714285714285714,
 'holt_beta': 0.09771428571428571,
 'hw_alpha': 0.040357142857142855,
 'hw_beta': 0.024214285714285716,
 'hw_gamma': 0.03427295918367347}
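To make a few of the simpler keys in the output above concrete, here is a minimal NumPy sketch that recomputes `length`, `mean`, `var`, `crossing_points` and `binarize_mean` for a 1-D series. The helper name `simple_ts_properties` is hypothetical, and the definitions are simplified assumptions in the spirit of the Kats feature extractor. Kats' exact conventions (for example the variance estimator, or how points lying exactly on the median are counted) may differ.

```python
import numpy as np


def simple_ts_properties(values):
    """Hypothetical helper: recompute a few simple timeseries properties.

    These are simplified definitions and may not match Kats exactly.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    # Mark each point as strictly above (True) or at/below (False) the median.
    above = x > median
    # A crossing occurs when two consecutive points fall on opposite sides.
    crossing_points = int(np.count_nonzero(above[1:] != above[:-1]))
    return {
        "length": int(x.size),
        "mean": float(x.mean()),
        # Population variance (ddof=0); this convention is an assumption.
        "var": float(x.var()),
        "crossing_points": crossing_points,
        # Fraction of points strictly above the mean.
        "binarize_mean": float(np.mean(x > x.mean())),
    }


print(simple_ts_properties([2, 4, 1, 5, 3]))
# {'length': 5, 'mean': 3.0, 'var': 2.0, 'crossing_points': 4, 'binarize_mean': 0.4}
```

The series alternates around its median of 3, so every consecutive pair is a crossing, which is what `crossing_points` is meant to capture.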
__init__(date_feature_name='time', value_column='value', method='kats', **kwargs)

Creates a new instance of CalculateTimeseriesPropertiesOperator

Parameters
  • date_feature_name – str; name of the datetime column

  • value_column – str; name of the column that contains the value for each date

  • method – str; name of the feature extraction method (default 'kats')

  • **kwargs – None or List[str]; feature or feature-group name(s), passed as keyword arguments to the extraction method's parameters
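The parameters above fit together roughly as follows. This is a minimal, hypothetical sketch, not tasrif's implementation. It assumes the operator picks out `date_feature_name` and `value_column`, renames them to `time` and `value` (the layout used in Kats examples, which is presumably why those are the defaults here), and forwards `**kwargs` to the chosen method. The names `ToyPropertiesOperator`, `toy_features`, `METHODS` and the `selected_features` keyword are invented for illustration, and a toy extractor stands in for Kats.

```python
import pandas as pd


def toy_features(series, selected_features=None):
    """Hypothetical stand-in for a feature extractor such as Kats."""
    all_features = {
        "length": int(series["value"].size),
        "mean": float(series["value"].mean()),
        # pandas .var() defaults to the sample variance (ddof=1).
        "var": float(series["value"].var()),
    }
    if selected_features is None:
        return all_features
    # Return only the requested feature names.
    return {name: all_features[name] for name in selected_features}


# Hypothetical registry mapping a `method` name to an extractor function.
METHODS = {"toy": toy_features}


class ToyPropertiesOperator:
    """Hypothetical sketch of how the constructor parameters are used."""

    def __init__(self, date_feature_name="time", value_column="value",
                 method="toy", **kwargs):
        self.date_feature_name = date_feature_name
        self.value_column = value_column
        self.method = method
        self.kwargs = kwargs

    def process(self, *data_frames):
        extractor = METHODS[self.method]
        results = []
        for df in data_frames:
            # Keep only the two relevant columns, renamed to time/value.
            series = df[[self.date_feature_name, self.value_column]].rename(
                columns={self.date_feature_name: "time",
                         self.value_column: "value"})
            # Forward any extra keyword arguments to the extractor.
            results.append(extractor(series, **self.kwargs))
        # One result per input DataFrame, hence process(df)[0] for one input.
        return results


df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=4, freq="D"),
    "Steps": [1000, 3000, 2000, 4000],
})
op = ToyPropertiesOperator(date_feature_name="Date", value_column="Steps",
                           selected_features=["length", "mean"])
print(op.process(df)[0])  # {'length': 4, 'mean': 2500.0}
```

Which keyword arguments the real operator accepts depends on the chosen `method`; `selected_features` here only shows the forwarding pattern.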