sliding_window_operator

Operator to slide a fixed-length window across a timeseries dataframe.

class tasrif.processing_pipeline.custom.sliding_window_operator.SlidingWindowOperator(winsize='1h15t', period=15, time_col='time', label_col='CGM', pid_col='patientID')

From a timeseries dataframe of participants, this operator generates two dataframes: <time_series_features> and <labels>. The first dataframe can be used with tsfresh later on, while the second holds all the labels that we want to predict. The example below also unpacks two further objects from the result, df_label_time and df_pids.

Notice that the default winsize is 1 hour and 15 minutes (1h15t). We use the first hour to extract the features and the final 15 minutes only to collect the ground-truth labels.
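To make the split concrete, here is a minimal pandas sketch of a single 1h15min window divided into its feature part and its label part. It is an illustration only, not tasrif's implementation. The names `window`, `feature_span`, `features` and `label_row` are assumptions made for this sketch, and the values are taken from participant 27 in the example data.

```python
import pandas as pd

# One hypothetical 1h15min window sampled every 15 minutes.
window = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2020-02-16 15:15:00",
        "2020-02-16 15:30:00",
        "2020-02-16 15:45:00",
        "2020-02-16 16:00:00",
        "2020-02-16 16:15:00",
    ]),
    "CGM": [282.5, 275.0, 250.0, 235.0, 206.5],
})

feature_span = pd.Timedelta(hours=1)
start = window["dateTime"].iloc[0]

# Rows inside the first hour [start, start + 1h) are used for features;
# the remaining row, 15 minutes later, supplies the ground-truth label.
is_feature = window["dateTime"] < start + feature_span
features = window[is_feature]
label_row = window[~is_feature]

print(features)
print(label_row)
```

Under these assumptions, the four earliest rows form the feature part and the 16:15 reading is the label.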

>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import SlidingWindowOperator

>>> df = pd.DataFrame([
...    ["2020-02-16 11:45:00",27,102.5],
...    ["2020-02-16 12:00:00",27,68.5],
...    ["2020-02-16 12:15:00",27,40.0],
...    ["2020-02-16 15:15:00",27,282.5],
...    ["2020-02-16 15:30:00",27,275.0],
...    ["2020-02-16 15:45:00",27,250.0],
...    ["2020-02-16 16:00:00",27,235.0],
...    ["2020-02-16 16:15:00",27,206.5],
...    ["2020-02-16 16:30:00",27,191.0],
...    ["2020-02-16 16:45:00",27,166.5],
...    ["2020-02-16 17:00:00",27,171.5],
...    ["2020-02-16 17:15:00",27,152.0],
...    ["2020-02-16 17:30:00",27,124.0],
...    ["2020-02-16 17:45:00",27,106.0],
...    ["2020-02-16 18:00:00",27,96.5],
...    ["2020-02-16 18:15:00",27,86.5],
...    ["2020-02-16 17:30:00",31,186.0],
...    ["2020-02-16 17:45:00",31,177.0],
...    ["2020-02-16 18:00:00",31,171.0],
...    ["2020-02-16 18:15:00",31,164.0],
...    ["2020-02-16 18:30:00",31,156.0],
...    ["2020-02-16 18:45:00",31,157.0],
...    ["2020-02-16 19:00:00",31,158.0],
...    ["2020-02-16 19:15:00",31,158.5],
...    ["2020-02-16 19:30:00",31,150.0],
...    ["2020-02-16 19:45:00",31,145.0],
...    ["2020-02-16 20:00:00",31,137.0],
...    ["2020-02-16 20:15:00",31,141.0],
...    ["2020-02-16 20:45:00",31,146.0],
...    ["2020-02-16 21:00:00",31,141.0]],
...    columns=['dateTime','patientID','CGM'])
>>> df['dateTime'] = pd.to_datetime(df['dateTime'])
>>> df
>>> op = SlidingWindowOperator(winsize="1h15t",
...                           time_col="dateTime",
...                           label_col="CGM",
...                           pid_col="patientID")
>>> df_timeseries, df_labels, df_label_time, df_pids = op.process(df)[0]
>>> df_timeseries
.   dateTime    CGM     seq_id
0   2020-02-16 15:15:00     282.5   0
1   2020-02-16 15:30:00     275.0   0
2   2020-02-16 15:45:00     250.0   0
3   2020-02-16 16:00:00     235.0   0
4   2020-02-16 15:30:00     275.0   1
...     ...     ...     ...
143     2020-02-16 19:45:00     145.0   35
144     2020-02-16 19:15:00     158.5   36
145     2020-02-16 19:30:00     150.0   36
146     2020-02-16 19:45:00     145.0   36
147     2020-02-16 20:00:00     137.0   36
148 rows × 3 columns
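
The seq_id column appears to identify the window each row belongs to, so per-window features can be computed by grouping on it. As a rough stand-in for a feature extractor such as tsfresh, the sketch below aggregates a few of the rows shown above with plain pandas. The frame `sample` and the chosen statistics are assumptions for illustration only.

```python
import pandas as pd

# A few rows copied from the df_timeseries output above (seq_id 0 and 36).
sample = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2020-02-16 15:15:00",
        "2020-02-16 15:30:00",
        "2020-02-16 15:45:00",
        "2020-02-16 16:00:00",
        "2020-02-16 19:15:00",
        "2020-02-16 19:30:00",
        "2020-02-16 19:45:00",
        "2020-02-16 20:00:00",
    ]),
    "CGM": [282.5, 275.0, 250.0, 235.0, 158.5, 150.0, 145.0, 137.0],
    "seq_id": [0, 0, 0, 0, 36, 36, 36, 36],
})

# One row of summary statistics per window.
per_window = sample.groupby("seq_id")["CGM"].agg(["mean", "min", "max"])
print(per_window)
```
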
__init__(winsize='1h15t', period=15, time_col='time', label_col='CGM', pid_col='patientID')

Creates a new instance of SlidingWindowOperator.

Parameters
  • winsize (int, offset) – Size of the moving window. If it is an int, this is the number of observations in each window, and every window has the same fixed size. If it is an offset, this is the time period covered by each window.

  • period (int) – Periodicity expected between rows. Only used if winsize is an offset.

  • time_col (str) – time column in the dataframe

  • label_col (str) – label column in the dataframe

  • pid_col (str) – patient id column in the dataframe
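
As a rough illustration of how winsize and period might relate when winsize is an offset, the sketch below computes how many rows one window would span. It assumes that period is expressed in minutes, which the default of 15 and the 15-minute example data suggest but the parameter description does not state. It is not taken from tasrif's source, and the names `rows_per_window` and `feature_rows` are hypothetical.

```python
import pandas as pd

# The default window of 1 hour and 15 minutes ("1h15t").
winsize = pd.Timedelta(hours=1, minutes=15)

# Assumption: period=15 means 15 minutes between consecutive rows.
period = pd.Timedelta(minutes=15)

# Number of evenly spaced rows that fit in one window.
rows_per_window = int(winsize / period)

# If the final row only supplies the label, the rest are feature rows.
feature_rows = rows_per_window - 1

print(rows_per_window, feature_rows)
```

With these assumptions a window holds five rows, four for features and one for the label, which agrees with the four rows per seq_id in the example output.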