sliding_window_operator
Operator to slide a fixed-length window across a time-series dataframe.
-
class tasrif.processing_pipeline.custom.sliding_window_operator.SlidingWindowOperator(winsize='1h15t', period=15, time_col='time', label_col='CGM', pid_col='patientID')
From a time-series dataframe of participants, this operator generates two dataframes: <time_series_features> and <labels>. The first can later be passed to tsfresh for feature extraction, while the second holds all the labels we want to predict. In the example below, the result of process() also unpacks into two further objects, df_label_time and df_pids, which hold the label times and the participant IDs.
Note that the default winsize is 1 hour and 15 minutes ('1h15t'). The first hour of each window is used to extract the features, and the final 15 minutes only to collect the ground-truth labels.
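The feature/label split described above can be sketched with the standard library. This is a minimal illustration, not tasrif's code: `split_window` and `FEATURE_SPAN` are hypothetical names, the one-hour feature span is taken from the note above rather than from a documented parameter, and the '1h15t' offset is written as a `datetime.timedelta` instead of a pandas offset string.

```python
from datetime import datetime, timedelta

# Hypothetical constant: the note above says the first hour feeds the features.
FEATURE_SPAN = timedelta(hours=1)


def split_window(start, winsize=timedelta(hours=1, minutes=15),
                 feature_span=FEATURE_SPAN):
    """Split one window into half-open feature and label intervals.

    Returns ((feature_start, feature_end), (label_start, label_end)).
    """
    if winsize <= feature_span:
        raise ValueError("winsize must be longer than the feature span")
    feature_end = start + feature_span
    return (start, feature_end), (feature_end, start + winsize)


features, labels = split_window(datetime(2020, 2, 16, 15, 15))
for name, (lo, hi) in (("features", features), ("label", labels)):
    print(f"{name}: [{lo:%H:%M}, {hi:%H:%M})")
# features: [15:15, 16:15)
# label: [16:15, 16:30)
```

With readings every 15 minutes, the feature interval holds the rows at 15:15, 15:30, 15:45 and 16:00, the same four timestamps listed under seq_id 0 in the example output; the sketch places the label reading at 16:15.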
```python
>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import SlidingWindowOperator
>>> df = pd.DataFrame([
...     ["2020-02-16 11:45:00", 27, 102.5],
...     ["2020-02-16 12:00:00", 27, 68.5],
...     ["2020-02-16 12:15:00", 27, 40.0],
...     ["2020-02-16 15:15:00", 27, 282.5],
...     ["2020-02-16 15:30:00", 27, 275.0],
...     ["2020-02-16 15:45:00", 27, 250.0],
...     ["2020-02-16 16:00:00", 27, 235.0],
...     ["2020-02-16 16:15:00", 27, 206.5],
...     ["2020-02-16 16:30:00", 27, 191.0],
...     ["2020-02-16 16:45:00", 27, 166.5],
...     ["2020-02-16 17:00:00", 27, 171.5],
...     ["2020-02-16 17:15:00", 27, 152.0],
...     ["2020-02-16 17:30:00", 27, 124.0],
...     ["2020-02-16 17:45:00", 27, 106.0],
...     ["2020-02-16 18:00:00", 27, 96.5],
...     ["2020-02-16 18:15:00", 27, 86.5],
...     ["2020-02-16 17:30:00", 31, 186.0],
...     ["2020-02-16 17:45:00", 31, 177.0],
...     ["2020-02-16 18:00:00", 31, 171.0],
...     ["2020-02-16 18:15:00", 31, 164.0],
...     ["2020-02-16 18:30:00", 31, 156.0],
...     ["2020-02-16 18:45:00", 31, 157.0],
...     ["2020-02-16 19:00:00", 31, 158.0],
...     ["2020-02-16 19:15:00", 31, 158.5],
...     ["2020-02-16 19:30:00", 31, 150.0],
...     ["2020-02-16 19:45:00", 31, 145.0],
...     ["2020-02-16 20:00:00", 31, 137.0],
...     ["2020-02-16 20:15:00", 31, 141.0],
...     ["2020-02-16 20:45:00", 31, 146.0],
...     ["2020-02-16 21:00:00", 31, 141.0]],
...     columns=['dateTime', 'patientID', 'CGM'])
>>> df['dateTime'] = pd.to_datetime(df['dateTime'])
>>> op = SlidingWindowOperator(winsize="1h15t",
...                            time_col="dateTime",
...                            label_col="CGM",
...                            pid_col="patientID")
>>> df_timeseries, df_labels, df_label_time, df_pids = op.process(df)[0]
>>> df_timeseries
                dateTime    CGM  seq_id
0    2020-02-16 15:15:00  282.5       0
1    2020-02-16 15:30:00  275.0       0
2    2020-02-16 15:45:00  250.0       0
3    2020-02-16 16:00:00  235.0       0
4    2020-02-16 15:30:00  275.0       1
..                   ...    ...     ...
143  2020-02-16 19:45:00  145.0      35
144  2020-02-16 19:15:00  158.5      36
145  2020-02-16 19:30:00  150.0      36
146  2020-02-16 19:45:00  145.0      36
147  2020-02-16 20:00:00  137.0      36

148 rows × 3 columns
```
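To see how overlapping sequences such as seq_id 0 and seq_id 1 above can arise, here is a minimal stdlib sketch of sliding a window one row at a time over a single participant's readings. It is a hypothetical illustration, not tasrif's implementation: `sliding_windows` is an invented name, it assumes rows are sorted `(timestamp, value)` pairs sampled every 15 minutes, and it simply discards any window that straddles a gap. On participant 27's data it reproduces the first two sequences shown in the output above, but it is not guaranteed to match tasrif's gap handling or seq_id numbering in general.

```python
from datetime import datetime, timedelta


def sliding_windows(rows, n_features=4, n_labels=1,
                    period=timedelta(minutes=15)):
    """Yield (feature_rows, label_rows) for each window of consecutive rows.

    Hypothetical sketch: `rows` is a time-sorted list of (timestamp, value)
    pairs for ONE participant, and the window advances one row at a time.
    """
    size = n_features + n_labels
    for i in range(len(rows) - size + 1):
        window = rows[i:i + size]
        times = [t for t, _ in window]
        # Skip windows that straddle a gap in the regular sampling.
        if all(b - a == period for a, b in zip(times, times[1:])):
            yield window[:n_features], window[n_features:]


# Participant 27 from the example above.
raw = [("11:45", 102.5), ("12:00", 68.5), ("12:15", 40.0),
       ("15:15", 282.5), ("15:30", 275.0), ("15:45", 250.0),
       ("16:00", 235.0), ("16:15", 206.5), ("16:30", 191.0),
       ("16:45", 166.5), ("17:00", 171.5), ("17:15", 152.0),
       ("17:30", 124.0), ("17:45", 106.0), ("18:00", 96.5),
       ("18:15", 86.5)]
rows = [(datetime.strptime("2020-02-16 " + hm, "%Y-%m-%d %H:%M"), cgm)
        for hm, cgm in raw]

windows = list(sliding_windows(rows))
for seq_id, (features, labels) in enumerate(windows[:2]):
    print(seq_id, [cgm for _, cgm in features], "->", labels[0][1])
# 0 [282.5, 275.0, 250.0, 235.0] -> 206.5
# 1 [275.0, 250.0, 235.0, 206.5] -> 191.0
```

The three morning readings (11:45 to 12:15) never form a complete window because the next reading is three hours later, which is why the first sequence starts at 15:15.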
-
__init__(winsize='1h15t', period=15, time_col='time', label_col='CGM', pid_col='patientID')
Creates a new instance of SlidingWindowOperator.
- Parameters
winsize (int or offset) – Size of the moving window. If an int, it is the number of observations in each window; if an offset such as '1h15t', it is the time span covered by each window. Every window has the same fixed size.
period (int) – Expected periodicity between consecutive rows. Only used if winsize is an offset.
time_col (str) – Name of the time column in the dataframe.
label_col (str) – Name of the label column in the dataframe.
pid_col (str) – Name of the patient ID column in the dataframe.
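Because period only matters when winsize is an offset, one way to picture their relationship is that the window's duration divided by the row spacing gives the number of rows per window. The helper below is a hypothetical sketch: `rows_per_window` is an invented name, it assumes period is measured in minutes (consistent with the default of 15 and the 15-minute readings in the example, but not stated above), and it takes the offset as a `datetime.timedelta` rather than a pandas offset string.

```python
from datetime import timedelta


def rows_per_window(winsize, period=15):
    """Number of rows spanned by one window (hypothetical helper).

    An int winsize is already a row count, so period is ignored.
    A timedelta winsize is divided by the row spacing, assumed to be
    `period` minutes.
    """
    if isinstance(winsize, int):
        return winsize
    step = timedelta(minutes=period)
    if winsize % step:
        raise ValueError("winsize is not a whole multiple of period")
    return winsize // step


print(rows_per_window(timedelta(hours=1, minutes=15)))  # 5
print(rows_per_window(5))                               # 5
```

Under these assumptions the default winsize='1h15t' with period=15 spans 5 rows, which fits the four feature rows per seq_id in the example plus one label row.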
-