resample_operator

Operator to resample a time-series data frame.

class tasrif.processing_pipeline.custom.resample_operator.ResampleOperator(rule, aggregation_definition, **resample_args)

Resample the rows of a 2D data frame into time intervals defined by rule and aggregate the rows within each interval. This operator works on 2D data frames with a datetime index, where the columns represent the features. The returned data frame contains the aggregated feature values, indexed by the resampled timestamps.
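As a rough illustration of the behavior described above, the sketch below reproduces the documented example with plain pandas. It assumes that ResampleOperator forwards its arguments to DataFrame.resample(...) followed by .agg(...), as the parameter descriptions suggest; resample_frame is a hypothetical helper, not part of tasrif.

```python
import pandas as pd


def resample_frame(df, rule, aggregation_definition, **resample_args):
    """Hypothetical stand-in for ResampleOperator (assumption, not tasrif's code)."""
    # Group rows into time bins according to `rule` (e.g. 'D' = calendar day).
    resampler = df.resample(rule, **resample_args)
    # Aggregate each bin with a per-column dict or a single aggregation name.
    return resampler.agg(aggregation_definition)


df = pd.DataFrame(
    {
        "logId": [1, 1, 1, 2, 2],
        "sleep_level": [1, 1, 2, 1, 1],
    },
    index=pd.to_datetime([
        "2020-05-01 00:00:00",
        "2020-05-01 01:00:00",
        "2020-05-01 03:00:00",
        "2020-05-02 00:00:00",
        "2020-05-02 01:00:00",
    ]),
)
df.index.name = "timestamp"

# Only the columns named in the dict appear in the output; logId is dropped.
result = resample_frame(df, "D", {"sleep_level": "mean"})
print(result)
```

Because the dictionary names only sleep_level, the logId column does not appear in the result, matching the output shown in the example below.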

Examples

>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import ResampleOperator
>>> df = pd.DataFrame([
...     [1, "2020-05-01 00:00:00", 1],
...     [1, "2020-05-01 01:00:00", 1],
...     [1, "2020-05-01 03:00:00", 2],
...     [2, "2020-05-02 00:00:00", 1],
...     [2, "2020-05-02 01:00:00", 1]],
...     columns=['logId', 'timestamp', 'sleep_level'])
>>>
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df = df.set_index('timestamp')
>>> op = ResampleOperator('D', {'sleep_level': 'mean'})
>>> op.process(df)
[            sleep_level
timestamp
2020-05-01     1.333333
2020-05-02     1.000000]

__init__(rule, aggregation_definition, **resample_args)

Creates a new instance of ResampleOperator.

Parameters
  • rule (DateOffset, Timedelta, str) – The offset string or object representing the target frequency, e.g. 'D' for daily intervals.

  • aggregation_definition (dict, str) – Either a dictionary mapping feature names to aggregation functions, or a single function name defining the aggregation behavior ('sum', 'mean', 'ffill', etc.).

  • **resample_args – Keyword arguments passed to the pandas DataFrame.resample method.
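The parameter forms listed above map onto ordinary pandas calls. The sketch below uses pandas directly, under the assumption that ResampleOperator passes rule, aggregation_definition and **resample_args through to DataFrame.resample(...).agg(...). It shows a per-feature dict, a single string aggregation, a Timedelta rule, and the real pandas keyword on= as an example of a resample argument. The column names steps and heart_rate are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-05-01 08:00:00",
        "2020-05-01 20:00:00",
        "2020-05-02 09:00:00",
        "2020-05-02 18:00:00",
    ]),
    "steps": [100, 200, 50, 300],
    "heart_rate": [60, 70, 65, 80],
})

# Dict form: a different aggregation per feature.
# `on` is a DataFrame.resample keyword: bin by a column instead of the index.
daily = df.resample("D", on="timestamp").agg({"steps": "sum", "heart_rate": "max"})

# String form with a Timedelta rule: one aggregation applied to every column.
indexed = df.set_index("timestamp")
half_daily = indexed.resample(pd.Timedelta(hours=12)).agg("sum")

print(daily)
print(half_daily)
```

With 12-hour bins, each of the four readings falls into its own interval, so half_daily keeps four rows. The daily version collapses them into two.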