add_duration_operator

Operator that adds a duration feature to a timeseries dataframe

class tasrif.processing_pipeline.custom.add_duration_operator.AddDurationOperator(groupby_feature_names, date_feature_name='timestamp', duration_feature_name='duration')

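Given a 2D dataframe representing a timeseries where each row represents a time event, this operator adds a new duration feature. In the example below, the duration of a row is the time elapsed since the previous event with the same groupby_feature_names value, and the first event of each group has a duration of zero.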

Examples

>>> import pandas as pd
>>>
>>> from tasrif.processing_pipeline.custom import AddDurationOperator
>>>
>>> df0 = pd.DataFrame([[1, "2020-05-01 00:00:00", 1], [1, "2020-05-01 01:00:00", 1],
...                     [1, "2020-05-01 03:00:00", 2], [2, "2020-05-02 00:00:00", 1],
...                     [2, "2020-05-02 01:00:00", 1]],
...                    columns=['logId', 'timestamp', 'sleep_level'])
>>> df0['timestamp'] = pd.to_datetime(df0['timestamp'])
>>>
>>> operator = AddDurationOperator(
...     groupby_feature_names="logId",
...     date_feature_name="timestamp",
...     duration_feature_name="duration")
>>> df0 = operator.process(df0)
>>>
>>> print(df0)
[   logId           timestamp  sleep_level        duration
0      1 2020-05-01 00:00:00            1 0 days 00:00:00
1      1 2020-05-01 01:00:00            1 0 days 01:00:00
2      1 2020-05-01 03:00:00            2 0 days 02:00:00
3      2 2020-05-02 00:00:00            1 0 days 00:00:00
4      2 2020-05-02 01:00:00            1 0 days 01:00:00]
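
The output above can be approximated in plain pandas, which may help clarify what the duration feature contains. The sketch below is not the operator's implementation, which is not shown on this page. The helper add_duration is hypothetical, and the logic (difference between consecutive timestamps within each group, zero for the first row, rows assumed to be sorted by timestamp) is inferred from the example output only.

```python
import pandas as pd


def add_duration(df, groupby_feature_names, date_feature_name="timestamp",
                 duration_feature_name="duration"):
    """Hypothetical helper, not part of tasrif.

    Assumes the rows of each group are already sorted by timestamp.
    """
    out = df.copy()
    # Difference between each timestamp and the previous one in the same group.
    # The first row of every group has no predecessor, so diff() yields NaT,
    # which is replaced by a zero duration to match the example output.
    out[duration_feature_name] = (
        out.groupby(groupby_feature_names)[date_feature_name]
        .diff()
        .fillna(pd.Timedelta(0))
    )
    return out


df = pd.DataFrame(
    [
        [1, "2020-05-01 00:00:00", 1],
        [1, "2020-05-01 01:00:00", 1],
        [1, "2020-05-01 03:00:00", 2],
        [2, "2020-05-02 00:00:00", 1],
        [2, "2020-05-02 01:00:00", 1],
    ],
    columns=["logId", "timestamp", "sleep_level"],
)
df["timestamp"] = pd.to_datetime(df["timestamp"])

result = add_duration(df, "logId")
print(result)
```

For this input the sketch yields durations of 0, 1 and 2 hours for logId 1 and 0 and 1 hours for logId 2, matching the duration column shown above. Unlike operator.process, the helper returns a single dataframe rather than a list.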
__init__(groupby_feature_names, date_feature_name='timestamp', duration_feature_name='duration')

Creates a new instance of AddDurationOperator

Parameters
  • groupby_feature_names (str) – Name of the feature used to group rows into related timestamp series

  • date_feature_name (str) – Name of the feature representing the timestamp

  • duration_feature_name (str) – Name of the feature representing the duration