aggregate_operator

Operator to aggregate column features based on a column

class tasrif.processing_pipeline.custom.aggregate_operator.AggregateOperator(groupby_feature_names, aggregation_definition, observers=None)

Groups and aggregates rows in a 2D data frame based on a column feature. This operator works on 2D data frames in which the columns represent the features. The returned data frame contains the aggregated values as column features, together with the base feature used for grouping.
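The grouping and aggregation described above can be approximated in plain pandas. The following is a minimal sketch, not the operator's actual implementation: it assumes the operator behaves like a `groupby` followed by `agg`, with the resulting two-level column names flattened to `<feature>_<function>` and the grouping feature kept as an ordinary column.

```python
import pandas as pd

df = pd.DataFrame(
    [["001", 25, 30], ["001", 17, 50], ["002", 20, 40], ["002", 21, 42]],
    columns=["pid", "min_activity", "max_activity"],
)

# Group rows by the base feature and compute several aggregates for one feature.
aggregated = df.groupby("pid").agg({"min_activity": ["mean", "std"]})

# Flatten the two-level column index into names such as "min_activity_mean".
aggregated.columns = ["_".join(col) for col in aggregated.columns]

# Bring the grouping feature back as an ordinary column.
aggregated = aggregated.reset_index()
print(aggregated)
```

Each group contributes one row, so the result has as many rows as there are distinct values of the grouping feature.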

Examples

>>> import pandas as pd
>>>
>>> from tasrif.processing_pipeline.custom import AggregateOperator
>>> from tasrif.processing_pipeline.custom import LinearFitOperator
>>>
>>> df = pd.DataFrame([['001', 25, 30], ['001', 17, 50], ['002', 20, 40], ['002', 21, 42]],
...                     columns=['pid', 'min_activity', 'max_activity'])
>>>
>>> operator = AggregateOperator(
...    groupby_feature_names="pid",
...    aggregation_definition={"min_activity": ["mean", "std"],
...                            "r2,_,intercept": LinearFitOperator(feature_names='min_activity',
...                                                                target='max_activity')})
>>> df0 = operator.process(df)
>>>
>>> print(df0)
[   pid  min_activity_mean  min_activity_std   r2     intercept
 0  001               21.0          5.656854  1.0  9.250000e+01
 1  002               20.5          0.707107  1.0  7.105427e-15]

__init__(groupby_feature_names, aggregation_definition, observers=None)

Creates a new instance of AggregateOperator

Parameters
  • groupby_feature_names (str) – Name of the feature to base the grouping on. If groupby_feature_names includes a non-string value, such as the result of a call like pd.Grouper(), the corresponding column is not shown in the result.

  • aggregation_definition (dict) – Dictionary mapping feature names to the aggregation functions to apply to them.

  • observers (list[Observer]) – Python list of observers
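An aggregation_definition key such as "r2,_,intercept" appears to name several outputs of one aggregation at once, with "_" marking an output to discard. The sketch below illustrates that reading in plain pandas and NumPy. It rests on assumptions: `linear_fit` is a hypothetical helper, not LinearFitOperator, and the output order (r2, slope, intercept) is a guess.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["001", 25, 30], ["001", 17, 50], ["002", 20, 40], ["002", 21, 42]],
    columns=["pid", "min_activity", "max_activity"],
)


def linear_fit(group, feature="min_activity", target="max_activity"):
    """Hypothetical helper: least-squares line fit for one group.

    Returns (r2, slope, intercept); this order is an assumption.
    """
    x = group[feature].to_numpy(dtype=float)
    y = group[target].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residual = y - (slope * x + intercept)
    total = y - y.mean()
    r2 = 1.0 - (residual @ residual) / (total @ total)
    return r2, slope, intercept


# Split the key into output names; "_" marks a value that is dropped.
output_names = "r2,_,intercept".split(",")

rows = []
for pid, group in df.groupby("pid"):
    values = linear_fit(group)
    row = {"pid": pid}
    row.update({name: value
                for name, value in zip(output_names, values)
                if name != "_"})
    rows.append(row)

fit = pd.DataFrame(rows)
print(fit)
```

With only two rows per group the fitted line passes through both points, so r2 is 1.0 for each group up to floating-point rounding.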

set_observers(observers)

Function to store the observers for the given operator.

Parameters

observers (list of Observer) – Observer objects that observe the operator