Custom Operators

Tasrif also lets users build their own Operators when the built-in ones do not cover their needs.

The interface for an Operator is simple:

class ProcessingOperator:
    def process(self, *data_frames):
        pass

By subclassing ProcessingOperator and overriding process, a user can build their own Operators. For example, suppose we need an Operator that counts the rows of each DataFrame passed to it:

from tasrif.processing_pipeline import ProcessingOperator

class RowCountOperator(ProcessingOperator):
    def process(self, *data_frames):
        output = []

        # Record the row count of each DataFrame, in input order.
        for df in data_frames:
            output.append(len(df.index))

        return output

Let’s test our new Operator:

>>> import pandas as pd

>>> df1 = pd.DataFrame({
...     'Date':   ['05-06-2021', '06-06-2021', '07-06-2021', '08-06-2021'],
...     'Steps':  [        4500,         None,         5690,         6780]
... })

>>> df2 = pd.DataFrame({
...     'Date':   ['12-07-2021', '13-07-2021', '14-07-2021', '15-07-2021'],
...     'Steps':  [        2100,         None,         None,         5400]
... })

>>> RowCountOperator().process(df1, df2)
[4, 4]

To ease the creation of custom Operators, Tasrif provides two convenience classes: MapProcessingOperator and ReduceProcessingOperator. As their names suggest, you subclass them to create custom Operators with map behavior (each DataFrame is processed independently) or reduce behavior (the DataFrames are combined into a single result).

For example, let’s use the MapProcessingOperator to build the RowCountOperator:

from tasrif.processing_pipeline import MapProcessingOperator

class RowCountOperator(MapProcessingOperator):
    def processing_function(self, df):
        return len(df.index)

As you can see, these convenience classes are a quick way of creating simple, custom Operators.
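
To make the map and reduce behavior concrete, here is a minimal, self-contained sketch of how such convenience classes could work. It does not import Tasrif and is not Tasrif's actual implementation: ProcessingOperator below is a stand-in for the interface shown earlier, and SketchMapOperator, SketchReduceOperator, TotalRowCount, the initial attribute, and the two-argument processing_function of the reduce variant are all hypothetical names chosen for illustration.

```python
import functools

import pandas as pd


class ProcessingOperator:
    """Stand-in for the interface shown earlier (not Tasrif's own class)."""

    def process(self, *data_frames):
        raise NotImplementedError


class SketchMapOperator(ProcessingOperator):
    """Hypothetical map-style base: applies processing_function to each input."""

    def processing_function(self, df):
        raise NotImplementedError

    def process(self, *data_frames):
        # One output per input DataFrame, in input order.
        return [self.processing_function(df) for df in data_frames]


class SketchReduceOperator(ProcessingOperator):
    """Hypothetical reduce-style base: folds all inputs into one value."""

    initial = None  # assumed starting value for the fold

    def processing_function(self, accumulated, df):
        raise NotImplementedError

    def process(self, *data_frames):
        # Combine the inputs left to right, starting from `initial`.
        return functools.reduce(self.processing_function, data_frames, self.initial)


class RowCount(SketchMapOperator):
    def processing_function(self, df):
        return len(df.index)


class TotalRowCount(SketchReduceOperator):
    initial = 0

    def processing_function(self, accumulated, df):
        return accumulated + len(df.index)


df1 = pd.DataFrame({"Steps": [4500, None, 5690, 6780]})
df2 = pd.DataFrame({"Steps": [2100, None, None, 5400]})

map_result = RowCount().process(df1, df2)          # one count per DataFrame
reduce_result = TotalRowCount().process(df1, df2)  # a single combined count
```

In this sketch the subclass supplies only the per-item (or per-step) logic, while the base class owns the iteration. That division is what keeps the map-based RowCountOperator above so short; consult Tasrif's own documentation for the exact method names and return shapes of the real classes.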