split_operator

Module that defines the SplitOperator class

class tasrif.processing_pipeline.split_operator.SplitOperator(split_operators, bind_list=None, observers=None, num_processes=1)

Class representing a split operation. The input coming into this operator is split into multiple branches represented by split operators that are passed in the constructor.

__init__(split_operators, bind_list=None, observers=None, num_processes=1)

Constructs a split operator using the provided arguments

Parameters
  • split_operators (list[ProcessingOperator]) – Python list of processing operators

  • bind_list (list[int]) – Specifies which input argument each split operator receives: the value at position i of bind_list is the index of the argument passed to the operator at position i. For example, a bind_list of [0, 1, 1] means that the first operator receives the first argument (index 0), and the second and third operators both receive the second argument (index 1). An error is raised if len(bind_list) != len(split_operators). If no bind_list is passed, arguments are passed in the same order as they are received (equivalent to a bind_list of [0, 1, 2, …]).

  • observers (list[Observer]) – Python list of observers

  • num_processes (int) – Number of logical processes to use to process the operator

Raises
  • ValueError – Occurs when one of the objects in the split_operators list is not a ProcessingOperator.

  • ValueError – If the number of operators does not match the number of elements in the bind_list.

Examples

>>> import pandas as pd
>>> from tasrif.processing_pipeline import SplitOperator
>>> from tasrif.processing_pipeline.pandas import DropNAOperator, DropDuplicatesOperator
>>> df0 = pd.DataFrame({
...     'Date':  ['05-06-2021', '06-06-2021', '07-06-2021', '08-06-2021'],
...     'Steps': [       pd.NA,         2000,        pd.NA,         4000]
... })
>>> df1 = pd.DataFrame({
...     'Date':  ['05-06-2021', '06-06-2021', '06-06-2021', '07-06-2021', '07-06-2021', '08-06-2021'],
...     'Steps': [       pd.NA,         2000,         2000,        pd.NA,        pd.NA,         4000]
... })
>>> operator = SplitOperator([
...     DropNAOperator(),
...     DropDuplicatesOperator()
... ])
>>> operator.process(df0, df1)
    [(         Date Steps
    1  06-06-2021  2000
    3  08-06-2021  4000,),
    (         Date Steps
    0  05-06-2021  <NA>
    1  06-06-2021  2000
    3  07-06-2021  <NA>
    5  08-06-2021  4000,)]
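
The bind_list rule described under Parameters can be illustrated with a small, library-independent sketch. The helper route_arguments below is hypothetical and is not part of tasrif; it only mimics the documented mapping from input arguments to split operators, assuming that each bind_list entry is an index into the received arguments.

```python
def route_arguments(args, num_operators, bind_list=None):
    """Hypothetical helper (not a tasrif API): choose one argument per operator.

    Mirrors the documented rule: bind_list[i] is the index of the
    argument handed to the operator at position i.
    """
    if bind_list is None:
        # Default binding: operator i receives argument i, i.e. [0, 1, 2, ...].
        bind_list = list(range(num_operators))
    if len(bind_list) != num_operators:
        # Same condition the documentation lists under "Raises".
        raise ValueError("bind_list length must match the number of operators")
    return [args[index] for index in bind_list]


# Three operators, two inputs: the second and third operators share input 1.
routed = route_arguments(["df0", "df1"], num_operators=3, bind_list=[0, 1, 1])
print(routed)  # ['df0', 'df1', 'df1']
```

With the real class, the analogous call would pass bind_list=[0, 1, 1] to the SplitOperator constructor along with three operators, so that two inputs can feed three branches.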

set_observers(observers)

Function to store the observers for the given operator.

Parameters

observers (list[Observer]) – Observer objects that observe the operator

is_functional()

Function that returns whether the operator is a functional operator or an infrastructure operator.

Returns

True if the operator is functional, False if it is an infrastructure operator

Return type

bool