split_operator
Module that defines the SplitOperator class
-
class tasrif.processing_pipeline.split_operator.SplitOperator(split_operators, bind_list=None, observers=None, num_processes=1)
Class representing a split operation. Input arriving at this operator is split into multiple branches, each represented by one of the split operators passed to the constructor.
-
__init__(split_operators, bind_list=None, observers=None, num_processes=1)
Constructs a split operator using the provided arguments.
- Parameters
split_operators (list[ProcessingOperator]) – Python list of processing operators
bind_list (list[Integer]) – Specifies the bind order of the data passed to the split operators. Each value in bind_list is the index of the argument passed to the operator at that position. For example, a bind_list of [0, 1, 1] means that the first operator receives the first argument (index 0) and the second and third operators both receive the second argument (index 1). An error is raised if len(bind_list) != len(split_operators). If no bind_list is passed, arguments are passed in the order in which they are received (equivalent to a bind_list of [0, 1, 2, …]).
observers (list[Observer]) – Python list of observers
num_processes (int) – Number of logical processes used to process the operator
- Raises
ValueError – Occurs when one of the objects in the split_operators list is not a ProcessingOperator.
ValueError – If the number of operators does not match the number of elements in the bind_list.
Examples
>>> import pandas as pd
>>> from tasrif.processing_pipeline import SplitOperator
>>> from tasrif.processing_pipeline.pandas import DropNAOperator, DropDuplicatesOperator
>>>
>>> df0 = pd.DataFrame({
...     'Date': ['05-06-2021', '06-06-2021', '07-06-2021', '08-06-2021'],
...     'Steps': [pd.NA, 2000, pd.NA, 4000]
... })
>>>
>>> df1 = pd.DataFrame({
...     'Date': ['05-06-2021', '06-06-2021', '06-06-2021', '07-06-2021', '07-06-2021', '08-06-2021'],
...     'Steps': [pd.NA, 2000, 2000, pd.NA, pd.NA, 4000]
... })
>>>
>>> operator = SplitOperator([
...     DropNAOperator(),
...     DropDuplicatesOperator()
... ])
>>>
>>> operator.process(df0, df1)
[(         Date Steps
1  06-06-2021  2000
3  08-06-2021  4000,), (         Date Steps
0  05-06-2021  <NA>
1  06-06-2021  2000
3  07-06-2021  <NA>
5  08-06-2021  4000,)]
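The bind_list routing described under Parameters can be illustrated without tasrif. The following is a minimal sketch, not the library's implementation: route_arguments is a hypothetical helper, and plain Python callables stand in for ProcessingOperator objects. It only mirrors the documented rules (default order [0, 1, 2, …], and a ValueError when the lengths differ).

```python
def route_arguments(arguments, operators, bind_list=None):
    """Hypothetical stand-in for the documented bind_list behavior.

    `operators` are plain callables here, not tasrif operators.
    """
    if bind_list is None:
        # Default: the operator at position i receives the argument at index i.
        bind_list = list(range(len(operators)))
    if len(bind_list) != len(operators):
        raise ValueError("bind_list and operators must have the same length")
    # Apply each operator to the argument its bind_list entry points at.
    return [
        operator(arguments[index])
        for operator, index in zip(operators, bind_list)
    ]


# Three illustrative "operators" and two inputs.
operators = [len, sorted, max]
inputs = ([3, 1, 2], [9, 7, 8])

# The first operator gets input 0; the second and third both get input 1.
results = route_arguments(inputs, operators, bind_list=[0, 1, 1])
print(results)  # [3, [7, 8, 9], 9]
```

With a bind_list of [0, 1, 1], two inputs are enough to feed three operators. Without a bind_list, the same call would need three inputs, one per operator.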
-
set_observers(observers)
Function to store the observers for the given operator.
- Parameters
observers (list of Observer) – Observer objects that observe the operator
-
is_functional()
Function that returns whether the operator is functional or infrastructure.
- Returns
is_functional – whether the operator is functional
- Return type
bool
-