sequence_operator

Module that defines the SequenceOperator class

class tasrif.processing_pipeline.sequence_operator.SequenceOperator(processing_operators, observers=None)

Class representing a pipeline of processing operators. The pipeline is defined by the list of ProcessingOperator objects passed to the constructor. Data flows through the operators in list order, with the output of each operator becoming the input of the next.
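The chained data flow described above can be pictured with a small, library-independent sketch. The names `Operator`, `AddOne`, `Double`, and `run_sequence` are hypothetical stand-ins introduced only for illustration. They are not part of tasrif, and the real SequenceOperator may pass data between operators differently.

```python
from functools import reduce


class Operator:
    """Hypothetical stand-in for a processing operator (not the tasrif class)."""

    def process(self, value):
        raise NotImplementedError


class AddOne(Operator):
    def process(self, value):
        return value + 1


class Double(Operator):
    def process(self, value):
        return value * 2


def run_sequence(operators, value):
    """Feed the output of each operator into the next one, in list order."""
    return reduce(lambda acc, op: op.process(acc), operators, value)


# AddOne runs first, then Double: (3 + 1) * 2
result = run_sequence([AddOne(), Double()], 3)
print(result)
```

Because the operators run in list order, swapping them generally changes the result.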

__init__(processing_operators, observers=None)

Constructs a sequence operator from a list of operators

Parameters
  • processing_operators (list[ProcessingOperator]) – Python list of processing operators

  • observers (list[Observer]) – Python list of observers

Raises

ValueError – Occurs when one of the objects in the specified list is not a ProcessingOperator
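As a rough illustration of the check described under Raises, the sketch below validates a list before it is used. `ProcessingOperator` here is a hypothetical placeholder class and `validate_operators` is a hypothetical helper. Neither is taken from the tasrif source, and the actual error message may differ.

```python
class ProcessingOperator:
    """Hypothetical placeholder for the tasrif base class."""


def validate_operators(processing_operators):
    """Raise ValueError if any item is not a ProcessingOperator."""
    for item in processing_operators:
        if not isinstance(item, ProcessingOperator):
            raise ValueError(f"Not a ProcessingOperator: {item!r}")
    return list(processing_operators)


try:
    # The second item is a plain string, so validation fails.
    validate_operators([ProcessingOperator(), "not an operator"])
except ValueError as error:
    message = str(error)
    print(message)
```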

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tasrif.processing_pipeline import SequenceOperator
>>> from tasrif.processing_pipeline.pandas import DropDuplicatesOperator, DropNAOperator
>>> df = pd.DataFrame({"pid": ['001', '002', '003'],
...                 "height": [np.nan, 188, 170],
...                 "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                          pd.NaT]})
>>> pipeline = SequenceOperator([DropDuplicatesOperator(), DropNAOperator()])
>>> pipeline.process(df)
(   pid  height       born
 1  002   188.0 1940-04-25,)

set_observers(observers)

Function to store the observers for the given operator.

Parameters

observers (list[Observer]) – Observer objects that observe the operator

is_functional()

Function that returns whether the operator is functional or infrastructure

Returns

True if the operator is functional, False if it is infrastructure

Return type

bool