Pipelines¶
Pipelines are used to gather multiple Operators together to create a processing workflow.
Each Operator in a Pipeline is executed sequentially, with the output from one Operator becoming the input for the next.
As a follow up to the previous example, let’s do the following using Tasrif Operators wrapped in a Pipeline:
Drop all rows with missing values
Concatenate both DataFrames
Get the mean step count
>>> import pandas as pd
>>> from tasrif.processing_pipeline.pandas import DropNAOperator, \
... ConcatOperator, \
... MeanOperator
>>> from tasrif.processing_pipeline import SequenceOperator
>>> df1 = pd.DataFrame({
... 'Date': ['05-06-2021', '06-06-2021', '07-06-2021', '08-06-2021'],
... 'Steps': [ 4500, None, 5690, 6780]
... })
>>> df2 = pd.DataFrame({
... 'Date': ['12-07-2021', '13-07-2021', '14-07-2021', '15-07-2021'],
... 'Steps': [ 2100, None, None, 5400]
... })
>>> pipeline = SequenceOperator([
... DropNAOperator(),
... ConcatOperator(),
... MeanOperator()
... ])
>>> results = pipeline.process(df1, df2)
>>> results[0]
Steps 4894.0
dtype: float64
As before, we instantiate all Operators with their appropriate parameters,
except this time, they are grouped together within the
SequenceOperator
.
Finally, the .process
method is called on the pipeline instead of the
operators. This causes the pipeline to execute the operators in sequential
order. Essentially, the pipeline by itself can be considered as an Operator, as
it is simply a composition of other Operators.