distributed_upsample_operator

Operator to upsample a time-series dataframe in a distributed way.

class tasrif.processing_pipeline.custom.distributed_upsample_operator.DistributedUpsampleOperator(rule)

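Upsamples the dataframe based on its index, which is assumed to be a datetime index. As the example below shows, each original value is split evenly across the finer-grained rows that replace it.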

Example:

>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import DistributedUpsampleOperator
>>> df = pd.DataFrame([
...     ['2020-05-01', 16.5],
...     ['2020-05-02', 19.1],
...     ['2020-05-03', 0]],
...     columns=['timestamp', 'sedentary_hours'])
>>>
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df = df.set_index('timestamp')
>>> op = DistributedUpsampleOperator('6h')
>>> df = op.process(df)
>>> df
[                     sedentary_hours
timestamp
2020-05-01 00:00:00            4.125
2020-05-01 06:00:00            4.125
2020-05-01 12:00:00            4.125
2020-05-01 18:00:00            4.125
2020-05-02 00:00:00            4.775
2020-05-02 06:00:00            4.775
2020-05-02 12:00:00            4.775
2020-05-02 18:00:00            4.775
2020-05-03 00:00:00            0.000]
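
The numbers in the output above can be reproduced with plain pandas, which may help clarify what "distributed" means here. The sketch below is not tasrif's implementation: `distribute_upsample` is a hypothetical helper, and it assumes a regularly spaced datetime index with at least two rows. It forward-fills each value onto the finer grid and then divides by the number of new rows per original period (16.5 / 4 = 4.125 for a daily value upsampled to 6 hours).

```python
import pandas as pd


def distribute_upsample(df, rule):
    """Hypothetical helper (not part of tasrif): upsample and spread values evenly."""
    # Spacing of the original index; assumes the index is regular.
    original_step = df.index[1] - df.index[0]
    new_step = pd.Timedelta(rule)
    # Number of new rows that replace one original row (4.0 for 1 day -> 6 hours).
    factor = original_step / new_step
    # Repeat each value on the finer grid, then split it evenly.
    return df.resample(new_step).ffill() / factor


df = pd.DataFrame(
    {"sedentary_hours": [16.5, 19.1, 0.0]},
    index=pd.to_datetime(["2020-05-01", "2020-05-02", "2020-05-03"]),
)
df.index.name = "timestamp"

result = distribute_upsample(df, "6h")
print(result)
```

Because `resample` stops at the last original timestamp, the final day contributes a single row, just as in the operator's output above.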
__init__(rule)

Creates a new instance of DistributedUpsampleOperator

Parameters

rule (DateOffset, Timedelta, str) – The offset string or object representing the target conversion.