DataReaders¶
DataReaders are a type of built-in Operator used to import data from popular eHealth datasets into Tasrif. They are essentially operators that act as specialized inputs for the pipeline.
Supported datasets include:
MyHeartCounts
Data from FitBit devices
SIHA
SleepHealth
Withings
ZenodoFitBit
DataReaders are instantiated with the path to the file/folder containing the dataset, and any other optional parameters.
For example, here’s a pipeline that uses body measurement data obtained from FitBit devices:
>>> from tasrif.processing_pipeline import SequenceOperator
>>> from tasrif.data_readers.fitbit_interday_dataset import FitbitInterdayDataset
>>> from tasrif.processing_pipeline.pandas import ConvertToDatetimeOperator, SetIndexOperator
>>> interday_folder_path = "path/to/data/from/FitBit/device"
>>> pipeline = SequenceOperator([
... FitbitInterdayDataset(
... interday_folder_path,
... table_name="Body"
... ),
... ConvertToDatetimeOperator(
... feature_names=['Date'],
... infer_datetime_format=True
... ),
... SetIndexOperator('Date')
... ])
>>> dfs = pipeline.process()
>>> dfs[0]
Weight BMI Fat
Date
2019-07-01 84.02 29.77 30.103
2019-07-02 83.93 29.74 30.103
2019-07-03 83.85 29.71 30.103
2019-07-04 83.76 29.68 30.103
2019-07-05 83.68 29.65 30.103
2019-07-06 83.59 29.62 30.103
2019-07-07 83.50 29.58 30.103
2019-07-08 83.51 29.59 30.028
Most DataReaders are straightforward and simply read the dataset files into DataFrames. However, other DataReaders perform more complex processing for datasets that need parsing before they can be used in a Tasrif pipeline.