visualize_days_observer

Module that defines the VisualizeDaysObserver class

class tasrif.processing_pipeline.observers.visualize_days_observer.VisualizeDaysObserver(date_feature_name, signals, participant_identifier, participant_filter=-1, signals_as_area=None, start_hour_col=None, end_date_feature_name=None, granularity='T', log_scale=False, figsize=(7, 4))

Observer class that plots the days of selected participants in a dataframe

__init__(date_feature_name, signals, participant_identifier, participant_filter=-1, signals_as_area=None, start_hour_col=None, end_date_feature_name=None, granularity='T', log_scale=False, figsize=(7, 4))

VisualizeDaysObserver constructor

Parameters
  • date_feature_name (str) – time column

  • signals (str, list) – name of column(s) to plot as a line plot

  • participant_identifier (str) – participant identifier column

  • participant_filter (int, list) – participants to plot. Can be one of the following values:

      - -1 (default), to plot the days of the first participant in the dataframe
      - -2, to plot the days of all participants
      - a list of values in participant_identifier, to plot the days of specific participants

  • signals_as_area (str, list) – name of column(s) to plot as area plot. Columns have to be of boolean type.

  • start_hour_col (str) – optional name of the column holding the shifted timestamps, used to draw each day starting from a given hour. By default, days are drawn from midnight to midnight. This column can be created using SetStartHourOfDayOperator

  • end_date_feature_name (str) – optional name of the time column that represents the end time of the activity period

  • granularity (str) – used when end_date_feature_name is set. Frequency to which the start and end time columns are rounded. Default is 'T' (1 minute)

  • log_scale (bool) – whether to draw y-axis in log-scale

  • figsize (tuple) – figure size
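
The participant_filter values listed above can be read as a small selection rule. The following is a minimal sketch of that rule in plain Python. The helper name resolve_participants is hypothetical and is not part of tasrif; the observer's actual implementation may differ, for example in how it handles ids that are missing from the data.

```python
def resolve_participants(available, participant_filter=-1):
    """Hypothetical helper (not tasrif API): map a filter value to participant ids."""
    # Drop duplicates while keeping the order in which ids first appear.
    unique = list(dict.fromkeys(available))
    if isinstance(participant_filter, (list, tuple)):
        # Explicit selection: keep the requested ids that exist in the data.
        return [p for p in participant_filter if p in unique]
    if participant_filter == -1:
        # Default: only the first participant in the data.
        return unique[:1]
    if participant_filter == -2:
        # All participants.
        return unique
    raise ValueError(f"unsupported participant_filter: {participant_filter!r}")


ids = [0, 0, 0, 1, 1, 2]  # values of a participant identifier column
print(resolve_participants(ids))          # [0]
print(resolve_participants(ids, -2))      # [0, 1, 2]
print(resolve_participants(ids, [2, 0]))  # [2, 0]
```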

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from tasrif.processing_pipeline import SequenceOperator, NoopOperator
>>> from tasrif.processing_pipeline.custom import VisualizeDaysOperator, SetStartHourOfDayOperator
>>> from tasrif.processing_pipeline.observers import VisualizeDaysObserver
>>> from tasrif.processing_pipeline.pandas import FillNAOperator
>>>
>>> def generate_days(periods, freq, participant=1, start_day="2018-01-01", name='startTime'):
...     idx = pd.date_range(start_day, periods=periods, freq=freq, name=name)
...     activity = np.random.randint(0, 100, periods)
...     df = pd.DataFrame(data=activity, index=idx, columns=['activity'])
...     df['steps'] = np.random.randint(100, 1000, periods)
...     df['participant'] = participant
...     return df
>>>
>>> def generate_sleep(df, start_time='23:30', end_time='8:00', name='sleep'):
...     df[name] = False
...     time_filter = df.between_time(start_time=start_time, end_time=end_time).index
...     df.loc[time_filter, name] = True
...     df['not_' + name] = ~df[name]
...
...     # reduce activity between 23:30 - 08:00
...     df.loc[time_filter, 'activity'] = df.loc[time_filter, 'activity'] / 50
...     df.loc[time_filter, 'steps'] = 0
...     return df
>>>
>>> def generate_data(participants=2, days=2):
...     dfs = []
...     for i in range(participants):
...         df = generate_days(periods=24*days, freq='H', participant=i)
...         df = generate_sleep(df)
...         dfs.append(df)
...     return pd.concat(dfs)
>>>
>>> df = generate_data()
>>>
>>> # Set activity to None for the last 12 hours of participant 0's second day
>>> df.iloc[36:48, 0] = None
>>> df
                     activity  steps  participant  sleep  not_sleep
startTime
2018-01-01 00:00:00      0.42      0            0   True      False
2018-01-01 01:00:00      0.70      0            0   True      False
2018-01-01 02:00:00      0.08      0            0   True      False
2018-01-01 03:00:00      0.00      0            0   True      False
2018-01-01 04:00:00      0.92      0            0   True      False
...                       ...    ...          ...    ...        ...
2018-01-02 19:00:00     90.00    121            1  False       True
2018-01-02 20:00:00     48.00    312            1  False       True
2018-01-02 21:00:00     57.00    303            1  False       True
2018-01-02 22:00:00     76.00    916            1  False       True
2018-01-02 23:00:00     55.00    474            1  False       True
>>> # With no shift
>>> observer = VisualizeDaysObserver(date_feature_name='startTime',
...                                  signals=['activity', 'steps'],
...                                  participant_identifier='participant',
...                                  signals_as_area=['sleep'])
>>>
>>> pipeline = SequenceOperator([NoopOperator()], observers=[observer])
>>> pipeline.process(df)[0]
                     activity  steps  participant  sleep  not_sleep
startTime
2018-01-01 00:00:00      1.56      0            0   True      False
2018-01-01 01:00:00      0.64      0            0   True      False
2018-01-01 02:00:00      1.72      0            0   True      False
2018-01-01 03:00:00      1.08      0            0   True      False
2018-01-01 04:00:00      1.70      0            0   True      False
...                       ...    ...          ...    ...        ...
2018-01-02 19:00:00     16.00    805            1  False       True
2018-01-02 20:00:00     61.00    566            1  False       True
2018-01-02 21:00:00     48.00    895            1  False       True
2018-01-02 22:00:00     68.00    818            1  False       True
2018-01-02 23:00:00     23.00    883            1  False       True
>>> # With shift
>>> observer = VisualizeDaysObserver(date_feature_name='startTime',
...                                  signals=['activity', 'steps'],
...                                  participant_identifier='participant',
...                                  signals_as_area=['sleep'],
...                                  start_hour_col='shifted_time_col')
>>>
>>> pipeline = SequenceOperator([
...      SetStartHourOfDayOperator(date_feature_name='startTime',
...                                participant_identifier='participant',
...                                shifted_date_feature_name='shifted_time_col',
...                                shift=6),
...      FillNAOperator(value=300),
... ], observers=[observer])
>>>
>>>
>>> pipeline.process(df)[0]
                     activity  steps  participant  sleep  not_sleep     shifted_time_col
startTime
2018-01-01 00:00:00      1.56      0            0   True      False  2017-12-31 18:00:00
2018-01-01 01:00:00      0.64      0            0   True      False  2017-12-31 19:00:00
2018-01-01 02:00:00      1.72      0            0   True      False  2017-12-31 20:00:00
2018-01-01 03:00:00      1.08      0            0   True      False  2017-12-31 21:00:00
2018-01-01 04:00:00      1.70      0            0   True      False  2017-12-31 22:00:00
...                       ...    ...          ...    ...        ...                  ...
2018-01-02 19:00:00     16.00    805            1  False       True  2018-01-02 13:00:00
2018-01-02 20:00:00     61.00    566            1  False       True  2018-01-02 14:00:00
2018-01-02 21:00:00     48.00    895            1  False       True  2018-01-02 15:00:00
2018-01-02 22:00:00     68.00    818            1  False       True  2018-01-02 16:00:00
2018-01-02 23:00:00     23.00    883            1  False       True  2018-01-02 17:00:00
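
In the output above, each shifted_time_col value equals startTime minus the shift of 6 hours, so a reading taken at 06:00 lands at 00:00 of the same shifted day. The arithmetic can be sketched with the standard library alone. The helper shift_start_of_day is hypothetical and only mirrors the values shown; it is not how SetStartHourOfDayOperator is necessarily implemented, and the assumption that the observer groups rows by the date of the shifted timestamp is ours.

```python
from datetime import datetime, timedelta


def shift_start_of_day(timestamp, shift_hours):
    """Hypothetical helper (not tasrif API): move a timestamp back by shift_hours."""
    return timestamp - timedelta(hours=shift_hours)


samples = [
    datetime(2018, 1, 1, 0),   # midnight falls into the previous shifted day
    datetime(2018, 1, 1, 6),   # 06:00 becomes the start of the shifted day
    datetime(2018, 1, 2, 23),  # last row of the example dataframe
]
for ts in samples:
    shifted = shift_start_of_day(ts, 6)
    # Assumption: rows are grouped into days by the date of the shifted timestamp.
    print(ts, "->", shifted, "| shifted day:", shifted.date())
```

With a shift of 6, the first and last rows reproduce the 2017-12-31 18:00:00 and 2018-01-02 17:00:00 values in the table above.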