visualize_days_observer
Module that defines the VisualizeDaysObserver class
class tasrif.processing_pipeline.observers.visualize_days_observer.VisualizeDaysObserver(date_feature_name, signals, participant_identifier, participant_filter=-1, signals_as_area=None, start_hour_col=None, end_date_feature_name=None, granularity='T', log_scale=False, figsize=(7, 4))
Observer class that plots the signals of selected participants day by day
__init__(date_feature_name, signals, participant_identifier, participant_filter=-1, signals_as_area=None, start_hour_col=None, end_date_feature_name=None, granularity='T', log_scale=False, figsize=(7, 4))
VisualizeDaysObserver constructor
Parameters
date_feature_name (str) – time column
signals (str, list) – name of column(s) to plot as a line plot
participant_identifier (str) – participant identifier column
participant_filter (int, list) – participants to plot. Can be one of the following values:
  - -1 (default), to plot the days of the first participant in the dataframe
  - -2, to plot the days of all participants
  - a list of values from participant_identifier, to plot the days of specific participants
signals_as_area (str, list) – name of column(s) to plot as area plot. Columns have to be of boolean type.
start_hour_col (str) – optional parameter to draw the days starting from the given hour. Default is to draw from midnight-to-midnight. This column can be created using SetStartHourOfDayOperator
end_date_feature_name (str) – Optional name of the time column that represents the end of the activity period
granularity (str) – Used when end_date_feature_name is set. Frequency to which the values of the date_feature_name and end_date_feature_name columns are rounded. Default is 'T', which is 1 minute
log_scale (bool) – whether to draw y-axis in log-scale
figsize (tuple) – figure size
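The three forms of participant_filter can be read as a selection rule over the values in the participant_identifier column. The following is a minimal sketch of that rule in plain Python. `resolve_participants` is a hypothetical helper written for illustration only; it is not part of tasrif, and the library's own handling may differ.

```python
def resolve_participants(participant_ids, participant_filter=-1):
    """Illustrative only: mirror the documented meaning of participant_filter.

    participant_ids holds the participant column values in dataframe order
    (values may repeat, one per row).
    """
    # Unique participants, in order of first appearance in the dataframe
    unique_ids = list(dict.fromkeys(participant_ids))

    if isinstance(participant_filter, list):
        # A list selects specific participants
        return [p for p in unique_ids if p in participant_filter]
    if participant_filter == -1:
        # -1 (default) selects only the first participant
        return unique_ids[:1]
    if participant_filter == -2:
        # -2 selects every participant
        return unique_ids
    raise ValueError("participant_filter must be -1, -2, or a list")


ids = [0, 0, 0, 1, 1, 1]  # hypothetical participant column
first_only = resolve_participants(ids)       # default: first participant
everyone = resolve_participants(ids, -2)     # all participants
chosen = resolve_participants(ids, [1])      # specific participants
```

Under this reading, the default plots a single participant, so passing -2 or an explicit list is needed to see more than one.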
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from tasrif.processing_pipeline import SequenceOperator, NoopOperator
>>> from tasrif.processing_pipeline.custom import VisualizeDaysOperator, SetStartHourOfDayOperator
>>> from tasrif.processing_pipeline.observers import VisualizeDaysObserver
>>> from tasrif.processing_pipeline.pandas import FillNAOperator
>>>
>>> def generate_days(periods, freq, participant=1, start_day="2018-01-01", name='startTime'):
...     idx = pd.date_range(start_day, periods=periods, freq=freq, name=name)
...     activity = np.random.randint(0, 100, periods)
...     df = pd.DataFrame(data=activity, index=idx, columns=['activity'])
...     df['steps'] = np.random.randint(100, 1000, periods)
...     df['participant'] = participant
...     return df
>>>
>>> def generate_sleep(df, start_time='23:30', end_time='8:00', name='sleep'):
...     df[name] = False
...     time_filter = df.between_time(start_time=start_time, end_time=end_time).index
...     df.loc[time_filter, name] = True
...     df['not_' + name] = ~df[name]
...
...     # reduce activity between 23:30 - 08:00
...     df.loc[time_filter, 'activity'] = df.loc[time_filter, 'activity'] / 50
...     df.loc[time_filter, 'steps'] = 0
...     return df
>>>
>>> def generate_data(participants=2, days=2):
...     dfs = []
...     for i in range(participants):
...         df = generate_days(periods=24*days, freq='H', participant=i)
...         df = generate_sleep(df)
...         dfs.append(df)
...     return pd.concat(dfs)
>>>
>>> df = generate_data()
>>>
>>> # Set activity to None for the last 12 hours of participant 0's second day
>>> df.iloc[36:48, 0] = None
>>> df
                     activity  steps  participant  sleep  not_sleep
startTime
2018-01-01 00:00:00      0.42      0            0   True      False
2018-01-01 01:00:00      0.70      0            0   True      False
2018-01-01 02:00:00      0.08      0            0   True      False
2018-01-01 03:00:00      0.00      0            0   True      False
2018-01-01 04:00:00      0.92      0            0   True      False
...                       ...    ...          ...    ...        ...
2018-01-02 19:00:00     90.00    121            1  False       True
2018-01-02 20:00:00     48.00    312            1  False       True
2018-01-02 21:00:00     57.00    303            1  False       True
2018-01-02 22:00:00     76.00    916            1  False       True
2018-01-02 23:00:00     55.00    474            1  False       True
>>> # With no shift
>>> observer = VisualizeDaysObserver(date_feature_name='startTime',
...                                  signals=['activity', 'steps'],
...                                  participant_identifier='participant',
...                                  signals_as_area=['sleep'])
>>>
>>> pipeline = SequenceOperator([NoopOperator()], observers=[observer])
>>> pipeline.process(df)[0]
                     activity  steps  participant  sleep  not_sleep
startTime
2018-01-01 00:00:00      1.56      0            0   True      False
2018-01-01 01:00:00      0.64      0            0   True      False
2018-01-01 02:00:00      1.72      0            0   True      False
2018-01-01 03:00:00      1.08      0            0   True      False
2018-01-01 04:00:00      1.70      0            0   True      False
...                       ...    ...          ...    ...        ...
2018-01-02 19:00:00     16.00    805            1  False       True
2018-01-02 20:00:00     61.00    566            1  False       True
2018-01-02 21:00:00     48.00    895            1  False       True
2018-01-02 22:00:00     68.00    818            1  False       True
2018-01-02 23:00:00     23.00    883            1  False       True
>>> # With shift
>>> observer = VisualizeDaysObserver(date_feature_name='startTime',
...                                  signals=['activity', 'steps'],
...                                  participant_identifier='participant',
...                                  signals_as_area=['sleep'],
...                                  start_hour_col='shifted_time_col')
>>>
>>> pipeline = SequenceOperator([
...     SetStartHourOfDayOperator(date_feature_name='startTime',
...                               participant_identifier='participant',
...                               shifted_date_feature_name='shifted_time_col',
...                               shift=6),
...     FillNAOperator(value=300),
... ], observers=[observer])
>>>
>>> pipeline.process(df)[0]
                     activity  steps  participant  sleep  not_sleep     shifted_time_col
startTime
2018-01-01 00:00:00      1.56      0            0   True      False  2017-12-31 18:00:00
2018-01-01 01:00:00      0.64      0            0   True      False  2017-12-31 19:00:00
2018-01-01 02:00:00      1.72      0            0   True      False  2017-12-31 20:00:00
2018-01-01 03:00:00      1.08      0            0   True      False  2017-12-31 21:00:00
2018-01-01 04:00:00      1.70      0            0   True      False  2017-12-31 22:00:00
...                       ...    ...          ...    ...        ...                  ...
2018-01-02 19:00:00     16.00    805            1  False       True  2018-01-02 13:00:00
2018-01-02 20:00:00     61.00    566            1  False       True  2018-01-02 14:00:00
2018-01-02 21:00:00     48.00    895            1  False       True  2018-01-02 15:00:00
2018-01-02 22:00:00     68.00    818            1  False       True  2018-01-02 16:00:00
2018-01-02 23:00:00     23.00    883            1  False       True  2018-01-02 17:00:00
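In the shifted output, each shifted_time_col value equals startTime minus the 6-hour shift, so a drawn day would run from 06:00 to 06:00 instead of midnight to midnight. A minimal sketch of that arithmetic, using only the standard library, might look like the following. `shifted_day` is a hypothetical helper, and the assumption that a sample's day is the calendar date of its shifted timestamp is inferred from the example output rather than taken from tasrif's source.

```python
from datetime import datetime, timedelta


def shifted_day(timestamp, shift_hours=6):
    """Illustrative only: subtract the shift and report the resulting day."""
    # Move the timestamp back by the shift, as seen in shifted_time_col
    shifted = timestamp - timedelta(hours=shift_hours)
    # Assumption: the day a sample is drawn on is the date of the shifted time
    return shifted, shifted.date()


# Midnight on 2018-01-01 falls into the day that started at 06:00 on 2017-12-31
shifted, day = shifted_day(datetime(2018, 1, 1, 0, 0))

# 19:00 on 2018-01-02 stays within 2018-01-02 after the shift
late_shifted, late_day = shifted_day(datetime(2018, 1, 2, 19, 0))
```

Both results agree with the first and sixth data rows of the shifted example output (2017-12-31 18:00:00 and 2018-01-02 13:00:00).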