categorize_duration_operator¶
Operator to extract features from a duration column
-
class
tasrif.processing_pipeline.custom.categorize_duration_operator.
CategorizeDurationOperator
(duration_feature_name='duration', category_definition='minute')¶ Given a 2D dataframe representing a timeseries where each row represents a time duration, this operator will add a new feature(s) that represent a categorization of the duration. The categorization specification is provided in the constructor.
Examples
>>> from datetime import timedelta >>> import numpy as np >>> import pandas as pd >>> import seaborn as sns >>> from tasrif.processing_pipeline.custom import CategorizeDurationOperator >>> >>> >>> dates = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series() >>> df = pd.DataFrame() >>> df["Date"] = dates >>> df['Last_Date'] = df['Date'].apply(lambda x: x + timedelta(days=np.random.randint(3), >>> hours=np.random.randint(24), >>> minutes=np.random.randint(60))) >>> df['Duration'] = df['Last_Date'] - df['Date'] >>> df['Steps'] = np.random.randint(1000,25000, size=len(df)) >>> df['Calories'] = np.random.randint(1800,3000, size=len(df)) >>> >>> # %% >>> df # pylint: disable=pointless-statement >>> >>> Date Last_Date Duration Steps Calories 2016-12-31 2016-12-31 2016-12-31 05:01:00 0 days 05:01:00 10858 1852 2017-01-01 2017-01-01 2017-01-01 23:51:00 0 days 23:51:00 19802 2126 2017-01-02 2017-01-02 2017-01-03 03:32:00 1 days 03:32:00 1924 2201 2017-01-03 2017-01-03 2017-01-04 01:31:00 1 days 01:31:00 3393 1935 2017-01-04 2017-01-04 2017-01-04 03:44:00 0 days 03:44:00 8177 2833 2017-01-05 2017-01-05 2017-01-06 14:24:00 1 days 14:24:00 21838 2893 2017-01-06 2017-01-06 2017-01-08 00:53:00 2 days 00:53:00 5671 2095 2017-01-07 2017-01-07 2017-01-09 21:26:00 2 days 21:26:00 6792 2350 2017-01-08 2017-01-08 2017-01-09 05:21:00 1 days 05:21:00 24555 2425
>>> df1 = df.copy() >>> operator = CategorizeDurationOperator(duration_feature_name="Duration", category_definition="day") >>> df1 = operator.process(df1)[0] >>> df1 # pylint: disable=pointless-statement >>> Date Last_Date Duration Steps Calories day_delta 2016-12-31 2016-12-31 2016-12-31 05:01:00 0 days 05:01:00 10858 1852 0 2017-01-01 2017-01-01 2017-01-01 23:51:00 0 days 23:51:00 19802 2126 0 2017-01-02 2017-01-02 2017-01-03 03:32:00 1 days 03:32:00 1924 2201 1 2017-01-03 2017-01-03 2017-01-04 01:31:00 1 days 01:31:00 3393 1935 1 2017-01-04 2017-01-04 2017-01-04 03:44:00 0 days 03:44:00 8177 2833 0 2017-01-05 2017-01-05 2017-01-06 14:24:00 1 days 14:24:00 21838 2893 1 2017-01-06 2017-01-06 2017-01-08 00:53:00 2 days 00:53:00 5671 2095 2 2017-01-07 2017-01-07 2017-01-09 21:26:00 2 days 21:26:00 6792 2350 2 2017-01-08 2017-01-08 2017-01-09 05:21:00 1 days 05:21:00 24555 2425 1
-
__init__
(duration_feature_name='duration', category_definition='minute')¶ Creates a new instance of CategorizeDurationOperator
- Parameters
duration_feature_name (str) – Name of the feature to identify related time delta series
category_definition (str, list[str]) –
str or array of str Value is one of “day”, “hour” or “minutes” to categorize based on number of days, hours of the minutes:
[ "day" ]
Array of dictionary customized column names are desired:
[ {"day": "day_of_week"}, ]
-