categorize_time_operator¶
Operator to aggregate column features based on a column
-
class
tasrif.processing_pipeline.custom.categorize_time_operator.
CategorizeTimeOperator
(date_feature_name='date', category_definition='day')¶ Given a 2D dataframe representing a timeseries where each row represents a time event, this operator will add a new feature(s) that represent a categorization of the date. The categorization specification is provided in the constructor.
Examples
>>> import numpy as np >>> import pandas as pd >>> >>> from tasrif.processing_pipeline.custom import CategorizeTimeOperator >>> >>> >>> dates = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series() >>> df = pd.DataFrame() >>> df["Date"] = dates >>> df['Steps'] = np.random.randint(1000,25000, size=len(df)) >>> df['Calories'] = np.random.randint(1800,3000, size=len(df)) >>> >>> df Date Steps Calories 2016-12-31 2016-12-31 5145 2486 2017-01-01 2017-01-01 5018 2344 2017-01-02 2017-01-02 11010 2426 2017-01-03 2017-01-03 9304 2903 2017-01-04 2017-01-04 13490 2283 2017-01-05 2017-01-05 14511 1976 2017-01-06 2017-01-06 18697 2213 2017-01-07 2017-01-07 19204 2185 2017-01-08 2017-01-08 4470 2333
>>> df5 = df.copy() >>> operator = CategorizeTimeOperator(date_feature_name="Date", >>> category_definition=[ >>> {"day": "weekday", "values": [1, 1, 1, 1, 0, 0, 1]}, >>> {"month": "in_may", "values": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]}]) >>> operator.process(df5)[0] Date Steps Calories weekday in_may 2016-12-31 2016-12-31 5145 2486 0 0 2017-01-01 2017-01-01 5018 2344 1 0 2017-01-02 2017-01-02 11010 2426 1 0 2017-01-03 2017-01-03 9304 2903 1 0 2017-01-04 2017-01-04 13490 2283 1 0 2017-01-05 2017-01-05 14511 1976 1 0 2017-01-06 2017-01-06 18697 2213 0 0 2017-01-07 2017-01-07 19204 2185 0 0 2017-01-08 2017-01-08 4470 2333 1 0
-
__init__
(date_feature_name='date', category_definition='day')¶ Creates a new instance of CategorizeTimeOperator
- Parameters
date_feature_name (str) – Name of the feature to identify related timestamp series
category_definition (str, list, dict) –
Value is one of “day”, “month” or “hijri_month” to categorize based on day of the week, month of the year or hijri month Array of these values if multiple categorizations are desired.:
[ "days", "month" ]
Array of dictionary customized column names are desired:
[ {"days": "day_of_week"}, {"month", "calendar_month} ]
Array of dictionary with mapping if the default categories are to mapped to customized categories.py For example to categorize based on weekday:
[ { "days": "weekday", "values": [1, 1, 1, 1, 0, 0, 1]} { "month": "winter", "values": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1] } ]
-