encode_cyclical_features_operator
Operator to encode datetime features as cyclical sine and cosine components
class tasrif.processing_pipeline.custom.encode_cyclical_features_operator.EncodeCyclicalFeaturesOperator(date_feature_name='date', category_definition='hour')
This operator converts a pandas datetime series into a format suitable for machine learning. It extracts components such as year, month, day, hour, and minute from the datetime object and, for each requested category, adds a sine column and a cosine column. The result is returned as a dataframe, as shown in the example below.
>>> import numpy as np
>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import EncodeCyclicalFeaturesOperator
>>>
>>> dates = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> df = pd.DataFrame()
>>> df["Date"] = dates
>>> df['Steps'] = np.random.randint(1000, 25000, size=len(df))
>>> df['Calories'] = np.random.randint(1800, 3000, size=len(df))
>>>
>>> df3 = df.copy()
>>> operator = EncodeCyclicalFeaturesOperator(date_feature_name="Date",
...                                           category_definition=["day", "day_in_month"])
>>> df3 = operator.process(df3)[0]
>>> df3
                 Date  Steps  Calories   day_sin   day_cos  day_in_month_sin  day_in_month_cos
2016-12-31 2016-12-31   3906      1910 -0.974928 -0.222521     -2.449294e-16          1.000000
2017-01-01 2017-01-01   7079      2909 -0.781831  0.623490      2.012985e-01          0.979530
2017-01-02 2017-01-02  19877      2503  0.000000  1.000000      3.943559e-01          0.918958
2017-01-03 2017-01-03  12873      2298  0.781831  0.623490      5.712682e-01          0.820763
2017-01-04 2017-01-04  19647      2438  0.974928 -0.222521      7.247928e-01          0.688967
2017-01-05 2017-01-05  17891      2704  0.433884 -0.900969      8.486443e-01          0.528964
2017-01-06 2017-01-06  16573      2825 -0.433884 -0.900969      9.377521e-01          0.347305
2017-01-07 2017-01-07  16222      2752 -0.974928 -0.222521      9.884683e-01          0.151428
2017-01-08 2017-01-08   9702      2772 -0.781831  0.623490      9.987165e-01         -0.050649
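The sine and cosine columns in the example output are consistent with the usual cyclical encoding, sin(2π·value/period) and cos(2π·value/period). The sketch below reproduces them with plain NumPy and pandas. It is inferred from the printed values only and is not Tasrif's implementation: the helper name `encode_cyclical` is hypothetical, and the choice of periods (7 for the day of the week with Monday as 0, the month length for the day in the month) is an assumption that happens to match the table.

```python
import numpy as np
import pandas as pd


def encode_cyclical(values, period):
    """Map periodic values onto the unit circle and return (sin, cos).

    Hypothetical helper for illustration; not part of Tasrif.
    """
    values = np.asarray(values, dtype=float)
    period = np.asarray(period, dtype=float)
    angle = 2.0 * np.pi * values / period
    return np.sin(angle), np.cos(angle)


dates = pd.date_range("2016-12-31", "2017-01-08", freq="D")
encoded = pd.DataFrame(index=dates)

# Day of the week: Monday=0 ... Sunday=6, assumed period of 7.
day_sin, day_cos = encode_cyclical(dates.dayofweek, 7)
encoded["day_sin"] = day_sin
encoded["day_cos"] = day_cos

# Day in the month: 1..N, assumed period N = number of days in that month.
dim_sin, dim_cos = encode_cyclical(dates.day, dates.days_in_month)
encoded["day_in_month_sin"] = dim_sin
encoded["day_in_month_cos"] = dim_cos

print(encoded.round(6))
```

Encoding each value as a (sin, cos) pair keeps neighbouring points of the cycle close together: Sunday and Monday, or the last and first day of a month, end up adjacent on the unit circle instead of at opposite ends of an integer range.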
__init__(date_feature_name='date', category_definition='hour')
Creates a new instance of EncodeCyclicalFeaturesOperator.
Parameters
    date_feature_name – str
        Name of the feature that identifies the related timestamp series.
    category_definition – str, array of str, or array of dict
        Value is one of "day" or "month", to categorize based on day of the week, month of the year, or Hijri month. The default is "hour".
        Use an array of these values if multiple categorizations are desired:
            [ "day", "month" ]
        Use an array of dictionaries if customized column names are desired:
            [ {"day": "day_of_week"}, {"month": "calendar_month"} ]
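The three accepted shapes of category_definition (a single string, an array of strings, an array of single-entry dictionaries) can be reduced to one uniform list of (category, column name) pairs. The following is a minimal sketch of that idea only. The function `normalize_category_definition` is hypothetical and not part of Tasrif, and how the library actually resolves these values internally is not stated in this documentation.

```python
def normalize_category_definition(category_definition):
    """Return a list of (category, column_name) pairs.

    Hypothetical helper for illustration; not part of Tasrif.
    A plain string category is assumed to use its own name as the column name.
    """
    # A single string is treated as a one-element array.
    if isinstance(category_definition, str):
        category_definition = [category_definition]

    pairs = []
    for item in category_definition:
        if isinstance(item, dict):
            # {"day": "day_of_week"} -> ("day", "day_of_week")
            pairs.extend(item.items())
        else:
            pairs.append((item, item))
    return pairs


print(normalize_category_definition("hour"))
print(normalize_category_definition(["day", "month"]))
print(normalize_category_definition([{"day": "day_of_week"}, {"month": "calendar_month"}]))
```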