encode_cyclical_features_operator

Operator to encode cyclical datetime features (such as day of the week) as sine and cosine columns

class tasrif.processing_pipeline.custom.encode_cyclical_features_operator.EncodeCyclicalFeaturesOperator(date_feature_name='date', category_definition='hour')

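This operator converts a pandas datetime series into a format suitable for machine learning. It extracts components such as year, month, day, hour, and minute from the datetime values and, as the example below shows, encodes each requested component as a pair of sine and cosine columns. The result is returned as a dataframe.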

>>> import numpy as np
>>> import pandas as pd
>>> from tasrif.processing_pipeline.custom import EncodeCyclicalFeaturesOperator
>>>
>>> dates = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> df = pd.DataFrame()
>>> df["Date"] = dates
>>> # Steps and Calories are random, so those values differ between runs
>>> df['Steps'] = np.random.randint(1000, 25000, size=len(df))
>>> df['Calories'] = np.random.randint(1800, 3000, size=len(df))
>>>
>>> df3 = df.copy()
>>> operator = EncodeCyclicalFeaturesOperator(date_feature_name="Date",
...                                           category_definition=["day", "day_in_month"])
>>> df3 = operator.process(df3)[0]
>>> df3
                  Date  Steps  Calories    day_sin    day_cos  day_in_month_sin  day_in_month_cos
2016-12-31  2016-12-31   3906      1910  -0.974928  -0.222521     -2.449294e-16          1.000000
2017-01-01  2017-01-01   7079      2909  -0.781831   0.623490      2.012985e-01          0.979530
2017-01-02  2017-01-02  19877      2503   0.000000   1.000000      3.943559e-01          0.918958
2017-01-03  2017-01-03  12873      2298   0.781831   0.623490      5.712682e-01          0.820763
2017-01-04  2017-01-04  19647      2438   0.974928  -0.222521      7.247928e-01          0.688967
2017-01-05  2017-01-05  17891      2704   0.433884  -0.900969      8.486443e-01          0.528964
2017-01-06  2017-01-06  16573      2825  -0.433884  -0.900969      9.377521e-01          0.347305
2017-01-07  2017-01-07  16222      2752  -0.974928  -0.222521      9.884683e-01          0.151428
2017-01-08  2017-01-08   9702      2772  -0.781831   0.623490      9.987165e-01         -0.050649
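
The sine and cosine columns above are consistent with the common encoding sin(2π·v/P) and cos(2π·v/P), where v is the extracted component and P is the length of its cycle. The following is a minimal sketch of that idea in plain pandas and NumPy, not the operator's actual implementation. The helper name `encode_cyclical` and the periods (7 for day of the week with Monday as 0, 31 for day of the month) are assumptions inferred from the values in the example.

```python
import numpy as np
import pandas as pd


def encode_cyclical(values, period):
    """Map values on a cycle of length `period` to (sin, cos) arrays."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


dates = pd.date_range("2016-12-31", "2017-01-08", freq="D")
sketch = pd.DataFrame({"Date": dates})

# Assumed periods: 7 for day of week (Monday = 0), 31 for day of month.
sketch["day_sin"], sketch["day_cos"] = encode_cyclical(
    sketch["Date"].dt.dayofweek, 7
)
sketch["day_in_month_sin"], sketch["day_in_month_cos"] = encode_cyclical(
    sketch["Date"].dt.day, 31
)
```

Encoding each component as a pair keeps the two ends of a cycle close together: Sunday (6) and Monday (0) are neighbours on the unit circle even though their integer values are far apart.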
__init__(date_feature_name='date', category_definition='hour')

Creates a new instance of EncodeCyclicalFeaturesOperator

Parameters
  • date_feature_name – str. Name of the feature (column) that holds the related timestamp series

  • category_definition

    str, or array of str or dict. A value such as “day” or “month” categorizes based on day of the week, month of the year, or hijri month; the default is “hour”, and the example above also uses “day_in_month”. Pass an array of these values if multiple categorizations are desired:

    [
        "day", "month"
    ]

    Pass an array of dictionaries if customized column names are desired:

    [
        {"day": "day_of_week"},
        {"month": "calendar_month"}
    ]
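
To make the accepted forms of category_definition concrete, the sketch below shows one way a single string, an array of strings, or an array of dictionaries could be normalized into one mapping from category to output column name. The function `normalize_category_definition` is hypothetical and written only for illustration; it is not part of tasrif, and the operator may handle these forms differently.

```python
def normalize_category_definition(category_definition):
    """Return a {category: column_name} dict for any of the accepted forms.

    Hypothetical helper for illustration only, not tasrif's implementation.
    """
    # A single string names one category and doubles as the column name.
    if isinstance(category_definition, str):
        return {category_definition: category_definition}

    mapping = {}
    for item in category_definition:
        if isinstance(item, dict):
            # {"day": "day_of_week"} maps a category to a custom column name.
            mapping.update(item)
        else:
            # A plain string keeps the category name as the column name.
            mapping[item] = item
    return mapping


print(normalize_category_definition("hour"))
print(normalize_category_definition(["day", "month"]))
print(normalize_category_definition([{"day": "day_of_week"},
                                     {"month": "calendar_month"}]))
```

Under this assumption, a custom name such as "day_of_week" would replace "day" as the prefix of the generated columns.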