set_features_value_operator

Operator to select column feature_names with the option to set values for the selected data frame

class tasrif.processing_pipeline.custom.set_features_value_operator.SetFeaturesValueOperator(feature_names: list = None, selector: callable = None, value=None)

Selects a datafram using the lambda function self.selector, then optionally sets the values of the selected dataframe with self.values. if self.values is set, then the original dataframes are returned with the selected part is set to self.values

Examples

>>> import pandas as pd
>>> import numpy as np
>>>
>>> from tasrif.processing_pipeline.custom import SetFeaturesValueOperator
>>>
>>> df0 = pd.DataFrame([['Tom', 10], ['Alfred', 15], ['Alfred', 18], ['Juli', 14]], columns=['name', 'score'])
>>> df1 = pd.DataFrame({"name": ['Alfred', 'juli', 'Tom', 'Ali'],
...                    "score": [np.nan, 155, 159, 165],
...                    "born": [pd.NaT, pd.Timestamp("2010-04-25"), pd.NaT,
...                             pd.NaT]})
>>>
>>> print(df0)
>>> print(df1)
>>>
>>> print()
>>> print('=================================================')
>>> print('select rows where score >= 13')
>>> operator = SetFeaturesValueOperator(selector=lambda df: df.score >= 13)
>>> print(operator.process(df0, df1))
>>>
>>> print()
>>> print('=================================================')
>>> print('select rows where score >= 13 and set their scores to 15')
>>> operator = SetFeaturesValueOperator(selector=lambda df: df.score >= 13,
...                                     feature_names=['score'],
...                                     value=15)
>>> df0, df1 = operator.process(df0, df1)
>>> print(df0)
>>> print(df1)
     name  score
0     Tom     10
1  Alfred     15
2  Alfred     18
3    Juli     14
     name  score       born
0  Alfred    NaN        NaT
1    juli  155.0 2010-04-25
2     Tom  159.0        NaT
3     Ali  165.0        NaT
=================================================
select rows where age >= 13
[     name  score
1  Alfred     15
2  Alfred     18
3    Juli     14,    name  score       born
1  juli  155.0 2010-04-25
2   Tom  159.0        NaT
3   Ali  165.0        NaT]
=================================================
select rows where score >= 13 and set their scores to 15
     name  score
0     Tom     10
1  Alfred     15
2  Alfred     15
3    Juli     15
     name  score       born
0  Alfred    NaN        NaT
1    juli   15.0 2010-04-25
2     Tom   15.0        NaT
3     Ali   15.0        NaT
__init__(feature_names: list = None, selector: callable = None, value=None)

Creates a new instance of CreateFeatureOperator

Parameters
  • feature_names (list) – list of feature_names to select

  • selector (callable) – lambda function that result in pandas row indexing dataframe (a dataframe of trues and falses), see example.

  • value (int, optional) – value to replace the selected rows.