drop_index_duplicates_operator

Remove rows with duplicate index values from one or more dataframes.

class tasrif.processing_pipeline.custom.drop_index_duplicates_operator.DropIndexDuplicatesOperator(keep='first')

Remove duplicate indices from one or more dataframes.

Examples

>>> import pandas as pd
>>> import numpy as np
>>>
>>> from tasrif.processing_pipeline.custom import DropIndexDuplicatesOperator
>>>
>>> idx = pd.Index(['1', '2', '2', '3'])
>>> df = pd.DataFrame([['tom', 10], ['Alfred', 15], ['Alfred', 18],
... ['juli', 14]], columns=['name', 'age'], index=idx)
>>>
>>> operator = DropIndexDuplicatesOperator(keep='first')
>>> df = operator.process(df)[0]
>>>
>>> print(df)
     name  age
1     tom   10
2  Alfred   15
3    juli   14

__init__(keep='first')

Initializes the operator.

Parameters

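keep ('first', 'last', False) – Which row to keep within each set of rows that share an index value: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops every row in the set. Defaults to 'first'.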
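
The effect of each keep value can be illustrated with plain pandas. This is a minimal sketch, not the operator's own code: it assumes the operator follows the semantics of pandas' Index.duplicated, and the helper drop_index_duplicates is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# Same data as in the example above: index value '2' appears twice.
idx = pd.Index(['1', '2', '2', '3'])
df = pd.DataFrame([['tom', 10], ['Alfred', 15], ['Alfred', 18],
                   ['juli', 14]], columns=['name', 'age'], index=idx)


def drop_index_duplicates(frame, keep='first'):
    """Hypothetical helper: drop rows whose index value is a duplicate.

    Index.duplicated(keep=...) returns True for the rows to discard,
    so the mask is inverted to select the rows that remain.
    """
    return frame[~frame.index.duplicated(keep=keep)]


first = drop_index_duplicates(df, keep='first')    # keeps ('Alfred', 15)
last = drop_index_duplicates(df, keep='last')      # keeps ('Alfred', 18)
neither = drop_index_duplicates(df, keep=False)    # drops both '2' rows

print(first)
print(last)
print(neither)
```

With keep=False only the rows indexed '1' and '3' remain, since every row in a duplicated set is discarded.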