drop_index_duplicates_operator
Remove rows with duplicate index values from one or more dataframes.
class tasrif.processing_pipeline.custom.drop_index_duplicates_operator.DropIndexDuplicatesOperator(keep='first')
Remove duplicate indices from one or more dataframes.
Examples
>>> import pandas as pd
>>>
>>> from tasrif.processing_pipeline.custom.drop_index_duplicates_operator import DropIndexDuplicatesOperator
>>>
>>> idx = pd.Index(['1', '2', '2', '3'])
>>> df = pd.DataFrame([['tom', 10], ['Alfred', 15], ['Alfred', 18],
...                    ['juli', 14]], columns=['name', 'age'], index=idx)
>>>
>>> operator = DropIndexDuplicatesOperator(keep='first')
>>> df = operator.process(df)[0]
>>>
>>> print(df)
     name  age
1     tom   10
2  Alfred   15
3    juli   14
__init__(keep='first')
Initializes the operator.
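Parameters
keep ('first', 'last', False) – Which row to keep within each set of rows that share an index value. Following the pandas convention for keep, 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops every row in the set.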
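The effect of each keep value can be illustrated with plain pandas. This is a minimal sketch, assuming the operator follows the semantics of pandas Index.duplicated; the helper drop_index_duplicates is hypothetical and is not part of Tasrif.

```python
import pandas as pd


def drop_index_duplicates(df, keep="first"):
    """Hypothetical helper (not part of Tasrif): drop rows whose index label repeats."""
    # Index.duplicated flags the repeated labels according to `keep`;
    # inverting the mask selects the rows that survive.
    return df[~df.index.duplicated(keep=keep)]


idx = pd.Index(["1", "2", "2", "3"])
df = pd.DataFrame(
    [["tom", 10], ["Alfred", 15], ["Alfred", 18], ["juli", 14]],
    columns=["name", "age"],
    index=idx,
)

first = drop_index_duplicates(df, keep="first")  # keeps the row (Alfred, 15)
last = drop_index_duplicates(df, keep="last")    # keeps the row (Alfred, 18)
none = drop_index_duplicates(df, keep=False)     # drops both rows labelled '2'

print(first["age"].tolist())  # [10, 15, 14]
print(last["age"].tolist())   # [10, 18, 14]
print(none["age"].tolist())   # [10, 14]
```

Note that duplicates are judged on the index labels only: the two rows labelled '2' count as duplicates even though their age values differ.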