set_index_operator

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

class tasrif.processing_pipeline.pandas.set_index_operator.SetIndexOperator(keys, **kwargs)

Examples

>>> import pandas as pd
>>> from tasrif.processing_pipeline.pandas import SetIndexOperator
>>> df = pd.DataFrame([
...     [1, "2020-05-01 00:00:00", 1],
...     [1, "2020-05-01 01:00:00", 1],
...     [1, "2020-05-01 03:00:00", 2],
...     [2, "2020-05-02 00:00:00", 1],
...     [2, "2020-05-02 01:00:00", 1]],
...     columns=['logId', 'timestamp', 'sleep_level'])
>>> df
   logId            timestamp  sleep_level
0      1  2020-05-01 00:00:00            1
1      1  2020-05-01 01:00:00            1
2      1  2020-05-01 03:00:00            2
3      2  2020-05-02 00:00:00            1
4      2  2020-05-02 01:00:00            1
>>> op = SetIndexOperator('timestamp')
>>> op.process(df)
[                     logId  sleep_level
timestamp
2020-05-01 00:00:00      1            1
2020-05-01 01:00:00      1            1
2020-05-01 03:00:00      1            2
2020-05-02 00:00:00      2            1
2020-05-02 01:00:00      2            1]
__init__(keys, **kwargs)

Initializes the operator.

Parameters
  • keys (str or list) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.

  • **kwargs – keyword arguments passed to the pandas DataFrame.set_index method
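
The effect of a list of keys and of extra keyword arguments can be sketched with plain pandas. This is a minimal illustration, assuming the operator forwards keys and any keyword arguments to pandas DataFrame.set_index; it calls set_index directly rather than going through the operator, and the variable names (multi, appended, kept) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, "2020-05-01 00:00:00", 1],
        [1, "2020-05-01 01:00:00", 1],
        [2, "2020-05-02 00:00:00", 1],
    ],
    columns=["logId", "timestamp", "sleep_level"],
)

# A list of column keys builds a MultiIndex with one level per key.
multi = df.set_index(["logId", "timestamp"])

# append=True keeps the existing index and adds the new level to it
# ("expand on" the index instead of replacing it).
appended = df.set_index("timestamp", append=True)

# drop=False keeps the column in the frame as well as in the index.
kept = df.set_index("timestamp", drop=False)
```

Under the same assumption, the equivalent operator calls would look like SetIndexOperator(['logId', 'timestamp']) or SetIndexOperator('timestamp', append=True), with process returning the result wrapped in a list as in the example above.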