read_nested_csv_operator

Operator to read the CSV files named in a column of a dataframe

class tasrif.processing_pipeline.custom.read_nested_csv_operator.ReadNestedCsvOperator(folder_path, field, pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None)

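Operator that returns a generator. Each iteration yields a tuple of one record (a row of the input dataframe) and the contents of the CSV file named in that record's field column.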

Example

>>> import pandas as pd
>>> import numpy as np
>>>
>>> from tasrif.processing_pipeline.custom import ReadNestedCsvOperator
>>>
>>> df = pd.DataFrame({"name": ['Alfred', 'Roy'],
...                    "age": [43, 32],
...                    "file_details": ['details1', 'details2']})
>>>
>>> details1 = pd.DataFrame({'calories': [360, 540],
...                          'time': [pd.Timestamp("2015-04-25"), pd.Timestamp("2015-04-26")]
...                         })
>>>
>>> details2 = pd.DataFrame({'calories': [420, 250],
...                          'time': [pd.Timestamp("2015-05-16"), pd.Timestamp("2015-05-17")]
...                         })
>>>
>>>
>>> # Save File 1 and File 2
>>> details1.to_csv('details1.csv', index=False)
>>> details2.to_csv('details2.csv', index=False)
>>>
>>> operator = ReadNestedCsvOperator(folder_path='./', field='file_details', pipeline=None)
>>> generator = operator.process(df)
>>>
>>> # Iterates twice
>>> for record, details in generator:
...     print('Subject information:')
...     print(record)
...     print('')
...     print('Subject details:')
...     print(details)
...     print('============================')
Subject information:
name              Alfred
age                   43
file_details    details1
Name: 0, dtype: object
...
Subject details:
   calories        time
0       360  2015-04-25
1       540  2015-04-26
============================
Subject information:
name                 Roy
age                   32
file_details    details2
Name: 1, dtype: object
...
Subject details:
   calories        time
0       420  2015-05-16
1       250  2015-05-17
============================
__init__(folder_path, field, pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None)

Creates a new instance of ReadNestedCsvOperator.

Parameters
  • folder_path (str) – path to the folder that contains the CSV files

  • field (str) – name of the dataframe column that contains the CSV file names

  • pipeline (SequenceOperator) – optional pipeline to apply to the dataframe record before yielding it; defaults to None
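
The way the three parameters interact can be illustrated with a small generator. This is a minimal sketch, not the tasrif implementation: it uses the standard csv module and plain dicts instead of pandas dataframes, and the names iter_nested_csv and transform are hypothetical. The sketch applies transform to the rows read from the file, which is an assumption about where a pipeline would be applied.

```python
import csv
import tempfile
from pathlib import Path


def iter_nested_csv(records, folder_path, field, transform=None):
    """Yield (record, rows) pairs, one pair per input record.

    Hypothetical stand-in for the operator: `records` is an iterable of
    dicts and `rows` is the list of dicts read from the CSV file whose
    name is stored in record[field].
    """
    folder = Path(folder_path)
    for record in records:
        # The file is opened only when the consumer asks for this record.
        with open(folder / record[field], newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Assumption: the optional callable plays the role of `pipeline`.
        if transform is not None:
            rows = transform(rows)
        yield record, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "details1.csv").write_text(
            "calories,time\n360,2015-04-25\n540,2015-04-26\n"
        )
        subjects = [{"name": "Alfred", "age": 43, "file_details": "details1.csv"}]

        generator = iter_nested_csv(subjects, tmp, "file_details")
        record, rows = next(generator)  # one record per call
        print(record["name"], [row["calories"] for row in rows])
        # prints: Alfred ['360', '540']
```

Because the function is a generator, no file is read until the consumer requests the next pair, so a long list of subjects does not load every CSV into memory at once.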