read_csv_folder_operator

Operator to read multiple CSV files from a folder.

class tasrif.processing_pipeline.custom.read_csv_folder_operator.ReadCsvFolderOperator(pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None, name_pattern='*.csv', filename_column_name='filename', concatenate=True, **read_csv_kwargs)

Operator that returns a generator yielding the contents of one CSV file per call.
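
The "one file per call" behaviour can be pictured with a plain generator. The following is only a minimal sketch using glob and pandas directly; iter_csv_frames is a hypothetical helper named for illustration and is not part of tasrif, and the operator's actual implementation may differ.

```python
import glob
import tempfile
from pathlib import Path

import pandas as pd


def iter_csv_frames(name_pattern):
    """Hypothetical helper: lazily yield one DataFrame per matching CSV file."""
    # sorted() makes the order deterministic; glob gives no ordering guarantee.
    for path in sorted(glob.glob(name_pattern)):
        yield pd.read_csv(path)


# Usage: write two small files into a temporary folder, then read them one at a time.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"calories": [360, 540]}).to_csv(Path(folder) / "details1.csv", index=False)
    pd.DataFrame({"calories": [420, 250]}).to_csv(Path(folder) / "details2.csv", index=False)

    frames = iter_csv_frames(str(Path(folder) / "*.csv"))
    first = next(frames)   # only details1.csv has been read at this point
    second = next(frames)  # details2.csv is read on the second call
    total_calories = int(first["calories"].sum() + second["calories"].sum())
```

Because the files are read lazily, a consumer that stops early never pays for the files it did not request.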

Example

>>> import pandas as pd
>>>
>>> from tasrif.processing_pipeline.custom import ReadCsvFolderOperator
>>> from tasrif.processing_pipeline.pandas import ConcatOperator
>>> from tasrif.processing_pipeline import SequenceOperator
>>>
>>>
>>> details1 = pd.DataFrame({'calories': [360, 540],
...                          'time': [pd.Timestamp("2015-04-25"), pd.Timestamp("2015-04-26")]
...                         })
>>>
>>> details2 = pd.DataFrame({'calories': [420, 250],
...                          'time': [pd.Timestamp("2015-05-16"), pd.Timestamp("2015-05-17")]
...                         })
>>>
>>>
>>> # Write both dataframes to CSV files in the current directory
>>> details1.to_csv('./details1.csv', index=False)
>>> details2.to_csv('./details2.csv', index=False)
>>>
>>> pipeline = SequenceOperator([
...     ReadCsvFolderOperator(name_pattern='./*.csv', pipeline=None),
...     ConcatOperator()
... ])
>>>
>>> # Run the pipeline and take the first element of its output
>>> df = pipeline.process()[0]
>>> df

__init__(pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None, name_pattern='*.csv', filename_column_name='filename', concatenate=True, **read_csv_kwargs)

Creates a new instance of ReadCsvFolderOperator.

Parameters
  • pipeline (SequenceOperator) – pipeline applied to the dataframe read from each CSV file before it is yielded

  • name_pattern (str) – glob-style pattern (for example '*.csv') matching the CSV files to read

  • filename_column_name (str) – name of the column added to each dataframe to hold the name of the file it was read from

  • concatenate (bool) – whether to concatenate the files into a single dataframe

  • **read_csv_kwargs – keyword arguments passed through to pandas.read_csv
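
How these parameters might interact can be sketched in plain pandas. The function below, read_csv_folder, is a hypothetical stand-in written for illustration; it is not the tasrif implementation. It assumes that the filename column stores the base name of each file and that concatenate=False returns the per-file dataframes separately, neither of which is confirmed by the documentation above.

```python
import glob
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_folder(name_pattern="*.csv", filename_column_name="filename",
                    concatenate=True, **read_csv_kwargs):
    """Hypothetical stand-in that mirrors the documented parameters."""
    frames = []
    for path in sorted(glob.glob(name_pattern)):
        frame = pd.read_csv(path, **read_csv_kwargs)   # extra kwargs go to pandas
        frame[filename_column_name] = Path(path).name  # assumption: store the base name
        frames.append(frame)
    if concatenate:
        # Assumption: at least one file matched; pd.concat raises on an empty list.
        return pd.concat(frames, ignore_index=True)
    return frames


with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"calories": [360, 540],
                  "time": ["2015-04-25", "2015-04-26"]}).to_csv(
        Path(folder) / "details1.csv", index=False)
    pd.DataFrame({"calories": [420, 250],
                  "time": ["2015-05-16", "2015-05-17"]}).to_csv(
        Path(folder) / "details2.csv", index=False)

    pattern = str(Path(folder) / "*.csv")
    # parse_dates is forwarded to pandas.read_csv through **read_csv_kwargs.
    combined = read_csv_folder(pattern, parse_dates=["time"])
    # With concatenate=False the sketch keeps one dataframe per file.
    separate = read_csv_folder(pattern, concatenate=False)
```

In this sketch, combined holds all four rows with a filename column identifying the source of each row, while separate keeps the two files apart.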