read_csv_folder_operator
Operator to read multiple CSV files in a folder.
class tasrif.processing_pipeline.custom.read_csv_folder_operator.ReadCsvFolderOperator(pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None, name_pattern='*.csv', filename_column_name='filename', concatenate=True, **read_csv_kwargs)
Operator that returns a generator: one CSV file per call.
Example
>>> import pandas as pd
>>> import numpy as np
>>>
>>> from tasrif.processing_pipeline.custom import ReadCsvFolderOperator
>>> from tasrif.processing_pipeline.pandas import ConcatOperator
>>> from tasrif.processing_pipeline import SequenceOperator
>>>
>>>
>>> details1 = pd.DataFrame({'calories': [360, 540],
...                          'time': [pd.Timestamp("2015-04-25"), pd.Timestamp("2015-04-26")]
...                         })
>>>
>>> details2 = pd.DataFrame({'calories': [420, 250],
...                          'time': [pd.Timestamp("2015-05-16"), pd.Timestamp("2015-05-17")]
...                         })
>>>
>>>
>>> # Save File 1 and File 2
>>> details1.to_csv('./details1.csv', index=False)
>>> details2.to_csv('./details2.csv', index=False)
>>>
>>> pipeline = SequenceOperator([
...     ReadCsvFolderOperator(name_pattern='./*.csv', pipeline=None),
...     ConcatOperator()
... ])
>>>
>>> df = pipeline.process()[0]
>>> df
__init__(pipeline: tasrif.processing_pipeline.sequence_operator.SequenceOperator = None, name_pattern='*.csv', filename_column_name='filename', concatenate=True, **read_csv_kwargs)
Creates a new instance of ReadCsvFolderOperator.
Parameters
pipeline (SequenceOperator) – pipeline to apply to the dataframe read from each CSV file before yielding it
name_pattern (str) – wildcard (glob-style) pattern, such as the default '*.csv', matching the CSV files that the user wishes to read
filename_column_name (str) – name of the column added to each file's dataframe to hold the filename it was read from
concatenate (bool) – whether to concatenate the files to a single dataframe or not
**read_csv_kwargs – keyword arguments passed to the pandas read_csv function
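To make the parameters above concrete, here is a minimal sketch of the same idea in plain pandas. It is not tasrif's implementation: the helper names iter_csv_files and read_csv_files are hypothetical, and it assumes that name_pattern is a glob pattern, that the filename column holds the base filename, and that concatenate=False yields one dataframe per file.

```python
import glob
import os
import tempfile

import pandas as pd


def iter_csv_files(name_pattern="*.csv", filename_column_name="filename",
                   **read_csv_kwargs):
    """Yield one dataframe per matching file (hypothetical helper)."""
    # Assumption: name_pattern is a glob pattern; sort for a stable order.
    for path in sorted(glob.glob(name_pattern)):
        df = pd.read_csv(path, **read_csv_kwargs)
        # Assumption: record the base filename each row came from.
        df[filename_column_name] = os.path.basename(path)
        yield df


def read_csv_files(name_pattern="*.csv", filename_column_name="filename",
                   concatenate=True, **read_csv_kwargs):
    """Return one combined dataframe, or a generator of per-file dataframes."""
    frames = iter_csv_files(name_pattern, filename_column_name,
                            **read_csv_kwargs)
    if concatenate:
        return pd.concat(list(frames), ignore_index=True)
    return frames


# Usage: write two small CSV files to a temporary folder and read them back.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"calories": [360, 540],
                  "time": ["2015-04-25", "2015-04-26"]}).to_csv(
        os.path.join(folder, "details1.csv"), index=False)
    pd.DataFrame({"calories": [420, 250],
                  "time": ["2015-05-16", "2015-05-17"]}).to_csv(
        os.path.join(folder, "details2.csv"), index=False)

    pattern = os.path.join(folder, "*.csv")
    # parse_dates is forwarded to pandas.read_csv through **read_csv_kwargs.
    combined = read_csv_files(pattern, parse_dates=["time"])
    # With concatenate=False, the caller receives one dataframe per file.
    per_file = list(read_csv_files(pattern, concatenate=False))
```

Keeping the per-file generator separate from the concatenation step mirrors the concatenate flag: callers that need to process files one at a time can avoid holding every file in memory at once.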