ROOT 6.12/07 Reference Guide
TDataFrame data source class for reading CSV files.
The TCsvDS class implements a CSV file reader for TDataFrame.
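A TDataFrame that reads from a CSV file can be constructed using the factory method ROOT::Experimental::TDF::MakeCsvDataFrame, which accepts three parameters:
fileName | Path of the CSV file.
readHeaders | Whether the first row of the CSV file contains the column headers (optional, default true). If false, header names will be automatically generated.
delimiter | Delimiter character (optional, default ',').
The types of the columns in the CSV file are automatically inferred. The supported types are:
Integer | Stored as Long64_t.
Floating point number | Stored as double.
Boolean | Matches the literals true and false.
String | Stored as std::string.
These are some formatting rules expected by the TCsvDS implementation: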
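The automatic type inference described above can be illustrated with a small standalone sketch. It is not the TCsvDS code: it uses std::regex in place of the TRegexp members listed further down, the patterns are illustrative guesses, and the names ColType, InferFieldType and InferColumnTypes are hypothetical. It also assumes that the types are decided from a single record.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical tags for the four supported column types.
enum class ColType { kLong64, kDouble, kBool, kString };

// Classify one CSV field by trying the most specific patterns first.
// The patterns are assumptions of this sketch, not those used by TCsvDS.
inline ColType InferFieldType(const std::string &field)
{
   static const std::regex intRe(R"(^[-+]?[0-9]+$)");
   static const std::regex doubleRe(R"(^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$)");
   static const std::regex boolRe(R"(^(true|false)$)");

   if (std::regex_match(field, intRe)) return ColType::kLong64;
   if (std::regex_match(field, doubleRe)) return ColType::kDouble;
   if (std::regex_match(field, boolRe)) return ColType::kBool;
   return ColType::kString; // anything else is kept as text
}

// Infer one type per column from the fields of a single record.
inline std::vector<ColType> InferColumnTypes(const std::vector<std::string> &record)
{
   std::vector<ColType> types;
   types.reserve(record.size());
   for (const auto &field : record)
      types.push_back(InferFieldType(field));
   return types;
}
```

Checking the integer pattern before the floating point one matters here, because a plain integer such as 42 would also match the floating point pattern.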
The current implementation of TCsvDS reads the entire CSV file content into memory before TDataFrame starts processing it. Therefore, before creating a CSV TDataFrame, it is important to check both how much memory is available and the size of the CSV file.
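As a rough way of following this advice, the file size can be compared against a memory budget before the data frame is built. The sketch below uses only the C++ standard library; FileSizeInBytes and FitsInBudget are hypothetical helpers, and the budget is left to the caller. The in-memory footprint of the parsed values can differ from the size of the file on disk, so the check is only indicative.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Return the size of a file in bytes, or -1 if it cannot be opened.
inline std::int64_t FileSizeInBytes(const std::string &path)
{
   // Opening at the end makes tellg() report the file size.
   std::ifstream in(path, std::ios::binary | std::ios::ate);
   if (!in) return -1;
   return static_cast<std::int64_t>(in.tellg());
}

// Hypothetical guard: true if the file can be opened and is not larger than the budget.
inline bool FitsInBudget(const std::string &path, std::int64_t budgetBytes)
{
   const std::int64_t size = FileSizeInBytes(path);
   return size >= 0 && size <= budgetBytes;
}
```

A caller could, for instance, refuse to construct the data frame when FitsInBudget returns false for a chosen fraction of the available memory.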
Definition at line 18 of file TCsvDS.hxx.
Public Member Functions
TCsvDS (std::string_view fileName, bool readHeaders=true, char delimiter=',')
Constructor to create a CSV TDataSource for TDataFrame.
~TCsvDS ()
Destructor.
const std::vector< std::string > & | GetColumnNames () const
Returns a reference to the collection of the dataset's column names.
std::vector< std::pair< ULong64_t, ULong64_t > > | GetEntryRanges ()
Return ranges of entries to distribute to tasks.
std::string | GetTypeName (std::string_view colName) const
Type of a column as a string, e.g. GetTypeName("x") == "double".
bool | HasColumn (std::string_view colName) const
Checks if the dataset has a certain column.
void | Initialise ()
Convenience method called before starting an event-loop.
void | SetEntry (unsigned int slot, ULong64_t entry)
Advance the "cursors" returned by GetColumnReaders to the selected entry for a particular slot.
void | SetNSlots (unsigned int nSlots)
Inform TDataSource of the number of processing slots (i.e. worker threads) used by the associated TDataFrame.
Public Member Functions inherited from ROOT::Experimental::TDF::TDataSource
virtual | ~TDataSource ()=default
virtual void | Finalise ()
Convenience method called after concluding an event-loop.
virtual void | FinaliseSlot (unsigned int)
Convenience method called at the end of the data processing associated to a slot.
template<typename T >
std::vector< T ** > | GetColumnReaders (std::string_view columnName)
Called at most once per column by TDF.
virtual void | InitSlot (unsigned int, ULong64_t)
Convenience method called at the start of the data processing associated to a slot.
Private Types
using | Record = std::vector< void * >
Private Member Functions
void | FillHeaders (const std::string &)
void | FillRecord (const std::string &, Record &)
void | GenerateHeaders (size_t)
std::vector< void * > | GetColumnReadersImpl (std::string_view, const std::type_info &)
Type-erased vector of pointers to pointers to column values, one per slot.
void | InferColTypes (std::vector< std::string > &)
void | InferType (const std::string &, unsigned int)
std::vector< std::string > | ParseColumns (const std::string &)
size_t | ParseValue (const std::string &, std::vector< std::string > &, size_t)
Private Attributes
std::vector< std::deque< bool > > | fBoolEvtValues
std::vector< std::vector< void * > > | fColAddresses
std::map< std::string, std::string > | fColTypes
std::list< std::string > | fColTypesList
char | fDelimiter
std::vector< std::vector< double > > | fDoubleEvtValues
std::vector< std::pair< ULong64_t, ULong64_t > > | fEntryRanges
std::string | fFileName
std::vector< std::string > | fHeaders
std::vector< std::vector< Long64_t > > | fLong64EvtValues
unsigned int | fNSlots = 0U
std::vector< Record > | fRecords
std::vector< std::vector< std::string > > | fStringEvtValues
Static Private Attributes
static TRegexp | doubleRegex1
static TRegexp | doubleRegex2
static TRegexp | falseRegex
static TRegexp | intRegex
static TRegexp | trueRegex
#include <ROOT/TCsvDS.hxx>
using ROOT::Experimental::TDF::TCsvDS::Record = std::vector<void *>
private
Definition at line 21 of file TCsvDS.hxx.
ROOT::Experimental::TDF::TCsvDS::TCsvDS (std::string_view fileName, bool readHeaders = true, char delimiter = ',')
Constructor to create a CSV TDataSource for TDataFrame.
[in] | fileName | Path of the CSV file.
[in] | readHeaders | true if the CSV file contains headers as first row, false otherwise (default true).
[in] | delimiter | Delimiter character (default ',').
Definition at line 239 of file TCsvDS.cxx.
ROOT::Experimental::TDF::TCsvDS::~TCsvDS ()
Destructor.
Definition at line 278 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::FillHeaders (const std::string &)
private
Definition at line 95 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::FillRecord (const std::string &, Record &)
private
Definition at line 103 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::GenerateHeaders (size_t)
private
Definition at line 129 of file TCsvDS.cxx.
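The private GenerateHeaders (size_t) member presumably produces the automatically generated header names mentioned in the class description for readHeaders = false. This page does not state the naming scheme, so the sketch below simply assumes a "Col" prefix followed by the column index; both the prefix and the helper name MakeDefaultHeaders are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build one placeholder header per column.
// The "Col<index>" scheme is an assumption made for this sketch only.
inline std::vector<std::string> MakeDefaultHeaders(std::size_t nColumns)
{
   std::vector<std::string> headers;
   headers.reserve(nColumns);
   for (std::size_t i = 0; i < nColumns; ++i)
      headers.push_back("Col" + std::to_string(i));
   return headers;
}
```

Whatever scheme is used, the generated names have to be unique so that each one identifies a single column.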
const std::vector<std::string> & ROOT::Experimental::TDF::TCsvDS::GetColumnNames () const
virtual
Returns a reference to the collection of the dataset's column names.
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 298 of file TCsvDS.cxx.
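std::vector<void *> ROOT::Experimental::TDF::TCsvDS::GetColumnReadersImpl (std::string_view, const std::type_info &)
private, virtual
Type-erased vector of pointers to pointers to column values, one per slot.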
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 136 of file TCsvDS.cxx.
std::vector<std::pair<ULong64_t, ULong64_t>> ROOT::Experimental::TDF::TCsvDS::GetEntryRanges ()
virtual
Return ranges of entries to distribute to tasks.
They are required to be contiguous intervals with no entries skipped. Supposing a dataset with nEntries, the intervals must start at 0 and end at nEntries, e.g. [0-5],[5-10] for 10 entries.
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 303 of file TCsvDS.cxx.
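The contiguity requirement on the entry ranges can be illustrated with a standalone sketch that splits nEntries into at most nSlots consecutive ranges. It is not the TCsvDS implementation: MakeEntryRanges and EntryRange are hypothetical names, and treating the second element of each pair as an exclusive upper bound is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using EntryRange = std::pair<std::uint64_t, std::uint64_t>;

// Split the entries 0 .. nEntries-1 into at most nSlots contiguous ranges.
// Each pair holds (first entry, one past the last entry).
inline std::vector<EntryRange> MakeEntryRanges(std::uint64_t nEntries, unsigned int nSlots)
{
   assert(nSlots > 0U);
   std::vector<EntryRange> ranges;
   const std::uint64_t chunk = nEntries / nSlots;
   const std::uint64_t remainder = nEntries % nSlots;
   std::uint64_t begin = 0;
   for (unsigned int slot = 0; slot < nSlots && begin < nEntries; ++slot) {
      // The first `remainder` ranges take one extra entry each.
      const std::uint64_t end = begin + chunk + (slot < remainder ? 1 : 0);
      ranges.emplace_back(begin, end);
      begin = end; // the next range starts where this one ends: no gaps
   }
   return ranges;
}
```

With 10 entries and 2 slots this yields the pairs (0, 5) and (5, 10), matching the example in the description above.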
std::string ROOT::Experimental::TDF::TCsvDS::GetTypeName (std::string_view colName) const
virtual
Type of a column as a string, e.g. GetTypeName("x") == "double". Required for jitting, e.g. df.Filter("x>0").
[in] | colName | The name of the column
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 309 of file TCsvDS.cxx.
bool ROOT::Experimental::TDF::TCsvDS::HasColumn (std::string_view colName) const
virtual
Checks if the dataset has a certain column.
[in] | colName | The name of the column
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 320 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::InferColTypes (std::vector<std::string> &)
private
Definition at line 168 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::InferType (const std::string &, unsigned int)
private
Definition at line 177 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::Initialise ()
virtual
Convenience method called before starting an event-loop.
This method might be called multiple times over the lifetime of a TDataSource, since users can run multiple event-loops with the same TDataFrame. Ideally, Initialise should set the state of the TDataSource so that multiple identical event-loops will produce identical results.
Reimplemented from ROOT::Experimental::TDF::TDataSource.
Definition at line 360 of file TCsvDS.cxx.
std::vector<std::string> ROOT::Experimental::TDF::TCsvDS::ParseColumns (const std::string &)
private
Definition at line 197 of file TCsvDS.cxx.
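size_t ROOT::Experimental::TDF::TCsvDS::ParseValue (const std::string &, std::vector<std::string> &, size_t)
private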
Definition at line 208 of file TCsvDS.cxx.
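ParseColumns and ParseValue are undocumented private helpers, so their exact behaviour cannot be read off this page. As an illustration of what splitting one record into column values can involve, the sketch below handles a configurable delimiter and double-quoted fields. The quoting convention (a doubled quote inside a quoted field stands for one literal quote) is the common CSV convention and is an assumption here, as is the name SplitCsvLine.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV record into its fields.
// Inside a double-quoted field the delimiter is kept as text and
// two consecutive quotes stand for one literal quote (assumed convention).
inline std::vector<std::string> SplitCsvLine(const std::string &line, char delimiter = ',')
{
   std::vector<std::string> fields;
   std::string current;
   bool inQuotes = false;
   for (std::size_t i = 0; i < line.size(); ++i) {
      const char c = line[i];
      if (inQuotes) {
         if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
            current += '"'; // escaped quote
            ++i;            // skip the second quote of the pair
         } else if (c == '"') {
            inQuotes = false; // closing quote
         } else {
            current += c;
         }
      } else if (c == '"') {
         inQuotes = true; // opening quote
      } else if (c == delimiter) {
         fields.push_back(current); // end of field
         current.clear();
      } else {
         current += c;
      }
   }
   fields.push_back(current); // last field, possibly empty
   return fields;
}
```

Because every record is split the same way, a simple consistency check is to compare the number of fields of each record with the number of headers.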
void ROOT::Experimental::TDF::TCsvDS::SetEntry (unsigned int slot, ULong64_t entry)
virtual
Advance the "cursors" returned by GetColumnReaders to the selected entry for a particular slot.
[in] | slot | The data processing slot that needs to be considered
[in] | entry | The entry which needs to be pointed to by the reader pointers
Slots are adopted to accommodate parallel data processing. Different workers will loop over different ranges and will be labelled by different "slot" values.
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 325 of file TCsvDS.cxx.
void ROOT::Experimental::TDF::TCsvDS::SetNSlots (unsigned int nSlots)
virtual
Inform TDataSource of the number of processing slots (i.e. worker threads) used by the associated TDataFrame.
Slot numbers are used to simplify parallel execution: TDataFrame guarantees that different threads will always pass different slot values when calling methods concurrently.
Implements ROOT::Experimental::TDF::TDataSource.
Definition at line 343 of file TCsvDS.cxx.
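The interplay between SetNSlots, the column readers and SetEntry can be sketched with a toy single-column source. It is a simplified illustration under stated assumptions, not the TCsvDS code: the class ToySlotSource and its members are hypothetical, only one column of doubles is handled, and no threads are started. Each slot owns one pointer (its cursor), and the reader handed out for a slot is a pointer to that pointer, which mirrors the std::vector< T ** > shape of GetColumnReaders.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy data source with a single column of doubles and one cursor per slot.
class ToySlotSource {
   std::vector<double> fValues;     // the column: one value per entry
   std::vector<double *> fSlotPtrs; // per-slot cursor into fValues

public:
   explicit ToySlotSource(std::vector<double> values) : fValues(std::move(values)) {}

   // Allocate one cursor per slot; call this before GetReaders and SetEntry.
   void SetNSlots(unsigned int nSlots)
   {
      assert(nSlots > 0U);
      fSlotPtrs.assign(nSlots, nullptr);
   }

   // One pointer-to-pointer per slot. They stay valid until SetNSlots is called again.
   std::vector<double **> GetReaders()
   {
      std::vector<double **> readers;
      readers.reserve(fSlotPtrs.size());
      for (auto &ptr : fSlotPtrs)
         readers.push_back(&ptr);
      return readers;
   }

   // Point the cursor of `slot` at the value of `entry`.
   // Different slots touch different cursors, so they do not interfere.
   void SetEntry(unsigned int slot, std::uint64_t entry)
   {
      assert(slot < fSlotPtrs.size() && entry < fValues.size());
      fSlotPtrs[slot] = &fValues[entry];
   }
};
```

Because a reader is obtained once and only the cursor behind it changes, code holding the reader for a slot sees the new entry after each SetEntry call without asking for the reader again.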
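TRegexp ROOT::Experimental::TDF::TCsvDS::doubleRegex1
static, private
Definition at line 39 of file TCsvDS.hxx.
TRegexp ROOT::Experimental::TDF::TCsvDS::doubleRegex2
static, private
Definition at line 39 of file TCsvDS.hxx.
TRegexp ROOT::Experimental::TDF::TCsvDS::falseRegex
static, private
Definition at line 39 of file TCsvDS.hxx.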
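std::vector<std::deque<bool>> ROOT::Experimental::TDF::TCsvDS::fBoolEvtValues
private
Definition at line 37 of file TCsvDS.hxx.
std::vector<std::vector<void *>> ROOT::Experimental::TDF::TCsvDS::fColAddresses
private
Definition at line 29 of file TCsvDS.hxx.
std::map<std::string, std::string> ROOT::Experimental::TDF::TCsvDS::fColTypes
private
Definition at line 27 of file TCsvDS.hxx.
std::list<std::string> ROOT::Experimental::TDF::TCsvDS::fColTypesList
private
Definition at line 28 of file TCsvDS.hxx.
char ROOT::Experimental::TDF::TCsvDS::fDelimiter
private
Definition at line 25 of file TCsvDS.hxx.
std::vector<std::vector<double>> ROOT::Experimental::TDF::TCsvDS::fDoubleEvtValues
private
Definition at line 32 of file TCsvDS.hxx.
std::vector<std::pair<ULong64_t, ULong64_t>> ROOT::Experimental::TDF::TCsvDS::fEntryRanges
private
Definition at line 30 of file TCsvDS.hxx.
std::string ROOT::Experimental::TDF::TCsvDS::fFileName
private
Definition at line 24 of file TCsvDS.hxx.
std::vector<std::string> ROOT::Experimental::TDF::TCsvDS::fHeaders
private
Definition at line 26 of file TCsvDS.hxx.
std::vector<std::vector<Long64_t>> ROOT::Experimental::TDF::TCsvDS::fLong64EvtValues
private
Definition at line 33 of file TCsvDS.hxx.
unsigned int ROOT::Experimental::TDF::TCsvDS::fNSlots = 0U
private
Definition at line 23 of file TCsvDS.hxx.
std::vector<Record> ROOT::Experimental::TDF::TCsvDS::fRecords
private
Definition at line 31 of file TCsvDS.hxx.
std::vector<std::vector<std::string>> ROOT::Experimental::TDF::TCsvDS::fStringEvtValues
private
Definition at line 34 of file TCsvDS.hxx.
TRegexp ROOT::Experimental::TDF::TCsvDS::intRegex
static, private
Definition at line 39 of file TCsvDS.hxx.
TRegexp ROOT::Experimental::TDF::TCsvDS::trueRegex
static, private
Definition at line 39 of file TCsvDS.hxx.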