RSqliteDS is an RDF data source implementation for SQL result sets from sqlite3 files.
RSqliteDS can feed an RDataFrame with data from a SQLite SELECT query. It can be used like this:
    auto rdf = ROOT::RDF::FromSqlite("/path/to/file.sqlite", "select name from table");
    auto h = rdf.Define("lName", "name.length()").Histo1D("lName");
The data source has to provide column types for all the columns. Determining column types in SQLite is tricky because SQLite is dynamically typed and, in principle, each row can hold a value of a different type in the same column. The data source therefore relies on heuristics to assign one type to each column.
Definition at line 51 of file RSqliteDS.hxx.
Classes | |
struct | Value_t |
Used to hold a single "cell" of the SELECT query's result table. Can be changed to std::variant once available. | |
Public Member Functions | |
RSqliteDS (const std::string &fileName, const std::string &query) | |
Build the dataframe. | |
~RSqliteDS () | |
Frees the sqlite resources and closes the file. | |
const std::vector< std::string > & | GetColumnNames () const final |
Returns the column names of the SELECT query. | |
std::vector< std::pair< ULong64_t, ULong64_t > > | GetEntryRanges () final |
Returns a range of size 1 as long as more rows are available in the SQL result set. | |
std::string | GetLabel () final |
Return a string representation of the datasource type. | |
std::string | GetTypeName (std::string_view colName) const final |
Returns the C++ type for a given column name, implemented as a linear search through all the columns. | |
bool | HasColumn (std::string_view colName) const final |
A linear search through the columns for the given name. | |
void | Initialize () final |
Resets the SQLite query engine at the beginning of the event loop. | |
bool | SetEntry (unsigned int slot, ULong64_t entry) final |
Stores the result of the currently active sqlite query row as a C++ value. | |
void | SetNSlots (unsigned int nSlots) final |
Almost a no-op; many slots can in fact reduce performance because of thread synchronization. | |
Public Member Functions inherited from ROOT::RDF::RDataSource | |
virtual | ~RDataSource ()=default |
virtual void | Finalize () |
Convenience method called after concluding an event-loop. | |
virtual void | FinalizeSlot (unsigned int) |
Convenience method called at the end of the data processing associated to a slot. | |
template<typename T > | |
std::vector< T ** > | GetColumnReaders (std::string_view columnName) |
Called at most once per column by RDF. | |
virtual std::unique_ptr< ROOT::Detail::RDF::RColumnReaderBase > | GetColumnReaders (unsigned int, std::string_view, const std::type_info &) |
If the other GetColumnReaders overload returns an empty vector, this overload will be called instead. | |
virtual std::size_t | GetNFiles () const |
Returns the number of files from which the dataset is constructed. | |
virtual void | InitSlot (unsigned int, ULong64_t) |
Convenience method called at the start of the data processing associated to a slot. | |
Protected Member Functions | |
Record_t | GetColumnReadersImpl (std::string_view name, const std::type_info &) final |
Activates the given column's result value. | |
Protected Member Functions inherited from ROOT::RDF::RDataSource | |
virtual std::string | AsString () |
Private Types | |
enum class | ETypes { kInteger , kReal , kText , kBlob , kNull } |
All the types known to SQLite. Changes require changing fgTypeNames, too. | |
Private Member Functions | |
void | SqliteError (int errcode) |
Helper function to throw an exception if there is a fatal sqlite error, e.g. an I/O error. | |
Private Attributes | |
std::vector< std::string > | fColumnNames |
std::vector< ETypes > | fColumnTypes |
std::unique_ptr< Internal::RSqliteDSDataSet > | fDataSet |
ULong64_t | fNRow |
unsigned int | fNSlots |
std::vector< Value_t > | fValues |
The data source is inherently single-threaded and returns only one row at a time. This vector holds the results. | |
Static Private Attributes | |
static constexpr char const * | fgTypeNames [] |
Corresponds to the types defined in ETypes. | |
Additional Inherited Members | |
Protected Types inherited from ROOT::RDF::RDataSource | |
using | Record_t = std::vector< void * > |
#include <ROOT/RSqliteDS.hxx>
enum class ROOT::RDF::RSqliteDS::ETypes [strong, private]
All the types known to SQLite. Changes require changing fgTypeNames, too.
Enumerator | |
---|---|
kInteger | |
kReal | |
kText | |
kBlob | |
kNull |
Definition at line 55 of file RSqliteDS.hxx.
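The note that changes to ETypes require changing fgTypeNames describes two parallel tables that have to stay in sync. A minimal sketch of that pattern follows, assuming a hypothetical kTypeNames array and TypeName helper. Neither is part of RSqliteDS, and the C++ type names listed are illustrative assumptions rather than the documented contents of fgTypeNames.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the enumerators listed above.
enum class ETypes { kInteger, kReal, kText, kBlob, kNull };

// Hypothetical name table: one entry per enumerator, in the same order.
// The type names are assumptions chosen for illustration.
constexpr const char *kTypeNames[] = {
   "Long64_t", "double", "std::string", "std::vector<unsigned char>", "void *"};

constexpr std::size_t kNumTypes = sizeof(kTypeNames) / sizeof(kTypeNames[0]);

// Compilation fails if an enumerator is added or removed without
// updating the name table (kNull is assumed to stay the last enumerator).
static_assert(kNumTypes == static_cast<std::size_t>(ETypes::kNull) + 1,
              "kTypeNames must have one entry per ETypes enumerator");

// Uses the enumerator value as an index into the name table.
inline std::string TypeName(ETypes type)
{
   const auto index = static_cast<std::size_t>(type);
   assert(index < kNumTypes); // guards against values cast from raw integers
   return kTypeNames[index];
}
```

The static_assert turns the "remember to update both" comment into a compile-time check, which is one common way to keep an enum and its name array aligned.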
ROOT::RDF::RSqliteDS::RSqliteDS (const std::string &fileName, const std::string &query)
Build the dataframe.
[in] | fileName | The path to an sqlite3 file; it is opened read-only |
[in] | query | A valid sqlite3 SELECT query |
The constructor opens the sqlite file, prepares the query engine and determines the column names and types.
Definition at line 352 of file RSqliteDS.cxx.
ROOT::RDF::RSqliteDS::~RSqliteDS ()
Frees the sqlite resources and closes the file.
Definition at line 438 of file RSqliteDS.cxx.
const std::vector< std::string > & ROOT::RDF::RSqliteDS::GetColumnNames () const [final, virtual]
Returns the column names of the SELECT query.
The column names are cached in the constructor. For expressions, the column name is the text of the expression unless the query assigns a name with AS, as in "SELECT 1 + 1 as mycolumn FROM table".
Implements ROOT::RDF::RDataSource.
Definition at line 451 of file RSqliteDS.cxx.
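Record_t ROOT::RDF::RSqliteDS::GetColumnReadersImpl (std::string_view name, const std::type_info &) [final, protected, virtual]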
Activates the given column's result value.
Implements ROOT::RDF::RDataSource.
Definition at line 458 of file RSqliteDS.cxx.
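std::vector< std::pair< ULong64_t, ULong64_t > > ROOT::RDF::RSqliteDS::GetEntryRanges () [final, virtual]
Returns a range of size 1 as long as more rows are available in the SQL result set.
This inherently serializes the RDF event loop, independent of the number of slots.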
Implements ROOT::RDF::RDataSource.
Definition at line 481 of file RSqliteDS.cxx.
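The one-row-at-a-time behaviour described for GetEntryRanges can be sketched without any SQLite dependency. The class below is a hypothetical stand-in, not the RSqliteDS implementation: a simple row counter plays the role of the SQL cursor, and the names OneRowRangeSource, EntryRange and CountRows are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using EntryRange = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical stand-in for a forward-only cursor over a result set.
class OneRowRangeSource {
   std::uint64_t fNRow = 0;  // index of the next row to hand out
   std::uint64_t fTotalRows; // stands in for "the query still has rows"

public:
   explicit OneRowRangeSource(std::uint64_t nRows) : fTotalRows(nRows) {}

   // Returns the half-open range [row, row + 1) for the next row, or an
   // empty vector once the result set is exhausted.
   std::vector<EntryRange> GetEntryRanges()
   {
      std::vector<EntryRange> ranges;
      if (fNRow < fTotalRows) {
         ranges.emplace_back(fNRow, fNRow + 1);
         ++fNRow;
      }
      return ranges;
   }
};

// Drains the source the way an event loop would: it keeps asking for
// ranges until an empty vector signals the end of the data.
inline std::uint64_t CountRows(OneRowRangeSource &source)
{
   std::uint64_t nRows = 0;
   for (auto ranges = source.GetEntryRanges(); !ranges.empty(); ranges = source.GetEntryRanges()) {
      assert(ranges.size() == 1); // never more than one row per request
      nRows += ranges[0].second - ranges[0].first;
   }
   return nRows;
}
```

Because each request yields at most one range covering a single entry, there is never more than one unit of work to distribute, which is why additional slots cannot speed up processing in this scheme.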
std::string ROOT::RDF::RSqliteDS::GetLabel () [final, virtual]
Return a string representation of the datasource type.
The returned string will be used by ROOT::RDF::SaveGraph() to represent the datasource in the visualization of the computation graph. Concrete datasources can override the default implementation.
Reimplemented from ROOT::RDF::RDataSource.
Definition at line 529 of file RSqliteDS.cxx.
std::string ROOT::RDF::RSqliteDS::GetTypeName (std::string_view colName) const [final, virtual]
Returns the C++ type for a given column name, implemented as a linear search through all the columns.
Implements ROOT::RDF::RDataSource.
Definition at line 500 of file RSqliteDS.cxx.
bool ROOT::RDF::RSqliteDS::HasColumn (std::string_view colName) const [final, virtual]
A linear search through the columns for the given name.
Implements ROOT::RDF::RDataSource.
Definition at line 514 of file RSqliteDS.cxx.
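Both GetTypeName and HasColumn are documented as linear searches over the column list. A minimal sketch of that lookup, assuming two parallel vectors for names and type names, is shown here. The ColumnTable struct and the choice to throw on an unknown column are assumptions for illustration, not documented behaviour of RSqliteDS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container holding column metadata in SELECT order.
struct ColumnTable {
   std::vector<std::string> fNames;     // column names
   std::vector<std::string> fTypeNames; // C++ type name per column, same order

   // Linear search: true if any column has the given name.
   bool HasColumn(const std::string &name) const
   {
      return std::find(fNames.begin(), fNames.end(), name) != fNames.end();
   }

   // Linear search: returns the type name stored at the matching index.
   std::string GetTypeName(const std::string &name) const
   {
      assert(fNames.size() == fTypeNames.size());
      for (std::size_t i = 0; i < fNames.size(); ++i) {
         if (fNames[i] == name)
            return fTypeNames[i];
      }
      // Assumption: an unknown column is reported as an exception.
      throw std::runtime_error("Unknown column: " + name);
   }
};
```

A linear scan is a reasonable fit here because a SELECT result typically has only a handful of columns and the lookups happen while the computation graph is set up, not once per row.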
void ROOT::RDF::RSqliteDS::Initialize () [final, virtual]
Resets the SQLite query engine at the beginning of the event loop.
Reimplemented from ROOT::RDF::RDataSource.
Definition at line 521 of file RSqliteDS.cxx.
bool ROOT::RDF::RSqliteDS::SetEntry (unsigned int slot, ULong64_t entry) [final, virtual]
Stores the result of the currently active sqlite query row as a C++ value.
Implements ROOT::RDF::RDataSource.
Definition at line 546 of file RSqliteDS.cxx.
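The class summary notes that Value_t, the holder for a single result cell, could be changed to std::variant once available. The following is a sketch of what such a cell type might look like in C++17. The alias Cell and the helpers StorageClass and AsDouble are hypothetical, and the choice of alternatives is an assumption based on the ETypes enumerators, not the actual layout of Value_t.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// One alternative per ETypes enumerator, in the same order;
// std::monostate stands in for a NULL cell.
using Cell = std::variant<std::int64_t, double, std::string,
                          std::vector<unsigned char>, std::monostate>;

// Names the alternative a cell currently holds.
inline const char *StorageClass(const Cell &cell)
{
   static constexpr const char *kNames[] = {"INTEGER", "REAL", "TEXT", "BLOB", "NULL"};
   assert(!cell.valueless_by_exception());
   return kNames[cell.index()];
}

// Reads a cell as double, widening integers; text, blob and NULL cells
// yield the caller-supplied fallback.
inline double AsDouble(const Cell &cell, double fallback = 0.0)
{
   if (const auto *asInt = std::get_if<std::int64_t>(&cell))
      return static_cast<double>(*asInt);
   if (const auto *asReal = std::get_if<double>(&cell))
      return *asReal;
   return fallback;
}
```

With a variant, the "which kind of value is active" tag and the storage live in one object, so a per-row update only has to assign the new value to the cell.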
void ROOT::RDF::RSqliteDS::SetNSlots (unsigned int nSlots) [final, virtual]
Almost a no-op; many slots can in fact reduce performance because of thread synchronization.
Implements ROOT::RDF::RDataSource.
Definition at line 583 of file RSqliteDS.cxx.
void ROOT::RDF::RSqliteDS::SqliteError (int errcode) [private]
Helper function to throw an exception if there is a fatal sqlite error, e.g. an I/O error.
Definition at line 594 of file RSqliteDS.cxx.
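A helper that turns a fatal result code into an exception can be sketched as follows. This is an illustration under stated assumptions, not the RSqliteDS code: the function ThrowOnError and the constants kOk and kIoError are hypothetical (the values 0 and 10 mirror SQLITE_OK and SQLITE_IOERR from the sqlite3 C API), and the message text is passed in by the caller instead of being looked up through SQLite.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical result codes mirroring SQLITE_OK and SQLITE_IOERR.
constexpr int kOk = 0;
constexpr int kIoError = 10;

// Returns normally on success; otherwise throws with the code and the
// caller-supplied description in the message.
inline void ThrowOnError(int errcode, const std::string &what)
{
   if (errcode == kOk)
      return;
   throw std::runtime_error("SQLite error " + std::to_string(errcode) + ": " + what);
}
```

Funnelling every failure through a single helper keeps error handling uniform and lets the calling code stay free of repeated message formatting.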
std::vector< std::string > ROOT::RDF::RSqliteDS::fColumnNames [private]
Definition at line 83 of file RSqliteDS.hxx.
std::vector< ETypes > ROOT::RDF::RSqliteDS::fColumnTypes [private]
Definition at line 84 of file RSqliteDS.hxx.
std::unique_ptr< Internal::RSqliteDSDataSet > ROOT::RDF::RSqliteDS::fDataSet [private]
Definition at line 80 of file RSqliteDS.hxx.
constexpr char const * ROOT::RDF::RSqliteDS::fgTypeNames [] [static, constexpr, private]
Corresponds to the types defined in ETypes.
Definition at line 90 of file RSqliteDS.hxx.
ULong64_t ROOT::RDF::RSqliteDS::fNRow [private]
Definition at line 82 of file RSqliteDS.hxx.
unsigned int ROOT::RDF::RSqliteDS::fNSlots [private]
Definition at line 81 of file RSqliteDS.hxx.
std::vector< Value_t > ROOT::RDF::RSqliteDS::fValues [private]
The data source is inherently single-threaded and returns only one row at a time. This vector holds the results.
Definition at line 86 of file RSqliteDS.hxx.