ROOT Reference Guide

ROOT::Experimental::Internal::RPageSource Class Reference (abstract)

Abstract interface to read data from an ntuple.

The page source is initialized with the columns of interest. Alias columns from projected fields are mapped to the corresponding physical columns. Pages from the columns of interest can then be mapped into memory. The page source also gives access to the ntuple's meta-data.

Definition at line 551 of file RPageStorage.hxx.

Classes

class  RActivePhysicalColumns
 Keeps track of the requested physical column IDs and their in-memory target type via a column element identifier. More...
 
struct  RClusterInfo
 Summarizes cluster-level information that is necessary to load a certain page. More...
 
struct  RCounters
 Default I/O performance counters that get registered in fMetrics More...
 
struct  REntryRange
 Used in SetEntryRange / GetEntryRange. More...
 
class  RExclDescriptorGuard
 An RAII wrapper used for the writable access to RPageSource::fDescriptor. See GetSharedDescriptorGuard(). More...
 
class  RSharedDescriptorGuard
 An RAII wrapper used for the read-only access to RPageSource::fDescriptor. See GetExclDescriptorGuard(). More...
 

Public Member Functions

 RPageSource (const RPageSource &)=delete
 
 RPageSource (RPageSource &&)=delete
 
 RPageSource (std::string_view ntupleName, const RNTupleReadOptions &fOptions)
 
 ~RPageSource () override
 
ColumnHandle_t AddColumn (DescriptorId_t fieldId, RColumn &column) override
 Register a new column.
 
void Attach ()
 Open the physical storage container and deserialize header and footer.
 
std::unique_ptr< RPageSource > Clone () const
 Open the same storage multiple times, e.g. for reading in multiple threads.
 
void DropColumn (ColumnHandle_t columnHandle) override
 Unregisters a column.
 
REntryRange GetEntryRange () const
 
NTupleSize_t GetNElements (ColumnHandle_t columnHandle)
 
NTupleSize_t GetNEntries ()
 
const RNTupleReadOptions & GetReadOptions () const
 
const RSharedDescriptorGuard GetSharedDescriptorGuard () const
 Takes the read lock for the descriptor.
 
EPageStorageType GetType () final
 Whether the concrete implementation is a sink or a source.
 
virtual std::vector< std::unique_ptr< RCluster > > LoadClusters (std::span< RCluster::RKey > clusterKeys)=0
 Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages.
 
virtual RPageRef LoadPage (ColumnHandle_t columnHandle, NTupleSize_t globalIndex)
 Allocates and fills a page that contains the index-th element.
 
virtual RPageRef LoadPage (ColumnHandle_t columnHandle, RClusterIndex clusterIndex)
 Another version of LoadPage that allows specifying cluster-relative indexes.
 
virtual void LoadSealedPage (DescriptorId_t physicalColumnId, RClusterIndex clusterIndex, RSealedPage &sealedPage)=0
 Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage.
 
void LoadStructure ()
 Loads header and footer without decompressing or deserializing them.
 
RPageSource & operator= (const RPageSource &)=delete
 
RPageSource & operator= (RPageSource &&)=delete
 
void SetEntryRange (const REntryRange &range)
 Promise to only read from the given entry range.
 
RResult< RPage > UnsealPage (const RSealedPage &sealedPage, const RColumnElementBase &element)
 
void UnzipCluster (RCluster *cluster)
 Parallel decompression and unpacking of the pages in the given cluster.
 
- Public Member Functions inherited from ROOT::Experimental::Internal::RPageStorage
 RPageStorage (const RPageStorage &other)=delete
 
 RPageStorage (RPageStorage &&other)=default
 
 RPageStorage (std::string_view name)
 
virtual ~RPageStorage ()
 
ColumnId_t GetColumnId (ColumnHandle_t columnHandle) const
 
virtual Detail::RNTupleMetrics & GetMetrics ()
 Returns the default metrics object.
 
const std::string & GetNTupleName () const
 Returns the NTuple name.
 
RPageStorage & operator= (const RPageStorage &other)=delete
 
RPageStorage & operator= (RPageStorage &&other)=default
 
void SetTaskScheduler (RTaskScheduler *taskScheduler)
 

Static Public Member Functions

static std::unique_ptr< RPageSource > Create (std::string_view ntupleName, std::string_view location, const RNTupleReadOptions &options=RNTupleReadOptions())
 Guess the concrete derived page source from the file name (location)
 
static RResult< RPage > UnsealPage (const RSealedPage &sealedPage, const RColumnElementBase &element, RPageAllocator &pageAlloc)
 Helper for unstreaming a page.
 

Protected Member Functions

virtual RNTupleDescriptor AttachImpl ()=0
 LoadStructureImpl() has been called before AttachImpl() is called
 
virtual std::unique_ptr< RPageSource > CloneImpl () const =0
 Returns a new, unattached page source for the same data set.
 
void EnableDefaultMetrics (const std::string &prefix)
 Enables the default set of metrics provided by RPageSource.
 
RExclDescriptorGuard GetExclDescriptorGuard ()
 Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information.
 
virtual RPageRef LoadPageImpl (ColumnHandle_t columnHandle, const RClusterInfo &clusterInfo, ClusterSize_t::ValueType idxInCluster)=0
 
virtual void LoadStructureImpl ()=0
 
void PrepareLoadCluster (const RCluster::RKey &clusterKey, ROnDiskPageMap &pageZeroMap, std::function< void(DescriptorId_t, NTupleSize_t, const RClusterDescriptor::RPageRange::RPageInfo &)> perPageFunc)
 Prepare a page range read for the column set in clusterKey.
 
virtual void UnzipClusterImpl (RCluster *cluster)
 
- Protected Member Functions inherited from ROOT::Experimental::Internal::RPageStorage
void WaitForAllTasks ()
 

Protected Attributes

RActivePhysicalColumns fActivePhysicalColumns
 The active columns are implicitly defined by the model fields or views.
 
std::unique_ptr< RCounters > fCounters
 
RNTupleReadOptions fOptions
 
RPagePool fPagePool
 Pages that are unzipped with IMT are staged into the page pool.
 
- Protected Attributes inherited from ROOT::Experimental::Internal::RPageStorage
Detail::RNTupleMetrics fMetrics
 
std::string fNTupleName
 
std::unique_ptr< RPageAllocator > fPageAllocator
 For the time being, we will use the heap allocator for all sources and sinks. This may change in the future.
 
RTaskScheduler * fTaskScheduler = nullptr
 

Private Member Functions

void UpdateLastUsedCluster (DescriptorId_t clusterId)
 Does nothing if fLastUsedCluster == clusterId.
 

Private Attributes

RNTupleDescriptor fDescriptor
 
std::shared_mutex fDescriptorLock
 
REntryRange fEntryRange
 Used by the cluster pool to prevent reading beyond the given range.
 
bool fHasStructure = false
 Set to true once LoadStructure() is called.
 
bool fIsAttached = false
 Set to true once Attach() is called.
 
DescriptorId_t fLastUsedCluster = kInvalidDescriptorId
 Remembers the last cluster id from which a page was requested.
 
std::map< NTupleSize_t, DescriptorId_t > fPreloadedClusters
 Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters.
 

Additional Inherited Members

- Public Types inherited from ROOT::Experimental::Internal::RPageStorage
using ColumnHandle_t = RColumnHandle
 The column handle identifies a column with the current open page storage.
 
using SealedPageSequence_t = std::deque< RSealedPage >
 
- Static Public Attributes inherited from ROOT::Experimental::Internal::RPageStorage
static constexpr std::size_t kNBytesPageChecksum = sizeof(std::uint64_t)
 The page checksum is a 64bit xxhash3.
 

#include <ROOT/RPageStorage.hxx>

Inheritance diagram for ROOT::Experimental::Internal::RPageSource: [diagram not shown; RPageSource derives from RPageStorage]

Constructor & Destructor Documentation

◆ RPageSource() [1/3]

ROOT::Experimental::Internal::RPageSource::RPageSource ( std::string_view  ntupleName,
const RNTupleReadOptions & fOptions 
)

Definition at line 146 of file RPageStorage.cxx.

◆ RPageSource() [2/3]

ROOT::Experimental::Internal::RPageSource::RPageSource ( const RPageSource & )
delete

◆ RPageSource() [3/3]

ROOT::Experimental::Internal::RPageSource::RPageSource ( RPageSource &&  )
delete

◆ ~RPageSource()

ROOT::Experimental::Internal::RPageSource::~RPageSource ( )
override

Definition at line 151 of file RPageStorage.cxx.

Member Function Documentation

◆ AddColumn()

ROOT::Experimental::Internal::RPageStorage::ColumnHandle_t ROOT::Experimental::Internal::RPageSource::AddColumn ( DescriptorId_t  fieldId,
RColumn & column 
)
override virtual

Register a new column.

When reading, the column must exist in the ntuple on disk corresponding to the meta-data. When writing, every column can only be attached once.

Implements ROOT::Experimental::Internal::RPageStorage.

Definition at line 174 of file RPageStorage.cxx.

◆ Attach()

void ROOT::Experimental::Internal::RPageSource::Attach ( )

Open the physical storage container and deserialize header and footer.

Definition at line 204 of file RPageStorage.cxx.

◆ AttachImpl()

virtual RNTupleDescriptor ROOT::Experimental::Internal::RPageSource::AttachImpl ( )
protected pure virtual

◆ Clone()

std::unique_ptr< ROOT::Experimental::Internal::RPageSource > ROOT::Experimental::Internal::RPageSource::Clone ( ) const

Open the same storage multiple times, e.g. for reading in multiple threads. If the source is already attached, the clone will be attached, too. The clone will, however, use its own connection to the underlying storage (e.g., file descriptor, XRootD handle, etc.).
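
The cloning contract described above (the clone inherits the attached state but opens its own connection) can be illustrated with a minimal sketch. ToyCloneableSource, its integer connection id, and NextConnectionId() are hypothetical stand-ins invented for this example; they are not part of the ROOT API and do not reflect how RPageSource stores its state.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a page source; not the ROOT class.
class ToyCloneableSource {
   bool fIsAttached = false;
   int fConnectionId; // stands in for a file descriptor or a remote handle

   // Hands out a fresh id per object (not thread-safe; fine for a sketch).
   static int NextConnectionId()
   {
      static int next = 0;
      return next++;
   }

public:
   ToyCloneableSource() : fConnectionId(NextConnectionId()) {}

   void Attach() { fIsAttached = true; }
   bool IsAttached() const { return fIsAttached; }
   int ConnectionId() const { return fConnectionId; }

   // The clone gets its own connection and inherits the attached state.
   std::unique_ptr<ToyCloneableSource> Clone() const
   {
      auto clone = std::make_unique<ToyCloneableSource>();
      if (fIsAttached)
         clone->Attach();
      return clone;
   }
};
```

In this sketch, a clone taken before Attach() stays unattached, while a clone taken afterwards is attached and reports a different connection id than the original.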

Definition at line 212 of file RPageStorage.cxx.

◆ CloneImpl()

virtual std::unique_ptr< RPageSource > ROOT::Experimental::Internal::RPageSource::CloneImpl ( ) const
protected pure virtual

Returns a new, unattached page source for the same data set.

Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.

◆ Create()

std::unique_ptr< ROOT::Experimental::Internal::RPageSource > ROOT::Experimental::Internal::RPageSource::Create ( std::string_view  ntupleName,
std::string_view  location,
const RNTupleReadOptions options = RNTupleReadOptions() 
)
static

Guess the concrete derived page source from the file name (location)

Definition at line 154 of file RPageStorage.cxx.

◆ DropColumn()

void ROOT::Experimental::Internal::RPageSource::DropColumn ( ColumnHandle_t  columnHandle)
override virtual

Unregisters a column.

A page source decreases the reference counter for the corresponding active column. For a page sink, dropping columns is currently a no-op.
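
The reference counting mentioned above can be pictured with a small sketch. ToyActiveColumns and its Insert/Erase/IsActive methods are hypothetical names chosen for illustration; the actual bookkeeping in RActivePhysicalColumns may differ (for instance, it also tracks the in-memory target type).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical reference-counted set of active column IDs.
class ToyActiveColumns {
   std::map<std::uint64_t, std::size_t> fRefCounts; // column id -> number of users

public:
   // Corresponds to registering a column: bumps the counter.
   void Insert(std::uint64_t columnId) { ++fRefCounts[columnId]; }

   // Corresponds to dropping a column: decrements the counter and
   // forgets the column once nobody uses it anymore.
   void Erase(std::uint64_t columnId)
   {
      auto it = fRefCounts.find(columnId);
      if (it == fRefCounts.end())
         return;
      if (--it->second == 0)
         fRefCounts.erase(it);
   }

   bool IsActive(std::uint64_t columnId) const { return fRefCounts.count(columnId) > 0; }
};
```

With this scheme, a column that was added twice remains active after the first drop and becomes inactive only after the second.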

Implements ROOT::Experimental::Internal::RPageStorage.

Definition at line 184 of file RPageStorage.cxx.

◆ EnableDefaultMetrics()

void ROOT::Experimental::Internal::RPageSource::EnableDefaultMetrics ( const std::string &  prefix)
protected

Enables the default set of metrics provided by RPageSource.

prefix will be used as the prefix for the counters registered in the internal RNTupleMetrics object. A subclass using the default set of metrics is responsible for updating the counters appropriately, e.g. fCounters->fNRead.Inc(). Alternatively, a subclass might provide its own RNTupleMetrics object by overriding the GetMetrics() member function.

Definition at line 433 of file RPageStorage.cxx.

◆ GetEntryRange()

REntryRange ROOT::Experimental::Internal::RPageSource::GetEntryRange ( ) const
inline

Definition at line 772 of file RPageStorage.hxx.

◆ GetExclDescriptorGuard()

RExclDescriptorGuard ROOT::Experimental::Internal::RPageSource::GetExclDescriptorGuard ( )
inline protected

Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information.

Definition at line 719 of file RPageStorage.hxx.

◆ GetNElements()

ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNElements ( ColumnHandle_t  columnHandle)

Definition at line 228 of file RPageStorage.cxx.

◆ GetNEntries()

ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNEntries ( )

Definition at line 223 of file RPageStorage.cxx.

◆ GetReadOptions()

const RNTupleReadOptions & ROOT::Experimental::Internal::RPageSource::GetReadOptions ( ) const
inline

Definition at line 743 of file RPageStorage.hxx.

◆ GetSharedDescriptorGuard()

const RSharedDescriptorGuard ROOT::Experimental::Internal::RPageSource::GetSharedDescriptorGuard ( ) const
inline

Takes the read lock for the descriptor.

Multiple threads can take the lock concurrently. The underlying std::shared_mutex, however, is neither read nor write recursive: within one thread, at most one lock (shared or exclusive) may be held at any time. This requires special care in sections protected by GetSharedDescriptorGuard() and GetExclDescriptorGuard(), especially to avoid acquiring the locks indirectly (e.g. by a call to GetNEntries()). As a general guideline, no other method of the page source should be called (directly or indirectly) in a guarded section.
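
The guard pattern can be sketched with the standard library alone. ToyDescriptor, ToyGuardedSource, and the guard class names below are hypothetical; they only mirror the idea of RAII guards over a std::shared_mutex and are not the ROOT implementation. Note how each guard lives in its own scope, so a thread never holds two locks at once.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the descriptor; not RNTupleDescriptor.
struct ToyDescriptor {
   std::uint64_t fNEntries = 0;
};

class ToyGuardedSource {
   ToyDescriptor fDescriptor;
   mutable std::shared_mutex fDescriptorLock;

public:
   // Read-only guard: holds a shared lock for its lifetime.
   class ToySharedGuard {
      const ToyDescriptor &fDescriptor;
      std::shared_lock<std::shared_mutex> fLock;

   public:
      ToySharedGuard(const ToyDescriptor &desc, std::shared_mutex &mutex) : fDescriptor(desc), fLock(mutex) {}
      const ToyDescriptor *operator->() const { return &fDescriptor; }
   };

   // Writable guard: holds an exclusive lock for its lifetime.
   class ToyExclGuard {
      ToyDescriptor &fDescriptor;
      std::unique_lock<std::shared_mutex> fLock;

   public:
      ToyExclGuard(ToyDescriptor &desc, std::shared_mutex &mutex) : fDescriptor(desc), fLock(mutex) {}
      ToyDescriptor *operator->() const { return &fDescriptor; }
   };

   ToySharedGuard GetSharedGuard() const { return ToySharedGuard(fDescriptor, fDescriptorLock); }
   ToyExclGuard GetExclGuard() { return ToyExclGuard(fDescriptor, fDescriptorLock); }
};

// Writes under the exclusive lock, then reads under the shared lock.
std::uint64_t DemoGuards()
{
   ToyGuardedSource source;
   {
      auto guard = source.GetExclGuard();
      guard->fNEntries = 42;
   } // exclusive lock released here, before any other lock is taken
   std::uint64_t nEntries = 0;
   {
      auto guard = source.GetSharedGuard();
      nEntries = guard->fNEntries;
   } // shared lock released here
   return nEntries;
}
```

Taking the shared guard inside the scope of the exclusive guard (or vice versa) on the same thread would be undefined behavior for std::shared_mutex, which is the hazard the text warns about.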

Definition at line 751 of file RPageStorage.hxx.

◆ GetType()

EPageStorageType ROOT::Experimental::Internal::RPageSource::GetType ( )
inline final virtual

Whether the concrete implementation is a sink or a source.

Implements ROOT::Experimental::Internal::RPageStorage.

Definition at line 742 of file RPageStorage.hxx.

◆ LoadClusters()

virtual std::vector< std::unique_ptr< RCluster > > ROOT::Experimental::Internal::RPageSource::LoadClusters ( std::span< RCluster::RKey > clusterKeys )
pure virtual

Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages.

The page source may load more columns than the minimal necessary set from columns. To indicate which columns have been loaded, LoadClusters() must mark them with SetColumnAvailable(). That includes the ones from the columns that don't have pages; otherwise subsequent requests for the cluster would assume an incomplete cluster and trigger loading again. LoadClusters() is typically called from the I/O thread of a cluster pool, i.e. the method runs concurrently to other methods of the page source.

Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.

◆ LoadPage() [1/2]

ROOT::Experimental::Internal::RPageRef ROOT::Experimental::Internal::RPageSource::LoadPage ( ColumnHandle_t  columnHandle,
NTupleSize_t  globalIndex 
)
virtual

Allocates and fills a page that contains the index-th element.

The default implementation searches the page and calls LoadPageImpl(). Returns a default-constructed RPage for suppressed columns.

Definition at line 360 of file RPageStorage.cxx.

◆ LoadPage() [2/2]

ROOT::Experimental::Internal::RPageRef ROOT::Experimental::Internal::RPageSource::LoadPage ( ColumnHandle_t  columnHandle,
RClusterIndex  clusterIndex 
)
virtual

Another version of LoadPage that allows specifying cluster-relative indexes.

Returns a default-constructed RPage for suppressed columns.

Definition at line 398 of file RPageStorage.cxx.

◆ LoadPageImpl()

virtual RPageRef ROOT::Experimental::Internal::RPageSource::LoadPageImpl ( ColumnHandle_t  columnHandle,
const RClusterInfo & clusterInfo,
ClusterSize_t::ValueType  idxInCluster 
)
protected pure virtual

◆ LoadSealedPage()

virtual void ROOT::Experimental::Internal::RPageSource::LoadSealedPage ( DescriptorId_t  physicalColumnId,
RClusterIndex  clusterIndex,
RSealedPage & sealedPage 
)
pure virtual

Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage.

The sealed page can be used subsequently in a call to RPageSink::CommitSealedPage. The fSize and fNElements members of the sealedPage parameter are always set. If sealedPage.fBuffer is nullptr, no data will be copied but the returned size information can be used by the caller to allocate a large enough buffer and call LoadSealedPage again.
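
The two-call protocol (query the size with a null buffer, allocate, then load) can be sketched as follows. ToySealedPage reuses the member names fBuffer, fSize, and fNElements from the text, but the struct itself, ToySealedSource, and the 4-byte element size are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in mirroring the three members named in the text.
struct ToySealedPage {
   void *fBuffer = nullptr;
   std::size_t fSize = 0;
   std::size_t fNElements = 0;
};

// Hypothetical source holding one sealed page worth of bytes.
class ToySealedSource {
   std::vector<unsigned char> fOnDisk{1, 2, 3, 4, 5, 6, 7, 8};

public:
   void LoadSealedPage(ToySealedPage &page) const
   {
      // Size information is always set, even without a buffer.
      page.fSize = fOnDisk.size();
      page.fNElements = fOnDisk.size() / 4; // assumption: 4-byte elements
      if (page.fBuffer)
         std::memcpy(page.fBuffer, fOnDisk.data(), fOnDisk.size());
   }
};

// Caller side: first call queries the size, second call copies the bytes.
std::vector<unsigned char> ReadSealedBytes(const ToySealedSource &source)
{
   ToySealedPage page;
   source.LoadSealedPage(page); // fBuffer is nullptr: nothing is copied
   std::vector<unsigned char> buffer(page.fSize);
   page.fBuffer = buffer.data();
   source.LoadSealedPage(page); // buffer is large enough: bytes are copied
   return buffer;
}
```

The caller keeps ownership of the buffer throughout, which is why the first call is needed to learn how much memory to provide.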

Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.

◆ LoadStructure()

void ROOT::Experimental::Internal::RPageSource::LoadStructure ( )

Loads header and footer without decompressing or deserializing them.

This can be used to asynchronously open a file in the background. The method is idempotent and it is called as a first step in Attach(). Page sources may or may not make use of splitting loading and processing meta-data. Therefore, LoadStructure() may do nothing and defer loading the meta-data to Attach().
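
A minimal sketch of the idempotent LoadStructure() / Attach() split is shown below. ToyStructuredSource and ToyCountingSource are hypothetical classes; the flags are named after fHasStructure and fIsAttached from this page, but guarding AttachImpl() against repeated calls is an assumption of the sketch, not a statement about the ROOT implementation.

```cpp
#include <cassert>

// Hypothetical base class showing the two-phase opening.
class ToyStructuredSource {
   bool fHasStructure = false;
   bool fIsAttached = false;

protected:
   virtual void LoadStructureImpl() = 0; // e.g. fetch raw header and footer
   virtual void AttachImpl() = 0;        // e.g. decompress and deserialize them

public:
   virtual ~ToyStructuredSource() = default;

   // Idempotent: the implementation runs at most once.
   void LoadStructure()
   {
      if (fHasStructure)
         return;
      LoadStructureImpl();
      fHasStructure = true;
   }

   // Always starts with LoadStructure(), so callers may skip the explicit call.
   void Attach()
   {
      LoadStructure();
      if (!fIsAttached) { // assumption: attaching twice is a no-op
         AttachImpl();
         fIsAttached = true;
      }
   }

   bool IsAttached() const { return fIsAttached; }
};

// Concrete toy source that counts how often each step actually runs.
class ToyCountingSource : public ToyStructuredSource {
public:
   int fNLoadStructure = 0;
   int fNAttach = 0;

protected:
   void LoadStructureImpl() override { ++fNLoadStructure; }
   void AttachImpl() override { ++fNAttach; }
};
```

Calling LoadStructure() early (for instance from a background thread) and Attach() later runs each implementation step exactly once in this sketch.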

Definition at line 197 of file RPageStorage.cxx.

◆ LoadStructureImpl()

virtual void ROOT::Experimental::Internal::RPageSource::LoadStructureImpl ( )
protected pure virtual

◆ operator=() [1/2]

RPageSource & ROOT::Experimental::Internal::RPageSource::operator= ( const RPageSource & )
delete

◆ operator=() [2/2]

RPageSource & ROOT::Experimental::Internal::RPageSource::operator= ( RPageSource &&  )
delete

◆ PrepareLoadCluster()

void ROOT::Experimental::Internal::RPageSource::PrepareLoadCluster ( const RCluster::RKey & clusterKey,
ROnDiskPageMap & pageZeroMap,
std::function< void(DescriptorId_t, NTupleSize_t, const RClusterDescriptor::RPageRange::RPageInfo &)>  perPageFunc 
)
protected

Prepare a page range read for the column set in clusterKey.

Specifically, pages referencing the kTypePageZero locator are filled in pageZeroMap; otherwise, perPageFunc is called for each page. This is commonly used as part of LoadClusters() in derived classes.
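
The dispatch described above (zero pages go into a map, all other pages are handed to a callback) can be sketched as follows. ToyLocatorType, ToyPageInfo, and PreparePages are hypothetical simplifications: the real method walks the page ranges of the descriptor and uses ROnDiskPageMap rather than a plain vector.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical locator kinds; kPageZero stands in for kTypePageZero.
enum class ToyLocatorType { kFile, kPageZero };

// Hypothetical per-page record.
struct ToyPageInfo {
   std::uint64_t fColumnId;
   std::uint64_t fPageNo;
   ToyLocatorType fType;
};

// Pages with the zero locator need no I/O and are collected directly;
// every other page is passed to perPageFunc, which would typically
// record a read request.
void PreparePages(const std::vector<ToyPageInfo> &pages, std::vector<ToyPageInfo> &pageZeroMap,
                  const std::function<void(const ToyPageInfo &)> &perPageFunc)
{
   for (const auto &page : pages) {
      if (page.fType == ToyLocatorType::kPageZero)
         pageZeroMap.push_back(page);
      else
         perPageFunc(page);
   }
}
```

A derived class could use the callback to build its list of byte ranges to read, leaving the zero pages to be materialized without touching storage.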

Definition at line 309 of file RPageStorage.cxx.

◆ SetEntryRange()

void ROOT::Experimental::Internal::RPageSource::SetEntryRange ( const REntryRange & range )

Promise to only read from the given entry range.

If set, prevents the cluster pool from reading ahead beyond the given range. The range needs to be within [0, GetNEntries()).
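
The range constraint and its effect on read-ahead can be sketched with two small predicates. ToyEntryRange, IsWithin, and Intersects are hypothetical; the sketch assumes the range is the half-open interval [fFirstEntry, fFirstEntry + fNEntries) and does not claim to match the members or checks of REntryRange.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical entry range: [fFirstEntry, fFirstEntry + fNEntries).
struct ToyEntryRange {
   std::uint64_t fFirstEntry;
   std::uint64_t fNEntries;
};

// True if the range lies within [0, nEntries); written to avoid overflow.
bool IsWithin(const ToyEntryRange &range, std::uint64_t nEntries)
{
   return range.fFirstEntry <= nEntries && range.fNEntries <= nEntries - range.fFirstEntry;
}

// True if a cluster covering [clusterFirst, clusterFirst + clusterNEntries)
// overlaps the range; a read-ahead could skip clusters for which this is false.
bool Intersects(const ToyEntryRange &range, std::uint64_t clusterFirst, std::uint64_t clusterNEntries)
{
   return clusterFirst < range.fFirstEntry + range.fNEntries && range.fFirstEntry < clusterFirst + clusterNEntries;
}
```

For a range covering entries 10 to 29, a cluster holding entries 25 to 34 overlaps and would still be loaded, whereas a cluster starting at entry 30 lies beyond the promise and could be skipped.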

Definition at line 189 of file RPageStorage.cxx.

◆ UnsealPage() [1/2]

ROOT::Experimental::RResult< ROOT::Experimental::Internal::RPage > ROOT::Experimental::Internal::RPageSource::UnsealPage ( const RSealedPage & sealedPage,
const RColumnElementBase & element 
)

Definition at line 529 of file RPageStorage.cxx.

◆ UnsealPage() [2/2]

ROOT::Experimental::RResult< ROOT::Experimental::Internal::RPage > ROOT::Experimental::Internal::RPageSource::UnsealPage ( const RSealedPage & sealedPage,
const RColumnElementBase & element,
RPageAllocator & pageAlloc 
)
static

Helper for unstreaming a page.

This is commonly used in derived, concrete page sources. The implementation currently always makes a memory copy, even if the sealed page is uncompressed and in the final memory layout. The optimization of directly mapping pages is left to the concrete page source implementations.

Definition at line 535 of file RPageStorage.cxx.

◆ UnzipCluster()

void ROOT::Experimental::Internal::RPageSource::UnzipCluster ( RCluster * cluster )

Parallel decompression and unpacking of the pages in the given cluster.

The unzipped pages are supposed to be preloaded in a page pool attached to the source. The method is triggered by the cluster pool's unzip thread. It is an optional optimization; the method can safely do nothing. In particular, the actual implementation will only run if a task scheduler is set. In practice, a task scheduler is set if implicit multi-threading is turned on.
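
The "do nothing unless a scheduler is set" behavior can be sketched as follows. ToyTaskScheduler, ToyUnzipSource, and the integer "pages" are hypothetical; the toy scheduler runs its tasks sequentially in Wait(), whereas a real scheduler would dispatch them to worker threads, and doubling an integer merely stands in for decompression.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical scheduler: collects tasks and runs them in Wait().
class ToyTaskScheduler {
   std::vector<std::function<void()>> fTasks;

public:
   void AddTask(std::function<void()> task) { fTasks.push_back(std::move(task)); }
   void Wait()
   {
      for (auto &task : fTasks)
         task();
      fTasks.clear();
   }
};

// Hypothetical source with an optional scheduler and a toy page pool.
class ToyUnzipSource {
   ToyTaskScheduler *fTaskScheduler = nullptr;
   std::vector<int> fPagePool; // stands in for the pool of unzipped pages

public:
   void SetTaskScheduler(ToyTaskScheduler *scheduler) { fTaskScheduler = scheduler; }
   std::size_t NPooledPages() const { return fPagePool.size(); }

   void UnzipCluster(const std::vector<int> &sealedPages)
   {
      if (!fTaskScheduler)
         return; // optional optimization: without a scheduler, do nothing

      std::vector<int> unzipped(sealedPages.size());
      for (std::size_t i = 0; i < sealedPages.size(); ++i) {
         // One task per page; each task writes only to its own slot.
         fTaskScheduler->AddTask([&unzipped, &sealedPages, i] { unzipped[i] = sealedPages[i] * 2; });
      }
      fTaskScheduler->Wait(); // all tasks finish before the locals go out of scope
      fPagePool.insert(fPagePool.end(), unzipped.begin(), unzipped.end());
   }
};
```

Pages that were not preloaded this way would simply be decompressed on demand when they are requested, which is why skipping the step is safe.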

Definition at line 233 of file RPageStorage.cxx.

◆ UnzipClusterImpl()

void ROOT::Experimental::Internal::RPageSource::UnzipClusterImpl ( RCluster * cluster )
protected virtual

Definition at line 239 of file RPageStorage.cxx.

◆ UpdateLastUsedCluster()

void ROOT::Experimental::Internal::RPageSource::UpdateLastUsedCluster ( DescriptorId_t  clusterId)
private

Does nothing if fLastUsedCluster == clusterId.

Otherwise, updates fLastUsedCluster and evicts unused pages from the page pool of all previous clusters. Must not be called when the descriptor guard is taken.
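
A simplified sketch of this bookkeeping is shown below. ToyPreloadTracker is hypothetical: it keeps a map from first entry number to cluster id, as fPreloadedClusters does, and treats every preloaded cluster that starts before the newly used one as "previous". Passing the first entry number as an argument and recording evictions in a vector are simplifications; the actual method looks the entry number up in the descriptor and evicts pages from the page pool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical tracker for preloaded clusters.
class ToyPreloadTracker {
public:
   static constexpr std::uint64_t kInvalidId = static_cast<std::uint64_t>(-1);

   // Remember a cluster whose pages were preloaded.
   void RegisterPreloaded(std::uint64_t firstEntry, std::uint64_t clusterId) { fPreloaded[firstEntry] = clusterId; }

   void UpdateLastUsedCluster(std::uint64_t clusterId, std::uint64_t firstEntry)
   {
      if (fLastUsed == clusterId)
         return; // nothing to do

      // The map is ordered by first entry, so previous clusters come first.
      auto it = fPreloaded.begin();
      while (it != fPreloaded.end() && it->first < firstEntry) {
         fEvicted.push_back(it->second); // stands in for evicting its pages
         it = fPreloaded.erase(it);
      }
      fLastUsed = clusterId;
   }

   const std::vector<std::uint64_t> &Evicted() const { return fEvicted; }

private:
   std::map<std::uint64_t, std::uint64_t> fPreloaded; // first entry -> cluster id
   std::vector<std::uint64_t> fEvicted;               // ids whose pages were dropped
   std::uint64_t fLastUsed = kInvalidId;
};
```

Ordering the map by first entry number makes the eviction a prefix scan, which is presumably why the member is keyed that way.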

Definition at line 335 of file RPageStorage.cxx.

Member Data Documentation

◆ fActivePhysicalColumns

RActivePhysicalColumns ROOT::Experimental::Internal::RPageSource::fActivePhysicalColumns
protected

The active columns are implicitly defined by the model fields or views.

Definition at line 687 of file RPageStorage.hxx.

◆ fCounters

std::unique_ptr<RCounters> ROOT::Experimental::Internal::RPageSource::fCounters
protected

Definition at line 683 of file RPageStorage.hxx.

◆ fDescriptor

RNTupleDescriptor ROOT::Experimental::Internal::RPageSource::fDescriptor
private

Definition at line 605 of file RPageStorage.hxx.

◆ fDescriptorLock

std::shared_mutex ROOT::Experimental::Internal::RPageSource::fDescriptorLock
mutable private

Definition at line 606 of file RPageStorage.hxx.

◆ fEntryRange

REntryRange ROOT::Experimental::Internal::RPageSource::fEntryRange
private

Used by the cluster pool to prevent reading beyond the given range.

Definition at line 607 of file RPageStorage.hxx.

◆ fHasStructure

bool ROOT::Experimental::Internal::RPageSource::fHasStructure = false
private

Set to true once LoadStructure() is called.

Definition at line 608 of file RPageStorage.hxx.

◆ fIsAttached

bool ROOT::Experimental::Internal::RPageSource::fIsAttached = false
private

Set to true once Attach() is called.

Definition at line 609 of file RPageStorage.hxx.

◆ fLastUsedCluster

DescriptorId_t ROOT::Experimental::Internal::RPageSource::fLastUsedCluster = kInvalidDescriptorId
private

Remembers the last cluster id from which a page was requested.

Definition at line 612 of file RPageStorage.hxx.

◆ fOptions

RNTupleReadOptions ROOT::Experimental::Internal::RPageSource::fOptions
protected

Definition at line 685 of file RPageStorage.hxx.

◆ fPagePool

RPagePool ROOT::Experimental::Internal::RPageSource::fPagePool
protected

Pages that are unzipped with IMT are staged into the page pool.

Definition at line 690 of file RPageStorage.hxx.

◆ fPreloadedClusters

std::map<NTupleSize_t, DescriptorId_t> ROOT::Experimental::Internal::RPageSource::fPreloadedClusters
private

Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters.

If the last used cluster changes in LoadPage(), all unused pages from previous clusters are evicted from the page pool.

Definition at line 616 of file RPageStorage.hxx.

The documentation for this class was generated from the following files: