Managed a set of clusters containing compressed and packed pages.
The cluster pool steers the preloading of (partial) clusters. There is a two-step pipeline: in a first step, compressed pages are read from clusters into a memory buffer. The second pipeline step decompresses the pages and pushes them into the page pool. The actual logic of reading and unzipping is implemented by the page source. The cluster pool only orchestrates the work queues for reading and unzipping. It uses one extra I/O thread for reading waits for data from storage and generates no CPU load.
The unzipping step of the pipeline therefore behaves differently depending on whether or not implicit multi-threading is turned on. If it is turned off, i.e. in a single-threaded environment, the cluster pool will only read the compressed pages and the page source has to uncompresses pages at a later point when data from the page is requested.
Definition at line 54 of file RClusterPool.hxx.
Classes | |
struct | RInFlightCluster |
Clusters that are currently being processed by the pipeline. More... | |
struct | RReadItem |
Request to load a subset of the columns of a particular cluster. More... | |
Public Member Functions | |
RClusterPool (const RClusterPool &other)=delete | |
RClusterPool (RPageSource &pageSource) | |
RClusterPool (RPageSource &pageSource, unsigned int clusterBunchSize) | |
~RClusterPool () | |
RCluster * | GetCluster (DescriptorId_t clusterId, const RCluster::ColumnSet_t &physicalColumns) |
Returns the requested cluster either from the pool or, in case of a cache miss, lets the I/O thread load the cluster in the pool, blocks until done, and then returns it. | |
RClusterPool & | operator= (const RClusterPool &other)=delete |
void | WaitForInFlightClusters () |
Used by the unit tests to drain the queue of clusters to be preloaded. | |
Static Public Attributes | |
static constexpr unsigned int | kDefaultClusterBunchSize = 1 |
Private Member Functions | |
void | ExecReadClusters () |
The I/O thread routine, there is exactly one I/O thread in-flight for every cluster pool. | |
size_t | FindFreeSlot () const |
Returns an index of an unused element in fPool; callers of this function (GetCluster() and WaitFor()) make sure that a free slot actually exists. | |
RCluster * | FindInPool (DescriptorId_t clusterId) const |
Every cluster id has at most one corresponding RCluster pointer in the pool. | |
RCluster * | WaitFor (DescriptorId_t clusterId, const RCluster::ColumnSet_t &physicalColumns) |
Returns the given cluster from the pool, which needs to contain at least the columns physicalColumns . | |
Private Attributes | |
std::int64_t | fBunchId = 0 |
Used as an ever-growing counter in GetCluster() to separate bunches of clusters from each other. | |
unsigned int | fClusterBunchSize |
The number of clusters that are being read in a single vector read. | |
std::condition_variable | fCvHasReadWork |
Signals a non-empty I/O work queue. | |
std::vector< RInFlightCluster > | fInFlightClusters |
The clusters that were handed off to the I/O thread. | |
std::mutex | fLockWorkQueue |
Protects the shared state between the main thread and the I/O thread, namely the work queue and the in-flight clusters vector. | |
RPageSource & | fPageSource |
Every cluster pool is responsible for exactly one page source that triggers loading of the clusters (GetCluster()) and is used for implementing the I/O and cluster memory allocation (PageSource::LoadClusters()). | |
std::vector< std::unique_ptr< RCluster > > | fPool |
The cache of clusters around the currently active cluster. | |
std::deque< RReadItem > | fReadQueue |
The communication channel to the I/O thread. | |
std::thread | fThreadIo |
The I/O thread calls RPageSource::LoadClusters() asynchronously. | |
unsigned int | fWindowPre = 0 |
The number of clusters before the currently active cluster that should stay in the pool if present Reserved for later use. | |
#include <ROOT/RClusterPool.hxx>
ROOT::Experimental::Internal::RClusterPool::RClusterPool | ( | RPageSource & | pageSource, |
unsigned int | clusterBunchSize | ||
) |
Definition at line 51 of file RClusterPool.cxx.
|
inlineexplicit |
Definition at line 126 of file RClusterPool.hxx.
|
delete |
ROOT::Experimental::Internal::RClusterPool::~RClusterPool | ( | ) |
Definition at line 60 of file RClusterPool.cxx.
|
private |
The I/O thread routine, there is exactly one I/O thread in-flight for every cluster pool.
Definition at line 71 of file RClusterPool.cxx.
|
private |
Returns an index of an unused element in fPool; callers of this function (GetCluster() and WaitFor()) make sure that a free slot actually exists.
Definition at line 130 of file RClusterPool.cxx.
|
private |
Every cluster id has at most one corresponding RCluster pointer in the pool.
Definition at line 121 of file RClusterPool.cxx.
ROOT::Experimental::Internal::RCluster * ROOT::Experimental::Internal::RClusterPool::GetCluster | ( | DescriptorId_t | clusterId, |
const RCluster::ColumnSet_t & | physicalColumns | ||
) |
Returns the requested cluster either from the pool or, in case of a cache miss, lets the I/O thread load the cluster in the pool, blocks until done, and then returns it.
Triggers along the way the background loading of the following fWindowPost number of clusters. The returned cluster has at least all the pages of physicalColumns
and possibly pages of other columns, too. If implicit multi-threading is turned on, the uncompressed pages of the returned cluster are already pushed into the page pool associated with the page source upon return. The cluster remains valid until the next call to GetCluster().
Definition at line 198 of file RClusterPool.cxx.
|
delete |
|
private |
Returns the given cluster from the pool, which needs to contain at least the columns physicalColumns
.
Executed at the end of GetCluster when all missing data pieces have been sent to the load queue. Ideally, the function returns without blocking if the cluster is already in the pool.
Definition at line 341 of file RClusterPool.cxx.
void ROOT::Experimental::Internal::RClusterPool::WaitForInFlightClusters | ( | ) |
Used by the unit tests to drain the queue of clusters to be preloaded.
Definition at line 391 of file RClusterPool.cxx.
|
private |
Used as an ever-growing counter in GetCluster() to separate bunches of clusters from each other.
Definition at line 92 of file RClusterPool.hxx.
|
private |
The number of clusters that are being read in a single vector read.
Definition at line 90 of file RClusterPool.hxx.
|
private |
Signals a non-empty I/O work queue.
Definition at line 102 of file RClusterPool.hxx.
|
private |
The clusters that were handed off to the I/O thread.
Definition at line 100 of file RClusterPool.hxx.
|
private |
Protects the shared state between the main thread and the I/O thread, namely the work queue and the in-flight clusters vector.
Definition at line 98 of file RClusterPool.hxx.
|
private |
Every cluster pool is responsible for exactly one page source that triggers loading of the clusters (GetCluster()) and is used for implementing the I/O and cluster memory allocation (PageSource::LoadClusters()).
Definition at line 85 of file RClusterPool.hxx.
|
private |
The cache of clusters around the currently active cluster.
Definition at line 94 of file RClusterPool.hxx.
|
private |
The communication channel to the I/O thread.
Definition at line 104 of file RClusterPool.hxx.
|
private |
The I/O thread calls RPageSource::LoadClusters() asynchronously.
The thread is mostly waiting for the data to arrive (blocked by the kernel) and therefore can safely run in addition to the application main threads.
Definition at line 109 of file RClusterPool.hxx.
|
private |
The number of clusters before the currently active cluster that should stay in the pool if present Reserved for later use.
Definition at line 88 of file RClusterPool.hxx.
|
staticconstexpr |
Definition at line 124 of file RClusterPool.hxx.