TTreeCache.cxx
1// @(#)root/tree:$Id$
2// Author: Rene Brun 04/06/2006
3
4/*************************************************************************
5 * Copyright (C) 1995-2018, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class TTreeCache
13\ingroup tree
14\brief A cache to speed up the reading of ROOT datasets
15
16# A cache to speed up the reading of ROOT datasets
17
18## Table of Contents
19- [Motivation](\ref motivation)
20- [General Description](\ref description)
21- [Changes in behaviour](\ref changesbehaviour)
22- [Self-optimization](\ref cachemisses)
23- [Examples of usage](\ref examples)
24- [Check performance and stats](\ref checkPerf)
25
26\anchor motivation
27## Motivation: why is a cache needed?
28
29When writing a TTree, the branch buffers are kept in memory.
30A typical branch buffer size (before compression) is 32 KBytes.
31After compression, the zipped buffer may be just a few KBytes.
32The branch buffers cannot be much larger in the case of TTrees with several
33hundred or thousand branches.
34
35When writing, this does not generate a performance problem because branch
36buffers are always written sequentially and, thanks to OS optimisations,
37content is flushed to the output file when a few MBytes of data are available.
38On the other hand, when reading, one may hit performance problems because of
39latencies, e.g. those imposed by the network.
40For example, in a WAN with 10 ms latency, reading 1000 buffers of 10 KBytes each
41with no cache implies a 10 s penalty, whereas a local read of the 10 MBytes would
42take about 1 second.
43
44The TreeCache tries to prefetch all the buffers for the selected branches
45in order to transfer a few large multi-Megabyte buffers instead of many
46small multi-kilobyte buffers. In addition, TTreeCache can sort the blocks to
47be read in increasing order such that the file is read sequentially.
48
49Systems like xrootd, dCache or httpd take advantage of the TTreeCache by
50reading ahead as much data as they can: they return to the application
51the maximum amount of data specified in the cache and have the next chunk of data ready
52when the next request comes.
53
54### Are there cases for which the usage of TTreeCache is detrimental for performance?
55Yes, in some corner cases; for example, when reading only a small fraction of all
56entries, such that not all branch buffers are read.
57
58\anchor description
59## General Description
60This class acts as a file cache, automatically registering the baskets of
61the branches being processed, whether the TTrees are manipulated directly or with tools
62such as TTree::Draw, TTree::Process, TSelector, TTreeReader and RDataFrame,
63while in the learning phase. The learning phase lasts 100 entries by default
64and can be changed via TTreeCache::SetLearnEntries.
65
66The usage of a TTreeCache can considerably improve the runtime performance at
67the price of a modest investment in memory, in particular when the TTree is
68accessed remotely, e.g. via a high latency network.
69
70For each TTree being processed a TTreeCache object is created.
71This object is automatically deleted when the Tree is deleted or
72when the file is deleted.
73The user can change the size of the cache with the TTree::SetCacheSize method
74(by default the size is 30 Megabytes). The default size can also be controlled with the
75environment variable `ROOT_TTREECACHE_SIZE` or the TTreeCache.Size option.
76The entry range for which the cache is active can also be set with the
77SetEntryRange method.
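For example (a minimal sketch; the file name `data.root` and tree name `mytree` are
placeholders), the cache size and the active entry range can be set as follows:
~~~ {.cpp}
   auto f = TFile::Open("data.root");
   TTree *T = nullptr;
   f->GetObject(T, "mytree");
   T->SetCacheSize(50000000);        // enlarge the cache to 50 MBytes
   T->SetCacheEntryRange(0, 1000);   // make the cache active for the first 1000 entries only
~~~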
78
79\anchor changesbehaviour
80## Changes of behavior when using TChain and TEventList
81
82The usage of TChain or TEventList has an influence on the behaviour of the cache:
83
84- Special case of a TChain
85 Once the training is done on the first Tree, the list of branches
86 in the cache is kept for the following files.
87
88- Special case of a TEventList
89 If the Tree or TChain has a TEventList, only the buffers
90 referenced by the list are put in the cache.
91
92The learning phase is started or restarted when:
93 - TTree automatically creates a cache.
94 - TTree::SetCacheSize is called with a non-zero size and a cache
95 did not previously exist
96 - TTreeCache::StartLearningPhase is called.
97 - TTreeCache::SetEntryRange is called
98 * and the learning is not yet finished
99 * and has not been set to manual
100 * and the new minimum entry is different.
101
102The learning period is stopped (and prefetching is started) when:
103 - TTreeCache::StopLearningPhase is called.
104 - An entry outside the 'learning' range is requested.
105 The 'learning' range is from fEntryMin (default 0) to
106 fEntryMin + fgLearnEntries.
107 - A 'cached' TChain switches over to a new file.
108
109
110\anchor cachemisses
111## Self-optimization in presence of cache misses
112
113The TTreeCache can optimize its behavior on a cache miss. When
114miss optimization is enabled (see the SetOptimizeMisses method),
115it tracks all branches used after the learning phase that caused a cache
116miss.
117When a cache miss occurs, all of these tracked branches are prefetched
118for that event. This optimization exploits the observation that infrequently
119accessed branches are often accessed together.
120An example scenario where such behavior is desirable is an analysis where
121a set of collections is read only for the few events in which a certain
122condition is met, e.g. a trigger fired.
123
124### Additional memory and CPU usage when optimizing for cache misses
125When this mode is enabled, the memory dedicated to the cache can increase
126by at most a factor of two in the case of cache misses.
127Additionally, on the first miss of an event, we must iterate through all the
128"active branches" for the miss cache and find the correct basket.
129This can potentially be a CPU-expensive operation compared to, e.g., the
130latency of an SSD. This is why the miss cache is currently disabled by default.
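A hedged sketch of enabling it for a TTree `T` read from a TFile `f` (the
dynamic_cast reflects that TTree::GetReadCache returns the base class
TFileCacheRead):
~~~ {.cpp}
   T->SetCacheSize(30000000);
   auto cache = dynamic_cast<TTreeCache*>(T->GetReadCache(f));
   if (cache) cache->SetOptimizeMisses(true); //<<< enable the miss cache
~~~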
131
132\anchor examples
133## Example usages of TTreeCache
134
135A few use cases are discussed below. A cache with automatic sizing is created
136when a TTree is used.
137
138In some applications, e.g. central processing workflows of experiments, the list
139of branches to read is known a priori. For these cases, the TTreeCache can be
140instructed about the branches which will be read via explicit calls to the TTree
141or TTreeCache interfaces.
142In less streamlined applications such as analysis, predicting the branches which
143will be read can be difficult. In such cases, ROOT I/O flags used branches
144automatically when a branch buffer is read during the learning phase.
145
146In the examples below, portions of analysis code are shown.
147The few statements involving the TreeCache are marked with `//<<<`.
148
149### ROOT::RDataFrame and TTreeReader Examples
150
151If you use RDataFrame or TTreeReader, the system will automatically cache the
152best set of branches: no action is required by the user.
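For illustration, a minimal TTreeReader sketch (file, tree and branch names are
placeholders) in which the cache is created and trained behind the scenes:
~~~ {.cpp}
   TFile f("data.root");
   TTreeReader reader("mytree", &f);
   TTreeReaderValue<float> pt(reader, "pt");
   while (reader.Next()) {
      // ... use *pt; the TTreeCache is managed automatically
   }
~~~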
153
154### TTree::Draw Example
155
156The TreeCache is automatically used by TTree::Draw. The method knows
157which branches are used in the query and automatically puts these branches
158in the cache. The entry range is also inferred automatically.
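For example (branch names are placeholders), a single call suffices; the branches
px, py and pz are detected and cached automatically:
~~~ {.cpp}
   T->Draw("px:py", "pz > 0");
~~~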
159
160### TTree::Process and TSelectors Examples
161
162The user must enable the cache and tell the system which branches to cache
163and also specify the entry range. It is important to specify the entry range
164when only a subset of the events is processed, to avoid wasteful caching.
165
166#### Reading all branches
167
168~~~ {.cpp}
169 TTree *T;
170 f->GetObject(T, "mytree");
171 auto nentries = T->GetEntries();
172 auto cachesize = 10000000U; // 10 MBytes
173 T->SetCacheSize(cachesize); //<<<
174 T->AddBranchToCache("*", true); //<<< add all branches to the cache
175 T->Process("myselector.C+");
176 // In the TSelector::Process function we read all branches
177 T->GetEntry(i);
178 // ... Here the entry is processed
179~~~
180
181#### Reading a subset of all branches
182
183In the Process function we read a subset of the branches.
184Only the branches used in the first entry will be put in the cache.
185~~~ {.cpp}
186 TTree *T;
187 f->GetObject(T, "mytree");
188 // We want to process only the 200 first entries
189 auto nentries=200UL;
190 auto efirst = 0;
191 auto elast = efirst+nentries;
192 auto cachesize = 10000000U; // 10 MBytes
193 TTreeCache::SetLearnEntries(1); //<<< we can take the decision after 1 entry
194 T->SetCacheSize(cachesize); //<<<
195 T->SetCacheEntryRange(efirst,elast); //<<<
196 T->Process("myselector.C+","",nentries,efirst);
197 // In the TSelector::Process we read only 2 branches
198 auto b1 = T->GetBranch("branch1");
199 b1->GetEntry(i);
200 if (somecondition) return;
201 auto b2 = T->GetBranch("branch2");
202 b2->GetEntry(i);
203 // ... Here the entry is processed
204~~~
205### Custom event loop
206
207#### Always using the same two branches
208
209In this example, exactly two branches are always used: those need to be
210prefetched.
211~~~ {.cpp}
212 TTree *T;
213 f->GetObject(T, "mytree");
214 auto b1 = T->GetBranch("branch1");
215 auto b2 = T->GetBranch("branch2");
216 auto nentries = T->GetEntries();
217 auto cachesize = 10000000U; //10 MBytes
218 T->SetCacheSize(cachesize); //<<<
219 T->AddBranchToCache(b1, true); //<<< add branch1 and branch2 to the cache
220 T->AddBranchToCache(b2, true); //<<<
221 T->StopCacheLearningPhase(); //<<< we do not need the system to guess anything
222 for (auto i : TSeqL(nentries)) {
223 T->LoadTree(i); //<<< important call, needed before calling TBranch::GetEntry
224 b1->GetEntry(i);
225 if (some condition not met) continue;
226 b2->GetEntry(i);
227 if (some condition not met) continue;
228 // Here we read the full event only in some rare cases.
229 // There is no point in caching the other branches as it might be
230 // more economical to read only the branch buffers really used.
231 T->GetEntry(i);
232 // ... Here the entry is processed
233 }
234~~~
235#### Always using at least the same two branches
236
237In this example, two branches are always used: in addition, some analysis
238functions are invoked and those may trigger the reading of other branches which
239are a priori not known.
240There is no point in prefetching branches that will be used very rarely: we can
241rely on the system to cache the right branches.
242~~~ {.cpp}
243 TTree *T;
244 f->GetObject(T, "mytree");
245 auto nentries = T->GetEntries();
246 auto cachesize = 10000000; //10 MBytes
247 T->SetCacheSize(cachesize); //<<<
248 T->SetCacheLearnEntries(5); //<<< we can take the decision after 5 entries
249 auto b1 = T->GetBranch("branch1");
250 auto b2 = T->GetBranch("branch2");
251 for (auto i : TSeqL(nentries)) {
252 T->LoadTree(i);
253 b1->GetEntry(i);
254 if (some condition not met) continue;
255 b2->GetEntry(i);
256 // At this point we may call a user function where a few more branches
257 // will be read conditionally. These branches will be put in the cache
258 // if they have been used in the first 5 entries
259 if (some condition not met) continue;
260 // Here we read the full event only in some rare cases.
261 // There is no point in caching the other branches as it might be
262 // more economical to read only the branch buffers really used.
263 T->GetEntry(i);
264 // .. process the rare but interesting cases.
265 // ... Here the entry is processed
266 }
267~~~
268
269\anchor checkPerf
270## How can the usage and performance of TTreeCache be verified?
271
272Once the event loop has terminated, the number of effective system reads for a
273given file can be checked with code like the following:
274~~~ {.cpp}
275 printf("Reading %lld bytes in %d transactions\n", f->GetBytesRead(), f->GetReadCalls());
276~~~
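With an effective cache, the number of transactions stays small and the average
transaction size is large. As a rough, hypothetical follow-up to the snippet above:
~~~ {.cpp}
   if (f->GetReadCalls() > 0)
      printf("Average transaction size: %.1f KBytes\n",
             f->GetBytesRead() / (1024. * f->GetReadCalls()));
~~~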
277
278Another handy command is:
279~~~ {.cpp}
280myTreeOrChain.GetTree()->PrintCacheStats();
281~~~
282
283*/
284
285#include "TSystem.h"
286#include "TEnv.h"
287#include "TTreeCache.h"
288#include "TChain.h"
289#include "TList.h"
290#include "TBranch.h"
291#include "TBranchElement.h"
292#include "TEventList.h"
293#include "TObjArray.h"
294#include "TObjString.h"
295#include "TRegexp.h"
296#include "TLeaf.h"
297#include "TFriendElement.h"
298#include "TFile.h"
299#include "TMath.h"
300#include "TBranchCacheInfo.h"
301#include "TVirtualPerfStats.h"
302#include <climits>
303
304#include <memory>
305
307
308
309////////////////////////////////////////////////////////////////////////////////
310/// Default Constructor.
311
312TTreeCache::TTreeCache() : TFileCacheRead(), fPrefillType(GetConfiguredPrefillType())
313{
314}
315
316////////////////////////////////////////////////////////////////////////////////
317/// Constructor.
318
320 : TFileCacheRead(tree->GetCurrentFile(), bufsize, tree), fEntryMax(tree->GetEntriesFast()), fEntryNext(0),
321 fBrNames(new TList), fTree(tree), fPrefillType(GetConfiguredPrefillType())
322{
324 Int_t nleaves = tree->GetListOfLeaves()->GetEntriesFast();
326}
327
328////////////////////////////////////////////////////////////////////////////////
329/// Destructor. (in general called by the TFile destructor)
330
332{
333 // Inform the TFile that we have been deleted (in case
334 // we are deleted explicitly by legacy user code).
335 if (fFile) fFile->SetCacheRead(nullptr, fTree);
336
337 delete fBranches;
338 if (fBrNames) {fBrNames->Delete(); delete fBrNames; fBrNames=nullptr;}
339}
340
341////////////////////////////////////////////////////////////////////////////////
342/// Add a branch discovered by actual usage to the list of branches to be stored
343/// in the cache. This function is called by TBranch::GetBasket.
344/// If we are no longer in the training phase, this is an error.
345/// Returns:
346/// - 0 branch added or already included
347/// - -1 on error
348
350{
351 if (!fIsLearning) {
352 return -1;
353 }
354
355 // Reject branches that are not from the cached tree.
356 if (!b || fTree->GetTree() != b->GetTree()) return -1;
357
358 // If this is the first addition of a branch (and we are learning and we are in
359 // the expected TTree), then prefill the cache. (We expect that in a future
360 // release the prefilling will be the default, so we test for that inside the
361 // LearnPrefill call).
363
364 return AddBranch(b, subbranches);
365}
366
367////////////////////////////////////////////////////////////////////////////////
368/// Add a branch to the list of branches to be stored in the cache.
369/// This function is called by the user via TTree::AddBranchToCache.
370/// The branch is added even if we are outside of the training phase.
371/// Returns:
372/// - 0 branch added or already included
373/// - -1 on error
374
376{
378 // Reject branches that are not from the cached tree.
378 if (!b || fTree->GetTree() != b->GetTree()) return -1;
379
380 //Is branch already in the cache?
381 bool isNew = true;
382 for (int i=0;i<fNbranches;i++) {
383 if (fBranches->UncheckedAt(i) == b) {isNew = false; break;}
384 }
385 if (isNew) {
386 fTree = b->GetTree();
388 const char *bname = b->GetName();
389 if (fTree->IsA() == TChain::Class()) {
390 // If we have a TChain, we will need to use the branch name
391 // and we better disambiguate them (see atlasFlushed.root for example)
392 // in order to cache all the requested branches.
393 // We do not do this all the time as GetMother is slow (it performs
394 // a linear search over the list of top-level branches).
395 TString build;
396 const char *mothername = b->GetMother()->GetName();
397 if (b != b->GetMother() && mothername[strlen(mothername)-1] != '.') {
398 // Maybe we ought to prefix the name to avoid ambiguity.
399 auto bem = dynamic_cast<TBranchElement*>(b->GetMother());
400 if (bem->GetType() < 3) {
401 // Not a collection.
402 build = mothername;
403 build.Append(".");
404 if (strncmp(bname,build.Data(),build.Length()) != 0) {
405 build.Append(bname);
406 bname = build.Data();
407 }
408 }
409 }
410 }
411 fBrNames->Add(new TObjString(bname));
412 fNbranches++;
413 if (gDebug > 0) printf("Entry: %lld, registering branch: %s\n",b->GetTree()->GetReadEntry(),b->GetName());
414 }
415
416 // process subbranches
417 Int_t res = 0;
418 if (subbranches) {
419 TObjArray *lb = b->GetListOfBranches();
420 Int_t nb = lb->GetEntriesFast();
421 for (Int_t j = 0; j < nb; j++) {
422 TBranch* branch = (TBranch*) lb->UncheckedAt(j);
423 if (!branch) continue;
424 if (AddBranch(branch, subbranches)<0) {
425 res = -1;
426 }
427 }
428 }
429 return res;
430}
431
432////////////////////////////////////////////////////////////////////////////////
433/// Add a branch to the list of branches to be stored in the cache.
434/// This version is intended for the user (that's why we pass the name of the branch).
435/// It works in exactly the same way as TTree::SetBranchStatus, so you
436/// probably want to look there for details about the use of bname
437/// with regular expressions.
438/// The branches are taken with respect to the Owner of this TTreeCache
439/// (i.e. the original Tree).
440/// NB: if bname="*" all branches are put in the cache and the learning phase is stopped.
441/// Returns:
442/// - 0 branch added or already included
443/// - -1 on error
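///
/// For example (a hedged sketch; the pattern and the cache pointer are placeholders):
/// ~~~ {.cpp}
///    treeCache->AddBranch("muon_*", true); // add all matching branches and their sub-branches
/// ~~~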
444
445Int_t TTreeCache::AddBranch(const char *bname, bool subbranches /*= false*/)
446{
449
450 Int_t i;
451 Int_t nleaves = (fTree->GetListOfLeaves())->GetEntriesFast();
452 TRegexp re(bname,true);
453 Int_t nb = 0;
454 Int_t res = 0;
455
456 // first pass, loop on all branches
457 // for leafcount branches activate/deactivate in function of status
458 bool all = false;
459 if (!strcmp(bname,"*")) all = true;
460 for (i=0;i<nleaves;i++) {
461 leaf = (TLeaf*)(fTree->GetListOfLeaves())->UncheckedAt(i);
462 branch = (TBranch*)leaf->GetBranch();
463 TString s = branch->GetName();
464 if (!all) { //Regexp gives wrong result for [] in name
466 longname.Form("%s.%s",fTree->GetName(),branch->GetName());
467 if (strcmp(bname,branch->GetName())
468 && longname != bname
469 && s.Index(re) == kNPOS) continue;
470 }
471 nb++;
472 if (AddBranch(branch, subbranches)<0) {
473 res = -1;
474 }
475 leafcount = leaf->GetLeafCount();
476 if (leafcount && !all) {
477 bcount = leafcount->GetBranch();
478 if (AddBranch(bcount, subbranches)<0) {
479 res = -1;
480 }
481 }
482 }
483 if (nb==0 && strchr(bname,'*')==nullptr) {
484 branch = fTree->GetBranch(bname);
485 if (branch) {
486 if (AddBranch(branch, subbranches)<0) {
487 res = -1;
488 }
489 ++nb;
490 }
491 }
492
493 //search in list of friends
495 if (fTree->GetListOfFriends()) {
499 while ((fe = (TFriendElement*)nextf())) {
500 TTree *t = fe->GetTree();
501 if (t==nullptr) continue;
502
503 // If the alias is present replace it with the real name.
504 char *subbranch = (char*)strstr(bname,fe->GetName());
505 if (subbranch!=bname) subbranch = nullptr;
506 if (subbranch) {
507 subbranch += strlen(fe->GetName());
508 if ( *subbranch != '.' ) subbranch = nullptr;
509 else subbranch ++;
510 }
511 if (subbranch) {
512 name.Form("%s.%s",t->GetName(),subbranch);
513 if (name != bname && AddBranch(name, subbranches)<0) {
514 res = -1;
515 }
517 }
518 }
519 }
520 if (!nb && !foundInFriend) {
521 if (gDebug > 0) printf("AddBranch: unknown branch -> %s \n", bname);
522 Error("AddBranch", "unknown branch -> %s", bname);
523 return -1;
524 }
525 //if all branches are selected stop the learning phase
526 if (*bname == '*') {
527 fEntryNext = -1; // We are likely to have changed the set of branches, so force the [re-]reading of the cluster.
529 }
530 return res;
531}
532
533////////////////////////////////////////////////////////////////////////////////
534/// Remove a branch from the list of branches to be stored in the cache.
535/// This function is called by TBranch::GetBasket.
536/// Returns:
537/// - 0 branch dropped or not in cache
538/// - -1 on error
539
541{
542 if (!fIsLearning) {
543 return -1;
544 }
545
547 // Reject branches that are not from the cached tree.
547 if (!b || fTree->GetTree() != b->GetTree()) return -1;
548
549 //Is branch already in the cache?
550 if (fBranches->Remove(b)) {
551 --fNbranches;
552 if (gDebug > 0) printf("Entry: %lld, un-registering branch: %s\n",b->GetTree()->GetReadEntry(),b->GetName());
553 }
554 delete fBrNames->Remove(fBrNames->FindObject(b->GetName()));
555
556 // process subbranches
557 Int_t res = 0;
558 if (subbranches) {
559 TObjArray *lb = b->GetListOfBranches();
560 Int_t nb = lb->GetEntriesFast();
561 for (Int_t j = 0; j < nb; j++) {
562 TBranch* branch = (TBranch*) lb->UncheckedAt(j);
563 if (!branch) continue;
564 if (DropBranch(branch, subbranches)<0) {
565 res = -1;
566 }
567 }
568 }
569 return res;
570}
571
572////////////////////////////////////////////////////////////////////////////////
573/// Remove a branch from the list of branches to be stored in the cache.
574/// This version is intended for the user (that's why we pass the name of the branch).
575/// It works in exactly the same way as TTree::SetBranchStatus, so you
576/// probably want to look there for details about the use of bname
577/// with regular expressions.
578/// The branches are taken with respect to the Owner of this TTreeCache
579/// (i.e. the original Tree).
580/// NB: if bname="*" all branches are removed from the cache and the learning phase is stopped.
581/// Returns:
582/// - 0 branch dropped or not in cache
583/// - -1 on error
584
585Int_t TTreeCache::DropBranch(const char *bname, bool subbranches /*= false*/)
586{
589
590 Int_t i;
591 Int_t nleaves = (fTree->GetListOfLeaves())->GetEntriesFast();
592 TRegexp re(bname,true);
593 Int_t nb = 0;
594 Int_t res = 0;
595
596 // first pass, loop on all branches
597 // for leafcount branches activate/deactivate in function of status
598 bool all = false;
599 if (!strcmp(bname,"*")) all = true;
600 for (i=0;i<nleaves;i++) {
601 leaf = (TLeaf*)(fTree->GetListOfLeaves())->UncheckedAt(i);
602 branch = (TBranch*)leaf->GetBranch();
603 TString s = branch->GetName();
604 if (!all) { //Regexp gives wrong result for [] in name
606 longname.Form("%s.%s",fTree->GetName(),branch->GetName());
607 if (strcmp(bname,branch->GetName())
608 && longname != bname
609 && s.Index(re) == kNPOS) continue;
610 }
611 nb++;
612 if (DropBranch(branch, subbranches)<0) {
613 res = -1;
614 }
615 leafcount = leaf->GetLeafCount();
616 if (leafcount && !all) {
617 bcount = leafcount->GetBranch();
618 if (DropBranch(bcount, subbranches)<0) {
619 res = -1;
620 }
621 }
622 }
623 if (nb==0 && strchr(bname,'*')==nullptr) {
624 branch = fTree->GetBranch(bname);
625 if (branch) {
626 if (DropBranch(branch, subbranches)<0) {
627 res = -1;
628 }
629 ++nb;
630 }
631 }
632
633 //search in list of friends
635 if (fTree->GetListOfFriends()) {
639 while ((fe = (TFriendElement*)nextf())) {
640 TTree *t = fe->GetTree();
641 if (t==nullptr) continue;
642
643 // If the alias is present replace it with the real name.
644 char *subbranch = (char*)strstr(bname,fe->GetName());
645 if (subbranch!=bname) subbranch = nullptr;
646 if (subbranch) {
647 subbranch += strlen(fe->GetName());
648 if ( *subbranch != '.' ) subbranch = nullptr;
649 else subbranch ++;
650 }
651 if (subbranch) {
652 name.Form("%s.%s",t->GetName(),subbranch);
653 if (DropBranch(name, subbranches)<0) {
654 res = -1;
655 }
657 }
658 }
659 }
660 if (!nb && !foundInFriend) {
661 if (gDebug > 0) printf("DropBranch: unknown branch -> %s \n", bname);
662 Error("DropBranch", "unknown branch -> %s", bname);
663 return -1;
664 }
665 //if all branches are selected stop the learning phase
666 if (*bname == '*') {
667 fEntryNext = -1; // We are likely to have changed the set of branches, so force the [re-]reading of the cluster.
668 }
669 return res;
670}
671
672////////////////////////////////////////////////////////////////////////////////
673/// Start of methods for the miss cache.
674////////////////////////////////////////////////////////////////////////////////
675
676////////////////////////////////////////////////////////////////////////////////
677/// Enable / disable the miss cache.
678///
679/// The first time this is called on a TTreeCache object, the corresponding
680/// data structures will be allocated. Subsequent enable / disables will
681/// simply turn the functionality on/off.
683{
684
685 if (opt && !fMissCache) {
687 }
688 fOptimizeMisses = opt;
689}
690
691////////////////////////////////////////////////////////////////////////////////
692/// Reset all the miss cache training.
693///
694/// The contents of the miss cache will be emptied as well as the list of
695/// branches used.
697{
698
699 fLastMiss = -1;
700 fFirstMiss = -1;
701
702 if (!fMissCache) {
703 fMissCache = std::make_unique<MissCache>();
704 }
705 fMissCache->clear();
706}
707
708////////////////////////////////////////////////////////////////////////////////
709/// For the event currently being fetched into the miss cache, find the IO
710/// (offset / length tuple) to pull in the current basket for a given branch.
711///
712/// Returns:
713/// - IOPos describing the IO operation necessary for the basket on this branch
714/// - On failure, IOPos.length will be set to 0.
716{
717 if (R__unlikely(b.GetDirectory() == nullptr)) {
718 // printf("Branch at %p has no valid directory.\n", &b);
719 return IOPos{0, 0};
720 }
721 if (R__unlikely(b.GetDirectory()->GetFile() != fFile)) {
722 // printf("Branch at %p is in wrong file (branch file %p, my file %p).\n", &b, b.GetDirectory()->GetFile(),
723 // fFile);
724 return IOPos{0, 0};
725 }
726
727 // printf("Trying to find a basket for branch %p\n", &b);
728 // Pull in metadata about branch; make sure it is valid
729 Int_t *lbaskets = b.GetBasketBytes();
730 Long64_t *entries = b.GetBasketEntry();
731 if (R__unlikely(!lbaskets || !entries)) {
732 // printf("No baskets or entries.\n");
733 return IOPos{0, 0};
734 }
735 // Int_t blistsize = b.GetListOfBaskets()->GetSize();
736 Int_t blistsize = b.GetWriteBasket();
737 if (R__unlikely(blistsize <= 0)) {
738 // printf("Basket list is size 0.\n");
739 return IOPos{0, 0};
740 }
741
742 // Search for the basket that contains the event of interest. Unlike the primary cache, we
743 // are only interested in a single basket per branch - we don't try to fill the cache.
745 if (basketOffset < 0) { // No entry found.
746 // printf("No entry offset found for entry %ld\n", fTree->GetReadEntry());
747 return IOPos{0, 0};
748 }
749
750 // Check to see if there's already a copy of this basket in memory. If so, don't fetch it
751 if ((basketOffset < blistsize) && b.GetListOfBaskets()->UncheckedAt(basketOffset)) {
752
753 // printf("Basket is already in memory.\n");
754 return IOPos{0, 0};
755 }
756
757 Long64_t pos = b.GetBasketSeek(basketOffset);
759 if (R__unlikely(pos <= 0 || len <= 0)) {
760 /*printf("Basket returned was invalid (basketOffset=%ld, pos=%ld, len=%d).\n", basketOffset, pos, len);
761 for (int idx=0; idx<blistsize; idx++) {
762 printf("Basket entry %d, first event %d, pos %ld\n", idx, entries[idx], b.GetBasketSeek(idx));
763 }*/
764 return IOPos{0, 0};
765 } // Sanity check
766 // Do not cache a basket if it is bigger than the cache size!
768 // printf("Basket size is greater than the cache size.\n");
769 return IOPos{0, 0};
770 }
771
772 return {pos, len};
773}
774
775////////////////////////////////////////////////////////////////////////////////
776/// Given a particular IO description (offset / length) representing a 'miss' of
777/// the TTreeCache's primary cache, calculate all the corresponding IO that
778/// should be performed.
779///
780/// `all` indicates that this function should search the set of _all_ branches
781/// in this TTree. When set to false, we only search through branches that
782/// have previously incurred a miss.
783///
784/// Returns:
785/// - TBranch pointer corresponding to the basket that will be retrieved by
786/// this IO operation.
787/// - If no corresponding branch could be found (or an error occurs), this
788/// returns nullptr.
790{
791 if (R__unlikely((pos < 0) || (len < 0))) {
792 return nullptr;
793 }
794
795 int count = all ? (fTree->GetListOfLeaves())->GetEntriesFast() : fMissCache->fBranches.size();
796 fMissCache->fEntries.reserve(count);
797 fMissCache->fEntries.clear();
798 bool found_request = false;
799 TBranch *resultBranch = nullptr;
801
802 std::vector<std::pair<size_t, Int_t>> basketsInfo;
803 auto perfStats = GetTree()->GetPerfStats();
804
805 // printf("Will search %d branches for basket at %ld.\n", count, pos);
806 for (int i = 0; i < count; i++) {
807 TBranch *b =
808 all ? static_cast<TBranch *>(static_cast<TLeaf *>((fTree->GetListOfLeaves())->UncheckedAt(i))->GetBranch())
809 : fMissCache->fBranches[i];
811 if (iopos.fLen == 0) { // Error indicator
812 continue;
813 }
814 if (iopos.fPos == pos && iopos.fLen == len) {
815 found_request = true;
816 resultBranch = b;
817 // Note that we continue to iterate; fills up the rest of the entries in the cache.
818 }
819 // At this point, we are ready to push back a new offset
820 fMissCache->fEntries.emplace_back(iopos);
821
822 if (R__unlikely(perfStats)) {
823 Int_t blistsize = b->GetWriteBasket();
824 Int_t basketNumber = -1;
825 for (Int_t bn = 0; bn < blistsize; ++bn) {
826 if (iopos.fPos == b->GetBasketSeek(bn)) {
828 break;
829 }
830 }
831 if (basketNumber >= 0)
832 basketsInfo.emplace_back((size_t)i, basketNumber);
833 }
834 }
836 // We have gone through all the branches in this file and the requested basket
837 // doesn't appear to be in any of them. Likely a logic error / bug.
838 fMissCache->fEntries.clear();
839 }
840 if (R__unlikely(perfStats)) {
841 for (auto &info : basketsInfo) {
842 perfStats->SetLoadedMiss(info.first, info.second);
843 }
844 }
845 return resultBranch;
846}
847
848////////////////////////////////////////////////////////////////////////////////
849///
850/// Process a cache miss; (pos, len) isn't in the buffer.
851///
852/// The first time we have a miss, we buffer as many baskets as we can (up to the
853/// maximum size of the TTreeCache) in memory from all branches that are not in
854/// the prefetch list.
855///
856/// Subsequent times, we fetch all the buffers corresponding to branches that
857/// had previously seen misses. If it turns out the (pos, len) isn't in the
858/// list of branches, we treat this as if it was the first miss.
859///
860/// Returns true if we were able to pull the data into the miss cache.
861///
863{
864
865 bool firstMiss = false;
866 if (fFirstMiss == -1) {
868 firstMiss = true;
869 }
871 // The first time this is executed, we try to pull in as much data as we can.
873 if (!b) {
874 if (!firstMiss) {
875 // TODO: this recalculates for *all* branches, throwing away the above work.
876 b = CalculateMissEntries(pos, len, true);
877 }
878 if (!b) {
879 // printf("ProcessMiss: pos %ld does not appear to correspond to a buffer in this file.\n", pos);
880 // We have gone through all the branches in this file and the requested basket
881 // doesn't appear to be in any of them. Likely a logic error / bug.
882 fMissCache->fEntries.clear();
883 return false;
884 }
885 }
886 // TODO: this should be a set.
887 fMissCache->fBranches.push_back(b);
888
889 // OK, sort the entries
890 std::sort(fMissCache->fEntries.begin(), fMissCache->fEntries.end());
891
892 // Now, fetch the buffer.
893 std::vector<Long64_t> positions;
894 positions.reserve(fMissCache->fEntries.size());
895 std::vector<Int_t> lengths;
896 lengths.reserve(fMissCache->fEntries.size());
898 for (auto &mcentry : fMissCache->fEntries) {
899 positions.push_back(mcentry.fIO.fPos);
900 lengths.push_back(mcentry.fIO.fLen);
901 mcentry.fIndex = cumulative;
902 cumulative += mcentry.fIO.fLen;
903 }
904 fMissCache->fData.resize(cumulative);
905 // printf("Reading %lu bytes into miss cache for %lu entries.\n", cumulative, fEntries->size());
906 fNMissReadPref += fMissCache->fEntries.size();
907 fFile->ReadBuffers(&(fMissCache->fData[0]), &(positions[0]), &(lengths[0]), fMissCache->fEntries.size());
909
910 return true;
911}
912
913////////////////////////////////////////////////////////////////////////////////
914/// Given an IO operation (pos, len) that was a cache miss in the primary TTC,
915/// try the operation again with the miss cache.
916///
917/// Returns true if the IO operation was successful and the contents of buf
918/// were populated with the requested data.
919///
920bool TTreeCache::CheckMissCache(char *buf, Long64_t pos, int len)
921{
922
923 if (!fOptimizeMisses) {
924 return false;
925 }
926 if (R__unlikely((pos < 0) || (len < 0))) {
927 return false;
928 }
929
930 // printf("Checking the miss cache for offset=%ld, length=%d\n", pos, len);
931
932 // First, binary search to see if the desired basket is already cached.
934 auto iter = std::lower_bound(fMissCache->fEntries.begin(), fMissCache->fEntries.end(), mcentry);
935
936 if (iter != fMissCache->fEntries.end()) {
937 if (len > iter->fIO.fLen) {
939 return false;
940 }
941 auto offset = iter->fIndex;
942 memcpy(buf, &(fMissCache->fData[offset]), len);
943 // printf("Returning data from pos=%ld in miss cache.\n", offset);
944 ++fNMissReadOk;
945 return true;
946 }
947
948 // printf("Data not in miss cache.\n");
949
950 // Update the cache, looking for this (pos, len).
951 if (!ProcessMiss(pos, len)) {
952 // printf("Unable to pull data into miss cache.\n");
954 return false;
955 }
956
957 // OK, we updated the cache with as much information as possible. Search again for
958 // the entry we want.
959 iter = std::lower_bound(fMissCache->fEntries.begin(), fMissCache->fEntries.end(), mcentry);
960
961 if (iter != fMissCache->fEntries.end()) {
962 auto offset = iter->fIndex;
963 // printf("Expecting data at offset %ld in miss cache.\n", offset);
964 memcpy(buf, &(fMissCache->fData[offset]), len);
965 ++fNMissReadOk;
966 return true;
967 }
968
969 // This must be a logic bug. ProcessMiss should return false if (pos, len)
970 // wasn't put into fEntries.
972 return false;
973}
974
975////////////////////////////////////////////////////////////////////////////////
976/// End of methods for miss cache.
977////////////////////////////////////////////////////////////////////////////////
978
979namespace {
980struct BasketRanges {
981 struct Range {
982 Long64_t fMin; ///< Inclusive minimum
983 Long64_t fMax; ///< Inclusive maximum
984
985 Range() : fMin(-1), fMax(-1) {}
986
987 void UpdateMin(Long64_t min)
988 {
989 if (fMin == -1 || min < fMin)
990 fMin = min;
991 }
992
993 void UpdateMax(Long64_t max)
994 {
995 if (fMax == -1 || fMax < max)
996 fMax = max;
997 }
998
999 bool Contains(Long64_t entry) { return (fMin <= entry && entry <= fMax); }
1000 };
1001
1002 std::vector<Range> fRanges;
1003 std::map<Long64_t,size_t> fMinimums;
1004 std::map<Long64_t,size_t> fMaximums;
1005
1006 BasketRanges(size_t nBranches) { fRanges.resize(nBranches); }
1007
1008 void Update(size_t branchNumber, Long64_t min, Long64_t max)
1009 {
1010 Range &range = fRanges.at(branchNumber);
1011 auto old(range);
1012
1013 range.UpdateMin(min);
1014 range.UpdateMax(max);
1015
1016 if (old.fMax != range.fMax) {
1017 if (old.fMax != -1) {
1018 auto maxIter = fMaximums.find(old.fMax);
1019 if (maxIter != fMaximums.end()) {
1020 if (maxIter->second == 1) {
1021 fMaximums.erase(maxIter);
1022 } else {
1023 --(maxIter->second);
1024 }
1025 }
1026 }
1027 ++(fMaximums[max]);
1028 }
1029 }
1030
1031 void Update(size_t branchNumber, size_t basketNumber, Long64_t *entries, size_t nb, size_t max)
1032 {
1033 Update(branchNumber, entries[basketNumber],
1034 (basketNumber < (nb - 1)) ? (entries[basketNumber + 1] - 1) : max - 1);
1035 }
1036
1037 // Check that fMaximums and fMinimums are properly set
1039 {
1040 Range result;
1041 for (const auto &r : fRanges) {
1042 if (result.fMin == -1 || result.fMin < r.fMin) {
1043 if (r.fMin != -1)
1044 result.fMin = r.fMin;
1045 }
1046 if (result.fMax == -1 || r.fMax < result.fMax) {
1047 if (r.fMax != -1)
1048 result.fMax = r.fMax;
1049 }
1050 }
1051 // if (result.fMax < result.fMin) {
1052 // // No overlapping range.
1053 // }
1054
1056
1057 return (result.fMin == allIncludedRange.fMin && result.fMax == allIncludedRange.fMax);
1058 }
1059
1060 // This returns a Range object where fMin is the maximum of all the minimum entry
1061 // numbers loaded for each branch and fMax is the minimum of all the maximum entry
1062 // numbers loaded for each branch.
1063 // As such it is valid to have fMin > fMax; this is the case where there
1064 // is no overlap between the branches' ranges. For example, for 2 branches
1065 // where one covers entries [50,99] and the other [0,49], we
1066 // will have fMin = max(50,0) = 50 and fMax = min(99,49) = 49.
1068 {
1069 Range result;
1070 if (!fMinimums.empty())
1071 result.fMin = fMinimums.rbegin()->first;
1072 if (!fMaximums.empty())
1073 result.fMax = fMaximums.begin()->first;
1074 return result;
1075 }
1076
1077 // Returns the number of branches with at least one basket registered.
1079 {
1080 UInt_t result = 0;
1081 for (const auto &r : fRanges) {
1082 if (r.fMin != -1 && r.fMax != -1)
1083 ++result;
1084 }
1085 return result;
1086 }
1087
1088 // Returns true if at least one of the branches' ranges contains
1089 // the entry.
1090 bool Contains(Long64_t entry)
1091 {
1092 for (const auto &r : fRanges) {
1093 if (r.fMin != -1 && r.fMax != -1)
1094 if (r.fMin <= entry && entry <= r.fMax)
1095 return true;
1096 }
1097 return false;
1098 }
1099
1100 void Print()
1101 {
1102 for (size_t i = 0; i < fRanges.size(); ++i) {
1103 if (fRanges[i].fMin != -1 || fRanges[i].fMax != -1)
1104 Printf("Range #%zu : %lld to %lld", i, fRanges[i].fMin, fRanges[i].fMax);
1105 }
1106 }
1107};
1108} // Anonymous namespace.
1109
1110////////////////////////////////////////////////////////////////////////////////
1111/// Fill the cache buffer with the branches in the cache.
1112
1114{
1115
1116 if (fNbranches <= 0) return false;
1117 TTree *tree = ((TBranch*)fBranches->UncheckedAt(0))->GetTree();
1118 Long64_t entry = tree->GetReadEntry();
1120
1121 if (entry != -1 && (entry < fEntryMin || fEntryMax < entry))
1122 return false;
1123
1124 if (fEnablePrefetching) { // Prefetching mode
1125 if (fIsLearning) { // Learning mode
1126 if (fEntryNext >= 0 && entry >= fEntryNext) {
1127 // entry is outside the learn range, need to stop the learning
1128 // phase. Doing so may trigger a recursive call to FillBuffer in
1129 // the process of filling both prefetching buffers
1131 fIsManual = false;
1132 }
1133 }
1134 if (fIsLearning) { // Learning mode
1135 // The learning phase should start from the minimum entry in the cache
1136 entry = fEntryMin;
1137 }
1138 if (fFirstTime) {
1139 //try to detect if it is normal or reverse read
1141 }
1142 else {
1143 if (fFirstEntry == entry) return false;
1144 // Set the read direction
1145 if (!fReadDirectionSet) {
1146 if (entry < fFirstEntry) {
1147 fReverseRead = true;
1148 fReadDirectionSet = true;
1149 }
1150 else if (entry > fFirstEntry) {
1151 fReverseRead =false;
1152 fReadDirectionSet = true;
1153 }
1154 }
1155
1156 if (fReverseRead) {
1157 // Reverse reading with prefetching
1158 if (fEntryCurrent >0 && entry < fEntryNext) {
1159 // We can prefetch the next buffer
1160 if (entry >= fEntryCurrent) {
1161 entry = fEntryCurrent - tree->GetAutoFlush() * fFillTimes;
1162 }
1163 if (entry < 0) entry = 0;
1164 }
1165 else if (fEntryCurrent >= 0) {
1166 // We are still reading from the oldest buffer, no need to prefetch a new one
1167 return false;
1168 }
1169 if (entry < 0) return false;
1171 }
1172 else {
1173 // Normal reading with prefetching
1174 if (fEnablePrefetching) {
1177 } else if (entry >= fEntryCurrent) {
1178 if (entry < fEntryNext) {
1179 entry = fEntryNext;
1180 }
1181 }
1182 else {
1183 // We are still reading from the oldest buffer,
1184 // no need to prefetch a new one
1185 return false;
1186 }
1188 }
1189 }
1190 }
1191 }
1192
1193 // Set to true to enable all debug output without having to set gDebug
1194 // Replace this once we have a per module and/or per class debugging level/setting.
1195 static constexpr bool showMore = false;
1196
1197 static const auto PrintAllCacheInfo = [](TObjArray *branches) {
1198 for (Int_t i = 0; i < branches->GetEntries(); i++) {
1199 TBranch *b = (TBranch *)branches->UncheckedAt(i);
1200 b->PrintCacheInfo();
1201 }
1202 };
1203
1204 if (showMore || gDebug > 6)
1205 Info("FillBuffer", "***** Called for entry %lld", entry);
1206
1208 // Check if all the baskets in the cache have already been used and
1209 // thus we can reuse the cache.
1210 bool allUsed = true;
1211 for (Int_t i = 0; i < fNbranches; ++i) {
1213 if (!b->fCacheInfo.AllUsed()) {
1214 allUsed = false;
1215 break;
1216 }
1217 }
1218 if (allUsed) {
1219 fEntryNext = entry;
1220 if (showMore || gDebug > 5)
1221 Info("FillBuffer", "All baskets used already, so refresh the cache early at entry %lld", entry);
1222 }
1223 if (gDebug > 8)
1225 }
1226
1227 // If the entry is in the range we previously prefetched, there is
1228 // no point in retrying. Note that this will also return false
1229 // during the training phase (fEntryNext is then set intentionally to
1230 // the end of the training phase).
1231 if (fEntryCurrent <= entry && entry < fEntryNext) return false;
1232
1233 // Triggered by the user, not the learning phase
1234 if (entry == -1)
1235 entry = 0;
1236
1237 bool resetBranchInfo = false;
1239 // We are moving on to another set of clusters.
1240 resetBranchInfo = true;
1241 if (showMore || gDebug > 6)
1242 Info("FillBuffer", "*** Will reset the branch information about baskets");
1243 } else if (showMore || gDebug > 6) {
1244 Info("FillBuffer", "*** Info we have on the set of baskets");
1246 }
1247
1249 TTree::TClusterIterator clusterIter = tree->GetClusterIterator(entry);
1250
1251 auto entryCurrent = clusterIter();
1252 auto entryNext = clusterIter.GetNextEntry();
1253
1255 // There is no overlap between the cluster we found [entryCurrent, entryNext[
1256 // and the authorized range [fEntryMin, fEntryMax]
1257 // so we have nothing to do
1258 return false;
1259 }
1260
1261 // If there is overlap between the found cluster and the authorized range
1262 // update the cache data members with the information about the current cluster.
1265
1267 if (showMore || gDebug > 6)
1268 Info("FillBuffer", "Looking at cluster spanning from %lld to %lld", fEntryCurrent, fEntryNext);
1269
1271 if (fEntryMax <= 0) fEntryMax = tree->GetEntries();
1273
1274 if ( fEnablePrefetching ) {
1275 if ( entry == fEntryMax ) {
1276 // We are at the end, no need to do anything else
1277 return false;
1278 }
1279 }
1280
1281 if (resetBranchInfo) {
1282 // We earlier thought we were onto the next set of clusters.
1283 if (fCurrentClusterStart != -1 || fNextClusterStart != -1) {
1285 Error("FillBuffer", "Inconsistency: fCurrentClusterStart=%lld fEntryCurrent=%lld fNextClusterStart=%lld "
1286 "but fEntryCurrent should not be in between the two",
1288 }
1289 }
1290
1291 // Start the next cluster set.
1294 }
1295
1296 // Check if the owner has a TEventList set. If yes, we optimize for this
1297 // special case, reading only the baskets containing entries in the
1298 // list.
1299 TEventList *elist = fTree->GetEventList();
1301 if (elist) {
1302 if (fTree->IsA() ==TChain::Class()) {
1303 TChain *chain = (TChain*)fTree;
1304 Int_t t = chain->GetTreeNumber();
1305 chainOffset = chain->GetTreeOffset()[t];
1306 }
1307 }
1308
1309 //clear cache buffer
1311 if (fEnablePrefetching){ //prefetching mode
1312 if (fFirstBuffer) {
1315 }
1316 else {
1319 }
1320 }
1321 else {
1324 }
1325
1326 //store baskets
1327 BasketRanges ranges((showMore || gDebug > 6) ? fNbranches : 0);
1328 BasketRanges reqRanges(fNbranches);
1329 BasketRanges memRanges((showMore || gDebug > 6) ? fNbranches : 0);
1333 Long64_t maxReadEntry = minEntry; // If we are stopped before the end of the 2nd pass, this marker records where we need to start next time.
1335 auto perfStats = GetTree()->GetPerfStats();
1336
1337 struct collectionInfo {
1338 Int_t fClusterStart{-1}; // First basket belonging to the current cluster
1339 Int_t fCurrent{-1}; // Currently visited basket
1340 bool fLoadedOnce{false};
1341
1342 void Rewind() { fCurrent = (fClusterStart >= 0) ? fClusterStart : 0; }
1343 };
1344 std::vector<collectionInfo> cursor(fNbranches);
1345
1346 // Main loop to fill the cache; in each iteration we loop over
1347 // all the cached branches and collect the baskets within the 'current'
1348 // range/cluster. If there is still space in the cache after that, we
1349 // will do another iteration to add one more cluster to the cache.
1350 // i.e. essentially loop over the clusters.
1351 do {
1353 Long64_t lowestMaxEntry = fEntryMax; // The lowest maximum entry in the TTreeCache for each branch for each pass.
1354
1355 bool reachedEnd = false;
1356 bool skippedFirst = false;
1357 bool oncePerBranch = false;
1358 Int_t nDistinctLoad = 0;
1359 bool progress = true;
1360 enum ENarrow {
1361 kFull = 0,
1362 kNarrow = 1
1363 };
1364 enum EPass {
1365 kStart = 1,
1366 kRegular = 2,
1367 kRewind = 3
1368 };
1369
1370 auto CollectBaskets = [this, elist, chainOffset, entry, clusterIterations, resetBranchInfo, perfStats,
1373 &ranges, &memRanges, &reqRanges,
1375 // In the first pass we add one basket per branch around the requested entry,
1376 // then in the second pass we add the other baskets of the cluster.
1377 // This is to support the case where the cache is too small to hold a full cluster.
1378 Int_t nReachedEnd = 0;
1379 Int_t nSkipped = 0;
1381 std::vector<Int_t> potentialVetoes;
1382
1383 if (showMore || gDebug > 7)
1384 Info("CollectBaskets", "Called with pass=%d narrow=%d maxCollectEntry=%lld", pass, narrow, maxCollectEntry);
1385
1386 bool filled = false;
1387 for (Int_t i = 0; i < fNbranches; ++i) {
1389 if (b->GetDirectory()==nullptr || b->TestBit(TBranch::kDoNotProcess))
1390 continue;
1391 if (b->GetDirectory()->GetFile() != fFile)
1392 continue;
1393 potentialVetoes.clear();
1394 if (pass == kStart && !cursor[i].fLoadedOnce && resetBranchInfo) {
1395 // First check if we have any cluster that is currently in the
1396 // cache but was not used and would be reloaded in the next
1397 // cluster.
1398 b->fCacheInfo.GetUnused(potentialVetoes);
1399 if (showMore || gDebug > 7) {
1401 for(auto v : potentialVetoes) {
1402 vetolist += v;
1403 vetolist.Append(' ');
1404 }
1405 if (!potentialVetoes.empty())
1406 Info("FillBuffer", "*** Potential Vetos for branch #%d: %s", i, vetolist.Data());
1407 }
1408 b->fCacheInfo.Reset();
1409 }
1410 Int_t nb = b->GetMaxBaskets();
1411 Int_t *lbaskets = b->GetBasketBytes();
1412 Long64_t *entries = b->GetBasketEntry();
1413 if (!lbaskets || !entries)
1414 continue;
1415 //we have found the branch. We now register all its baskets
1416 // from the requested offset to the basket below fEntryMax
1417 Int_t blistsize = b->GetListOfBaskets()->GetSize();
1418
1419 auto maxOfBasket = [this, nb, entries](int j) {
1420 return ((j < (nb - 1)) ? (entries[j + 1] - 1) : fEntryMax - 1);
1421 };
1422
1423 if (pass == kRewind)
1424 cursor[i].Rewind();
1425 else if (cursor[i].fCurrent == -1) {
1426 auto start = TMath::BinarySearch(b->GetWriteBasket() + 1, entries, minEntry);
1427 cursor[i].fCurrent = (start < 0) ? 0 : start;
1428 }
1429 for (auto &j = cursor[i].fCurrent; j < nb; j++) {
1430 // This basket has already been read, skip it
1431
1432 if (j < blistsize && b->GetListOfBaskets()->UncheckedAt(j)) {
1433
1434 if (showMore || gDebug > 6) {
1435 ranges.Update(i, entries[j], maxOfBasket(j));
1436 memRanges.Update(i, entries[j], maxOfBasket(j));
1437 }
1438 if (entries[j] <= entry && entry <= maxOfBasket(j)) {
1439 b->fCacheInfo.SetIsInCache(j);
1440 b->fCacheInfo.SetUsed(j);
1441 if (narrow) {
1442 // In narrow mode, we would select 'only' this basket,
1443 // so we are done for this round, let's 'consume' this
1444 // basket and go.
1445 ++nReachedEnd;
1446 ++j;
1447 break;
1448 }
1449 }
1450 continue;
1451 }
1452
1453 // Important: do not try to read maxCollectEntry, otherwise we might jump to the next autoflush
1454 if (entries[j] >= maxCollectEntry) {
1455 ++nReachedEnd;
1456 break; // break out of the for each branch loop.
1457 }
1458
1459 Long64_t pos = b->GetBasketSeek(j);
1460 Int_t len = lbaskets[j];
1461 if (pos <= 0 || len <= 0)
1462 continue;
1463 if (len > fBufferSizeMin) {
1464 // Do not cache a basket if it is bigger than the cache size!
1465 if ((showMore || gDebug > 7) &&
1466 (!(entries[j] < minEntry && (j < nb - 1 && entries[j + 1] <= minEntry))))
1467 Info("FillBuffer", "Skipping branch %s basket %d is too large for the cache: %d > %d",
1468 b->GetName(), j, len, fBufferSizeMin);
1469 continue;
1470 }
1471
1472 if (nReadPrefRequest && entries[j] > (reqRanges.AllIncludedRange().fMax + 1)) {
1473 // There is a gap between this basket and the max of the 'lowest' already loaded basket.
1474 // If we are tight on memory, reading this basket may prevent reading the baskets (for the other branches)
1475 // that cover this gap, forcing those baskets to be read uncached (because the cache won't be reloaded
1476 // until we use this basket).
1477 // e.g. we could end up with the cache containing
1478 // b1: [428, 514[ // 'this' basket and we can assume [321 to 428[ is already in memory
1479 // b2: [400, 424[
1480 // and when reading entry 425 we will read b2's basket uncached.
1481
1482 if (showMore || gDebug > 8)
1483 Info("FillBuffer", "Skipping for now due to gap %d/%d with %lld > %lld", i, j, entries[j],
1484 (reqRanges.AllIncludedRange().fMax + 1));
1485 break; // Without consuming the basket.
1486 }
1487
1488 if (entries[j] < minEntry && (j<nb-1 && entries[j+1] <= minEntry))
1489 continue;
1490
1491 // We are within the range
1492 if (cursor[i].fClusterStart == -1)
1493 cursor[i].fClusterStart = j;
1494
1495 if (elist) {
1497 if (j<nb-1)
1498 emax = entries[j + 1] - 1;
1499 if (!elist->ContainsRange(entries[j]+chainOffset,emax+chainOffset))
1500 continue;
1501 }
1502
1503 if (b->fCacheInfo.HasBeenUsed(j) || b->fCacheInfo.IsInCache(j) || b->fCacheInfo.IsVetoed(j)) {
1504 // We already cached and used this basket during this cluster range,
1505 // let's not redo it
1506 if (showMore || gDebug > 7)
1507 Info("FillBuffer", "Skipping basket to avoid redo: %d/%d veto: %d", i, j, b->fCacheInfo.IsVetoed(j));
1508 continue;
1509 }
1510
1511 if (std::find(std::begin(potentialVetoes), std::end(potentialVetoes), j) != std::end(potentialVetoes)) {
1512 // This basket was in the previous cache/cluster and was not used,
1513 // let's not read it again. I.e. we bet that it will continue to not
1514 // be used. At worst it will be used and thus read by itself.
1515 // Usually in this situation the basket is large so the penalty for
1516 // (re)reading it uselessly is high and the penalty to read it by
1517 // itself is 'small' (i.e. size bigger than latency).
1518 b->fCacheInfo.Veto(j);
1519 if (showMore || gDebug > 7)
1520 Info("FillBuffer", "Veto-ing cluster %d [%lld,%lld[ in branch %s #%d", j, entries[j],
1521 maxOfBasket(j) + 1, b->GetName(), i);
1522 continue;
1523 }
1524
1525 if (narrow) {
1526 if ((((entries[j] > entry)) || (j < nb - 1 && entries[j + 1] <= entry))) {
1527 // Keep only the basket that contains the entry
1528 if (j == cursor[i].fClusterStart && entry > entries[j])
1529 ++nSkipped;
1530 if (entries[j] > entry)
1531 break;
1532 else
1533 continue;
1534 }
1535 }
1536
1538 // Humm ... we are going to go over the requested size.
1539 if (clusterIterations > 0 && cursor[i].fLoadedOnce) {
1540 // We already have a full cluster and now we would go over the requested
1541 // size, let's stop caching (and make sure we start next time from the
1542 // end of the previous cluster).
1543 if (showMore || gDebug > 5) {
1544 Info(
1545 "FillBuffer",
1546 "Breaking early because %lld is greater than %d at cluster iteration %d will restart at %lld",
1548 }
1550 filled = true;
1551 break;
1552 } else {
1553 if (pass == kStart || !cursor[i].fLoadedOnce) {
1554 if (((Long64_t)ntotCurrentBuf + len) > 4LL * fBufferSizeMin) {
1555 // Okay, so we have not even made one pass and we already have
1556 // accumulated requests for more than twice the memory size ...
1557 // So stop for now; we will restart at the same point, hoping
1558 // that the basket will still be in memory and not asked for again ..
1560
1561 if (showMore || gDebug > 5) {
1562 Info("FillBuffer", "Breaking early because %lld is greater than 4*%d at cluster iteration "
1563 "%d pass %d will restart at %lld",
1565 }
1566 filled = true;
1567 break;
1568 }
1569 } else {
1570 // We have made one pass through the branches and thus already
1571 // requested one basket per branch, let's stop prefetching
1572 // now.
1573 if (((Long64_t)ntotCurrentBuf + len) > 2LL * fBufferSizeMin) {
1575 if (showMore || gDebug > 5) {
1576 Info("FillBuffer", "Breaking early because %lld is greater than 2*%d at cluster iteration "
1577 "%d pass %d will restart at %lld",
1579 }
1580 filled = true;
1581 break;
1582 }
1583 }
1584 }
1585 }
1586
1588
1589 reqRanges.Update(i, j, entries, nb, fEntryMax);
1590 if (showMore || gDebug > 6)
1591 ranges.Update(i, j, entries, nb, fEntryMax);
1592
1593 b->fCacheInfo.SetIsInCache(j);
1594
1595 if (showMore || gDebug > 6)
1596 Info("FillBuffer", "*** Registering branch %d basket %d %s", i, j, b->GetName());
1597
1598 if (!cursor[i].fLoadedOnce) {
1599 cursor[i].fLoadedOnce = true;
1600 ++nDistinctLoad;
1601 }
1602 if (R__unlikely(perfStats)) {
1603 perfStats->SetLoaded(i, j);
1604 }
1605
1606 // Actually register the basket for loading from the file.
1607 if (fEnablePrefetching){
1608 if (fFirstBuffer) {
1611 }
1612 else {
1615 }
1616 }
1617 else {
1620 }
1621
1622 if ( ( j < (nb-1) ) && entries[j+1] > maxReadEntry ) {
1623 // Info("FillBuffer","maxCollectEntry incremented from %lld to %lld", maxReadEntry, entries[j+1]);
1624 maxReadEntry = entries[j+1];
1625 }
1626 if (ntotCurrentBuf > 4LL * fBufferSizeMin) {
1627 // Humm something wrong happened.
1628 Warning("FillBuffer", "There is more data in this cluster (starting at entry %lld to %lld, "
1629 "current=%lld) than usual ... with %d %.3f%% of the branches we already have "
1630 "%d bytes (instead of %d)",
1631 fEntryCurrent, fEntryNext, entries[j], i, (100.0 * i) / ((float)fNbranches), ntotCurrentBuf,
1633 }
1634 if (pass == kStart) {
1635 // In the first pass, we record one basket per branch and move on to the next branch.
1636 auto high = maxOfBasket(j);
1637 if (high < lowestMaxEntry)
1638 lowestMaxEntry = high;
1639 // 'Consume' the basket (i.e. avoid looking at it during a subsequent pass)
1640 ++j;
1641 break;
1642 } else if ((j + 1) == nb || entries[j + 1] >= maxReadEntry || entries[j + 1] >= lowestMaxEntry) {
1643 // In the other pass, load the baskets until we get to the maximum loaded so far.
1644 auto high = maxOfBasket(j);
1645 if (high < lowestMaxEntry)
1646 lowestMaxEntry = high;
1647 // 'Consume' the basket (i.e. avoid looking at it during a subsequent pass)
1648 ++j;
1649 break;
1650 }
1651 }
1652
1653 if (cursor[i].fCurrent == nb) {
1654 ++nReachedEnd;
1655 }
1656
1657 if (gDebug > 0)
1658 Info("CollectBaskets",
1659 "Entry: %lld, registering baskets branch %s, fEntryNext=%lld, fNseek=%d, ntotCurrentBuf=%d",
1661 }
1663 skippedFirst = (nSkipped > 0);
1666 return filled;
1667 };
1668
1669 // First collect all the baskets containing the requested entry.
1670 bool full = false;
1671
1672 full = CollectBaskets(kStart, kNarrow, fEntryNext);
1673
1674 // Then fill out from all but the 'largest' branch to even out
1675 // the range across branches;
1676 while (!full && !reachedEnd && progress) { // used to be restricted to !oncePerBranch
1677 full = CollectBaskets(kStart, kFull, std::min(maxReadEntry, fEntryNext));
1678 }
1679
1680 resetBranchInfo = false; // Make sure the 2nd cluster iteration does not erase the info.
1681
1682 // Then fill out to the end of the cluster.
1683 if (!full && !fReverseRead) {
1684 do {
1685 full = CollectBaskets(kRegular, kFull, fEntryNext);
1686 } while (!full && !reachedEnd && progress);
1687 }
1688
1689 // Then restart from the start of the cluster.
1690 if (!full && skippedFirst) {
1691 full = CollectBaskets(kRewind, kFull, fEntryNext);
1692 while (!full && !reachedEnd && progress) {
1693 full = CollectBaskets(kRegular, kFull, fEntryNext);
1694 }
1695 }
1696
1698
1699 minEntry = clusterIter.Next();
1700 if (fIsLearning) {
1701 fFillTimes++;
1702 }
1703
1704 // Continue as long as we still make progress (prevNtot < ntotCurrentBuf), the next entry range to be looked at,
1705 // which starts at 'minEntry', is not past the end of the requested range (minEntry < fEntryMax),
1706 // and we guess that we are not going to go over the requested amount of memory by asking for another set
1707 // of entries (fBufferSizeMin > ((Long64_t)ntotCurrentBuf*(clusterIterations+1))/clusterIterations).
1708 // ntotCurrentBuf / clusterIterations is the average size we have accumulated so far at each loop,
1709 // and thus (ntotCurrentBuf / clusterIterations) * (clusterIterations+1) is a good guess at what the next total
1710 // size would be if we ran the loop one more time. ntotCurrentBuf and clusterIterations are Int_t but can sometimes
1711 // be 'large' (i.e. 30Mb * 300 intervals) and can overflow the numerical limit of Int_t (i.e. become
1712 // artificially negative). To avoid this issue we promote ntotCurrentBuf to a long long (64 bits rather than 32 bits).
1716      if (!((fBufferSizeMin > ((Long64_t)ntotCurrentBuf * (clusterIterations + 1)) / clusterIterations) &&
1717            (prevNtot < ntotCurrentBuf) && (minEntry < fEntryMax))) {
1718         if (showMore || gDebug > 6)
1719            Info("FillBuffer", "Breaking because %d <= %lld || (%d >= %d) || %lld >= %lld", fBufferSizeMin,
1720                 ((Long64_t)ntotCurrentBuf * (clusterIterations + 1)) / clusterIterations, prevNtot, ntotCurrentBuf,
1721                 minEntry, fEntryMax);
1722         break;
1723      }
1724
1725 //for the reverse reading case
1726 if (!fIsLearning && fReverseRead) {
1728 break;
1730 break;
1731 }
1732 fEntryNext = clusterIter.GetNextEntry();
1735 } while (true);
1736
1737 if (showMore || gDebug > 6) {
1738 Info("FillBuffer", "Mem ranges");
1739 memRanges.Print();
1740 Info("FillBuffer", "Combined ranges");
1741 ranges.Print();
1742 Info("FillBuffer", "Requested ranges");
1743 reqRanges.Print();
1744      PrintAllCacheInfo(fBranches);
1745   }
1746
1747 if (nReadPrefRequest == 0) {
1748      // Nothing was added to the cache. This usually indicates that the baskets
1749      // containing the requested entry are either already in memory or are too large
1750      // on their own to fit in the cache.
1751 if (showMore || gDebug > 5) {
1752 Info("FillBuffer", "For entry %lld, nothing was added to the cache.", entry);
1753 }
1754 } else if (fEntryNext < firstClusterEnd && !reqRanges.Contains(entry)) {
1755      // Something went very wrong: even though we searched for the baskets
1756      // holding 'entry', we somehow ended up with a range of entries that does
1757      // not contain it. So we must have been unable to find or fit the needed basket.
1758      // Thus, even though we know the corresponding baskets won't be in the cache,
1759      // let's make it official that 'entry' is within the range searched by this TTreeCache.
1760
1761 // Without this, the next read will be flagged as 'out-of-range' and then we start at
1762 // the exact same point as this FillBuffer execution resulting in both the requested
1763 // entry still not being part of the cache **and** the beginning of the cluster being
1764 // read **again**.
1765
1766 if (showMore || gDebug > 5) {
1767 Error("FillBuffer", "Reset the next entry because the currently loaded range does not contains the request "
1768 "entry: %lld. fEntryNext updated from %lld to %lld. %d",
1770 reqRanges.Print();
1771 }
1772
1773      fEntryNext = firstClusterEnd;
1774   } else {
1775 if (showMore || gDebug > 5) {
1776 Info("FillBuffer", "Complete adding %d baskets from %d branches taking in memory %d out of %d",
1778 }
1779 }
1780
1782 if (fEnablePrefetching) {
1783 if (fIsLearning) {
1784         fFirstBuffer = !fFirstBuffer;
1785      }
1786 if (!fIsLearning && fFirstTime){
1787         // First time we add autoFlush entries, after fFillTimes * autoFlush
1788 // only in reverse prefetching mode
1789 fFirstTime = false;
1790 }
1791 }
1792 fIsLearning = false;
1793 return true;
1794}
1795
1796////////////////////////////////////////////////////////////////////////////////
1797/// Return the desired prefill type from the environment or resource variable
1798/// - 0 - No prefill
1799/// - 1 - All branches
1800
1801TTreeCache::EPrefillType TTreeCache::GetConfiguredPrefillType() const
1802{
1803 const char *stcp;
1804 Int_t s = 0;
1805
1806 if (!(stcp = gSystem->Getenv("ROOT_TTREECACHE_PREFILL")) || !*stcp) {
1807 s = gEnv->GetValue("TTreeCache.Prefill", 1);
1808 } else {
1809 s = TString(stcp).Atoi();
1810 }
1811
1812 return static_cast<TTreeCache::EPrefillType>(s);
1813}
1814
1815////////////////////////////////////////////////////////////////////////////////
1816/// Give the total efficiency of the primary cache... defined as the ratio
1817/// of blocks found in the cache vs. the number of blocks prefetched
1818/// ( it could be more than 1 if we read the same block from the cache more
1819/// than once )
1820///
1821/// Note: This should be used at the end of the processing or we will
1822/// get incomplete stats.
1823
1824Double_t TTreeCache::GetEfficiency() const
1825{
1826 if ( !fNReadPref )
1827 return 0;
1828
1829 return ((Double_t)fNReadOk / (Double_t)fNReadPref);
1830}
1831
1832////////////////////////////////////////////////////////////////////////////////
1833/// The total efficiency of the 'miss cache' - defined as the ratio
1834/// of blocks found in the cache versus the number of blocks prefetched
1835
1836Double_t TTreeCache::GetMissEfficiency() const
1837{
1838 if (!fNMissReadPref) {
1839 return 0;
1840 }
1841 return static_cast<double>(fNMissReadOk) / static_cast<double>(fNMissReadPref);
1842}
1843
1844////////////////////////////////////////////////////////////////////////////////
1845/// This will indicate a sort of relative efficiency... a ratio of the
1846/// reads found in the cache to the number of reads so far
1847
1848Double_t TTreeCache::GetEfficiencyRel() const
1849{
1850 if ( !fNReadOk && !fNReadMiss )
1851 return 0;
1852
1853 return ((Double_t)fNReadOk / (Double_t)(fNReadOk + fNReadMiss));
1854}
1855
1856////////////////////////////////////////////////////////////////////////////////
1857/// Relative efficiency of the 'miss cache' - ratio of the reads found in cache
1858/// to the number of reads so far.
1859
1860Double_t TTreeCache::GetMissEfficiencyRel() const
1861{
1862 if (!fNMissReadOk && !fNMissReadMiss) {
1863 return 0;
1864 }
1865
1866 return static_cast<double>(fNMissReadOk) / static_cast<double>(fNMissReadOk + fNMissReadMiss);
1867}
1868
1869////////////////////////////////////////////////////////////////////////////////
1870/// Static function returning the number of entries used to train the cache
1871/// see SetLearnEntries
1872
1873Int_t TTreeCache::GetLearnEntries()
1874{
1875   return fgLearnEntries;
1876}
1877
1878////////////////////////////////////////////////////////////////////////////////
1879/// Print cache statistics. Like:
1880///
1881/// ~~~ {.cpp}
1882/// ******TreeCache statistics for file: cms2.root ******
1883/// Number of branches in the cache ...: 1093
1884/// Cache Efficiency ..................: 0.997372
1885/// Cache Efficiency Rel...............: 1.000000
1886/// Learn entries......................: 100
1887/// Reading............................: 72761843 bytes in 7 transactions
1888/// Readahead..........................: 256000 bytes with overhead = 0 bytes
1889/// Average transaction................: 10394.549000 Kbytes
1890/// Number of blocks in current cache..: 210, total size: 6280352
1891/// ~~~
1892///
1893/// - if option = "a" the list of blocks in the cache is printed
1894/// see also class TTreePerfStats.
1895/// - if option contains 'cachedbranches', the list of branches being
1896/// cached is printed.
1897
1898void TTreeCache::Print(Option_t *option) const
1899{
1900 TString opt = option;
1901 opt.ToLower();
1902 printf("******TreeCache statistics for tree: %s in file: %s ******\n",fTree ? fTree->GetName() : "no tree set",fFile ? fFile->GetName() : "no file set");
1903 if (fNbranches <= 0) return;
1904 printf("Number of branches in the cache ...: %d\n",fNbranches);
1905 printf("Cache Efficiency ..................: %f\n",GetEfficiency());
1906 printf("Cache Efficiency Rel...............: %f\n",GetEfficiencyRel());
1907 printf("Secondary Efficiency ..............: %f\n", GetMissEfficiency());
1908 printf("Secondary Efficiency Rel ..........: %f\n", GetMissEfficiencyRel());
1909 printf("Learn entries......................: %d\n",TTreeCache::GetLearnEntries());
1910 if ( opt.Contains("cachedbranches") ) {
1911 opt.ReplaceAll("cachedbranches","");
1912 printf("Cached branches....................:\n");
1913      const TObjArray *cachedBranches = this->GetCachedBranches();
1914      Int_t nbranches = cachedBranches->GetEntriesFast();
1915 for (Int_t i = 0; i < nbranches; ++i) {
1916 TBranch* branch = (TBranch*) cachedBranches->UncheckedAt(i);
1917 printf("Branch name........................: %s\n",branch->GetName());
1918 }
1919 }
1920   TFileCacheRead::Print(opt);
1921}
1922
1923////////////////////////////////////////////////////////////////////////////////
1924/// Old method ReadBuffer before the addition of the prefetch mechanism.
1925
1926Int_t TTreeCache::ReadBufferNormal(char *buf, Long64_t pos, Int_t len) {
1927   //Is request already in the cache?
1928 if (TFileCacheRead::ReadBuffer(buf,pos,len) == 1){
1929 fNReadOk++;
1930 return 1;
1931 }
1932
1933 static const auto recordMiss = [](TVirtualPerfStats *perfStats, TObjArray *branches, bool bufferFilled,
1934                                     Long64_t basketpos) {
1935      if (gDebug > 6)
1936 ::Info("TTreeCache::ReadBufferNormal", "Cache miss after an %s FillBuffer: pos=%lld",
1937 bufferFilled ? "active" : "inactive", basketpos);
1938 for (Int_t i = 0; i < branches->GetEntries(); ++i) {
1939 TBranch *b = (TBranch *)branches->UncheckedAt(i);
1940 Int_t blistsize = b->GetListOfBaskets()->GetSize();
1941 for (Int_t j = 0; j < blistsize; ++j) {
1942 if (basketpos == b->GetBasketSeek(j)) {
1943 if (gDebug > 6)
1944 ::Info("TTreeCache::ReadBufferNormal", " Missing basket: %d for %s", j, b->GetName());
1945 perfStats->SetMissed(i, j);
1946 }
1947 }
1948 }
1949 };
1950
1951 //not found in cache. Do we need to fill the cache?
1952 bool bufferFilled = FillBuffer();
1953 if (bufferFilled) {
1954 Int_t res = TFileCacheRead::ReadBuffer(buf,pos,len);
1955
1956 if (res == 1)
1957 fNReadOk++;
1958 else if (res == 0) {
1959 fNReadMiss++;
1960 auto perfStats = GetTree()->GetPerfStats();
1961 if (perfStats)
1962         recordMiss(perfStats, fBranches, bufferFilled, pos);
1963      }
1964
1965 return res;
1966 }
1967
1968 if (CheckMissCache(buf, pos, len)) {
1969 return 1;
1970 }
1971
1972 fNReadMiss++;
1973 auto perfStats = GetTree()->GetPerfStats();
1974 if (perfStats)
1975      recordMiss(perfStats, fBranches, bufferFilled, pos);
1976
1977 return 0;
1978}
1979
1980////////////////////////////////////////////////////////////////////////////////
1981/// Used to read a chunk from a block previously fetched. It will call FillBuffer
1982/// even if the cache lookup succeeds, because it will try to prefetch the next block
1983/// as soon as we start reading from the current block.
1984
1985Int_t TTreeCache::ReadBufferPrefetch(char *buf, Long64_t pos, Int_t len)
1986{
1987 if (TFileCacheRead::ReadBuffer(buf, pos, len) == 1){
1988 //call FillBuffer to prefetch next block if necessary
1989 //(if we are currently reading from the last block available)
1990 FillBuffer();
1991 fNReadOk++;
1992 return 1;
1993 }
1994
1995 //keep on prefetching until request is satisfied
1996 // try to prefetch a couple of times and if request is still not satisfied then
1997 // fall back to normal reading without prefetching for the current request
1998 Int_t counter = 0;
1999 while (true) {
2000 if(TFileCacheRead::ReadBuffer(buf, pos, len)) {
2001 break;
2002 }
2003 FillBuffer();
2004 fNReadMiss++;
2005 counter++;
2006 if (counter>1) {
2007 return 0;
2008 }
2009 }
2010
2011 fNReadOk++;
2012 return 1;
2013}
2014
2015////////////////////////////////////////////////////////////////////////////////
2016/// Read buffer at position pos if the request is in the list of
2017/// prefetched blocks read from fBuffer.
2018/// Otherwise try to fill the cache from the list of selected branches,
2019/// and recheck if pos is now in the list.
2020/// Returns:
2021/// - -1 in case of read failure,
2022/// - 0 in case not in cache,
2023/// - 1 in case read from cache.
2024/// This function overloads TFileCacheRead::ReadBuffer.
2025
2026Int_t TTreeCache::ReadBuffer(char *buf, Long64_t pos, Int_t len)
2027{
2028 if (!fEnabled) return 0;
2029
2030   if (fEnablePrefetching)
2031      return TTreeCache::ReadBufferPrefetch(buf, pos, len);
2032 else
2033 return TTreeCache::ReadBufferNormal(buf, pos, len);
2034}
2035
2036////////////////////////////////////////////////////////////////////////////////
2037/// This will simply clear the cache
2038
2039void TTreeCache::ResetCache()
2040{
2041 for (Int_t i = 0; i < fNbranches; ++i) {
2042      TBranch *b = (TBranch *)fBranches->UncheckedAt(i);
2043      if (b->GetDirectory()==nullptr || b->TestBit(TBranch::kDoNotProcess))
2044 continue;
2045 if (b->GetDirectory()->GetFile() != fFile)
2046 continue;
2047 b->fCacheInfo.Reset();
2048 }
2049 fEntryCurrent = -1;
2050 fEntryNext = -1;
2051   fCurrentClusterStart = -1;
2052   fNextClusterStart = -1;
2053
2054   TFileCacheRead::Prefetch(0, 0);
2055
2056 if (fEnablePrefetching) {
2057 fFirstTime = true;
2058      TFileCacheRead::SecondPrefetch(0, 0);
2059   }
2060}
2061
2062////////////////////////////////////////////////////////////////////////////////
2063/// Change the underlying buffer size of the cache.
2064/// If the change of size means some cache content is lost, or if the buffer
2065/// is now larger, set up for a cache refill the next time there is a read.
2066/// Buffersize might be clamped, see TFileCacheRead::SetBufferSize.
2067/// Returns:
2068/// - 0 if the buffer content is still available
2069/// - 1 if some or all of the buffer content has been made unavailable
2070/// - -1 on error
2071
2072Int_t TTreeCache::SetBufferSize(Long64_t buffersize)
2073{
2074   Int_t prevsize = GetBufferSize();
2075   Int_t res = TFileCacheRead::SetBufferSize(buffersize);
2076 if (res < 0) {
2077 return res;
2078 }
2079
2080 if (res == 0 && buffersize <= prevsize) {
2081 return res;
2082 }
2083
2084 // if content was removed from the buffer, or the buffer was enlarged then
2085 // empty the prefetch lists and prime to fill the cache again
2086
2087   TFileCacheRead::Prefetch(0, 0);
2088   if (fEnablePrefetching) {
2089      TFileCacheRead::SecondPrefetch(0, 0);
2090   }
2091
2092 fEntryCurrent = -1;
2093 if (!fIsLearning) {
2094 fEntryNext = -1;
2095 }
2096
2097 return 1;
2098}
2099
2100////////////////////////////////////////////////////////////////////////////////
2101/// Set the minimum and maximum entry number to be processed
2102/// this information helps to optimize the number of baskets to read
2103/// when prefetching the branch buffers.
2104
2105void TTreeCache::SetEntryRange(Long64_t emin, Long64_t emax)
2106{
2107 // This is called by TTreePlayer::Process in an automatic way...
2108 // don't restart it if the user has specified the branches.
2109   bool needLearningStart = (fEntryMin != emin) && fIsLearning && !fIsManual;
2110
2111 fEntryMin = emin;
2112 fEntryMax = emax;
2114 if (gDebug > 0)
2115 Info("SetEntryRange", "fEntryMin=%lld, fEntryMax=%lld, fEntryNext=%lld",
2116           fEntryMin, fEntryMax, fEntryNext);
2117
2118 if (needLearningStart) {
2119 // Restart learning
2120      StartLearningPhase();
2121   }
2122}
2123
2124////////////////////////////////////////////////////////////////////////////////
2125/// Change the file that is being cached.
2126
2127void TTreeCache::SetFile(TFile *file, TFile::ECacheAction action)
2128{
2129 // The infinite recursion is 'broken' by the fact that
2130   // TFile::SetCacheRead removes the entry from fCacheReadMap _before_
2131   // calling SetFile (and also by setting fFile to zero before the call).
2132 if (fFile) {
2133 TFile *prevFile = fFile;
2134 fFile = nullptr;
2135 prevFile->SetCacheRead(nullptr, fTree, action);
2136 }
2137   TFileCacheRead::SetFile(file, action);
2138}
2139
2140////////////////////////////////////////////////////////////////////////////////
2141/// Static function to set the number of entries to be used in learning mode
2142/// The default value for n is 10. n must be >= 1
2143
2144void TTreeCache::SetLearnEntries(Int_t n)
2145{
2146 if (n < 1) n = 1;
2147 fgLearnEntries = n;
2148}
2149
2150////////////////////////////////////////////////////////////////////////////////
2151/// Set whether the learning period is started with a prefilling of the
2152/// cache and which type of prefilling is used.
2153/// The two values currently supported are:
2154/// - TTreeCache::kNoPrefill disable the prefilling
2155/// - TTreeCache::kAllBranches fill the cache with baskets from all branches.
2156/// The default prefilling behavior can be controlled by setting
2157/// TTreeCache.Prefill or the environment variable ROOT_TTREECACHE_PREFILL.
2158
2159void TTreeCache::SetLearnPrefill(TTreeCache::EPrefillType type)
2160{
2161   fPrefillType = type;
2162}
2163
2164////////////////////////////////////////////////////////////////////////////////
2165/// The name should be enough to explain the method.
2166/// The only additional comment is that the cache is cleaned before
2167/// the new learning phase.
2168
2169void TTreeCache::StartLearningPhase()
2170{
2171 fIsLearning = true;
2172 fIsManual = false;
2173 fNbranches = 0;
2174 if (fBrNames) fBrNames->Delete();
2175 fIsTransferred = false;
2176 fEntryCurrent = -1;
2177}
2178
2179////////////////////////////////////////////////////////////////////////////////
2180/// This is the counterpart of StartLearningPhase() and can be used to stop
2181/// the learning phase. It's useful when the user knows exactly what branches
2182/// they are going to use.
2183/// For the moment it's just a call to FillBuffer() since that method
2184/// will create the buffer lists from the specified branches.
2185
2186void TTreeCache::StopLearningPhase()
2187{
2188 if (fIsLearning) {
2189 // This will force FillBuffer to read the buffers.
2190 fEntryNext = -1;
2191 fIsLearning = false;
2192 }
2193 fIsManual = true;
2194
2195 auto perfStats = GetTree()->GetPerfStats();
2196 if (perfStats)
2197 perfStats->UpdateBranchIndices(fBranches);
2198
2199 //fill the buffers only once during learning
2200 if (fEnablePrefetching && !fOneTime) {
2201 fIsLearning = true;
2202 FillBuffer();
2203 fOneTime = true;
2204 }
2205}
2206
2207////////////////////////////////////////////////////////////////////////////////
2208/// Update pointer to current Tree and recompute pointers to the branches in the cache.
2209
2210void TTreeCache::UpdateBranches(TTree *tree)
2211{
2212
2213 fTree = tree;
2214
2215 fEntryMin = 0;
2216   fEntryMax = fTree->GetEntries();
2217
2218 fEntryCurrent = -1;
2219
2220 if (fBrNames->GetEntries() == 0 && fIsLearning) {
2221 // We still need to learn.
2222      fEntryNext = fEntryMin + fgLearnEntries;
2223   } else {
2224 // We learnt from a previous file.
2225 fIsLearning = false;
2226 fEntryNext = -1;
2227 }
2228 fNbranches = 0;
2229
2230 TIter next(fBrNames);
2231 TObjString *os;
2232 while ((os = (TObjString*)next())) {
2233 TBranch *b = fTree->GetBranch(os->GetName());
2234 if (!b) {
2235 continue;
2236 }
2237      fBranches->AddAt(b, fNbranches);
2238      fNbranches++;
2239 }
2240
2241 auto perfStats = GetTree()->GetPerfStats();
2242 if (perfStats)
2243 perfStats->UpdateBranchIndices(fBranches);
2244}
2245
2246////////////////////////////////////////////////////////////////////////////////
2247/// Perform an initial prefetch, attempting to read as much of the learning
2248/// phase baskets for all branches at once
2249
2250void TTreeCache::LearnPrefill()
2251{
2252 // This is meant for the learning phase
2253 if (!fIsLearning) return;
2254
2255 // This should be called before reading entries, otherwise we'll
2256 // always exit here, since TBranch adds itself before reading
2257 if (fNbranches > 0) return;
2258
2259 // Is the LearnPrefill enabled (using an Int_t here to allow for future
2260 // extension to alternative Prefilling).
2261 if (fPrefillType == kNoPrefill) return;
2262
2263   Long64_t entry = fTree->GetReadEntry();
2264
2265 // Return early if we are out of the requested range.
2266   if (entry < fEntryMin || entry > fEntryMax) return;
2267
2268 fLearnPrefilling = true;
2269
2270
2271 // Force only the learn entries to be cached by temporarily setting min/max
2272 // to the learning phase entry range
2273 // But save all the old values, so we can restore everything to how it was
2274   Long64_t eminOld = fEntryMin;
2275   Long64_t emaxOld = fEntryMax;
2276   Long64_t ecurrentOld = fEntryCurrent;
2277   Long64_t enextOld = fEntryNext;
2278   auto currentClusterStartOld = fCurrentClusterStart;
2279   auto nextClusterStartOld = fNextClusterStart;
2280
2281 fEntryMin = std::max(fEntryMin, fEntryCurrent);
2282 fEntryMax = std::min(fEntryMax, fEntryNext);
2283
2284 // We check earlier that we are within the authorized range, but
2285 // we might still be out of the (default) learning range and since
2286 // this is called before any branch is added to the cache, this means
2287 // that the user's first GetEntry is this one which is outside of the
2288 // learning range ... so let's do something sensible-ish.
2289 // Note: we probably should also fix the learning range but we may
2290 // or may not have enough information to know if we can move it
2291 // (for example fEntryMin (eminOld right now) might be the default or user provided)
2292 if (entry < fEntryMin) fEntryMin = entry;
2293 if (entry > fEntryMax) fEntryMax = entry;
2294
2295 // Add all branches to be cached. This also sets fIsManual, stops learning,
2296 // and makes fEntryNext = -1 (which forces a cache fill, which is good)
2297 AddBranch("*");
2298 fIsManual = false; // AddBranch sets fIsManual, so we reset it
2299
2300 // Now, fill the buffer with the learning phase entry range
2301 FillBuffer();
2302
2303 // Leave everything the way we found it
2304 fIsLearning = true;
2305 DropBranch("*"); // This doesn't work unless we're already learning
2306
2307 // Restore entry values
2308   fEntryMin = eminOld;
2309   fEntryMax = emaxOld;
2310   fEntryCurrent = ecurrentOld;
2311   fEntryNext = enextOld;
2312   fCurrentClusterStart = currentClusterStartOld;
2313   fNextClusterStart = nextClusterStartOld;
2314
2315 fLearnPrefilling = false;
2316}
#define R__unlikely(expr)
Definition RConfig.hxx:594
int Int_t
Signed integer 4 bytes (int)
Definition RtypesCore.h:59
unsigned int UInt_t
Unsigned integer 4 bytes (unsigned int)
Definition RtypesCore.h:60
constexpr Ssiz_t kNPOS
The equivalent of std::string::npos for the ROOT class TString.
Definition RtypesCore.h:131
long long Long64_t
Portable signed long integer 8 bytes.
Definition RtypesCore.h:83
const char Option_t
Option string (const char)
Definition RtypesCore.h:80
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
R__EXTERN TEnv * gEnv
Definition TEnv.h:170
Int_t gDebug
Global variable setting the debug level. Set to 0 to disable, increase it in steps of 1 to increase t...
Definition TROOT.cxx:627
void Printf(const char *fmt,...)
Formats a string in a circular formatting buffer and prints the string.
Definition TString.cxx:2509
R__EXTERN TSystem * gSystem
Definition TSystem.h:572
A Branch for the case of an object.
A TTree is a list of TBranches.
Definition TBranch.h:93
@ kDoNotProcess
Active bit for branches.
Definition TBranch.h:105
A chain is a collection of files containing TTree objects.
Definition TChain.h:33
static TClass * Class()
virtual Int_t GetEntries() const
virtual Int_t GetValue(const char *name, Int_t dflt) const
Returns the integer value for a resource.
Definition TEnv.cxx:490
Legacy Code: TEventList is a legacy interface: there will be no bug fi...
Definition TEventList.h:31
virtual bool ContainsRange(Long64_t entrymin, Long64_t entrymax)
Return TRUE if list contains entries from entrymin to entrymax included.
A cache when reading files over the network.
virtual Int_t ReadBuffer(char *buf, Long64_t pos, Int_t len)
Read buffer at position pos.
virtual void SecondPrefetch(Long64_t, Int_t)
Bool_t fEnablePrefetching
reading by prefetching asynchronously
Int_t fNtot
Total size of prefetched blocks.
virtual Int_t SetBufferSize(Long64_t buffersize)
Sets the buffer size.
virtual void Prefetch(Long64_t pos, Int_t len)
Add block of length len at position pos in the list of blocks to be prefetched.
Int_t fBufferSizeMin
Original size of fBuffer.
Bool_t fIsTransferred
True when fBuffer contains something valid.
TFile * fFile
Pointer to file.
Int_t fNseek
Number of blocks to be prefetched.
void Print(Option_t *option="") const override
Print cache statistics.
virtual Int_t GetBufferSize() const
virtual void SetFile(TFile *file, TFile::ECacheAction action=TFile::kDisconnect)
Set the file using this cache and reset the current blocks (if any).
A ROOT file is an on-disk file, usually with extension .root, that stores objects in a file-system-li...
Definition TFile.h:131
virtual void SetCacheRead(TFileCacheRead *cache, TObject *tree=nullptr, ECacheAction action=kDisconnect)
Set a pointer to the read cache.
Definition TFile.cxx:2395
virtual Bool_t ReadBuffers(char *buf, Long64_t *pos, Int_t *len, Int_t nbuf)
Read the nbuf blocks described in arrays pos and len.
Definition TFile.cxx:1851
ECacheAction
TTreeCache flushing semantics.
Definition TFile.h:149
A TFriendElement TF describes a TTree object TF in a file.
A TLeaf describes individual elements of a TBranch See TBranch structure in TTree.
Definition TLeaf.h:57
A doubly linked list.
Definition TList.h:38
TObject * FindObject(const char *name) const override
Find an object in this list using its name.
Definition TList.cxx:575
void Add(TObject *obj) override
Definition TList.h:81
TObject * Remove(TObject *obj) override
Remove object from the list.
Definition TList.cxx:819
void Delete(Option_t *option="") override
Remove all objects from the list AND delete all heap based objects.
Definition TList.cxx:467
const char * GetName() const override
Returns name of object.
Definition TNamed.h:49
An array of TObjects.
Definition TObjArray.h:31
void AddAt(TObject *obj, Int_t idx) override
Add object at position idx.
virtual void AddAtAndExpand(TObject *obj, Int_t idx)
Add object at position idx.
TObject * UncheckedAt(Int_t i) const
Definition TObjArray.h:84
TObject * Remove(TObject *obj) override
Remove object from array.
Collectable string class.
Definition TObjString.h:28
virtual void Warning(const char *method, const char *msgfmt,...) const
Issue warning message.
Definition TObject.cxx:1057
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition TObject.cxx:1071
virtual void Info(const char *method, const char *msgfmt,...) const
Issue info message.
Definition TObject.cxx:1045
Regular expression class.
Definition TRegexp.h:31
Basic string class.
Definition TString.h:138
Ssiz_t Length() const
Definition TString.h:425
void ToLower()
Change string to lower-case.
Definition TString.cxx:1189
Int_t Atoi() const
Return integer value of string.
Definition TString.cxx:1994
const char * Data() const
Definition TString.h:384
TString & ReplaceAll(const TString &s1, const TString &s2)
Definition TString.h:712
TString & Append(const char *cs)
Definition TString.h:580
Bool_t Contains(const char *pat, ECaseCompare cmp=kExact) const
Definition TString.h:640
Ssiz_t Index(const char *pat, Ssiz_t i=0, ECaseCompare cmp=kExact) const
Definition TString.h:659
virtual const char * Getenv(const char *env)
Get environment variable.
Definition TSystem.cxx:1676
virtual void SetLearnPrefill(EPrefillType type=kNoPrefill)
Set whether the learning period is started with a prefilling of the cache and which type of prefillin...
virtual void UpdateBranches(TTree *tree)
Update pointer to current Tree and recompute pointers to the branches in the cache.
Double_t GetEfficiencyRel() const
This will indicate a sort of relative efficiency... a ratio of the reads found in the cache to the nu...
bool fEnabled
! cache enabled for cached reading
Definition TTreeCache.h:63
Long64_t fLastMiss
! set to the event # of the last miss.
Definition TTreeCache.h:74
Int_t fNMissReadPref
Number of blocks read into the secondary ("miss") cache.
Definition TTreeCache.h:50
~TTreeCache() override
Destructor. (in general called by the TFile destructor)
Int_t fNMissReadOk
Number of blocks read, not found in the primary cache, and found in the secondary cache.
Definition TTreeCache.h:46
Int_t SetBufferSize(Long64_t buffersize) override
Change the underlying buffer size of the cache.
Long64_t fEntryMin
! first entry in the cache
Definition TTreeCache.h:38
void SetOptimizeMisses(bool opt)
Start of methods for the miss cache.
Int_t LearnBranch(TBranch *b, bool subgbranches=false) override
Add a branch discovered by actual usage to the list of branches to be stored in the cache this functi...
static void SetLearnEntries(Int_t n=10)
Static function to set the number of entries to be used in learning mode The default value for n is 1...
bool CheckMissCache(char *buf, Long64_t pos, int len)
Check the miss cache for a particular buffer, fetching if deemed necessary.
Long64_t fEntryNext
! next entry number where cache must be filled
Definition TTreeCache.h:41
Int_t fNReadMiss
Number of blocks read and not found in the cache.
Definition TTreeCache.h:47
bool fReverseRead
! reading in reverse mode
Definition TTreeCache.h:58
Double_t GetEfficiency() const
Give the total efficiency of the primary cache... defined as the ratio of blocks found in the cache v...
virtual bool FillBuffer()
Fill the cache buffer with the branches in the cache.
TTree * GetTree() const
Definition TTreeCache.h:149
Long64_t fNextClusterStart
! End+1 of the cluster(s) where the current content was picked out
Definition TTreeCache.h:43
bool ProcessMiss(Long64_t pos, int len)
! Given a file read not in the miss cache, handle (possibly) loading the data.
const TObjArray * GetCachedBranches() const
Definition TTreeCache.h:139
bool fIsLearning
! true if cache is in learning mode
Definition TTreeCache.h:54
Int_t fNMissReadMiss
Number of blocks read and not found in either cache.
Definition TTreeCache.h:48
IOPos FindBranchBasketPos(TBranch &, Long64_t entry)
Given a branch and an entry, determine the file location (offset / size) of the corresponding basket.
virtual void SetEntryRange(Long64_t emin, Long64_t emax)
Set the minimum and maximum entry number to be processed this information helps to optimize the numbe...
TTreeCache()
Default Constructor.
std::unique_ptr< MissCache > fMissCache
! Cache contents for misses
Definition TTreeCache.h:105
bool fFirstBuffer
! true if first buffer is used for prefetching
Definition TTreeCache.h:56
bool fOptimizeMisses
! true if we should optimize cache misses.
Definition TTreeCache.h:72
void StartLearningPhase()
The name should be enough to explain the method.
TBranch * CalculateMissEntries(Long64_t, int, bool)
Given an file read, try to determine the corresponding branch.
Long64_t fCurrentClusterStart
! Start of the cluster(s) where the current content was picked out
Definition TTreeCache.h:42
Double_t GetMissEfficiency() const
The total efficiency of the 'miss cache' - defined as the ratio of blocks found in the cache versus t...
virtual Int_t ReadBufferNormal(char *buf, Long64_t pos, Int_t len)
Old method ReadBuffer before the addition of the prefetch mechanism.
virtual void ResetCache()
This will simply clear the cache.
bool fOneTime
! used in the learning phase
Definition TTreeCache.h:57
EPrefillType fPrefillType
Whether a pre-filling is enabled (and if applicable which type)
Definition TTreeCache.h:64
bool fLearnPrefilling
! true if we are in the process of executing LearnPrefill
Definition TTreeCache.h:68
virtual void LearnPrefill()
Perform an initial prefetch, attempting to read as much of the learning phase baskets for all branche...
virtual Int_t DropBranch(TBranch *b, bool subbranches=false)
Remove a branch from the list of branches to be stored in the cache; this function is called by TBranch:...
Long64_t fEntryMax
! last entry in the cache
Definition TTreeCache.h:39
static Int_t fgLearnEntries
number of entries used for learning mode
Definition TTreeCache.h:65
virtual Int_t ReadBufferPrefetch(char *buf, Long64_t pos, Int_t len)
Used to read a chunk from a block previously fetched.
Long64_t fEntryCurrent
! current lowest entry number in the cache
Definition TTreeCache.h:40
void ResetMissCache()
Reset all the miss cache training.
Long64_t fFirstMiss
! set to the event # of the first miss.
Definition TTreeCache.h:73
Int_t ReadBuffer(char *buf, Long64_t pos, Int_t len) override
Read buffer at position pos if the request is in the list of prefetched blocks read from fBuffer.
bool fReadDirectionSet
! read direction established
Definition TTreeCache.h:62
bool fIsManual
! true if StopLearningPhase was used
Definition TTreeCache.h:55
Double_t GetMissEfficiencyRel() const
Relative efficiency of the 'miss cache' - ratio of the reads found in cache to the number of reads so...
Long64_t fFirstEntry
! save the value of the first entry
Definition TTreeCache.h:61
Int_t fFillTimes
! how many times we can fill the current buffer
Definition TTreeCache.h:59
Int_t fNReadPref
Number of blocks that were prefetched.
Definition TTreeCache.h:49
TTree * fTree
! pointer to the current Tree
Definition TTreeCache.h:53
EPrefillType GetConfiguredPrefillType() const
Return the desired prefill type from the environment or resource variable.
static Int_t GetLearnEntries()
Static function returning the number of entries used to train the cache see SetLearnEntries.
virtual void StopLearningPhase()
This is the counterpart of StartLearningPhase() and can be used to stop the learning phase.
void SetFile(TFile *file, TFile::ECacheAction action=TFile::kDisconnect) override
Change the file that is being cached.
Int_t fNbranches
! Number of branches in the cache
Definition TTreeCache.h:44
Int_t fNReadOk
Number of blocks read and found in the cache.
Definition TTreeCache.h:45
void Print(Option_t *option="") const override
Print cache statistics.
Int_t AddBranch(TBranch *b, bool subgbranches=false) override
Add a branch to the list of branches to be stored in the cache this function is called by the user vi...
TObjArray * fBranches
! List of branches to be stored in the cache
Definition TTreeCache.h:51
TList * fBrNames
! list of branch names in the cache
Definition TTreeCache.h:52
bool fFirstTime
! save the fact that we processes the first entry
Definition TTreeCache.h:60
Helper class to iterate over cluster of baskets.
Definition TTree.h:306
A TTree represents a columnar dataset.
Definition TTree.h:89
virtual TVirtualPerfStats * GetPerfStats() const
Definition TTree.h:585
virtual TBranch * GetBranch(const char *name)
Return pointer to the branch with the given name in this tree or its friends.
Definition TTree.cxx:5430
virtual TObjArray * GetListOfLeaves()
Definition TTree.h:568
virtual Long64_t GetEntries() const
Definition TTree.h:502
virtual Long64_t GetReadEntry() const
Definition TTree.h:588
virtual TTree * GetTree() const
Definition TTree.h:596
TEventList * GetEventList() const
Definition TTree.h:552
TClass * IsA() const override
Definition TTree.h:744
virtual TList * GetListOfFriends() const
Definition TTree.h:569
Provides the interface for an internal performance measurement and event tracing.
Long64_t BinarySearch(Long64_t n, const T *array, T value)
Binary search in an array of n values to locate value.
Definition TMathBase.h:348