Trees
Introducing TTree
As introduced in → Storing columnar data in a ROOT file and reading it back,
ROOT can handle large columnar datasets.
In the aforementioned section, we made use of RDataFrame to write and
read back a simple dataset.
RDataFrame traditionally relies on TTree
for columnar data storage, used for example
by all LHC (Large Hadron Collider) experiments.
Trees are optimized for reduced disk space and for selective, high-throughput columnar access with reduced memory usage.
In addition to the documentation in this manual, we recommend taking a look at the TTree tutorials:
→ Tree tutorials
RNTuple
RNTuple is the experimental evolution of TTree columnar data storage. RNTuple introduces robust interfaces, a high-performance storage layout, and asynchronous, thread-safe scheduling.
RDataFrame
To access TTree data, please use RDataFrame.
TTree provides interfaces for low-level, expert usage.
The tree and its data
A TTree behaves like an array of a data structure that resides on storage - except for one entry (or row, in database language).
That entry is accessible in memory: you can load any tree entry, ideally sequentially.
You can provide your own storage for the values of the columns of the current entry, in the form of variables.
In this case you have to tell the TTree about the addresses of these variables, either by calling TTree::SetBranchAddress() or by passing the variable when creating the branch for writing.
When “filling” (writing) the TTree, it will read the values out of these variables;
when reading back a TTree entry, it will write the values it read from storage into your variables.
Branches and leaves
A tree consists of a list of independent columns, called branches. A branch can contain values of any fundamental type, C++ objects known to ROOT’s type system, or collections of those. When reading a tree, you can select which subset of branches should be read. This allows you to optimize read throughput for a given analysis, and is one of the main motivations for storing data in columnar format.
Branches are represented by TBranch and its derived classes.
While TBranch represents structure, objects inheriting from TLeaf give access to the actual data.
Originally, any columnar data was accessible through a TLeaf; these days, some of the TBranch-derived classes provide data access themselves, such as TBranchElement.
Baskets, clusters and the tree header
Every branch or leaf stores the data for its entries in buffers of a size that can be specified during branch creation (default: 32000 bytes). Once the buffer is full, it gets compressed; the compressed buffer is called a basket. These baskets are written into the ROOT file. Branches with more data per tree entry will fill more baskets than branches with less data per tree entry. Conversely, baskets can hold many tree entries if their branch stores only a few bytes per tree entry. This means that, in general, baskets - including those of different branches - will contain data of different tree entry ranges.
To allow more efficient pre-fetching and better chunking of tree data stored in ROOT files, TTree groups baskets into clusters. A cluster contains all the data of a given entry range. Trees will close baskets that are not yet full when reaching the tree entry at a cluster boundary.
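The interplay of buffer size, entry size and cluster boundaries can be illustrated with a toy model. The sketch below is not ROOT code: the function countBaskets and its parameters are hypothetical, compression is ignored, and every entry is assumed to have the same size. It only counts how many baskets one branch would close.

```cpp
#include <cstddef>

// Toy model, not ROOT code: counts the baskets produced by one branch.
// A basket is closed when the next entry would not fit into the buffer,
// and also at every cluster boundary (every clusterEntries entries).
std::size_t countBaskets(std::size_t nEntries, std::size_t bytesPerEntry,
                         std::size_t bufferBytes, std::size_t clusterEntries)
{
   std::size_t baskets = 0;
   std::size_t used = 0; // bytes in the currently open buffer
   for (std::size_t entry = 0; entry < nEntries; ++entry) {
      if (used > 0 && used + bytesPerEntry > bufferBytes) {
         ++baskets; // buffer full: close the basket
         used = 0;
      }
      used += bytesPerEntry;
      if ((entry + 1) % clusterEntries == 0) {
         ++baskets; // cluster boundary: close the basket even if not full
         used = 0;
      }
   }
   if (used > 0)
      ++baskets; // last, partially filled basket
   return baskets;
}
```

With the default buffer size of 32000 bytes and 10000 entries in a single cluster, a branch storing 4 bytes per entry yields 2 baskets in this model, while a branch storing 400 bytes per entry yields 125.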
TTree finds the baskets for a given entry for a given branch by means of a header stored in the file.
This header also contains other auxiliary metadata.
When reading a TTree object, only this header is actually deserialized, until the tree’s entries are loaded.
Multiple updates of these headers can often be found in files (treename;1, treename;2 etc., called cycles, see → Opening and inspecting a ROOT file).
Only the last one (also accessible as treename) knows about all written baskets.
TNtuple, the high-performance spreadsheet
For convenience, ROOT also provides the TNtuple class, which is a tree whose branches contain only numbers of type float, one per tree entry.
It derives from TTree and is constructed with a list of column names separated by :.
Example
Writing a tree
When writing a TTree, you first want to create a TFile (see → ROOT files).
Then construct the TTree to be stored in the file; we will later add branches to the tree.
Example
Creating branches
There are multiple ways to add branches to a TTree; the most commonly used ones are covered here.
More extensive documentation can be found in the reference manual.
Note
Do not use the TBranch constructor to add a branch to a tree.
Note
The objects and variables used to create branches must not be destroyed until the TTree is deleted or TTree::ResetBranchAddress() is called. If the address of the data to be filled changes with each tree entry, you have to inform the branch about the new address with TBranch::SetAddress() before filling the tree again.
1. Branches holding basic types
If you have a variable of type int, float, bool, or any other basic type, you can create a branch (and a leaf) from it.
For fundamental datatypes, the type can be deduced from the variable and the name of the leaf will be set to the name of the branch.
In Python, that type information is not available and the leaf name and data type must be specified as the third argument.
Further details are explained in the reference guide.
2. Branches holding class type
You can create a branch holding one of ROOT’s classes, or your own type for which you have provided a dictionary (see → I/O).
If requested, TTree will create sub-branches for each member of a class and its base classes. If such a member is a class itself, that member’s type can also be split. The recursion level of nested splitting is called the “split level”; it can be configured during branch creation.
If the split level is set to 0, there is no splitting: all data members are stored in the same branch. Data members can also be configured to be non-split as part of the dictionary; see → I/O. The default split level of 99 means to split all members at any recursion level.
Pointers
While references X & are not supported as member types, pointers are.
If the pointer is non-null, ROOT stores the object pointed to (pointee).
If multiple pointers within the same branch point to the same object during one TBranch::Fill() operation (as invoked by TTree::Fill()), that pointee will only be stored once; upon reading, all pointers will again point to the same object.
For the general case, indices into object collections could be stored instead of pointers. This way, the object is only stored once.
Example
ROOT’s class TNamed has the data members fName and fTitle.
The following requests the tree to create a branch for each of them.
As TNamed derives from TObject, branches for TObject’s data members will also be created.
3. Branches holding std::vector, std::array, std::list, etc.
Both top-level branches (those created by a call to TTree::Branch()) and branches created by splitting data members can hold collections such as std::vector, std::array, std::list, or std::map.
Splitting can traverse through collections: if a member is a std::vector<X>, the tree can split X into sub-branches, too.
Such collections can also contain pointers.
For polymorphic pointees, ROOT will not just stream the base, but determine the actual object type.
If the split level is TTree::kSplitCollectionOfPointers then the pointees will be written in split mode, possibly adding new branches as new polymorphic derived types are encountered.
Filling a tree
Use TTree::Fill() to add a new entry (or “row”) to the tree, and store the current values of the variables that were provided during branch creation.
Writing the tree header
Use TTree::Write() to write the tree header into a ROOT file.
Earlier entries’ data might already be written as part of TTree::Fill().
If the data written during TTree::Fill() increases the file’s size beyond TTree::GetMaxTreeSize(), the current ROOT file is closed and a new ROOT file is created.
For an original ROOT file named myfile.root, the subsequent ROOT files are named myfile_1.root, myfile_2.root, etc.
Example
AutoFlush
The tree can flush its data (i.e. its baskets) to file when reaching a given cluster size, thus closing the cluster. By default this happens approximately every 30MB of compressed data. The size can be adjusted using TTree::SetAutoFlush().
AutoSave
The tree can write a header update to file after it has collected a certain data size in baskets (by default, 300MB). If your program crashes, you can recover the tree and its baskets written before the last autosave.
You can adjust the threshold (in bytes or entries) using TTree::SetAutoSave().
Reading a tree
Note
Please use RDataFrame to read trees, unless you need to do low-level I/O!
To read a tree, you need to associate your variables with the tree’s branches, as when writing.
That is done by calling TTree::SetBranchAddress().
When loading a tree entry, the tree will then set the variables to the branches’ values as read from storage.
Example
In Python, you can simply use the branch name as an attribute on the tree: for a branch named, say, branch0, the value for the current entry is available as tree.branch0.
Selecting a subset of branches to be read
You can select or deselect branches from being read by GetEntry() by calling TTree::SetBranchStatus().
It is strongly recommended to only read the branches actually needed: TTree is optimized for exactly this use case, and most analyses will only need a fraction of the available branches.
Selecting a subset of entries to be read
To process only a selection of tree entries, you can use a TEntryList.
First you insert the tree entry numbers you want to process into the TEntryList.
You can then re-use the TEntryList in subsequent processing of the tree, skipping irrelevant entries.
Appending TTrees as a TChain
In high energy physics you always want as much data as possible.
But it’s not nice to deal with files of multiple terabytes.
ROOT allows you to split data across multiple files, where you can then access the files’ tree parts as one large tree.
That’s done through TChain, which inherits from TTree: it wants to know the name of the trees in the files (which can be overridden when adding files), and the file names, and will act as if it was a huge, continuous tree:
Example
Widening a TTree through friends
Trees are usually written just once.
While updating an existing tree is non-trivial, extending it with additional branches, potentially an “improved” version of an original branch, is straightforward.
“Friend trees” are added by calling TTree::AddFriend().
Adding another tree called T1 as a friend tree will make the branch X of T1 available as both T1.X and - if X does not exist in the original tree - as X.
Friend trees are expected to have at least as many entries as the original tree. The order of the friend tree’s entries must preserve the entry order of the original tree.
Note
Care must be taken to ensure that the order of entries in the primary tree matches the friends’ entries. This is especially relevant when processing a tree in parallel to generate a friend tree, as the entries might be written out in an undefined order (misaligned entries). This can be mitigated by building an index on the friend tree with TTree::BuildIndex(), see → Indexing a tree.
Examining a tree
ROOT offers different ways to examine tree structure and its contents, from text to graphics.
Printing the summary of a tree
Use TTree::Print() to see a summary of the tree structure.
Example
Showing the content of a tree entry
Use TTree::Show() to display the values of all branches for a given tree entry.
Example
Showing tree data as a table
Use TTree::Scan() to display a paged table of branches’ values for all or some tree entries.
Example
Tree Viewer
With the Tree Viewer you can examine a tree in a GUI.
Note
You can also use the ROOT object browser to examine a tree that is saved in a ROOT file. See → ROOT object browser.
Example
Figure: Tree Viewer.
The left panel contains the list of trees and their branches. The right panel displays the leaves or variables in the tree.
Drawing correlating variables in a scatterplot
You can show the correlation between the variables listed in the TTreeViewer by drawing a scatterplot.
- Select a variable in the TTreeViewer and drag it to the X:-empty- entry.
- Select a second variable and drag it to the Y:-empty- entry.
Figure: Variables Age and Cost selected for the scatterplot.
- Click Scatterplot.
Figure: Scatterplot icon.
The scatterplot is drawn.
Figure: Scatterplot of the variables Age and Cost.
Note that not every (x,y) point on a scatterplot corresponds to a pair of values in your n-tuple. In fact, the scatterplot is a grid, and each square in the grid is randomly populated with a density of dots that is proportional to the number of values falling into that square.
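The binning step behind such a plot can be illustrated with a toy model. The sketch below is not ROOT code: the function cellCounts and its parameters are hypothetical, and it only computes the number of values per grid square, which then determines the density of dots drawn in that square.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model, not ROOT code: counts how many (x, y) pairs fall into each
// square of an nx-by-ny grid covering [xMin, xMax) x [yMin, yMax).
// The result is stored row by row: index = iy * nx + ix.
std::vector<int> cellCounts(const std::vector<std::pair<double, double>> &points,
                            std::size_t nx, std::size_t ny,
                            double xMin, double xMax, double yMin, double yMax)
{
   std::vector<int> counts(nx * ny, 0);
   for (const auto &p : points) {
      if (p.first < xMin || p.first >= xMax || p.second < yMin || p.second >= yMax)
         continue; // outside the plotted range
      auto ix = static_cast<std::size_t>((p.first - xMin) / (xMax - xMin) * nx);
      auto iy = static_cast<std::size_t>((p.second - yMin) / (yMax - yMin) * ny);
      ix = std::min(ix, nx - 1); // guard against floating-point rounding
      iy = std::min(iy, ny - 1);
      ++counts[iy * nx + ix];
   }
   return counts;
}
```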
Indexing a tree
Use TTree::BuildIndex() to build an index table over expressions that depend on the values in the leaves. This index is similar to database indexes: it allows you to quickly determine the tree entry number corresponding to the value of an expression. These expressions should be both equality comparable (that is, not use floating point numbers where precision might cause the index lookup to fail) and unique, to make sure you get the tree entry you expect. For high-energy physics, a common example could be a combination of run number and event number: while each one of them might have duplications, their combination is guaranteed to be unique.
To build an index, define a major and optionally a minor expression.
In the example above these might simply be the leaves Run and Event.
They can be expressions using original tree variables, such as "run - 90000".
TTree::BuildIndex() loops over all entries and builds the lookup table from the expressions to the tree entry number.
The index can then be saved as part of the TTree object with tree.Write().
This is done most conveniently at the end of the filling process, just before saving the tree header.
An entry can be retrieved using the index with TTree::GetEntryWithIndex().
Tree indexing works as well with a TChain.