# Trees

## Introducing TTree

As introduced in → Storing columnar data in a ROOT file and reading it back, ROOT can handle large columnar datasets. That section used RDataFrame to write and read back a simple dataset. RDataFrame traditionally relies on TTree for columnar data storage, which is used, for example, by all LHC (Large Hadron Collider) experiments. Trees are optimized for reduced disk space and for selective, high-throughput columnar access with reduced memory usage.

In addition to the documentation in this manual, we recommend looking at the TTree tutorials:

→ Tree tutorials

RNTuple

RNTuple is the experimental evolution of TTree columnar data storage. RNTuple introduces robust interfaces, a high-performance storage layout, and asynchronous, thread-safe scheduling.

RDataFrame

To access TTree data, please use RDataFrame. TTree provides interfaces for low-level, expert usage.

### The tree and its data

A TTree behaves like an array of a data structure that resides on storage - except for one entry (or row, in database language). That entry is accessible in memory: you can load any tree entry, ideally sequentially. You can provide your own storage for the values of the columns of the current entry, in the form of variables. In this case you have to tell the TTree about the addresses of these variables; either by calling TTree::SetBranchAddress(), or by passing the variable when creating the branch for writing. When “filling” (writing) the TTree, it will read the values out of these variables; when reading back a TTree entry, it will write the values it read from storage into your variables.

### Branches and leaves

A tree consists of a list of independent columns, called branches. A branch can contain values of any fundamental type, C++ objects known to ROOT’s type system, or collections of those. When reading a tree, you can select which subset of branches should be read. This allows you to optimize read throughput for a given analysis, and is one of the main motivations for storing data in columnar format.

Branches are represented by TBranch and its derived classes.

While TBranch represents structure, objects inheriting from TLeaf give access to the actual data. Originally, any columnar data was accessible through a TLeaf; these days, some of the TBranch-derived classes provide data access themselves, such as TBranchElement.

Every branch or leaf stores the data for its entries in buffers of a size that can be specified during branch creation (default: 32000 bytes). Once the buffer is full, it gets compressed; the compressed buffer is called a basket. These baskets are written into the ROOT file. Branches with more data per tree entry will fill more baskets than branches with less data per tree entry. Conversely, baskets can hold many tree entries if their branch stores only a few bytes per tree entry. This means that, in general, baskets (including those of different branches) will contain data of different tree entry ranges.
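The relation between per-entry size and basket content can be illustrated with a small back-of-the-envelope calculation. This is only a toy model in plain C++: the function name is hypothetical, and it ignores basket headers, variable-size data and compression, so it is not ROOT's actual bookkeeping.

```cpp
#include <cstddef>

// Toy estimate (hypothetical helper, not a ROOT function): number of tree
// entries that fit into one uncompressed basket buffer of bufferBytes when
// every entry contributes bytesPerEntry bytes to the branch.
std::size_t entries_per_basket(std::size_t bufferBytes, std::size_t bytesPerEntry)
{
   return bufferBytes / bytesPerEntry;
}
```

With the default buffer size of 32000 bytes, a branch storing a 4-byte integer per entry would hold about 8000 entries per basket in this model, while a branch storing 400 bytes per entry would hold only 80. Their baskets therefore cover different entry ranges.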

To allow more efficient pre-fetching and better chunking of tree data stored in ROOT files, TTree groups baskets into clusters. A cluster contains all the data of a given entry range. Trees will close baskets that are not yet full when reaching the tree entry at a cluster boundary.

TTree finds the baskets for a given entry for a given branch by means of a header stored in the file. This header also contains other auxiliary metadata. When reading a TTree object, only this header is actually deserialized, until the tree’s entries are loaded. Multiple updates of these headers can often be found in files (treename;1, treename;2 etc, called cycles, see → Opening and inspecting a ROOT file). Only the last one (also accessible as treename) knows about all written baskets.

### TNtuple, the high-performance spread-sheet

For convenience, ROOT also provides the TNtuple class which is a tree whose branches contain only numbers of type float, one per tree entry. It derives from TTree and is constructed with a list of column names separated by :.

Example

## Writing a tree

When writing a TTree you first want to create a TFile (see → ROOT files). Then construct the TTree to be stored in the file; we will later add branches to the tree.

Example

### Creating branches

There are multiple ways to add branches to a TTree; the most commonly used ones are covered here. More extensive documentation can be found in the reference manual.

Note

Do not use the TBranch constructor to add a branch to a tree.

Note

The objects and variables used to create branches must not be destroyed until the TTree is deleted or TTree::ResetBranchAddress() is called. If the address of the data to be filled changes with each tree entry, you have to inform the branch about the new address with TBranch::SetAddress before filling the tree again.

1. Branches holding basic types

If you have a variable of type int, float, bool, or any other basic type, you can create a branch (and a leaf) from it. For fundamental datatypes, the type can be deduced from the variable and the name of the leaf will be set to the name of the branch. In Python, that type information is not available and the leaf name and data type must be specified as third argument. Further details are explained in the reference guide.

2. Branches holding class type

You can create a branch holding one of ROOT’s classes, or your own type for which you have provided a dictionary (see → I/O).

Splitting

If requested, TTree will create (sub-)branches for each member of a class and its base classes. If such a member is a class itself, that member’s type can also be split. The recursion level of nested splitting is called the “split level”; it can be configured during branch creation.

If the split level is set to 0, there is no splitting: all data members are stored in the same branch. Data members can also be configured to be non-split as part of the dictionary; see → I/O. The default split level of 99 means to split all members at any recursion level.

Pointers

While references X & are not supported as member types, pointers are. If the pointer is non-null, ROOT stores the object pointed to (pointee). If multiple pointers within the same branch point to the same object during one TBranch::Fill() operation (as invoked by TTree::Fill()), that pointee will only be stored once; upon reading, all pointers will again point to the same object.

For the general case, indices into object collections could be persistified instead of pointers. This way, the object is only stored once.

Example

ROOT’s class TNamed has the data members fName and fTitle. The following requests the tree to create a branch for each of them. As TNamed derives from TObject, branches for TObject’s data members will also be created.

3. Branches holding std::vector, std::array, std::list, etc

Both top-level branches (those created by a call to TTree::Branch()) and branches created by splitting data members can hold collections such as std::vector, std::array, std::list, or std::map. Splitting can traverse through collections: if a member is a std::vector<X>, the tree can split X into sub-branches, too.

Such collections can also contain pointers. For polymorphic pointees, ROOT will not just stream the base, but determine the actual object type. If the split level is TTree::kSplitCollectionOfPointers then the pointees will be written in split mode, possibly adding new branches as new polymorphic derived types are encountered.

### Filling a tree

Use TTree::Fill() to add a new entry (or “row”) to the tree, and store the current values of the variables that were provided during branch creation.

Use TTree::Write() to write the tree header into a ROOT file. Earlier entries’ data might already be written as part of TTree::Fill().

If the data written during TTree::Fill() makes the file grow beyond TTree::GetMaxTreeSize(), the current ROOT file is closed and a new ROOT file is created. For an original ROOT file named myfile.root, the subsequent ROOT files are named myfile_1.root, myfile_2.root, etc.

Example

AutoFlush

The tree can flush its data (i.e. its baskets) to file when reaching a given cluster size, thus closing the cluster. By default this happens approximately every 30MB of compressed data. The size can be adjusted using TTree::SetAutoFlush().

AutoSave

The tree can write a header update to file after it has collected a certain data size in baskets (by default, 300MB). If your program crashes, you can recover the tree and its baskets written before the last autosave.

You can adjust the threshold (in bytes or entries) using TTree::SetAutoSave().

## Reading a tree

Note

Please use RDataFrame to read trees, unless you need to do low-level I/O!

To read a tree, you need to associate your variables with the tree’s branches, as when writing. When loading a tree entry, the tree will set the variables to the branch’s value as read from the storage. That is done by calling TTree::SetBranchAddress():

Example

In Python you can simply use the branch name as an attribute on the tree; for a hypothetical branch named branch0, that is tree.branch0 once the entry has been loaded.

### Selecting a subset of branches to be read

You can select or deselect branches from being read by GetEntry() by calling TTree::SetBranchStatus(). It is strongly recommended to read only the branches actually needed: TTree is optimized for exactly this use case, and most analyses will only need a fraction of the available branches.

### Selecting a subset of entries to be read

To process only a selection of tree entries, you can use a TEntryList. First you insert the tree entry numbers you want to process into the TEntryList.

You can then re-use the TEntryList in subsequent processing of the tree, skipping irrelevant entries.

## Appending TTrees as a TChain

In high energy physics you always want as much data as possible, but files of multiple terabytes are unwieldy. ROOT allows you to split data across multiple files and then access the files’ tree parts as one large tree. That is done through TChain, which inherits from TTree: it needs the name of the trees in the files (which can be overridden when adding files) and the file names, and will act as if it were one huge, continuous tree:

Example

## Widening a TTree through friends

Trees are usually written just once. While updating an existing tree is non-trivial, extending it with additional branches, potentially an “improved” version of an original branch, is straightforward. “Friend trees” are added by calling TTree::AddFriend(). Adding another tree called T1 as a friend tree will make the branch X of T1 available as both T1.X and - if X does not exist in the original tree - as X.

Friend trees are expected to have at least as many entries as the original tree. The order of the friend tree’s entries must preserve the entry order of the original tree.

Note

Care must be taken to ensure that the order of entries in the primary tree matches friends’ entries. This is especially relevant when processing a tree in parallel to generate a friend tree, as the entries might be written out in an undefined order (misaligned entries). This can be mitigated by building an index on the friend tree with TTree::BuildIndex(); see → Indexing a tree.

## Examining a tree

ROOT offers different ways to examine tree structure and its contents, from text to graphics.

### Printing the summary of a tree

Use TTree::Print() to see a summary of the tree structure.

Example

### Showing the content of a tree entry

Use TTree::Show() to display the values of all branches for a given tree entry.

Example

### Showing tree data as a table

Use TTree::Scan() to display a paged table of branches’ values for all or some tree entries.

Example

## Tree Viewer

With the Tree Viewer you can examine a tree in a GUI.

Note

You can also use the ROOT object browser to examine a tree that is saved in a ROOT file. See → ROOT object browser.

Example

Figure: Tree Viewer.

The left panel contains the list of trees and their branches. The right panel displays the leaves or variables in the tree.

### Drawing correlating variables in a scatterplot

You can show the correlation between the variables listed in the TTreeViewer by drawing a scatterplot.

• Select a variable in the TTreeViewer and drag it to the X:-empty- entry.
• Select a second variable and drag it to the Y:-empty- entry.

Figure: Variables Age and Cost selected for the scatterplot.

• Click Scatterplot.

Figure: Scatterplot icon.

The scatterplot is drawn.

Figure: Scatterplot of the variables Age and Cost.

Note that not every (x,y) point on a scatterplot corresponds to a pair of values in your N-tuple. Instead, the scatterplot is a grid, and each square in the grid is randomly populated with a density of dots that is proportional to the number of values in that square.

### Indexing a tree

Use TTree::BuildIndex() to build an index table over expressions that depend on the value in the leaves. This index is similar to database indexes: it allows you to quickly determine the tree entry number corresponding to the value of an expression. These expressions should be both equality comparable (that is, not use floating point numbers where precision might cause the index lookup to fail) and unique, to make sure you get the tree entry you expect. For high-energy physics, a common example could be a combination of run number and event number: while each one of them might have duplications, their combination is guaranteed to be unique.

To build an index, define a major and optionally a minor expression. In the example above these might simply be the leaves Run and Event. They can be expressions using original tree variables, such as "run - 90000". TTree::BuildIndex() loops over all entries and builds the lookup table from the expressions to the tree entry number. The index can then be saved as part of the TTree object with tree.Write(). This is done most conveniently at the end of the filling process, just before saving the tree header.

An entry can be retrieved using the index with TTree::GetEntryWithIndex().

Tree indexing also works with a TChain.