This page explains the mechanisms and concepts behind ROOT’s I/O facilities, i.e. how ROOT converts your objects into a stream of bytes and back. It assumes that you have read the introduction on → ROOT files, → I/O of custom classes, and → Trees.
ROOT files, directories, and keys
Similar to a file system directory, a TFile can contain directories (TDirectory) and objects, accessible through the directory’s keys (TKey).
A TFile is a directory itself: it inherits from TDirectoryFile.
The global “current directory”
ROOT uses two globals (thread-local static, to be precise) that point to the most recently opened file (gFile) and to the “current” directory (gDirectory).
Opening a ROOT file makes it the current directory.
You can change the current directory by assigning to gDirectory; you can print the current directory with gDirectory->pwd().
Some functions operate (at least by default) on that global directory.
Examples are TObject::Write() (unlike the preferred TDirectory::WriteObject()) and the TTree constructor if no directory is specified.
Also object ownership relates to the current directory, see → Object ownership.
ROOT’s interpreter interfaces (both C++ and Python) make the objects of the current directory available as if they were variables. This is a strong motivation to use valid C++ identifiers as key names, i.e. without spaces, not starting with a digit, etc.
Opening a TFile changes gDirectory, and an object stored in the current directory can then be used as if it was a declared variable.
Rint:/ corresponds to “ROOT’s in-memory” directory, the default directory during startup.
Adding and removing objects from a ROOT file
When writing an object to a ROOT file, ROOT creates a directory “entry” (TKey) representing the object. The entry mainly consists of a name and the object’s persistent data.
You can think of a TFile as a collection of TKeys, possibly inside nested directories.
The name of the TKey can be either explicitly stated when writing, or it can be determined from TObject::GetName() for classes inheriting from TObject.
An object read into memory is independent from the object on disk.
Changes of the in-memory object are not propagated to disk.
Instead, a new version of the object needs to be saved, for instance by passing "overwrite" as option to TDirectory::WriteObject(). See its reference documentation for the available options.
Removing an object from a file with TDirectory::Delete() will generally not free the corresponding disk space.
Instead, the storage occupied by the deleted object will be made available (as a TFree) for subsequent objects to be written to this file.
You can use hadd to defragment a ROOT file by rewriting it.
Iterating over a directory’s content
A TFile gives access to its list of keys through its base class method TDirectoryFile::GetListOfKeys(). You can use it, for instance, to get the list of keys in the demo.root file and print them.
Iterating through all of a directory’s entries is useful, for instance, if you do not know an object’s name upfront.
The output of such an iterate.C ROOT macro could be:
root .x iterate.C
key: h0 points to an object of class: TH1F
key: h1 points to an object of class: TH1F
...
Note the concept of name cycles, see → Opening and inspecting a ROOT file.
ROOT’s C++ object serialization: from memory to disk and back
Writing an object to file means writing the current values of the object’s data members. This is done with the help of what ROOT calls a streamer: each serializable class has such a streamer which converts all data members into a buffer of raw bytes (see TBuffer).
Variables of composite data types such as classes, structures, and arrays can be decomposed into simple types such as longs, shorts, floats, and chars. These values are then written out in a machine-independent representation.
This happens recursively: a second streamer will be invoked for a member of class type, to stream that class’s members. A similar recursion happens for all base classes. In the end, the buffer contains all the simple data members of all the classes that make up that particular object.
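The recursion can be illustrated with a toy model in plain C++. This is not ROOT’s TBuffer API, and all names are hypothetical. Each type gets its own stream function, and a derived class first streams its base class and then its members. Every simple value ends up in the buffer in big-endian byte order, so the bytes do not depend on the machine that wrote them. Real ROOT streamers also write version and byte-count information, which this sketch omits.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy byte buffer (hypothetical): stands in for ROOT's TBuffer in this sketch only.
using Bytes = std::vector<std::uint8_t>;

// Write a 32-bit value in big-endian order, independent of the host's byte order.
void WriteU32(Bytes &buf, std::uint32_t v)
{
   for (int shift = 24; shift >= 0; shift -= 8)
      buf.push_back(static_cast<std::uint8_t>(v >> shift));
}

// A float is written through its 32-bit IEEE-754 bit pattern.
void WriteFloat(Bytes &buf, float f)
{
   static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
   std::uint32_t bits;
   std::memcpy(&bits, &f, sizeof bits);
   WriteU32(buf, bits);
}

// Hypothetical classes: a base class, and a derived class with a class-type member.
struct Point { float fX = 0, fY = 0; };
struct Identified { std::uint32_t fId = 0; };
struct Track : Identified {
   Point fStart;
   std::int32_t fCharge = 0;
};

// One toy "streamer" per type; composite types recurse into their parts.
void Stream(Bytes &buf, const Point &p)
{
   WriteFloat(buf, p.fX);
   WriteFloat(buf, p.fY);
}

void Stream(Bytes &buf, const Identified &b) { WriteU32(buf, b.fId); }

void Stream(Bytes &buf, const Track &t)
{
   Stream(buf, static_cast<const Identified &>(t));      // base class first
   Stream(buf, t.fStart);                                // class-type member: recurse
   WriteU32(buf, static_cast<std::uint32_t>(t.fCharge)); // simple member
}
```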
At runtime, ROOT needs to determine which streamer function to call for a given object.
Having the ClassDef macro inside the class definition makes this lookup faster. Without ClassDef, ROOT has to look up the object’s dynamic type to determine which streamer to invoke.
See also → The ClassDef macro.
Data members of certain types need special treatment, for instance pointers and references. This will be explained below.
The methods of a class are not written to the ROOT file: the file contains only the persistent data members.
See also → Restrictions on types ROOT I/O can handle.
Streamers are C++ functions that are usually created as part of a class’s dictionary, see → I/O of custom classes.
rootcling parses the class definition and determines how to stream the object in an optimal way, and which streamers need to be invoked for base classes and members.
Excluding data members from I/O
To prevent a data member from being written to the file, insert a ! as the first character of a single-line comment (//) following the declaration of the member, giving //!.
To accommodate doxygen-style documentation, this annotation can also be written as ///<!.
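A minimal sketch of both annotation forms, using a hypothetical class. The cached mean is marked transient because it can be recomputed from the persistent values. rootcling reads these comments, but an ordinary C++ compiler ignores them, so the class also compiles without ROOT. A real class would typically also have ClassDef and a dictionary, which are omitted here.

```cpp
#include <vector>

// Hypothetical class. rootcling reads the annotation comments after member
// declarations; an ordinary C++ compiler ignores them.
class Measurement {
public:
   void Add(double v)
   {
      fValues.push_back(v);
      fCacheValid = false; // invalidate the cached mean
   }

   // The mean is cached in memory only. Assuming the object is default-constructed
   // before its persistent members are read, fCacheValid is false after reading,
   // so the mean is recomputed on first use.
   double Mean() const
   {
      if (!fCacheValid) {
         double sum = 0;
         for (double v : fValues)
            sum += v;
         fMeanCache = fValues.empty() ? 0.0 : sum / static_cast<double>(fValues.size());
         fCacheValid = true;
      }
      return fMeanCache;
   }

private:
   std::vector<double> fValues;      // persistent: written to the file
   mutable double fMeanCache = 0;    //! transient: not written
   mutable bool fCacheValid = false; ///<! transient, doxygen-style annotation
};
```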
Marking pointers as never null
For a small performance benefit, pointer data members can be marked as always pointing to valid memory (never null) with the annotation //->.
A pointer marked as such must not point back to the current object, not even indirectly.
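The following sketch shows a class with one never-null pointer member and one nullable pointer member. The types Histo and TrackList are hypothetical stand-ins for, say, a histogram and a track collection. The point is that fH is allocated in the constructor and is therefore never null, while fTracks may legitimately stay null.

```cpp
#include <vector>

// Hypothetical stand-ins for a histogram type and a track-collection type.
struct Histo { int fNbins = 100; };
struct TrackList { std::vector<int> fIds; };

class Event {
public:
   Event() : fH(new Histo) {} // fH is allocated here, so it is never null
   ~Event()
   {
      delete fH;
      delete fTracks;
   }
   Event(const Event &) = delete;            // no copies: keeps ownership
   Event &operator=(const Event &) = delete; // of the raw pointers simple

   void CreateTracks()
   {
      if (!fTracks)
         fTracks = new TrackList;
   }
   Histo *GetHisto() const { return fH; }
   TrackList *GetTracks() const { return fTracks; }

private:
   TrackList *fTracks = nullptr; // may be null: checked before streaming
   Histo *fH;                    //-> never null, never points back to this Event
};
```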
The pointer data member fH is marked as never null, which allows ROOT to perform additional optimizations.
fTracks, instead, will always be checked for being null.
Array data members of fixed and variable size
ROOT supports I/O of fixed-size arrays out of the box. For variable-size arrays, a special comment syntax is available to specify the name of a data member that holds the size of the array.
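A minimal sketch, assuming a hypothetical class Vertices. It has a fixed-size array, which needs no annotation, and a variable-size array whose length is kept in fNVertex.

```cpp
// Hypothetical class with a fixed-size and a variable-size array member.
class Vertices {
public:
   Vertices() = default;
   ~Vertices() { delete[] fVertex; }
   Vertices(const Vertices &) = delete;
   Vertices &operator=(const Vertices &) = delete;

   // Replace the array (old contents are discarded) and keep the length
   // member in sync, since the I/O relies on it.
   void Resize(int n)
   {
      float *fresh = n > 0 ? new float[n]() : nullptr;
      delete[] fVertex;
      fVertex = fresh;
      fNVertex = n > 0 ? n : 0;
   }
   int Size() const { return fNVertex; }
   float &operator[](int i) { return fVertex[i]; }
   const float *Position() const { return fPosition; }

private:
   float fPosition[3] = {0, 0, 0}; // fixed-size array: supported out of the box
   int fNVertex = 0;               // array length, declared before the array
   float *fVertex = nullptr;       //[fNVertex] points to fNVertex floats
};
```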
A comment such as //[fNVertex] after an array member tells ROOT that the length of the array is stored in the data member fNVertex. In general, the syntax is //[LENGTH], where LENGTH must be the name of a data member that is defined before the array member, or in a base class.
Note: Pointers to simple types (e.g. int*) are assumed to be variable-size arrays.
If you know that a certain data member will always have to be read as a full object, preventing it from being split into separate TTree branches can improve performance.
To do so, add //|| as an annotation to its declaration.
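A sketch with hypothetical types follows. fCalib is annotated so that it is stored as a whole instead of being split member by member, while fId is treated as usual.

```cpp
#include <vector>

// Hypothetical types. In a split TTree branch, a class-type member would by
// default be broken up into one sub-branch per data member.
struct Calibration {
   std::vector<double> fGains;
   double fOffset = 0;
};

class Detector {
public:
   int GetId() const { return fId; }
   Calibration &GetCalibration() { return fCalib; }

private:
   int fId = 0;        // split as usual
   Calibration fCalib; //|| always read as a whole: do not split
};
```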
Double32_t: storing doubles with single precision
Some values have inherently limited precision, yet benefit from double-precision arithmetic.
The type alias Double32_t represents a value that has double precision in memory, but is stored with lower, adjustable precision.
The actual size on disk (before compression) is determined by an annotation of the form //[min,max,nbits] next to the data member declaration:
- If the comment is absent or does not contain min, max, and nbits, the member is saved with single precision.
- If the min and max values are present, they are saved with 32-bit precision. min and max can be either a floating point number or a trivial mathematical expression, such as one involving pi.
- If nbits is present, the member is saved with nbits-bit precision. For more details, see this tutorial.
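The sketch below first shows the declaration syntax on a hypothetical class. It then models the range form in plain C++: a value in [min, max] is mapped to an unsigned integer with nbits bits and mapped back when reading. The round-trip error therefore stays within one quantization step, (max - min) / 2^nbits. Quantize and Dequantize are illustrative helpers, not ROOT functions, and ROOT’s exact rounding and clamping may differ. Double32_t is defined locally the same way ROOT defines it, so the sketch compiles without ROOT.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Same definition as ROOT's typedef, so the sketch compiles without ROOT.
using Double32_t = double;

// Hypothetical class: double precision in memory. On disk, fTemperature would
// use 12 bits for the range [-50, 150], and fPressure single precision.
struct Sensor {
   Double32_t fTemperature = 0; //[-50,150,12]
   Double32_t fPressure = 0;    // no range given: single precision on disk
};

// Toy model of range-based quantization (not ROOT's code): map x in
// [xmin, xmax] to an integer that fits into nbits bits.
std::uint32_t Quantize(double x, double xmin, double xmax, int nbits)
{
   const double steps = std::ldexp(1.0, nbits);                           // 2^nbits
   x = std::min(std::max(x, xmin), xmax);                                 // clamp to the range
   const double q = std::floor(0.5 + steps * (x - xmin) / (xmax - xmin)); // round
   return static_cast<std::uint32_t>(std::min(q, steps - 1));             // fit in nbits bits
}

// Inverse mapping, applied when reading.
double Dequantize(std::uint32_t q, double xmin, double xmax, int nbits)
{
   return xmin + q * (xmax - xmin) / std::ldexp(1.0, nbits);
}
```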
Custom streamers
Usually, streamers are generated automatically by rootcling (see → I/O of custom classes). However, you can also write your own streamer.
A common use case is a post-read hook, for instance to register an object that was just read with other objects.
You need to tell rootcling not to build a streamer by ending the corresponding #pragma link statement in the linkdef file with a minus sign (for example, #pragma link C++ class Event-;).
A customized Streamer function, for instance for a class Event, takes a TBuffer as a parameter. It first checks whether the buffer is being read or written.
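The read/write structure of such a streamer can be sketched without ROOT. In the sketch below, ToyBuffer is a hypothetical stand-in for TBuffer, and gRegistry is a hypothetical registry that serves as the post-read hook. In a real setup, Event would use ClassDef, be selected with a trailing minus sign in the linkdef file, and implement Streamer(TBuffer &).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for ROOT's TBuffer (hypothetical, for illustration only):
// a buffer is either being written or being read.
class ToyBuffer {
public:
   explicit ToyBuffer(bool reading, std::vector<std::uint8_t> data = {})
      : fReading(reading), fData(std::move(data)) {}
   bool IsReading() const { return fReading; }
   const std::vector<std::uint8_t> &Data() const { return fData; }

   ToyBuffer &operator<<(std::int32_t v) // big-endian write
   {
      for (int shift = 24; shift >= 0; shift -= 8)
         fData.push_back(static_cast<std::uint8_t>(static_cast<std::uint32_t>(v) >> shift));
      return *this;
   }
   ToyBuffer &operator>>(std::int32_t &v) // big-endian read
   {
      std::uint32_t u = 0;
      for (int i = 0; i < 4; ++i)
         u = (u << 8) | fData.at(fPos++);
      v = static_cast<std::int32_t>(u);
      return *this;
   }

private:
   bool fReading;
   std::vector<std::uint8_t> fData;
   std::size_t fPos = 0;
};

class Event;
std::vector<const Event *> gRegistry; // hypothetical registry of read objects

class Event {
public:
   static constexpr std::int32_t kVersion = 1; // plays the role of ClassDef's version
   std::int32_t fNtrack = 0;
   std::int32_t fFlag = 0;

   // Same structure as a hand-written ROOT Streamer(TBuffer &).
   void Streamer(ToyBuffer &b)
   {
      if (b.IsReading()) {
         std::int32_t version = 0;
         b >> version; // would select the layout; only version 1 exists here
         b >> fNtrack >> fFlag;
         gRegistry.push_back(this); // post-read hook: register the object
      } else {
         b << kVersion << fNtrack << fFlag;
      }
   }
};
```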
Note: A class with a custom streamer cannot be split, and its members cannot be stored member-wise.
Disable storage of TObject data members in derived classes
Types do not have to inherit from TObject for ROOT to be able to read/write them: the presence of a dictionary is sufficient.
Classes that do inherit from TObject can exclude TObject’s data members from their I/O by calling MyClass::Class()->IgnoreTObjectStreamer() before any object of type MyClass is written to a ROOT file.
This is useful if you do not use TObject’s fBits and fUniqueID data members and saving some space in the output file is important.
Storing networks of objects pointing at each other
ROOT supports storing multiple objects with complex networks of pointers between them, including in the presence of circular dependencies.
The network of pointers is preserved on disk and recreated when the data is read.
Note one special case: if both an object and one of its data members are pointed to, that member is serialized twice, once as part of the object and once independently.
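Preserving a network of pointers can be sketched in plain C++. This is a simplified toy, not ROOT’s buffer format. Every distinct object is written once and gets an index, and a pointer is written as the index of its target (or -1 for null). Reading first allocates all objects and then restores the pointers, so cycles survive the round trip. Node, Record, Write and Read are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical node type: objects that point at each other, possibly in cycles.
struct Node {
   int fValue = 0;
   Node *fNext = nullptr;
};

// Toy record: the pointer is replaced by the index of its target (-1 for null).
struct Record {
   int fValue;
   int fNextIndex;
};

// Write every object reachable from root exactly once, even with cycles.
std::vector<Record> Write(const Node *root)
{
   std::map<const Node *, int> index; // object address -> record index
   std::vector<const Node *> order;
   for (const Node *n = root; n && !index.count(n); n = n->fNext) {
      index[n] = static_cast<int>(order.size());
      order.push_back(n);
   }
   std::vector<Record> out;
   for (const Node *n : order)
      out.push_back({n->fValue, n->fNext ? index.at(n->fNext) : -1});
   return out;
}

// Rebuild the network: allocate all objects, then restore the pointers.
// The caller owns the returned nodes.
std::vector<Node *> Read(const std::vector<Record> &records)
{
   std::vector<Node *> nodes;
   for (const Record &r : records) {
      nodes.push_back(new Node);
      nodes.back()->fValue = r.fValue;
   }
   for (std::size_t i = 0; i < records.size(); ++i)
      if (records[i].fNextIndex >= 0)
         nodes[i]->fNext = nodes[records[i].fNextIndex];
   return nodes;
}
```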
Compression and performance
Compressing data saves disk space, at the cost of additional work for the CPU to write and read the data. If your analysis is one of the rare cases which spends most of the time in CPU work, using uncompressed data might be beneficial.
Most analyses on the other hand will benefit from one of the fast compression algorithms that also reduce the amount of data to be read from disk or transferred over the network.
The compression factor, that is, the saving of storage space, varies with the type of data. A buffer containing N identical values is compressed better than a set of values with higher entropy.
ROOT offers several options, such as LZMA with very high compression ratio, or LZ4 with very high decompression throughput, or ZSTD with a good compromise in performance.
The default compression for RNTuple is determined based on the data; for everything else it’s zlib with compression level 1.
Algorithm and compression level can be selected using TFile::SetCompressionAlgorithm() and TFile::SetCompressionLevel(), respectively, at the time data is written. A compression level of 0 turns off compression completely. Both algorithm and level can be set at the same time using TFile::SetCompressionSettings().
ROOT also provides a predefined recommended setting for general-purpose analysis data, which can be passed to TFile::SetCompressionSettings().
Note that different objects in a ROOT file might have been written with different compression settings. Even different branches or different baskets in a TTree might be using different settings.
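The single settings value combines algorithm and level as 100 * algorithm + level. For example, the default of zlib with level 1 corresponds to 101. The sketch below spells out this arithmetic. The enum is a local stand-in whose values mirror ROOT’s algorithm numbering; in ROOT code you would use the constants ROOT provides in ROOT::RCompressionSetting instead.

```cpp
// Local stand-in mirroring ROOT's compression algorithm numbering.
enum class Algorithm : int { kZLIB = 1, kLZMA = 2, kLZ4 = 4, kZSTD = 5 };

// Combined value as passed to TFile::SetCompressionSettings(): 100 * algorithm + level.
constexpr int CompressionSettings(Algorithm alg, int level)
{
   return 100 * static_cast<int>(alg) + level;
}

// Decompose a combined value again.
constexpr Algorithm AlgorithmOf(int settings) { return static_cast<Algorithm>(settings / 100); }
constexpr int LevelOf(int settings) { return settings % 100; }
```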
ROOT supports writing objects to XML files instead of ROOT files. While XML files are generally inappropriate for storing data (e.g. worse I/O performance, larger size, no compression), they can be opened with a normal text editor.
Therefore XML files should only be used for small amounts of data, typically histogram files, images, geometries, calibrations. XML files use the same streaming technology as regular ROOT files: any class with a dictionary can be stored in XML format. Contrary to ROOT files, XML files do not support subdirectories or trees.
To create an XML file, specify a filename with the .xml extension when calling TFile::Open().
Storing the class data layout
ROOT files store data members’ values together with some related metadata, e.g. their names and types. This allows ROOT to find discrepancies between the class layout in memory at the time of writing and at the time of reading, if the class definition changes over time (enabling schema evolution). It also allows ROOT to read data of classes for which no dictionary is available - potentially even when the corresponding library has not been loaded.
ROOT’s reflection library (TClass) provides the name and type information. This information is written to ROOT files in the form of TStreamerInfo objects, which describe a class’s members and types.
The TStreamerInfo objects for all classes written to a file are accessible through TFile::GetStreamerInfoList().
These class description objects are versioned, as different generations of the same type might be written to different files, which in the end are merged, resulting in a file with multiple versions for the same type.
As long as a ROOT file that contains it has been opened, a class’s TStreamerInfo for a given version can be retrieved through TClass::GetStreamerInfo().
It contains entries for each data member and base class, in the form of TStreamerElement objects.
They can be accessed through TStreamerInfo::GetElements().
TFile::MakeProject() can use the information from a TStreamerInfo to construct a C++ header that contains the class data members and their types, but no member functions.
This allows you to create libraries of compiled classes simply from a data file, even if the original library is not available.
Abstraction of I/O operations on collections: collection proxy
Instead of implementing dedicated streaming functions for std::vector, std::list, etc., as well as ROOT’s collection types, ROOT implements an abstraction layer for the required I/O functionality, such as creation, insertion, and clearing.
These collection proxies give access to collection data from disk, no matter what the original collection type was, and whether or not a dictionary for that collection (and its specific template specialization) exists.
They can also be adapted to custom collections.
The abstract interface (“protocol”) to implement is TVirtualCollectionProxy; a concrete example is TGenCollectionProxy.
For a given class, the collection proxy can be queried and set through the class’s TClass object, for instance with TClass::GetCollectionProxy().
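The following toy sketch shows the principle. Its names are hypothetical and do not reflect ROOT’s TVirtualCollectionProxy interface. The filling code only talks to an abstract proxy offering size, clear and insert, so the same code fills both a std::vector and a std::list.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Toy abstract "protocol": I/O code sees only these operations.
template <typename T>
class CollectionProxy {
public:
   virtual ~CollectionProxy() = default;
   virtual std::size_t Size() const = 0;
   virtual void Clear() = 0;
   virtual void Insert(const T &value) = 0;
};

// One generic adapter covers any container offering size/clear/push_back.
template <typename Container>
class ProxyFor : public CollectionProxy<typename Container::value_type> {
public:
   explicit ProxyFor(Container &c) : fC(c) {}
   std::size_t Size() const override { return fC.size(); }
   void Clear() override { fC.clear(); }
   void Insert(const typename Container::value_type &v) override { fC.push_back(v); }

private:
   Container &fC;
};

// "Reading" elements into a collection through the proxy only.
template <typename T>
void Fill(CollectionProxy<T> &proxy, const std::vector<T> &fromDisk)
{
   proxy.Clear();
   for (const T &v : fromDisk)
      proxy.Insert(v);
}
```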
Dealing with changes in class layouts: schema evolution
With long-lived data, changes in the data layout become a concern. When a class layout (i.e. data member names, their types, order, etc.) changes, existing persistent data may no longer correspond to the foreseen target of the read operation: the in-memory layout of the latest version of a class definition might now differ from the persisted layout. “Schema evolution” is ROOT’s solution to this problem: in the case of a mismatch between the in-memory version and the persistent version of a class, ROOT maps the data in the file to the new layout of the object in memory.
ROOT supports two types of schema evolution: automatic schema evolution, which deals with changes in the class definition (e.g., reorder of data members, changes in their types, etc.), and “I/O rules” which allow for fine-tuned manual schema evolution.
Automatic schema evolution
Automatic schema evolution supports the following scenarios:
- Change in the order of data members in the class.
- Addition of a data member: the value of the missing member will be left unchanged by the I/O (so usually the value set by the default constructor).
- Removal of a data member: the corresponding data is not read.
- Move of a data member from a derived class to a base class or vice-versa.
- Change of the type of a member if it is a simple type or a pointer to a simple type, including Double32_t and Float16_t. A warning is given in case of loss of precision.
- Addition or removal of a base class.
- Change of a member type from
- Change of a member type from
- Change of a member type from C-style array (such as int fArr[3]) to its std::array counterpart (such as std::array<int, 3> fArr).
- Change from variable-size array and size (such as float *fArray; //[fSize] and int fSize) to
- Change between STL collection types, from / to
- Change of STL associative containers, from / to
All transformations above are applied transparently, with no intervention required on the part of the user: ROOT will automatically recognize these cases and apply the relevant rules.
Here is an example of the class layout changes that automatic schema evolution supports:
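The sketch below uses two hypothetical versions of a class Particle. They are placed in namespaces v1 and v2 only so that both fit into one translation unit. Between the versions:
- members were reordered,
- a float became a double,
- a C-style array became a std::array,
- one member was removed and one was added.

ROOT applies such changes automatically. The Evolve function is only a hand-written illustration of the resulting mapping, which matches members by name.

```cpp
#include <array>
#include <vector>

namespace v1 {
struct Particle {
   int fId = 0;
   float fEnergy = 0;       // float in version 1
   int fPos[3] = {0, 0, 0}; // C-style array
   std::vector<int> fHits;
   int fObsolete = 0;       // removed in version 2
};
} // namespace v1

namespace v2 {
struct Particle {
   std::vector<int> fHits;    // member order changed
   double fEnergy = 0;        // simple type changed: float -> double
   std::array<int, 3> fPos{}; // C-style array -> std::array
   int fId = 0;
   int fCharge = -1;          // added: keeps its default-constructed value
};
} // namespace v2

// Illustration of the mapping ROOT applies automatically when a version-1
// Particle is read into the version-2 layout: members are matched by name.
v2::Particle Evolve(const v1::Particle &old)
{
   v2::Particle p; // default-constructed first, as before reading
   p.fId = old.fId;
   p.fEnergy = old.fEnergy;
   for (int i = 0; i < 3; ++i)
      p.fPos[i] = old.fPos[i];
   p.fHits = old.fHits;
   return p; // fObsolete has no counterpart and is skipped
}
```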
Manual schema evolution: user-defined I/O customization rules
The automatic schema evolution described above allows reading back serialized objects even if the definitions of their classes changed in one of the supported ways. It is also possible to manually set rules for arbitrary data transformations applied when reading the classes.
ROOT provides two interfaces for users to define the conversion rules. The recommended way is to add a rule to the dictionary file by specifying it in the corresponding linkdef file. Alternatively, rules can be inserted into the TClass object using its C++ API.
Specifying I/O customization rules in a linkdef file
I/O customization rules can be part of the generated dictionary for a class. These rules are specified through a linkdef file. The syntax of the rules is as follows:
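A minimal sketch of a rule, assuming a hypothetical class Track whose member fX was stored in millimeters in version 1 and is kept in centimeters in the current in-memory version. Only the mandatory properties and version are used. The optional ones (checksum, embed, include) are omitted.

```cpp
// Hypothetical LinkDef.h fragment: convert fX from millimeters (version 1)
// to centimeters when reading old data into the current Track class.
#pragma read sourceClass="Track" version="[1]" targetClass="Track" \
   source="float fX" target="fX" \
   code="{ fX = 0.1 * onfile.fX; }"
```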
The arguments in the rules have the following meaning:
- sourceClass (mandatory): The name of the persisted class used as input for the rule.
- source (mandatory): A semicolon-separated list of data member declarations defining the data members of the source class that the rule needs to access.
- version: A comma-separated list of versions or version ranges of the source class. The list has to be enclosed in square brackets. This rule is only applied to input classes matching any of these versions. One of checksum or version must be present. A version is an integer number, whereas a version range is one of the following:
  - a-b: all the version numbers between and including a and b
  - -a: all the version numbers lower than or equal to a
  - a-: all the version numbers greater than or equal to a
- checksum: A comma-separated list of checksums of the source class that this rule is applied to. The list has to be enclosed in square brackets. One of checksum or version must be present.
- targetClass (mandatory): Defines the name of the in-memory class that this rule is applied to.
- target (mandatory): A semicolon-separated list of target class data member names that this rule is potentially updating.
- embed: If true (the default), the rule is written to the output file if an object of this class is serialized.
- include: A comma-separated list of header files that need to be included for the code snippet.
- code: The C++ code snippet implementing the rule’s actions.
The C++ code snippet has access to the following pre-defined variables:
- newObj: a variable pointing to the target in-memory object.
- oldObj: a variable of type TVirtualObject, behaving as a pointer to the source object.
- variables representing the data members of the target object declared in the target property of the rule.
- onfile.variable_name: variables declared in the source property of the rule.
Specifying I/O customization rules through the C++ API
The schema evolution C++ API consists of the following classes:
- TSchemaRuleSet: objects of this type manage the sets of rules and ensure their consistency. There can be no conflicting rules in the rule sets. The rule sets are owned by the TClass objects corresponding to the target classes defined in the rules and can be accessed using TClass::GetSchemaRules().
- TSchemaRule: represents a rule. Its fields have exactly the same meaning as those of rules specified in the dictionaries (see above).
Schema evolution with custom streamers
If you have written your own Streamer as described in Custom streamers, you will have to manually add code for each version and manage the evolution of your class. When you add or remove data members, you must modify the Streamer by hand. ROOT assumes that you have increased the class version number in the ClassDef statement and introduced the relevant test in the read part of the Streamer.
For example, suppose a new version of the Event class above includes a new member Int_t fNew. The ClassDef statement should then be changed to ClassDef(Event,2), and the read part of the Streamer should read fNew only if the version read from the buffer is greater than 1.
If, in the same new version 2, you remove the member fH, the read part must still handle data written with version 1. For such data, it has to read the histogram object into some temporary object and delete it.
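These version checks can be sketched in plain C++ with a small toy buffer. ToyBuffer is a hypothetical stand-in for TBuffer, and an int32 stands in for the removed histogram pointer. Version 2 of Event reads fNew only from version 2 data. For version 1 data, it reads the removed member into a temporary and discards it.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Compact toy stand-in for ROOT's TBuffer (hypothetical, illustration only).
class ToyBuffer {
public:
   explicit ToyBuffer(bool reading, std::vector<std::uint8_t> data = {})
      : fReading(reading), fData(std::move(data)) {}
   bool IsReading() const { return fReading; }
   bool AtEnd() const { return fPos == fData.size(); }
   const std::vector<std::uint8_t> &Data() const { return fData; }
   ToyBuffer &operator<<(std::int32_t v) // big-endian write
   {
      for (int shift = 24; shift >= 0; shift -= 8)
         fData.push_back(static_cast<std::uint8_t>(static_cast<std::uint32_t>(v) >> shift));
      return *this;
   }
   ToyBuffer &operator>>(std::int32_t &v) // big-endian read
   {
      std::uint32_t u = 0;
      for (int i = 0; i < 4; ++i)
         u = (u << 8) | fData.at(fPos++);
      v = static_cast<std::int32_t>(u);
      return *this;
   }

private:
   bool fReading;
   std::vector<std::uint8_t> fData;
   std::size_t fPos = 0;
};

// Version 2 of a hypothetical Event: fNew was added, and the version-1
// member fH (an int32 stand-in here) was removed.
class Event {
public:
   static constexpr std::int32_t kVersion = 2; // like ClassDef(Event,2)
   std::int32_t fNtrack = 0;
   std::int32_t fNew = 0;

   void Streamer(ToyBuffer &b)
   {
      if (b.IsReading()) {
         std::int32_t version = 0;
         b >> version >> fNtrack;
         if (version < 2) {
            std::int32_t removedH = 0; // read the removed member into a
            b >> removedH;             // temporary and discard it
         }
         if (version > 1)
            b >> fNew; // only present in version-2 data
         else
            fNew = 0;  // old data: give the new member a defined value
      } else {
         b << kVersion << fNtrack << fNew;
      }
   }
};
```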
Our experience with manual schema evolution shows that it is easy to make mistakes. Mismatches between Streamer writers and readers are frequent, and they become more frequent as the number of classes grows. We recommend that you use rootcling to automatically generate dictionaries for your classes and profit from the automatic schema evolution.