A ROOTIO file consists of one "file header", one or more "data records," and zero or more "free segments". The file header is always at the beginning of the file, while the data records and free segments may in principle appear in any order.
The file header is fixed length (64 bytes in the current release.) It's detailed format is given in File header format.
A free segment is of variable length. One free segment is a set of contiguous bytes that are unused, and are available for ROOTIO to use for new or resized data records. The first four bytes of a a free segment contain the negative of the number of bytes in the segment. The contents of the remainder of the free segment are irrelevant.
A data record represents either user data or data used internally by ROOTIO. All data records have two portions, a "key" portion and a "data" portion. The key portion precedes the data portion. The format of the key portion is the same for all data. (The key portion corresponds to a class TKey object). The object name and they key cycle are together sufficient to uniquely determine the record within the file. The Format of a class object in DATA page describes the format of the data portion of a record for an object that uses the default streamer.
There are several types of data records used internally by ROOTIO to support the storage of byte sequences. These record types are TFile, TDirectory, "KeysList", and "FreeSegments". These types can be considered to be in the "core" layer of ROOTIO.
A file always contains exactly one TFile data record, which (nearly?) always immediately follows the file header. The TFile record consists of either data pertaining to the file as a whole, or data pertaining to the root "directory" of records in the file. Its detailed format is given in Format of the root (first) directory record.
A file contains zero or more TDirectory data records, each representing a subdirectory in the directory tree that has the TFile record at its root. The detailed format is given in Format of a TDirectory record.
A file contains one or more "KeysList" data records. There is one corresponding to the root directory (represented by the TFile record), and one corresponding to each (non-empty) subdirectory in the tree (each represented by a TDirectory record). The data portion of each KeysList record consists of the sequential keys of those data records in that directory. The detailed format is given in Format of KeysList record. Note that keys for TFile, "KeysList", "FreeSegments", and "StreamerInfo" data records never appear in the data portion of a KeysList data record.
A file always contains exactly one "FreeSegments" data record, which keeps track of the free segments in the file. Its detailed format is given in Format of FreeSegments record. Note that the list of free segments contains one additional free segment that is not in the file itself, because it represents the free space after the end of the file.
There is an additional data record type ("StreamerInfo") needed internally to support the storage of self-identifying objects. Its detailed format is given in Format of StreamerInfo record. Note that the StreamerInfo data record itself and the "core" data records described above are not self-identifying objects. A ROOTIO file contains exactly one StreamerInfo record. The use of the "StreamerInfo" record is described under the StreamerInfo heading below.
There are three object types ("TProcessID", "TRef", and "TRefArray") used internally to support pointers to persistent objects. Their formats are given in Format of TProcessID record, Format of the DATA for a TRef object, and Format of the DATA for a TRefArray object respectively. Of these three objects, only TProcessID objects necessarily comprise a complete data record (a "TProcessID" record). TRef and TRefArray objects typically are data members of larger objects, and therefore are only a part of the data portion of a record. In addition, objects that are referenced by such a pointer have an additional field in the base TObject. See Format of the DATA for a TObject object. A description of how these pointers work is given under the Pointers to persistent objects heading below.
These are either user defined record types, or record types supplied by ROOT that are not needed by ROOTIO. The format of such an object that uses the default streamer is shown in Format of a class object in DATA.
The user can set the data compression level for new or modified data records when creating or opening a file. When an existing file is opened for update, the compression level selected need not match that used previously. The compression level of existing records is not modified unless the record itself is modified.
There are ten compression levels, 0-9, ranging from 0 (no compression) to 9 (maximum compression), with level 1 being the default. The level chosen is a tradeoff between disk space and compression performance. The decompression speed is independent of level. Currently, in release 3.2.6, level 2 is not used. If level 2 is selected, level 1 is used with no notification to the user.
The chosen compression level is not applied to the entire file. The following portions of the file are not compressed, regardless of the compression level selected:
Furthermore, the data portion of the StreamerInfo data record is always compressed at level 1 (if over 256 bytes uncompressed), regardless of the compression level selected (even if no compression is selected).
The compression algorithm used is an in memory ZIP compression written for the DELPHI collaboration at CERN. Its author is E. Chernyaev (IHEP/Protvino). The source code is internal to ROOTIO.
The "StreamerInfo" data record is used by ROOTIO to support the storage of self-identifying objects. Its detailed format is given in Format of StreamerInfo record. A ROOTIO file contains exactly one StreamerInfo record, which is written to disk automatically when a new or modified file is closed.
The StreamerInfo record is a list (ROOTIO class TList) of "StreamerInfo" objects (ROOTIO class TStreamerInfo). There is one StreamerInfo object in the list for every class used in the file in a data record, other than a core layer record. There is no streamerinfo object for a class used in a core layer record unless the class is also used elsewhere in a data record. When reading a self-identifying object from a file, the system uses the StreamerInfo list to decompose the object recursively into its simple data members.
Each streamerinfo object is an array of "streamer element" objects, each of which describes a base class of the object or a (non-static and non-transient) data member of the object. If the base class or data member is itself a class, then there will also be a streamerinfo object in the record for that class. In this way, each class is recursively decomposed into its atomic elements, each of which is a simple type (e.g. "int"). A "long" or "unsigned long" member is always written as an 8 byte quantity, even if it occupies only 4 bytes in memory.
A data member of a class is marked transient on the line of its declaration by a comment beginning with "//!". Such members are not written to disk, nor is there any streamerinfo for such a member.
A data member that is a C++ pointer (not to be confused with "pointers to persistent objects" described below) is never written to disk as a pointer value. If it is a pointer to an object, the object itself (or 0 (4 bytes) if the pointer value is NULL) is written. If the declaration line has a comment beginning with "//->", this indicates that the pointer value will never be null, which allows a performance optimization. Another optimization is that if two or more pointers pointing to the same object are streamed in the same I/O operation, the object is written only once. The remaining pointers reference the object through a unique object identifier. This saves space and avoids the infinite loop that might otherwise arise if the directed graph of object instance pointer references contains a cycle.
If a data member is a pointer to a simple type, the Streamer presumes it is an array, with the dimension defined in a comment of the form "//[<length>]", where length is either an integer constant or a variable that is an integer data member of the class. If a variable is used, it must be defined ahead of its use or in a base class.
The above describes the function of the StreamerInfo record in decomposing a self-identifying object if the user uses the streamer generated by "rootcint". There are two reasons why a user may need to write a specialized streamer for a class. One reason is that it may be necessary to execute some code before or after data is read or written, for example, to initialize some non-persistent data members after the persistent data is read. In this case, the custom streamer can use the StreamerInfo record to decompose a self-identifying object in the exact same manner as the generated streamer would have done. An example is given (for the Event class) in the Root User's Guide (URL below) (Input/Output chapter, Streamers subchapter). On the other hand, if the user needs to write a streamer for a class that ROOT cannot handle, the user may need to explicitly code the decomposition and composition of the object to its members. In this case, the StreamerInfo for that class might not be used. In any case, if the composition/decomposition of the class is explicitly coded, the user should include the byte count, class information, and version number of the class before the data on disk as shown in Format of a class object in DATA.
The special method used for streaming a TClonesArray is described in the TClonesArray section below.
More information on the StreamerInfo record and its use is found in the Input/Output chapter of the Root Manual
NOTE: Some of the classes used internally in ROOTIO (e.g. TObject, TRef, TRefArray) have explicitly coded (de)compositions, and do not use the information in the StreamerInfo record to do the (de)composition. In this case, the StreamerInfo for the class may still be present in the StreamerInfo record, but may not match what is actually written to disk for those objects.
Information on how these work in memory can be found at: https://root.cern/root/html303/examples/Version302.news.html These were introduced in release 3.02, so there is not yet a description in the current Root Users Guide, which is for a version release 3.1. Here we discuss only the information on disk.
A ROOT file contains zero or more TProcessID records. Each such record contains a globally unique ID defining a given ROOT job that wrote a referenced object (see Format of TProcessID record). Each referenced object contains a "pidf" field referencing the corresponding TProcessID record and an "fUniqueID" field uniquely identifying the referenced object among those written by that process (see Format of the DATA for a TObject object). Similarly, every persistent reference to that object (a TRef Object, see Format of the DATA for a TRef object) also contains "pidf" and "fUniqueID" fields with the same value, thereby uniquely determining the referenced object (which need not even be in the same file). In the case of an array of references (a TRefArray object, see Format of the DATA for a TRefArray object), there is one "pidf" value for the entire array, and a separate "fUniqueID" value for each reference. For further information, see the above URL.
The TObjArray class can be used to support an array of objects. The objects need not be of the same type, but each object must be of a class type that inherits from TObject. We have already seen a specific example of the use of TObjArray, in the StreamerInfo record, where it is used to hold an array of TStreamerElement objects, each of which is of a class inheriting from TStreamerElement, which in turn inherits from TObject.
The TClonesArray class is a specialization of the TObjArray class for holding an array of objects that are all of the same type. The format of a TClonesArray object is given in Format of the DATA for a TClonesArray object.
There are two great advantages in the use of TClonesArray over TObjArray when the objects all will be of the same class:
A TTree is a highly specialized container class for efficient storage and retrieval of user data. The use of TTrees is discussed in detail in the Trees chapter of the Root Manual
Here we discuss in particular how a TTree is stored in a ROOTIO file.
A TTree object is split into one or more branches (class TBranch), each of which may have its own (sub)branches, recursively to any depth. Each TBranch contains an array of zero or more leaves (class TLeaf), each corresponding to a basic variable type or a class object that has not been split. The TLeaf object does not actually contain variable values, only information about the variables. The actual data on each branch is physically stored in basket objects (class TBasket). The user can set the basket size on a per TBranch basis. The default basket size is 32000 bytes. This should be viewed as an approximate number.
There is one TTree data record per file for each tree in the file, corresponding to a TTree class object. The TTree class object recursively contains TBranch objects, each of which contains an array of TBasket objects to hold its data.
However, the TTree data record does not necessarily contain the entire TTree object. For each branch, exactly one TBasket object is contained in the TTree data record. If the data on a given branch fits in one basket, then all the data for that branch will be in the TTree record itself. Otherwise, there will be a separate TBasket data record for each additional basket used on the branch, each containing a TBasket object containing user data.
By default, the additional TBasket data records are stored in the same file as that of the corresponding TTree data record. However, the user may specify a separate file for a given branch. If the data for that branch fits into one basket, this option has no effect. Otherwise, the additional TBasket records are written into the specified file, rather than the file containing the TTree data record itself. In this case, a TBranch data record for the specified branch is also written to the specified file, containing the TBranch object for the specified branch.
Streamer information for TTree related classes shows the streamer information for the TTree, TBranch, TLeaf, and some related classes, together with some additional commentary. For writing to a ROOTIO file, the streamers for these three classes act exactly as those of default generated streamers, except that, if the user has specified a separate file for a branch, the TBranch streamer also writes the TBranch object as a keyed data record to the specified file.
There is no streamer information for the TBasket class. The custom written TBasket streamer internally handles the packing of data into fixed size TBasket objects.