" rootreadspeed --files fname1 [fname2 ...]\n"
" --trees tname1 [tname2 ...]\n"
" (--all-branches | --branches bname1 [bname2 ...] | --branches-regex bregex1 "
" [--threads nthreads]\n"
" [--tasks-per-worker ntasks]\n"
" rootreadspeed (--help|-h)\n"
" Use -h for usage help, --help for detailed information.\n";
" Specifying files and trees:\n"
" --files fname1 [fname2...]\n"
" The list of ROOT files to read from.\n"
" --trees tname1 [tname2...]\n"
" The list of trees to read from the files. If only one tree is provided, it will"
" be used for all files. If multiple trees are specified, each tree is read from the"
" Specifying branches:\n"
" Branches can be specified using one of the following flags. Currently only one can be used"
" Reads every branch from the specified files and trees."
" --branches bname1 [bname2...]\n"
" Reads the branches with matching names. Will error if any of the branches are not found."
" --branches-regex bregex1 [bregex2 ...]\n"
" Reads all branches whose names match any of the provided regexes. Will error if any"
" provided regex does not match at least one branch."
" --threads nthreads\n"
" The number of threads to use for file reading. The value is automatically capped at the"
" number of threads available on the machine."
" --tasks-per-worker ntasks\n"
" The number of tasks to generate for each worker thread when using multithreading.";
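The `--branches-regex` rule above (each pattern must match at least one branch) can be illustrated with the standard `<regex>` library. This is a minimal sketch, not the tool's implementation: `SelectBranches` is a hypothetical helper that works on a plain list of branch names instead of a TTree, and it assumes whole-name matching with `std::regex_match`. Whether rootreadspeed matches whole names or substrings is not shown in this excerpt. The parser stores the single pattern `".*"` for `--all-branches`, and that pattern matches every name under either interpretation.

```cpp
#include <regex>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of rootreadspeed): returns every branch name
// matched by at least one pattern, and throws if any single pattern matches
// no branch at all, mirroring the documented --branches-regex behaviour.
// Whole-name matching (std::regex_match) is an assumption of this sketch.
std::set<std::string> SelectBranches(const std::vector<std::string> &branchNames,
                                     const std::vector<std::string> &patterns)
{
   std::set<std::string> selected;
   for (const auto &pattern : patterns) {
      const std::regex re(pattern); // throws std::regex_error on invalid syntax
      bool matchedAny = false;
      for (const auto &name : branchNames) {
         if (std::regex_match(name, re)) {
            selected.insert(name);
            matchedAny = true;
         }
      }
      if (!matchedAny)
         throw std::runtime_error("Provided regex '" + pattern + "' does not match any branch.");
   }
   return selected;
}
```

Each pattern is compiled once and checked against all names, so the "matches at least one branch" check is done per pattern, as the help text describes.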
" rootreadspeed is a tool that helps identify bottlenecks in ROOT analysis programs"
" by showing what throughput you can expect when reading ROOT files in"
" a given configuration."
" It reports the number of bytes read from your files, how long this takes,"
" and the resulting throughputs in MB/s, both in total and per thread."
"Compressed vs Uncompressed Throughput:\n"
" Throughput is reported for both compressed and uncompressed data. ROOT files are usually"
" saved in compressed format, so the two will often differ. 'Compressed bytes' is the total"
" number of bytes read from TFiles during the readspeed test (possibly including metadata)."
" 'Uncompressed bytes' is the number of bytes processed by reading the branch values in the TTree."
" In both cases, throughput is the total number of bytes divided by the total runtime"
" (including decompression time)."
"Interpreting results:\n"
" There are three possible scenarios when using rootreadspeed, namely:"
" -The 'Real Time' is significantly lower than your own analysis runtime."
" This would imply your actual application code is dominating the runtime of your analysis,"
" i.e. your analysis logic or framework is taking up the time."
" The best way to decrease the runtime would be to optimize your code (or the framework's),"
" parallelize it onto multiple threads if possible (for example with"
" RDataFrame and EnableImplicitMT) or switch to a machine with a more performant CPU."
" -The 'Real Time' is significantly higher than 'CPU Time / number of threads'*."
" If the real time is higher than the CPU time per thread, reading the data is the"
" bottleneck: the CPU cores are wasting time waiting for data to arrive from your disk/drive"
" or network connection before they can decompress it."
" The best way to decrease your runtime would be to transfer the data you need onto a faster"
" storage medium (i.e. a faster disk/drive such as an SSD, or a faster network connection"
" for remote file access), or to use a compression algorithm with a higher compression ratio,"
" possibly at the cost of the decompression rate."
" Changing the number of threads is unlikely to help, and in fact using too many threads may"
" degrade performance if they make requests to different regions of your local storage."
" * If no '--threads' argument was provided this is 1, otherwise it is the minimum of the value"
" provided and the number of threads your CPU can run in parallel. Note that on shared systems,"
" or when running other heavy applications, the number of your own threads running at any"
" time may be lower than this limit due to demand on the CPU."
" -The 'Real Time' is similar to 'CPU Time / number of threads'"
" AND 'Compressed Throughput' is lower than expected for your storage medium:"
" This would imply that your CPU threads aren't decompressing data as fast as your storage medium"
" can provide it, and so decompression is the bottleneck."
" The best way to decrease your runtime would be to use a system with a faster CPU, make use"
" of more threads when running, or use a compression algorithm with a higher decompression rate"
" such as LZ4, possibly at the cost of some extra file size."
"A note on caching:\n"
" If your data is stored on a local disk, the system may cache some or all of the file in memory after it is"
" first read. If this reflects how your analysis will run, there is no concern. However, if"
" you expect to read the files only occasionally, so that they are unlikely to be in the cache,"
" consider clearing the cache before running rootreadspeed."
" On Linux this can be done by running 'echo 3 > /proc/sys/vm/drop_caches' as a superuser,"
" or a specific file can be dropped from the cache with"
" `dd of=<FILENAME> oflag=nocache conv=notrunc,fdatasync count=0 > /dev/null 2>&1`."
"Known overhead of TTreeReader, RDataFrame:\n"
" `rootreadspeed` is designed to read all data present in the specified branches, trees and files at the highest "
" possible speed. When the application bottleneck is not in the computations performed by analysis logic, higher-level "
" interfaces built on top of TTree such as TTreeReader and RDataFrame are known to add a significant runtime overhead "
" with respect to the runtimes reported by `rootreadspeed` (up to a factor of 2). In realistic analysis applications it has "
" been observed that a large part of that overhead is compensated by the ability of TTreeReader and RDataFrame to read "
" branch values selectively, based on event cuts, and this overhead will be reduced significantly when using RDataFrame "
" in conjunction with RNTuple.";
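The `dd` command in the caching note evicts a single file from the page cache. On Linux, the same eviction can be requested from C++ through the POSIX `posix_fadvise` call with `POSIX_FADV_DONTNEED`. This is a sketch that assumes a Linux system. `DropFileFromPageCache` is a hypothetical helper, not part of rootreadspeed. The kernel treats the request as advice, so pages that are mapped or still in use may stay cached.

```cpp
#include <cerrno>   // errno
#include <fcntl.h>  // open, posix_fadvise, POSIX_FADV_DONTNEED
#include <string>
#include <unistd.h> // fdatasync, close

// Hypothetical helper (not part of rootreadspeed), Linux-only sketch.
// Returns 0 on success, or an error number on failure.
int DropFileFromPageCache(const std::string &path)
{
   const int fd = open(path.c_str(), O_RDONLY);
   if (fd < 0)
      return errno;
   // Write back dirty pages first: POSIX_FADV_DONTNEED only evicts clean pages.
   if (fdatasync(fd) != 0) {
      const int err = errno;
      close(fd);
      return err;
   }
   // Offset 0 and length 0 cover the whole file. Unlike most POSIX calls,
   // posix_fadvise returns the error number instead of setting errno.
   const int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
   close(fd);
   return rc;
}
```

Calling this on each input file just before a rootreadspeed run approximates a cold-cache measurement without dropping the whole system cache.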
// ...
   std::cout << "Thread pool size:\t\t" << r.fThreadPoolSize << '\n';

   if (r.fMTSetupRealTime > 0.) {
      std::cout << "Real time to setup MT run:\t" << r.fMTSetupRealTime << " s\n";
      std::cout << "CPU time to setup MT run:\t" << r.fMTSetupCpuTime << " s\n";
   }

   std::cout << "Real time:\t\t\t" << r.fRealTime << " s\n";
   std::cout << "CPU time:\t\t\t" << r.fCpuTime << " s\n";

   std::cout << "Uncompressed data read:\t\t" << r.fUncompressedBytesRead << " bytes\n";
   std::cout << "Compressed data read:\t\t" << r.fCompressedBytesRead << " bytes\n";
   const unsigned int effectiveThreads = std::max(r.fThreadPoolSize, 1u);

   std::cout << "Uncompressed throughput:\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n";
   std::cout << "Compressed throughput:\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n\n";

   const float cpuEfficiency = (r.fCpuTime / effectiveThreads) / r.fRealTime;
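The throughput lines and the efficiency computed above, together with the 0.80 and 0.50 thresholds the tool uses for its diagnosis, can be gathered into one testable function. This is a sketch that mirrors those expressions: bytes divided by the real time and by 1024 twice, then by the effective thread count. `ThroughputSummary`, `EffectiveThreads` and `Summarize` are hypothetical names. The thread rule follows the help text's footnote (no `--threads` means one thread, otherwise the request is capped at the hardware limit) rather than code shown here.

```cpp
#include <algorithm>
#include <string>

// Hypothetical summary type; rootreadspeed's own Result struct is not reproduced here.
struct ThroughputSummary {
   unsigned int threads;           // effective number of threads
   double uncompressedMBps;        // uncompressed bytes / real time, in MB/s
   double compressedMBps;          // compressed bytes / real time, in MB/s
   double compressedMBpsPerThread; // compressed throughput divided by threads
   double cpuEfficiency;           // (CPU time / threads) / real time
   std::string verdict;            // same wording as the printed diagnosis
};

// requested == 0 stands for "no --threads argument", which the help text says
// means one thread; hardware == 0 means the hardware limit is unknown.
unsigned int EffectiveThreads(unsigned int requested, unsigned int hardware)
{
   if (requested == 0)
      return 1;
   return hardware == 0 ? requested : std::min(requested, hardware);
}

ThroughputSummary Summarize(double uncompressedBytes, double compressedBytes, double realTime,
                            double cpuTime, unsigned int requested, unsigned int hardware)
{
   constexpr double kBytesPerMB = 1024.0 * 1024.0; // "MB" as printed: 2^20 bytes
   ThroughputSummary s{};
   s.threads = EffectiveThreads(requested, hardware);
   s.uncompressedMBps = uncompressedBytes / realTime / kBytesPerMB;
   s.compressedMBps = compressedBytes / realTime / kBytesPerMB;
   s.compressedMBpsPerThread = s.compressedMBps / s.threads;
   s.cpuEfficiency = (cpuTime / s.threads) / realTime;
   if (s.cpuEfficiency > 0.80)
      s.verdict = "likely CPU bound (decompression).";
   else if (s.cpuEfficiency < 0.50)
      s.verdict = "likely I/O bound.";
   else
      s.verdict = "likely balanced (more threads may help though).";
   return s;
}
```

For example, 10 MiB of compressed data read in 10 s using 36 s of CPU time on 4 threads gives 1 MB/s in total and 0.25 MB/s per thread. The efficiency is (36 / 4) / 10 = 0.9, which the thresholds classify as CPU bound.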
   std::cout << "CPU Efficiency: \t\t" << (cpuEfficiency * 100) << "%\n";
   std::cout << "Reading data is ";
   if (cpuEfficiency > 0.80f) {
      std::cout << "likely CPU bound (decompression).\n";
   } else if (cpuEfficiency < 0.50f) {
      std::cout << "likely I/O bound.\n";
   } else {
      std::cout << "likely balanced (more threads may help though).\n";
   }
   std::cout << "For details run with the --help command.\n";
// ...
   const auto argsProvided = args.size() >= 2;
   const auto helpUsed = argsProvided && (args[1] == "--help" || args[1] == "-h");
   const auto longHelpUsed = argsProvided && args[1] == "--help";

   if (!argsProvided || helpUsed) {
      // ...
      std::cout << std::endl;
      // ...
   unsigned int nThreads = 0;

   enum class EArgState { kNone, kTrees, kFiles, kBranches, kThreads, kTasksPerWorkerHint } argState = EArgState::kNone;
   enum class EBranchState { kNone, kRegular, kRegex, kAll } branchState = EBranchState::kNone;
   const auto branchOptionsErrMsg =
      "Options --all-branches, --branches, and --branches-regex are mutually exclusive. You can use only one.\n";
   for (size_t i = 1; i < args.size(); ++i) {
      const auto &arg = args[i];

      if (arg == "--trees") {
         argState = EArgState::kTrees;
      } else if (arg == "--files") {
         argState = EArgState::kFiles;
      } else if (arg == "--all-branches") {
         argState = EArgState::kNone;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kAll) {
            std::cerr << branchOptionsErrMsg;
            // ...
         }
         branchState = EBranchState::kAll;
         // ...
         d.fBranchNames = {".*"};
      } else if (arg == "--branches") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegular) {
            std::cerr << branchOptionsErrMsg;
            // ...
         }
         branchState = EBranchState::kRegular;
      } else if (arg == "--branches-regex") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegex) {
            std::cerr << branchOptionsErrMsg;
            // ...
         }
         branchState = EBranchState::kRegex;
         // ...
      } else if (arg == "--threads") {
         argState = EArgState::kThreads;
      } else if (arg == "--tasks-per-worker") {
         argState = EArgState::kTasksPerWorkerHint;
      } else if (arg[0] == '-') {
         std::cerr << "Unrecognized option '" << arg << "'\n";
         // ...
         case EArgState::kTrees: d.fTreeNames.emplace_back(arg); break;
         case EArgState::kFiles: d.fFileNames.emplace_back(arg); break;
         case EArgState::kBranches: d.fBranchNames.emplace_back(arg); break;
         case EArgState::kThreads:
            nThreads = std::stoi(arg);
            argState = EArgState::kNone;
            break;
         case EArgState::kTasksPerWorkerHint:
            // ...
            argState = EArgState::kNone;
            // ...
            std::cerr << "ROOT was built without implicit multi-threading (IMT) support. The --tasks-per-worker option "
                         "will be ignored.\n";
            // ...
         default: std::cerr << "Unrecognized option '" << arg << "'\n"; return {};
      // ...

   return Args{std::move(d), nThreads, branchState == EBranchState::kAll, true};
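The parsing loop above works as a small state machine: each flag switches `argState`, and the plain arguments that follow are appended to the list selected by that state. The standalone sketch below reproduces that pattern for a subset of the options so it can be compiled and tested on its own. `Options` and `ParseOptions` are hypothetical. `--branches-regex`, `--tasks-per-worker` and `--help` are deliberately left out. The excerpt passes the `--threads` value directly to `std::stoi`, which throws on non-numeric or out-of-range input; the sketch additionally catches those exceptions and reports a parse failure.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified options (not rootreadspeed's Args/Data types).
struct Options {
   std::vector<std::string> files, trees, branches;
   unsigned int threads = 0; // 0 means --threads was not given
   bool allBranches = false;
};

// A flag selects the state; the plain arguments that follow are stored
// according to that state. Any other argument starting with '-' is rejected.
std::optional<Options> ParseOptions(const std::vector<std::string> &args)
{
   enum class EState { kNone, kFiles, kTrees, kBranches, kThreads } state = EState::kNone;
   enum class EBranchMode { kNone, kList, kAll } branchMode = EBranchMode::kNone;
   Options opts;

   for (std::size_t i = 1; i < args.size(); ++i) { // args[0] is the program name
      const std::string &arg = args[i];
      if (arg == "--files") {
         state = EState::kFiles;
      } else if (arg == "--trees") {
         state = EState::kTrees;
      } else if (arg == "--branches") {
         if (branchMode == EBranchMode::kAll)
            return std::nullopt; // --branches and --all-branches are mutually exclusive
         branchMode = EBranchMode::kList;
         state = EState::kBranches;
      } else if (arg == "--all-branches") {
         if (branchMode == EBranchMode::kList)
            return std::nullopt;
         branchMode = EBranchMode::kAll;
         state = EState::kNone; // takes no values
      } else if (arg == "--threads") {
         state = EState::kThreads;
      } else if (!arg.empty() && arg[0] == '-') {
         return std::nullopt; // unrecognized option
      } else {
         switch (state) {
         case EState::kFiles: opts.files.push_back(arg); break;
         case EState::kTrees: opts.trees.push_back(arg); break;
         case EState::kBranches: opts.branches.push_back(arg); break;
         case EState::kThreads:
            try {
               const int n = std::stoi(arg); // throws on non-numeric or out-of-range input
               if (n < 0)
                  return std::nullopt;
               opts.threads = static_cast<unsigned int>(n);
            } catch (const std::logic_error &) { // std::invalid_argument, std::out_of_range
               return std::nullopt;
            }
            state = EState::kNone; // --threads takes exactly one value
            break;
         case EState::kNone:
            return std::nullopt; // a value with no preceding flag
         }
      }
   }
   opts.allBranches = branchMode == EBranchMode::kAll;
   return opts;
}
```

Returning `std::nullopt` plays the same role as the `return {};` on the error paths above. The caller decides how to report the failure.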
// ...
   std::vector<std::string> args;
   // ...
   for (int i = 0; i < argc; ++i) {
      args.emplace_back(argv[i]);