hadd.cxx
1/**
2 \file hadd.cxx
3 \brief This program will merge compatible ROOT objects, such as histograms, Trees and RNTuples,
4 from a list of root files and write them to a target root file.
5 In order for a ROOT object to be mergeable, it must implement the Merge() function.
6 Non-mergeable objects will have all instances copied as-is into the target file.
7 The target file must not be identical to one of the source files.
8
9 Syntax:
10 ```{.cpp}
11 hadd [flags] targetfile source1 source2 ... [flags]
12 ```
13
14 Flags can be passed before or after the positional arguments.
15 The first positional (non-flag) argument will be interpreted as the targetfile.
16 After that, the first sequence of positional arguments will be interpreted as the input files.
17 If two sequences of positional arguments are separated by flags, hadd will emit an error and abort.
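The single-sequence rule for positional arguments can be illustrated with a small standalone sketch. This is not hadd's parser: the name `HasSinglePositionalRun` is hypothetical, flags are assumed never to consume a separate value, and `--` is not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns true if all positional (non-flag) arguments
// form one contiguous run, i.e. no flag sits between two positional arguments.
// Simplification: every argument longer than "-" that starts with '-' counts
// as a flag, and flags never take a separate value.
bool HasSinglePositionalRun(const std::vector<std::string> &args)
{
   enum class State { kBefore, kInside, kAfter };
   State state = State::kBefore;
   for (const auto &arg : args) {
      const bool isFlag = arg.size() > 1 && arg[0] == '-';
      if (isFlag) {
         if (state == State::kInside)
            state = State::kAfter; // the positional run has ended
      } else {
         if (state == State::kAfter)
            return false; // a second positional run is rejected
         state = State::kInside;
      }
   }
   return true;
}
```

With this sketch, `-f out.root a.root b.root -k` is accepted, while `out.root a.root -k b.root` is rejected because `b.root` starts a second run.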
18
19 By default, any argument starting with `-` is interpreted as a flag. If you want to pass filenames
20 starting with `-` you need to pass them after `--`:
21 ```{.cpp}
22 hadd [flags] -- -file1 -file2 ...
23 ```
24 Note that in this case you need to pass ALL positional arguments after `--`.
25
26 If a flag requires an argument, the argument can be specified in any of these ways:
27
28 # All equally valid:
29 -j 16
30 -j16
31 -j=16
32
33 The first syntax is the preferred one since it's backward-compatible with previous versions of hadd.
34 The -f flag is an exception to this rule: it only supports the `-f[0-9]` syntax.
35
36 Note that merging multiple flags is NOT supported: `-jfa` will be interpreted as -j=fa, which is invalid!
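As a rough illustration of the three accepted spellings, the sketch below extracts a flag's value from an argument list. The helper `FlagValue` is hypothetical and much simpler than the `FlagArg` function in this file: it returns the value as a string and does no type conversion or default handling.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: extracts the value of flag `name` (given without the
// leading '-') from args[idx], accepting "-j 16", "-j16" and "-j=16".
// On success, `idx` is left on the last argument that was consumed.
std::optional<std::string> FlagValue(const std::vector<std::string> &args, std::size_t &idx, const std::string &name)
{
   const std::string &arg = args[idx];
   const std::string prefix = "-" + name;
   if (arg.compare(0, prefix.size(), prefix) != 0)
      return std::nullopt; // not this flag

   std::string rest = arg.substr(prefix.size());
   if (rest.empty()) {
      // "-j 16": the value is the next argument
      if (idx + 1 >= args.size())
         return std::nullopt;
      return args[++idx];
   }
   if (rest[0] == '=')
      rest.erase(0, 1); // "-j=16": drop a single '='
   return rest;         // "-j16" (and "-jfa" yields "fa")
}
```

Because everything after the flag name is taken as its value, `-jfa` yields the value `fa`, which is why merged flags cannot work under this scheme.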
37
38 The flags are as follows:
39
40 \param -a Append to the output
41 \param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
42 \param -d <DIR> Carry out the partial multiprocess execution in the specified directory
43 \param -dbg Enable verbosity. If -j was specified, do not delete partial files
44 stored inside the working directory.
45 \param -experimental-io-features <FEATURES> Enables the corresponding experimental feature for output trees.
46 \see ROOT::Experimental::EIOFeatures
47 \param -f Force overwriting of output file.
48 \param -f[0-9] Set target compression algorithm `i` and level `j` passing the number `i*100 + j`, e.g. `-f505`.
49 The last digit (`j`) can be set from 0 = uncompressed to 9 = highly compressed.
50 The first digit (`i`) is 1 for ZLIB, 2 for LZMA, 4 for LZ4 and 5 for ZSTD.
51 Recommended numbers are 101 (ZLIB), 207 (LZMA), 404 (LZ4), 505 (ZSTD).
52 The default value for this flag is 101 (kDefaultZLIB).
53 See ROOT::RCompressionSetting and TFile::TFile documentation for more details.
54 \param -fk Sets the target file to contain the baskets with the same compression as the input files
55 (unless -O is specified). Compresses the meta data using the compression level specified
56 in the first input or the compression setting after fk (for example 505 when using -fk505)
57 \param -ff The compression level used is the one specified in the first input
58 \param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
59 or is 0, use the system maximum.
60 \param -k Skip corrupt or non-existent files, do not exit
61 \param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects depending on
62 the value of "-Ltype". FILE must contain one object name per line, which cannot contain
63 whitespaces or '/'. You can also pass TDirectory names, which apply to the entire directory
64 content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
65 passed as well.
66 \param -Ltype <SkipListed|OnlyListed> Sets the type of operation performed on the objects listed in FILE given with
67 the
68 "-L" flag. "SkipListed" will skip all the listed objects; "OnlyListed" will only merge those
69 objects. If this flag is passed, "-L" must be passed as well.
70 \param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
71 the default). This number includes both the input files being read and the output file.
72 Thus, if set to 1, it is automatically raised to the minimum of 2. If set to too large a
73 value, it is clipped to the system maximum.
74 \param -O Re-optimize basket size when merging TTree
75 \param -T Do not merge Trees
76 \param -v [LEVEL] Explicitly set the verbosity level:
77 <= 0 = only output errors;
78 1 = only output errors and warnings;
79 2 = output minimal informative messages, errors and warnings (default);
80 >= 3 = output all messages.
81 \return hadd returns a status code: 0 if OK, 1 otherwise
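The `i*100 + j` encoding used by `-f[0-9]` can be made concrete with a short sketch. The struct and function names are hypothetical and only show the arithmetic; ROOT's own definitions live in ROOT::RCompressionSetting.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct CompressionSetting {
   int algorithm; // i: 1 = ZLIB, 2 = LZMA, 4 = LZ4, 5 = ZSTD
   int level;     // j: 0 (uncompressed) to 9 (highly compressed)
};

// Packs algorithm i and level j into the single number i*100 + j.
constexpr int EncodeCompression(int algorithm, int level)
{
   return algorithm * 100 + level;
}

// Splits a setting such as 505 back into algorithm 5 and level 5.
constexpr CompressionSetting DecodeCompression(int setting)
{
   return {setting / 100, setting % 100};
}
```

For example, `EncodeCompression(5, 5)` gives 505 (ZSTD, level 5), and `DecodeCompression(207)` gives algorithm 2 (LZMA) with level 7.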
82
83 For example assume 3 files f1, f2, f3 containing histograms hn and Trees Tn
84 - f1 with h1 h2 h3 T1
85 - f2 with h1 h4 T1 T2
86 - f3 with h5
87 the result of
88 ```
89 hadd -f x.root f1.root f2.root f3.root
90 ```
91 will be a file x.root with h1 h2 h3 h4 h5 T1 T2
92 where
93 - h1 will be the sum of the 2 histograms in f1 and f2
94 - T1 will be the merge of the Trees in f1 and f2
95
96 The files may contain sub-directories.
97
98 If the source files contain histograms and Trees, one can skip
99 the Trees with
100 ```
101 hadd -T targetfile source1 source2 ...
102 ```
103
104 Wildcarding and indirect files are also supported
105 ```
106 hadd result.root myfil*.root
107 ```
108 will merge all files in myfil*.root
109 ```
110 hadd result.root file1.root @list.txt file2.root myfil*.root
111 ```
112 will merge file1.root, file2.root, all files in myfil*.root
113 and all files in the indirect text file list.txt ("@" as the first
114 character of the file indicates an indirect file. An indirect file
115 is a text file containing a list of other files, including other
116 indirect files, one line per file).
117
118 If the source and target compression levels are identical (default),
119 the program uses the TChain::Merge function with option "fast", i.e.
120 the merge will be done without unzipping or unstreaming the baskets
121 (a direct copy of the raw bytes on disk). The "fast" mode is typically
122 5 times faster than the mode unzipping and unstreaming the baskets.
123
124 If the option -cachesize is used, hadd will resize (or disable if 0) the
125 prefetching cache used to speed up I/O operations.
126
127 For options that take a size as argument, a decimal number of bytes is expected.
128 If the number ends with a `k`, `m`, `g`, etc., the number is multiplied
129 by 1000 (1kB), 1000000 (1MB), 1000000000 (1GB), etc.
130 If this prefix is followed by `i`, the number is multiplied by the traditional
131 1024 (1KiB), 1048576 (1MiB), 1073741824 (1GiB), etc.
132 The prefix can optionally be followed by `B`, whose casing is ignored,
133 e.g. 1k, 1K, 1Kb and 1KB are the same.
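The size-suffix rules above can be sketched as a small parser. This is an illustration under stated assumptions, not the routine hadd actually calls: `ParseSize` is a hypothetical name, only integer values and the letters `k`, `m`, `g` are handled, the `i` is matched in lowercase only, and overflow is not checked.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: converts strings such as "1k", "1KiB" or "10MB" to a
// number of bytes, or returns nullopt if the text does not match.
std::optional<std::uint64_t> ParseSize(const std::string &text)
{
   std::size_t pos = 0;
   std::uint64_t value = 0;
   while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
      value = value * 10 + (text[pos++] - '0');
   if (pos == 0)
      return std::nullopt; // no leading digits
   if (pos == text.size())
      return value; // plain number of bytes

   int exponent = 0;
   switch (std::tolower(static_cast<unsigned char>(text[pos]))) {
   case 'k': exponent = 1; break;
   case 'm': exponent = 2; break;
   case 'g': exponent = 3; break;
   default: return std::nullopt; // unsupported suffix
   }
   ++pos;

   std::uint64_t base = 1000; // decimal multiplier by default
   if (pos < text.size() && text[pos] == 'i') {
      base = 1024; // binary multiplier
      ++pos;
   }
   if (pos < text.size() && std::tolower(static_cast<unsigned char>(text[pos])) == 'b')
      ++pos; // optional trailing B, any casing
   if (pos != text.size())
      return std::nullopt; // trailing characters

   for (int i = 0; i < exponent; ++i)
      value *= base;
   return value;
}
```

Under these assumptions, `1k`, `1K`, `1Kb` and `1KB` all parse to 1000, while `1Ki` and `1KiB` parse to 1024.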
134
135 \note By default histograms are added. However hadd does not support the case where
136 histograms have their bit TH1::kIsAverage set.
137
138 \authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
139*/
140#include "Compression.h"
141#include "TClass.h"
142#include "TFile.h"
143#include "TFileMerger.h"
144#include "THashList.h"
145#include "TKey.h"
146#include "TSystem.h"
147#include "TUUID.h"
148
149#include <ROOT/RConfig.hxx>
150#include <ROOT/StringConv.hxx>
151#include <ROOT/TIOFeatures.hxx>
152
153#include "haddCommandLineOptionsHelp.h"
154#include "logging.hxx"
155
156#include <climits>
157#include <cstdlib>
158#include <filesystem>
159#include <fstream>
160#include <iostream>
161#include <optional>
162#include <sstream>
163#include <string>
164#include <streambuf>
165
166#ifndef R__WIN32
168#endif
169
170////////////////////////////////////////////////////////////////////////////////
171
172// NOTE: TFileMerger will use PrintLevel = GetLogVerbosity() - 1. If PrintLevel is < 1, it will print nothing, otherwise
173// it will print everything. To give some granularity to hadd, we do the following:
174// LogVerbosity = 0: only print hadd errors
175// LogVerbosity = 1: only print hadd errors + warnings
176// LogVerbosity = 2: print hadd errors + warnings and TFileMerger messages
177// LogVerbosity > 2: print all hadd and TFileMerger messages.
178static constexpr int kDefaultHaddVerbosity = 2;
179
180using IntFlag_t = uint32_t;
181
182struct HAddArgs {
185 bool fForce;
188 bool fDebug;
191 bool fHelp;
192
193 std::optional<std::string> fWorkingDir;
194 std::optional<IntFlag_t> fNProcesses;
195 std::optional<std::string> fObjectFilterFile;
196 std::optional<Int_t> fObjectFilterType;
197 std::optional<TString> fCacheSize;
198 std::optional<ROOT::TIOFeatures> fFeatures;
199 std::optional<IntFlag_t> fMaxOpenedFiles;
200 std::optional<IntFlag_t> fVerbosity;
201 std::optional<IntFlag_t> fCompressionSettings;
202
205 // This is set to true if and only if the user passed `--`. In this special
206 // case, we must not stop parsing positional arguments even if we find one
207 // that starts with a `-`.
209};
210
212
213static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
214{
215 const auto argLen = strlen(arg);
216 const auto flagLen = strlen(flagStr);
217 if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
218 if (flagOut)
219 Warn() << "duplicate flag: " << flagStr << "\n";
220 flagOut = true;
222 }
224}
225
226// NOTE: not using std::stoi or similar because they have bad error checking.
227// std::stoi will happily parse "120notvalid" as 120.
228static std::optional<IntFlag_t> StrToUInt(const char *str)
229{
230 if (!str)
231 return {};
232
233 uint32_t res = 0;
234 do {
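235 if (!isdigit(static_cast<unsigned char>(*str)))
236 return {};
237 const uint32_t digit = *str - '0';
238 if (res > (UINT32_MAX - digit) / 10) // overflow is an error: res * 10 + digit would not fit
239 return {};
240 res = res * 10 + digit;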
241 } while (*++str);
242
243 return res;
244}
245
246template <typename T>
251
252template <typename T>
253static FlagConvResult<T> ConvertArg(const char *);
254
255template <>
257{
258 return {arg, EFlagResult::kParsed};
259}
260
261template <>
263{
264 // Don't even try to parse arg if it doesn't look like a number.
265 if (!isdigit(*arg))
266 return {0, EFlagResult::kIgnored};
267
268 auto intOpt = StrToUInt(arg);
269 if (intOpt)
270 return {*intOpt, EFlagResult::kParsed};
271
272 Err() << "error parsing integer argument '" << arg << "'\n";
273 return {0, EFlagResult::kErr};
274}
275
276template <>
278{
280 std::stringstream ss;
281 ss.str(arg);
282 std::string item;
283 while (std::getline(ss, item, ',')) {
284 if (!features.Set(item))
285 Warn() << "ignoring unknown feature request: " << item << "\n";
286 }
288}
289
291{
292 TString cacheSize;
293 int size;
296 Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
297 return {"", EFlagResult::kErr};
299 double m;
300 const char *munit = nullptr;
302 Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
303 << ". We will use the maximum value.\n";
304 return {std::to_string(m) + munit, EFlagResult::kParsed};
305 } else {
306 cacheSize = "cachesize=";
307 cacheSize.Append(arg);
308 }
309 return {cacheSize, EFlagResult::kParsed};
310}
311
313{
314 if (strcmp(arg, "SkipListed") == 0)
316 if (strcmp(arg, "OnlyListed") == 0)
318
319 Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
320 return {{}, EFlagResult::kErr};
321}
322
323// Parses a flag that is followed by an argument of type T.
324// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
325// `conv` is used to convert the argument from string to its type T.
326template <typename T>
327static EFlagResult
328FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
329 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
330{
331 int argIdx = argIdxInOut;
332 const char *arg = argv[argIdx] + 1;
333 int argLen = strlen(arg);
334 int flagLen = strlen(flagStr);
335 const char *nxtArg = nullptr;
336
337 if (strncmp(arg, flagStr, flagLen) != 0)
339
340 bool argIsSeparate = false;
341 if (argLen > flagLen) {
342 // interpret anything after the flag as the argument.
343 nxtArg = arg + flagLen;
344 // Ignore one '=', if present
345 if (nxtArg[0] == '=')
346 ++nxtArg;
347 } else if (argLen == flagLen) {
348 argIsSeparate = true;
349 if (argIdx + 1 < argc) {
350 ++argIdxInOut;
352 } else {
353 Err() << "expected argument after '-" << flagStr << "' flag.\n";
354 return EFlagResult::kErr;
355 }
356 } else {
358 }
359
360 auto converted = conv(nxtArg);
361 if (converted.fResult == EFlagResult::kParsed) {
362 flagOut = converted.fValue;
363 } else if (converted.fResult == EFlagResult::kIgnored) {
364 if (defaultVal && argIsSeparate) {
366 // If we had tried parsing the next argument, step back one arg idx.
368 } else {
369 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
370 return EFlagResult::kErr;
371 }
372 } else {
373 return EFlagResult::kErr;
374 }
375
377}
378
380{
381 // Must be a number between 0 and 509 (with a 0 in the middle)
382 if (compSettings == 0)
383 return true;
384 // We also accept [1-9] as aliases of [101-109], but it's discouraged.
385 if (compSettings >= 1 && compSettings <= 9) {
386 Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
387 << "."
388 " This behavior is deprecated, please use the full compression settings.\n";
389 return true;
390 }
391 return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
392}
393
394// The -f flag has a somewhat complicated logic.
395// We have 4 cases:
396// 1. -f
397// 2. -ff
398// 3. -fk
399// 4. -f[0-509]
400//
401// and a combination thereof (e.g. -fk101, -ff202, -ffk, -fk209)
402// -ff and -f[0-509] are incompatible.
403//
404// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
405// This means that if we see a -f[something] and that "something" doesn't parse to a valid
406// number between 0 and 509, or f or k, we consider the flag invalid and skip it without
407// setting any state.
408//
409// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
410// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
411//
412// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
413static EFlagResult FlagF(const char *arg, HAddArgs &args)
414{
415 if (arg[0] != 'f')
417
418 args.fForce = true;
419 const char *cur = arg + 1;
420 while (*cur) {
421 switch (cur[0]) {
422 case 'f':
424 Warn() << "duplicate flag: -ff\n";
425 if (args.fCompressionSettings) {
426 std::cerr
427 << "[err] Cannot specify both -ff and -f[0-9]. Either use the first input compression or specify it.\n";
428 return EFlagResult::kErr;
429 } else
430 args.fUseFirstInputCompression = true;
431 break;
432 case 'k':
433 if (args.fKeepCompressionAsIs)
434 Warn() << "duplicate flag: -fk\n";
435 args.fKeepCompressionAsIs = true;
436 break;
437 default:
438 if (isdigit(cur[0])) {
439 if (args.fUseFirstInputCompression) {
440 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
441 "specify it.\n";
442 return EFlagResult::kErr;
443 } else if (!args.fCompressionSettings) {
444 if (auto compLv = StrToUInt(cur)) {
447 // we can't see any other argument after the number, so we return here to avoid
448 // incorrectly parsing the rest of the characters in `arg`.
450 } else {
451 Err() << *compLv << " is not a supported compression setting.\n";
452 return EFlagResult::kErr;
453 }
454 } else {
455 Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
456 return EFlagResult::kErr;
457 }
458 } else {
459 Err() << "cannot specify -f[0-9] multiple times!\n";
460 return EFlagResult::kErr;
461 }
462 } else {
463 Err() << "invalid flag: " << arg << "\n";
464 return EFlagResult::kErr;
465 }
466 }
467 ++cur;
468 }
469
471}
472
473// Returns nullopt if any of the flags failed to parse.
474// If an unknown flag is encountered, it will print a warning and go on.
475static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
476{
477 HAddArgs args{};
478
479 enum {
485
486 for (int argIdx = 1; argIdx < argc; ++argIdx) {
487 const char *argRaw = argv[argIdx];
488 if (!*argRaw)
489 continue;
490
491 if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
492 if (argRaw[1] == '-' && argRaw[2] == '\0') {
493 // special case `--`: force parsing to consider all future args as positional arguments.
495 Err()
496 << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
497 " This is not supported: you must have exactly one sequence of positional arguments, so if you"
498 " need to use `--` make sure to pass *all* positional arguments after it.";
499 return {};
500 }
501 args.fNoFlagsAfterPositionalArguments = true;
502 continue;
503 }
504
505 // parse flag
507
508 const char *arg = argRaw + 1;
509 bool validFlag = false;
510
511#define PARSE_FLAG(func, ...) \
512 do { \
513 if (!validFlag) { \
514 const auto res = func(__VA_ARGS__); \
515 if (res == EFlagResult::kErr) \
516 return {}; \
517 validFlag = res == EFlagResult::kParsed; \
518 } \
519 } while (0)
520
521 // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
522 PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
523 PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
524 PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
525 PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
526 PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
527 // Accept --help, -help and -h as "help"
528 PARSE_FLAG(FlagToggle, arg, "-help", args.fHelp);
529 PARSE_FLAG(FlagToggle, arg, "help", args.fHelp);
530 PARSE_FLAG(FlagToggle, arg, "h", args.fHelp);
531 PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
532 PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
533 PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
534 PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
535 PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
536 PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
537 PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
538 PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {kDefaultHaddVerbosity});
539 PARSE_FLAG(FlagF, arg, args);
540
541#undef PARSE_FLAG
542
543 if (!validFlag)
544 Warn() << "unknown flag: " << argRaw << "\n";
545
546 } else if (!args.fOutputArgIdx) {
547 // First positional argument is the output
548 args.fOutputArgIdx = argIdx;
551 } else {
552 // We should be in the same positional argument group as the output, error otherwise
554 if (!args.fFirstInputIdx) {
555 args.fFirstInputIdx = argIdx;
556 }
557 } else {
558 Err() << "seen a positional argument '" << argRaw
559 << "' after some flags."
560 " Positional arguments were already parsed at this point (from '"
561 << argv[args.fOutputArgIdx]
562 << "' onwards), and you can only have one sequence of them, so you cannot pass more."
563 " Please group your positional arguments all together so that hadd works as you expect.\n"
564 "Cmdline: ";
565 for (int i = 0; i < argc; ++i)
566 std::cerr << argv[i] << " ";
567 std::cerr << "\n";
568
569 return {};
570 }
571 }
572 }
573
574 return args;
575}
576
577// Returns the flags to add to the file merger's flags, or -1 in case of errors.
578static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
579 std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
580{
581 if (filterFileName) {
582 std::ifstream filterFile(*filterFileName);
583 if (!filterFile) {
584 Err() << "error opening filter file '" << *filterFileName << "'\n";
585 return -1;
586 }
588 std::string line;
589 std::string objPath;
590 int nObjects = 0;
591 while (std::getline(filterFile, line)) {
592 std::istringstream ss(line);
593 // only read exactly 1 token per line (strips any whitespaces and such)
594 objPath.clear();
595 ss >> objPath;
596 if (!objPath.empty() && objPath[0] != '#') {
597 filteredObjects.Append(objPath + ' ');
598 ++nObjects;
599 }
600 }
601
602 if (nObjects) {
603 Info(2) << "added " << nObjects << " object from filter file '" << *filterFileName << "'\n";
604 fileMerger.AddObjectNames(filteredObjects);
605 } else {
606 Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
607 }
608
609 assert(objectFilterType.has_value());
610 const auto filterFlag = *objectFilterType;
612 return filterFlag;
613 }
614 return 0;
615}
616
617int main(int argc, char **argv)
618{
620
621 const auto argsOpt = ParseArgs(argc, argv);
622 if (!argsOpt)
623 return 1;
624 const HAddArgs &args = *argsOpt;
625
626 if (args.fHelp) {
628 return 0;
629 }
630
632 Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
634 Int_t newcomp = args.fCompressionSettings.value_or(-1);
635 TString cacheSize = args.fCacheSize.value_or("");
636
637 // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
638 // valid value (i.e. any value > 0).
639 // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
640 // we default to the number of cpus on the machine.
641 Bool_t multiproc = args.fNProcesses.has_value();
642 int nProcesses;
643 if (args.fNProcesses && *args.fNProcesses > 0) {
644 nProcesses = *args.fNProcesses;
645 } else {
646 SysInfo_t s;
647 gSystem->GetSysInfo(&s);
648 nProcesses = s.fCpus;
649 }
650 if (multiproc)
651 Info(2) << "parallelizing with " << nProcesses << " processes.\n";
652
653 // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
654 std::string workingDir;
655 if (!args.fWorkingDir) {
657 } else if (args.fWorkingDir && gSystem->AccessPathName(args.fWorkingDir->c_str())) {
658 Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
659 return 1;
660 } else {
661 workingDir = *args.fWorkingDir;
662 }
663
664 // Verify that -L and -Ltype are either both present or both absent.
665 if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
666 Err() << "-L must always be passed along with -Ltype.\n";
667 return 1;
668 }
669
670 const char *targetname = 0;
671 if (!args.fOutputArgIdx) {
672 Err() << "missing output file.\n";
674 return 1;
675 }
676 if (!args.fFirstInputIdx) {
677 Err() << "missing input file.\n";
679 return 1;
680 }
682
683 Info(2) << "target file: " << targetname << "\n";
684
685 if (args.fCacheSize)
686 Info(2) << "Using " << cacheSize << "\n";
687
688 ////////////////////////////// end flags processing /////////////////////////////////
689
690 gSystem->Load("libTreePlayer");
691
693 fileMerger.SetMsgPrefix("hadd");
694 fileMerger.SetPrintLevel(GetLogVerbosity() - 1);
695 if (maxopenedfiles > 0) {
696 fileMerger.SetMaxOpenedFiles(maxopenedfiles);
697 }
698 // The following section will collect all input filenames into a vector,
699 // including those listed within an indirect file.
700 // If any file cannot be accessed, it will error out, unless args.fSkipErrors is true
701 std::vector<std::string> allSubfiles;
702 for (int a = args.fFirstInputIdx; a < argc; ++a) {
703 if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
704 break;
705 }
706 if (argv[a] && argv[a][0] == '@') {
707 std::ifstream indirect_file(argv[a] + 1);
708 if (!indirect_file.is_open()) {
709 Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
710 if (!args.fSkipErrors)
711 return 1;
712 } else {
713 std::string line;
714 while (indirect_file) {
715 if (std::getline(indirect_file, line) && line.length()) {
716 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
717 Err() << "could not validate the file name \"" << line << "\" within indirect file "
718 << (argv[a] + 1) << std::endl;
719 if (!args.fSkipErrors)
720 return 1;
721 } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
722 Err() << "file " << line << " cannot be both the target and an input!\n";
723 if (!args.fSkipErrors)
724 return 1;
725 } else {
726 allSubfiles.emplace_back(line);
727 }
728 }
729 }
730 }
731 } else {
732 const std::string line = argv[a];
733 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
734 Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
735 if (!args.fSkipErrors)
736 return 1;
737 } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
738 Err() << "file " << line << " cannot be both the target and an input!\n";
739 if (!args.fSkipErrors)
740 return 1;
741 } else {
742 allSubfiles.emplace_back(line);
743 }
744 }
745 }
746 if (allSubfiles.empty()) {
747 Err() << "could not find any valid input file " << std::endl;
748 return 1;
749 }
750 // The next snippet determines the output compression if unset
751 if (newcomp == -1) {
753 // grab from the first file.
754 TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
755 if (firstInput && !firstInput->IsZombie())
756 newcomp = firstInput->GetCompressionSettings();
757 else
759 delete firstInput;
760 fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
761 } else {
763 fileMerger.SetMergeOptions(TString("DefaultCompression"));
764 }
765 }
766 if (args.fKeepCompressionAsIs && !args.fReoptimize)
767 Info(2) << "compression setting for meta data: " << newcomp << '\n';
768 else
769 Info(2) << "compression setting for all output: " << newcomp << '\n';
770
771 if (args.fAppend) {
772 if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
773 Err() << "error opening target file for update :" << targetname << ".\n";
774 return 2;
775 }
776 } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
777 std::stringstream ss;
778 ss << "error opening target file (does " << targetname << " exist?).\n";
779 if (!args.fForce)
780 ss << "pass \"-f\" argument to force re-creation of output file.\n";
781 Err() << ss.str();
782 return 1;
783 }
784
785 auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
786 if (multiproc && step < 3) {
787 // At least 3 files per process
788 step = 3;
789 nProcesses = (allSubfiles.size() + step - 1) / step;
790 Info(2) << "each process should handle at least 3 files for efficiency."
791 " Setting the number of processes to: "
792 << nProcesses << std::endl;
793 }
794 if (nProcesses == 1)
796
797 std::vector<std::string> partialFiles;
798
799#ifndef R__WIN32
800 // this is compiled out on Windows only to try to prevent false positive detection
801 // from several anti-virus engines on Windows, and multiproc is not
802 // supported on Windows anyway
803 if (multiproc) {
804 auto uuid = TUUID();
805 auto partialTail = uuid.AsString();
806 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
807 std::stringstream buffer;
808 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
809 partialFiles.emplace_back(buffer.str());
810 }
811 }
812#endif
813
814 auto mergeFiles = [&](TFileMerger &merger) {
815 if (args.fReoptimize) {
816 merger.SetFastMethod(kFALSE);
817 } else {
818 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
819 // Don't warn if the user has requested any re-optimization.
820 Warn() << "Sources and Target have different compression settings\n"
821 "hadd merging will be slower\n";
822 }
823 }
824 merger.SetNotrees(args.fNoTrees);
825 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
828 merger.SetIOFeatures(features);
831 if (extraFlags < 0)
832 return false;
834 if (args.fAppend)
836 else
838 Bool_t status = merger.PartialMerge(fileMergerFlags);
839 return status;
840 };
841
842 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
843 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
844 if (!merger.AddFile(allSubfiles[i].c_str())) {
845 if (args.fSkipErrors) {
846 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
847 } else {
848 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
849 return kFALSE;
850 }
851 }
852 }
853 return mergeFiles(merger);
854 };
855
856 auto parallelMerge = [&](int start) {
858 mergerP.SetMsgPrefix("hadd");
859 mergerP.SetPrintLevel(GetLogVerbosity() - 1);
860 if (maxopenedfiles > 0) {
861 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
862 }
863 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), args.fForce, newcomp)) {
864 Err() << "error opening target partial file\n";
865 exit(1);
866 }
867 return sequentialMerge(mergerP, start, step);
868 };
869
870 auto reductionFunc = [&]() {
871 for (const auto &pf : partialFiles) {
872 fileMerger.AddFile(pf.c_str());
873 }
874 return mergeFiles(fileMerger);
875 };
876
877 Bool_t status;
878
879#ifndef R__WIN32
880 if (multiproc) {
882 auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
883 status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
884 if (status) {
885 status = reductionFunc();
886 } else {
887 Err() << "failed at the parallel stage\n";
888 }
889 if (!args.fDebug) {
890 for (const auto &pf : partialFiles) {
891 gSystem->Unlink(pf.c_str());
892 }
893 }
894 } else {
895 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
896 }
897#else
898 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
899#endif
900
901 if (status) {
902 Info(3) << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
903 << ") input (partial) files into " << targetname << "\n";
904 return 0;
905 } else {
906 Err() << "failure during the merge of " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
907 << ") input (partial) files into " << targetname << "\n";
908 return 1;
909 }
910}
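
The way main() splits the input files among processes when -j is given can be isolated into a small sketch. The arithmetic follows the `step` computation in the listing; the `Partition` struct and the `PlanPartition` name are hypothetical, and the sketch leaves out the `multiproc` condition.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for illustration.
struct Partition {
   std::size_t filesPerProcess;
   std::size_t nProcesses;
};

// Gives each process ceil(nFiles / nProcesses) files. If that is fewer than
// minFilesPerProcess, the chunk size is raised and the process count reduced.
Partition PlanPartition(std::size_t nFiles, std::size_t nProcesses, std::size_t minFilesPerProcess = 3)
{
   assert(nProcesses > 0);
   std::size_t step = (nFiles + nProcesses - 1) / nProcesses; // ceiling division
   if (step < minFilesPerProcess) {
      step = minFilesPerProcess;
      nProcesses = (nFiles + step - 1) / step;
   }
   return {step, nProcesses};
}
```

For 10 input files and 8 requested processes, the initial chunk size is 2, so the sketch raises it to 3 and reduces the process count to 4. For 100 files and 4 processes, each process handles 25 files and the count is unchanged.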
int main()
Definition Prototype.cxx:12
#define a(i)
Definition RSha256.hxx:99
size_t size(const MatrixT &matrix)
retrieve the size of a square matrix
bool Bool_t
Boolean (0=false, 1=true) (bool)
Definition RtypesCore.h:77
int Int_t
Signed integer 4 bytes (int)
Definition RtypesCore.h:59
constexpr Bool_t kFALSE
Definition RtypesCore.h:108
constexpr Bool_t kTRUE
Definition RtypesCore.h:107
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
void Info(const char *location, const char *msgfmt,...)
Use this function for informational messages.
Definition TError.cxx:241
winID h TVirtualViewer3D TVirtualGLPainter p
@ kReadPermission
Definition TSystem.h:55
R__EXTERN TSystem * gSystem
Definition TSystem.h:572
TIOFeatures provides the end-user with the ability to change the IO behavior of data written via a TT...
This class provides a simple interface to execute the same task multiple times in parallel,...
This class provides file copy and merging services.
Definition TFileMerger.h:30
@ kAll
Merge all type of objects (default)
Definition TFileMerger.h:87
@ kIncremental
Merge the input file with the content of the output file (if already existing).
Definition TFileMerger.h:82
@ kSkipListed
Skip objects specified in fObjectNames list.
Definition TFileMerger.h:91
@ kOnlyListed
Only the objects specified in fObjectNames list.
Definition TFileMerger.h:90
@ kRegular
Normal merge, overwriting the output file.
Definition TFileMerger.h:81
@ kFailOnError
The merging process will stop and yield failure when encountering invalid objects.
@ kSkipOnError
The merging process will skip invalid objects and continue.
A ROOT file is an on-disk file, usually with extension .root, that stores objects in a file-system-li...
Definition TFile.h:131
static TFile * Open(const char *name, Option_t *option="", const char *ftitle="", Int_t compress=ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault, Int_t netopt=0)
Create / open a file.
Definition TFile.cxx:3764
Basic string class.
Definition TString.h:138
TString & Append(const char *cs)
Definition TString.h:581
virtual int GetSysInfo(SysInfo_t *info) const
Returns static system info, like OS type, CPU type, number of CPUs, RAM size, etc. into the SysInfo_t s...
Definition TSystem.cxx:2469
virtual int Load(const char *module, const char *entry="", Bool_t system=kFALSE)
Load a shared library.
Definition TSystem.cxx:1868
virtual Bool_t AccessPathName(const char *path, EAccessMode mode=kFileExists)
Returns FALSE if one can access a file using the specified access mode.
Definition TSystem.cxx:1307
virtual int Unlink(const char *name)
Unlink, i.e.
Definition TSystem.cxx:1392
virtual const char * TempDirectory() const
Return a user configured or systemwide directory to create temporary files in.
Definition TSystem.cxx:1493
This class defines a UUID (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDent...
Definition TUUID.h:42
static EFlagResult FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional< T > &flagOut, std::optional< T > defaultVal=std::nullopt, FlagConvResult< T >(*conv)(const char *)=ConvertArg< T >)
Definition hadd.cxx:328
EFlagResult
Definition hadd.cxx:211
static bool ValidCompressionSettings(int compSettings)
Definition hadd.cxx:379
FlagConvResult< IntFlag_t > ConvertArg< IntFlag_t >(const char *arg)
Definition hadd.cxx:262
#define PARSE_FLAG(func,...)
static FlagConvResult< Int_t > ConvertFilterType(const char *arg)
Definition hadd.cxx:312
static Int_t ParseFilterFile(const std::optional< std::string > &filterFileName, std::optional< Int_t > objectFilterType, TFileMerger &fileMerger)
Definition hadd.cxx:578
static FlagConvResult< T > ConvertArg(const char *)
uint32_t IntFlag_t
Definition hadd.cxx:180
static constexpr int kDefaultHaddVerbosity
Definition hadd.cxx:178
static std::optional< HAddArgs > ParseArgs(int argc, char **argv)
Definition hadd.cxx:475
FlagConvResult< ROOT::TIOFeatures > ConvertArg< ROOT::TIOFeatures >(const char *arg)
Definition hadd.cxx:277
static FlagConvResult< TString > ConvertCacheSize(const char *arg)
Definition hadd.cxx:290
static EFlagResult FlagF(const char *arg, HAddArgs &args)
Definition hadd.cxx:413
static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
Definition hadd.cxx:213
static std::optional< IntFlag_t > StrToUInt(const char *str)
Definition hadd.cxx:228
static constexpr const char kCommandLineOptionsHelp[]
void ToHumanReadableSize(value_type bytes, Bool_t si, Double_t *coeff, const char **units)
Return the size expressed in 'human readable' format.
EFromHumanReadableSize FromHumanReadableSize(std::string_view str, T &value)
Convert strings like the following into byte counts 5MB, 5 MB, 5M, 3.7GB, 123b, 456kB,...
EFlagResult fResult
Definition hadd.cxx:249
bool fNoFlagsAfterPositionalArguments
Definition hadd.cxx:208
bool fHelp
Definition hadd.cxx:191
bool fKeepCompressionAsIs
Definition hadd.cxx:189
bool fForce
Definition hadd.cxx:185
std::optional< TString > fCacheSize
Definition hadd.cxx:197
std::optional< IntFlag_t > fCompressionSettings
Definition hadd.cxx:201
bool fNoTrees
Definition hadd.cxx:183
std::optional< Int_t > fObjectFilterType
Definition hadd.cxx:196
int fFirstInputIdx
Definition hadd.cxx:204
std::optional< IntFlag_t > fNProcesses
Definition hadd.cxx:194
bool fUseFirstInputCompression
Definition hadd.cxx:190
std::optional< std::string > fObjectFilterFile
Definition hadd.cxx:195
bool fSkipErrors
Definition hadd.cxx:186
std::optional< IntFlag_t > fVerbosity
Definition hadd.cxx:200
std::optional< IntFlag_t > fMaxOpenedFiles
Definition hadd.cxx:199
std::optional< std::string > fWorkingDir
Definition hadd.cxx:193
int fOutputArgIdx
Definition hadd.cxx:203
bool fDebug
Definition hadd.cxx:188
bool fReoptimize
Definition hadd.cxx:187
std::optional< ROOT::TIOFeatures > fFeatures
Definition hadd.cxx:198
bool fAppend
Definition hadd.cxx:184
@ kUseCompiledDefault
Use the compile-time default setting.
Definition Compression.h:53
Int_t fCpus
Definition TSystem.h:162
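The `FromHumanReadableSize` entry above describes turning strings such as `5MB`, `5 MB`, `5M`, `3.7GB`, `123b` or `456kB` into byte counts, which is the kind of value the `-cachesize` flag takes. The following is only a minimal sketch of that idea, not ROOT's implementation: `ParseSizeSketch` is a hypothetical name, its signature differs from the `FromHumanReadableSize(std::string_view, T &)` shown in the index, and it assumes that every prefix is a power of 1024, which may not match what ROOT does.

```cpp
#include <cctype>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (NOT part of ROOT): parses strings such as "5MB", "5 MB",
// "5M", "3.7GB", "123b" or "456kB" into a byte count.
// Assumption of this sketch: K, M and G are powers of 1024.
std::optional<std::uint64_t> ParseSizeSketch(const std::string &text)
{
   const char *begin = text.c_str();
   char *end = nullptr;
   const double value = std::strtod(begin, &end);
   // Reject input without a leading number, and negative or non-finite values.
   if (end == begin || !std::isfinite(value) || value < 0)
      return std::nullopt;

   // Skip optional whitespace between the number and the unit.
   while (*end != '\0' && std::isspace(static_cast<unsigned char>(*end)))
      ++end;

   // Optional case-insensitive prefix; no prefix means plain bytes.
   double factor = 1.0;
   switch (std::toupper(static_cast<unsigned char>(*end))) {
   case 'K': factor = 1024.0; ++end; break;
   case 'M': factor = 1024.0 * 1024.0; ++end; break;
   case 'G': factor = 1024.0 * 1024.0 * 1024.0; ++end; break;
   default: break;
   }

   // Accept an optional trailing 'b' or 'B', then require the end of the string.
   if (*end == 'b' || *end == 'B')
      ++end;
   if (*end != '\0')
      return std::nullopt;

   const double bytes = value * factor;
   if (bytes >= 18446744073709551616.0) // 2^64: would not fit in 64 bits
      return std::nullopt;
   return static_cast<std::uint64_t>(bytes); // fractional bytes are truncated
}
```

With these assumptions, `ParseSizeSketch("5 MB")` and `ParseSizeSketch("5M")` yield the same value, while inputs with no leading number (`"MB"`) or an unknown unit (`"5XB"`) yield `std::nullopt`. Returning `std::optional` keeps the failure case explicit, in the same spirit as the `std::optional`-returning helpers listed in the index (for example `StrToUInt`).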