ROOT Reference Guide
hadd.cxx
/**
\file hadd.cxx
\brief This program merges compatible ROOT objects, such as histograms, Trees and RNTuples,
from a list of ROOT files and writes them to a target ROOT file.
In order for a ROOT object to be mergeable, it must implement the Merge() function.
Non-mergeable objects have all their instances copied as-is into the target file.
The target file must not be identical to any of the source files.

Syntax:
```{.cpp}
hadd [flags] targetfile source1 source2 ... [flags]
```

Flags can be passed before or after the positional arguments.
The first positional (non-flag) argument is interpreted as the target file.
The positional arguments that immediately follow it are interpreted as the input files.
If two sequences of positional arguments are separated by flags, hadd emits an error and aborts.
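
The rule above can be illustrated with a simplified C++17 sketch. `SplitPositional` and
`PositionalArgs` are hypothetical names that do not exist in hadd, and the sketch assumes
that no flag takes a separate value (hadd's own `ParseArgs` does handle flag values).

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: the target file and the input files.
struct PositionalArgs {
   std::string target;
   std::vector<std::string> inputs;
};

// Simplified model of the "one sequence of positional arguments" rule.
// Assumption: every argument starting with '-' (other than "-" itself) is a
// flag that takes no value.
std::optional<PositionalArgs> SplitPositional(const std::vector<std::string> &args)
{
   PositionalArgs out;
   bool seenTarget = false;   // the first positional argument was found
   bool inSequence = false;   // currently inside the positional run
   bool sequenceDone = false; // the positional run was closed by a flag
   for (const auto &a : args) {
      if (a.empty())
         continue; // empty arguments are ignored
      const bool isFlag = a.size() > 1 && a[0] == '-';
      if (isFlag) {
         if (inSequence) {
            inSequence = false;
            sequenceDone = true;
         }
         continue;
      }
      if (sequenceDone)
         return std::nullopt; // a second positional run: hadd reports an error
      inSequence = true;
      if (!seenTarget) {
         out.target = a;
         seenTarget = true;
      } else {
         out.inputs.push_back(a);
      }
   }
   if (!seenTarget)
      return std::nullopt; // no target file at all
   return out;
}
```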

By default, any argument starting with `-` is interpreted as a flag. To pass file names
starting with `-`, pass them after `--`:
```{.cpp}
hadd [flags] -- -file1 -file2 ...
```
Note that in this case ALL positional arguments, including the target file, must come after `--`.
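
A minimal sketch of the `--` rule, under the same assumption that flags take no value.
`CollectPositional` is a hypothetical helper, not hadd's implementation; it rejects a `--`
that appears after a positional argument, mirroring the error hadd reports in that case.

```cpp
#include <optional>
#include <string>
#include <vector>

// Returns the positional arguments (target first, then inputs), or nullopt if
// `--` appears after a positional argument. After `--`, every argument is
// positional, even if it starts with '-'. Assumes flags take no value.
std::optional<std::vector<std::string>> CollectPositional(const std::vector<std::string> &args)
{
   std::vector<std::string> positional;
   bool onlyPositional = false; // set once `--` has been seen
   for (const auto &a : args) {
      if (!onlyPositional && a == "--") {
         if (!positional.empty())
            return std::nullopt; // `--` must precede ALL positional arguments
         onlyPositional = true;
         continue;
      }
      if (!onlyPositional && a.size() > 1 && a[0] == '-')
         continue; // a flag: ignored in this sketch
      positional.push_back(a);
   }
   return positional;
}
```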

If a flag requires an argument, the argument can be specified in any of these ways:

```
# All equally valid:
-j 16
-j16
-j=16
```
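
The three spellings can be resolved as in the following sketch. `FlagValue` is a
hypothetical helper loosely modelled on `FlagArg` below; unlike hadd, it does not treat
the value as optional (as `-j` does) and does not special-case `-f`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns the value given to `flag` (e.g. "j") when args[idx] is that flag,
// accepting "-j 16", "-j16" and "-j=16". For the separate form, `idx` is advanced
// to the value's position. Returns nullopt if args[idx] is not this flag or if
// the value is missing.
std::optional<std::string> FlagValue(const std::vector<std::string> &args, std::size_t &idx,
                                     const std::string &flag)
{
   const std::string &arg = args[idx];
   const std::string prefix = "-" + flag;
   if (arg.compare(0, prefix.size(), prefix) != 0)
      return std::nullopt; // a different flag (or not a flag at all)
   if (arg.size() > prefix.size()) {
      // Attached value: "-j16" or "-j=16" (a single '=' is skipped).
      std::size_t start = prefix.size();
      if (arg[start] == '=')
         ++start;
      return arg.substr(start);
   }
   // Separate value: "-j 16".
   if (idx + 1 >= args.size())
      return std::nullopt;
   ++idx;
   return args[idx];
}
```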

The first syntax is the preferred one, since it is backward-compatible with previous versions of hadd.
The -f flag is an exception to this rule: it only supports the attached `-f[0-9]` syntax (e.g. `-f505`);
in `hadd -f 505 ...`, `505` is taken as the target file name.
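
A sketch of the `-f` grammar: after the leading `f`, any mix of `f` (use the first input's
compression) and `k` (keep the input compression), optionally ended by a number that must run
to the end of the flag; `-ff` combined with a number is rejected. `ParseForceFlag` and
`ForceFlags` are hypothetical; hadd's `FlagF` below also validates the number's range and warns
about duplicates.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// What an "-f..." flag requests; every accepted variant also implies force-overwrite.
struct ForceFlags {
   bool useFirstInputCompression = false;    // an extra 'f' (-ff)
   bool keepCompressionAsIs = false;         // 'k' (-fk)
   std::optional<std::uint32_t> compression; // trailing digits (-f505)
};

// `flag` is the argument without its leading '-', e.g. "fk505".
// The number's range and overflow are not checked in this sketch.
std::optional<ForceFlags> ParseForceFlag(const std::string &flag)
{
   if (flag.empty() || flag[0] != 'f')
      return std::nullopt;
   ForceFlags out;
   for (std::size_t i = 1; i < flag.size(); ++i) {
      const char ch = flag[i];
      if (ch == 'f') {
         out.useFirstInputCompression = true;
      } else if (ch == 'k') {
         out.keepCompressionAsIs = true;
      } else if (ch >= '0' && ch <= '9') {
         if (out.useFirstInputCompression)
            return std::nullopt; // -ff and -f[0-9] are incompatible
         std::uint32_t value = 0;
         for (std::size_t j = i; j < flag.size(); ++j) {
            if (flag[j] < '0' || flag[j] > '9')
               return std::nullopt; // the number must end the flag
            value = value * 10 + static_cast<std::uint32_t>(flag[j] - '0');
         }
         out.compression = value;
         break; // the number consumed the rest of the flag
      } else {
         return std::nullopt; // unknown character
      }
   }
   return out;
}
```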

Note that combining multiple flags is NOT supported: `-jfa` is interpreted as `-j=fa`, which is invalid!
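
Because `-jfa` is read as `-j=fa`, the value `fa` has to be rejected rather than partially parsed.
A sketch of such a strict parser follows; `ParseStrictUInt` is a hypothetical name (hadd uses its
own `StrToUInt` for the same reason). By contrast, `std::stoi("120notvalid")` returns 120.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Accepts only a non-empty string of decimal digits whose value fits in uint32_t.
std::optional<std::uint32_t> ParseStrictUInt(const std::string &s)
{
   if (s.empty())
      return std::nullopt;
   std::uint32_t value = 0;
   for (char ch : s) {
      if (ch < '0' || ch > '9')
         return std::nullopt; // e.g. "fa" or "120notvalid"
      const auto digit = static_cast<std::uint32_t>(ch - '0');
      // Reject if value * 10 + digit would exceed the largest uint32_t.
      if (value > (std::numeric_limits<std::uint32_t>::max() - digit) / 10)
         return std::nullopt;
      value = value * 10 + digit;
   }
   return value;
}
```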

The flags are as follows:

\param -a Append to the output file.
\param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
\param -d <DIR> Carry out the partial multiprocess execution in the specified directory.
\param -dbg Enable verbosity. If -j was specified, do not delete the partial files
stored inside the working directory.
\param -experimental-io-features <FEATURES> Enable the corresponding experimental features for output trees
(comma-separated list).
\see ROOT::Experimental::EIOFeatures
\param -f Force overwriting of the output file.
\param -f[0-9] Set the target compression algorithm `i` and level `j` by passing the number `i*100 + j`, e.g. `-f505`.
The last digit (`j`) ranges from 0 (uncompressed) to 9 (highest compression).
The first digit (`i`) is 1 for ZLIB, 2 for LZMA, 4 for LZ4 and 5 for ZSTD.
Recommended values are 101 (ZLIB), 207 (LZMA), 404 (LZ4) and 505 (ZSTD).
The default value for this flag is 101 (kDefaultZLIB).
See the ROOT::RCompressionSetting and TFile::TFile documentation for more details.
\param -fk Write the baskets to the target file with the same compression as in the input files
(unless -O is specified). The metadata is compressed with the compression setting of the
first input, or with the setting given after -fk (for example 505 when using -fk505).
\param -ff Use the compression setting of the first input file.
\param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
or is 0, use the system maximum.
\param -k Skip corrupt or non-existent files instead of exiting.
\param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects, depending on
the value of "-Ltype". FILE must contain one object name per line; names cannot contain
whitespace or '/'. You can also pass TDirectory names, which apply to the entire directory
content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
passed as well.
\param -Ltype <SkipListed|OnlyListed> Set the type of operation performed on the objects listed in the FILE
given with the "-L" flag. "SkipListed" skips all the listed objects; "OnlyListed" merges only those
objects. If this flag is passed, "-L" MUST be passed as well.
\param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
the default). This number includes both the input files and the output file.
Thus, if set to 1, it is automatically raised to 2. If set to a value that is too large,
it is clipped to the system maximum.
\param -O Re-optimize the basket size when merging TTrees.
\param -T Do not merge Trees.
\param -v [LEVEL] Explicitly set the verbosity level:
<= 0 = only output errors;
1 = only output errors and warnings;
2 = output minimal informative messages, errors and warnings (default);
>= 3 = output all messages.
\return hadd returns a status code: 0 on success, a non-zero value (usually 1) otherwise.
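
As an illustration of the `i*100 + j` encoding used by `-f[0-9]`, the sketch below decodes a value
and applies the same range check hadd performs (0, or 100 to 509 with a 0 tens digit).
`DecodeCompression` and `CompressionChoice` are hypothetical names. Like hadd's check, it accepts
algorithm digit 3, which is not among the algorithms listed above.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical decoded form of an "-f<N>" value, with N = i*100 + j.
struct CompressionChoice {
   int algorithm; // i: 1 = ZLIB, 2 = LZMA, 4 = LZ4, 5 = ZSTD (0 when N is 0)
   int level;     // j: 0 = uncompressed ... 9 = highest compression
};

std::optional<CompressionChoice> DecodeCompression(std::uint32_t n)
{
   if (n >= 1 && n <= 9)
      n += 100; // deprecated shorthand: 1-9 stand for 101-109
   // Valid values: 0, or 100-509 whose tens digit is 0.
   if (n != 0 && (n < 100 || n > 509 || (n / 10) % 10 != 0))
      return std::nullopt;
   return CompressionChoice{static_cast<int>(n / 100), static_cast<int>(n % 10)};
}
```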

For example, assume 3 files f1, f2, f3 containing histograms hn and Trees Tn:
- f1 with h1 h2 h3 T1
- f2 with h1 h4 T1 T2
- f3 with h5

The result of
```
hadd -f x.root f1.root f2.root f3.root
```
will be a file x.root with h1 h2 h3 h4 h5 T1 T2,
where
- h1 will be the sum of the 2 histograms in f1 and f2
- T1 will be the merge of the Trees in f1 and f2
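
The example above behaves like the following toy model, in which each object is reduced to a
single number (for instance a histogram's entry count), so that merging same-named objects is
plain addition. `ToyFile` and `ToyMerge` are hypothetical; real histograms are added bin by bin
and Trees are concatenated.

```cpp
#include <map>
#include <string>
#include <vector>

// Each toy "file" maps object names to one number standing in for the object's content.
using ToyFile = std::map<std::string, double>;

// Objects found in several inputs are added together; objects found in only
// one input are carried over unchanged.
ToyFile ToyMerge(const std::vector<ToyFile> &inputs)
{
   ToyFile target;
   for (const auto &file : inputs)
      for (const auto &[name, value] : file)
         target[name] += value; // a new name starts from 0.0
   return target;
}
```

With f1 holding h1 h2 h3, f2 holding h1 h4 and f3 holding h5, the merged toy file contains five
objects, and h1 is the sum of its two inputs.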

The files may contain sub-directories.

If the source files contain histograms and Trees, one can skip
the Trees with
```
hadd -T targetfile source1 source2 ...
```

Wildcarding and indirect files are also supported:
```
hadd result.root myfil*.root
```
will merge all files matching myfil*.root, and
```
hadd result.root file1.root @list.txt file2.root myfil*.root
```
will merge file1.root, file2.root, all files matching myfil*.root
and all files listed in the indirect text file list.txt (`@` as the first
character of an argument indicates an indirect file; an indirect file
is a text file listing other input files, one per line; empty lines are skipped).
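
A minimal sketch of how an indirect argument can be recognised and its contents split into file
names. `IsIndirect` and `ParseIndirectList` are hypothetical helpers; reading the file from disk
and the access checks hadd performs on each listed name are omitted.

```cpp
#include <sstream>
#include <string>
#include <vector>

// True if a command-line input names an indirect file (leading '@').
bool IsIndirect(const std::string &arg)
{
   return !arg.empty() && arg[0] == '@';
}

// Splits the contents of an indirect file into input file names: one name per
// line, empty lines skipped, lines used verbatim. A line starting with '@' is
// not expanded further.
std::vector<std::string> ParseIndirectList(const std::string &contents)
{
   std::vector<std::string> files;
   std::istringstream in(contents);
   std::string line;
   while (std::getline(in, line))
      if (!line.empty())
         files.push_back(line);
   return files;
}
```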

If the source and target compression levels are identical (the default),
the program uses the TChain::Merge function with option "fast", i.e.
the merge is done without unzipping or unstreaming the baskets
(the raw bytes on disk are copied directly). The "fast" mode is typically
5 times faster than the mode that unzips and unstreams the baskets.

If the -cachesize option is used, hadd resizes (or disables, if 0) the
prefetching cache used to speed up I/O operations.

For options that take a size as argument, a decimal number of bytes is expected.
If the number ends with a `k`, `m`, `g`, etc., it is multiplied
by 1000 (1 kB), 1000000 (1 MB), 1000000000 (1 GB), etc.
If this prefix is followed by `i`, the number is multiplied by the traditional
1024 (1 KiB), 1048576 (1 MiB), 1073741824 (1 GiB), etc.
The prefix can optionally be followed by `B`, whose casing is ignored,
e.g. 1k, 1K, 1Kb and 1KB are all the same.
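
A sketch of this size syntax; `ParseSize` is a hypothetical parser, not the one hadd uses.
It assumes the prefixes stop at `t` and omits overflow checks.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Parses "<digits>[prefix][i][B]" where prefix is one of k, m, g, t (case-insensitive).
// Without 'i' the prefix means a power of 1000, with 'i' a power of 1024.
// The trailing 'B' is optional and case-insensitive. Overflow is not checked.
std::optional<std::uint64_t> ParseSize(const std::string &s)
{
   std::size_t pos = 0;
   std::uint64_t number = 0;
   while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
      number = number * 10 + static_cast<std::uint64_t>(s[pos] - '0');
      ++pos;
   }
   if (pos == 0)
      return std::nullopt; // no leading decimal number

   // Normalise the suffix to lower case and drop an optional trailing 'b'.
   std::string suffix = s.substr(pos);
   for (char &ch : suffix)
      ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
   if (!suffix.empty() && suffix.back() == 'b')
      suffix.pop_back();
   if (suffix.empty())
      return number; // plain number of bytes

   const std::string prefixes = "kmgt";
   const std::size_t exponent = prefixes.find(suffix[0]);
   if (exponent == std::string::npos)
      return std::nullopt; // unknown prefix
   bool binary = false;
   if (suffix.size() == 2 && suffix[1] == 'i')
      binary = true; // "ki", "mi", ...: powers of 1024
   else if (suffix.size() != 1)
      return std::nullopt; // trailing garbage

   const std::uint64_t base = binary ? 1024 : 1000;
   for (std::size_t i = 0; i <= exponent; ++i)
      number *= base;
   return number;
}
```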

\note By default histograms are added. However, hadd does not support the case where
histograms have their TH1::kIsAverage bit set.

\authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
*/
#include "Compression.h"
#include "TClass.h"
#include "TFile.h"
#include "TFileMerger.h"
#include "THashList.h"
#include "TKey.h"
#include "TSystem.h"
#include "TUUID.h"

#include <ROOT/RConfig.hxx>
#include <ROOT/StringConv.hxx>
#include <ROOT/TIOFeatures.hxx>
#include <ROOT/TSeq.hxx>

#include "haddCommandLineOptionsHelp.h"

#include <cassert>
#include <cctype>
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <numeric>
#include <optional>
#include <sstream>
#include <string>
#include <streambuf>
#include <vector>

#ifndef R__WIN32
#include "ROOT/TProcessExecutor.hxx"
#endif

////////////////////////////////////////////////////////////////////////////////

// NOTE: TFileMerger will use PrintLevel = gHaddVerbosity - 1. If PrintLevel is < 1, it will print nothing, otherwise
// it will print everything. To give some granularity to hadd, we do the following:
// gHaddVerbosity = 0: only print hadd errors
// gHaddVerbosity = 1: only print hadd errors + warnings
// gHaddVerbosity = 2: print hadd errors + warnings and TFileMerger messages
// gHaddVerbosity > 2: print all hadd and TFileMerger messages.
static constexpr int kDefaultHaddVerbosity = 2;
static int gHaddVerbosity = kDefaultHaddVerbosity;

namespace {

class NullBuf : public std::streambuf {
public:
   int overflow(int c) final { return c; }
};

class NullStream : public std::ostream {
   NullBuf fBuf;

public:
   NullStream() : std::ostream(&fBuf) {}
};

} // namespace

static NullStream &GetNullStream()
{
   static NullStream nullStream;
   return nullStream;
}

static inline std::ostream &Err()
{
   std::cerr << "Error in <hadd>: ";
   return std::cerr;
}

static inline std::ostream &Warn()
{
   std::ostream &s = gHaddVerbosity < 1 ? GetNullStream() : std::cerr;
   s << "Warning in <hadd>: ";
   return s;
}

static inline std::ostream &Info(int minLevel)
{
   std::ostream &s = gHaddVerbosity < minLevel ? GetNullStream() : std::cerr;
   s << "Info in <hadd>: ";
   return s;
}

using IntFlag_t = uint32_t;

struct HAddArgs {
   bool fNoTrees;
   bool fAppend;
   bool fForce;
   bool fSkipErrors;
   bool fReoptimize;
   bool fDebug;
   bool fKeepCompressionAsIs;
   bool fUseFirstInputCompression;

   std::optional<std::string> fWorkingDir;
   std::optional<IntFlag_t> fNProcesses;
   std::optional<std::string> fObjectFilterFile;
   std::optional<Int_t> fObjectFilterType;
   std::optional<TString> fCacheSize;
   std::optional<ROOT::TIOFeatures> fFeatures;
   std::optional<IntFlag_t> fMaxOpenedFiles;
   std::optional<IntFlag_t> fVerbosity;
   std::optional<IntFlag_t> fCompressionSettings;

   int fOutputArgIdx;
   int fFirstInputIdx;
   // This is set to true if and only if the user passed `--`. In this special
   // case, we must not stop parsing positional arguments even if we find one
   // that starts with a `-`.
   bool fNoFlagsAfterPositionalArguments;
};

enum class EFlagResult { kIgnored, kParsed, kErr };

static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
{
   const auto argLen = strlen(arg);
   const auto flagLen = strlen(flagStr);
   if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
      if (flagOut)
         Warn() << "duplicate flag: -" << flagStr << "\n";
      flagOut = true;
      return EFlagResult::kParsed;
   }
   return EFlagResult::kIgnored;
}

// NOTE: not using std::stoi or similar because they have bad error checking.
// std::stoi will happily parse "120notvalid" as 120.
static std::optional<IntFlag_t> StrToUInt(const char *str)
{
   if (!str)
      return {};

   IntFlag_t res = 0;
   do {
      if (!isdigit(static_cast<unsigned char>(*str)))
         return {};
      const IntFlag_t digit = *str - '0';
      // Overflow is an error: reject the value if res * 10 + digit does not fit in IntFlag_t.
      if (res > (UINT32_MAX - digit) / 10)
         return {};
      res = res * 10 + digit;
   } while (*++str);

   return res;
}

template <typename T>
struct FlagConvResult {
   T fValue;
   EFlagResult fResult;
};

template <typename T>
static FlagConvResult<T> ConvertArg(const char *);

template <>
FlagConvResult<std::string> ConvertArg<std::string>(const char *arg)
{
   return {arg, EFlagResult::kParsed};
}

template <>
FlagConvResult<IntFlag_t> ConvertArg<IntFlag_t>(const char *arg)
{
   // Don't even try to parse arg if it doesn't look like a number.
   if (!isdigit(static_cast<unsigned char>(*arg)))
      return {0, EFlagResult::kIgnored};

   auto intOpt = StrToUInt(arg);
   if (intOpt)
      return {*intOpt, EFlagResult::kParsed};

   Err() << "error parsing integer argument '" << arg << "'\n";
   return {0, EFlagResult::kErr};
}

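template <>
FlagConvResult<ROOT::TIOFeatures> ConvertArg<ROOT::TIOFeatures>(const char *arg)
{
   ROOT::TIOFeatures features;
   std::stringstream ss;
   ss.str(arg);
   std::string item;
   while (std::getline(ss, item, ',')) {
      if (!features.Set(item))
         Warn() << "ignoring unknown feature request: " << item << "\n";
   }
   return {features, EFlagResult::kParsed};
}

static FlagConvResult<TString> ConvertCacheSize(const char *arg)
{
   TString cacheSize;
   int size;
   auto parseResult = ROOT::FromHumanReadableSize(arg, size);
   if (parseResult == ROOT::EFromHumanReadableSize::kParseFail) {
      Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
      return {"", EFlagResult::kErr};
   } else if (parseResult == ROOT::EFromHumanReadableSize::kOverflow) {
      double m;
      const char *munit = nullptr;
      ROOT::ToHumanReadableSize(INT_MAX, false, &m, &munit);
      Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
             << ". We will use the maximum value.\n";
      // Clamp to the largest value that fits in an int, keeping the "cachesize=" prefix
      // that the merge options expect.
      cacheSize = "cachesize=";
      cacheSize += INT_MAX;
   } else {
      cacheSize = "cachesize=";
      cacheSize.Append(arg);
   }
   return {cacheSize, EFlagResult::kParsed};
}

static FlagConvResult<Int_t> ConvertFilterType(const char *arg)
{
   if (strcmp(arg, "SkipListed") == 0)
      return {TFileMerger::kSkipListed, EFlagResult::kParsed};
   if (strcmp(arg, "OnlyListed") == 0)
      return {TFileMerger::kOnlyListed, EFlagResult::kParsed};

   Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
   return {{}, EFlagResult::kErr};
}
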
363
364// Parses a flag that is followed by an argument of type T.
365// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
366// `conv` is used to convert the argument from string to its type T.
367template <typename T>
368static EFlagResult
369FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
370 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
371{
372 int argIdx = argIdxInOut;
373 const char *arg = argv[argIdx] + 1;
374 int argLen = strlen(arg);
375 int flagLen = strlen(flagStr);
376 const char *nxtArg = nullptr;
377
378 if (strncmp(arg, flagStr, flagLen) != 0)
380
381 bool argIsSeparate = false;
382 if (argLen > flagLen) {
383 // interpret anything after the flag as the argument.
384 nxtArg = arg + flagLen;
385 // Ignore one '=', if present
386 if (nxtArg[0] == '=')
387 ++nxtArg;
388 } else if (argLen == flagLen) {
389 argIsSeparate = true;
390 if (argIdx + 1 < argc) {
391 ++argIdxInOut;
393 } else {
394 Err() << "expected argument after '-" << flagStr << "' flag.\n";
395 return EFlagResult::kErr;
396 }
397 } else {
399 }
400
401 auto converted = conv(nxtArg);
402 if (converted.fResult == EFlagResult::kParsed) {
403 flagOut = converted.fValue;
404 } else if (converted.fResult == EFlagResult::kIgnored) {
405 if (defaultVal && argIsSeparate) {
407 // If we had tried parsing the next argument, step back one arg idx.
409 } else {
410 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
411 return EFlagResult::kErr;
412 }
413 } else {
414 return EFlagResult::kErr;
415 }
416
418}
419
static bool ValidCompSettings(IntFlag_t compSettings)
{
   // Must be a number between 0 and 509 (with a 0 in the middle)
   if (compSettings == 0)
      return true;
   // We also accept [1-9] as aliases of [101-109], but it's discouraged.
   if (compSettings >= 1 && compSettings <= 9) {
      Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
             << "."
                " This behavior is deprecated, please use the full compression settings.\n";
      return true;
   }
   return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
}

// The -f flag has a somewhat complicated logic.
// We have 4 cases:
//   1. -f
//   2. -ff
//   3. -fk
//   4. -f[0-509]
//
// and a combination thereof (e.g. -fk101, -ffk, -fk209).
// -ff and -f[0-509] are incompatible.
//
// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
// This means that if we see a -f[something] and that "something" doesn't parse to a valid
// number between 0 and 509, or f or k, we consider the flag invalid and report an error.
//
// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
//
// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
static EFlagResult FlagF(const char *arg, HAddArgs &args)
{
   if (arg[0] != 'f')
      return EFlagResult::kIgnored;

   args.fForce = true;
   const char *cur = arg + 1;
   while (*cur) {
      switch (cur[0]) {
      case 'f':
         if (args.fUseFirstInputCompression)
            Warn() << "duplicate flag: -ff\n";
         if (args.fCompressionSettings) {
            Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or specify it.\n";
            return EFlagResult::kErr;
         } else
            args.fUseFirstInputCompression = true;
         break;
      case 'k':
         if (args.fKeepCompressionAsIs)
            Warn() << "duplicate flag: -fk\n";
         args.fKeepCompressionAsIs = true;
         break;
      default:
         if (isdigit(static_cast<unsigned char>(cur[0]))) {
            if (args.fUseFirstInputCompression) {
               Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
                        "specify it.\n";
               return EFlagResult::kErr;
            } else if (!args.fCompressionSettings) {
               if (auto compLv = StrToUInt(cur)) {
                  if (ValidCompSettings(*compLv)) {
                     args.fCompressionSettings = *compLv;
                     // we can't see any other argument after the number, so we return here to avoid
                     // incorrectly parsing the rest of the characters in `arg`.
                     return EFlagResult::kParsed;
                  } else {
                     Err() << *compLv << " is not a supported compression setting.\n";
                     return EFlagResult::kErr;
                  }
               } else {
                  Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
                  return EFlagResult::kErr;
               }
            } else {
               Err() << "cannot specify -f[0-9] multiple times!\n";
               return EFlagResult::kErr;
            }
         } else {
            Err() << "invalid flag: -" << arg << "\n";
            return EFlagResult::kErr;
         }
      }
      ++cur;
   }

   return EFlagResult::kParsed;
}

// Returns nullopt if any of the flags failed to parse.
// If an unknown flag is encountered, it will print a warning and go on.
static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
{
   HAddArgs args{};

   enum {
      kParseStart,
      kParsingOutput,
      kParsingInputs,
      kParsingFlags,
   } parseState = kParseStart;

   for (int argIdx = 1; argIdx < argc; ++argIdx) {
      const char *argRaw = argv[argIdx];
      if (!*argRaw)
         continue;

      if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
         if (argRaw[1] == '-' && argRaw[2] == '\0') {
            // special case `--`: force parsing to consider all future args as positional arguments.
            if (args.fOutputArgIdx) {
               Err()
                  << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
                     " This is not supported: you must have exactly one sequence of positional arguments, so if you"
                     " need to use `--` make sure to pass *all* positional arguments after it.\n";
               return {};
            }
            args.fNoFlagsAfterPositionalArguments = true;
            continue;
         }

         // parse flag
         parseState = kParsingFlags;

         const char *arg = argRaw + 1;
         bool validFlag = false;

#define PARSE_FLAG(func, ...)                     \
   do {                                           \
      if (!validFlag) {                           \
         const auto res = func(__VA_ARGS__);      \
         if (res == EFlagResult::kErr)            \
            return {};                            \
         validFlag = res == EFlagResult::kParsed; \
      }                                           \
   } while (0)

         // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
         PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
         PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
         PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
         PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
         PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
         PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {kDefaultHaddVerbosity});
         PARSE_FLAG(FlagF, arg, args);

#undef PARSE_FLAG

         if (!validFlag)
            Warn() << "unknown flag: " << argRaw << "\n";

      } else if (!args.fOutputArgIdx) {
         // First positional argument is the output
         args.fOutputArgIdx = argIdx;
         parseState = kParsingOutput;
      } else {
         // We should be in the same positional argument group as the output, error otherwise
         if (parseState == kParsingOutput || parseState == kParsingInputs) {
            parseState = kParsingInputs;
            if (!args.fFirstInputIdx) {
               args.fFirstInputIdx = argIdx;
            }
         } else {
            Err() << "seen a positional argument '" << argRaw
                  << "' after some flags."
                     " Positional arguments were already parsed at this point (from '"
                  << argv[args.fOutputArgIdx]
                  << "' onwards), and you can only have one sequence of them, so you cannot pass more."
                     " Please group your positional arguments all together so that hadd works as you expect.\n"
                     "Cmdline: ";
            for (int i = 0; i < argc; ++i)
               std::cerr << argv[i] << " ";
            std::cerr << "\n";

            return {};
         }
      }
   }

   return args;
}

// Returns the flags to add to the file merger's flags, or -1 in case of errors.
static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
                             std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
{
   if (filterFileName) {
      std::ifstream filterFile(*filterFileName);
      if (!filterFile) {
         Err() << "error opening filter file '" << *filterFileName << "'\n";
         return -1;
      }
      TString filteredObjects;
      std::string line;
      std::string objPath;
      int nObjects = 0;
      while (std::getline(filterFile, line)) {
         std::istringstream ss(line);
         // only read exactly 1 token per line (strips any whitespaces and such)
         objPath.clear();
         ss >> objPath;
         if (!objPath.empty() && objPath[0] != '#') {
            filteredObjects.Append(objPath + ' ');
            ++nObjects;
         }
      }

      if (nObjects) {
         Info(2) << "added " << nObjects << " object(s) from filter file '" << *filterFileName << "'\n";
         fileMerger.AddObjectNames(filteredObjects);
      } else {
         Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
      }

      assert(objectFilterType.has_value());
      const auto filterFlag = *objectFilterType;
      return filterFlag;
   }
   return 0;
}

int main(int argc, char **argv)
{
   if (argc < 3 || "-h" == std::string(argv[1]) || "--help" == std::string(argv[1])) {
      std::cout << kCommandLineOptionsHelp;
      return (argc == 2 && ("-h" == std::string(argv[1]) || "--help" == std::string(argv[1]))) ? 0 : 1;
   }

   const auto argsOpt = ParseArgs(argc, argv);
   if (!argsOpt)
      return 1;
   const HAddArgs &args = *argsOpt;

   gHaddVerbosity = args.fVerbosity.value_or(kDefaultHaddVerbosity);
   Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
   Int_t newcomp = args.fCompressionSettings.value_or(-1);
   TString cacheSize = args.fCacheSize.value_or("");

   // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
   // valid value (i.e. any value > 0).
   // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
   // we default to the number of cpus on the machine.
   Bool_t multiproc = args.fNProcesses.has_value();
   int nProcesses;
   if (args.fNProcesses && *args.fNProcesses > 0) {
      nProcesses = *args.fNProcesses;
   } else {
      SysInfo_t s;
      gSystem->GetSysInfo(&s);
      nProcesses = s.fCpus;
   }
   if (multiproc)
      Info(2) << "parallelizing with " << nProcesses << " processes.\n";

   // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
   std::string workingDir;
   if (!args.fWorkingDir) {
      workingDir = gSystem->TempDirectory();
   } else if (gSystem->AccessPathName(args.fWorkingDir->c_str())) {
      Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
      return 1;
   } else {
      workingDir = *args.fWorkingDir;
   }

   // Verify that -L and -Ltype are either both present or both absent.
   if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
      Err() << "-L must always be passed along with -Ltype.\n";
      return 1;
   }

   if (!args.fOutputArgIdx) {
      Err() << "missing output file.\n";
      return 1;
   }
   if (!args.fFirstInputIdx) {
      Err() << "missing input file.\n";
      return 1;
   }
   const char *targetname = argv[args.fOutputArgIdx];

   Info(2) << "target file: " << targetname << "\n";

   if (args.fCacheSize)
      Info(2) << "Using " << cacheSize << "\n";

   ////////////////////////////// end flags processing /////////////////////////////////

   gSystem->Load("libTreePlayer");

   TFileMerger fileMerger(kFALSE, kFALSE);
   fileMerger.SetMsgPrefix("hadd");
   fileMerger.SetPrintLevel(gHaddVerbosity - 1);
   if (maxopenedfiles > 0) {
      fileMerger.SetMaxOpenedFiles(maxopenedfiles);
   }
   // The following section collects all input file names into a vector,
   // including those listed within an indirect file.
   // If any file cannot be accessed, it errors out, unless args.fSkipErrors is true.
   std::vector<std::string> allSubfiles;
   for (int a = args.fFirstInputIdx; a < argc; ++a) {
      if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
         break;
      }
      if (argv[a] && argv[a][0] == '@') {
         std::ifstream indirect_file(argv[a] + 1);
         if (!indirect_file.is_open()) {
            Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
            if (!args.fSkipErrors)
               return 1;
         } else {
            std::string line;
            while (indirect_file) {
               if (std::getline(indirect_file, line) && line.length()) {
                  if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
                     Err() << "could not validate the file name \"" << line << "\" within indirect file "
                           << (argv[a] + 1) << std::endl;
                     if (!args.fSkipErrors)
                        return 1;
                  } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
                     Err() << "file " << line << " cannot be both the target and an input!\n";
                     if (!args.fSkipErrors)
                        return 1;
                  } else {
                     allSubfiles.emplace_back(line);
                  }
               }
            }
         }
      } else {
         const std::string line = argv[a];
         if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
            Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
            if (!args.fSkipErrors)
               return 1;
         } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
            Err() << "file " << line << " cannot be both the target and an input!\n";
            if (!args.fSkipErrors)
               return 1;
         } else {
            allSubfiles.emplace_back(line);
         }
      }
   }
   if (allSubfiles.empty()) {
      Err() << "could not find any valid input file " << std::endl;
      return 1;
   }
   // The next snippet determines the output compression if unset
   if (newcomp == -1) {
      if (args.fUseFirstInputCompression || args.fKeepCompressionAsIs) {
         // grab from the first file.
         TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
         if (firstInput && !firstInput->IsZombie())
            newcomp = firstInput->GetCompressionSettings();
         else
            newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault;
         delete firstInput;
         fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
      } else {
         newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault;
         fileMerger.SetMergeOptions(TString("DefaultCompression"));
      }
   }
   if (args.fKeepCompressionAsIs && !args.fReoptimize)
      Info(2) << "compression setting for meta data: " << newcomp << '\n';
   else
      Info(2) << "compression setting for all output: " << newcomp << '\n';

   if (args.fAppend) {
      if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
         Err() << "error opening target file for update: " << targetname << ".\n";
         return 2;
      }
   } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
      std::stringstream ss;
      ss << "error opening target file (does " << targetname << " exist?).\n";
      if (!args.fForce)
         ss << "pass \"-f\" argument to force re-creation of output file.\n";
      Err() << ss.str();
      return 1;
   }

   auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
   if (multiproc && step < 3) {
      // At least 3 files per process
      step = 3;
      nProcesses = (allSubfiles.size() + step - 1) / step;
      Info(2) << "each process should handle at least 3 files for efficiency."
                 " Setting the number of processes to: "
              << nProcesses << std::endl;
   }
   if (nProcesses == 1)
      multiproc = kFALSE;

831
832#ifndef R__WIN32
833 // this is commented out only to try to prevent false positive detection
834 // from several anti-virus engines on Windows, and multiproc is not
835 // supported on Windows anyway
836 if (multiproc) {
837 auto uuid = TUUID();
838 auto partialTail = uuid.AsString();
839 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
840 std::stringstream buffer;
841 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
842 partialFiles.emplace_back(buffer.str());
843 }
844 }
845#endif
846
847 auto mergeFiles = [&](TFileMerger &merger) {
848 if (args.fReoptimize) {
849 merger.SetFastMethod(kFALSE);
850 } else {
851 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
852 // Don't warn if the user has requested any re-optimization.
853 Warn() << "Sources and Target have different compression settings\n"
854 "hadd merging will be slower\n";
855 }
856 }
857 merger.SetNotrees(args.fNoTrees);
858 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
861 merger.SetIOFeatures(features);
864 if (extraFlags < 0)
865 return false;
867 if (args.fAppend)
869 else
871 Bool_t status = merger.PartialMerge(fileMergerFlags);
872 return status;
873 };
874
875 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
876 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
877 if (!merger.AddFile(allSubfiles[i].c_str())) {
878 if (args.fSkipErrors) {
879 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
880 } else {
881 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
882 return kFALSE;
883 }
884 }
885 }
886 return mergeFiles(merger);
887 };
888
889 auto parallelMerge = [&](int start) {
891 mergerP.SetMsgPrefix("hadd");
892 mergerP.SetPrintLevel(gHaddVerbosity - 1);
893 if (maxopenedfiles > 0) {
894 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
895 }
896 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), args.fForce, newcomp)) {
897 Err() << "error opening target partial file\n";
898 exit(1);
899 }
900 return sequentialMerge(mergerP, start, step);
901 };
902
903 auto reductionFunc = [&]() {
904 for (const auto &pf : partialFiles) {
905 fileMerger.AddFile(pf.c_str());
906 }
907 return mergeFiles(fileMerger);
908 };
909
   Bool_t status;

#ifndef R__WIN32
   if (multiproc) {
      ROOT::TProcessExecutor p(nProcesses);
      auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
      status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
      if (status) {
         status = reductionFunc();
      } else {
         Err() << "failed at the parallel stage\n";
      }
      if (!args.fDebug) {
         for (const auto &pf : partialFiles) {
            gSystem->Unlink(pf.c_str());
         }
      }
   } else {
      status = sequentialMerge(fileMerger, 0, allSubfiles.size());
   }
#else
   status = sequentialMerge(fileMerger, 0, allSubfiles.size());
#endif

   if (status) {
      Info(3) << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
              << ") input (partial) files into " << targetname << "\n";
      return 0;
   } else {
      Err() << "failure during the merge of " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
            << ") input (partial) files into " << targetname << "\n";
      return 1;
   }
}
TLine * line
static EFlagResult FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional< T > &flagOut, std::optional< T > defaultVal=std::nullopt, FlagConvResult< T >(*conv)(const char *)=ConvertArg< T >)
Definition hadd.cxx:369
EFlagResult
Definition hadd.cxx:252
static bool ValidCompressionSettings(int compSettings)
Definition hadd.cxx:420
FlagConvResult< IntFlag_t > ConvertArg< IntFlag_t >(const char *arg)
Definition hadd.cxx:303
#define PARSE_FLAG(func,...)
static FlagConvResult< Int_t > ConvertFilterType(const char *arg)
Definition hadd.cxx:353
static Int_t ParseFilterFile(const std::optional< std::string > &filterFileName, std::optional< Int_t > objectFilterType, TFileMerger &fileMerger)
Definition hadd.cxx:615
static FlagConvResult< T > ConvertArg(const char *)
uint32_t IntFlag_t
Definition hadd.cxx:222
static constexpr int kDefaultHaddVerbosity
Definition hadd.cxx:177
static std::ostream & Info(int minLevel)
Definition hadd.cxx:215
static std::optional< HAddArgs > ParseArgs(int argc, char **argv)
Definition hadd.cxx:516
FlagConvResult< ROOT::TIOFeatures > ConvertArg< ROOT::TIOFeatures >(const char *arg)
Definition hadd.cxx:318
static int gHaddVerbosity
Definition hadd.cxx:178
static std::ostream & Warn()
Definition hadd.cxx:208
static FlagConvResult< TString > ConvertCacheSize(const char *arg)
Definition hadd.cxx:331
static std::ostream & Err()
Definition hadd.cxx:202
static EFlagResult FlagF(const char *arg, HAddArgs &args)
Definition hadd.cxx:454
static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
Definition hadd.cxx:254
static NullStream & GetNullStream()
Definition hadd.cxx:196
static std::optional< IntFlag_t > StrToUInt(const char *str)
Definition hadd.cxx:269
static constexpr const char kCommandLineOptionsHelp[]
void ToHumanReadableSize(value_type bytes, Bool_t si, Double_t *coeff, const char **units)
Return the size expressed in 'human readable' format.
EFromHumanReadableSize FromHumanReadableSize(std::string_view str, T &value)
Convert strings like the following into byte counts 5MB, 5 MB, 5M, 3.7GB, 123b, 456kB,...
EFlagResult fResult
Definition hadd.cxx:290
bool fNoFlagsAfterPositionalArguments
Definition hadd.cxx:249
bool fKeepCompressionAsIs
Definition hadd.cxx:231
bool fForce
Definition hadd.cxx:227
std::optional< TString > fCacheSize
Definition hadd.cxx:238
std::optional< IntFlag_t > fCompressionSettings
Definition hadd.cxx:242
bool fNoTrees
Definition hadd.cxx:225
std::optional< Int_t > fObjectFilterType
Definition hadd.cxx:237
int fFirstInputIdx
Definition hadd.cxx:245
std::optional< IntFlag_t > fNProcesses
Definition hadd.cxx:235
bool fUseFirstInputCompression
Definition hadd.cxx:232
std::optional< std::string > fObjectFilterFile
Definition hadd.cxx:236
bool fSkipErrors
Definition hadd.cxx:228
std::optional< IntFlag_t > fVerbosity
Definition hadd.cxx:241
std::optional< IntFlag_t > fMaxOpenedFiles
Definition hadd.cxx:240
std::optional< std::string > fWorkingDir
Definition hadd.cxx:234
int fOutputArgIdx
Definition hadd.cxx:244
bool fDebug
Definition hadd.cxx:230
bool fReoptimize
Definition hadd.cxx:229
std::optional< ROOT::TIOFeatures > fFeatures
Definition hadd.cxx:239
bool fAppend
Definition hadd.cxx:226
@ kUseCompiledDefault
Use the compile-time default setting.
Definition Compression.h:53
Int_t fCpus
Definition TSystem.h:162
TMarker m
Definition textangle.C:8