ROOT Reference Guide

hadd.cxx
1/**
2 \file hadd.cxx
3 \brief This program will merge compatible ROOT objects, such as histograms, Trees and RNTuples,
4 from a list of root files and write them to a target root file.
5 In order for a ROOT object to be mergeable, it must implement the Merge() function.
6 Non-mergeable objects will have all instances copied as-is into the target file.
7 The target file must not be identical to one of the source files.
8
9 Syntax:
10 ```{.cpp}
11 hadd [flags] targetfile source1 source2 ... [flags]
12 ```
13
14 Flags can be passed before or after the positional arguments.
15 The first positional (non-flag) argument will be interpreted as the targetfile.
16 After that, the first sequence of positional arguments will be interpreted as the input files.
17 If two sequences of positional arguments are separated by flags, hadd will emit an error and abort.
18
19 By default, any argument starting with `-` is interpreted as a flag. If you want to pass filenames
20 starting with `-` you need to pass them after `--`:
21 ```{.cpp}
22 hadd [flags] -- -file1 -file2 ...
23 ```
24 Note that in this case you need to pass ALL positional arguments after `--`.
25
26 If a flag requires an argument, the argument can be specified in any of these ways:
27
28 # All equally valid:
29 -j 16
30 -j16
31 -j=16
32
33 The first syntax is the preferred one since it's backward-compatible with previous versions of hadd.
34 The -f flag is an exception to this rule: it only supports the `-f[0-9]` syntax.
35
36 Note that merging multiple flags is NOT supported: `-jfa` will be interpreted as -j=fa, which is invalid!
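
As a rough illustration of the three spellings above, the following sketch extracts the value attached to a flag. `SplitFlagValue` is a hypothetical helper written for this example, not a function of hadd; it only assumes what is stated above (an optional single `=` between the flag and its value).

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of hadd): given a raw argument such as "-j16"
// and a flag name such as "j", return the text attached to the flag.
// Returns std::nullopt if `arg` is not that flag. An empty string means the
// value, if any, is the next command-line argument (the "-j 16" form).
std::optional<std::string> SplitFlagValue(std::string_view arg, std::string_view flag)
{
   if (arg.size() < 1 + flag.size() || arg[0] != '-' || arg.substr(1, flag.size()) != flag)
      return std::nullopt;
   std::string_view rest = arg.substr(1 + flag.size());
   if (!rest.empty() && rest[0] == '=') // ignore one '=', if present
      rest.remove_prefix(1);
   return std::string(rest);
}
```

With this helper, `-j16` and `-j=16` both yield `16`, `-j` yields an empty string, and `-jfa` yields `fa`, which is why merged flags cannot work.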
37
38 The flags are as follows:
39
40 \param -a Append to the output
41 \param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
42 \param -d <DIR> Carry out the partial multiprocess execution in the specified directory
43 \param -dbg Enable verbosity. If -j was specified, do not delete the partial files
44 stored inside the working directory.
45 \param -experimental-io-features <FEATURES> Enables the corresponding experimental feature for output trees.
46 \see ROOT::Experimental::EIOFeatures
47 \param -f Force overwriting of output file.
48 \param -f[0-9] Set target compression algorithm `i` and level `j` passing the number `i*100 + j`, e.g. `-f505`.
49 The last digit (`j`) can be set from 0 = uncompressed to 9 = highly compressed.
50 The first digit (`i`) is 1 for ZLIB, 2 for LZMA, 4 for LZ4 and 5 for ZSTD.
51 Recommended numbers are 101 (ZLIB), 207 (LZMA), 404 (LZ4), 505 (ZSTD).
52 The default value for this flag is 101 (kDefaultZLIB).
53 See ROOT::RCompressionSetting and TFile::TFile documentation for more details.
54 \param -fk Sets the target file to contain the baskets with the same compression as the input files
55 (unless -O is specified). Compresses the metadata using the compression level specified
56 in the first input or the compression setting given after `-fk` (for example 505 when using -fk505).
57 \param -ff The compression level used is the one specified in the first input
58 \param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
59 or is 0, use the system maximum.
60 \param -k Skip corrupt or non-existent files, do not exit
61 \param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects depending on
62 the value of "-Ltype". FILE must contain one object name per line, which cannot contain
63 whitespaces or '/'. You can also pass TDirectory names, which apply to the entire directory
64 content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
65 passed as well.
66 \param -Ltype <SkipListed|OnlyListed> Sets the type of operation performed on the objects listed in the FILE
67 given with the "-L" flag.
68 "SkipListed" will skip all the listed objects; "OnlyListed" will only merge those
69 objects. If this flag is passed, "-L" must be passed as well.
70 \param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
71 the default). This number includes both the input files being read and the output file.
72 Thus, if set to 1, it is automatically raised to the minimum of 2. If set to a value that is
73 too large, it is clipped to the system maximum.
74 \param -O Re-optimize basket size when merging TTree
75 \param -T Do not merge Trees
76 \param -v [LEVEL] Explicitly set the verbosity level:
77 <= 0 = only output errors;
78 1 = only output errors and warnings;
79 2 = output minimal informative messages, errors and warnings (default);
80 >= 3 = output all messages.
81 \return hadd returns a status code: 0 if OK, 1 otherwise
82
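The `i*100 + j` encoding used by `-f[0-9]` can be unpacked as in the sketch below. `DecodeCompression` and `CompressionSetting` are hypothetical names introduced for illustration only. The sketch covers just the four algorithms listed above; other settings that hadd accepts (such as 0) are deliberately left out.

```cpp
#include <optional>
#include <string>

// Hypothetical result type for this example (not a ROOT type).
struct CompressionSetting {
   std::string fAlgorithm; // name matching the algorithm digit `i`
   int fLevel;             // level digit `j`, from 0 to 9
};

// Hypothetical helper: split a setting `i*100 + j` into algorithm and level.
// Returns std::nullopt for anything outside the four documented algorithms.
std::optional<CompressionSetting> DecodeCompression(int setting)
{
   if (setting < 0)
      return std::nullopt;
   const int algorithm = setting / 100;
   const int level = setting % 100;
   if (level > 9) // the middle digit must be 0
      return std::nullopt;
   switch (algorithm) {
   case 1: return CompressionSetting{"ZLIB", level};
   case 2: return CompressionSetting{"LZMA", level};
   case 4: return CompressionSetting{"LZ4", level};
   case 5: return CompressionSetting{"ZSTD", level};
   default: return std::nullopt;
   }
}
```

For instance, `-f505` decodes to ZSTD at level 5 and `-f207` to LZMA at level 7.
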
83 For example, assume 3 files f1, f2, f3 containing histograms hn and Trees Tn:
84 - f1 with h1 h2 h3 T1
85 - f2 with h1 h4 T1 T2
86 - f3 with h5
87 the result of
88 ```
89 hadd -f x.root f1.root f2.root f3.root
90 ```
91 will be a file x.root with h1 h2 h3 h4 h5 T1 T2
92 where
93 - h1 will be the sum of the 2 histograms in f1 and f2
94 - T1 will be the merge of the Trees in f1 and f2
95
96 The files may contain sub-directories.
97
98 If the source files contain histograms and Trees, one can skip
99 the Trees with
100 ```
101 hadd -T targetfile source1 source2 ...
102 ```
103
104 Wildcarding and indirect files are also supported:
105 ```
106 hadd result.root myfil*.root
107 ```
108 will merge all files matching myfil*.root
109 ```
110 hadd result.root file1.root @list.txt file2.root myfil*.root
111 ```
112 will merge file1.root, file2.root, all files matching myfil*.root
113 and all files listed in the indirect text file list.txt. A "@" as the first
114 character of an argument marks it as an indirect file: a text file
115 containing a list of other files, including other
116 indirect files, one file per line.
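
A minimal sketch of how indirect files can be expanded into a flat list of inputs is shown below. `ExpandInputs` is a hypothetical function written for this example. It resolves a single level of indirection and skips empty lines, as the input loop in main() does, and it assumes that wildcards have already been expanded by the shell.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch (not hadd's code): replace every "@file" argument with
// the non-empty lines of that text file. All other arguments are kept as-is.
// An indirect file that cannot be opened simply contributes no entries here.
std::vector<std::string> ExpandInputs(const std::vector<std::string> &args)
{
   std::vector<std::string> files;
   for (const auto &arg : args) {
      if (!arg.empty() && arg[0] == '@') {
         std::ifstream list(arg.substr(1)); // file name without the leading '@'
         std::string line;
         while (std::getline(list, line)) {
            if (!line.empty())
               files.push_back(line);
         }
      } else {
         files.push_back(arg);
      }
   }
   return files;
}
```

hadd itself additionally checks that every listed file is readable and that none of them is the target file.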
117
118 If the source and target compression levels are identical (default),
119 the program uses the TChain::Merge function with option "fast", i.e.
120 the merge will be done without unzipping or unstreaming the baskets
121 (a direct copy of the raw bytes on disk). The "fast" mode is typically
122 5 times faster than the mode that unzips and unstreams the baskets.
123
124 If the option -cachesize is used, hadd will resize (or disable if 0) the
125 prefetching cache used to speed up I/O operations.
126
127 For options that take a size as argument, a decimal number of bytes is expected.
128 If the number ends with a `k`, `m`, `g`, etc., the number is multiplied
129 by 1000 (1K), 1000000 (1M), 1000000000 (1G), etc.
130 If this suffix is followed by `i`, the number is multiplied by the traditional
131 1024 (1KiB), 1048576 (1MiB), 1073741824 (1GiB), etc.
132 The suffix can optionally be followed by B, whose casing is ignored,
133 e.g. 1k, 1K, 1Kb and 1KB are the same.
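
The suffix rules above can be illustrated with a small parser. `ParseSize` is a hypothetical function written for this example and is not the ROOT implementation. As simplifying assumptions, it accepts only whole numbers, handles only `k`, `m` and `g`, requires a lowercase `i`, and does not check for overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical sketch: parse "<digits>[k|m|g][i][b]" into a number of bytes.
std::optional<std::uint64_t> ParseSize(const std::string &text)
{
   std::size_t pos = 0;
   std::uint64_t value = 0;
   while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
      value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
      ++pos;
   }
   if (pos == 0)
      return std::nullopt; // no leading digits

   int exponent = 0; // how many times to multiply by the base
   if (pos < text.size()) {
      switch (std::tolower(static_cast<unsigned char>(text[pos]))) {
      case 'k': exponent = 1; break;
      case 'm': exponent = 2; break;
      case 'g': exponent = 3; break;
      default: break;
      }
      if (exponent > 0)
         ++pos;
   }
   std::uint64_t base = 1000;
   if (exponent > 0 && pos < text.size() && text[pos] == 'i') {
      base = 1024; // binary multiples: Ki, Mi, Gi
      ++pos;
   }
   if (pos < text.size() && std::tolower(static_cast<unsigned char>(text[pos])) == 'b')
      ++pos; // optional trailing B, casing ignored
   if (pos != text.size())
      return std::nullopt; // unexpected trailing characters

   for (int i = 0; i < exponent; ++i)
      value *= base;
   return value;
}
```

Under these assumptions, `1k`, `1K`, `1Kb` and `1KB` all give 1000, while `1KiB` gives 1024.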
134
135 \note By default histograms are added. However hadd does not support the case where
136 histograms have their bit TH1::kIsAverage set.
137
138 \authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
139*/
140#include "Compression.h"
141#include "TClass.h"
142#include "TFile.h"
143#include "TFileMerger.h"
144#include "THashList.h"
145#include "TKey.h"
146#include "TSystem.h"
147#include "TUUID.h"
148
149#include <ROOT/RConfig.hxx>
150#include <ROOT/StringConv.hxx>
151#include <ROOT/TIOFeatures.hxx>
152
153#include "haddCommandLineOptionsHelp.h"
154
155#include <climits>
156#include <cstdlib>
157#include <filesystem>
158#include <fstream>
159#include <iostream>
160#include <optional>
161#include <sstream>
162#include <string>
163#include <streambuf>
164
165#ifndef R__WIN32
167#endif
168
169////////////////////////////////////////////////////////////////////////////////
170
171// NOTE: TFileMerger will use PrintLevel = gHaddVerbosity - 1. If PrintLevel is < 1, it will print nothing, otherwise
172// it will print everything. To give some granularity to hadd, we do the following:
173// gHaddVerbosity = 0: only print hadd errors
174// gHaddVerbosity = 1: only print hadd errors + warnings
175// gHaddVerbosity = 2: print hadd errors + warnings and TFileMerger messages
176// gHaddVerbosity > 2: print all hadd and TFileMerger messages.
177static constexpr int kDefaultHaddVerbosity = 2;
178static int gHaddVerbosity = kDefaultHaddVerbosity;
179
180namespace {
181
182class NullBuf : public std::streambuf {
183public:
184 int overflow(int c) final { return c; }
185};
186
187class NullStream : public std::ostream {
188 NullBuf fBuf;
189
190public:
191 NullStream() : std::ostream(&fBuf) {}
192};
193
194} // namespace
195
196static NullStream &GetNullStream()
197{
198 static NullStream nullStream;
199 return nullStream;
200}
201
202static inline std::ostream &Err()
203{
204 std::cerr << "Error in <hadd>: ";
205 return std::cerr;
206}
207
208static inline std::ostream &Warn()
209{
210 std::ostream &s = gHaddVerbosity < 1 ? GetNullStream() : std::cerr;
211 s << "Warning in <hadd>: ";
212 return s;
213}
214
215static inline std::ostream &Info(int minLevel)
216{
217 std::ostream &s = gHaddVerbosity < minLevel ? GetNullStream() : std::cerr;
218 s << "Info in <hadd>: ";
219 return s;
220}
221
222using IntFlag_t = uint32_t;
223
224struct HAddArgs {
227 bool fForce;
230 bool fDebug;
233 bool fHelp;
234
235 std::optional<std::string> fWorkingDir;
236 std::optional<IntFlag_t> fNProcesses;
237 std::optional<std::string> fObjectFilterFile;
238 std::optional<Int_t> fObjectFilterType;
239 std::optional<TString> fCacheSize;
240 std::optional<ROOT::TIOFeatures> fFeatures;
241 std::optional<IntFlag_t> fMaxOpenedFiles;
242 std::optional<IntFlag_t> fVerbosity;
243 std::optional<IntFlag_t> fCompressionSettings;
244
247 // This is set to true if and only if the user passed `--`. In this special
248 // case, we must not stop parsing positional arguments even if we find one
249 // that starts with a `-`.
251};
252
254
255static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
256{
257 const auto argLen = strlen(arg);
258 const auto flagLen = strlen(flagStr);
259 if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
260 if (flagOut)
261 Warn() << "duplicate flag: " << flagStr << "\n";
262 flagOut = true;
263 return EFlagResult::kParsed;
264 }
265 return EFlagResult::kIgnored;
266}
267
268// NOTE: not using std::stoi or similar because they have bad error checking.
269// std::stoi will happily parse "120notvalid" as 120.
270static std::optional<IntFlag_t> StrToUInt(const char *str)
271{
272 if (!str)
273 return {};
274
275 uint32_t res = 0;
276 do {
277 if (!isdigit(static_cast<unsigned char>(*str)))
278 return {};
279 if (res > (UINT32_MAX - (*str - '0')) / 10) // overflow is an error
280 return {};
281 res *= 10;
282 res += *str - '0';
283 } while (*++str);
284
285 return res;
286}
287
288template <typename T>
293
294template <typename T>
295static FlagConvResult<T> ConvertArg(const char *);
296
297template <>
299{
300 return {arg, EFlagResult::kParsed};
301}
302
303template <>
305{
306 // Don't even try to parse arg if it doesn't look like a number.
307 if (!isdigit(*arg))
308 return {0, EFlagResult::kIgnored};
309
310 auto intOpt = StrToUInt(arg);
311 if (intOpt)
312 return {*intOpt, EFlagResult::kParsed};
313
314 Err() << "error parsing integer argument '" << arg << "'\n";
315 return {0, EFlagResult::kErr};
316}
317
318template <>
320{
322 std::stringstream ss;
323 ss.str(arg);
324 std::string item;
325 while (std::getline(ss, item, ',')) {
326 if (!features.Set(item))
327 Warn() << "ignoring unknown feature request: " << item << "\n";
328 }
330}
331
333{
334 TString cacheSize;
335 int size;
338 Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
339 return {"", EFlagResult::kErr};
341 double m;
342 const char *munit = nullptr;
344 Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
345 << ". We will use the maximum value.\n";
346 return {std::to_string(m) + munit, EFlagResult::kParsed};
347 } else {
348 cacheSize = "cachesize=";
349 cacheSize.Append(arg);
350 }
351 return {cacheSize, EFlagResult::kParsed};
352}
353
355{
356 if (strcmp(arg, "SkipListed") == 0)
358 if (strcmp(arg, "OnlyListed") == 0)
360
361 Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
362 return {{}, EFlagResult::kErr};
363}
364
365// Parses a flag that is followed by an argument of type T.
366// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
367// `conv` is used to convert the argument from string to its type T.
368template <typename T>
369static EFlagResult
370FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
371 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
372{
373 int argIdx = argIdxInOut;
374 const char *arg = argv[argIdx] + 1;
375 int argLen = strlen(arg);
376 int flagLen = strlen(flagStr);
377 const char *nxtArg = nullptr;
378
379 if (strncmp(arg, flagStr, flagLen) != 0)
381
382 bool argIsSeparate = false;
383 if (argLen > flagLen) {
384 // interpret anything after the flag as the argument.
385 nxtArg = arg + flagLen;
386 // Ignore one '=', if present
387 if (nxtArg[0] == '=')
388 ++nxtArg;
389 } else if (argLen == flagLen) {
390 argIsSeparate = true;
391 if (argIdx + 1 < argc) {
392 ++argIdxInOut;
394 } else {
395 Err() << "expected argument after '-" << flagStr << "' flag.\n";
396 return EFlagResult::kErr;
397 }
398 } else {
400 }
401
402 auto converted = conv(nxtArg);
403 if (converted.fResult == EFlagResult::kParsed) {
404 flagOut = converted.fValue;
405 } else if (converted.fResult == EFlagResult::kIgnored) {
406 if (defaultVal && argIsSeparate) {
408 // If we had tried parsing the next argument, step back one arg idx.
410 } else {
411 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
412 return EFlagResult::kErr;
413 }
414 } else {
415 return EFlagResult::kErr;
416 }
417
419}
420
422{
423 // Must be a number between 0 and 509 (with a 0 in the middle)
424 if (compSettings == 0)
425 return true;
426 // We also accept [1-9] as aliases of [101-109], but it's discouraged.
427 if (compSettings >= 1 && compSettings <= 9) {
428 Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
429 << "."
430 " This behavior is deprecated, please use the full compression settings.\n";
431 return true;
432 }
433 return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
434}
435
436// The -f flag has a somewhat complicated logic.
437// We have 4 cases:
438// 1. -f
439// 2. -ff
440// 3. -fk
441// 4. -f[0-509]
442//
443// and a combination thereof (e.g. -fk101, -ff202, -ffk, -fk209)
444// -ff and -f[0-509] are incompatible.
445//
446// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
447// This means that if we see a -f[something] and that "something" doesn't parse to a valid
448// number between 0 and 509, or f or k, we consider the flag invalid and skip it without
449// setting any state.
450//
451// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
452// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
453//
454// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
455static EFlagResult FlagF(const char *arg, HAddArgs &args)
456{
457 if (arg[0] != 'f')
459
460 args.fForce = true;
461 const char *cur = arg + 1;
462 while (*cur) {
463 switch (cur[0]) {
464 case 'f':
466 Warn() << "duplicate flag: -ff\n";
467 if (args.fCompressionSettings) {
468 std::cerr
469 << "[err] Cannot specify both -ff and -f[0-9]. Either use the first input compression or specify it.\n";
470 return EFlagResult::kErr;
471 } else
472 args.fUseFirstInputCompression = true;
473 break;
474 case 'k':
475 if (args.fKeepCompressionAsIs)
476 Warn() << "duplicate flag: -fk\n";
477 args.fKeepCompressionAsIs = true;
478 break;
479 default:
480 if (isdigit(cur[0])) {
481 if (args.fUseFirstInputCompression) {
482 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
483 "specify it.\n";
484 return EFlagResult::kErr;
485 } else if (!args.fCompressionSettings) {
486 if (auto compLv = StrToUInt(cur)) {
489 // we can't see any other argument after the number, so we return here to avoid
490 // incorrectly parsing the rest of the characters in `arg`.
492 } else {
493 Err() << *compLv << " is not a supported compression setting.\n";
494 return EFlagResult::kErr;
495 }
496 } else {
497 Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
498 return EFlagResult::kErr;
499 }
500 } else {
501 Err() << "cannot specify -f[0-9] multiple times!\n";
502 return EFlagResult::kErr;
503 }
504 } else {
505 Err() << "invalid flag: " << arg << "\n";
506 return EFlagResult::kErr;
507 }
508 }
509 ++cur;
510 }
511
513}
514
515// Returns nullopt if any of the flags failed to parse.
516// If an unknown flag is encountered, it will print a warning and go on.
517static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
518{
519 HAddArgs args{};
520
521 enum {
527
528 for (int argIdx = 1; argIdx < argc; ++argIdx) {
529 const char *argRaw = argv[argIdx];
530 if (!*argRaw)
531 continue;
532
533 if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
534 if (argRaw[1] == '-' && argRaw[2] == '\0') {
535 // special case `--`: force parsing to consider all future args as positional arguments.
537 Err()
538 << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
539 " This is not supported: you must have exactly one sequence of positional arguments, so if you"
540 " need to use `--` make sure to pass *all* positional arguments after it.";
541 return {};
542 }
543 args.fNoFlagsAfterPositionalArguments = true;
544 continue;
545 }
546
547 // parse flag
549
550 const char *arg = argRaw + 1;
551 bool validFlag = false;
552
553#define PARSE_FLAG(func, ...) \
554 do { \
555 if (!validFlag) { \
556 const auto res = func(__VA_ARGS__); \
557 if (res == EFlagResult::kErr) \
558 return {}; \
559 validFlag = res == EFlagResult::kParsed; \
560 } \
561 } while (0)
562
563 // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
564 PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
565 PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
566 PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
567 PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
568 PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
569 // Accept --help, -help and -h as "help"
570 PARSE_FLAG(FlagToggle, arg, "-help", args.fHelp);
571 PARSE_FLAG(FlagToggle, arg, "help", args.fHelp);
572 PARSE_FLAG(FlagToggle, arg, "h", args.fHelp);
573 PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
574 PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
575 PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
576 PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
577 PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
578 PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
579 PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
580 PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {kDefaultHaddVerbosity});
581 PARSE_FLAG(FlagF, arg, args);
582
583#undef PARSE_FLAG
584
585 if (!validFlag)
586 Warn() << "unknown flag: " << argRaw << "\n";
587
588 } else if (!args.fOutputArgIdx) {
589 // First positional argument is the output
590 args.fOutputArgIdx = argIdx;
593 } else {
594 // We should be in the same positional argument group as the output, error otherwise
596 if (!args.fFirstInputIdx) {
597 args.fFirstInputIdx = argIdx;
598 }
599 } else {
600 Err() << "seen a positional argument '" << argRaw
601 << "' after some flags."
602 " Positional arguments were already parsed at this point (from '"
603 << argv[args.fOutputArgIdx]
604 << "' onwards), and you can only have one sequence of them, so you cannot pass more."
605 " Please group your positional arguments all together so that hadd works as you expect.\n"
606 "Cmdline: ";
607 for (int i = 0; i < argc; ++i)
608 std::cerr << argv[i] << " ";
609 std::cerr << "\n";
610
611 return {};
612 }
613 }
614 }
615
616 return args;
617}
618
619// Returns the flags to add to the file merger's flags, or -1 in case of errors.
620static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
621 std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
622{
623 if (filterFileName) {
624 std::ifstream filterFile(*filterFileName);
625 if (!filterFile) {
626 Err() << "error opening filter file '" << *filterFileName << "'\n";
627 return -1;
628 }
630 std::string line;
631 std::string objPath;
632 int nObjects = 0;
633 while (std::getline(filterFile, line)) {
634 std::istringstream ss(line);
635 // only read exactly 1 token per line (strips any whitespaces and such)
636 objPath.clear();
637 ss >> objPath;
638 if (!objPath.empty() && objPath[0] != '#') {
639 filteredObjects.Append(objPath + ' ');
640 ++nObjects;
641 }
642 }
643
644 if (nObjects) {
645 Info(2) << "added " << nObjects << " object(s) from filter file '" << *filterFileName << "'\n";
646 fileMerger.AddObjectNames(filteredObjects);
647 } else {
648 Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
649 }
650
651 assert(objectFilterType.has_value());
652 const auto filterFlag = *objectFilterType;
654 return filterFlag;
655 }
656 return 0;
657}
658
659int main(int argc, char **argv)
660{
661 const auto argsOpt = ParseArgs(argc, argv);
662 if (!argsOpt)
663 return 1;
664 const HAddArgs &args = *argsOpt;
665
666 if (args.fHelp) {
668 return 0;
669 }
670
672 Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
674 Int_t newcomp = args.fCompressionSettings.value_or(-1);
675 TString cacheSize = args.fCacheSize.value_or("");
676
677 // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
678 // valid value (i.e. any value > 0).
679 // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
680 // we default to the number of cpus on the machine.
681 Bool_t multiproc = args.fNProcesses.has_value();
682 int nProcesses;
683 if (args.fNProcesses && *args.fNProcesses > 0) {
684 nProcesses = *args.fNProcesses;
685 } else {
686 SysInfo_t s;
687 gSystem->GetSysInfo(&s);
688 nProcesses = s.fCpus;
689 }
690 if (multiproc)
691 Info(2) << "parallelizing with " << nProcesses << " processes.\n";
692
693 // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
694 std::string workingDir;
695 if (!args.fWorkingDir) {
697 } else if (args.fWorkingDir && gSystem->AccessPathName(args.fWorkingDir->c_str())) {
698 Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
699 return 1;
700 } else {
701 workingDir = *args.fWorkingDir;
702 }
703
704 // Verify that -L and -Ltype are either both present or both absent.
705 if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
706 Err() << "-L must always be passed along with -Ltype.\n";
707 return 1;
708 }
709
710 const char *targetname = 0;
711 if (!args.fOutputArgIdx) {
712 Err() << "missing output file.\n";
714 return 1;
715 }
716 if (!args.fFirstInputIdx) {
717 Err() << "missing input file.\n";
719 return 1;
720 }
722
723 Info(2) << "target file: " << targetname << "\n";
724
725 if (args.fCacheSize)
726 Info(2) << "Using " << cacheSize << "\n";
727
728 ////////////////////////////// end flags processing /////////////////////////////////
729
730 gSystem->Load("libTreePlayer");
731
733 fileMerger.SetMsgPrefix("hadd");
734 fileMerger.SetPrintLevel(gHaddVerbosity - 1);
735 if (maxopenedfiles > 0) {
736 fileMerger.SetMaxOpenedFiles(maxopenedfiles);
737 }
738 // The following section will collect all input filenames into a vector,
739 // including those listed within an indirect file.
740 // If any file can not be accessed, it will error out, unless args.fSkipErrors is true
741 std::vector<std::string> allSubfiles;
742 for (int a = args.fFirstInputIdx; a < argc; ++a) {
743 if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
744 break;
745 }
746 if (argv[a] && argv[a][0] == '@') {
747 std::ifstream indirect_file(argv[a] + 1);
748 if (!indirect_file.is_open()) {
749 Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
750 if (!args.fSkipErrors)
751 return 1;
752 } else {
753 std::string line;
754 while (indirect_file) {
755 if (std::getline(indirect_file, line) && line.length()) {
756 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
757 Err() << "could not validate the file name \"" << line << "\" within indirect file "
758 << (argv[a] + 1) << std::endl;
759 if (!args.fSkipErrors)
760 return 1;
761 } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
762 Err() << "file " << line << " cannot be both the target and an input!\n";
763 if (!args.fSkipErrors)
764 return 1;
765 } else {
766 allSubfiles.emplace_back(line);
767 }
768 }
769 }
770 }
771 } else {
772 const std::string line = argv[a];
773 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
774 Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
775 if (!args.fSkipErrors)
776 return 1;
777 } else if (std::filesystem::exists(targetname) && std::filesystem::equivalent(line, targetname)) {
778 Err() << "file " << line << " cannot be both the target and an input!\n";
779 if (!args.fSkipErrors)
780 return 1;
781 } else {
782 allSubfiles.emplace_back(line);
783 }
784 }
785 }
786 if (allSubfiles.empty()) {
787 Err() << "could not find any valid input file " << std::endl;
788 return 1;
789 }
790 // The next snippet determines the output compression if unset
791 if (newcomp == -1) {
793 // grab from the first file.
794 TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
795 if (firstInput && !firstInput->IsZombie())
796 newcomp = firstInput->GetCompressionSettings();
797 else
799 delete firstInput;
800 fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
801 } else {
803 fileMerger.SetMergeOptions(TString("DefaultCompression"));
804 }
805 }
806 if (args.fKeepCompressionAsIs && !args.fReoptimize)
807 Info(2) << "compression setting for meta data: " << newcomp << '\n';
808 else
809 Info(2) << "compression setting for all output: " << newcomp << '\n';
810
811 if (args.fAppend) {
812 if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
813 Err() << "error opening target file for update :" << targetname << ".\n";
814 return 2;
815 }
816 } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
817 std::stringstream ss;
818 ss << "error opening target file (does " << targetname << " exist?).\n";
819 if (!args.fForce)
820 ss << "pass \"-f\" argument to force re-creation of output file.\n";
821 Err() << ss.str();
822 return 1;
823 }
824
825 auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
826 if (multiproc && step < 3) {
827 // At least 3 files per process
828 step = 3;
829 nProcesses = (allSubfiles.size() + step - 1) / step;
830 Info(2) << "each process should handle at least 3 files for efficiency."
831 " Setting the number of processes to: "
832 << nProcesses << std::endl;
833 }
834 if (nProcesses == 1)
836
837 std::vector<std::string> partialFiles;
838
839#ifndef R__WIN32
840 // this is compiled out on Windows only to try to prevent false-positive detections
841 // from several anti-virus engines, and multiproc is not
842 // supported on Windows anyway
843 if (multiproc) {
844 auto uuid = TUUID();
845 auto partialTail = uuid.AsString();
846 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
847 std::stringstream buffer;
848 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
849 partialFiles.emplace_back(buffer.str());
850 }
851 }
852#endif
853
854 auto mergeFiles = [&](TFileMerger &merger) {
855 if (args.fReoptimize) {
856 merger.SetFastMethod(kFALSE);
857 } else {
858 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
859 // Don't warn if the user has requested any re-optimization.
860 Warn() << "Sources and Target have different compression settings\n"
861 "hadd merging will be slower\n";
862 }
863 }
864 merger.SetNotrees(args.fNoTrees);
865 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
868 merger.SetIOFeatures(features);
871 if (extraFlags < 0)
872 return false;
874 if (args.fAppend)
876 else
878 Bool_t status = merger.PartialMerge(fileMergerFlags);
879 return status;
880 };
881
882 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
883 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
884 if (!merger.AddFile(allSubfiles[i].c_str())) {
885 if (args.fSkipErrors) {
886 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
887 } else {
888 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
889 return kFALSE;
890 }
891 }
892 }
893 return mergeFiles(merger);
894 };
895
896 auto parallelMerge = [&](int start) {
898 mergerP.SetMsgPrefix("hadd");
899 mergerP.SetPrintLevel(gHaddVerbosity - 1);
900 if (maxopenedfiles > 0) {
901 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
902 }
903 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), args.fForce, newcomp)) {
904 Err() << "error opening target partial file\n";
905 exit(1);
906 }
907 return sequentialMerge(mergerP, start, step);
908 };
909
910 auto reductionFunc = [&]() {
911 for (const auto &pf : partialFiles) {
912 fileMerger.AddFile(pf.c_str());
913 }
914 return mergeFiles(fileMerger);
915 };
916
917 Bool_t status;
918
919#ifndef R__WIN32
920 if (multiproc) {
922 auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
923 status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
924 if (status) {
925 status = reductionFunc();
926 } else {
927 Err() << "failed at the parallel stage\n";
928 }
929 if (!args.fDebug) {
930 for (const auto &pf : partialFiles) {
931 gSystem->Unlink(pf.c_str());
932 }
933 }
934 } else {
935 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
936 }
937#else
938 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
939#endif
940
941 if (status) {
942 Info(3) << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
943 << ") input (partial) files into " << targetname << "\n";
944 return 0;
945 } else {
946 Err() << "failure during the merge of " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
947 << ") input (partial) files into " << targetname << "\n";
948 return 1;
949 }
950}
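
The way main() distributes the input files over the worker processes can be isolated in a small sketch. `SplitWork` and `WorkSplit` are hypothetical names introduced for this example. The arithmetic follows the listing above (a ceiling division, then at least 3 files per process), and `nProcesses` is assumed to be greater than 0. In main() the adjustment is applied only when -j was passed.

```cpp
#include <cstddef>

// Hypothetical result type for this example.
struct WorkSplit {
   std::size_t fStep;       // number of input files handled by each process
   std::size_t fNProcesses; // number of processes actually used
};

// Hypothetical helper mirroring the step computation in main().
// Precondition: nProcesses > 0.
WorkSplit SplitWork(std::size_t nFiles, std::size_t nProcesses)
{
   // Ceiling division: files per process, rounded up.
   std::size_t step = (nFiles + nProcesses - 1) / nProcesses;
   if (step < 3) {
      // At least 3 files per process, so fewer processes may be needed.
      step = 3;
      nProcesses = (nFiles + step - 1) / step;
   }
   return {step, nProcesses};
}
```

For example, 10 files with 8 requested processes end up as 4 processes with 3 files each (the last one gets the single remaining file), whereas 100 files with 4 processes give 25 files per process.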
Definition TUUID.h:42
TLine * line
static EFlagResult FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional< T > &flagOut, std::optional< T > defaultVal=std::nullopt, FlagConvResult< T >(*conv)(const char *)=ConvertArg< T >)
Definition hadd.cxx:370
EFlagResult
Definition hadd.cxx:253
static bool ValidCompressionSettings(int compSettings)
Definition hadd.cxx:421
FlagConvResult< IntFlag_t > ConvertArg< IntFlag_t >(const char *arg)
Definition hadd.cxx:304
#define PARSE_FLAG(func,...)
static FlagConvResult< Int_t > ConvertFilterType(const char *arg)
Definition hadd.cxx:354
static Int_t ParseFilterFile(const std::optional< std::string > &filterFileName, std::optional< Int_t > objectFilterType, TFileMerger &fileMerger)
Definition hadd.cxx:620
static FlagConvResult< T > ConvertArg(const char *)
uint32_t IntFlag_t
Definition hadd.cxx:222
static constexpr int kDefaultHaddVerbosity
Definition hadd.cxx:177
static std::ostream & Info(int minLevel)
Definition hadd.cxx:215
static std::optional< HAddArgs > ParseArgs(int argc, char **argv)
Definition hadd.cxx:517
FlagConvResult< ROOT::TIOFeatures > ConvertArg< ROOT::TIOFeatures >(const char *arg)
Definition hadd.cxx:319
static int gHaddVerbosity
Definition hadd.cxx:178
static std::ostream & Warn()
Definition hadd.cxx:208
static FlagConvResult< TString > ConvertCacheSize(const char *arg)
Definition hadd.cxx:332
static std::ostream & Err()
Definition hadd.cxx:202
static EFlagResult FlagF(const char *arg, HAddArgs &args)
Definition hadd.cxx:455
static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
Definition hadd.cxx:255
static NullStream & GetNullStream()
Definition hadd.cxx:196
static std::optional< IntFlag_t > StrToUInt(const char *str)
Definition hadd.cxx:270
static constexpr const char kCommandLineOptionsHelp[]
void ToHumanReadableSize(value_type bytes, Bool_t si, Double_t *coeff, const char **units)
Return the size expressed in 'human readable' format.
EFromHumanReadableSize FromHumanReadableSize(std::string_view str, T &value)
Convert strings like the following into byte counts 5MB, 5 MB, 5M, 3.7GB, 123b, 456kB,...
EFlagResult fResult
Definition hadd.cxx:291
bool fNoFlagsAfterPositionalArguments
Definition hadd.cxx:250
bool fHelp
Definition hadd.cxx:233
bool fKeepCompressionAsIs
Definition hadd.cxx:231
bool fForce
Definition hadd.cxx:227
std::optional< TString > fCacheSize
Definition hadd.cxx:239
std::optional< IntFlag_t > fCompressionSettings
Definition hadd.cxx:243
bool fNoTrees
Definition hadd.cxx:225
std::optional< Int_t > fObjectFilterType
Definition hadd.cxx:238
int fFirstInputIdx
Definition hadd.cxx:246
std::optional< IntFlag_t > fNProcesses
Definition hadd.cxx:236
bool fUseFirstInputCompression
Definition hadd.cxx:232
std::optional< std::string > fObjectFilterFile
Definition hadd.cxx:237
bool fSkipErrors
Definition hadd.cxx:228
std::optional< IntFlag_t > fVerbosity
Definition hadd.cxx:242
std::optional< IntFlag_t > fMaxOpenedFiles
Definition hadd.cxx:241
std::optional< std::string > fWorkingDir
Definition hadd.cxx:235
int fOutputArgIdx
Definition hadd.cxx:245
bool fDebug
Definition hadd.cxx:230
bool fReoptimize
Definition hadd.cxx:229
std::optional< ROOT::TIOFeatures > fFeatures
Definition hadd.cxx:240
bool fAppend
Definition hadd.cxx:226
@ kUseCompiledDefault
Use the compile-time default setting.
Definition Compression.h:53
Int_t fCpus
Definition TSystem.h:162
TMarker m
Definition textangle.C:8