hadd.cxx
1/**
2 \file hadd.cxx
3 \brief This program will merge compatible ROOT objects, such as histograms, Trees and RNTuples,
4 from a list of root files and write them to a target root file.
5 In order for a ROOT object to be mergeable, it must implement the Merge() function.
6 Non-mergeable objects will have all instances copied as-is into the target file.
7 The target file must not be identical to one of the source files.
8
9 Syntax:
10 ```{.cpp}
11 hadd [flags] targetfile source1 source2 ... [flags]
12 ```
13
14 Flags can be passed before or after the positional arguments.
15 The first positional (non-flag) argument will be interpreted as the targetfile.
16 After that, the first sequence of positional arguments will be interpreted as the input files.
17 If two sequences of positional arguments are separated by flags, hadd will emit an error and abort.
18
19 By default, any argument starting with `-` is interpreted as a flag. If you want to pass filenames
20 starting with `-` you need to pass them after `--`:
21 ```{.cpp}
22 hadd [flags] -- -file1 -file2 ...
23 ```
24 Note that in this case you need to pass ALL positional arguments after `--`.
25
26 If a flag requires an argument, the argument can be specified in any of these ways:
27
28 # All equally valid:
29 -j 16
30 -j16
31 -j=16
32
33 The first syntax is the preferred one since it's backward-compatible with previous versions of hadd.
34 The -f flag is an exception to this rule: it only supports the `-f[0-9]` syntax.
35
36 Note that merging multiple flags is NOT supported: `-jfa` will be interpreted as -j=fa, which is invalid!
37
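As a rough illustration of the three accepted spellings, the sketch below extracts the value of a flag from an argument list. `FlagValue` is a hypothetical helper written for this example; it ignores optional arguments and type checking, which the real parser has to handle.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of hadd): returns the value given to the flag
// `-<name>` found at argv[idx], accepting "-j 16", "-j16" and "-j=16".
// Returns std::nullopt if argv[idx] is not that flag or if the value is missing.
std::optional<std::string>
FlagValue(const std::vector<std::string> &argv, std::size_t idx, const std::string &name)
{
   if (idx >= argv.size())
      return std::nullopt;
   const std::string &arg = argv[idx];
   const std::string prefix = "-" + name;
   if (arg.compare(0, prefix.size(), prefix) != 0)
      return std::nullopt; // not this flag
   std::string rest = arg.substr(prefix.size());
   if (rest.empty()) {
      // "-j 16": the value is the next argument, if there is one.
      if (idx + 1 >= argv.size())
         return std::nullopt;
      return argv[idx + 1];
   }
   // "-j=16": drop a single '=' separator; "-j16" needs no further treatment.
   if (rest[0] == '=')
      rest.erase(0, 1);
   return rest;
}
```

Note that under this scheme `-jfa` produces the value `fa` for `-j`, which is why merged flags cannot work.
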
38 The flags are as follows:
39
40 \param -a Append to the output
41 \param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
42 \param -d <DIR> Carry out the partial multiprocess execution in the specified directory
43 \param -dbg Enable verbosity. If -j was specified, do not delete the partial files
44 stored inside the working directory.
45 \param -experimental-io-features <FEATURES> Enables the corresponding experimental feature for output trees.
46 \see ROOT::Experimental::EIOFeatures
47 \param -f Force overwriting of output file.
48 \param -f[0-9] Set target compression algorithm `i` and level `j` passing the number `i*100 + j`, e.g. `-f505`.
49 The last digit (`j`) can be set from 0 = uncompressed to 9 = highly compressed.
50 The first digit (`i`) is 1 for ZLIB, 2 for LZMA, 4 for LZ4 and 5 for ZSTD.
51 Recommended numbers are 101 (ZLIB), 207 (LZMA), 404 (LZ4), 505 (ZSTD).
52 The default value for this flag is 101 (kDefaultZLIB).
53 See ROOT::RCompressionSetting and TFile::TFile documentation for more details.
54 \param -fk Sets the target file to contain the baskets with the same compression as the input files
55 (unless -O is specified). Compresses the metadata using the compression level specified
56 in the first input or the compression setting given after -fk (for example 505 when using -fk505).
57 \param -ff The compression level used is the one specified in the first input
58 \param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
59 or is 0, use the system maximum.
60 \param -k Skip corrupt or non-existent files, do not exit
61 \param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects depending on
62 the value of "-Ltype". FILE must contain one object name per line, which cannot contain
63 whitespaces or '/'. You can also pass TDirectory names, which apply to the entire directory
64 content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
65 passed as well.
66 \param -Ltype <SkipListed|OnlyListed> Sets the type of operation performed on the objects listed in FILE given with
67 the
68 "-L" flag. "SkipListed" will skip all the listed objects; "OnlyListed" will only merge those
69 objects. If this flag is passed, "-L" must be passed as well.
70 \param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
71 the default). This number includes both the input files being read and the output file.
72 Thus, if set to 1, it will be automatically raised to the minimum of 2. If set to too large
73 a value, it will be clipped to the system maximum.
74 \param -O Re-optimize basket size when merging TTree
75 \param -T Do not merge Trees
76 \param -v [LEVEL] Explicitly set the verbosity level:
77 <= 0 = only output errors;
78 1 = only output errors and warnings;
79 2 = output minimal informative messages, errors and warnings (default);
80 >= 3 = output all messages.
81 \return hadd returns a status code: 0 if OK, 1 otherwise
82
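The `i*100 + j` encoding used by `-f[0-9]` can be decoded as in the sketch below. `CompressionSetting` and `DecodeCompression` are hypothetical names for this example, and only the four algorithms listed in the flag description are recognised; this is not the validation that hadd itself performs.

```cpp
#include <optional>
#include <string>

// Hypothetical result type for this example.
struct CompressionSetting {
   std::string algorithm;
   int level; // 0 = uncompressed ... 9 = highest compression
};

// Splits a setting such as 505 into algorithm (hundreds) and level (last digit).
// Returns std::nullopt for values outside the documented scheme.
std::optional<CompressionSetting> DecodeCompression(int setting)
{
   if (setting < 0)
      return std::nullopt;
   const int algo = setting / 100;
   const int level = setting % 100;
   if (level > 9)
      return std::nullopt; // the middle digit must be 0
   switch (algo) {
   case 1: return CompressionSetting{"ZLIB", level};
   case 2: return CompressionSetting{"LZMA", level};
   case 4: return CompressionSetting{"LZ4", level};
   case 5: return CompressionSetting{"ZSTD", level};
   default: return std::nullopt; // algorithm not listed in the description above
   }
}
```

For instance, `-f505` decodes to ZSTD at level 5 and `-f207` to LZMA at level 7.
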
83 For example, assume 3 files f1, f2, f3 containing histograms hn and Trees Tn:
84 - f1 with h1 h2 h3 T1
85 - f2 with h1 h4 T1 T2
86 - f3 with h5
87 the result of
88 ```
89 hadd -f x.root f1.root f2.root f3.root
90 ```
91 will be a file x.root with h1 h2 h3 h4 h5 T1 T2
92 where
93 - h1 will be the sum of the 2 histograms in f1 and f2
94 - T1 will be the merge of the Trees in f1 and f2
95
96 The files may contain sub-directories.
97
98 If the source files contain histograms and Trees, one can skip
99 the Trees with
100 ```
101 hadd -T targetfile source1 source2 ...
102 ```
103
104 Wildcarding and indirect files are also supported:
105 ```
106 hadd result.root myfil*.root
107 ```
108 will merge all files in myfil*.root
109 ```
110 hadd result.root file1.root @list.txt file2.root myfil*.root
111 ```
112 will merge file1.root, file2.root, all files in myfil*.root
113 and all files in the indirect text file list.txt ("@" as the first
114 character of the file indicates an indirect file. An indirect file
115 is a text file containing a list of other files, including other
116 indirect files, one line per file).
117
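The expansion of indirect files described above can be sketched as follows. To keep the example self-contained, an in-memory map stands in for the files on disk (a real implementation would read each list with `std::ifstream`). `IndirectFiles` and `ExpandInputs` are hypothetical names, and the depth limit is an assumption added to guard against lists that include each other.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the file system in this example: list name -> list contents.
using IndirectFiles = std::map<std::string, std::string>;

// Replaces every "@name" argument with the non-empty lines of the list `name`,
// expanding nested "@" entries recursively. Unknown lists are skipped here.
std::vector<std::string>
ExpandInputs(const std::vector<std::string> &args, const IndirectFiles &files, int depth = 0)
{
   std::vector<std::string> out;
   if (depth > 16)
      return out; // assumed safety limit against circular includes
   for (const auto &arg : args) {
      if (arg.empty())
         continue;
      if (arg[0] != '@') {
         out.push_back(arg); // regular input file
         continue;
      }
      const auto it = files.find(arg.substr(1));
      if (it == files.end())
         continue;
      // One entry per line; empty lines are ignored.
      std::istringstream lines(it->second);
      std::vector<std::string> entries;
      std::string line;
      while (std::getline(lines, line)) {
         if (!line.empty())
            entries.push_back(line);
      }
      const auto nested = ExpandInputs(entries, files, depth + 1);
      out.insert(out.end(), nested.begin(), nested.end());
   }
   return out;
}
```
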
118 If the source and target compression levels are identical (default),
119 the program uses the TChain::Merge function with option "fast", i.e.
120 the merge will be done without unzipping or unstreaming the baskets
121 (a direct copy of the raw bytes on disk). The "fast" mode is typically
122 5 times faster than the mode that unzips and unstreams the baskets.
123
124 If the option -cachesize is used, hadd will resize (or disable if 0) the
125 prefetching cache used to speed up I/O operations.
126
127 For options that take a size as argument, a decimal number of bytes is expected.
128 If the number ends with a `k`, `m`, `g`, etc., the number is multiplied
129 by 1000 (1KB), 1000000 (1MB), 1000000000 (1GB), etc.
130 If this suffix is followed by `i`, the number is multiplied by the traditional
131 1024 (1KiB), 1048576 (1MiB), 1073741824 (1GiB), etc.
132 The suffix can be optionally followed by B, whose casing is ignored,
133 e.g. 1k, 1K, 1Kb and 1KB are the same.
134
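The suffix rules above can be sketched as a small parser. `ParseSize` is a hypothetical function written for this illustration, not the conversion routine hadd calls; it assumes whole numbers only, supports just `k`, `m`, `g` and `t`, and does not check for overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical parser: "<digits>[k|m|g|t][i][b]" with case-insensitive suffixes.
// Returns the size in bytes, or std::nullopt if the text does not match.
std::optional<std::uint64_t> ParseSize(const std::string &text)
{
   const auto lower = [](char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); };

   std::size_t pos = 0;
   std::uint64_t value = 0;
   while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
      value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
      ++pos;
   }
   if (pos == 0)
      return std::nullopt; // no leading number
   if (pos == text.size())
      return value; // plain number of bytes

   const std::string units = "kmgt";
   std::uint64_t base = 1000;
   std::size_t exponent = 0;
   const auto unit = units.find(lower(text[pos]));
   if (unit != std::string::npos) {
      exponent = unit + 1;
      ++pos;
      // An 'i' after the unit selects powers of 1024 instead of 1000.
      if (pos < text.size() && lower(text[pos]) == 'i') {
         base = 1024;
         ++pos;
      }
   }
   // Optional trailing 'B', any casing.
   if (pos < text.size() && lower(text[pos]) == 'b')
      ++pos;
   if (pos != text.size())
      return std::nullopt; // unexpected trailing characters

   for (std::size_t i = 0; i < exponent; ++i)
      value *= base;
   return value;
}
```

Under these assumptions `1k`, `1K`, `1Kb` and `1KB` all give 1000, while `1KiB` gives 1024.
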
135 \note By default histograms are added. However hadd does not support the case where
136 histograms have their bit TH1::kIsAverage set.
137
138 \authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
139*/
140#include "Compression.h"
141#include "TClass.h"
142#include "TFile.h"
143#include "TFileMerger.h"
144#include "THashList.h"
145#include "TKey.h"
146#include "TSystem.h"
147#include "TUUID.h"
148
149#include <ROOT/RConfig.hxx>
150#include <ROOT/StringConv.hxx>
151#include <ROOT/TIOFeatures.hxx>
152
153#include "haddCommandLineOptionsHelp.h"
154
155#include <climits>
156#include <cstdlib>
157#include <filesystem>
158#include <fstream>
159#include <iostream>
160#include <optional>
161#include <sstream>
162#include <string>
163#include <streambuf>
164
165#ifndef R__WIN32
167#endif
168
169////////////////////////////////////////////////////////////////////////////////
170
171// NOTE: TFileMerger will use PrintLevel = gHaddVerbosity - 1. If PrintLevel is < 1, it will print nothing, otherwise
172// it will print everything. To give some granularity to hadd, we do the following:
173// gHaddVerbosity = 0: only print hadd errors
174// gHaddVerbosity = 1: only print hadd errors + warnings
175// gHaddVerbosity = 2: print hadd errors + warnings and TFileMerger messages
176// gHaddVerbosity > 2: print all hadd and TFileMerger messages.
177static constexpr int kDefaultHaddVerbosity = 2;
179
180namespace {
181
182class NullBuf : public std::streambuf {
183public:
184 int overflow(int c) final { return c; }
185};
186
187class NullStream : public std::ostream {
188 NullBuf fBuf;
189
190public:
191 NullStream() : std::ostream(&fBuf) {}
192};
193
194} // namespace
195
196static NullStream &GetNullStream()
197{
198 static NullStream nullStream;
199 return nullStream;
200}
201
202static inline std::ostream &Err()
203{
204 std::cerr << "Error in <hadd>: ";
205 return std::cerr;
206}
207
208static inline std::ostream &Warn()
209{
210 std::ostream &s = gHaddVerbosity < 1 ? GetNullStream() : std::cerr;
211 s << "Warning in <hadd>: ";
212 return s;
213}
214
215static inline std::ostream &Info(int minLevel)
216{
217 std::ostream &s = gHaddVerbosity < minLevel ? GetNullStream() : std::cerr;
218 s << "Info in <hadd>: ";
219 return s;
220}
221
222using IntFlag_t = uint32_t;
223
224struct HAddArgs {
227 bool fForce;
230 bool fDebug;
233 bool fHelp;
234
235 std::optional<std::string> fWorkingDir;
236 std::optional<IntFlag_t> fNProcesses;
237 std::optional<std::string> fObjectFilterFile;
238 std::optional<Int_t> fObjectFilterType;
239 std::optional<TString> fCacheSize;
240 std::optional<ROOT::TIOFeatures> fFeatures;
241 std::optional<IntFlag_t> fMaxOpenedFiles;
242 std::optional<IntFlag_t> fVerbosity;
243 std::optional<IntFlag_t> fCompressionSettings;
244
247 // This is set to true if and only if the user passed `--`. In this special
248 // case, we must not stop parsing positional arguments even if we find one
249 // that starts with a `-`.
251};
252
254
255static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
256{
257 const auto argLen = strlen(arg);
258 const auto flagLen = strlen(flagStr);
259 if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
260 if (flagOut)
261 Warn() << "duplicate flag: " << flagStr << "\n";
262 flagOut = true;
264 }
266}
267
268// NOTE: not using std::stoi or similar because they have bad error checking.
269// std::stoi will happily parse "120notvalid" as 120.
270static std::optional<IntFlag_t> StrToUInt(const char *str)
271{
272 if (!str)
273 return {};
274
275 uint32_t res = 0;
276 do {
277 if (!isdigit(static_cast<unsigned char>(*str)))
278 return {};
279 const uint32_t digit = *str - '0';
280 if (res > (UINT32_MAX - digit) / 10) // overflow is an error
281 return {};
282 res = res * 10 + digit;
283 } while (*++str);
284
285 return res;
286}
287
288template <typename T>
293
294template <typename T>
295static FlagConvResult<T> ConvertArg(const char *);
296
297template <>
299{
300 return {arg, EFlagResult::kParsed};
301}
302
303template <>
305{
306 // Don't even try to parse arg if it doesn't look like a number.
307 if (!isdigit(*arg))
308 return {0, EFlagResult::kIgnored};
309
310 auto intOpt = StrToUInt(arg);
311 if (intOpt)
312 return {*intOpt, EFlagResult::kParsed};
313
314 Err() << "error parsing integer argument '" << arg << "'\n";
315 return {0, EFlagResult::kErr};
316}
317
318template <>
320{
322 std::stringstream ss;
323 ss.str(arg);
324 std::string item;
325 while (std::getline(ss, item, ',')) {
326 if (!features.Set(item))
327 Warn() << "ignoring unknown feature request: " << item << "\n";
328 }
330}
331
333{
334 TString cacheSize;
335 int size;
338 Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
339 return {"", EFlagResult::kErr};
341 double m;
342 const char *munit = nullptr;
344 Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
345 << ". We will use the maximum value.\n";
346 return {std::to_string(m) + munit, EFlagResult::kParsed};
347 } else {
348 cacheSize = "cachesize=";
349 cacheSize.Append(arg);
350 }
351 return {cacheSize, EFlagResult::kParsed};
352}
353
355{
356 if (strcmp(arg, "SkipListed") == 0)
358 if (strcmp(arg, "OnlyListed") == 0)
360
361 Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
362 return {{}, EFlagResult::kErr};
363}
364
365// Parses a flag that is followed by an argument of type T.
366// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
367// `conv` is used to convert the argument from string to its type T.
368template <typename T>
369static EFlagResult
370FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
371 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
372{
373 int argIdx = argIdxInOut;
374 const char *arg = argv[argIdx] + 1;
375 int argLen = strlen(arg);
376 int flagLen = strlen(flagStr);
377 const char *nxtArg = nullptr;
378
379 if (strncmp(arg, flagStr, flagLen) != 0)
381
382 bool argIsSeparate = false;
383 if (argLen > flagLen) {
384 // interpret anything after the flag as the argument.
385 nxtArg = arg + flagLen;
386 // Ignore one '=', if present
387 if (nxtArg[0] == '=')
388 ++nxtArg;
389 } else if (argLen == flagLen) {
390 argIsSeparate = true;
391 if (argIdx + 1 < argc) {
392 ++argIdxInOut;
394 } else {
395 Err() << "expected argument after '-" << flagStr << "' flag.\n";
396 return EFlagResult::kErr;
397 }
398 } else {
400 }
401
402 auto converted = conv(nxtArg);
403 if (converted.fResult == EFlagResult::kParsed) {
404 flagOut = converted.fValue;
405 } else if (converted.fResult == EFlagResult::kIgnored) {
406 if (defaultVal && argIsSeparate) {
408 // If we had tried parsing the next argument, step back one arg idx.
410 } else {
411 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
412 return EFlagResult::kErr;
413 }
414 } else {
415 return EFlagResult::kErr;
416 }
417
419}
420
422{
423 // Must be a number between 0 and 509 (with a 0 in the middle)
424 if (compSettings == 0)
425 return true;
426 // We also accept [1-9] as aliases of [101-109], but it's discouraged.
427 if (compSettings >= 1 && compSettings <= 9) {
428 Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
429 << "."
430 " This behavior is deprecated, please use the full compression settings.\n";
431 return true;
432 }
433 return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
434}
435
436// The -f flag has a somewhat complicated logic.
437// We have 4 cases:
438// 1. -f
439// 2. -ff
440// 3. -fk
441// 4. -f[0-509]
442//
443// and a combination thereof (e.g. -fk101, -ff202, -ffk, -fk209)
444// -ff and -f[0-509] are incompatible.
445//
446// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
447// This means that if we see a -f[something] and that "something" doesn't parse to a valid
448// number between 0 and 509, or f or k, we consider the flag invalid and skip it without
449// setting any state.
450//
451// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
452// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
453//
454// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
455static EFlagResult FlagF(const char *arg, HAddArgs &args)
456{
457 if (arg[0] != 'f')
459
460 args.fForce = true;
461 const char *cur = arg + 1;
462 while (*cur) {
463 switch (cur[0]) {
464 case 'f':
466 Warn() << "duplicate flag: -ff\n";
467 if (args.fCompressionSettings) {
468 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
469 "specify it.\n";
470 return EFlagResult::kErr;
471 } else
472 args.fUseFirstInputCompression = true;
473 break;
474 case 'k':
475 if (args.fKeepCompressionAsIs)
476 Warn() << "duplicate flag: -fk\n";
477 args.fKeepCompressionAsIs = true;
478 break;
479 default:
480 if (isdigit(cur[0])) {
481 if (args.fUseFirstInputCompression) {
482 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
483 "specify it.\n";
484 return EFlagResult::kErr;
485 } else if (!args.fCompressionSettings) {
486 if (auto compLv = StrToUInt(cur)) {
489 // we can't see any other argument after the number, so we return here to avoid
490 // incorrectly parsing the rest of the characters in `arg`.
492 } else {
493 Err() << *compLv << " is not a supported compression setting.\n";
494 return EFlagResult::kErr;
495 }
496 } else {
497 Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
498 return EFlagResult::kErr;
499 }
500 } else {
501 Err() << "cannot specify -f[0-9] multiple times!\n";
502 return EFlagResult::kErr;
503 }
504 } else {
505 Err() << "invalid flag: " << arg << "\n";
506 return EFlagResult::kErr;
507 }
508 }
509 ++cur;
510 }
511
513}
514
515// Returns nullopt if any of the flags failed to parse.
516// If an unknown flag is encountered, it will print a warning and go on.
517static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
518{
519 HAddArgs args{};
520
521 enum {
527
528 for (int argIdx = 1; argIdx < argc; ++argIdx) {
529 const char *argRaw = argv[argIdx];
530 if (!*argRaw)
531 continue;
532
533 if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
534 if (argRaw[1] == '-' && argRaw[2] == '\0') {
535 // special case `--`: force parsing to consider all future args as positional arguments.
537 Err()
538 << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
539 " This is not supported: you must have exactly one sequence of positional arguments, so if you"
540 " need to use `--` make sure to pass *all* positional arguments after it.\n";
541 return {};
542 }
543 args.fNoFlagsAfterPositionalArguments = true;
544 continue;
545 }
546
547 // parse flag
549
550 const char *arg = argRaw + 1;
551 bool validFlag = false;
552
553#define PARSE_FLAG(func, ...) \
554 do { \
555 if (!validFlag) { \
556 const auto res = func(__VA_ARGS__); \
557 if (res == EFlagResult::kErr) \
558 return {}; \
559 validFlag = res == EFlagResult::kParsed; \
560 } \
561 } while (0)
562
563 // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
564 PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
565 PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
566 PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
567 PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
568 PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
569 // Accept --help, -help and -h as "help"
570 PARSE_FLAG(FlagToggle, arg, "-help", args.fHelp);
571 PARSE_FLAG(FlagToggle, arg, "help", args.fHelp);
572 PARSE_FLAG(FlagToggle, arg, "h", args.fHelp);
573 PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
574 PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
575 PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
576 PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
577 PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
578 PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
579 PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
580 PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {kDefaultHaddVerbosity});
581 PARSE_FLAG(FlagF, arg, args);
582
583#undef PARSE_FLAG
584
585 if (!validFlag)
586 Warn() << "unknown flag: " << argRaw << "\n";
587
588 } else if (!args.fOutputArgIdx) {
589 // First positional argument is the output
590 args.fOutputArgIdx = argIdx;
593 } else {
594 // We should be in the same positional argument group as the output, error otherwise
596 if (!args.fFirstInputIdx) {
597 args.fFirstInputIdx = argIdx;
598 }
599 } else {
600 Err() << "seen a positional argument '" << argRaw
601 << "' after some flags."
602 " Positional arguments were already parsed at this point (from '"
603 << argv[args.fOutputArgIdx]
604 << "' onwards), and you can only have one sequence of them, so you cannot pass more."
605 " Please group your positional arguments all together so that hadd works as you expect.\n"
606 "Cmdline: ";
607 for (int i = 0; i < argc; ++i)
608 std::cerr << argv[i] << " ";
609 std::cerr << "\n";
610
611 return {};
612 }
613 }
614 }
615
616 return args;
617}
618
619// Returns the flags to add to the file merger's flags, or -1 in case of errors.
620static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
621 std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
622{
623 if (filterFileName) {
624 std::ifstream filterFile(*filterFileName);
625 if (!filterFile) {
626 Err() << "error opening filter file '" << *filterFileName << "'\n";
627 return -1;
628 }
630 std::string line;
631 std::string objPath;
632 int nObjects = 0;
633 while (std::getline(filterFile, line)) {
634 std::istringstream ss(line);
635 // only read exactly 1 token per line (strips any whitespaces and such)
636 objPath.clear();
637 ss >> objPath;
638 if (!objPath.empty() && objPath[0] != '#') {
639 filteredObjects.Append(objPath + ' ');
640 ++nObjects;
641 }
642 }
643
644 if (nObjects) {
645 Info(2) << "added " << nObjects << " object(s) from filter file '" << *filterFileName << "'\n";
646 fileMerger.AddObjectNames(filteredObjects);
647 } else {
648 Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
649 }
650
651 assert(objectFilterType.has_value());
652 const auto filterFlag = *objectFilterType;
654 return filterFlag;
655 }
656 return 0;
657}
658
659static bool FilesAreEquivalent(std::string_view source, std::string_view target)
660{
661 const bool sourceHasProtocol = source.find("://") != std::string_view::npos;
662 const bool targetHasProtocol = target.find("://") != std::string_view::npos;
664 return false;
665
666 // We cannot use std::filesystem functions for file paths that have a protocol.
668 return source == target;
669
670 return std::filesystem::exists(target) && std::filesystem::equivalent(source, target);
671}
672
673int main(int argc, char **argv)
674{
675 const auto argsOpt = ParseArgs(argc, argv);
676 if (!argsOpt)
677 return 1;
678 const HAddArgs &args = *argsOpt;
679
680 if (args.fHelp) {
682 return 0;
683 }
684
686 Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
688 Int_t newcomp = args.fCompressionSettings.value_or(-1);
689 TString cacheSize = args.fCacheSize.value_or("");
690
691 // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
692 // valid value (i.e. any value > 0).
693 // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
694 // we default to the number of cpus on the machine.
695 Bool_t multiproc = args.fNProcesses.has_value();
696 int nProcesses;
697 if (args.fNProcesses && *args.fNProcesses > 0) {
698 nProcesses = *args.fNProcesses;
699 } else {
700 SysInfo_t s;
701 gSystem->GetSysInfo(&s);
702 nProcesses = s.fCpus;
703 }
704 if (multiproc)
705 Info(2) << "parallelizing with " << nProcesses << " processes.\n";
706
707 // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
708 std::string workingDir;
709 if (!args.fWorkingDir) {
711 } else if (args.fWorkingDir && gSystem->AccessPathName(args.fWorkingDir->c_str())) {
712 Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
713 return 1;
714 } else {
715 workingDir = *args.fWorkingDir;
716 }
717
718 // Verify that -L and -Ltype are either both present or both absent.
719 if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
720 Err() << "-L must always be passed along with -Ltype.\n";
721 return 1;
722 }
723
724 const char *targetname = nullptr;
725 if (!args.fOutputArgIdx) {
726 Err() << "missing output file.\n";
728 return 1;
729 }
730 if (!args.fFirstInputIdx) {
731 Err() << "missing input file.\n";
733 return 1;
734 }
736
737 Info(2) << "target file: " << targetname << "\n";
738
739 if (args.fCacheSize)
740 Info(2) << "Using " << cacheSize << "\n";
741
742 ////////////////////////////// end flags processing /////////////////////////////////
743
744 gSystem->Load("libTreePlayer");
745
747 fileMerger.SetMsgPrefix("hadd");
748 fileMerger.SetPrintLevel(gHaddVerbosity - 1);
749 if (maxopenedfiles > 0) {
750 fileMerger.SetMaxOpenedFiles(maxopenedfiles);
751 }
752 // The following section will collect all input filenames into a vector,
753 // including those listed within an indirect file.
754 // If any file cannot be accessed, it will error out, unless args.fSkipErrors is true
755 std::vector<std::string> allSubfiles;
756 for (int a = args.fFirstInputIdx; a < argc; ++a) {
757 if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
758 break;
759 }
760 if (argv[a] && argv[a][0] == '@') {
761 std::ifstream indirect_file(argv[a] + 1);
762 if (!indirect_file.is_open()) {
763 Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
764 if (!args.fSkipErrors)
765 return 1;
766 } else {
767 std::string line;
768 while (indirect_file) {
769 if (std::getline(indirect_file, line) && line.length()) {
770 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
771 Err() << "could not validate the file name \"" << line << "\" within indirect file "
772 << (argv[a] + 1) << std::endl;
773 if (!args.fSkipErrors)
774 return 1;
775 } else if (FilesAreEquivalent(line, targetname)) {
776 Err() << "file " << line << " cannot be both the target and an input!\n";
777 if (!args.fSkipErrors)
778 return 1;
779 } else {
780 allSubfiles.emplace_back(line);
781 }
782 }
783 }
784 }
785 } else {
786 const char *line = argv[a];
788 Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
789 if (!args.fSkipErrors)
790 return 1;
791 } else if (FilesAreEquivalent(line, targetname)) {
792 Err() << "file " << line << " cannot be both the target and an input!\n";
793 if (!args.fSkipErrors)
794 return 1;
795 } else {
796 allSubfiles.emplace_back(line);
797 }
798 }
799 }
800 if (allSubfiles.empty()) {
801 Err() << "could not find any valid input file " << std::endl;
802 return 1;
803 }
804 // The next snippet determines the output compression if unset
805 if (newcomp == -1) {
807 // grab from the first file.
808 TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
809 if (firstInput && !firstInput->IsZombie())
810 newcomp = firstInput->GetCompressionSettings();
811 else
813 delete firstInput;
814 fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
815 } else {
817 fileMerger.SetMergeOptions(TString("DefaultCompression"));
818 }
819 }
820 if (args.fKeepCompressionAsIs && !args.fReoptimize)
821 Info(2) << "compression setting for meta data: " << newcomp << '\n';
822 else
823 Info(2) << "compression setting for all output: " << newcomp << '\n';
824
825 if (args.fAppend) {
826 if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
827 Err() << "error opening target file for update: " << targetname << ".\n";
828 return 2;
829 }
830 } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
831 std::stringstream ss;
832 ss << "error opening target file (does " << targetname << " exist?).\n";
833 if (!args.fForce)
834 ss << "pass \"-f\" argument to force re-creation of output file.\n";
835 Err() << ss.str();
836 return 1;
837 }
838
839 auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
840 if (multiproc && step < 3) {
841 // At least 3 files per process
842 step = 3;
843 nProcesses = (allSubfiles.size() + step - 1) / step;
844 Info(2) << "each process should handle at least 3 files for efficiency."
845 " Setting the number of processes to: "
846 << nProcesses << std::endl;
847 }
848 if (nProcesses == 1)
850
851 std::vector<std::string> partialFiles;
852
853#ifndef R__WIN32
854 // this block is compiled out on Windows to try to prevent false-positive detections
855 // by several anti-virus engines; multiproc is not
856 // supported on Windows anyway
857 if (multiproc) {
858 auto uuid = TUUID();
859 auto partialTail = uuid.AsString();
860 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
861 std::stringstream buffer;
862 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
863 partialFiles.emplace_back(buffer.str());
864 }
865 }
866#endif
867
868 auto mergeFiles = [&](TFileMerger &merger) {
869 if (args.fReoptimize) {
870 merger.SetFastMethod(kFALSE);
871 } else {
872 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
873 // Don't warn if the user has requested any re-optimization.
874 Warn() << "Sources and Target have different compression settings\n"
875 "hadd merging will be slower\n";
876 }
877 }
878 merger.SetNotrees(args.fNoTrees);
879 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
882 merger.SetIOFeatures(features);
885 if (extraFlags < 0)
886 return false;
888 if (args.fAppend)
890 else
892 Bool_t status = merger.PartialMerge(fileMergerFlags);
893 return status;
894 };
895
896 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
897 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
898 if (!merger.AddFile(allSubfiles[i].c_str())) {
899 if (args.fSkipErrors) {
900 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
901 } else {
902 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
903 return kFALSE;
904 }
905 }
906 }
907 return mergeFiles(merger);
908 };
909
910 auto parallelMerge = [&](int start) {
912 mergerP.SetMsgPrefix("hadd");
913 mergerP.SetPrintLevel(gHaddVerbosity - 1);
914 if (maxopenedfiles > 0) {
915 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
916 }
917 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), args.fForce, newcomp)) {
918 Err() << "error opening target partial file\n";
919 exit(1);
920 }
921 return sequentialMerge(mergerP, start, step);
922 };
923
924 auto reductionFunc = [&]() {
925 for (const auto &pf : partialFiles) {
926 fileMerger.AddFile(pf.c_str());
927 }
928 return mergeFiles(fileMerger);
929 };
930
931 Bool_t status;
932
933#ifndef R__WIN32
934 if (multiproc) {
936 auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
937 status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
938 if (status) {
939 status = reductionFunc();
940 } else {
941 Err() << "failed at the parallel stage\n";
942 }
943 if (!args.fDebug) {
944 for (const auto &pf : partialFiles) {
945 gSystem->Unlink(pf.c_str());
946 }
947 }
948 } else {
949 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
950 }
951#else
952 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
953#endif
954
955 if (status) {
956 Info(3) << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
957 << ") input (partial) files into " << targetname << "\n";
958 return 0;
959 } else {
960 Err() << "failure during the merge of " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
961 << ") input (partial) files into " << targetname << "\n";
962 return 1;
963 }
964}
static TFile * Open(const char *name, Option_t *option="", const char *ftitle="", Int_t compress=ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault, Int_t netopt=0)
Create / open a file.
Definition TFile.cxx:3764
Basic string class.
Definition TString.h:138
TString & Append(const char *cs)
Definition TString.h:581
virtual int GetSysInfo(SysInfo_t *info) const
Returns static system info, like OS type, CPU type, number of CPUs RAM size, etc into the SysInfo_t s...
Definition TSystem.cxx:2469
virtual int Load(const char *module, const char *entry="", Bool_t system=kFALSE)
Load a shared library.
Definition TSystem.cxx:1868
virtual Bool_t AccessPathName(const char *path, EAccessMode mode=kFileExists)
Returns FALSE if one can access a file using the specified access mode.
Definition TSystem.cxx:1307
virtual int Unlink(const char *name)
Unlink, i.e.
Definition TSystem.cxx:1392
virtual const char * TempDirectory() const
Return a user configured or systemwide directory to create temporary files in.
Definition TSystem.cxx:1493
This class defines a UUID (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDent...
Definition TUUID.h:42
TLine * line
static EFlagResult FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional< T > &flagOut, std::optional< T > defaultVal=std::nullopt, FlagConvResult< T >(*conv)(const char *)=ConvertArg< T >)
Definition hadd.cxx:370
EFlagResult
Definition hadd.cxx:253
static bool ValidCompressionSettings(int compSettings)
Definition hadd.cxx:421
FlagConvResult< IntFlag_t > ConvertArg< IntFlag_t >(const char *arg)
Definition hadd.cxx:304
#define PARSE_FLAG(func,...)
static FlagConvResult< Int_t > ConvertFilterType(const char *arg)
Definition hadd.cxx:354
static bool FilesAreEquivalent(std::string_view source, std::string_view target)
Definition hadd.cxx:659
static Int_t ParseFilterFile(const std::optional< std::string > &filterFileName, std::optional< Int_t > objectFilterType, TFileMerger &fileMerger)
Definition hadd.cxx:620
static FlagConvResult< T > ConvertArg(const char *)
uint32_t IntFlag_t
Definition hadd.cxx:222
static constexpr int kDefaultHaddVerbosity
Definition hadd.cxx:177
static std::ostream & Info(int minLevel)
Definition hadd.cxx:215
static std::optional< HAddArgs > ParseArgs(int argc, char **argv)
Definition hadd.cxx:517
FlagConvResult< ROOT::TIOFeatures > ConvertArg< ROOT::TIOFeatures >(const char *arg)
Definition hadd.cxx:319
static int gHaddVerbosity
Definition hadd.cxx:178
static std::ostream & Warn()
Definition hadd.cxx:208
static FlagConvResult< TString > ConvertCacheSize(const char *arg)
Definition hadd.cxx:332
static std::ostream & Err()
Definition hadd.cxx:202
static EFlagResult FlagF(const char *arg, HAddArgs &args)
Definition hadd.cxx:455
static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
Definition hadd.cxx:255
static NullStream & GetNullStream()
Definition hadd.cxx:196
static std::optional< IntFlag_t > StrToUInt(const char *str)
Definition hadd.cxx:270
static constexpr const char kCommandLineOptionsHelp[]
void ToHumanReadableSize(value_type bytes, Bool_t si, Double_t *coeff, const char **units)
Return the size expressed in 'human readable' format.
EFromHumanReadableSize FromHumanReadableSize(std::string_view str, T &value)
Convert strings like the following into byte counts 5MB, 5 MB, 5M, 3.7GB, 123b, 456kB,...
EFlagResult fResult
Definition hadd.cxx:291
bool fNoFlagsAfterPositionalArguments
Definition hadd.cxx:250
bool fHelp
Definition hadd.cxx:233
bool fKeepCompressionAsIs
Definition hadd.cxx:231
bool fForce
Definition hadd.cxx:227
std::optional< TString > fCacheSize
Definition hadd.cxx:239
std::optional< IntFlag_t > fCompressionSettings
Definition hadd.cxx:243
bool fNoTrees
Definition hadd.cxx:225
std::optional< Int_t > fObjectFilterType
Definition hadd.cxx:238
int fFirstInputIdx
Definition hadd.cxx:246
std::optional< IntFlag_t > fNProcesses
Definition hadd.cxx:236
bool fUseFirstInputCompression
Definition hadd.cxx:232
std::optional< std::string > fObjectFilterFile
Definition hadd.cxx:237
bool fSkipErrors
Definition hadd.cxx:228
std::optional< IntFlag_t > fVerbosity
Definition hadd.cxx:242
std::optional< IntFlag_t > fMaxOpenedFiles
Definition hadd.cxx:241
std::optional< std::string > fWorkingDir
Definition hadd.cxx:235
int fOutputArgIdx
Definition hadd.cxx:245
bool fDebug
Definition hadd.cxx:230
bool fReoptimize
Definition hadd.cxx:229
std::optional< ROOT::TIOFeatures > fFeatures
Definition hadd.cxx:240
bool fAppend
Definition hadd.cxx:226
@ kUseCompiledDefault
Use the compile-time default setting.
Definition Compression.h:53
Int_t fCpus
Definition TSystem.h:162
TMarker m
Definition textangle.C:8