hadd.cxx
1/**
2 \file hadd.cxx
3 \brief This program will merge compatible ROOT objects, such as histograms, Trees and RNTuples,
4 from a list of root files and write them to a target root file.
5 In order for a ROOT object to be mergeable, it must implement the Merge() function.
6 Non-mergeable objects will have all instances copied as-is into the target file.
7 The target file must not be identical to one of the source files.
8
9 Syntax:
10 ```{.cpp}
11 hadd [flags] targetfile source1 source2 ... [flags]
12 ```
13
14 Flags can be passed before or after the positional arguments.
15 The first positional (non-flag) argument will be interpreted as the targetfile.
16 After that, the first sequence of positional arguments will be interpreted as the input files.
17 If two sequences of positional arguments are separated by flags, hadd will emit an error and abort.
18
19 By default, any argument starting with `-` is interpreted as a flag. If you want to pass filenames
20 starting with `-` you need to pass them after `--`:
21 ```{.cpp}
22 hadd [flags] -- -file1 -file2 ...
23 ```
24 Note that in this case you need to pass ALL positional arguments after `--`.
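
The positional-argument rules above can be illustrated with a small standalone sketch. It is not hadd's actual parser: `Positionals` and `SplitPositionals` are hypothetical names, and the sketch assumes that every flag is a single token (`-j16` rather than `-j 16`).

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not part of hadd).
struct Positionals {
   std::string target;
   std::vector<std::string> inputs;
};

// Splits a command line (without the program name) into target and inputs.
// Returns std::nullopt if the positional arguments do not form exactly one
// contiguous group, or if the target or the inputs are missing.
// Simplification: every flag is assumed to be a single token.
std::optional<Positionals> SplitPositionals(const std::vector<std::string> &args)
{
   Positionals out;
   bool haveTarget = false;
   bool groupClosed = false;     // true once a flag follows the positional group
   bool onlyPositionals = false; // true after `--`

   for (const auto &arg : args) {
      if (arg.empty())
         continue;
      // A lone "-" is treated as a positional argument.
      const bool looksLikeFlag = !onlyPositionals && arg.size() > 1 && arg[0] == '-';
      if (looksLikeFlag) {
         if (arg == "--") {
            if (haveTarget)
               return std::nullopt; // `--` must come before ALL positional arguments
            onlyPositionals = true;
         } else if (haveTarget) {
            groupClosed = true;
         }
         continue;
      }
      if (groupClosed)
         return std::nullopt; // a second positional group is an error
      if (!haveTarget) {
         out.target = arg;
         haveTarget = true;
      } else {
         out.inputs.push_back(arg);
      }
   }
   if (!haveTarget || out.inputs.empty())
      return std::nullopt;
   return out;
}
```

Under these assumptions, `out.root a.root -k b.root` is rejected because `b.root` would start a second positional group, whereas `-f -- -out.root -a.root` is accepted with `-out.root` as the target.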
25
26 If a flag requires an argument, the argument can be specified in any of these ways:
27
28 # All equally valid:
29 -j 16
30 -j16
31 -j=16
32
33 The first syntax is the preferred one since it's backward-compatible with previous versions of hadd.
34 The -f flag is an exception to this rule: it only supports the `-f[0-9]` syntax.
35
36 Note that merging multiple flags is NOT supported: `-jfa` will be interpreted as -j=fa, which is invalid!
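
As an illustration of the three accepted spellings, the following sketch extracts the value attached to a flag. `FlagValue` is a hypothetical helper written for this example; it only mirrors the rules described above and is not the function hadd uses.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Returns the textual value attached to `flag` in `token`, or std::nullopt if
// `token` does not start with that flag. `next` is the following command-line
// token, used only for the separated form (`-j 16`); pass an empty view if
// there is none.
std::optional<std::string> FlagValue(std::string_view token, std::string_view flag, std::string_view next)
{
   assert(!flag.empty()); // precondition: a flag name such as "j"
   if (token.size() < 2 || token[0] != '-')
      return std::nullopt;
   token.remove_prefix(1); // drop the leading '-'
   if (token.substr(0, flag.size()) != flag)
      return std::nullopt;
   token.remove_prefix(flag.size());
   if (token.empty())
      return std::string(next); // separated form: `-j 16`
   if (token.front() == '=')
      token.remove_prefix(1); // ignore one '=': `-j=16`
   return std::string(token); // attached form: `-j16`
}
```

In this sketch `FlagValue("-jfa", "j", "")` yields `fa`, which is not a number and therefore explains why merged flags are invalid. Similarly, `FlagValue("-Ltype", "L", "")` yields `type`, so a parser built this way has to try the longer flag name first.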
37
38 The flags are as follows:
39
40 \param -a Append to the output
41 \param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
42 \param -d <DIR> Carry out the partial multiprocess execution in the specified directory
43 \param -dbg Enable verbosity. If -j was specified, do not delete partial files
44 stored inside working directory.
45 \param -experimental-io-features <FEATURES> Enables the corresponding experimental feature for output trees.
46 \see ROOT::Experimental::EIOFeatures
47 \param -f Force overwriting of output file.
48 \param -f[0-9] Set target compression algorithm `i` and level `j` passing the number `i*100 + j`, e.g. `-f505`.
49 The last digit (`j`) can be set from 0 = uncompressed to 9 = highly compressed.
50 The first digit (`i`) is 1 for ZLIB, 2 for LZMA, 4 for LZ4 and 5 for ZSTD.
51 Recommended numbers are 101 (ZLIB), 207 (LZMA), 404 (LZ4), 505 (ZSTD).
52 The default value for this flag is 101 (kDefaultZLIB).
53 See ROOT::RCompressionSetting and TFile::TFile documentation for more details.
54 \param -fk Sets the target file to contain the baskets with the same compression as the input files
55 (unless -O is specified). Compresses the meta data using the compression level specified
56 in the first input or the compression setting after fk (for example 505 when using -fk505)
57 \param -ff The compression level used is the one specified in the first input
58 \param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
59 or is 0, use the system maximum.
60 \param -k Skip corrupt or non-existent files, do not exit
61 \param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects depending on
62 the value of "-Ltype". FILE must contain one object name per line, which cannot contain
63 whitespaces or '/'. You can also pass TDirectory names, which apply to the entire directory
64 content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
65 passed as well.
66 \param -Ltype <SkipListed|OnlyListed> Sets the type of operation performed on the objects listed in FILE given with
67 the
68 "-L" flag. "SkipListed" will skip all the listed objects; "OnlyListed" will only merge those
69 objects. If this flag is passed, "-L" must be passed as well.
70 \param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
71 the default). This number includes both the input files being read and the output file.
72 Thus, if set to 1, it will be automatically raised to the minimum of 2. If set to too large a
73 value, it will be clipped to the system maximum.
74 \param -O Re-optimize basket size when merging TTree
75 \param -T Do not merge Trees
76 \param -v [LEVEL] Explicitly set the verbosity level:
77 <= 0 = only output errors;
78 1 = only output errors and warnings;
79 2 = output minimal informative messages, errors and warnings (default);
80 >= 3 = output all messages.
81 \return hadd returns a status code: 0 if OK, non-zero otherwise
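
The `i*100 + j` encoding accepted by `-f[0-9]` can be decoded as in the sketch below. `CompressionSetting` and `DecodeCompression` are hypothetical names; the accepted range (0, or 100 to 509 with a middle digit of 0, plus the deprecated single-digit aliases for ZLIB) follows the validation performed later in this file.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper type for this sketch: algorithm `i` and level `j`.
struct CompressionSetting {
   int algorithm; // 1 = ZLIB, 2 = LZMA, 4 = LZ4, 5 = ZSTD (0 = uncompressed)
   int level;     // 0 = uncompressed ... 9 = highly compressed
};

// Decodes a number of the form i*100 + j.
// Returns std::nullopt if the number is outside the accepted range.
std::optional<CompressionSetting> DecodeCompression(int setting)
{
   if (setting == 0)
      return CompressionSetting{0, 0};
   // Deprecated shorthand: 1..9 is read as 101..109 (ZLIB).
   if (setting >= 1 && setting <= 9)
      setting += 100;
   // Valid values lie in [100, 509] and have 0 as the middle digit.
   if (setting < 100 || setting > 509 || (setting / 10) % 10 != 0)
      return std::nullopt;
   const CompressionSetting decoded{setting / 100, setting % 10};
   assert(decoded.algorithm * 100 + decoded.level == setting); // round-trip invariant
   return decoded;
}
```

For example, `-f505` decodes to algorithm 5 (ZSTD) at level 5, while a value such as 515 is rejected because its middle digit is not 0.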
82
83 For example assume 3 files f1, f2, f3 containing histograms hn and Trees Tn
84 - f1 with h1 h2 h3 T1
85 - f2 with h1 h4 T1 T2
86 - f3 with h5
87 the result of
88 ```
89 hadd -f x.root f1.root f2.root f3.root
90 ```
91 will be a file x.root with h1 h2 h3 h4 h5 T1 T2
92 where
93 - h1 will be the sum of the 2 histograms in f1 and f2
94 - T1 will be the merge of the Trees in f1 and f2
95
96 The files may contain sub-directories.
97
98 If the source files contain histograms and Trees, one can skip
99 the Trees with
100 ```
101 hadd -T targetfile source1 source2 ...
102 ```
103
104 Wildcarding and indirect files are also supported:
105 ```
106 hadd result.root myfil*.root
107 ```
108 will merge all files in myfil*.root
109 ```
110 hadd result.root file1.root @list.txt file2.root myfil*.root
111 ```
112 will merge file1.root, file2.root, all files in myfil*.root
113 and all files in the indirect text file list.txt ("@" as the first
114 character of the file indicates an indirect file. An indirect file
115 is a text file containing a list of other files, including other
116 indirect files, one line per file).
117
118 If the source and target compression levels are identical (default),
119 the program uses the TChain::Merge function with option "fast", i.e.
120 the merge will be done without unzipping or unstreaming the baskets
121 (that is, a direct copy of the raw bytes on disk). The "fast" mode is typically
122 5 times faster than the mode unzipping and unstreaming the baskets.
123
124 If the option -cachesize is used, hadd will resize (or disable if 0) the
125 prefetching cache used to speed up I/O operations.
126
127 For options that take a size as argument, a decimal number of bytes is expected.
128 If the number ends with a `k`, `m`, `g`, etc., the number is multiplied
129 by 1000 (1K), 1000000 (1M), 1000000000 (1G), etc.
130 If this prefix is followed by `i`, the number is multiplied by the traditional
131 1024 (1KiB), 1048576 (1MiB), 1073741824 (1GiB), etc.
132 The prefix can be optionally followed by B whose casing is ignored,
133 e.g. 1k, 1K, 1Kb and 1KB are the same.
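
The suffix rules can be summarised by the following sketch. `ParseSize` is a hypothetical function written for this example (it is not the parser hadd uses); it assumes unsigned integer input, accepts only a lowercase `i`, supports the suffixes `k`, `m`, `g` and `t`, and does not guard against overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Parses strings such as "512", "1k", "1KB" or "2MiB" into a number of bytes.
// Returns std::nullopt if the text does not follow the pattern
// <digits>[k|m|g|t][i][b], where the unit and the trailing 'b' ignore case.
std::optional<std::uint64_t> ParseSize(std::string_view text)
{
   std::size_t pos = 0;
   std::uint64_t value = 0;
   while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
      value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0'); // no overflow check in this sketch
      ++pos;
   }
   if (pos == 0)
      return std::nullopt; // the text must start with at least one digit

   // Optional unit: the exponent counts how many times to multiply by the base.
   int exponent = 0;
   if (pos < text.size()) {
      switch (std::tolower(static_cast<unsigned char>(text[pos]))) {
      case 'k': exponent = 1; break;
      case 'm': exponent = 2; break;
      case 'g': exponent = 3; break;
      case 't': exponent = 4; break;
      default: break;
      }
      if (exponent != 0)
         ++pos;
   }

   // An 'i' after the unit selects powers of 1024 instead of powers of 1000.
   std::uint64_t base = 1000;
   if (exponent != 0 && pos < text.size() && text[pos] == 'i') {
      base = 1024;
      ++pos;
   }

   // Optional trailing 'B' or 'b'.
   if (pos < text.size() && std::tolower(static_cast<unsigned char>(text[pos])) == 'b')
      ++pos;
   if (pos != text.size())
      return std::nullopt; // unexpected trailing characters

   for (int i = 0; i < exponent; ++i)
      value *= base;
   return value;
}
```

With this sketch, `1k`, `1K`, `1Kb` and `1KB` all give 1000 bytes, and `2MiB` gives 2097152 bytes.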
134
135 \note By default histograms are added. However hadd does not support the case where
136 histograms have their bit TH1::kIsAverage set.
137
138 \authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
139*/
140#include "Compression.h"
141#include "TClass.h"
142#include "TFile.h"
143#include "TFileMerger.h"
144#include "THashList.h"
145#include "TKey.h"
146#include "TSystem.h"
147#include "TUUID.h"
148
149#include <ROOT/RConfig.hxx>
150#include <ROOT/StringConv.hxx>
151#include <ROOT/TIOFeatures.hxx>
152
153#include "haddCommandLineOptionsHelp.h"
154#include "logging.hxx"
155
156#include <climits>
157#include <cstdlib>
158#include <filesystem>
159#include <fstream>
160#include <iostream>
161#include <optional>
162#include <sstream>
163#include <string>
164#include <streambuf>
165
166#ifndef R__WIN32
168#endif
169
170////////////////////////////////////////////////////////////////////////////////
171
172// NOTE: TFileMerger will use PrintLevel = GetLogVerbosity() - 1. If PrintLevel is < 1, it will print nothing, otherwise
173// it will print everything. To give some granularity to hadd, we do the following:
174// LogVerbosity = 0: only print hadd errors
175// LogVerbosity = 1: only print hadd errors + warnings
176// LogVerbosity = 2: print hadd errors + warnings and TFileMerger messages
177// LogVerbosity > 2: print all hadd and TFileMerger messages.
178static constexpr int kDefaultHaddVerbosity = 2;
179
180using IntFlag_t = uint32_t;
181
182struct HAddArgs {
185 bool fForce;
188 bool fDebug;
191 bool fHelp;
192
193 std::optional<std::string> fWorkingDir;
194 std::optional<IntFlag_t> fNProcesses;
195 std::optional<std::string> fObjectFilterFile;
196 std::optional<Int_t> fObjectFilterType;
197 std::optional<TString> fCacheSize;
198 std::optional<ROOT::TIOFeatures> fFeatures;
199 std::optional<IntFlag_t> fMaxOpenedFiles;
200 std::optional<IntFlag_t> fVerbosity;
201 std::optional<IntFlag_t> fCompressionSettings;
202
205 // This is set to true if and only if the user passed `--`. In this special
206 // case, we must not stop parsing positional arguments even if we find one
207 // that starts with a `-`.
209};
210
212
213static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
214{
215 const auto argLen = strlen(arg);
216 const auto flagLen = strlen(flagStr);
217 if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
218 if (flagOut)
219 Warn() << "duplicate flag: " << flagStr << "\n";
220 flagOut = true;
222 }
224}
225
226// NOTE: not using std::stoi or similar because they have bad error checking.
227// std::stoi will happily parse "120notvalid" as 120.
228static std::optional<IntFlag_t> StrToUInt(const char *str)
229{
230 if (!str)
231 return {};
232
233 uint32_t res = 0;
234 do {
235 if (!isdigit(*str))
236 return {};
237 if (res > (UINT32_MAX - (*str - '0')) / 10) // overflow is an error
238 return {};
239 res *= 10;
240 res += *str - '0';
241 } while (*++str);
242
243 return res;
244}
245
246template <typename T>
251
252template <typename T>
253static FlagConvResult<T> ConvertArg(const char *);
254
255template <>
257{
258 return {arg, EFlagResult::kParsed};
259}
260
261template <>
263{
264 // Don't even try to parse arg if it doesn't look like a number.
265 if (!isdigit(*arg))
266 return {0, EFlagResult::kIgnored};
267
268 auto intOpt = StrToUInt(arg);
269 if (intOpt)
270 return {*intOpt, EFlagResult::kParsed};
271
272 Err() << "error parsing integer argument '" << arg << "'\n";
273 return {0, EFlagResult::kErr};
274}
275
276template <>
278{
280 std::stringstream ss;
281 ss.str(arg);
282 std::string item;
283 while (std::getline(ss, item, ',')) {
284 if (!features.Set(item))
285 Warn() << "ignoring unknown feature request: " << item << "\n";
286 }
288}
289
291{
292 TString cacheSize;
293 int size;
296 Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
297 return {"", EFlagResult::kErr};
299 double m;
300 const char *munit = nullptr;
302 Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
303 << ". We will use the maximum value.\n";
304 return {std::to_string(m) + munit, EFlagResult::kParsed};
305 } else {
306 cacheSize = "cachesize=";
307 cacheSize.Append(arg);
308 }
309 return {cacheSize, EFlagResult::kParsed};
310}
311
313{
314 if (strcmp(arg, "SkipListed") == 0)
316 if (strcmp(arg, "OnlyListed") == 0)
318
319 Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
320 return {{}, EFlagResult::kErr};
321}
322
323// Parses a flag that is followed by an argument of type T.
324// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
325// `conv` is used to convert the argument from string to its type T.
326template <typename T>
327static EFlagResult
328FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
329 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
330{
331 int argIdx = argIdxInOut;
332 const char *arg = argv[argIdx] + 1;
333 int argLen = strlen(arg);
334 int flagLen = strlen(flagStr);
335 const char *nxtArg = nullptr;
336
337 if (strncmp(arg, flagStr, flagLen) != 0)
339
340 bool argIsSeparate = false;
341 if (argLen > flagLen) {
342 // interpret anything after the flag as the argument.
343 nxtArg = arg + flagLen;
344 // Ignore one '=', if present
345 if (nxtArg[0] == '=')
346 ++nxtArg;
347 } else if (argLen == flagLen) {
348 argIsSeparate = true;
349 if (argIdx + 1 < argc) {
350 ++argIdxInOut;
352 } else {
353 Err() << "expected argument after '-" << flagStr << "' flag.\n";
354 return EFlagResult::kErr;
355 }
356 } else {
358 }
359
360 auto converted = conv(nxtArg);
361 if (converted.fResult == EFlagResult::kParsed) {
362 flagOut = converted.fValue;
363 } else if (converted.fResult == EFlagResult::kIgnored) {
364 if (defaultVal && argIsSeparate) {
366 // If we had tried parsing the next argument, step back one arg idx.
368 } else {
369 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
370 return EFlagResult::kErr;
371 }
372 } else {
373 return EFlagResult::kErr;
374 }
375
377}
378
380{
381 // Must be a number between 0 and 509 (with a 0 in the middle)
382 if (compSettings == 0)
383 return true;
384 // We also accept [1-9] as aliases of [101-109], but it's discouraged.
385 if (compSettings >= 1 && compSettings <= 9) {
386 Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
387 << "."
388 " This behavior is deprecated, please use the full compression settings.\n";
389 return true;
390 }
391 return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
392}
393
394// The -f flag has a somewhat complicated logic.
395// We have 4 cases:
396// 1. -f
397// 2. -ff
398// 3. -fk
399// 4. -f[0-509]
400//
401// and a combination thereof (e.g. -fk101, -ff202, -ffk, -fk209)
402// -ff and -f[0-509] are incompatible.
403//
404// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
405// This means that if we see a -f[something] and that "something" doesn't parse to a valid
406// number between 0 and 509, or f or k, we consider the flag invalid and skip it without
407// setting any state.
408//
409// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
410// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
411//
412// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
413static EFlagResult FlagF(const char *arg, HAddArgs &args)
414{
415 if (arg[0] != 'f')
417
418 args.fForce = true;
419 const char *cur = arg + 1;
420 while (*cur) {
421 switch (cur[0]) {
422 case 'f':
424 Warn() << "duplicate flag: -ff\n";
425 if (args.fCompressionSettings) {
426 std::cerr
427 << "[err] Cannot specify both -ff and -f[0-9]. Either use the first input compression or specify it.\n";
428 return EFlagResult::kErr;
429 } else
430 args.fUseFirstInputCompression = true;
431 break;
432 case 'k':
433 if (args.fKeepCompressionAsIs)
434 Warn() << "duplicate flag: -fk\n";
435 args.fKeepCompressionAsIs = true;
436 break;
437 default:
438 if (isdigit(cur[0])) {
439 if (args.fUseFirstInputCompression) {
440 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
441 "specify it.\n";
442 return EFlagResult::kErr;
443 } else if (!args.fCompressionSettings) {
444 if (auto compLv = StrToUInt(cur)) {
447 // we can't see any other argument after the number, so we return here to avoid
448 // incorrectly parsing the rest of the characters in `arg`.
450 } else {
451 Err() << *compLv << " is not a supported compression settings.\n";
452 return EFlagResult::kErr;
453 }
454 } else {
455 Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
456 return EFlagResult::kErr;
457 }
458 } else {
459 Err() << "cannot specify -f[0-9] multiple times!\n";
460 return EFlagResult::kErr;
461 }
462 } else {
463 Err() << "invalid flag: " << arg << "\n";
464 return EFlagResult::kErr;
465 }
466 }
467 ++cur;
468 }
469
471}
472
473// Returns nullopt if any of the flags failed to parse.
474// If an unknown flag is encountered, it will print a warning and go on.
475static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
476{
477 HAddArgs args{};
478
479 enum {
485
486 for (int argIdx = 1; argIdx < argc; ++argIdx) {
487 const char *argRaw = argv[argIdx];
488 if (!*argRaw)
489 continue;
490
491 if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
492 if (argRaw[1] == '-' && argRaw[2] == '\0') {
493 // special case `--`: force parsing to consider all future args as positional arguments.
495 Err()
496 << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
497 " This is not supported: you must have exactly one sequence of positional arguments, so if you"
498 " need to use `--` make sure to pass *all* positional arguments after it.\n";
499 return {};
500 }
501 args.fNoFlagsAfterPositionalArguments = true;
502 continue;
503 }
504
505 // parse flag
507
508 const char *arg = argRaw + 1;
509 bool validFlag = false;
510
511#define PARSE_FLAG(func, ...) \
512 do { \
513 if (!validFlag) { \
514 const auto res = func(__VA_ARGS__); \
515 if (res == EFlagResult::kErr) \
516 return {}; \
517 validFlag = res == EFlagResult::kParsed; \
518 } \
519 } while (0)
520
521 // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
522 PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
523 PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
524 PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
525 PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
526 PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
527 // Accept --help, -help and -h as "help"
528 PARSE_FLAG(FlagToggle, arg, "-help", args.fHelp);
529 PARSE_FLAG(FlagToggle, arg, "help", args.fHelp);
530 PARSE_FLAG(FlagToggle, arg, "h", args.fHelp);
531 PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
532 PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
533 PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
534 PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
535 PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
536 PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
537 PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
538 PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {kDefaultHaddVerbosity});
539 PARSE_FLAG(FlagF, arg, args);
540
541#undef PARSE_FLAG
542
543 if (!validFlag)
544 Warn() << "unknown flag: " << argRaw << "\n";
545
546 } else if (!args.fOutputArgIdx) {
547 // First positional argument is the output
548 args.fOutputArgIdx = argIdx;
551 } else {
552 // We should be in the same positional argument group as the output, error otherwise
554 if (!args.fFirstInputIdx) {
555 args.fFirstInputIdx = argIdx;
556 }
557 } else {
558 Err() << "seen a positional argument '" << argRaw
559 << "' after some flags."
560 " Positional arguments were already parsed at this point (from '"
561 << argv[args.fOutputArgIdx]
562 << "' onwards), and you can only have one sequence of them, so you cannot pass more."
563 " Please group your positional arguments all together so that hadd works as you expect.\n"
564 "Cmdline: ";
565 for (int i = 0; i < argc; ++i)
566 std::cerr << argv[i] << " ";
567 std::cerr << "\n";
568
569 return {};
570 }
571 }
572 }
573
574 return args;
575}
576
577// Returns the flags to add to the file merger's flags, or -1 in case of errors.
578static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
579 std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
580{
581 if (filterFileName) {
582 std::ifstream filterFile(*filterFileName);
583 if (!filterFile) {
584 Err() << "error opening filter file '" << *filterFileName << "'\n";
585 return -1;
586 }
588 std::string line;
589 std::string objPath;
590 int nObjects = 0;
591 while (std::getline(filterFile, line)) {
592 std::istringstream ss(line);
593 // only read exactly 1 token per line (strips any whitespaces and such)
594 objPath.clear();
595 ss >> objPath;
596 if (!objPath.empty() && objPath[0] != '#') {
597 filteredObjects.Append(objPath + ' ');
598 ++nObjects;
599 }
600 }
601
602 if (nObjects) {
603 Info(2) << "added " << nObjects << " object from filter file '" << *filterFileName << "'\n";
604 fileMerger.AddObjectNames(filteredObjects);
605 } else {
606 Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
607 }
608
609 assert(objectFilterType.has_value());
610 const auto filterFlag = *objectFilterType;
612 return filterFlag;
613 }
614 return 0;
615}
616
617static bool FilesAreEquivalent(std::string_view source, std::string_view target)
618{
619 const bool sourceHasProtocol = source.find("://") != std::string_view::npos;
620 const bool targetHasProtocol = target.find("://") != std::string_view::npos;
622 return false;
623
624 // We cannot use std::filesystem functions for file paths that have a protocol.
626 return source == target;
627
628 return std::filesystem::exists(target) && std::filesystem::equivalent(source, target);
629}
630
631int main(int argc, char **argv)
632{
634
635 const auto argsOpt = ParseArgs(argc, argv);
636 if (!argsOpt)
637 return 1;
638 const HAddArgs &args = *argsOpt;
639
640 if (args.fHelp) {
642 return 0;
643 }
644
646 Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
648 Int_t newcomp = args.fCompressionSettings.value_or(-1);
649 TString cacheSize = args.fCacheSize.value_or("");
650
651 // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
652 // valid value (i.e. any value > 0).
653 // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
654 // we default to the number of cpus on the machine.
655 Bool_t multiproc = args.fNProcesses.has_value();
656 int nProcesses;
657 if (args.fNProcesses && *args.fNProcesses > 0) {
658 nProcesses = *args.fNProcesses;
659 } else {
660 SysInfo_t s;
661 gSystem->GetSysInfo(&s);
662 nProcesses = s.fCpus;
663 }
664 if (multiproc)
665 Info(2) << "parallelizing with " << nProcesses << " processes.\n";
666
667 // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
668 std::string workingDir;
669 if (!args.fWorkingDir) {
671 } else if (args.fWorkingDir && gSystem->AccessPathName(args.fWorkingDir->c_str())) {
672 Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
673 return 1;
674 } else {
675 workingDir = *args.fWorkingDir;
676 }
677
678 // Verify that -L and -Ltype are either both present or both absent.
679 if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
680 Err() << "-L must always be passed along with -Ltype.\n";
681 return 1;
682 }
683
684 const char *targetname = 0;
685 if (!args.fOutputArgIdx) {
686 Err() << "missing output file.\n";
688 return 1;
689 }
690 if (!args.fFirstInputIdx) {
691 Err() << "missing input file.\n";
693 return 1;
694 }
696
697 Info(2) << "target file: " << targetname << "\n";
698
699 if (args.fCacheSize)
700 Info(2) << "Using " << cacheSize << "\n";
701
702 ////////////////////////////// end flags processing /////////////////////////////////
703
704 gSystem->Load("libTreePlayer");
705
707 fileMerger.SetMsgPrefix("hadd");
708 fileMerger.SetPrintLevel(GetLogVerbosity() - 1);
709 if (maxopenedfiles > 0) {
710 fileMerger.SetMaxOpenedFiles(maxopenedfiles);
711 }
712 // The following section will collect all input filenames into a vector,
713 // including those listed within an indirect file.
714 // If any file can not be accessed, it will error out, unless args.fSkipErrors is true
715 std::vector<std::string> allSubfiles;
716 for (int a = args.fFirstInputIdx; a < argc; ++a) {
717 if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
718 break;
719 }
720 if (argv[a] && argv[a][0] == '@') {
721 std::ifstream indirect_file(argv[a] + 1);
722 if (!indirect_file.is_open()) {
723 Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
724 if (!args.fSkipErrors)
725 return 1;
726 } else {
727 std::string line;
728 while (indirect_file) {
729 if (std::getline(indirect_file, line) && line.length()) {
730 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
731 Err() << "could not validate the file name \"" << line << "\" within indirect file "
732 << (argv[a] + 1) << std::endl;
733 if (!args.fSkipErrors)
734 return 1;
735 } else if (FilesAreEquivalent(line, targetname)) {
736 Err() << "file " << line << " cannot be both the target and an input!\n";
737 if (!args.fSkipErrors)
738 return 1;
739 } else {
740 allSubfiles.emplace_back(line);
741 }
742 }
743 }
744 }
745 } else {
746 const char *line = argv[a];
748 Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
749 if (!args.fSkipErrors)
750 return 1;
751 } else if (FilesAreEquivalent(line, targetname)) {
752 Err() << "file " << line << " cannot be both the target and an input!\n";
753 if (!args.fSkipErrors)
754 return 1;
755 } else {
756 allSubfiles.emplace_back(line);
757 }
758 }
759 }
760 if (allSubfiles.empty()) {
761 Err() << "could not find any valid input file " << std::endl;
762 return 1;
763 }
764 // The next snippet determines the output compression if unset
765 if (newcomp == -1) {
767 // grab from the first file.
768 TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
769 if (firstInput && !firstInput->IsZombie())
770 newcomp = firstInput->GetCompressionSettings();
771 else
773 delete firstInput;
774 fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
775 } else {
777 fileMerger.SetMergeOptions(TString("DefaultCompression"));
778 }
779 }
780 if (args.fKeepCompressionAsIs && !args.fReoptimize)
781 Info(2) << "compression setting for meta data: " << newcomp << '\n';
782 else
783 Info(2) << "compression setting for all output: " << newcomp << '\n';
784
785 if (args.fAppend) {
786 if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
787 Err() << "error opening target file for update: " << targetname << ".\n";
788 return 2;
789 }
790 } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
791 std::stringstream ss;
792 ss << "error opening target file (does " << targetname << " exist?).\n";
793 if (!args.fForce)
794 ss << "pass \"-f\" argument to force re-creation of output file.\n";
795 Err() << ss.str();
796 return 1;
797 }
798
799 auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
800 if (multiproc && step < 3) {
801 // At least 3 files per process
802 step = 3;
803 nProcesses = (allSubfiles.size() + step - 1) / step;
804 Info(2) << "each process should handle at least 3 files for efficiency."
805 " Setting the number of processes to: "
806 << nProcesses << std::endl;
807 }
808 if (nProcesses == 1)
810
811 std::vector<std::string> partialFiles;
812
813#ifndef R__WIN32
814 // this block is compiled out on Windows only to try to prevent false positive detection
815 // from several anti-virus engines on Windows, and multiproc is not
816 // supported on Windows anyway
817 if (multiproc) {
818 auto uuid = TUUID();
819 auto partialTail = uuid.AsString();
820 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
821 std::stringstream buffer;
822 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
823 partialFiles.emplace_back(buffer.str());
824 }
825 }
826#endif
827
828 auto mergeFiles = [&](TFileMerger &merger) {
829 if (args.fReoptimize) {
830 merger.SetFastMethod(kFALSE);
831 } else {
832 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
833 // Don't warn if the user has requested any re-optimization.
834 Warn() << "Sources and Target have different compression settings\n"
835 "hadd merging will be slower\n";
836 }
837 }
838 merger.SetNotrees(args.fNoTrees);
839 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
842 merger.SetIOFeatures(features);
845 if (extraFlags < 0)
846 return false;
848 if (args.fAppend)
850 else
852 Bool_t status = merger.PartialMerge(fileMergerFlags);
853 return status;
854 };
855
856 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
857 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
858 if (!merger.AddFile(allSubfiles[i].c_str())) {
859 if (args.fSkipErrors) {
860 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
861 } else {
862 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
863 return kFALSE;
864 }
865 }
866 }
867 return mergeFiles(merger);
868 };
869
870 auto parallelMerge = [&](int start) {
872 mergerP.SetMsgPrefix("hadd");
873 mergerP.SetPrintLevel(GetLogVerbosity() - 1);
874 if (maxopenedfiles > 0) {
875 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
876 }
877 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), args.fForce, newcomp)) {
878 Err() << "error opening target partial file\n";
879 exit(1);
880 }
881 return sequentialMerge(mergerP, start, step);
882 };
883
884 auto reductionFunc = [&]() {
885 for (const auto &pf : partialFiles) {
886 fileMerger.AddFile(pf.c_str());
887 }
888 return mergeFiles(fileMerger);
889 };
890
891 Bool_t status;
892
893#ifndef R__WIN32
894 if (multiproc) {
896 auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
897 status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
898 if (status) {
899 status = reductionFunc();
900 } else {
901 Err() << "failed at the parallel stage\n";
902 }
903 if (!args.fDebug) {
904 for (const auto &pf : partialFiles) {
905 gSystem->Unlink(pf.c_str());
906 }
907 }
908 } else {
909 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
910 }
911#else
912 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
913#endif
914
915 if (status) {
916 Info(3) << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
917 << ") input (partial) files into " << targetname << "\n";
918 return 0;
919 } else {
920 Err() << "failure during the merge of " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
921 << ") input (partial) files into " << targetname << "\n";
922 return 1;
923 }
924}