Re: hadd: "too many open files"

From: Philippe Canal <pcanal_at_fnal.gov>
Date: Mon, 8 Aug 2011 10:46:39 -0500


 > Otherwise users typically write some kind of wrapper script which calls hadd on a subset of the files until all files are merged (essentially doing as suggested above).

At the time we originally wrote hadd, limits on the length of the command line meant that in many cases it was not even possible to pass thousands of file names to hadd,
and thus the compromise you describe made sense.

We are currently reviewing some of the implementation details of hadd and will indeed consider your recommendation.

Cheers,
Philippe.
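The batching compromise discussed above (and proposed in the quoted message below) can be sketched as a repeated, tree-like merge. This is a minimal Python sketch under stated assumptions, not hadd's implementation: each input "file" is modeled as a hypothetical in-memory dict mapping an object path to a list of bin contents (all inputs assumed to share identical binning), and `merge_group`, `merge_in_batches` and `max_open` are illustrative names. One slot per batch is reserved for the output, so no round ever holds more than `max_open` "files" at once.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a ROOT file: object path -> bin contents.
Hist = Dict[str, List[float]]


def merge_group(files: Sequence[Hist]) -> Hist:
    """Merge one batch by summing the bins of identically named objects."""
    out: Hist = {}
    for f in files:
        for name, bins in f.items():
            if name in out:
                out[name] = [a + b for a, b in zip(out[name], bins)]
            else:
                out[name] = list(bins)  # copy so inputs are never mutated
    return out


def merge_in_batches(inputs: Sequence[Hist], max_open: int) -> Hist:
    """Merge all inputs while never holding more than max_open 'files' at once.

    One slot is reserved for each batch's output, so a batch reads at most
    max_open - 1 inputs. Intermediate results feed the next round, much like
    a wrapper script that calls hadd repeatedly on subsets of the files.
    """
    if max_open < 3:
        raise ValueError("need room for at least two inputs plus one output")
    batch = max_open - 1
    current = list(inputs)
    while len(current) > 1:
        current = [merge_group(current[i:i + batch])
                   for i in range(0, len(current), batch)]
    return current[0] if current else {}
```

On Unix, `resource.getrlimit(resource.RLIMIT_NOFILE)` returns the (soft, hard) limits, and the soft limit matches what `ulimit -n` reports, so it would be a natural default for `max_open`. The price is that every object is read and written once per round, which is the performance hit mentioned in the thread.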

On 8/8/11 2:11 AM, Noel Dawe wrote:
> Hi Philippe,
>
> I see. Why not perform the merge in batches of at most "ulimit -n" files, then? Or add an option -n allowing the user
> to specify the maximum number of files to open at once. Merging more than "ulimit -n" files would then take a slight
> performance hit, but at least hadd would not hit the system limit and fail. A slight performance hit is definitely worth it
> if hadd actually runs to completion. In fact, I think any necessary performance hit is worth it. Otherwise users typically write some
> kind of wrapper script which calls hadd on a subset of the files until all files are merged (essentially doing as suggested above).
>
> Noel
>
> On Mon, Aug 8, 2011 at 1:51 AM, Philippe Canal <pcanal_at_fnal.gov> wrote:
>
> Hi Noel,
>
> The current scheme comes from two observations. The first is that opening a file is comparatively slow, especially if the
> file is not local.
> The second is that it is more time-efficient to take one object to be merged, merge into it the equivalent objects from all
> the remaining files, and only then move on to the next object/directory. This is particularly helpful with deep directory
> hierarchies, as it reduces the number of traversals that are needed.
>
> Cheers,
> Philippe.
>
>
> On 8/6/11 5:19 AM, Noel Dawe wrote:
>
> I don't know why hadd needs to open all the files at the same time, but a better way to write this tool would probably be
> to never open more than two files at once: copy the first file to the destination and keep it open, then pop off the next
> file, open it, merge it into the destination, close it, then pop off the next file and open it, and so on.
>
> Noel
>
>
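To make the trade-off in this thread concrete, the following toy model contrasts the two schemes: object-major merging with every input open (as Philippe describes) and Noel's proposal of keeping only the output and one input open. It is a Python sketch under stated assumptions, not ROOT code: files are hypothetical in-memory dicts from object path to bin contents, and the `Stats` counters (peak number of open "files", passes over the output hierarchy) are illustrative bookkeeping, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a ROOT file: object path -> bin contents.
Hist = Dict[str, List[float]]


@dataclass
class Stats:
    open_now: int = 0
    peak_open: int = 0
    output_walks: int = 0  # passes over the output's object hierarchy

    def open(self) -> None:
        self.open_now += 1
        self.peak_open = max(self.peak_open, self.open_now)

    def close(self) -> None:
        self.open_now -= 1


def add(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


def merge_object_major(inputs: Sequence[Hist], st: Stats) -> Hist:
    """Open every input, then merge one object at a time across all of them."""
    st.open()                      # the output
    for _ in inputs:
        st.open()                  # every input stays open for the whole merge
    st.output_walks += 1           # a single pass over the object hierarchy
    out: Hist = {}
    # Ordered union of object names across all inputs.
    names = list(dict.fromkeys(n for f in inputs for n in f))
    for name in names:
        acc: Optional[List[float]] = None
        for f in inputs:
            if name in f:
                acc = list(f[name]) if acc is None else add(acc, f[name])
        out[name] = acc
    for _ in inputs:
        st.close()
    st.close()
    return out


def merge_pairwise(inputs: Sequence[Hist], st: Stats) -> Hist:
    """Keep only the output and one input open at any time."""
    st.open()                      # the output
    out: Hist = {}
    for f in inputs:
        st.open()                  # open the next input
        st.output_walks += 1       # walk the output hierarchy once per input
        for name, bins in f.items():
            out[name] = add(out[name], bins) if name in out else list(bins)
        st.close()                 # close it before opening the next one
    st.close()
    return out
```

For N inputs, the object-major version peaks at N + 1 open files but walks the output hierarchy once, while the pairwise version never exceeds two open files but walks the output hierarchy N times, which is the extra traversal cost Philippe points out for deep directory trees.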
Received on Mon Aug 08 2011 - 17:46:52 CEST
