Hi roottalk,

We are experiencing a problem with partially corrupt root data files. These files were produced on a reconstruction production farm; the source of the corruption is not yet clear and is being investigated. We believe that only a small subset of the entries in any one tree is affected by the data corruption, and I would like to be able to recognize when a corrupt record has been read into memory, warn the user, and then move to the next record in the tree so that all subsequent unaffected records can be processed. I'm wondering how this can be done.

An example of what happens when TTree::GetEntry() attempts to read a corrupt data record:

  Error in <TObjArray::At>: index 66 out of bounds (size: 16, this: 0x0921aad0)
  Error in <TObjArray::At>: index 3072 out of bounds (size: 15, this: 0x09239978)
  Error in <TObjArray::At>: index 852 out of bounds (size: 15, this: 0x0923a740)
  Error in <TObjArray::At>: index -11069 out of bounds (size: 68, this: 0x0921aad0)
  Error in <TObjArray::AddAt>: out of bounds at -11069 in 921aad0
  Error in <TBuffer::CheckByteCount>: object of class PlexPlaneId read too many bytes: 6 instead of -1879030366
  Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in sync with data on file, fix Streamer()
  Segmentation fault (core dumped)

The first sign of a problem, the series of TObjArray::At error messages, is produced when the TClass::ReadBuffer method reads a version number from the corrupt data buffer that is ridiculous for the class and uses that corrupt version number to access the TObjArray containing the StreamerInfos. The subsequent segv occurs deep within root, and the GetEntry method never returns to the user. Other corrupt data records produce different symptoms, but the segv is usually preceded by some error messages from root.
I thought I could perhaps use an error handler to catch the errors and abort the read of the current entry without aborting the job, which would allow the user to continue processing entries. Unfortunately, I'm really a novice at using error handlers. Although I see that I can override root's default ErrorHandler function using TError's SetErrorHandler method, I don't see how to write the function so that control returns to the place in my code just after the call to TTree::GetEntry(). Perhaps this is a bad idea anyway, since it may leave some unfinished business in TTree::GetEntry?

Can anyone suggest a solution to skip past these corrupt records? Of course, our first priority is to fix the cause of the data corruption, and all data will eventually be reprocessed, but this would be a stopgap measure to allow the user to look at the data in the meantime.

Thanks for your help,
-Sue

p.s. I'm using root cvs as of this past Sunday and gcc 3.2 on rh linux to read the data files. The data files were produced with an older version of root.
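
For illustration, the error-handler idea raised above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a tested recipe for ROOT: the handler converts error-level messages into a C++ exception, and the entry loop catches it and moves on. To keep the example self-contained, FakeGetEntry, gHandler, CorruptEntry, ThrowingHandler and ProcessAll are hypothetical stand-ins; in a real job the handler would be installed with SetErrorHandler and the read would be TTree::GetEntry(). Whether an exception can safely propagate out of GetEntry is exactly the "unfinished business" concern, and a handler of this kind can only react to the error messages that precede the crash; it cannot recover from the segv itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the handler signature (level, abort flag, location, message).
using HandlerFunc = void (*)(int level, bool abort,
                             const char* location, const char* msg);

// Hypothetical slot for the installed handler; with ROOT this role is
// played by whatever was passed to SetErrorHandler().
static HandlerFunc gHandler = nullptr;

// Assumed severity threshold, playing the role of ROOT's kError.
const int kErrorLevel = 3000;

// Exception type used to carry the error message back to the entry loop.
struct CorruptEntry : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Handler: error-level messages become exceptions, the rest are printed.
void ThrowingHandler(int level, bool /*abort*/,
                     const char* location, const char* msg) {
  if (level >= kErrorLevel)
    throw CorruptEntry(std::string(location) + ": " + msg);
  std::fprintf(stderr, "Warning in <%s>: %s\n", location, msg);
}

// Hypothetical stand-in for TTree::GetEntry(): reports an error through
// the installed handler when the requested entry is flagged as corrupt.
int FakeGetEntry(const std::vector<bool>& corrupt, std::size_t i) {
  assert(gHandler != nullptr);
  if (corrupt[i])
    gHandler(kErrorLevel, false, "TObjArray::At", "index out of bounds");
  return 1;  // pretend one byte was read
}

// Entry loop: warn about each corrupt entry, skip it, and keep going.
// Returns the indices of the skipped entries.
std::vector<std::size_t> ProcessAll(const std::vector<bool>& corrupt) {
  gHandler = ThrowingHandler;
  std::vector<std::size_t> skipped;
  for (std::size_t i = 0; i < corrupt.size(); ++i) {
    try {
      FakeGetEntry(corrupt, i);
      // ... user analysis of the entry would go here ...
    } catch (const CorruptEntry& e) {
      std::fprintf(stderr, "Skipping entry %zu: %s\n", i, e.what());
      skipped.push_back(i);
    }
  }
  return skipped;
}
```

Calling ProcessAll with a vector of per-entry corruption flags returns the list of entries that were skipped, so the caller can report them to the user once the loop has finished.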