Hi,
I've recently come across a strange thing that happens when I read
back a TTree from a file. The TTree contains one top-level branch of
class Foo, which contains a TObjArray. The TObjArray may contain any
number of objects of classes Foo, Bar, Baz, Qux, ...
When things go wrong, it's usually flagged with
  index -20432 out of bounds (size: 13, ....
followed by a segmentation violation, which aborts the job with a core dump.
So I did some serious debugging. It turns out the out-of-bounds error
comes from "Int_t TClass::ReadBuffer(TBuffer &b, void *pointer)":
  UInt_t R__s, R__c;
  Version_t version = b.ReadVersion(&R__s, &R__c);
  if (gFile && gFile->GetVersion() < 30000) version = -1;
  TStreamerInfo *sinfo = (TStreamerInfo*)fStreamerInfo->At(version);
                                                           ^^^^^^^
where version is some absurd number. Further debugging showed that
the problem actually originates in
"Version_t TBuffer::ReadVersion(UInt_t *startpos, UInt_t *bcnt)":
  union {
    UInt_t    cnt;
    Version_t vers[2];
  } v;
  *this >> v.vers[1];
  *this >> v.vers[0];
Here v.cnt is supposed to hold the byte count with kByteCountMask set,
but instead it is 0, or at least lacks that bit, so that in the next
statement
  if (!(v.cnt & kByteCountMask)) {
    fBufCur -= sizeof(UInt_t);
    v.cnt = 0;
  }
the condition is true, so the buffer backs up sizeof(UInt_t) = 4
bytes and carries on. The next lines are:
  *bcnt = (v.cnt & ~kByteCountMask);
  *this >> version;
So TClass gets the wrong byte count and a weird version number,
because the buffer was reading the wrong bytes! This is also what later
causes the segmentation violation. In short, everything is messed up
because of this short (or is it long?) read. So I played around a bit
and tried different things.
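A simplified, self-contained model of the header logic quoted above may help show why a misplaced read position produces exactly these symptoms. It is a sketch, not ROOT's code: it decodes the big-endian bytes explicitly instead of using the union, it assumes kByteCountMask is 0x40000000 as in TBuffer.cxx, and versionInRange is a hypothetical stand-in for the bounds check done by fStreamerInfo->At(). The byte values in the usage note are invented for illustration, not taken from the file in question.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Assumed to equal kByteCountMask in ROOT's TBuffer.cxx.
constexpr std::uint32_t kByteCountMask = 0x40000000;

// Result of decoding one object header.
struct Header {
    std::uint32_t byteCount;  // 0 when no byte count was found
    std::int16_t version;     // Version_t is a 16-bit signed integer in ROOT
};

// Read an n-byte big-endian unsigned integer at buf[pos] and advance pos.
std::uint32_t readBE(const std::vector<unsigned char>& buf, std::size_t& pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | buf.at(pos++);
    return v;
}

// Simplified model of TBuffer::ReadVersion (hypothetical, not ROOT's code).
Header readVersion(const std::vector<unsigned char>& buf, std::size_t& pos) {
    Header h{0, 0};
    std::uint32_t cnt = readBE(buf, pos, 4);
    if (cnt & kByteCountMask) {
        h.byteCount = cnt & ~kByteCountMask;  // strip the marker bit
    } else {
        pos -= 4;  // no marker: rewind and reuse these bytes as the version
    }
    h.version = static_cast<std::int16_t>(static_cast<std::uint16_t>(readBE(buf, pos, 2)));
    return h;
}

// Hypothetical stand-in for fStreamerInfo->At(version): refuse bad indices.
bool versionInRange(int version, std::size_t nInfos) {
    if (version < 0 || static_cast<std::size_t>(version) >= nInfos) {
        std::fprintf(stderr, "index %d out of bounds (size: %zu)\n", version, nInfos);
        return false;
    }
    return true;
}
```

With the bytes 40 00 00 10 00 03, reading from offset 0 gives byte count 16 and version 3. Starting two bytes late, the marker bit is missing, so the reader rewinds, reports byte count 0 and takes 00 10 as version 16. Likewise, if the bytes at the read position happen to be B0 30 00 00, the version comes out as -20432 (the signed 16-bit value of 0xB030), which versionInRange rejects for a 13-element array.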
It turns out that certain combinations of buffer size and split
level make this happen. I ran a series of tests, and here's what
I found:
  split | buffer size (bytes)
  level |    100   2000   4000   6400  16000  32000
  ------+------------------------------------------
      0 |    n/a    n/a     ok    bad    n/a    n/a
      1 |     ok     ok    bad    bad    bad    bad
      2 |     ok    n/a    n/a     ok     ok     ok
     99 |     ok    n/a    n/a     ok     ok     ok
"n/a" means I was lazy and didn't run the test, "ok" means I could read
the tree back fine, and "bad" means it failed as outlined above.
OK, the numbers above only really make sense if you have the full
class specs and are running on a machine like mine.
This seems odd to me. Does anyone have a good explanation? Is this
really the intended behaviour (presuming I'm not doing something
wrong, which I don't think I am)?
My machine is a Pentium III, 733 MHz, 256 MB RAM + 1 GB swap, Redhat
6.2, ROOT 3.01/06 (CVS head a week ago).
Yours,
Christian Holm Christensen -------------------------------------------
Address: Sankt Hansgade 23, 1. th. Phone: (+45) 35 35 96 91
DK-2200 Copenhagen N Cell: (+45) 28 82 16 23
Denmark Office: (+45) 353 25 305
Email: cholm@nbi.dk Web: www.nbi.dk/~cholm
This archive was generated by hypermail 2b29 : Tue Jan 01 2002 - 17:50:58 MET