Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
TRobustEstimator.cxx
Go to the documentation of this file.
1// @(#)root/physics:$Id$
2// Author: Anna Kreshuk 08/10/2004
3
4/*************************************************************************
5 * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class TRobustEstimator
13 \ingroup Physics
14Minimum Covariance Determinant Estimator - a Fast Algorithm
15invented by Peter J.Rousseeuw and Katrien Van Dreissen
16"A Fast Algorithm for the Minimum covariance Determinant Estimator"
17Technometrics, August 1999, Vol.41, NO.3
18
19What are robust estimators?
20"An important property of an estimator is its robustness. An estimator
21is called robust if it is insensitive to measurements that deviate
22from the expected behaviour. There are 2 ways to treat such deviating
23measurements: one may either try to recognise them and then remove
24them from the data sample; or one may leave them in the sample, taking
25care that they do not influence the estimate unduly. In both cases robust
26estimators are needed...Robust procedures compensate for systematic errors
27as much as possible, and indicate any situation in which a danger of not being
28able to operate reliably is detected."
29R.Fruhwirth, M.Regler, R.K.Bock, H.Grote, D.Notz
30"Data Analysis Techniques for High-Energy Physics", 2nd edition
31
32What does this algorithm do?
33It computes a highly robust estimator of multivariate location and scatter.
34Then, it takes those estimates to compute robust distances of all the
35data vectors. Those with large robust distances are considered outliers.
36Robust distances can then be plotted for better visualization of the data.
37
38How does this algorithm do it?
39The MCD objective is to find h observations(out of n) whose classical
40covariance matrix has the lowest determinant. The MCD estimator of location
41is then the average of those h points and the MCD estimate of scatter
42is their covariance matrix. The minimum(and default) h = (n+nvariables+1)/2
43so the algorithm is effective when less than (n+nvar+1)/2 variables are outliers.
44The algorithm also allows for exact fit situations - that is, when h or more
45observations lie on a hyperplane. Then the algorithm still yields the MCD location T
46and scatter matrix S, the latter being singular as it should be. From (T,S) the
47program then computes the equation of the hyperplane.
48
49How can this algorithm be used?
50In any case, when contamination of data is suspected, that might influence
51the classical estimates.
52Also, robust estimation of location and scatter is a tool to robustify
53other multivariate techniques such as, for example, principal-component analysis
54and discriminant analysis.
55
56Technical details of the algorithm:
57
581. The default h = (n+nvariables+1)/2, but the user may choose any integer h with
59 (n+nvariables+1)/2<=h<=n. The program then reports the MCD's breakdown value
60 (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination
61 which is usually the case, a good compromise between breakdown value and
62 efficiency is obtained by putting h=[.75*n].
632. If h=n,the MCD location estimate is the average of the whole dataset, and
64 the MCD scatter estimate is its covariance matrix. Report this and stop
653. If nvariables=1 (univariate data), compute the MCD estimate by the exact
66 algorithm of Rousseeuw and Leroy (1987, pp.171-172) in O(nlogn)time and stop
674. From here on, h<n and nvariables>=2.
68 1. If n is small:
69 - repeat (say) 500 times:
70 - construct an initial h-subset, starting from a random (nvar+1)-subset
71 - carry out 2 C-steps (described in the comments of CStep function)
72 - for the 10 results with lowest det(S):
73 - carry out C-steps until convergence
74 - report the solution (T, S) with the lowest det(S)
75 2. If n is larger (say, n>600), then
76 - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
77 - inside each subset repeat 500/5 times:
78 - construct an initial subset of size hsub=[nsub*h/n]
79 - carry out 2 C-steps
80 - keep the best 10 results (Tsub, Ssub)
81 - pool the subsets, yielding the merged set (say, of size nmerged=1500)
82 - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub)
83 - carry out 2 C-steps
84 - keep the 10 best results
85 - in the full dataset, repeat for those best results:
86 - take several C-steps, using n and h
87 - report the best final result (T, S)
885. To obtain consistency when the data comes from a multivariate normal
89 distribution, covariance matrix is multiplied by a correction factor
906. Robust distances for all elements, using the final (T, S) are calculated
91 Then the very final mean and covariance estimates are calculated only for
92 values, whose robust distances are less than a cutoff value (0.975 quantile
93 of chi2 distribution with nvariables degrees of freedom)
94*/
95
96#include "TRobustEstimator.h"
97#include "TMatrixDSymEigen.h"
98#include "TRandom.h"
99#include "TMath.h"
100#include "TDecompChol.h"
101
103
105 0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
106 9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
107 20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
108 31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
109 41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
110
112 5.02389, 7.3776,9.34840,11.1433,12.8325,
113 14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
114 24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
115 35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
116 45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
117 55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
118 65.410,66.617,67.821,69.022,70.222,71.420};
119
120////////////////////////////////////////////////////////////////////////////////
121///this constructor should be used in a univariate case:
122///first call this constructor, then - the EvaluateUni(..) function
123
125}
126
127////////////////////////////////////////////////////////////////////////////////
128///constructor
129
131 :fMean(nvariables),
132 fCovariance(nvariables),
133 fInvcovariance(nvariables),
134 fCorrelation(nvariables),
135 fRd(nvectors),
136 fSd(nvectors),
137 fOut(1),
138 fHyperplane(nvariables),
139 fData(nvectors, nvariables)
140{
141 if ((nvectors<=1)||(nvariables<=0)){
142 Error("TRobustEstimator","Not enough vectors or variables");
143 return;
144 }
145 if (nvariables==1){
146 Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
147 return;
148 }
149
150 fN=nvectors;
151 fNvar=nvariables;
152 if (hh<(fN+fNvar+1)/2){
153 if (hh>0)
154 Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
155 fH=(fN+fNvar+1)/2;
156 } else
157 fH=hh;
158
159 fVarTemp=0;
160 fVecTemp=0;
161 fExact=0;
162}
163
164////////////////////////////////////////////////////////////////////////////////
165///adds a column to the data matrix
166///it is assumed that the column has size fN
167///variable fVarTemp keeps the number of columns l
168///already added
169
171{
172 if (fVarTemp==fNvar) {
173 fNvar++;
180 }
181 for (Int_t i=0; i<fN; i++) {
182 fData(i, fVarTemp)=col[i];
183 }
184 fVarTemp++;
185}
186
187////////////////////////////////////////////////////////////////////////////////
188///adds a vector to the data matrix
189///it is supposed that the vector is of size fNvar
190
192{
193 if(fVecTemp==fN) {
194 fN++;
195 fRd.ResizeTo(fN);
196 fSd.ResizeTo(fN);
198 }
199 for (Int_t i=0; i<fNvar; i++)
200 fData(fVecTemp, i)=row[i];
201
202 fVecTemp++;
203}
204
205////////////////////////////////////////////////////////////////////////////////
206///Finds the estimate of multivariate mean and variance
207
209{
210 Double_t kEps=1e-14;
211
212 if (fH==fN){
213 Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
214 Classic();
215 return;
216 }
217
218 Int_t i, j, k;
219 Int_t ii, jj;
220 Int_t nmini = 300;
221 Int_t k1=500;
222 Int_t nbest=10;
223 TMatrixD sscp(fNvar+1, fNvar+1);
224 TVectorD vec(fNvar);
225
226 Int_t *index = new Int_t[fN];
227 Double_t *ndist = new Double_t[fN];
228 Double_t det;
229 Double_t *deti=new Double_t[nbest];
230 for (i=0; i<nbest; i++)
231 deti[i]=1e16;
232
233 for (i=0; i<fN; i++)
234 fRd(i)=0;
235 ////////////////////////////
236 //for small n
237 ////////////////////////////
238 if (fN<nmini*2) {
239 //for storing the best fMeans and covariances
240
241 TMatrixD mstock(nbest, fNvar);
242 TMatrixD cstock(fNvar, fNvar*nbest);
243
244 for (k=0; k<k1; k++) {
245 CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
246 //calculate the mean and covariance of the created subset
247 ClearSscp(sscp);
248 for (i=0; i<fH; i++) {
249 for(j=0; j<fNvar; j++)
250 vec(j)=fData[index[i]][j];
251 AddToSscp(sscp, vec);
252 }
253 Covar(sscp, fMean, fCovariance, fSd, fH);
254 det = fCovariance.Determinant();
255 if (det < kEps) {
256 fExact = Exact(ndist);
257 delete [] index;
258 delete [] ndist;
259 delete [] deti;
260 return;
261 }
262 //make 2 CSteps
263 det = CStep(fN, fH, index, fData, sscp, ndist);
264 if (det < kEps) {
265 fExact = Exact(ndist);
266 delete [] index;
267 delete [] ndist;
268 delete [] deti;
269 return;
270 }
271 det = CStep(fN, fH, index, fData, sscp, ndist);
272 if (det < kEps) {
273 fExact = Exact(ndist);
274 delete [] index;
275 delete [] ndist;
276 delete [] deti;
277 return;
278 } else {
279 Int_t maxind=TMath::LocMax(nbest, deti);
280 if(det<deti[maxind]) {
281 deti[maxind]=det;
282 for(ii=0; ii<fNvar; ii++) {
283 mstock(maxind, ii)=fMean(ii);
284 for(jj=0; jj<fNvar; jj++)
285 cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
286 }
287 }
288 }
289 }
290
291 //now for nbest best results perform CSteps until convergence
292
293 for (i=0; i<nbest; i++) {
294 for(ii=0; ii<fNvar; ii++) {
295 fMean(ii)=mstock(i, ii);
296 for (jj=0; jj<fNvar; jj++)
297 fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
298 }
299
300 det=1;
301 while (det>kEps) {
302 det=CStep(fN, fH, index, fData, sscp, ndist);
303 if(TMath::Abs(det-deti[i])<kEps)
304 break;
305 else
306 deti[i]=det;
307 }
308 for(ii=0; ii<fNvar; ii++) {
309 mstock(i,ii)=fMean(ii);
310 for (jj=0; jj<fNvar; jj++)
311 cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
312 }
313 }
314
315 Int_t detind=TMath::LocMin(nbest, deti);
316 for(ii=0; ii<fNvar; ii++) {
317 fMean(ii)=mstock(detind,ii);
318
319 for(jj=0; jj<fNvar; jj++)
320 fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
321 }
322
323 if (deti[detind]!=0) {
324 //calculate robust distances and throw out the bad points
325 Int_t nout = RDist(sscp);
326 Double_t cutoff=kChiQuant[fNvar-1];
327
328 fOut.Set(nout);
329
330 j=0;
331 for (i=0; i<fN; i++) {
332 if(fRd(i)>cutoff) {
333 fOut[j]=i;
334 j++;
335 }
336 }
337
338 } else {
339 fExact=Exact(ndist);
340 }
341 delete [] index;
342 delete [] ndist;
343 delete [] deti;
344 return;
345
346 }
347 /////////////////////////////////////////////////
348 //if n>nmini, the dataset should be partitioned
349 //partitioning
350 ////////////////////////////////////////////////
351 Int_t indsubdat[5];
352 Int_t nsub;
353 for (ii=0; ii<5; ii++)
354 indsubdat[ii]=0;
355
356 nsub = Partition(nmini, indsubdat);
357
358 Int_t sum=0;
359 for (ii=0; ii<5; ii++)
360 sum+=indsubdat[ii];
361 Int_t *subdat=new Int_t[sum];
362 //printf("allocates subdat[ %d ]\n", sum);
363 // init the subdat matrix
364 for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
365 RDraw(subdat, nsub, indsubdat);
366 for (int iii = 0; iii < sum; ++iii) {
367 if (subdat[iii] < 0 || subdat[iii] >= fN ) {
368 Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
369 R__ASSERT(0);
370 }
371 }
372 //now the indexes of selected cases are in the array subdat
373 //matrices to store best means and covariances
374 Int_t nbestsub=nbest*nsub;
375 TMatrixD mstockbig(nbestsub, fNvar);
376 TMatrixD cstockbig(fNvar, fNvar*nbestsub);
377 TMatrixD hyperplane(nbestsub, fNvar);
378 for (i=0; i<nbestsub; i++) {
379 for(j=0; j<fNvar; j++)
380 hyperplane(i,j)=0;
381 }
382 Double_t *detibig = new Double_t[nbestsub];
383 Int_t maxind;
384 maxind=TMath::LocMax(5, indsubdat);
385 TMatrixD dattemp(indsubdat[maxind], fNvar);
386
387 Int_t k2=Int_t(k1/nsub);
388 //construct h-subsets and perform 2 CSteps in subgroups
389
390 for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
391 //printf("group #%d\n", kgroup);
392 Int_t ntemp=indsubdat[kgroup];
393 Int_t temp=0;
394 for (i=0; i<kgroup; i++)
395 temp+=indsubdat[i];
396 Int_t par;
397
398
399 for(i=0; i<ntemp; i++) {
400 for (j=0; j<fNvar; j++) {
401 dattemp(i,j)=fData[subdat[temp+i]][j];
402 }
403 }
404 Int_t htemp=Int_t(fH*ntemp/fN);
405
406 for (i=0; i<nbest; i++)
407 deti[i]=1e16;
408
409 for(k=0; k<k2; k++) {
410 CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
411 ClearSscp(sscp);
412 for (i=0; i<htemp; i++) {
413 for(j=0; j<fNvar; j++) {
414 vec(j)=dattemp(index[i],j);
415 }
416 AddToSscp(sscp, vec);
417 }
418 Covar(sscp, fMean, fCovariance, fSd, htemp);
419 det = fCovariance.Determinant();
420 if (det<kEps) {
421 par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
422 if(par==nbest+1) {
423
424 delete [] detibig;
425 delete [] deti;
426 delete [] subdat;
427 delete [] ndist;
428 delete [] index;
429 return;
430 } else
431 deti[par]=det;
432 } else {
433 det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
434 if (det<kEps) {
435 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
436 if(par==nbest+1) {
437
438 delete [] detibig;
439 delete [] deti;
440 delete [] subdat;
441 delete [] ndist;
442 delete [] index;
443 return;
444 } else
445 deti[par]=det;
446 } else {
447 det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
448 if(det<kEps){
449 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
450 if(par==nbest+1) {
451
452 delete [] detibig;
453 delete [] deti;
454 delete [] subdat;
455 delete [] ndist;
456 delete [] index;
457 return;
458 } else {
459 deti[par]=det;
460 }
461 } else {
462 maxind=TMath::LocMax(nbest, deti);
463 if(det<deti[maxind]) {
464 deti[maxind]=det;
465 for(i=0; i<fNvar; i++) {
466 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
467 for(j=0; j<fNvar; j++) {
468 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
469
470 }
471 }
472 }
473
474 }
475 }
476 }
477
478 maxind=TMath::LocMax(nbest, deti);
479 if (deti[maxind]<kEps)
480 break;
481 }
482
483
484 for(i=0; i<nbest; i++) {
485 detibig[kgroup*nbest + i]=deti[i];
486
487 }
488
489 }
490
491 //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
492 //detibig stores nbest*nsub their determinants
493 //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
494
495 TMatrixD datmerged(sum, fNvar);
496 for(i=0; i<sum; i++) {
497 for (j=0; j<fNvar; j++)
498 datmerged(i,j)=fData[subdat[i]][j];
499 }
500 // printf("performing calculations for merged set\n");
501 Int_t hmerged=Int_t(sum*fH/fN);
502
503 Int_t nh;
504 for(k=0; k<nbestsub; k++) {
505 //for all best solutions perform 2 CSteps and then choose the very best
506 for(ii=0; ii<fNvar; ii++) {
507 fMean(ii)=mstockbig(k,ii);
508 for(jj=0; jj<fNvar; jj++)
509 fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
510 }
511 if(detibig[k]==0) {
512 for(i=0; i<fNvar; i++)
513 fHyperplane(i)=hyperplane(k,i);
514 CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
515
516 }
517 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
518 if (det<kEps) {
519 nh= Exact(ndist);
520 if (nh>=fH) {
521 fExact = nh;
522
523 delete [] detibig;
524 delete [] deti;
525 delete [] subdat;
526 delete [] ndist;
527 delete [] index;
528 return;
529 } else {
530 CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
531 }
532 }
533
534 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
535 if (det<kEps) {
536 nh=Exact(ndist);
537 if (nh>=fH) {
538 fExact = nh;
539 delete [] detibig;
540 delete [] deti;
541 delete [] subdat;
542 delete [] ndist;
543 delete [] index;
544 return;
545 }
546 }
547 detibig[k]=det;
548 for(i=0; i<fNvar; i++) {
549 mstockbig(k,i)=fMean(i);
550 for(j=0; j<fNvar; j++) {
551 cstockbig(i,k*fNvar+j)=fCovariance(i, j);
552 }
553 }
554 }
555 //now for the subset with the smallest determinant
556 //repeat CSteps until convergence
557 Int_t minind=TMath::LocMin(nbestsub, detibig);
558 det=detibig[minind];
559 for(i=0; i<fNvar; i++) {
560 fMean(i)=mstockbig(minind,i);
561 fHyperplane(i)=hyperplane(minind,i);
562 for(j=0; j<fNvar; j++)
563 fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
564 }
565 if(det<kEps)
566 CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
567 det=1;
568 while (det>kEps) {
569 det=CStep(fN, fH, index, fData, sscp, ndist);
570 if(TMath::Abs(det-detibig[minind])<kEps) {
571 break;
572 } else {
573 detibig[minind]=det;
574 }
575 }
576 if(det<kEps) {
577 Exact(ndist);
579 }
580 Int_t nout = RDist(sscp);
581 Double_t cutoff=kChiQuant[fNvar-1];
582
583 fOut.Set(nout);
584
585 j=0;
586 for (i=0; i<fN; i++) {
587 if(fRd(i)>cutoff) {
588 fOut[j]=i;
589 j++;
590 }
591 }
592
593 delete [] detibig;
594 delete [] deti;
595 delete [] subdat;
596 delete [] ndist;
597 delete [] index;
598 return;
599}
600
601////////////////////////////////////////////////////////////////////////////////
602///for the univariate case
603///estimates of location and scatter are returned in mean and sigma parameters
604///the algorithm works on the same principle as in multivariate case -
605///it finds a subset of size hh with smallest sigma, and then returns mean and
606///sigma of this subset
607
609{
610 if (hh==0)
611 hh=(nvectors+2)/2;
612 Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
613 Int_t *index=new Int_t[nvectors];
614 TMath::Sort(nvectors, data, index, kFALSE);
615
616 Int_t nquant;
617 nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
618 Double_t factor=faclts[nquant-1];
619
620 Double_t *aw=new Double_t[nvectors];
621 Double_t *aw2=new Double_t[nvectors];
622 Double_t sq=0;
623 Double_t sqmin=0;
624 Int_t ndup=0;
625 Int_t len=nvectors-hh;
626 Double_t *slutn=new Double_t[len];
627 for(Int_t i=0; i<len; i++)
628 slutn[i]=0;
629 for(Int_t jint=0; jint<len; jint++) {
630 aw[jint]=0;
631 for (Int_t j=0; j<hh; j++) {
632 aw[jint]+=data[index[j+jint]];
633 if(jint==0)
634 sq+=data[index[j]]*data[index[j]];
635 }
636 aw2[jint]=aw[jint]*aw[jint]/hh;
637
638 if(jint==0) {
639 sq=sq-aw2[jint];
640 sqmin=sq;
641 slutn[ndup]=aw[jint];
642
643 } else {
644 sq=sq - data[index[jint-1]]*data[index[jint-1]]+
645 data[index[jint+hh]]*data[index[jint+hh]]-
646 aw2[jint]+aw2[jint-1];
647 if(sq<sqmin) {
648 ndup=0;
649 sqmin=sq;
650 slutn[ndup]=aw[jint];
651
652 } else {
653 if(sq==sqmin) {
654 ndup++;
655 slutn[ndup]=aw[jint];
656 }
657 }
658 }
659 }
660
661 slutn[0]=slutn[Int_t((ndup)/2)]/hh;
662 Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
663 mean=slutn[0];
664 sigma=bstd;
665 delete [] aw;
666 delete [] aw2;
667 delete [] slutn;
668 delete [] index;
669}
670
671////////////////////////////////////////////////////////////////////////////////
672///returns the breakdown point of the algorithm
673
675{
676 Int_t n;
677 n=(fN-fH+1)/fN;
678 return n;
679}
680
681////////////////////////////////////////////////////////////////////////////////
682///returns the chi2 quantiles
683
685{
686 if (i < 0 || i >= 50) return 0;
687 return kChiQuant[i];
688}
689
690////////////////////////////////////////////////////////////////////////////////
691///returns the covariance matrix
692
694{
695 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
696 Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
697 matr.ResizeTo(fNvar, fNvar);
698 }
699 matr=fCovariance;
700}
701
702////////////////////////////////////////////////////////////////////////////////
703///returns the correlation matrix
704
706{
707 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
708 Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
709 matr.ResizeTo(fNvar, fNvar);
710 }
711 matr=fCorrelation;
712}
713
714////////////////////////////////////////////////////////////////////////////////
715///if the points are on a hyperplane, returns this hyperplane
716
718{
719 if (fExact==0) {
720 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
721 return 0;
722 } else {
723 return &fHyperplane;
724 }
725}
726
727////////////////////////////////////////////////////////////////////////////////
728///if the points are on a hyperplane, returns this hyperplane
729
731{
732 if (fExact==0){
733 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
734 return;
735 }
736 if (vec.GetNoElements()!=fNvar) {
737 Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
738 vec.ResizeTo(fNvar);
739 }
740 vec=fHyperplane;
741}
742
743////////////////////////////////////////////////////////////////////////////////
744///return the estimate of the mean
745
747{
748 if (means.GetNoElements()!=fNvar) {
749 Warning("GetMean","provided vector is of the wrong size, it will be resized");
750 means.ResizeTo(fNvar);
751 }
752 means=fMean;
753}
754
755////////////////////////////////////////////////////////////////////////////////
756///returns the robust distances (helps to find outliers)
757
759{
760 if (rdist.GetNoElements()!=fN) {
761 Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
762 rdist.ResizeTo(fN);
763 }
764 rdist=fRd;
765}
766
767////////////////////////////////////////////////////////////////////////////////
768///returns the number of outliers
769
771{
772 return fOut.GetSize();
773}
774
775////////////////////////////////////////////////////////////////////////////////
776///update the sscp matrix with vector vec
777
779{
780 Int_t i, j;
781 for (j=1; j<fNvar+1; j++) {
782 sscp(0, j) +=vec(j-1);
783 sscp(j, 0) = sscp(0, j);
784 }
785 for (i=1; i<fNvar+1; i++) {
786 for (j=1; j<fNvar+1; j++) {
787 sscp(i, j) += vec(i-1)*vec(j-1);
788 }
789 }
790}
791
792////////////////////////////////////////////////////////////////////////////////
793///clear the sscp matrix, used for covariance and mean calculation
794
796{
797 for (Int_t i=0; i<fNvar+1; i++) {
798 for (Int_t j=0; j<fNvar+1; j++) {
799 sscp(i, j)=0;
800 }
801 }
802}
803
804////////////////////////////////////////////////////////////////////////////////
805///called when h=n. Returns classic covariance matrix
806///and mean
807
809{
810 TMatrixD sscp(fNvar+1, fNvar+1);
811 TVectorD temp(fNvar);
812 ClearSscp(sscp);
813 for (Int_t i=0; i<fN; i++) {
814 for (Int_t j=0; j<fNvar; j++)
815 temp(j)=fData(i, j);
816 AddToSscp(sscp, temp);
817 }
818 Covar(sscp, fMean, fCovariance, fSd, fN);
819 Correl();
820
821}
822
823////////////////////////////////////////////////////////////////////////////////
824///calculates mean and covariance
825
827{
828 Int_t i, j;
829 Double_t f;
830 for (i=0; i<fNvar; i++) {
831 m(i)=sscp(0, i+1);
832 sd[i]=sscp(i+1, i+1);
833 f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
834 if (f>1e-14) sd[i]=TMath::Sqrt(f);
835 else sd[i]=0;
836 m(i)/=nvec;
837 }
838 for (i=0; i<fNvar; i++) {
839 for (j=0; j<fNvar; j++) {
840 cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
841 cov(i, j)/=nvec-1;
842 }
843 }
844}
845
846////////////////////////////////////////////////////////////////////////////////
847///transforms covariance matrix into correlation matrix
848
850{
851 Int_t i, j;
852 Double_t *sd=new Double_t[fNvar];
853 for(j=0; j<fNvar; j++)
854 sd[j]=1./TMath::Sqrt(fCovariance(j, j));
855 for(i=0; i<fNvar; i++) {
856 for (j=0; j<fNvar; j++) {
857 if (i==j)
858 fCorrelation(i, j)=1.;
859 else
860 fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
861 }
862 }
863 delete [] sd;
864}
865
866////////////////////////////////////////////////////////////////////////////////
867///creates a subset of htotal elements from ntotal elements
868///first, p+1 elements are drawn randomly(without repetitions)
869///if their covariance matrix is singular, more elements are
870///added one by one, until their covariance matrix becomes regular
871///or it becomes clear that htotal observations lie on a hyperplane
872///If covariance matrix determinant!=0, distances of all ntotal elements
873///are calculated, using formula d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where
874///M is mean and S_inv is the inverse of the covariance matrix
875///htotal points with smallest distances are included in the returned subset.
876
877void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
878{
879 Double_t kEps = 1e-14;
880 Int_t i, j;
881 Bool_t repeat=kFALSE;
882 Int_t nindex=0;
883 Int_t num;
884 for(i=0; i<ntotal; i++)
885 index[i]=ntotal+1;
886
887 for (i=0; i<p+1; i++) {
888 num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
889 if (i>0){
890 for(j=0; j<=i-1; j++) {
891 if(index[j]==num)
892 repeat=kTRUE;
893 }
894 }
895 if(repeat==kTRUE) {
896 i--;
897 repeat=kFALSE;
898 } else {
899 index[i]=num;
900 nindex++;
901 }
902 }
903
904 ClearSscp(sscp);
905
906 TVectorD vec(fNvar);
907 Double_t det;
908 for (i=0; i<p+1; i++) {
909 for (j=0; j<fNvar; j++) {
910 vec[j]=data[index[i]][j];
911
912 }
913 AddToSscp(sscp, vec);
914 }
915
916 Covar(sscp, fMean, fCovariance, fSd, p+1);
918 while((det<kEps)&&(nindex < htotal)) {
919 //if covariance matrix is singular,another vector is added until
920 //the matrix becomes regular or it becomes clear that all
921 //vectors of the group lie on a hyperplane
922 repeat=kFALSE;
923 do{
924 num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
925 repeat=kFALSE;
926 for(i=0; i<nindex; i++) {
927 if(index[i]==num) {
928 repeat=kTRUE;
929 break;
930 }
931 }
932 }while(repeat==kTRUE);
933
934 index[nindex]=num;
935 nindex++;
936 //check if covariance matrix is singular
937 for(j=0; j<fNvar; j++)
938 vec[j]=data[index[nindex-1]][j];
939 AddToSscp(sscp, vec);
940 Covar(sscp, fMean, fCovariance, fSd, nindex);
942 }
943
944 if(nindex!=htotal) {
946 fInvcovariance = chol.Invert();
947
948 TVectorD temp(fNvar);
949 for(j=0; j<ntotal; j++) {
950 ndist[j]=0;
951 for(i=0; i<fNvar; i++)
952 temp[i]=data[j][i] - fMean(i);
953 temp*=fInvcovariance;
954 for(i=0; i<fNvar; i++)
955 ndist[j]+=(data[j][i]-fMean(i))*temp[i];
956 }
957 KOrdStat(ntotal, ndist, htotal-1,index);
958 }
959
960}
961
962////////////////////////////////////////////////////////////////////////////////
963///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
964///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
965///This function is called in case when less than fH samples lie on a hyperplane.
966
967void TRobustEstimator::CreateOrtSubset(TMatrixD &dat,Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
968{
969 Int_t i, j;
970 TVectorD vec(fNvar);
971 for (i=0; i<nmerged; i++) {
972 ndist[i]=0;
973 for(j=0; j<fNvar; j++) {
974 ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
975 ndist[i]=TMath::Abs(ndist[i]);
976 }
977 }
978 KOrdStat(nmerged, ndist, hmerged-1, index);
979 ClearSscp(sscp);
980 for (i=0; i<hmerged; i++) {
981 for(j=0; j<fNvar; j++)
982 vec[j]=dat[index[i]][j];
983 AddToSscp(sscp, vec);
984 }
985 Covar(sscp, fMean, fCovariance, fSd, hmerged);
986}
987
988////////////////////////////////////////////////////////////////////////////////
989///from the input htotal-subset constructs another htotal subset with lower determinant
990///
991///As proven by Peter J.Rousseeuw and Katrien Van Driessen, if distances for all elements
992///are calculated, using the formula:d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
993///of the input htotal-subset, and S_inv - the inverse of its covariance matrix, then
994///htotal elements with smallest distances will have covariance matrix with determinant
995///less or equal to the determinant of the input subset covariance matrix.
996///
997///determinant for this htotal-subset with smallest distances is returned
998
999Double_t TRobustEstimator::CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
1000{
1001 Int_t i, j;
1002 TVectorD vec(fNvar);
1003 Double_t det;
1004
1006 fInvcovariance = chol.Invert();
1007
1008 TVectorD temp(fNvar);
1009 for(j=0; j<ntotal; j++) {
1010 ndist[j]=0;
1011 for(i=0; i<fNvar; i++)
1012 temp[i]=data[j][i]-fMean[i];
1013 temp*=fInvcovariance;
1014 for(i=0; i<fNvar; i++)
1015 ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1016 }
1017
1018 //taking h smallest
1019 KOrdStat(ntotal, ndist, htotal-1, index);
1020 //writing their mean and covariance
1021 ClearSscp(sscp);
1022 for (i=0; i<htotal; i++) {
1023 for (j=0; j<fNvar; j++)
1024 temp[j]=data[index[i]][j];
1025 AddToSscp(sscp, temp);
1026 }
1027 Covar(sscp, fMean, fCovariance, fSd, htotal);
1028 det = fCovariance.Determinant();
1029 return det;
1030}
1031
1032////////////////////////////////////////////////////////////////////////////////
1033///for the exact fit situations
1034///returns number of observations on the hyperplane
1035
1037{
1038 Int_t i, j;
1039
1041 TVectorD eigenValues=eigen.GetEigenValues();
1042 TMatrixD eigenMatrix=eigen.GetEigenVectors();
1043
1044 for (j=0; j<fNvar; j++) {
1045 fHyperplane[j]=eigenMatrix(j,fNvar-1);
1046 }
1047 //calculate and return how many observations lie on the hyperplane
1048 for (i=0; i<fN; i++) {
1049 ndist[i]=0;
1050 for(j=0; j<fNvar; j++) {
1051 ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1052 ndist[i]=TMath::Abs(ndist[i]);
1053 }
1054 }
1055 Int_t nhyp=0;
1056
1057 for (i=0; i<fN; i++) {
1058 if(ndist[i] < 1e-14) nhyp++;
1059 }
1060 return nhyp;
1061
1062}
1063
1064////////////////////////////////////////////////////////////////////////////////
1065///This function is called if determinant of the covariance matrix of a subset=0.
1066///
1067///If there are more then fH vectors on a hyperplane,
1068///returns this hyperplane and stops
1069///else stores the hyperplane coordinates in hyperplane matrix
1070
1071Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1072 Double_t *deti, Int_t nbest, Int_t kgroup,
1073 TMatrixD &sscp, Double_t *ndist)
1074{
1075 Int_t i, j;
1076
1077 TVectorD vec(fNvar);
1078 Int_t maxind = TMath::LocMax(nbest, deti);
1079 Int_t nh=Exact(ndist);
1080 //now nh is the number of observation on the hyperplane
1081 //ndist stores distances of observation from this hyperplane
1082 if(nh>=fH) {
1083 ClearSscp(sscp);
1084 for (i=0; i<fN; i++) {
1085 if(ndist[i]<1e-14) {
1086 for (j=0; j<fNvar; j++)
1087 vec[j]=fData[i][j];
1088 AddToSscp(sscp, vec);
1089 }
1090 }
1091 Covar(sscp, fMean, fCovariance, fSd, nh);
1092
1093 fExact=nh;
1094 return nbest+1;
1095
1096 } else {
1097 //if less than fH observations lie on a hyperplane,
1098 //mean and covariance matrix are stored in mstockbig
1099 //and cstockbig in place of the previous maximum determinant
1100 //mean and covariance
1101 for(i=0; i<fNvar; i++) {
1102 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1103 hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1104 for(j=0; j<fNvar; j++) {
1105 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1106 }
1107 }
1108 return maxind;
1109 }
1110}
1111
1112
1113////////////////////////////////////////////////////////////////////////////////
1114///divides the elements into approximately equal subgroups
1115///number of elements in each subgroup is stored in indsubdat
1116///number of subgroups is returned
1117
1119{
1120 Int_t nsub;
1121 if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1122 if (fN%2==1){
1123 indsubdat[0]=Int_t(fN*0.5);
1124 indsubdat[1]=Int_t(fN*0.5)+1;
1125 } else
1126 indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1127 nsub=2;
1128 }
1129 else{
1130 if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1131 if(fN%3==0){
1132 indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1133 } else {
1134 indsubdat[0]=Int_t(fN/3);
1135 indsubdat[1]=Int_t(fN/3)+1;
1136 if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1137 else indsubdat[2]=Int_t(fN/3)+1;
1138 }
1139 nsub=3;
1140 }
1141 else{
1142 if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1143 if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1144 else {
1145 indsubdat[0]=Int_t(fN/4);
1146 indsubdat[1]=Int_t(fN/4)+1;
1147 if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1148 if(fN%4==2) {
1149 indsubdat[2]=Int_t(fN/4)+1;
1150 indsubdat[3]=Int_t(fN/4);
1151 }
1152 if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1153 }
1154 nsub=4;
1155 } else {
1156 for(Int_t i=0; i<5; i++)
1157 indsubdat[i]=nmini;
1158 nsub=5;
1159 }
1160 }
1161 }
1162 return nsub;
1163}
1164
1165////////////////////////////////////////////////////////////////////////////////
1166///Calculates robust distances.Then the samples with robust distances
1167///greater than a cutoff value (0.975 quantile of chi2 distribution with
1168///fNvar degrees of freedom, multiplied by a correction factor), are given
1169///weiht=0, and new, reweighted estimates of location and scatter are calculated
1170///The function returns the number of outliers.
1171
1173{
1174 Int_t i, j;
1175 Int_t nout=0;
1176
1177 TVectorD temp(fNvar);
1179 fInvcovariance = chol.Invert();
1180
1181
1182 for (i=0; i<fN; i++) {
1183 fRd[i]=0;
1184 for(j=0; j<fNvar; j++) {
1185 temp[j]=fData[i][j]-fMean[j];
1186 }
1187 temp*=fInvcovariance;
1188 for(j=0; j<fNvar; j++) {
1189 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1190 }
1191 }
1192
1193 Double_t med;
1194 Double_t chi = kChiMedian[fNvar-1];
1195
1197 med/=chi;
1198 fCovariance*=med;
1199 TDecompChol chol2(fCovariance);
1200 fInvcovariance = chol2.Invert();
1201
1202 for (i=0; i<fN; i++) {
1203 fRd[i]=0;
1204 for(j=0; j<fNvar; j++) {
1205 temp[j]=fData[i][j]-fMean[j];
1206 }
1207
1208 temp*=fInvcovariance;
1209 for(j=0; j<fNvar; j++) {
1210 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1211 }
1212 }
1213
1214 Double_t cutoff = kChiQuant[fNvar-1];
1215
1216 ClearSscp(sscp);
1217 for(i=0; i<fN; i++) {
1218 if (fRd[i]<=cutoff) {
1219 for(j=0; j<fNvar; j++)
1220 temp[j]=fData[i][j];
1221 AddToSscp(sscp,temp);
1222 } else {
1223 nout++;
1224 }
1225 }
1226
1227 Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1228 return nout;
1229}
1230
1231////////////////////////////////////////////////////////////////////////////////
1232///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1233///such that the selected case numbers are uniformly distributed from 1 to n
1234
1235void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1236{
1237 Int_t jndex = 0;
1238 Int_t nrand;
1239 Int_t i, k, m, j;
1240 for (k=1; k<=ngroup; k++) {
1241 for (m=1; m<=indsubdat[k-1]; m++) {
1242 nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1243 //printf("nrand = %d - jndex %d\n",nrand,jndex);
1244 jndex++;
1245 if (jndex==1) {
1246 subdat[0]=nrand-1; // in case nrand is equal to fN
1247 } else {
1248 subdat[jndex-1]=nrand+jndex-2;
1249 for (i=1; i<=jndex-1; i++) {
1250 if(subdat[i-1] > nrand+i-2) {
1251 for(j=jndex; j>=i+1; j--) {
1252 subdat[j-1]=subdat[j-2];
1253 }
1254 //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1255 subdat[i-1]=nrand+i-2;
1256 break; //breaking the loop for(i=1...
1257 }
1258 }
1259 }
1260 }
1261 }
1262}
1263
1264////////////////////////////////////////////////////////////////////////////////
1265///because I need an Int_t work array
1266
1268 Bool_t isAllocated = kFALSE;
1269 const Int_t kWorkMax=100;
1270 Int_t i, ir, j, l, mid;
1271 Int_t arr;
1272 Int_t *ind;
1273 Int_t workLocal[kWorkMax];
1274 Int_t temp;
1275
1276
1277 if (work) {
1278 ind = work;
1279 } else {
1280 ind = workLocal;
1281 if (ntotal > kWorkMax) {
1282 isAllocated = kTRUE;
1283 ind = new Int_t[ntotal];
1284 }
1285 }
1286
1287 for (Int_t ii=0; ii<ntotal; ii++) {
1288 ind[ii]=ii;
1289 }
1290 Int_t rk = k;
1291 l=0;
1292 ir = ntotal-1;
1293 for(;;) {
1294 if (ir<=l+1) { //active partition contains 1 or 2 elements
1295 if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1296 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1297 Double_t tmp = a[ind[rk]];
1298 if (isAllocated)
1299 delete [] ind;
1300 return tmp;
1301 } else {
1302 mid = (l+ir) >> 1; //choose median of left, center and right
1303 {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1304 if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1305 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1306
1307 if (a[ind[l+1]]>a[ind[ir]])
1308 {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1309
1310 if (a[ind[l]]>a[ind[l+1]])
1311 {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1312
1313 i=l+1; //initialize pointers for partitioning
1314 j=ir;
1315 arr = ind[l+1];
1316 for (;;) {
1317 do i++; while (a[ind[i]]<a[arr]);
1318 do j--; while (a[ind[j]]>a[arr]);
1319 if (j<i) break; //pointers crossed, partitioning complete
1320 {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1321 }
1322 ind[l+1]=ind[j];
1323 ind[j]=arr;
1324 if (j>=rk) ir = j-1; //keep active the partition that
1325 if (j<=rk) l=i; //contains the k_th element
1326 }
1327 }
1328}
#define f(i)
Definition RSha256.hxx:104
#define a(i)
Definition RSha256.hxx:99
#define e(i)
Definition RSha256.hxx:103
int Int_t
Definition RtypesCore.h:45
const Bool_t kFALSE
Definition RtypesCore.h:92
double Double_t
Definition RtypesCore.h:59
const Bool_t kTRUE
Definition RtypesCore.h:91
#define ClassImp(name)
Definition Rtypes.h:364
#define R__ASSERT(e)
Definition TError.h:120
R__EXTERN TRandom * gRandom
Definition TRandom.h:62
const Double_t kChiMedian[50]
const Double_t kChiQuant[50]
void Set(Int_t n)
Set size of this array to n ints.
Definition TArrayI.cxx:105
Int_t GetSize() const
Definition TArray.h:47
Cholesky Decomposition class.
Definition TDecompChol.h:25
Bool_t Invert(TMatrixDSym &inv)
For a symmetric matrix A(m,m), its inverse A_inv(m,m) is returned .
TMatrixDSymEigen.
const TVectorD & GetEigenValues() const
const TMatrixD & GetEigenVectors() const
Int_t GetNrows() const
Int_t GetNcols() const
virtual Double_t Determinant() const
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
virtual void Warning(const char *method, const char *msgfmt,...) const
Issue warning message.
Definition TObject.cxx:879
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition TObject.cxx:893
virtual Double_t Uniform(Double_t x1=1)
Returns a uniform deviate on the interval (0, x1).
Definition TRandom.cxx:672
Minimum Covariance Determinant Estimator - a Fast Algorithm invented by Peter J.Rousseeuw and Katrien...
Int_t RDist(TMatrixD &sscp)
Calculates robust distances.Then the samples with robust distances greater than a cutoff value (0....
TMatrixDSym fCovariance
Double_t CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
from the input htotal-subset constructs another htotal subset with lower determinant
const TMatrixDSym * GetCovariance() const
TMatrixDSym fInvcovariance
void CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
creates a subset of htotal elements from ntotal elements first, p+1 elements are drawn randomly(witho...
void Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
calculates mean and covariance
Int_t Exact(Double_t *ndist)
for the exact fit situations returns number of observations on the hyperplane
Double_t KOrdStat(Int_t ntotal, Double_t *arr, Int_t k, Int_t *work)
because I need an Int_t work array
Int_t GetNOut()
returns the number of outliers
TRobustEstimator()
this constructor should be used in a univariate case: first call this constructor,...
void Correl()
transforms covariance matrix into correlation matrix
void AddColumn(Double_t *col)
adds a column to the data matrix it is assumed that the column has size fN variable fVarTemp keeps th...
void RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
Draws ngroup nonoverlapping subdatasets out of a dataset of size n such that the selected case number...
void Classic()
called when h=n.
void Evaluate()
Finds the estimate of multivariate mean and variance.
const TVectorD * GetMean() const
Int_t Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane, Double_t *deti, Int_t nbest, Int_t kgroup, TMatrixD &sscp, Double_t *ndist)
This function is called if determinant of the covariance matrix of a subset=0.
const TVectorD * GetRDistances() const
Int_t GetBDPoint()
returns the breakdown point of the algorithm
void CreateOrtSubset(TMatrixD &dat, Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane hyp[1]*(x1-m...
void AddRow(Double_t *row)
adds a vector to the data matrix it is supposed that the vector is of size fNvar
TMatrixDSym fCorrelation
const TVectorD * GetHyperplane() const
if the points are on a hyperplane, returns this hyperplane
Int_t Partition(Int_t nmini, Int_t *indsubdat)
divides the elements into approximately equal subgroups number of elements in each subgroup is stored...
const TMatrixDSym * GetCorrelation() const
Double_t GetChiQuant(Int_t i) const
returns the chi2 quantiles
void EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh=0)
for the univariate case estimates of location and scatter are returned in mean and sigma parameters t...
void AddToSscp(TMatrixD &sscp, TVectorD &vec)
update the sscp matrix with vector vec
void ClearSscp(TMatrixD &sscp)
clear the sscp matrix, used for covariance and mean calculation
TVectorT< Element > & ResizeTo(Int_t lwb, Int_t upb)
Resize the vector to [lwb:upb] .
Definition TVectorT.cxx:294
Int_t GetNoElements() const
Definition TVectorT.h:76
Element * GetMatrixArray()
Definition TVectorT.h:78
const Double_t sigma
const Int_t n
Definition legend1.C:16
Long64_t LocMin(Long64_t n, const T *a)
Return index of array with the minimum element.
Definition TMath.h:972
Long64_t LocMax(Long64_t n, const T *a)
Return index of array with the maximum element.
Definition TMath.h:1000
Double_t Sqrt(Double_t x)
Definition TMath.h:691
Short_t Min(Short_t a, Short_t b)
Definition TMathBase.h:180
void Sort(Index n, const Element *a, Index *index, Bool_t down=kTRUE)
Definition TMathBase.h:362
Double_t Median(Long64_t n, const T *a, const Double_t *w=0, Long64_t *work=0)
Return the median of the array a where each entry i has weight w[i] .
Definition TMath.h:1247
Short_t Abs(Short_t d)
Definition TMathBase.h:120
auto * m
Definition textangle.C:8
auto * l
Definition textangle.C:4
static uint64_t sum(uint64_t i)
Definition Factory.cxx:2345