TRobustEstimator.cxx
1// @(#)root/physics:$Id$
2// Author: Anna Kreshuk 08/10/2004
3
4/*************************************************************************
5 * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class TRobustEstimator
13 \note Despite being in the group of Legacy statistics classes, TRobustEstimator is still useful and no drop-in replacement exists for it.
14 \ingroup Physics
15Minimum Covariance Determinant Estimator - a fast algorithm
16invented by Peter J. Rousseeuw and Katrien Van Driessen:
17"A Fast Algorithm for the Minimum Covariance Determinant Estimator",
18Technometrics, August 1999, Vol. 41, No. 3
19
20What are robust estimators?
21"An important property of an estimator is its robustness. An estimator
22is called robust if it is insensitive to measurements that deviate
23from the expected behaviour. There are 2 ways to treat such deviating
24measurements: one may either try to recognise them and then remove
25them from the data sample; or one may leave them in the sample, taking
26care that they do not influence the estimate unduly. In both cases robust
27estimators are needed...Robust procedures compensate for systematic errors
28as much as possible, and indicate any situation in which a danger of not being
29able to operate reliably is detected."
30R.Fruhwirth, M.Regler, R.K.Bock, H.Grote, D.Notz
31"Data Analysis Techniques for High-Energy Physics", 2nd edition
32
33What does this algorithm do?
34It computes a highly robust estimator of multivariate location and scatter.
35Then, it takes those estimates to compute robust distances of all the
36data vectors. Those with large robust distances are considered outliers.
37Robust distances can then be plotted for better visualization of the data.
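
Concretely, with T and S denoting the robust estimates of location and scatter, the
robust distance of an observation x_i is
\f[ d_i = \sqrt{(x_i - T)^{T}\, S^{-1}\, (x_i - T)} \f]
(this is the distance formula quoted in the CreateSubset and CStep documentation further below).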
38
39How does this algorithm do it?
40The MCD objective is to find h observations (out of n) whose classical
41covariance matrix has the lowest determinant. The MCD estimator of location
42is then the average of those h points and the MCD estimate of scatter
43is their covariance matrix. The minimum (and default) h = (n+nvariables+1)/2,
44so the algorithm is effective as long as fewer than (n-nvariables+1)/2 of the observations are outliers.
45The algorithm also allows for exact fit situations - that is, when h or more
46observations lie on a hyperplane. Then the algorithm still yields the MCD location T
47and scatter matrix S, the latter being singular as it should be. From (T,S) the
48program then computes the equation of the hyperplane.
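
In formulas (a sketch of the definition, writing H for the selected h-subset and using
the 1/(h-1) normalisation of the Covar() helper below):
\f[ T_{MCD} = \frac{1}{h}\sum_{i \in H} x_i , \qquad
    S_{MCD} = \frac{1}{h-1}\sum_{i \in H} (x_i - T_{MCD})(x_i - T_{MCD})^{T} , \f]
i.e. the sample mean and sample covariance matrix of the h selected observations
(up to the consistency correction factor mentioned in the technical details below).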
49
50How can this algorithm be used?
51Use it whenever contamination of the data is suspected, since outlying
52measurements can strongly influence the classical estimates.
53Also, robust estimation of location and scatter is a tool to robustify
54other multivariate techniques such as, for example, principal-component analysis
55and discriminant analysis.
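
A minimal usage sketch for the multivariate case (the toy Gaussian input data and the
dimensions are purely illustrative):
~~~{.cpp}
   const Int_t nvectors = 1000, nvariables = 3;
   TRobustEstimator re(nvectors, nvariables, 0);  // hh=0 -> default h=(n+nvar+1)/2
   Double_t row[3];
   for (Int_t i = 0; i < nvectors; i++) {
      row[0] = gRandom->Gaus(0, 1);               // replace with real measurements
      row[1] = gRandom->Gaus(0, 1);
      row[2] = gRandom->Gaus(0, 1);
      re.AddRow(row);
   }
   re.Evaluate();                                 // run the MCD algorithm
   TVectorD mean(nvariables);
   TMatrixDSym cov(nvariables);
   re.GetMean(mean);                              // robust location
   re.GetCovariance(cov);                         // robust scatter
   Int_t nout = re.GetNOut();                     // number of flagged outliers
~~~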
56
57Technical details of the algorithm:
58
591. The default h = (n+nvariables+1)/2, but the user may choose any integer h with
60 (n+nvariables+1)/2 <= h <= n. The program then reports the MCD's breakdown value
61 (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination,
62 which is usually the case, a good compromise between breakdown value and
63 efficiency is obtained by putting h = [0.75*n].
642. If h=n, the MCD location estimate is the average of the whole dataset, and
65 the MCD scatter estimate is its covariance matrix. Report this and stop.
663. If nvariables=1 (univariate data), compute the MCD estimate by the exact
67 algorithm of Rousseeuw and Leroy (1987, pp. 171-172) in O(n log n) time and stop.
684. From here on, h<n and nvariables>=2.
69 1. If n is small:
70 - repeat (say) 500 times:
71 - construct an initial h-subset, starting from a random (nvar+1)-subset
72 - carry out 2 C-steps (described in the comments of CStep function)
73 - for the 10 results with lowest det(S):
74 - carry out C-steps until convergence
75 - report the solution (T, S) with the lowest det(S)
76 2. If n is larger (say, n>600), then
77 - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
78 - inside each subset repeat 500/5 times:
79 - construct an initial subset of size hsub=[nsub*h/n]
80 - carry out 2 C-steps
81 - keep the best 10 results (Tsub, Ssub)
82 - pool the subsets, yielding the merged set (say, of size nmerged=1500)
83 - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub)
84 - carry out 2 C-steps
85 - keep the 10 best results
86 - in the full dataset, repeat for those best results:
87 - take several C-steps, using n and h
88 - report the best final result (T, S)
895. To obtain consistency when the data comes from a multivariate normal
90 distribution, the covariance matrix is multiplied by a correction factor.
916. Robust distances of all elements are calculated using the final (T, S).
92 Then the final mean and covariance estimates are recomputed using only the
93 values whose robust distances are below a cutoff value (the 0.975 quantile
94 of the chi2 distribution with nvariables degrees of freedom).
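
As a purely illustrative numerical example of point 1: for n = 1000 observations and
nvariables = 3, the default h = (1000+3+1)/2 = 502 gives a breakdown value of
(1000-502+1)/1000 = 0.499, while h = [0.75*1000] = 750 lowers it to
(1000-750+1)/1000 = 0.251.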
95*/
96
97#include "TRobustEstimator.h"
98#include "TMatrixDSymEigen.h"
99#include "TRandom.h"
100#include "TMath.h"
101#include "TDecompChol.h"
102
103ClassImp(TRobustEstimator);
104
105const Double_t kChiMedian[50]={
106 0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
107 9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
108 20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
109 31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
110 41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
111
112const Double_t kChiQuant[50]={
113 5.02389, 7.3776,9.34840,11.1433,12.8325,
114 14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
115 24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
116 35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
117 45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
118 55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
119 65.410,66.617,67.821,69.022,70.222,71.420};
120
121////////////////////////////////////////////////////////////////////////////////
122///this constructor should be used in a univariate case:
123///first call this constructor, then - the EvaluateUni(..) function
124
125TRobustEstimator::TRobustEstimator(){
126}
127
128////////////////////////////////////////////////////////////////////////////////
129///constructor
130
131TRobustEstimator::TRobustEstimator(Int_t nvectors, Int_t nvariables, Int_t hh)
132 :fMean(nvariables),
133 fCovariance(nvariables),
134 fInvcovariance(nvariables),
135 fCorrelation(nvariables),
136 fRd(nvectors),
137 fSd(nvectors),
138 fOut(1),
139 fHyperplane(nvariables),
140 fData(nvectors, nvariables)
141{
142 if ((nvectors<=1)||(nvariables<=0)){
143 Error("TRobustEstimator","Not enough vectors or variables");
144 return;
145 }
146 if (nvariables==1){
147 Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
148 return;
149 }
150
151 fN=nvectors;
152 fNvar=nvariables;
153 if (hh<(fN+fNvar+1)/2){
154 if (hh>0)
155 Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
156 fH=(fN+fNvar+1)/2;
157 } else
158 fH=hh;
159
160 fVarTemp=0;
161 fVecTemp=0;
162 fExact=0;
163}
164
165////////////////////////////////////////////////////////////////////////////////
166///adds a column to the data matrix
167///it is assumed that the column has size fN
168///variable fVarTemp keeps the number of columns
169///already added
170
171void TRobustEstimator::AddColumn(Double_t *col)
172{
173 if (fVarTemp==fNvar) {
174 fNvar++;
175 fCovariance.ResizeTo(fNvar, fNvar);
176 fInvcovariance.ResizeTo(fNvar, fNvar);
177 fCorrelation.ResizeTo(fNvar, fNvar);
178 fMean.ResizeTo(fNvar);
179 fHyperplane.ResizeTo(fNvar);
180 fData.ResizeTo(fN, fNvar);
181 }
182 for (Int_t i=0; i<fN; i++) {
183 fData(i, fVarTemp)=col[i];
184 }
185 fVarTemp++;
186}
187
188////////////////////////////////////////////////////////////////////////////////
189///adds a vector to the data matrix
190///it is assumed that the vector is of size fNvar
191
192void TRobustEstimator::AddRow(Double_t *row)
193{
194 if(fVecTemp==fN) {
195 fN++;
196 fRd.ResizeTo(fN);
197 fSd.ResizeTo(fN);
198 fData.ResizeTo(fN, fNvar);
199 }
200 for (Int_t i=0; i<fNvar; i++)
201 fData(fVecTemp, i)=row[i];
202
203 fVecTemp++;
204}
205
206////////////////////////////////////////////////////////////////////////////////
207///Finds the estimate of multivariate mean and variance
208
209void TRobustEstimator::Evaluate()
210{
211 Double_t kEps=1e-14;
212
213 if (fH==fN){
214 Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
215 Classic();
216 return;
217 }
218
219 Int_t i, j, k;
220 Int_t ii, jj;
221 Int_t nmini = 300;
222 Int_t k1=500;
223 Int_t nbest=10;
224 TMatrixD sscp(fNvar+1, fNvar+1);
225 TVectorD vec(fNvar);
226
227 Int_t *index = new Int_t[fN];
228 Double_t *ndist = new Double_t[fN];
229 Double_t det;
230 Double_t *deti=new Double_t[nbest];
231 for (i=0; i<nbest; i++)
232 deti[i]=1e16;
233
234 for (i=0; i<fN; i++)
235 fRd(i)=0;
236 ////////////////////////////
237 //for small n
238 ////////////////////////////
239 if (fN<nmini*2) {
240 //for storing the best fMeans and covariances
241
242 TMatrixD mstock(nbest, fNvar);
243 TMatrixD cstock(fNvar, fNvar*nbest);
244
245 for (k=0; k<k1; k++) {
246 CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
247 //calculate the mean and covariance of the created subset
248 ClearSscp(sscp);
249 for (i=0; i<fH; i++) {
250 for(j=0; j<fNvar; j++)
251 vec(j)=fData[index[i]][j];
252 AddToSscp(sscp, vec);
253 }
254 Covar(sscp, fMean, fCovariance, fSd, fH);
255 det = fCovariance.Determinant();
256 if (det < kEps) {
257 fExact = Exact(ndist);
258 delete [] index;
259 delete [] ndist;
260 delete [] deti;
261 return;
262 }
263 //make 2 CSteps
264 det = CStep(fN, fH, index, fData, sscp, ndist);
265 if (det < kEps) {
266 fExact = Exact(ndist);
267 delete [] index;
268 delete [] ndist;
269 delete [] deti;
270 return;
271 }
272 det = CStep(fN, fH, index, fData, sscp, ndist);
273 if (det < kEps) {
274 fExact = Exact(ndist);
275 delete [] index;
276 delete [] ndist;
277 delete [] deti;
278 return;
279 } else {
280 Int_t maxind=TMath::LocMax(nbest, deti);
281 if(det<deti[maxind]) {
282 deti[maxind]=det;
283 for(ii=0; ii<fNvar; ii++) {
284 mstock(maxind, ii)=fMean(ii);
285 for(jj=0; jj<fNvar; jj++)
286 cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
287 }
288 }
289 }
290 }
291
292 //now for nbest best results perform CSteps until convergence
293
294 for (i=0; i<nbest; i++) {
295 for(ii=0; ii<fNvar; ii++) {
296 fMean(ii)=mstock(i, ii);
297 for (jj=0; jj<fNvar; jj++)
298 fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
299 }
300
301 det=1;
302 while (det>kEps) {
303 det=CStep(fN, fH, index, fData, sscp, ndist);
304 if(TMath::Abs(det-deti[i])<kEps)
305 break;
306 else
307 deti[i]=det;
308 }
309 for(ii=0; ii<fNvar; ii++) {
310 mstock(i,ii)=fMean(ii);
311 for (jj=0; jj<fNvar; jj++)
312 cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
313 }
314 }
315
316 Int_t detind=TMath::LocMin(nbest, deti);
317 for(ii=0; ii<fNvar; ii++) {
318 fMean(ii)=mstock(detind,ii);
319
320 for(jj=0; jj<fNvar; jj++)
321 fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
322 }
323
324 if (deti[detind]!=0) {
325 //calculate robust distances and throw out the bad points
326 Int_t nout = RDist(sscp);
327 Double_t cutoff=kChiQuant[fNvar-1];
328
329 fOut.Set(nout);
330
331 j=0;
332 for (i=0; i<fN; i++) {
333 if(fRd(i)>cutoff) {
334 fOut[j]=i;
335 j++;
336 }
337 }
338
339 } else {
340 fExact=Exact(ndist);
341 }
342 delete [] index;
343 delete [] ndist;
344 delete [] deti;
345 return;
346
347 }
348 /////////////////////////////////////////////////
349 //if n>nmini, the dataset should be partitioned
350 //partitioning
351 ////////////////////////////////////////////////
352 Int_t indsubdat[5];
353 Int_t nsub;
354 for (ii=0; ii<5; ii++)
355 indsubdat[ii]=0;
356
357 nsub = Partition(nmini, indsubdat);
358
359 Int_t sum=0;
360 for (ii=0; ii<5; ii++)
361 sum+=indsubdat[ii];
362 Int_t *subdat=new Int_t[sum];
363 //printf("allocates subdat[ %d ]\n", sum);
364 // init the subdat matrix
365 for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
366 RDraw(subdat, nsub, indsubdat);
367 for (int iii = 0; iii < sum; ++iii) {
368 if (subdat[iii] < 0 || subdat[iii] >= fN ) {
369 Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
370 R__ASSERT(0);
371 }
372 }
373 //now the indexes of selected cases are in the array subdat
374 //matrices to store best means and covariances
375 Int_t nbestsub=nbest*nsub;
376 TMatrixD mstockbig(nbestsub, fNvar);
377 TMatrixD cstockbig(fNvar, fNvar*nbestsub);
378 TMatrixD hyperplane(nbestsub, fNvar);
379 for (i=0; i<nbestsub; i++) {
380 for(j=0; j<fNvar; j++)
381 hyperplane(i,j)=0;
382 }
383 Double_t *detibig = new Double_t[nbestsub];
384 Int_t maxind;
385 maxind=TMath::LocMax(5, indsubdat);
386 TMatrixD dattemp(indsubdat[maxind], fNvar);
387
388 Int_t k2=Int_t(k1/nsub);
389 //construct h-subsets and perform 2 CSteps in subgroups
390
391 for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
392 //printf("group #%d\n", kgroup);
393 Int_t ntemp=indsubdat[kgroup];
394 Int_t temp=0;
395 for (i=0; i<kgroup; i++)
396 temp+=indsubdat[i];
397 Int_t par;
398
399
400 for(i=0; i<ntemp; i++) {
401 for (j=0; j<fNvar; j++) {
402 dattemp(i,j)=fData[subdat[temp+i]][j];
403 }
404 }
405 Int_t htemp=Int_t(fH*ntemp/fN);
406
407 for (i=0; i<nbest; i++)
408 deti[i]=1e16;
409
410 for(k=0; k<k2; k++) {
411 CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
412 ClearSscp(sscp);
413 for (i=0; i<htemp; i++) {
414 for(j=0; j<fNvar; j++) {
415 vec(j)=dattemp(index[i],j);
416 }
417 AddToSscp(sscp, vec);
418 }
419 Covar(sscp, fMean, fCovariance, fSd, htemp);
420 det = fCovariance.Determinant();
421 if (det<kEps) {
422 par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
423 if(par==nbest+1) {
424
425 delete [] detibig;
426 delete [] deti;
427 delete [] subdat;
428 delete [] ndist;
429 delete [] index;
430 return;
431 } else
432 deti[par]=det;
433 } else {
434 det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
435 if (det<kEps) {
436 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
437 if(par==nbest+1) {
438
439 delete [] detibig;
440 delete [] deti;
441 delete [] subdat;
442 delete [] ndist;
443 delete [] index;
444 return;
445 } else
446 deti[par]=det;
447 } else {
448 det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
449 if(det<kEps){
450 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
451 if(par==nbest+1) {
452
453 delete [] detibig;
454 delete [] deti;
455 delete [] subdat;
456 delete [] ndist;
457 delete [] index;
458 return;
459 } else {
460 deti[par]=det;
461 }
462 } else {
463 maxind=TMath::LocMax(nbest, deti);
464 if(det<deti[maxind]) {
465 deti[maxind]=det;
466 for(i=0; i<fNvar; i++) {
467 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
468 for(j=0; j<fNvar; j++) {
469 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
470
471 }
472 }
473 }
474
475 }
476 }
477 }
478
479 maxind=TMath::LocMax(nbest, deti);
480 if (deti[maxind]<kEps)
481 break;
482 }
483
484
485 for(i=0; i<nbest; i++) {
486 detibig[kgroup*nbest + i]=deti[i];
487
488 }
489
490 }
491
492 //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
493 //detibig stores nbest*nsub their determinants
494 //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
495
496 TMatrixD datmerged(sum, fNvar);
497 for(i=0; i<sum; i++) {
498 for (j=0; j<fNvar; j++)
499 datmerged(i,j)=fData[subdat[i]][j];
500 }
501 // printf("performing calculations for merged set\n");
502 Int_t hmerged=Int_t(sum*fH/fN);
503
504 Int_t nh;
505 for(k=0; k<nbestsub; k++) {
506 //for all best solutions perform 2 CSteps and then choose the very best
507 for(ii=0; ii<fNvar; ii++) {
508 fMean(ii)=mstockbig(k,ii);
509 for(jj=0; jj<fNvar; jj++)
510 fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
511 }
512 if(detibig[k]==0) {
513 for(i=0; i<fNvar; i++)
514 fHyperplane(i)=hyperplane(k,i);
515 CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
516
517 }
518 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
519 if (det<kEps) {
520 nh= Exact(ndist);
521 if (nh>=fH) {
522 fExact = nh;
523
524 delete [] detibig;
525 delete [] deti;
526 delete [] subdat;
527 delete [] ndist;
528 delete [] index;
529 return;
530 } else {
531 CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
532 }
533 }
534
535 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
536 if (det<kEps) {
537 nh=Exact(ndist);
538 if (nh>=fH) {
539 fExact = nh;
540 delete [] detibig;
541 delete [] deti;
542 delete [] subdat;
543 delete [] ndist;
544 delete [] index;
545 return;
546 }
547 }
548 detibig[k]=det;
549 for(i=0; i<fNvar; i++) {
550 mstockbig(k,i)=fMean(i);
551 for(j=0; j<fNvar; j++) {
552 cstockbig(i,k*fNvar+j)=fCovariance(i, j);
553 }
554 }
555 }
556 //now for the subset with the smallest determinant
557 //repeat CSteps until convergence
558 Int_t minind=TMath::LocMin(nbestsub, detibig);
559 det=detibig[minind];
560 for(i=0; i<fNvar; i++) {
561 fMean(i)=mstockbig(minind,i);
562 fHyperplane(i)=hyperplane(minind,i);
563 for(j=0; j<fNvar; j++)
564 fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
565 }
566 if(det<kEps)
567 CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
568 det=1;
569 while (det>kEps) {
570 det=CStep(fN, fH, index, fData, sscp, ndist);
571 if(TMath::Abs(det-detibig[minind])<kEps) {
572 break;
573 } else {
574 detibig[minind]=det;
575 }
576 }
577 if(det<kEps) {
578 Exact(ndist);
580 }
581 Int_t nout = RDist(sscp);
582 Double_t cutoff=kChiQuant[fNvar-1];
583
584 fOut.Set(nout);
585
586 j=0;
587 for (i=0; i<fN; i++) {
588 if(fRd(i)>cutoff) {
589 fOut[j]=i;
590 j++;
591 }
592 }
593
594 delete [] detibig;
595 delete [] deti;
596 delete [] subdat;
597 delete [] ndist;
598 delete [] index;
599 return;
600}
601
602////////////////////////////////////////////////////////////////////////////////
603///for the univariate case
604///estimates of location and scatter are returned in mean and sigma parameters
605///the algorithm works on the same principle as in multivariate case -
606///it finds a subset of size hh with smallest sigma, and then returns mean and
607///sigma of this subset
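///
///A minimal usage sketch (n and data stand for the caller's sample size and
///Double_t array; the values are illustrative):
///~~~{.cpp}
///   TRobustEstimator re;                    // default constructor: univariate case
///   Double_t mean, sigma;
///   re.EvaluateUni(n, data, mean, sigma, 0);   // hh=0 -> default subset size (n+2)/2
///~~~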
608
609void TRobustEstimator::EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh)
610{
611 if (hh==0)
612 hh=(nvectors+2)/2;
613 Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
614 Int_t *index=new Int_t[nvectors];
615 TMath::Sort(nvectors, data, index, kFALSE);
616
617 Int_t nquant;
618 nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
619 Double_t factor=faclts[nquant-1];
620
621 Double_t *aw=new Double_t[nvectors];
622 Double_t *aw2=new Double_t[nvectors];
623 Double_t sq=0;
624 Double_t sqmin=0;
625 Int_t ndup=0;
626 Int_t len=nvectors-hh;
627 Double_t *slutn=new Double_t[len];
628 for(Int_t i=0; i<len; i++)
629 slutn[i]=0;
630 for(Int_t jint=0; jint<len; jint++) {
631 aw[jint]=0;
632 for (Int_t j=0; j<hh; j++) {
633 aw[jint]+=data[index[j+jint]];
634 if(jint==0)
635 sq+=data[index[j]]*data[index[j]];
636 }
637 aw2[jint]=aw[jint]*aw[jint]/hh;
638
639 if(jint==0) {
640 sq=sq-aw2[jint];
641 sqmin=sq;
642 slutn[ndup]=aw[jint];
643
644 } else {
645 sq=sq - data[index[jint-1]]*data[index[jint-1]]+
646 data[index[jint+hh]]*data[index[jint+hh]]-
647 aw2[jint]+aw2[jint-1];
648 if(sq<sqmin) {
649 ndup=0;
650 sqmin=sq;
651 slutn[ndup]=aw[jint];
652
653 } else {
654 if(sq==sqmin) {
655 ndup++;
656 slutn[ndup]=aw[jint];
657 }
658 }
659 }
660 }
661
662 slutn[0]=slutn[Int_t((ndup)/2)]/hh;
663 Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
664 mean=slutn[0];
665 sigma=bstd;
666 delete [] aw;
667 delete [] aw2;
668 delete [] slutn;
669 delete [] index;
670}
671
672////////////////////////////////////////////////////////////////////////////////
673///returns the breakdown point of the algorithm
674
675Int_t TRobustEstimator::GetBDPoint()
676{
677 Int_t n;
678 n=(fN-fH+1)/fN;
679 return n;
680}
681
682////////////////////////////////////////////////////////////////////////////////
683///returns the chi2 quantiles
684
685Double_t TRobustEstimator::GetChiQuant(Int_t i) const
686{
687 if (i < 0 || i >= 50) return 0;
688 return kChiQuant[i];
689}
690
691////////////////////////////////////////////////////////////////////////////////
692///returns the covariance matrix
693
694void TRobustEstimator::GetCovariance(TMatrixDSym &matr)
695{
696 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
697 Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
698 matr.ResizeTo(fNvar, fNvar);
699 }
700 matr=fCovariance;
701}
702
703////////////////////////////////////////////////////////////////////////////////
704///returns the correlation matrix
705
706void TRobustEstimator::GetCorrelation(TMatrixDSym &matr)
707{
708 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
709 Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
710 matr.ResizeTo(fNvar, fNvar);
711 }
712 matr=fCorrelation;
713}
714
715////////////////////////////////////////////////////////////////////////////////
716///if the points are on a hyperplane, returns this hyperplane
717
718const TVectorD *TRobustEstimator::GetHyperplane() const
719{
720 if (fExact==0) {
721 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
722 return nullptr;
723 } else {
724 return &fHyperplane;
725 }
726}
727
728////////////////////////////////////////////////////////////////////////////////
729///if the points are on a hyperplane, returns this hyperplane
730
731void TRobustEstimator::GetHyperplane(TVectorD &vec)
732{
733 if (fExact==0){
734 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
735 return;
736 }
737 if (vec.GetNoElements()!=fNvar) {
738 Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
739 vec.ResizeTo(fNvar);
740 }
742}
743
744////////////////////////////////////////////////////////////////////////////////
745///return the estimate of the mean
746
747void TRobustEstimator::GetMean(TVectorD &means)
748{
749 if (means.GetNoElements()!=fNvar) {
750 Warning("GetMean","provided vector is of the wrong size, it will be resized");
751 means.ResizeTo(fNvar);
752 }
753 means=fMean;
754}
755
756////////////////////////////////////////////////////////////////////////////////
757///returns the robust distances (helps to find outliers)
758
759void TRobustEstimator::GetRDistances(TVectorD &rdist)
760{
761 if (rdist.GetNoElements()!=fN) {
762 Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
763 rdist.ResizeTo(fN);
764 }
765 rdist=fRd;
766}
767
768////////////////////////////////////////////////////////////////////////////////
769///returns the number of outliers
770
771Int_t TRobustEstimator::GetNOut()
772{
773 return fOut.GetSize();
774}
775
776////////////////////////////////////////////////////////////////////////////////
777///update the sscp matrix with vector vec
778
779void TRobustEstimator::AddToSscp(TMatrixD &sscp, TVectorD &vec)
780{
781 Int_t i, j;
782 for (j=1; j<fNvar+1; j++) {
783 sscp(0, j) +=vec(j-1);
784 sscp(j, 0) = sscp(0, j);
785 }
786 for (i=1; i<fNvar+1; i++) {
787 for (j=1; j<fNvar+1; j++) {
788 sscp(i, j) += vec(i-1)*vec(j-1);
789 }
790 }
791}
792
793////////////////////////////////////////////////////////////////////////////////
794///clear the sscp matrix, used for covariance and mean calculation
795
796void TRobustEstimator::ClearSscp(TMatrixD &sscp)
797{
798 for (Int_t i=0; i<fNvar+1; i++) {
799 for (Int_t j=0; j<fNvar+1; j++) {
800 sscp(i, j)=0;
801 }
802 }
803}
804
805////////////////////////////////////////////////////////////////////////////////
806///called when h=n. Returns classic covariance matrix
807///and mean
808
809void TRobustEstimator::Classic()
810{
811 TMatrixD sscp(fNvar+1, fNvar+1);
812 TVectorD temp(fNvar);
813 ClearSscp(sscp);
814 for (Int_t i=0; i<fN; i++) {
815 for (Int_t j=0; j<fNvar; j++)
816 temp(j)=fData(i, j);
817 AddToSscp(sscp, temp);
818 }
819 Covar(sscp, fMean, fCovariance, fSd, fN);
820 Correl();
821
822}
823
824////////////////////////////////////////////////////////////////////////////////
825///calculates mean and covariance
826
827void TRobustEstimator::Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
828{
829 Int_t i, j;
830 Double_t f;
831 for (i=0; i<fNvar; i++) {
832 m(i)=sscp(0, i+1);
833 sd[i]=sscp(i+1, i+1);
834 f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
835 if (f>1e-14) sd[i]=TMath::Sqrt(f);
836 else sd[i]=0;
837 m(i)/=nvec;
838 }
839 for (i=0; i<fNvar; i++) {
840 for (j=0; j<fNvar; j++) {
841 cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
842 cov(i, j)/=nvec-1;
843 }
844 }
845}
846
847////////////////////////////////////////////////////////////////////////////////
848///transforms covariance matrix into correlation matrix
849
850void TRobustEstimator::Correl()
851{
852 Int_t i, j;
853 Double_t *sd=new Double_t[fNvar];
854 for(j=0; j<fNvar; j++)
855 sd[j]=1./TMath::Sqrt(fCovariance(j, j));
856 for(i=0; i<fNvar; i++) {
857 for (j=0; j<fNvar; j++) {
858 if (i==j)
859 fCorrelation(i, j)=1.;
860 else
861 fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
862 }
863 }
864 delete [] sd;
865}
866
867////////////////////////////////////////////////////////////////////////////////
868///creates a subset of htotal elements from ntotal elements
869///first, p+1 elements are drawn randomly (without repetitions)
870///if their covariance matrix is singular, more elements are
871///added one by one, until their covariance matrix becomes regular
872///or it becomes clear that htotal observations lie on a hyperplane
873///If covariance matrix determinant!=0, distances of all ntotal elements
874///are calculated, using formula d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where
875///M is mean and S_inv is the inverse of the covariance matrix
876///htotal points with smallest distances are included in the returned subset.
877
878void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
879{
880 Double_t kEps = 1e-14;
881 Int_t i, j;
882 Bool_t repeat=kFALSE;
883 Int_t nindex=0;
884 Int_t num;
885 for(i=0; i<ntotal; i++)
886 index[i]=ntotal+1;
887
888 for (i=0; i<p+1; i++) {
889 num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
890 if (i>0){
891 for(j=0; j<=i-1; j++) {
892 if(index[j]==num)
893 repeat=kTRUE;
894 }
895 }
896 if(repeat==kTRUE) {
897 i--;
898 repeat=kFALSE;
899 } else {
900 index[i]=num;
901 nindex++;
902 }
903 }
904
905 ClearSscp(sscp);
906
907 TVectorD vec(fNvar);
908 Double_t det;
909 for (i=0; i<p+1; i++) {
910 for (j=0; j<fNvar; j++) {
911 vec[j]=data[index[i]][j];
912
913 }
914 AddToSscp(sscp, vec);
915 }
916
917 Covar(sscp, fMean, fCovariance, fSd, p+1);
918 det = fCovariance.Determinant();
919 while((det<kEps)&&(nindex < htotal)) {
920 //if covariance matrix is singular,another vector is added until
921 //the matrix becomes regular or it becomes clear that all
922 //vectors of the group lie on a hyperplane
923 repeat=kFALSE;
924 do{
925 num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
926 repeat=kFALSE;
927 for(i=0; i<nindex; i++) {
928 if(index[i]==num) {
929 repeat=kTRUE;
930 break;
931 }
932 }
933 }while(repeat==kTRUE);
934
935 index[nindex]=num;
936 nindex++;
937 //check if covariance matrix is singular
938 for(j=0; j<fNvar; j++)
939 vec[j]=data[index[nindex-1]][j];
940 AddToSscp(sscp, vec);
941 Covar(sscp, fMean, fCovariance, fSd, nindex);
942 det = fCovariance.Determinant();
943 }
944
945 if(nindex!=htotal) {
946 TDecompChol chol(fCovariance);
947 fInvcovariance = chol.Invert();
948
949 TVectorD temp(fNvar);
950 for(j=0; j<ntotal; j++) {
951 ndist[j]=0;
952 for(i=0; i<fNvar; i++)
953 temp[i]=data[j][i] - fMean(i);
954 temp*=fInvcovariance;
955 for(i=0; i<fNvar; i++)
956 ndist[j]+=(data[j][i]-fMean(i))*temp[i];
957 }
958 KOrdStat(ntotal, ndist, htotal-1,index);
959 }
960
961}
962
963////////////////////////////////////////////////////////////////////////////////
964///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
965///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
966///This function is called in the case when fewer than fH samples lie on a hyperplane.
967
968void TRobustEstimator::CreateOrtSubset(TMatrixD &dat, Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
969{
970 Int_t i, j;
971 TVectorD vec(fNvar);
972 for (i=0; i<nmerged; i++) {
973 ndist[i]=0;
974 for(j=0; j<fNvar; j++) {
975 ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
976 ndist[i]=TMath::Abs(ndist[i]);
977 }
978 }
979 KOrdStat(nmerged, ndist, hmerged-1, index);
980 ClearSscp(sscp);
981 for (i=0; i<hmerged; i++) {
982 for(j=0; j<fNvar; j++)
983 vec[j]=dat[index[i]][j];
984 AddToSscp(sscp, vec);
985 }
986 Covar(sscp, fMean, fCovariance, fSd, hmerged);
987}
988
989////////////////////////////////////////////////////////////////////////////////
990///from the input htotal-subset constructs another htotal subset with lower determinant
991///
992///As proven by Peter J.Rousseeuw and Katrien Van Driessen, if distances for all elements
993///are calculated, using the formula:d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
994///of the input htotal-subset, and S_inv - the inverse of its covariance matrix, then
995///htotal elements with smallest distances will have covariance matrix with determinant
996///less than or equal to the determinant of the input subset covariance matrix.
997///
998///determinant for this htotal-subset with smallest distances is returned
999
1000Double_t TRobustEstimator::CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
1001{
1002 Int_t i, j;
1004 Double_t det;
1005
1006 TDecompChol chol(fCovariance);
1007 fInvcovariance = chol.Invert();
1008
1009 TVectorD temp(fNvar);
1010 for(j=0; j<ntotal; j++) {
1011 ndist[j]=0;
1012 for(i=0; i<fNvar; i++)
1013 temp[i]=data[j][i]-fMean[i];
1014 temp*=fInvcovariance;
1015 for(i=0; i<fNvar; i++)
1016 ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1017 }
1018
1019 //taking h smallest
1020 KOrdStat(ntotal, ndist, htotal-1, index);
1021 //writing their mean and covariance
1022 ClearSscp(sscp);
1023 for (i=0; i<htotal; i++) {
1024 for (j=0; j<fNvar; j++)
1025 temp[j]=data[index[i]][j];
1026 AddToSscp(sscp, temp);
1027 }
1028 Covar(sscp, fMean, fCovariance, fSd, htotal);
1029 det = fCovariance.Determinant();
1030 return det;
1031}
1032
1033////////////////////////////////////////////////////////////////////////////////
1034///for the exact fit situations
1035///returns number of observations on the hyperplane
1036
1037Int_t TRobustEstimator::Exact(Double_t *ndist)
1038{
1039 Int_t i, j;
1040
1041 TMatrixDSymEigen eigen(fCovariance);
1042 TMatrixD eigenMatrix=eigen.GetEigenVectors();
1043
1044 for (j=0; j<fNvar; j++) {
1045 fHyperplane[j]=eigenMatrix(j,fNvar-1);
1046 }
1047 //calculate and return how many observations lie on the hyperplane
1048 for (i=0; i<fN; i++) {
1049 ndist[i]=0;
1050 for(j=0; j<fNvar; j++) {
1051 ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1052 ndist[i]=TMath::Abs(ndist[i]);
1053 }
1054 }
1055 Int_t nhyp=0;
1056
1057 for (i=0; i<fN; i++) {
1058 if(ndist[i] < 1e-14) nhyp++;
1059 }
1060 return nhyp;
1061
1062}
1063
1064////////////////////////////////////////////////////////////////////////////////
1065///This function is called if determinant of the covariance matrix of a subset=0.
1066///
1067///If there are more than fH vectors on a hyperplane,
1068///returns this hyperplane and stops
1069///else stores the hyperplane coordinates in hyperplane matrix
1070
1071Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1072 Double_t *deti, Int_t nbest, Int_t kgroup,
1073 TMatrixD &sscp, Double_t *ndist)
1074{
1075 Int_t i, j;
1076
1077 TVectorD vec(fNvar);
1078 Int_t maxind = TMath::LocMax(nbest, deti);
1079 Int_t nh=Exact(ndist);
1080 //now nh is the number of observation on the hyperplane
1081 //ndist stores distances of observation from this hyperplane
1082 if(nh>=fH) {
1083 ClearSscp(sscp);
1084 for (i=0; i<fN; i++) {
1085 if(ndist[i]<1e-14) {
1086 for (j=0; j<fNvar; j++)
1087 vec[j]=fData[i][j];
1088 AddToSscp(sscp, vec);
1089 }
1090 }
1091 Covar(sscp, fMean, fCovariance, fSd, nh);
1092
1093 fExact=nh;
1094 return nbest+1;
1095
1096 } else {
1097 //if less than fH observations lie on a hyperplane,
1098 //mean and covariance matrix are stored in mstockbig
1099 //and cstockbig in place of the previous maximum determinant
1100 //mean and covariance
1101 for(i=0; i<fNvar; i++) {
1102 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1103 hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1104 for(j=0; j<fNvar; j++) {
1105 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1106 }
1107 }
1108 return maxind;
1109 }
1110}
1111
1112
1113////////////////////////////////////////////////////////////////////////////////
1114///divides the elements into approximately equal subgroups
1115///number of elements in each subgroup is stored in indsubdat
1116///number of subgroups is returned
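///
///For example (purely illustrative): with nmini=300 and fN=1000, 1000 lies in
///[3*nmini, 4*nmini-1] and 1000%3==1, so nsub=3 with subgroup sizes 333, 334 and 333.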
1117
1118Int_t TRobustEstimator::Partition(Int_t nmini, Int_t *indsubdat)
1119{
1120 Int_t nsub;
1121 if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1122 if (fN%2==1){
1123 indsubdat[0]=Int_t(fN*0.5);
1124 indsubdat[1]=Int_t(fN*0.5)+1;
1125 } else
1126 indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1127 nsub=2;
1128 }
1129 else{
1130 if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1131 if(fN%3==0){
1132 indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1133 } else {
1134 indsubdat[0]=Int_t(fN/3);
1135 indsubdat[1]=Int_t(fN/3)+1;
1136 if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1137 else indsubdat[2]=Int_t(fN/3)+1;
1138 }
1139 nsub=3;
1140 }
1141 else{
1142 if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1143 if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1144 else {
1145 indsubdat[0]=Int_t(fN/4);
1146 indsubdat[1]=Int_t(fN/4)+1;
1147 if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1148 if(fN%4==2) {
1149 indsubdat[2]=Int_t(fN/4)+1;
1150 indsubdat[3]=Int_t(fN/4);
1151 }
1152 if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1153 }
1154 nsub=4;
1155 } else {
1156 for(Int_t i=0; i<5; i++)
1157 indsubdat[i]=nmini;
1158 nsub=5;
1159 }
1160 }
1161 }
1162 return nsub;
1163}
1164
1165////////////////////////////////////////////////////////////////////////////////
1166///Calculates robust distances. Then the samples with robust distances
1167///greater than a cutoff value (the 0.975 quantile of the chi2 distribution with
1168///fNvar degrees of freedom, multiplied by a correction factor) are given
1169///weight=0, and new, reweighted estimates of location and scatter are calculated.
1170///The function returns the number of outliers.
1171
1172Int_t TRobustEstimator::RDist(TMatrixD &sscp)
1173{
1174 Int_t i, j;
1175 Int_t nout=0;
1176
1177 TVectorD temp(fNvar);
1178 TDecompChol chol(fCovariance);
1179 fInvcovariance = chol.Invert();
1180
1181
1182 for (i=0; i<fN; i++) {
1183 fRd[i]=0;
1184 for(j=0; j<fNvar; j++) {
1185 temp[j]=fData[i][j]-fMean[j];
1186 }
1187 temp*=fInvcovariance;
1188 for(j=0; j<fNvar; j++) {
1189 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1190 }
1191 }
1192
1193 Double_t med;
1194 Double_t chi = kChiMedian[fNvar-1];
1195
1196 med=TMath::Median(fN, fRd.GetMatrixArray());
1197 med/=chi;
1198 fCovariance*=med;
1199 TDecompChol chol2(fCovariance);
1200 fInvcovariance = chol2.Invert();
1201
1202 for (i=0; i<fN; i++) {
1203 fRd[i]=0;
1204 for(j=0; j<fNvar; j++) {
1205 temp[j]=fData[i][j]-fMean[j];
1206 }
1207
1208 temp*=fInvcovariance;
1209 for(j=0; j<fNvar; j++) {
1210 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1211 }
1212 }
1213
1214 Double_t cutoff = kChiQuant[fNvar-1];
1215
1216 ClearSscp(sscp);
1217 for(i=0; i<fN; i++) {
1218 if (fRd[i]<=cutoff) {
1219 for(j=0; j<fNvar; j++)
1220 temp[j]=fData[i][j];
1221 AddToSscp(sscp,temp);
1222 } else {
1223 nout++;
1224 }
1225 }
1226
1227 Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1228 return nout;
1229}
1230
1231////////////////////////////////////////////////////////////////////////////////
1232///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1233///such that the selected case numbers are uniformly distributed from 1 to n
1234
1235void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1236{
1237 Int_t jndex = 0;
1238 Int_t nrand;
1239 Int_t i, k, m, j;
1240 for (k=1; k<=ngroup; k++) {
1241 for (m=1; m<=indsubdat[k-1]; m++) {
1242 nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1243 //printf("nrand = %d - jndex %d\n",nrand,jndex);
1244 jndex++;
1245 if (jndex==1) {
1246 subdat[0]=nrand-1; // in case nrand is equal to fN
1247 } else {
1248 subdat[jndex-1]=nrand+jndex-2;
1249 for (i=1; i<=jndex-1; i++) {
1250 if(subdat[i-1] > nrand+i-2) {
1251 for(j=jndex; j>=i+1; j--) {
1252 subdat[j-1]=subdat[j-2];
1253 }
1254 //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1255 subdat[i-1]=nrand+i-2;
1256 break; //breaking the loop for(i=1...
1257 }
1258 }
1259 }
1260 }
1261 }
1262}
1263
1264////////////////////////////////////////////////////////////////////////////////
1265///same as TMath::KOrdStat, reimplemented here because an Int_t work (index) array is needed
1266
1267Double_t TRobustEstimator::KOrdStat(Int_t ntotal, Double_t *a, Int_t k, Int_t *work) {
1268 Bool_t isAllocated = kFALSE;
1269 const Int_t kWorkMax=100;
1270 Int_t i, ir, j, l, mid;
1271 Int_t arr;
1272 Int_t *ind;
1273 Int_t workLocal[kWorkMax];
1274 Int_t temp;
1275
1276
1277 if (work) {
1278 ind = work;
1279 } else {
1280 ind = workLocal;
1281 if (ntotal > kWorkMax) {
1282 isAllocated = kTRUE;
1283 ind = new Int_t[ntotal];
1284 }
1285 }
1286
1287 for (Int_t ii=0; ii<ntotal; ii++) {
1288 ind[ii]=ii;
1289 }
1290 Int_t rk = k;
1291 l=0;
1292 ir = ntotal-1;
1293 for(;;) {
1294 if (ir<=l+1) { //active partition contains 1 or 2 elements
1295 if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1296 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1297 Double_t tmp = a[ind[rk]];
1298 if (isAllocated)
1299 delete [] ind;
1300 return tmp;
1301 } else {
1302 mid = (l+ir) >> 1; //choose median of left, center and right
1303 {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1304 if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1305 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1306
1307 if (a[ind[l+1]]>a[ind[ir]])
1308 {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1309
1310 if (a[ind[l]]>a[ind[l+1]])
1311 {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1312
1313 i=l+1; //initialize pointers for partitioning
1314 j=ir;
1315 arr = ind[l+1];
1316 for (;;) {
1317 do i++; while (a[ind[i]]<a[arr]);
1318 do j--; while (a[ind[j]]>a[arr]);
1319 if (j<i) break; //pointers crossed, partitioning complete
1320 {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1321 }
1322 ind[l+1]=ind[j];
1323 ind[j]=arr;
1324 if (j>=rk) ir = j-1; //keep active the partition that
1325 if (j<=rk) l=i; //contains the k_th element
1326 }
1327 }
1328}