ROOT  6.07/01
Reference Guide
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Groups Pages
TRobustEstimator.cxx
Go to the documentation of this file.
1 // @(#)root/physics:$Id$
2 // Author: Anna Kreshuk 08/10/2004
3 
4 /*************************************************************************
5  * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6  * All rights reserved. *
7  * *
8  * For the licensing terms see $ROOTSYS/LICENSE. *
9  * For the list of contributors see $ROOTSYS/README/CREDITS. *
10  *************************************************************************/
11 
12 //////////////////////////////////////////////////////////////////////////////
13 //
14 // TRobustEstimator
15 //
16 // Minimum Covariance Determinant Estimator - a Fast Algorithm
17 // invented by Peter J.Rousseeuw and Katrien Van Dreissen
18 // "A Fast Algorithm for the Minimum covariance Determinant Estimator"
19 // Technometrics, August 1999, Vol.41, NO.3
20 //
21 // What are robust estimators?
22 // "An important property of an estimator is its robustness. An estimator
23 // is called robust if it is insensitive to measurements that deviate
24 // from the expected behaviour. There are 2 ways to treat such deviating
25 // measurements: one may either try to recongize them and then remove
26 // them from the data sample; or one may leave them in the sample, taking
27 // care that they do not influence the estimate unduly. In both cases robust
28 // estimators are needed...Robust procedures compensate for systematic errors
29 // as much as possible, and indicate any situation in which a danger of not being
30 // able to operate reliably is detected."
31 // R.Fruhwirth, M.Regler, R.K.Bock, H.Grote, D.Notz
32 // "Data Analysis Techniques for High-Energy Physics", 2nd edition
33 //
34 // What does this algorithm do?
35 // It computes a highly robust estimator of multivariate location and scatter.
36 // Then, it takes those estimates to compute robust distances of all the
37 // data vectors. Those with large robust distances are considered outliers.
38 // Robust distances can then be plotted for better visualization of the data.
39 //
40 // How does this algorithm do it?
41 // The MCD objective is to find h observations(out of n) whose classical
42 // covariance matrix has the lowest determinant. The MCD estimator of location
43 // is then the average of those h points and the MCD estimate of scatter
44 // is their covariance matrix. The minimum(and default) h = (n+nvariables+1)/2
45 // so the algorithm is effective when less than (n+nvar+1)/2 variables are outliers.
46 // The algorithm also allows for exact fit situations - that is, when h or more
47 // observations lie on a hyperplane. Then the algorithm still yields the MCD location T
48 // and scatter matrix S, the latter being singular as it should be. From (T,S) the
49 // program then computes the equation of the hyperplane.
50 //
51 // How can this algorithm be used?
52 // In any case, when contamination of data is suspected, that might influence
53 // the classical estimates.
54 // Also, robust estimation of location and scatter is a tool to robustify
55 // other multivariate techniques such as, for example, principal-component analysis
56 // and discriminant analysis.
57 //
58 //
59 //
60 //
61 // Technical details of the algorithm:
62 // 0.The default h = (n+nvariables+1)/2, but the user may choose any interger h with
63 // (n+nvariables+1)/2<=h<=n. The program then reports the MCD's breakdown value
64 // (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination
65 // which is usually the case, a good compromise between breakdown value and
66 // efficiency is obtained by putting h=[.75*n].
67 // 1.If h=n,the MCD location estimate is the average of the whole dataset, and
68 // the MCD scatter estimate is its covariance matrix. Report this and stop
69 // 2.If nvariables=1 (univariate data), compute the MCD estimate by the exact
70 // algorithm of Rousseeuw and Leroy (1987, pp.171-172) in O(nlogn)time and stop
71 // 3.From here on, h<n and nvariables>=2.
72 // 3a.If n is small:
73 // - repeat (say) 500 times:
74 // -- construct an initial h-subset, starting from a random (nvar+1)-subset
75 // -- carry out 2 C-steps (described in the comments of CStep function)
76 // - for the 10 results with lowest det(S):
77 // -- carry out C-steps until convergence
78 // - report the solution (T, S) with the lowest det(S)
79 // 3b.If n is larger (say, n>600), then
80 // - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
81 // - inside each subset repeat 500/5 times:
82 // -- construct an initial subset of size hsub=[nsub*h/n]
83 // -- carry out 2 C-steps
84 // -- keep the best 10 results (Tsub, Ssub)
85 // - pool the subsets, yielding the merged set (say, of size nmerged=1500)
86 // - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub)
87 // -- carry out 2 C-steps
88 // -- keep the 10 best results
89 // - in the full dataset, repeat for those best results:
90 // -- take several C-steps, using n and h
91 // -- report the best final result (T, S)
92 // 4.To obtain consistency when the data comes from a multivariate normal
93 // distribution, covariance matrix is multiplied by a correction factor
94 // 5.Robust distances for all elements, using the final (T, S) are calculated
95 // Then the very final mean and covariance estimates are calculated only for
96 // values, whose robust distances are less than a cutoff value (0.975 quantile
97 // of chi2 distribution with nvariables degrees of freedom)
98 //
99 //////////////////////////////////////////////////////////////////////////////
100 
101 #include "TRobustEstimator.h"
102 #include "TRandom.h"
103 #include "TMath.h"
104 #include "TH1D.h"
105 #include "TPaveLabel.h"
106 #include "TDecompChol.h"
107 
109 
110 const Double_t kChiMedian[50]= {
111  0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
112  9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
113  20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
114  31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
115  41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
116 
117 const Double_t kChiQuant[50]={
118  5.02389, 7.3776,9.34840,11.1433,12.8325,
119  14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
120  24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
121  35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
122  45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
123  55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
124  65.410,66.617,67.821,69.022,70.222,71.420};
125 
126 ////////////////////////////////////////////////////////////////////////////////
127 ///this constructor should be used in a univariate case:
128 ///first call this constructor, then - the EvaluateUni(..) function
129 
131 }
132 
133 ////////////////////////////////////////////////////////////////////////////////
134 ///constructor
135 
137  :fMean(nvariables),
138  fCovariance(nvariables),
139  fInvcovariance(nvariables),
140  fCorrelation(nvariables),
141  fRd(nvectors),
142  fSd(nvectors),
143  fOut(1),
144  fHyperplane(nvariables),
145  fData(nvectors, nvariables)
146 {
147  if ((nvectors<=1)||(nvariables<=0)){
148  Error("TRobustEstimator","Not enough vectors or variables");
149  return;
150  }
151  if (nvariables==1){
152  Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
153  return;
154  }
155 
156  fN=nvectors;
157  fNvar=nvariables;
158  if (hh<(fN+fNvar+1)/2){
159  if (hh>0)
160  Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
161  fH=(fN+fNvar+1)/2;
162  } else
163  fH=hh;
164 
165  fVarTemp=0;
166  fVecTemp=0;
167  fExact=0;
168 }
169 
170 ////////////////////////////////////////////////////////////////////////////////
171 ///adds a column to the data matrix
172 ///it is assumed that the column has size fN
173 ///variable fVarTemp keeps the number of columns l
174 ///already added
175 
177 {
178  if (fVarTemp==fNvar) {
179  fNvar++;
186  }
187  for (Int_t i=0; i<fN; i++) {
188  fData(i, fVarTemp)=col[i];
189  }
190  fVarTemp++;
191 }
192 
193 ////////////////////////////////////////////////////////////////////////////////
194 ///adds a vector to the data matrix
195 ///it is supposed that the vector is of size fNvar
196 
198 {
199  if(fVecTemp==fN) {
200  fN++;
201  fRd.ResizeTo(fN);
202  fSd.ResizeTo(fN);
204  }
205  for (Int_t i=0; i<fNvar; i++)
206  fData(fVecTemp, i)=row[i];
207 
208  fVecTemp++;
209 }
210 
211 ////////////////////////////////////////////////////////////////////////////////
212 ///Finds the estimate of multivariate mean and variance
213 
215 {
216  Double_t kEps=1e-14;
217 
218  if (fH==fN){
219  Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
220  Classic();
221  return;
222  }
223 
224  Int_t i, j, k;
225  Int_t ii, jj;
226  Int_t nmini = 300;
227  Int_t k1=500;
228  Int_t nbest=10;
229  TMatrixD sscp(fNvar+1, fNvar+1);
230  TVectorD vec(fNvar);
231 
232  Int_t *index = new Int_t[fN];
233  Double_t *ndist = new Double_t[fN];
234  Double_t det;
235  Double_t *deti=new Double_t[nbest];
236  for (i=0; i<nbest; i++)
237  deti[i]=1e16;
238 
239  for (i=0; i<fN; i++)
240  fRd(i)=0;
241  ////////////////////////////
242  //for small n
243  ////////////////////////////
244  if (fN<nmini*2) {
245  //for storing the best fMeans and covariances
246 
247  TMatrixD mstock(nbest, fNvar);
248  TMatrixD cstock(fNvar, fNvar*nbest);
249 
250  for (k=0; k<k1; k++) {
251  CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
252  //calculate the mean and covariance of the created subset
253  ClearSscp(sscp);
254  for (i=0; i<fH; i++) {
255  for(j=0; j<fNvar; j++)
256  vec(j)=fData[index[i]][j];
257  AddToSscp(sscp, vec);
258  }
259  Covar(sscp, fMean, fCovariance, fSd, fH);
260  det = fCovariance.Determinant();
261  if (det < kEps) {
262  fExact = Exact(ndist);
263  delete [] index;
264  delete [] ndist;
265  delete [] deti;
266  return;
267  }
268  //make 2 CSteps
269  det = CStep(fN, fH, index, fData, sscp, ndist);
270  if (det < kEps) {
271  fExact = Exact(ndist);
272  delete [] index;
273  delete [] ndist;
274  delete [] deti;
275  return;
276  }
277  det = CStep(fN, fH, index, fData, sscp, ndist);
278  if (det < kEps) {
279  fExact = Exact(ndist);
280  delete [] index;
281  delete [] ndist;
282  delete [] deti;
283  return;
284  } else {
285  Int_t maxind=TMath::LocMax(nbest, deti);
286  if(det<deti[maxind]) {
287  deti[maxind]=det;
288  for(ii=0; ii<fNvar; ii++) {
289  mstock(maxind, ii)=fMean(ii);
290  for(jj=0; jj<fNvar; jj++)
291  cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
292  }
293  }
294  }
295  }
296 
297  //now for nbest best results perform CSteps until convergence
298 
299  for (i=0; i<nbest; i++) {
300  for(ii=0; ii<fNvar; ii++) {
301  fMean(ii)=mstock(i, ii);
302  for (jj=0; jj<fNvar; jj++)
303  fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
304  }
305 
306  det=1;
307  while (det>kEps) {
308  det=CStep(fN, fH, index, fData, sscp, ndist);
309  if(TMath::Abs(det-deti[i])<kEps)
310  break;
311  else
312  deti[i]=det;
313  }
314  for(ii=0; ii<fNvar; ii++) {
315  mstock(i,ii)=fMean(ii);
316  for (jj=0; jj<fNvar; jj++)
317  cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
318  }
319  }
320 
321  Int_t detind=TMath::LocMin(nbest, deti);
322  for(ii=0; ii<fNvar; ii++) {
323  fMean(ii)=mstock(detind,ii);
324 
325  for(jj=0; jj<fNvar; jj++)
326  fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
327  }
328 
329  if (deti[detind]!=0) {
330  //calculate robust distances and throw out the bad points
331  Int_t nout = RDist(sscp);
332  Double_t cutoff=kChiQuant[fNvar-1];
333 
334  fOut.Set(nout);
335 
336  j=0;
337  for (i=0; i<fN; i++) {
338  if(fRd(i)>cutoff) {
339  fOut[j]=i;
340  j++;
341  }
342  }
343 
344  } else {
345  fExact=Exact(ndist);
346  }
347  delete [] index;
348  delete [] ndist;
349  delete [] deti;
350  return;
351 
352  }
353  /////////////////////////////////////////////////
354  //if n>nmini, the dataset should be partitioned
355  //partitioning
356  ////////////////////////////////////////////////
357  Int_t indsubdat[5];
358  Int_t nsub;
359  for (ii=0; ii<5; ii++)
360  indsubdat[ii]=0;
361 
362  nsub = Partition(nmini, indsubdat);
363 
364  Int_t sum=0;
365  for (ii=0; ii<5; ii++)
366  sum+=indsubdat[ii];
367  Int_t *subdat=new Int_t[sum];
368  //printf("allocates subdat[ %d ]\n", sum);
369  // init the subdat matrix
370  for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
371  RDraw(subdat, nsub, indsubdat);
372  for (int iii = 0; iii < sum; ++iii) {
373  if (subdat[iii] < 0 || subdat[iii] >= fN ) {
374  Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
375  R__ASSERT(0);
376  }
377  }
378  //now the indexes of selected cases are in the array subdat
379  //matrices to store best means and covariances
380  Int_t nbestsub=nbest*nsub;
381  TMatrixD mstockbig(nbestsub, fNvar);
382  TMatrixD cstockbig(fNvar, fNvar*nbestsub);
383  TMatrixD hyperplane(nbestsub, fNvar);
384  for (i=0; i<nbestsub; i++) {
385  for(j=0; j<fNvar; j++)
386  hyperplane(i,j)=0;
387  }
388  Double_t *detibig = new Double_t[nbestsub];
389  Int_t maxind;
390  maxind=TMath::LocMax(5, indsubdat);
391  TMatrixD dattemp(indsubdat[maxind], fNvar);
392 
393  Int_t k2=Int_t(k1/nsub);
394  //construct h-subsets and perform 2 CSteps in subgroups
395 
396  for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
397  //printf("group #%d\n", kgroup);
398  Int_t ntemp=indsubdat[kgroup];
399  Int_t temp=0;
400  for (i=0; i<kgroup; i++)
401  temp+=indsubdat[i];
402  Int_t par;
403 
404 
405  for(i=0; i<ntemp; i++) {
406  for (j=0; j<fNvar; j++) {
407  dattemp(i,j)=fData[subdat[temp+i]][j];
408  }
409  }
410  Int_t htemp=Int_t(fH*ntemp/fN);
411 
412  for (i=0; i<nbest; i++)
413  deti[i]=1e16;
414 
415  for(k=0; k<k2; k++) {
416  CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
417  ClearSscp(sscp);
418  for (i=0; i<htemp; i++) {
419  for(j=0; j<fNvar; j++) {
420  vec(j)=dattemp(index[i],j);
421  }
422  AddToSscp(sscp, vec);
423  }
424  Covar(sscp, fMean, fCovariance, fSd, htemp);
425  det = fCovariance.Determinant();
426  if (det<kEps) {
427  par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
428  if(par==nbest+1) {
429 
430  delete [] detibig;
431  delete [] deti;
432  delete [] subdat;
433  delete [] ndist;
434  delete [] index;
435  return;
436  } else
437  deti[par]=det;
438  } else {
439  det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
440  if (det<kEps) {
441  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
442  if(par==nbest+1) {
443 
444  delete [] detibig;
445  delete [] deti;
446  delete [] subdat;
447  delete [] ndist;
448  delete [] index;
449  return;
450  } else
451  deti[par]=det;
452  } else {
453  det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
454  if(det<kEps){
455  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
456  if(par==nbest+1) {
457 
458  delete [] detibig;
459  delete [] deti;
460  delete [] subdat;
461  delete [] ndist;
462  delete [] index;
463  return;
464  } else {
465  deti[par]=det;
466  }
467  } else {
468  maxind=TMath::LocMax(nbest, deti);
469  if(det<deti[maxind]) {
470  deti[maxind]=det;
471  for(i=0; i<fNvar; i++) {
472  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
473  for(j=0; j<fNvar; j++) {
474  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
475 
476  }
477  }
478  }
479 
480  }
481  }
482  }
483 
484  maxind=TMath::LocMax(nbest, deti);
485  if (deti[maxind]<kEps)
486  break;
487  }
488 
489 
490  for(i=0; i<nbest; i++) {
491  detibig[kgroup*nbest + i]=deti[i];
492 
493  }
494 
495  }
496 
497  //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
498  //detibig stores nbest*nsub their determinants
499  //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
500 
501  TMatrixD datmerged(sum, fNvar);
502  for(i=0; i<sum; i++) {
503  for (j=0; j<fNvar; j++)
504  datmerged(i,j)=fData[subdat[i]][j];
505  }
506  // printf("performing calculations for merged set\n");
507  Int_t hmerged=Int_t(sum*fH/fN);
508 
509  Int_t nh;
510  for(k=0; k<nbestsub; k++) {
511  //for all best solutions perform 2 CSteps and then choose the very best
512  for(ii=0; ii<fNvar; ii++) {
513  fMean(ii)=mstockbig(k,ii);
514  for(jj=0; jj<fNvar; jj++)
515  fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
516  }
517  if(detibig[k]==0) {
518  for(i=0; i<fNvar; i++)
519  fHyperplane(i)=hyperplane(k,i);
520  CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
521 
522  }
523  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
524  if (det<kEps) {
525  nh= Exact(ndist);
526  if (nh>=fH) {
527  fExact = nh;
528 
529  delete [] detibig;
530  delete [] deti;
531  delete [] subdat;
532  delete [] ndist;
533  delete [] index;
534  return;
535  } else {
536  CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
537  }
538  }
539 
540  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
541  if (det<kEps) {
542  nh=Exact(ndist);
543  if (nh>=fH) {
544  fExact = nh;
545  delete [] detibig;
546  delete [] deti;
547  delete [] subdat;
548  delete [] ndist;
549  delete [] index;
550  return;
551  }
552  }
553  detibig[k]=det;
554  for(i=0; i<fNvar; i++) {
555  mstockbig(k,i)=fMean(i);
556  for(j=0; j<fNvar; j++) {
557  cstockbig(i,k*fNvar+j)=fCovariance(i, j);
558  }
559  }
560  }
561  //now for the subset with the smallest determinant
562  //repeat CSteps until convergence
563  Int_t minind=TMath::LocMin(nbestsub, detibig);
564  det=detibig[minind];
565  for(i=0; i<fNvar; i++) {
566  fMean(i)=mstockbig(minind,i);
567  fHyperplane(i)=hyperplane(minind,i);
568  for(j=0; j<fNvar; j++)
569  fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
570  }
571  if(det<kEps)
572  CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
573  det=1;
574  while (det>kEps) {
575  det=CStep(fN, fH, index, fData, sscp, ndist);
576  if(TMath::Abs(det-detibig[minind])<kEps) {
577  break;
578  } else {
579  detibig[minind]=det;
580  }
581  }
582  if(det<kEps) {
583  Exact(ndist);
584  fExact=kTRUE;
585  }
586  Int_t nout = RDist(sscp);
587  Double_t cutoff=kChiQuant[fNvar-1];
588 
589  fOut.Set(nout);
590 
591  j=0;
592  for (i=0; i<fN; i++) {
593  if(fRd(i)>cutoff) {
594  fOut[j]=i;
595  j++;
596  }
597  }
598 
599  delete [] detibig;
600  delete [] deti;
601  delete [] subdat;
602  delete [] ndist;
603  delete [] index;
604  return;
605 }
606 
607 ////////////////////////////////////////////////////////////////////////////////
608 ///for the univariate case
609 ///estimates of location and scatter are returned in mean and sigma parameters
610 ///the algorithm works on the same principle as in multivariate case -
611 ///it finds a subset of size hh with smallest sigma, and then returns mean and
612 ///sigma of this subset
613 
615 {
616  if (hh==0)
617  hh=(nvectors+2)/2;
618  Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
619  Int_t *index=new Int_t[nvectors];
620  TMath::Sort(nvectors, data, index, kFALSE);
621 
622  Int_t nquant;
623  nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
624  Double_t factor=faclts[nquant-1];
625 
626  Double_t *aw=new Double_t[nvectors];
627  Double_t *aw2=new Double_t[nvectors];
628  Double_t sq=0;
629  Double_t sqmin=0;
630  Int_t ndup=0;
631  Int_t len=nvectors-hh;
632  Double_t *slutn=new Double_t[len];
633  for(Int_t i=0; i<len; i++)
634  slutn[i]=0;
635  for(Int_t jint=0; jint<len; jint++) {
636  aw[jint]=0;
637  for (Int_t j=0; j<hh; j++) {
638  aw[jint]+=data[index[j+jint]];
639  if(jint==0)
640  sq+=data[index[j]]*data[index[j]];
641  }
642  aw2[jint]=aw[jint]*aw[jint]/hh;
643 
644  if(jint==0) {
645  sq=sq-aw2[jint];
646  sqmin=sq;
647  slutn[ndup]=aw[jint];
648 
649  } else {
650  sq=sq - data[index[jint-1]]*data[index[jint-1]]+
651  data[index[jint+hh]]*data[index[jint+hh]]-
652  aw2[jint]+aw2[jint-1];
653  if(sq<sqmin) {
654  ndup=0;
655  sqmin=sq;
656  slutn[ndup]=aw[jint];
657 
658  } else {
659  if(sq==sqmin) {
660  ndup++;
661  slutn[ndup]=aw[jint];
662  }
663  }
664  }
665  }
666 
667  slutn[0]=slutn[Int_t((ndup)/2)]/hh;
668  Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
669  mean=slutn[0];
670  sigma=bstd;
671  delete [] aw;
672  delete [] aw2;
673  delete [] slutn;
674  delete [] index;
675 }
676 
677 ////////////////////////////////////////////////////////////////////////////////
678 ///returns the breakdown point of the algorithm
679 
681 {
682  Int_t n;
683  n=(fN-fH+1)/fN;
684  return n;
685 }
686 
687 ////////////////////////////////////////////////////////////////////////////////
688 ///returns the chi2 quantiles
689 
691 {
692  if (i < 0 || i >= 50) return 0;
693  return kChiQuant[i];
694 }
695 
696 ////////////////////////////////////////////////////////////////////////////////
697 ///returns the covariance matrix
698 
700 {
701  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
702  Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
703  matr.ResizeTo(fNvar, fNvar);
704  }
705  matr=fCovariance;
706 }
707 
708 ////////////////////////////////////////////////////////////////////////////////
709 ///returns the correlation matrix
710 
712 {
713  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
714  Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
715  matr.ResizeTo(fNvar, fNvar);
716  }
717  matr=fCorrelation;
718 }
719 
720 ////////////////////////////////////////////////////////////////////////////////
721 ///if the points are on a hyperplane, returns this hyperplane
722 
724 {
725  if (fExact==0) {
726  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
727  return 0;
728  } else {
729  return &fHyperplane;
730  }
731 }
732 
733 ////////////////////////////////////////////////////////////////////////////////
734 ///if the points are on a hyperplane, returns this hyperplane
735 
737 {
738  if (fExact==0){
739  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
740  return;
741  }
742  if (vec.GetNoElements()!=fNvar) {
743  Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
744  vec.ResizeTo(fNvar);
745  }
746  vec=fHyperplane;
747 }
748 
749 ////////////////////////////////////////////////////////////////////////////////
750 ///return the estimate of the mean
751 
753 {
754  if (means.GetNoElements()!=fNvar) {
755  Warning("GetMean","provided vector is of the wrong size, it will be resized");
756  means.ResizeTo(fNvar);
757  }
758  means=fMean;
759 }
760 
761 ////////////////////////////////////////////////////////////////////////////////
762 ///returns the robust distances (helps to find outliers)
763 
765 {
766  if (rdist.GetNoElements()!=fN) {
767  Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
768  rdist.ResizeTo(fN);
769  }
770  rdist=fRd;
771 }
772 
773 ////////////////////////////////////////////////////////////////////////////////
774 ///returns the number of outliers
775 
777 {
778  return fOut.GetSize();
779 }
780 
781 ////////////////////////////////////////////////////////////////////////////////
782 ///update the sscp matrix with vector vec
783 
785 {
786  Int_t i, j;
787  for (j=1; j<fNvar+1; j++) {
788  sscp(0, j) +=vec(j-1);
789  sscp(j, 0) = sscp(0, j);
790  }
791  for (i=1; i<fNvar+1; i++) {
792  for (j=1; j<fNvar+1; j++) {
793  sscp(i, j) += vec(i-1)*vec(j-1);
794  }
795  }
796 }
797 
798 ////////////////////////////////////////////////////////////////////////////////
799 ///clear the sscp matrix, used for covariance and mean calculation
800 
802 {
803  for (Int_t i=0; i<fNvar+1; i++) {
804  for (Int_t j=0; j<fNvar+1; j++) {
805  sscp(i, j)=0;
806  }
807  }
808 }
809 
810 ////////////////////////////////////////////////////////////////////////////////
811 ///called when h=n. Returns classic covariance matrix
812 ///and mean
813 
815 {
816  TMatrixD sscp(fNvar+1, fNvar+1);
817  TVectorD temp(fNvar);
818  ClearSscp(sscp);
819  for (Int_t i=0; i<fN; i++) {
820  for (Int_t j=0; j<fNvar; j++)
821  temp(j)=fData(i, j);
822  AddToSscp(sscp, temp);
823  }
824  Covar(sscp, fMean, fCovariance, fSd, fN);
825  Correl();
826 
827 }
828 
829 ////////////////////////////////////////////////////////////////////////////////
830 ///calculates mean and covariance
831 
833 {
834  Int_t i, j;
835  Double_t f;
836  for (i=0; i<fNvar; i++) {
837  m(i)=sscp(0, i+1);
838  sd[i]=sscp(i+1, i+1);
839  f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
840  if (f>1e-14) sd[i]=TMath::Sqrt(f);
841  else sd[i]=0;
842  m(i)/=nvec;
843  }
844  for (i=0; i<fNvar; i++) {
845  for (j=0; j<fNvar; j++) {
846  cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
847  cov(i, j)/=nvec-1;
848  }
849  }
850 }
851 
852 ////////////////////////////////////////////////////////////////////////////////
853 ///transforms covariance matrix into correlation matrix
854 
856 {
857  Int_t i, j;
858  Double_t *sd=new Double_t[fNvar];
859  for(j=0; j<fNvar; j++)
860  sd[j]=1./TMath::Sqrt(fCovariance(j, j));
861  for(i=0; i<fNvar; i++) {
862  for (j=0; j<fNvar; j++) {
863  if (i==j)
864  fCorrelation(i, j)=1.;
865  else
866  fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
867  }
868  }
869  delete [] sd;
870 }
871 
872 ////////////////////////////////////////////////////////////////////////////////
873 ///creates a subset of htotal elements from ntotal elements
874 ///first, p+1 elements are drawn randomly(without repetitions)
875 ///if their covariance matrix is singular, more elements are
876 ///added one by one, until their covariance matrix becomes regular
877 ///or it becomes clear that htotal observations lie on a hyperplane
878 ///If covariance matrix determinant!=0, distances of all ntotal elements
879 ///are calculated, using formula d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where
880 ///M is mean and S_inv is the inverse of the covariance matrix
881 ///htotal points with smallest distances are included in the returned subset.
882 
883 void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
884 {
885  Double_t kEps = 1e-14;
886  Int_t i, j;
887  Bool_t repeat=kFALSE;
888  Int_t nindex=0;
889  Int_t num;
890  for(i=0; i<ntotal; i++)
891  index[i]=ntotal+1;
892 
893  for (i=0; i<p+1; i++) {
894  num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
895  if (i>0){
896  for(j=0; j<=i-1; j++) {
897  if(index[j]==num)
898  repeat=kTRUE;
899  }
900  }
901  if(repeat==kTRUE) {
902  i--;
903  repeat=kFALSE;
904  } else {
905  index[i]=num;
906  nindex++;
907  }
908  }
909 
910  ClearSscp(sscp);
911 
912  TVectorD vec(fNvar);
913  Double_t det;
914  for (i=0; i<p+1; i++) {
915  for (j=0; j<fNvar; j++) {
916  vec[j]=data[index[i]][j];
917 
918  }
919  AddToSscp(sscp, vec);
920  }
921 
922  Covar(sscp, fMean, fCovariance, fSd, p+1);
923  det=fCovariance.Determinant();
924  while((det<kEps)&&(nindex < htotal)) {
925  //if covariance matrix is singular,another vector is added until
926  //the matrix becomes regular or it becomes clear that all
927  //vectors of the group lie on a hyperplane
928  repeat=kFALSE;
929  do{
930  num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
931  repeat=kFALSE;
932  for(i=0; i<nindex; i++) {
933  if(index[i]==num) {
934  repeat=kTRUE;
935  break;
936  }
937  }
938  }while(repeat==kTRUE);
939 
940  index[nindex]=num;
941  nindex++;
942  //check if covariance matrix is singular
943  for(j=0; j<fNvar; j++)
944  vec[j]=data[index[nindex-1]][j];
945  AddToSscp(sscp, vec);
946  Covar(sscp, fMean, fCovariance, fSd, nindex);
947  det=fCovariance.Determinant();
948  }
949 
950  if(nindex!=htotal) {
951  TDecompChol chol(fCovariance);
952  fInvcovariance = chol.Invert();
953 
954  TVectorD temp(fNvar);
955  for(j=0; j<ntotal; j++) {
956  ndist[j]=0;
957  for(i=0; i<fNvar; i++)
958  temp[i]=data[j][i] - fMean(i);
959  temp*=fInvcovariance;
960  for(i=0; i<fNvar; i++)
961  ndist[j]+=(data[j][i]-fMean(i))*temp[i];
962  }
963  KOrdStat(ntotal, ndist, htotal-1,index);
964  }
965 
966 }
967 
968 ////////////////////////////////////////////////////////////////////////////////
969 ///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
970 ///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
971 ///This function is called in case when less than fH samples lie on a hyperplane.
972 
973 void TRobustEstimator::CreateOrtSubset(TMatrixD &dat,Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
974 {
975  Int_t i, j;
976  TVectorD vec(fNvar);
977  for (i=0; i<nmerged; i++) {
978  ndist[i]=0;
979  for(j=0; j<fNvar; j++) {
980  ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
981  ndist[i]=TMath::Abs(ndist[i]);
982  }
983  }
984  KOrdStat(nmerged, ndist, hmerged-1, index);
985  ClearSscp(sscp);
986  for (i=0; i<hmerged; i++) {
987  for(j=0; j<fNvar; j++)
988  vec[j]=dat[index[i]][j];
989  AddToSscp(sscp, vec);
990  }
991  Covar(sscp, fMean, fCovariance, fSd, hmerged);
992 }
993 
994 ////////////////////////////////////////////////////////////////////////////////
995 ///from the input htotal-subset constructs another htotal subset with lower determinant
996 ///
997 ///As proven by Peter J.Rousseeuw and Katrien Van Driessen, if distances for all elements
998 ///are calculated, using the formula:d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
999 ///of the input htotal-subset, and S_inv - the inverse of its covariance matrix, then
1000 ///htotal elements with smallest distances will have covariance matrix with determinant
1001 ///less or equal to the determinant of the input subset covariance matrix.
1002 ///
1003 ///determinant for this htotal-subset with smallest distances is returned
1004 
1005 Double_t TRobustEstimator::CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
1006 {
1007  Int_t i, j;
1008  TVectorD vec(fNvar);
1009  Double_t det;
1010 
1011  TDecompChol chol(fCovariance);
1012  fInvcovariance = chol.Invert();
1013 
1014  TVectorD temp(fNvar);
1015  for(j=0; j<ntotal; j++) {
1016  ndist[j]=0;
1017  for(i=0; i<fNvar; i++)
1018  temp[i]=data[j][i]-fMean[i];
1019  temp*=fInvcovariance;
1020  for(i=0; i<fNvar; i++)
1021  ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1022  }
1023 
1024  //taking h smallest
1025  KOrdStat(ntotal, ndist, htotal-1, index);
1026  //writing their mean and covariance
1027  ClearSscp(sscp);
1028  for (i=0; i<htotal; i++) {
1029  for (j=0; j<fNvar; j++)
1030  temp[j]=data[index[i]][j];
1031  AddToSscp(sscp, temp);
1032  }
1033  Covar(sscp, fMean, fCovariance, fSd, htotal);
1034  det = fCovariance.Determinant();
1035  return det;
1036 }
1037 
1038 ////////////////////////////////////////////////////////////////////////////////
1039 ///for the exact fit situaions
1040 ///returns number of observations on the hyperplane
1041 
1043 {
1044  Int_t i, j;
1045 
1047  TVectorD eigenValues=eigen.GetEigenValues();
1048  TMatrixD eigenMatrix=eigen.GetEigenVectors();
1049 
1050  for (j=0; j<fNvar; j++) {
1051  fHyperplane[j]=eigenMatrix(j,fNvar-1);
1052  }
1053  //calculate and return how many observations lie on the hyperplane
1054  for (i=0; i<fN; i++) {
1055  ndist[i]=0;
1056  for(j=0; j<fNvar; j++) {
1057  ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1058  ndist[i]=TMath::Abs(ndist[i]);
1059  }
1060  }
1061  Int_t nhyp=0;
1062 
1063  for (i=0; i<fN; i++) {
1064  if(ndist[i] < 1e-14) nhyp++;
1065  }
1066  return nhyp;
1067 
1068 }
1069 
1070 ////////////////////////////////////////////////////////////////////////////////
1071 ///This function is called if determinant of the covariance matrix of a subset=0.
1072 ///
1073 ///If there are more then fH vectors on a hyperplane,
1074 ///returns this hyperplane and stops
1075 ///else stores the hyperplane coordinates in hyperplane matrix
1076 
1077 Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1078  Double_t *deti, Int_t nbest, Int_t kgroup,
1079  TMatrixD &sscp, Double_t *ndist)
1080 {
1081  Int_t i, j;
1082 
1083  TVectorD vec(fNvar);
1084  Int_t maxind = TMath::LocMax(nbest, deti);
1085  Int_t nh=Exact(ndist);
1086  //now nh is the number of observation on the hyperplane
1087  //ndist stores distances of observation from this hyperplane
1088  if(nh>=fH) {
1089  ClearSscp(sscp);
1090  for (i=0; i<fN; i++) {
1091  if(ndist[i]<1e-14) {
1092  for (j=0; j<fNvar; j++)
1093  vec[j]=fData[i][j];
1094  AddToSscp(sscp, vec);
1095  }
1096  }
1097  Covar(sscp, fMean, fCovariance, fSd, nh);
1098 
1099  fExact=nh;
1100  return nbest+1;
1101 
1102  } else {
1103  //if less than fH observations lie on a hyperplane,
1104  //mean and covariance matrix are stored in mstockbig
1105  //and cstockbig in place of the previous maximum determinant
1106  //mean and covariance
1107  for(i=0; i<fNvar; i++) {
1108  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1109  hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1110  for(j=0; j<fNvar; j++) {
1111  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1112  }
1113  }
1114  return maxind;
1115  }
1116 }
1117 
1118 
1119 ////////////////////////////////////////////////////////////////////////////////
1120 ///divides the elements into approximately equal subgroups
1121 ///number of elements in each subgroup is stored in indsubdat
1122 ///number of subgroups is returned
1123 
1125 {
1126  Int_t nsub;
1127  if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1128  if (fN%2==1){
1129  indsubdat[0]=Int_t(fN*0.5);
1130  indsubdat[1]=Int_t(fN*0.5)+1;
1131  } else
1132  indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1133  nsub=2;
1134  }
1135  else{
1136  if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1137  if(fN%3==0){
1138  indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1139  } else {
1140  indsubdat[0]=Int_t(fN/3);
1141  indsubdat[1]=Int_t(fN/3)+1;
1142  if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1143  else indsubdat[2]=Int_t(fN/3)+1;
1144  }
1145  nsub=3;
1146  }
1147  else{
1148  if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1149  if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1150  else {
1151  indsubdat[0]=Int_t(fN/4);
1152  indsubdat[1]=Int_t(fN/4)+1;
1153  if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1154  if(fN%4==2) {
1155  indsubdat[2]=Int_t(fN/4)+1;
1156  indsubdat[3]=Int_t(fN/4);
1157  }
1158  if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1159  }
1160  nsub=4;
1161  } else {
1162  for(Int_t i=0; i<5; i++)
1163  indsubdat[i]=nmini;
1164  nsub=5;
1165  }
1166  }
1167  }
1168  return nsub;
1169 }
1170 
1171 ////////////////////////////////////////////////////////////////////////////////
1172 ///Calculates robust distances.Then the samples with robust distances
1173 ///greater than a cutoff value (0.975 quantile of chi2 distribution with
1174 ///fNvar degrees of freedom, multiplied by a correction factor), are given
1175 ///weiht=0, and new, reweighted estimates of location and scatter are calculated
1176 ///The function returns the number of outliers.
1177 
1179 {
1180  Int_t i, j;
1181  Int_t nout=0;
1182 
1183  TVectorD temp(fNvar);
1184  TDecompChol chol(fCovariance);
1185  fInvcovariance = chol.Invert();
1186 
1187 
1188  for (i=0; i<fN; i++) {
1189  fRd[i]=0;
1190  for(j=0; j<fNvar; j++) {
1191  temp[j]=fData[i][j]-fMean[j];
1192  }
1193  temp*=fInvcovariance;
1194  for(j=0; j<fNvar; j++) {
1195  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1196  }
1197  }
1198 
1199  Double_t med;
1200  Double_t chi = kChiMedian[fNvar-1];
1201 
1202  med=TMath::Median(fN, fRd.GetMatrixArray());
1203  med/=chi;
1204  fCovariance*=med;
1205  TDecompChol chol2(fCovariance);
1206  fInvcovariance = chol2.Invert();
1207 
1208  for (i=0; i<fN; i++) {
1209  fRd[i]=0;
1210  for(j=0; j<fNvar; j++) {
1211  temp[j]=fData[i][j]-fMean[j];
1212  }
1213 
1214  temp*=fInvcovariance;
1215  for(j=0; j<fNvar; j++) {
1216  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1217  }
1218  }
1219 
1220  Double_t cutoff = kChiQuant[fNvar-1];
1221 
1222  ClearSscp(sscp);
1223  for(i=0; i<fN; i++) {
1224  if (fRd[i]<=cutoff) {
1225  for(j=0; j<fNvar; j++)
1226  temp[j]=fData[i][j];
1227  AddToSscp(sscp,temp);
1228  } else {
1229  nout++;
1230  }
1231  }
1232 
1233  Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1234  return nout;
1235 }
1236 
1237 ////////////////////////////////////////////////////////////////////////////////
1238 ///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1239 ///such that the selected case numbers are uniformly distributed from 1 to n
1240 
1241 void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1242 {
1243  Int_t jndex = 0;
1244  Int_t nrand;
1245  Int_t i, k, m, j;
1246  for (k=1; k<=ngroup; k++) {
1247  for (m=1; m<=indsubdat[k-1]; m++) {
1248  nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1249  //printf("nrand = %d - jndex %d\n",nrand,jndex);
1250  jndex++;
1251  if (jndex==1) {
1252  subdat[0]=nrand-1; // in case nrand is equal to fN
1253  } else {
1254  subdat[jndex-1]=nrand+jndex-2;
1255  for (i=1; i<=jndex-1; i++) {
1256  if(subdat[i-1] > nrand+i-2) {
1257  for(j=jndex; j>=i+1; j--) {
1258  subdat[j-1]=subdat[j-2];
1259  }
1260  //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1261  subdat[i-1]=nrand+i-2;
1262  break; //breaking the loop for(i=1...
1263  }
1264  }
1265  }
1266  }
1267  }
1268 }
1269 
1270 ////////////////////////////////////////////////////////////////////////////////
1271 ///because I need an Int_t work array
1272 
1274  Bool_t isAllocated = kFALSE;
1275  const Int_t kWorkMax=100;
1276  Int_t i, ir, j, l, mid;
1277  Int_t arr;
1278  Int_t *ind;
1279  Int_t workLocal[kWorkMax];
1280  Int_t temp;
1281 
1282 
1283  if (work) {
1284  ind = work;
1285  } else {
1286  ind = workLocal;
1287  if (ntotal > kWorkMax) {
1288  isAllocated = kTRUE;
1289  ind = new Int_t[ntotal];
1290  }
1291  }
1292 
1293  for (Int_t ii=0; ii<ntotal; ii++) {
1294  ind[ii]=ii;
1295  }
1296  Int_t rk = k;
1297  l=0;
1298  ir = ntotal-1;
1299  for(;;) {
1300  if (ir<=l+1) { //active partition contains 1 or 2 elements
1301  if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1302  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1303  Double_t tmp = a[ind[rk]];
1304  if (isAllocated)
1305  delete [] ind;
1306  return tmp;
1307  } else {
1308  mid = (l+ir) >> 1; //choose median of left, center and right
1309  {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1310  if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1311  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1312 
1313  if (a[ind[l+1]]>a[ind[ir]])
1314  {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1315 
1316  if (a[ind[l]]>a[ind[l+1]])
1317  {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1318 
1319  i=l+1; //initialize pointers for partitioning
1320  j=ir;
1321  arr = ind[l+1];
1322  for (;;) {
1323  do i++; while (a[ind[i]]<a[arr]);
1324  do j--; while (a[ind[j]]>a[arr]);
1325  if (j<i) break; //pointers crossed, partitioning complete
1326  {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1327  }
1328  ind[l+1]=ind[j];
1329  ind[j]=arr;
1330  if (j>=rk) ir = j-1; //keep active the partition that
1331  if (j<=rk) l=i; //contains the k_th element
1332  }
1333  }
1334 }
const TVectorD & GetEigenValues() const
tuple row
Definition: mrt.py:26
void Classic()
called when h=n.
Int_t GetNOut()
returns the number of outliers
double par[1]
Definition: unuranDistr.cxx:38
TVectorT< Element > & ResizeTo(Int_t lwb, Int_t upb)
Resize the vector to [lwb:upb] .
Definition: TVectorT.cxx:291
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
void CreateOrtSubset(TMatrixD &dat, Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane hyp[1]*(x1-m...
Long64_t LocMax(Long64_t n, const T *a)
Definition: TMath.h:724
void CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
creates a subset of htotal elements from ntotal elements first, p+1 elements are drawn randomly(witho...
void AddColumn(Double_t *col)
adds a column to the data matrix it is assumed that the column has size fN variable fVarTemp keeps th...
void Evaluate()
Finds the estimate of multivariate mean and variance.
Int_t Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane, Double_t *deti, Int_t nbest, Int_t kgroup, TMatrixD &sscp, Double_t *ndist)
This function is called if determinant of the covariance matrix of a subset=0.
void ClearSscp(TMatrixD &sscp)
clear the sscp matrix, used for covariance and mean calculation
Double_t CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
from the input htotal-subset constructs another htotal subset with lower determinant ...
#define R__ASSERT(e)
Definition: TError.h:98
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:170
int Int_t
Definition: RtypesCore.h:41
bool Bool_t
Definition: RtypesCore.h:59
TArc * a
Definition: textangle.C:12
const Bool_t kFALSE
Definition: Rtypes.h:92
Double_t KOrdStat(Int_t ntotal, Double_t *arr, Int_t k, Int_t *work)
because I need an Int_t work array
void AddRow(Double_t *row)
adds a vector to the data matrix it is supposed that the vector is of size fNvar
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
Definition: TMatrixT.cxx:1202
const TVectorD * GetHyperplane() const
if the points are on a hyperplane, returns this hyperplane
Short_t Abs(Short_t d)
Definition: TMathBase.h:110
TFile * f
Double_t GetChiQuant(Int_t i) const
returns the chi2 quantiles
TMatrixDSym fInvcovariance
virtual Double_t Determinant() const
const Double_t kChiQuant[50]
void Sort(Index n, const Element *a, Index *index, Bool_t down=kTRUE)
Definition: TMath.h:1002
const TVectorD * GetRDistances() const
Bool_t Invert(TMatrixDSym &inv)
For a symmetric matrix A(m,m), its inverse A_inv(m,m) is returned .
const Double_t sigma
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition: TObject.cxx:918
void Set(Int_t n)
Set size of this array to n ints.
Definition: TArrayI.cxx:105
Int_t GetNrows() const
Definition: TMatrixTBase.h:134
Element * GetMatrixArray()
Definition: TVectorT.h:84
void RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
Draws ngroup nonoverlapping subdatasets out of a dataset of size n such that the selected case number...
TMarker * m
Definition: textangle.C:8
void AddToSscp(TMatrixD &sscp, TVectorD &vec)
update the sscp matrix with vector vec
TLine * l
Definition: textangle.C:4
Int_t Exact(Double_t *ndist)
for the exact fit situaions returns number of observations on the hyperplane
Int_t Partition(Int_t nmini, Int_t *indsubdat)
divides the elements into approximately equal subgroups number of elements in each subgroup is stored...
TMatrixDSym fCovariance
R__EXTERN TRandom * gRandom
Definition: TRandom.h:62
Int_t GetSize() const
Definition: TArray.h:49
const TMatrixDSym * GetCorrelation() const
double Double_t
Definition: RtypesCore.h:55
const TMatrixD & GetEigenVectors() const
Double_t Median(Long64_t n, const T *a, const Double_t *w=0, Long64_t *work=0)
Definition: TMath.h:1077
Int_t GetNcols() const
Definition: TMatrixTBase.h:137
Int_t RDist(TMatrixD &sscp)
Calculates robust distances.Then the samples with robust distances greater than a cutoff value (0...
void EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh=0)
for the univariate case estimates of location and scatter are returned in mean and sigma parameters t...
ClassImp(TRobustEstimator) const Double_t kChiMedian[50]
virtual Double_t Uniform(Double_t x1=1)
Returns a uniform deviate on the interval (0, x1).
Definition: TRandom.cxx:606
Int_t GetBDPoint()
returns the breakdown point of the algorithm
void Correl()
transforms covariance matrix into correlation matrix
Int_t GetNoElements() const
Definition: TVectorT.h:82
TMatrixDSym fCorrelation
const TVectorD * GetMean() const
const TMatrixDSym * GetCovariance() const
void Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
calculates mean and covariance
TRobustEstimator()
this constructor should be used in a univariate case: first call this constructor, then - the EvaluateUni(..) function
Double_t Sqrt(Double_t x)
Definition: TMath.h:464
Long64_t LocMin(Long64_t n, const T *a)
Definition: TMath.h:695
const Bool_t kTRUE
Definition: Rtypes.h:91
const Int_t n
Definition: legend1.C:16
int ii
Definition: hprod.C:34
virtual void Warning(const char *method, const char *msgfmt,...) const
Issue warning message.
Definition: TObject.cxx:904