ROOT  6.06/09
Reference Guide
TPrincipal.cxx
1 // @(#)root/hist:$Id$
2 // Author: Christian Holm Christensen 1/8/2000
3 
4 /*************************************************************************
5  * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6  * All rights reserved. *
7  * *
8  * For the licensing terms see $ROOTSYS/LICENSE. *
9  * For the list of contributors see $ROOTSYS/README/CREDITS. *
10  *************************************************************************/
11 
12 /** \class TPrincipal
13  \ingroup Hist
14 Principal Components Analysis (PCA)
15 
16 The current implementation is based on the LINTRA package from CERNLIB
17 by R. Brun, H. Hansroul, and J. Kubler.
18 The class has been implemented by Christian Holm Christensen in August 2000.
19 
20 ## Introduction
21 
22 In many fields of research, the treatment of
23 large amounts of data requires powerful techniques capable of rapid
24 data reduction and analysis. Usually, the quantities most
25 conveniently measured by the experimentalist are not necessarily the
26 most significant for classification and analysis of the data. It is
27 then useful to have a way of selecting an optimal set of variables
28 necessary for the recognition process and reducing the dimensionality
29 of the problem, resulting in an easier classification procedure.
30 
31 This section describes the implementation of one such method of
32 feature selection, namely principal components analysis. This
33 multidimensional technique is well known in the field of pattern
34 recognition and its use in Particle Physics has been documented
35 elsewhere (cf. H. Wind, <I>Function Parameterization</I>, CERN
36 72-21).
37 
38 ## Overview
39 Suppose we have prototypes which are trajectories of particles
40 passing through a spectrometer. If one measures the passage of the
41 particle at say 8 fixed planes, the trajectory is described by an
42 8-component vector:
43 \f[
44  \mathbf{x} = \left(x_0, x_1, \ldots, x_7\right)
45 \f]
46 in 8-dimensional pattern space.
47 
48 One proceeds by generating a representative sample of tracks and
49 building up the covariance matrix \f$\mathsf{C}\f$. Its eigenvectors and
50 eigenvalues are computed by standard methods, and thus a new basis is
51 obtained for the original 8-dimensional space. The expansion of the
52 prototypes,
53 \f[
54  \mathbf{x}_m = \sum^7_{i=0} a_{m_i} \mathbf{e}_i
55  \quad
56  \mbox{where}
57  \quad
58  a_{m_i} = \mathbf{x}_m^T\bullet\mathbf{e}_i
59 \f]
60 allows the study of the behavior of the coefficients \f$a_{m_i}\f$ for all
61 the tracks of the sample. The eigenvectors which are insignificant for
62 the trajectory description in the expansion will have their
63 corresponding coefficients \f$a_{m_i}\f$ close to zero for all the
64 prototypes.
65 
66 On one hand, a reduction of the dimensionality is then obtained by
67 omitting these least significant vectors in the subsequent analysis.
68 
69 On the other hand, in the analysis of real data, these least
70 significant variables can be used for the pattern
71 recognition problem of extracting the valid combinations of
72 coordinates describing a true trajectory from the set of all possible
73 wrong combinations.
74 
75 The program described here performs this principal components analysis
76 on a sample of data provided by the user. It computes the covariance
77 matrix, its eigenvalues and corresponding eigenvectors, and exhibits
78 the behavior of the principal components \f$a_{m_i}\f$, thus providing
79 the user with all the means of understanding the data.
80 
81 ## Principal Components Method
82 Let's consider a sample of \f$M\f$ prototypes each being characterized by
83 \f$P\f$ variables \f$x_0, x_1, \ldots, x_{P-1}\f$. Each prototype is a point, or a
84 column vector, in a \f$P\f$-dimensional *Pattern space*.
85 \f[
86  \mathbf{x} = \left[\begin{array}{c}
87  x_0\\x_1\\\vdots\\x_{P-1}\end{array}\right]\,,
88 \f]
89 where each \f$x_n\f$ represents the particular value associated with the
90 \f$n\f$-dimension.
91 
92 Those \f$P\f$ variables are the quantities accessible to the
93 experimentalist, but are not necessarily the most significant for the
94 classification purpose.
95 
96 The *Principal Components Method* consists of applying a
97 *linear* transformation to the original variables. This
98 transformation is described by an orthogonal matrix and is equivalent
99 to a rotation of the original pattern space into a new set of
100 coordinate vectors, which hopefully provide easier feature
101 identification and dimensionality reduction.
102 
103 Let's define the covariance matrix:
104 \f[
105  \mathsf{C} = \left\langle\mathbf{y}\mathbf{y}^T\right\rangle
106  \quad\mbox{where}\quad
107  \mathbf{y} = \mathbf{x} - \left\langle\mathbf{x}\right\rangle\,,
108 \f]
109 and the brackets indicate mean value over the sample of \f$M\f$
110 prototypes.
111 
112 This matrix \f$\mathsf{C}\f$ is real, positive definite, symmetric, and will
113 have all its eigenvalues greater than zero. It will now be shown that
114 among the family of all the complete orthonormal bases of the pattern
115 space, the base formed by the eigenvectors of the covariance matrix
116 and belonging to the largest eigenvalues corresponds to the most
117 significant features of the description of the original prototypes.
118 
119 Let the prototypes be expanded in terms of a set of basis vectors
120 \f$\mathbf{e}_n,\ n=0,\ldots,N,N+1,\ldots,P-1\f$,
121 \f[
122  \mathbf{y}_i = \sum^N_{n=0} a_{i_n} \mathbf{e}_n,
123  \quad
124  i = 1, \ldots, M,
125  \quad
126  N < P-1
127 \f]
128 The "best" feature coordinates \f$\mathbf{e}_n\f$, spanning a *feature
129 space*, will be obtained by minimizing the error due to this
130 truncated expansion, i.e.,
131 \f[
132  \min\left(E_N\right) =
133  \min\left[\left\langle\left(\mathbf{y}_i - \sum^N_{n=0} a_{i_n} \mathbf{e}_n\right)^2\right\rangle\right]
134 \f]
135 with the conditions:
136 \f[
137  \mathbf{e}_k\bullet\mathbf{e}_j = \delta_{jk} =
138  \left\{\begin{array}{rcl}
139  1 & \mbox{for} & k = j\\
140  0 & \mbox{for} & k \neq j
141  \end{array}\right.
142 \f]
143 Multiplying the expansion by \f$\mathbf{e}^T_n\f$ and using the orthonormality conditions,
144 we get
145 \f[
146  a_{i_n} = \mathbf{y}_i^T\bullet\mathbf{e}_n\,,
147 \f]
148 so the error becomes
149 \f{eqnarray*}{
150  E_N &=&
151  \left\langle\left[\sum_{n=N+1}^{P-1} a_{i_n}\mathbf{e}_n\right]^2\right\rangle\nonumber\\
152  &=&
153  \left\langle\left[\sum_{n=N+1}^{P-1} \mathbf{y}_i^T\bullet\mathbf{e}_n\mathbf{e}_n\right]^2\right\rangle\nonumber\\
154  &=&
155  \left\langle\sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathbf{y}_i\mathbf{y}_i^T\mathbf{e}_n\right\rangle\nonumber\\
156  &=&
157  \sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathsf{C}\mathbf{e}_n
158 \f}
159 The minimization of the sum above is obtained when each
160 term \f$\mathbf{e}_n^T\mathsf{C}\mathbf{e}_n\f$ is minimum, since \f$\mathsf{C}\f$ is
161 positive definite. By the method of Lagrange multipliers, and the
162 condition (5), we get
163 \f[
164  E_N = \sum^{P-1}_{n=N+1} \left(\mathbf{e}_n^T\mathsf{C}\mathbf{e}_n -
165  l_n\mathbf{e}_n^T\bullet\mathbf{e}_n + l_n\right)
166 \f]
167 The minimum condition \f$\frac{dE_N}{d\mathbf{e}^T_n} = 0\f$ leads to the
168 equation
169 \f[
170  \mathsf{C}\mathbf{e}_n = l_n\mathbf{e}_n\,,
171 \f]
172 which shows that \f$\mathbf{e}_n\f$ is an eigenvector of the covariance
173 matrix \f$\mathsf{C}\f$ with eigenvalue \f$l_n\f$. The estimated minimum error is
174 then given by
175 \f[
176  E_N \sim \sum^{P-1}_{n=N+1} \mathbf{e}_n^T\bullet l_n\mathbf{e}_n
177  = \sum^{P-1}_{n=N+1} l_n\,,
178 \f]
179 where \f$l_n,\,n=N+1,\ldots,P-1\f$ are the eigenvalues associated with the
180 omitted eigenvectors in the expansion. Thus, by choosing
181 the \f$N\f$ largest eigenvalues, and their associated eigenvectors, the
182 error \f$E_N\f$ is minimized.
183 
184 The transformation matrix to go from the pattern space to the feature
185 space consists of the ordered eigenvectors \f$\mathbf{e}_0,\ldots,\mathbf{e}_{P-1}\f$
186 as its columns
187 \f[
188  \mathsf{T} = \left[
189  \begin{array}{cccc}
190  \mathbf{e}_0 &
191  \mathbf{e}_1 &
192  \cdots &
193  \mathbf{e}_{P-1}
194  \end{array}\right]
195  = \left[
196  \begin{array}{cccc}
197  \mathbf{e}_{0_0} & \mathbf{e}_{1_0} & \cdots & \mathbf{e}_{{P-1}_0}\\
198  \mathbf{e}_{0_1} & \mathbf{e}_{1_1} & \cdots & \mathbf{e}_{{P-1}_1}\\
199  \vdots & \vdots & \ddots & \vdots \\
200  \mathbf{e}_{0_{P-1}} & \mathbf{e}_{1_{P-1}} & \cdots & \mathbf{e}_{{P-1}_{P-1}}\\
201  \end{array}\right]
202 \f]
203 This is an orthogonal transformation, or rotation, of the pattern
204 space and feature selection results in ignoring certain coordinates
205 in the transformed space.
206 
207 Christian Holm August 2000, CERN
208 */
209 
210 // $Id$
211 // $Date: 2006/05/24 14:55:26 $
212 // $Author: brun $
213 
214 #include "TPrincipal.h"
215 
216 #include "TVectorD.h"
217 #include "TMatrixD.h"
218 #include "TMatrixDSymEigen.h"
219 #include "TMath.h"
220 #include "TList.h"
221 #include "TH2.h"
222 #include "TDatime.h"
223 #include "TBrowser.h"
224 #include "TROOT.h"
225 #include "Riostream.h"
226 
227 
228 ClassImp(TPrincipal);
229 
230 ////////////////////////////////////////////////////////////////////////////////
231 /// Empty CTOR, Do not use.
232 
233 TPrincipal::TPrincipal()
234  : fMeanValues(0),
235  fSigmas(0),
236  fCovarianceMatrix(1,1),
237  fEigenVectors(1,1),
238  fEigenValues(0),
239  fOffDiagonal(0),
240  fStoreData(kFALSE)
241 {
242  fTrace = 0;
243  fHistograms = 0;
244  fIsNormalised = kFALSE;
245  fNumberOfDataPoints = 0;
246  fNumberOfVariables = 0;
247 }
248 
249 ////////////////////////////////////////////////////////////////////////////////
250 /// Ctor. Argument is number of variables in the sample of data
251 /// Options are:
252 /// N Normalize the covariance matrix (default)
253 /// D Store input data (default)
254 ///
255 /// The created object is named "principal" by default.
256 
257 TPrincipal::TPrincipal(Int_t nVariables, Option_t *opt)
258  : fMeanValues(nVariables),
259  fSigmas(nVariables),
260  fCovarianceMatrix(nVariables,nVariables),
261  fEigenVectors(nVariables,nVariables),
262  fEigenValues(nVariables),
263  fOffDiagonal(nVariables),
264  fStoreData(kFALSE)
265 {
266  if (nVariables <= 1) {
267  Error("TPrincipal", "You can't be serious - nVariables == 1!!!");
268  return;
269  }
270 
271  SetName("principal");
272 
273  fTrace = 0;
274  fHistograms = 0;
275  fIsNormalised = kFALSE;
276  fNumberOfDataPoints = 0;
277  fNumberOfVariables = nVariables;
278  while (strlen(opt) > 0) {
279  switch(*opt++) {
280  case 'N':
281  case 'n':
282  fIsNormalised = kTRUE;
283  break;
284  case 'D':
285  case 'd':
286  fStoreData = kTRUE;
287  break;
288  default:
289  break;
290  }
291  }
292 
293  if (!fMeanValues.IsValid())
294  Error("TPrincipal","Couldn't create vector mean values");
295  if (!fSigmas.IsValid())
296  Error("TPrincipal","Couldn't create vector sigmas");
297  if (!fCovarianceMatrix.IsValid())
298  Error("TPrincipal","Couldn't create covariance matrix");
299  if (!fEigenVectors.IsValid())
300  Error("TPrincipal","Couldn't create eigenvector matrix");
301  if (!fEigenValues.IsValid())
302  Error("TPrincipal","Couldn't create eigenvalue vector");
303  if (!fOffDiagonal.IsValid())
304  Error("TPrincipal","Couldn't create offdiagonal vector");
305  if (fStoreData) {
306  fUserData.ResizeTo(nVariables*1000);
307  fUserData.Zero();
308  if (!fUserData.IsValid())
309  Error("TPrincipal","Couldn't create user data vector");
310  }
311 }
312 
313 ////////////////////////////////////////////////////////////////////////////////
314 /// Copy constructor.
315 
316 TPrincipal::TPrincipal(const TPrincipal& pr) :
317  TNamed(pr),
318  fNumberOfDataPoints(pr.fNumberOfDataPoints),
319  fNumberOfVariables(pr.fNumberOfVariables),
320  fMeanValues(pr.fMeanValues),
321  fSigmas(pr.fSigmas),
322  fCovarianceMatrix(pr.fCovarianceMatrix),
323  fEigenVectors(pr.fEigenVectors),
324  fEigenValues(pr.fEigenValues),
325  fOffDiagonal(pr.fOffDiagonal),
326  fUserData(pr.fUserData),
327  fTrace(pr.fTrace),
328  fHistograms(pr.fHistograms),
329  fIsNormalised(pr.fIsNormalised),
330  fStoreData(pr.fStoreData)
331 {
332 }
333 
334 ////////////////////////////////////////////////////////////////////////////////
335 /// Assignment operator.
336 
337 TPrincipal& TPrincipal::operator=(const TPrincipal& pr)
338 {
339  if(this!=&pr) {
340  TNamed::operator=(pr);
341  fNumberOfDataPoints=pr.fNumberOfDataPoints;
342  fNumberOfVariables=pr.fNumberOfVariables;
343  fMeanValues=pr.fMeanValues;
344  fSigmas=pr.fSigmas;
345  fCovarianceMatrix=pr.fCovarianceMatrix;
346  fEigenVectors=pr.fEigenVectors;
347  fEigenValues=pr.fEigenValues;
348  fOffDiagonal=pr.fOffDiagonal;
349  fUserData=pr.fUserData;
350  fTrace=pr.fTrace;
351  fHistograms=pr.fHistograms;
352  fIsNormalised=pr.fIsNormalised;
353  fStoreData=pr.fStoreData;
354  }
355  return *this;
356 }
357 
358 ////////////////////////////////////////////////////////////////////////////////
359 /// destructor
360 
361 TPrincipal::~TPrincipal()
362 {
363  if (fHistograms) {
364  fHistograms->Delete();
365  delete fHistograms;
366  }
367 }
368 
369 ////////////////////////////////////////////////////////////////////////////////
370 /// Add a data point and update the covariance matrix. The input
371 /// array must be fNumberOfVariables long.
372 ///
373 /// The covariance matrix and mean values of the input data are
374 /// calculated on the fly by the following equations:
375 /// \f[
376 /// \left\langle x_i\right\rangle^{(0)} = x_{i0}
377 /// \f]
378 /// \f[
379 /// \left\langle x_i\right\rangle^{(n)} = \left\langle x_i\right\rangle^{(n-1)}
380 /// + \frac1n \left(x_{in} - \left\langle x_i\right\rangle^{(n-1)}\right)
381 /// \f]
382 /// \f[
383 /// C_{ij}^{(0)} = 0
384 /// \f]
385 /// \f[
386 /// C_{ij}^{(n)} = C_{ij}^{(n-1)}
387 /// + \frac1{n-1}\left[\left(x_{in} - \left\langle x_i\right\rangle^{(n)}\right)
388 /// \left(x_{jn} - \left\langle x_j\right\rangle^{(n)}\right)\right]
389 /// - \frac1n C_{ij}^{(n-1)}
390 /// \f]
391 /// This method is fast and numerically stable (please refer to
392 /// CERN 72-21 pp. 54-106).
393 ///
394 /// The data is stored internally in a TVectorD, in the following way:
395 /// \f[
396 /// \mathbf{x} = \left[\left(x_{0_0},\ldots,x_{{P-1}_0}\right),\ldots,
397 /// \left(x_{0_i},\ldots,x_{{P-1}_i}\right), \ldots\right]
398 /// \f]
399 /// with \f$P\f$ as defined in the class description.
400 
401 void TPrincipal::AddRow(const Double_t *p)
402 {
500  if (!p)
501  return;
502 
503  // Increment the data point counter
504  Int_t i,j;
505  if (++fNumberOfDataPoints == 1) {
506  for (i = 0; i < fNumberOfVariables; i++)
507  fMeanValues(i) = p[i];
508  }
509  else {
510 
511  Double_t cor = 1 - 1./Double_t(fNumberOfDataPoints);
512  for (i = 0; i < fNumberOfVariables; i++) {
513 
514  fMeanValues(i) *= cor;
515  fMeanValues(i) += p[i] / Double_t(fNumberOfDataPoints);
516  Double_t t1 = (p[i] - fMeanValues(i)) / (fNumberOfDataPoints - 1);
517 
518  // Setting Matrix (lower triangle) elements
519  for (j = 0; j < i + 1; j++) {
520  fCovarianceMatrix(i,j) *= cor;
521  fCovarianceMatrix(i,j) += t1 * (p[j] - fMeanValues(j));
522  }
523  }
524  }
525 
526  // Store data point in internal vector
527  // If the vector isn't big enough to hold the new data, then
528  // expand the vector by half its size.
529  if (!fStoreData)
530  return;
531  Int_t size = fUserData.GetNrows();
532  if (fNumberOfDataPoints * fNumberOfVariables > size)
533  fUserData.ResizeTo(size + size/2);
534 
535  for (i = 0; i < fNumberOfVariables; i++) {
536  j = (fNumberOfDataPoints-1) * fNumberOfVariables + i;
537  fUserData(j) = p[i];
538  }
539 
540 }
541 
542 ////////////////////////////////////////////////////////////////////////////////
543 /// Browse the TPrincipal object in the TBrowser.
544 
545 void TPrincipal::Browse(TBrowser *b)
546 {
547  if (fHistograms) {
548  TIter next(fHistograms);
549  TH1* h = 0;
550  while ((h = (TH1*)next()))
551  b->Add(h,h->GetName());
552  }
553 
554  if (fStoreData)
555  b->Add(&fUserData,"User Data");
556  b->Add(&fCovarianceMatrix,"Covariance Matrix");
557  b->Add(&fMeanValues,"Mean value vector");
558  b->Add(&fSigmas,"Sigma value vector");
559  b->Add(&fEigenValues,"Eigenvalue vector");
560  b->Add(&fEigenVectors,"Eigenvector Matrix");
561 
562 }
563 
564 ////////////////////////////////////////////////////////////////////////////////
565 /// Clear the data in the object. Note that it is not possible to change
566 /// the dimension of the original data.
567 
568 void TPrincipal::Clear(Option_t *opt)
569 {
570  if (fHistograms) {
571  fHistograms->Delete(opt);
572  }
573 
574  fNumberOfDataPoints = 0;
575  fTrace = 0;
576  fCovarianceMatrix.Zero();
577  fEigenVectors.Zero();
578  fEigenValues.Zero();
579  fMeanValues.Zero();
580  fSigmas.Zero();
581  fOffDiagonal.Zero();
582 
583  if (fStoreData) {
584  fUserData.ResizeTo(fNumberOfVariables * 1000);
585  fUserData.Zero();
586  }
587 }
588 
589 ////////////////////////////////////////////////////////////////////////////////
590 /// Return a row of the user supplied data.
591 /// If row is out of bounds, 0 is returned.
592 /// It's up to the user to delete the returned array.
593 /// Row 0 is the first row;
594 
595 const Double_t *TPrincipal::GetRow(Int_t row)
596 {
597  if (row >= fNumberOfDataPoints)
598  return 0;
599 
600  if (!fStoreData)
601  return 0;
602 
603  Int_t index = row * fNumberOfVariables;
604  return &fUserData(index);
605 }
606 
607 
608 ////////////////////////////////////////////////////////////////////////////////
609 /// Generates the file <filename>, with .C appended if the
610 /// argument doesn't end in .cxx or .C.
611 ///
612 /// The file contains the implementation of two functions
613 ///
614 /// void X2P(Double_t *x, Double_t *p)
615 /// void P2X(Double_t *p, Double_t *x, Int_t nTest)
616 ///
617 /// which do the same as TPrincipal::X2P and TPrincipal::P2X
618 /// respectively. Please refer to these methods.
619 ///
620 /// Further, the static variables:
621 ///
622 /// Int_t gNVariables
623 /// Double_t gEigenValues[]
624 /// Double_t gEigenVectors[]
625 /// Double_t gMeanValues[]
626 /// Double_t gSigmaValues[]
627 ///
628 /// are initialized. The only ROOT header file needed is Rtypes.h
629 ///
630 /// See TPrincipal::MakeRealCode for a list of options
631 
632 void TPrincipal::MakeCode(const char *filename, Option_t *opt)
633 {
634  TString outName(filename);
635  if (!outName.EndsWith(".C") && !outName.EndsWith(".cxx"))
636  outName += ".C";
637 
638  MakeRealCode(outName.Data(),"",opt);
639 }
640 
641 ////////////////////////////////////////////////////////////////////////////////
642 /// Make histograms of the result of the analysis.
643 /// The option string say which histograms to create
644 /// X Histogram original data
645 /// P Histogram principal components corresponding to
646 /// original data
647 /// D Histogram the difference between the original data
648 /// and the projection of principal components onto a lower
649 /// dimensional subspace (2D histograms)
650 /// E Histogram the eigenvalues
651 /// S Histogram the square of the residues
652 /// (see TPrincipal::SumOfSquareResidues)
653 /// The histograms will be named <name>_<type><number>, where <name>
654 /// is the first argument, <type> is one of X,P,D,E,S, and <number>
655 /// is the variable.
656 
657 void TPrincipal::MakeHistograms(const char *name, Option_t *opt)
658 {
659  Bool_t makeX = kFALSE;
660  Bool_t makeD = kFALSE;
661  Bool_t makeP = kFALSE;
662  Bool_t makeE = kFALSE;
663  Bool_t makeS = kFALSE;
664 
665  Int_t len = strlen(opt);
666  Int_t i,j,k;
667  for (i = 0; i < len; i++) {
668  switch (opt[i]) {
669  case 'X':
670  case 'x':
671  if (fStoreData)
672  makeX = kTRUE;
673  break;
674  case 'd':
675  case 'D':
676  if (fStoreData)
677  makeD = kTRUE;
678  break;
679  case 'P':
680  case 'p':
681  if (fStoreData)
682  makeP = kTRUE;
683  break;
684  case 'E':
685  case 'e':
686  makeE = kTRUE;
687  break;
688  case 's':
689  case 'S':
690  if (fStoreData)
691  makeS = kTRUE;
692  break;
693  default:
694  Warning("MakeHistograms","Unknown option: %c",opt[i]);
695  }
696  }
697 
698  // If no option was given, then exit gracefully
699  if (!makeX && !makeD && !makeP && !makeE && !makeS)
700  return;
701 
702  // If the list of histograms doesn't exist, create it.
703  if (!fHistograms)
704  fHistograms = new TList;
705 
706  // Don't create the histograms if they are already in the TList.
707  if (makeX && fHistograms->FindObject(Form("%s_x000",name)))
708  makeX = kFALSE;
709  if (makeD && fHistograms->FindObject(Form("%s_d000",name)))
710  makeD = kFALSE;
711  if (makeP && fHistograms->FindObject(Form("%s_p000",name)))
712  makeP = kFALSE;
713  if (makeE && fHistograms->FindObject(Form("%s_e",name)))
714  makeE = kFALSE;
715  if (makeS && fHistograms->FindObject(Form("%s_s",name)))
716  makeS = kFALSE;
717 
718  TH1F **hX = 0;
719  TH2F **hD = 0;
720  TH1F **hP = 0;
721  TH1F *hE = 0;
722  TH1F *hS = 0;
723 
724  // Initialize the arrays of histograms needed
725  if (makeX)
726  hX = new TH1F * [fNumberOfVariables];
727 
728  if (makeD)
729  hD = new TH2F * [fNumberOfVariables];
730 
731  if (makeP)
732  hP = new TH1F * [fNumberOfVariables];
733 
734  if (makeE){
735  hE = new TH1F(Form("%s_e",name), "Eigenvalues of Covariance matrix",
736  fNumberOfVariables,0,fNumberOfVariables);
737  hE->SetXTitle("Eigenvalue");
738  fHistograms->Add(hE);
739  }
740 
741  if (makeS) {
742  hS = new TH1F(Form("%s_s",name),"E_{N}",
743  fNumberOfVariables-1,1,fNumberOfVariables);
744  hS->SetXTitle("N");
745  hS->SetYTitle("#sum_{i=1}^{M} (x_{i} - x'_{N,i})^{2}");
746  fHistograms->Add(hS);
747  }
748 
749  // Initialize sub elements of the histogram arrays
750  for (i = 0; i < fNumberOfVariables; i++) {
751  if (makeX) {
752  // We allow 4 sigma spread in the original data in our
753  // histogram.
754  Double_t xlowb = fMeanValues(i) - 4 * fSigmas(i);
755  Double_t xhighb = fMeanValues(i) + 4 * fSigmas(i);
756  Int_t xbins = fNumberOfDataPoints/100;
757  hX[i] = new TH1F(Form("%s_x%03d", name, i),
758  Form("Pattern space, variable %d", i),
759  xbins,xlowb,xhighb);
760  hX[i]->SetXTitle(Form("x_{%d}",i));
761  fHistograms->Add(hX[i]);
762  }
763 
764  if(makeD) {
765  // The upper limit below is arbitrary!!!
766  Double_t dlowb = 0;
767  Double_t dhighb = 20;
768  Int_t dbins = fNumberOfDataPoints/100;
769  hD[i] = new TH2F(Form("%s_d%03d", name, i),
770  Form("Distance from pattern to "
771  "feature space, variable %d", i),
772  dbins,dlowb,dhighb,
773  fNumberOfVariables-1,
774  1,
775  fNumberOfVariables);
776  hD[i]->SetXTitle(Form("|x_{%d} - x'_{%d,N}|/#sigma_{%d}",i,i,i));
777  hD[i]->SetYTitle("N");
778  fHistograms->Add(hD[i]);
779  }
780 
781  if(makeP) {
782  // For some reason, the trace of the non-scaled matrix
783  // (see TPrincipal::MakeNormalised) should enter here. Taken
784  // from LINTRA code.
785  Double_t et = TMath::Abs(fEigenValues(i) * fTrace);
786  Double_t plowb = -10 * TMath::Sqrt(et);
787  Double_t phighb = -plowb;
788  Int_t pbins = 100;
789  hP[i] = new TH1F(Form("%s_p%03d", name, i),
790  Form("Feature space, variable %d", i),
791  pbins,plowb,phighb);
792  hP[i]->SetXTitle(Form("p_{%d}",i));
793  fHistograms->Add(hP[i]);
794  }
795 
796  if (makeE)
797  // The Eigenvector histogram is easy
798  hE->Fill(i,fEigenValues(i));
799 
800  }
801  if (!makeX && !makeP && !makeD && !makeS)
802  return;
803 
804  Double_t *x = 0;
805  Double_t *p = new Double_t[fNumberOfVariables];
806  Double_t *d = new Double_t[fNumberOfVariables];
807  for (i = 0; i < fNumberOfDataPoints; i++) {
808 
809  // Zero arrays
810  for (j = 0; j < fNumberOfVariables; j++)
811  p[j] = d[j] = 0;
812 
813  // update the original data histogram
814  x = (Double_t*)(GetRow(i));
815  R__ASSERT(x);
816 
817  if (makeP||makeD||makeS)
818  // calculate the corresponding principal component
819  X2P(x,p);
820 
821  if (makeD || makeS) {
822  // Calculate the difference between the original data, and the
823  // same project onto principal components, and then to a lower
824  // dimensional sub-space
825  for (j = fNumberOfVariables; j > 0; j--) {
826  P2X(p,d,j);
827 
828  for (k = 0; k < fNumberOfVariables; k++) {
829  // We use the absolute value of the difference!
830  d[k] = x[k] - d[k];
831 
832  if (makeS)
833  hS->Fill(j,d[k]*d[k]);
834 
835  if (makeD) {
836  d[k] = TMath::Abs(d[k]) / (fIsNormalised ? fSigmas(k) : 1);
837  (hD[k])->Fill(d[k],j);
838  }
839  }
840  }
841  }
842 
843  if (makeX||makeP) {
844  // If we are asked to make any of these histograms, we have to
845  // go here
846  for (j = 0; j < fNumberOfVariables; j++) {
847  if (makeX)
848  (hX[j])->Fill(x[j]);
849 
850  if (makeP)
851  (hP[j])->Fill(p[j]);
852  }
853  }
854  }
855  // Clean up
856  if (hX)
857  delete [] hX;
858  if (hD)
859  delete [] hD;
860  if (hP)
861  delete [] hP;
862  if (d)
863  delete [] d;
864  if (p)
865  delete [] p;
866 
867  // Normalize the residues
868  if (makeS)
869  hS->Scale(Double_t(1.)/fNumberOfDataPoints);
870 }
871 
872 ////////////////////////////////////////////////////////////////////////////////
873 /// PRIVATE METHOD: Normalize the covariance matrix
874 
875 void TPrincipal::MakeNormalised()
876 {
877  Int_t i,j;
878  for (i = 0; i < fNumberOfVariables; i++) {
879  fSigmas(i) = TMath::Sqrt(fCovarianceMatrix(i,i));
880  if (fIsNormalised)
881  for (j = 0; j <= i; j++)
882  fCovarianceMatrix(i,j) /= (fSigmas(i) * fSigmas(j));
883 
884  fTrace += fCovarianceMatrix(i,i);
885  }
886 
887  // Fill remaining parts of matrix, and scale.
888  for (i = 0; i < fNumberOfVariables; i++)
889  for (j = 0; j <= i; j++) {
890  fCovarianceMatrix(i,j) /= fTrace;
891  fCovarianceMatrix(j,i) = fCovarianceMatrix(i,j);
892  }
893 
894 }
895 
896 ////////////////////////////////////////////////////////////////////////////////
897 /// Generate the file <classname>PCA.cxx which contains the
898 /// implementation of two methods:
899 ///
900 /// void <classname>::X2P(Double_t *x, Double_t *p)
901 /// void <classname>::P2X(Double_t *p, Double_t *x, Int_t nTest)
902 ///
903 /// which do the same as TPrincipal::X2P and TPrincipal::P2X
904 /// respectively. Please refer to these methods.
905 ///
906 /// Further, the public static members:
907 ///
908 /// Int_t <classname>::fgNVariables
909 /// Double_t <classname>::fgEigenValues[]
910 /// Double_t <classname>::fgEigenVectors[]
911 /// Double_t <classname>::fgMeanValues[]
912 /// Double_t <classname>::fgSigmaValues[]
913 ///
914 /// are initialized, and assumed to exist. The class declaration is
915 /// assumed to be in <classname>.h and assumed to be provided by the
916 /// user.
917 ///
918 /// See TPrincipal::MakeRealCode for a list of options
919 ///
920 /// The minimal class definition is:
921 ///
922 /// class <classname> {
923 /// public:
924 /// static Int_t fgNVariables;
925 /// static Double_t fgEigenVectors[];
926 /// static Double_t fgEigenValues[];
927 /// static Double_t fgMeanValues[];
928 /// static Double_t fgSigmaValues[];
929 ///
930 /// void X2P(Double_t *x, Double_t *p);
931 /// void P2X(Double_t *p, Double_t *x, Int_t nTest);
932 /// };
933 ///
934 /// Whether the methods <classname>::X2P and <classname>::P2X should
935 /// be static or not, is up to the user.
936 
937 void TPrincipal::MakeMethods(const char *classname, Option_t *opt)
938 {
939 
940  MakeRealCode(Form("%sPCA.cxx", classname), classname, opt);
941 }
942 
943 
944 ////////////////////////////////////////////////////////////////////////////////
945 /// Perform the principal components analysis.
946 /// This is done in several stages in the TMatrixDSymEigen class:
947 /// * Transform the covariance matrix into a tridiagonal matrix.
948 /// * Find the eigenvalues and vectors of the tridiagonal matrix.
949 
950 void TPrincipal::MakePrincipals()
951 {
952  // Normalize covariance matrix
953  MakeNormalised();
954 
955  TMatrixDSym sym; sym.Use(fCovarianceMatrix.GetNrows(),fCovarianceMatrix.GetMatrixArray());
956  TMatrixDSymEigen eigen(sym);
957  fEigenVectors = eigen.GetEigenVectors();
958  fEigenValues = eigen.GetEigenValues();
959  //make sure that eigenvalues are positive
960  for (Int_t i = 0; i < fNumberOfVariables; i++) {
961  if (fEigenValues[i] < 0) fEigenValues[i] = -fEigenValues[i];
962  }
963 }
964 
965 ////////////////////////////////////////////////////////////////////////////////
966 /// PRIVATE METHOD:
967 /// This is the method that actually generates the code for the
968 /// transformations to and from feature space and pattern space
969 /// It's called by TPrincipal::MakeCode and TPrincipal::MakeMethods.
970 ///
971 /// The options are: NONE so far
972 
973 void TPrincipal::MakeRealCode(const char *filename, const char *classname,
974  Option_t *)
975 {
976  Bool_t isMethod = (classname[0] == '\0' ? kFALSE : kTRUE);
977  const char *prefix = (isMethod ? Form("%s::", classname) : "");
978  const char *cv_qual = (isMethod ? "" : "static ");
979 
980  std::ofstream outFile(filename,std::ios::out|std::ios::trunc);
981  if (!outFile) {
982  Error("MakeRealCode","couldn't open output file '%s'",filename);
983  return;
984  }
985 
986  std::cout << "Writing on file \"" << filename << "\" ... " << std::flush;
987  //
988  // Write header of file
989  //
990  // Emacs mode line ;-)
991  outFile << "// -*- mode: c++ -*-" << std::endl;
992  // Info about creator
993  outFile << "// " << std::endl
994  << "// File " << filename
995  << " generated by TPrincipal::MakeCode" << std::endl;
996  // Time stamp
997  TDatime date;
998  outFile << "// on " << date.AsString() << std::endl;
999  // ROOT version info
1000  outFile << "// ROOT version " << gROOT->GetVersion()
1001  << std::endl << "//" << std::endl;
1002  // General information on the code
1003  outFile << "// This file contains the functions " << std::endl
1004  << "//" << std::endl
1005  << "// void " << prefix
1006  << "X2P(Double_t *x, Double_t *p); " << std::endl
1007  << "// void " << prefix
1008  << "P2X(Double_t *p, Double_t *x, Int_t nTest);"
1009  << std::endl << "//" << std::endl
1010  << "// The first for transforming original data x in " << std::endl
1011  << "// pattern space, to principal components p in " << std::endl
1012  << "// feature space. The second function is for the" << std::endl
1013  << "// inverse transformation, but using only nTest" << std::endl
1014  << "// of the principal components in the expansion" << std::endl
1015  << "// " << std::endl
1016  << "// See TPrincipal class documentation for more "
1017  << "information " << std::endl << "// " << std::endl;
1018  // Header files
1019  outFile << "#ifndef __CINT__" << std::endl;
1020  if (isMethod)
1021  // If these are methods, we need the class header
1022  outFile << "#include \"" << classname << ".h\"" << std::endl;
1023  else
1024  // otherwise, we need the typedefs of Int_t and Double_t
1025  outFile << "#include <Rtypes.h> // needed for Double_t etc" << std::endl;
1026  // Finish the preprocessor block
1027  outFile << "#endif" << std::endl << std::endl;
1028 
1029  //
1030  // Now for the data
1031  //
1032  // We make the Eigenvector matrix, Eigenvalue vector, Sigma vector,
1033  // and Mean value vector static, since all are needed in both
1034  // functions. Also, the number of variables is stored in a static
1035  // variable.
1036  outFile << "//" << std::endl
1037  << "// Static data variables" << std::endl
1038  << "//" << std::endl;
1039  outFile << cv_qual << "Int_t " << prefix << "gNVariables = "
1040  << fNumberOfVariables << ";" << std::endl;
1041 
1042  // Assign the values to the Eigenvector matrix. The elements are
1043  // stored row-wise, that is element
1044  // M[i][j] = e[i * nVariables + j]
1045  // where i and j are zero-based.
1046  outFile << std::endl << "// Assignment of eigenvector matrix." << std::endl
1047  << "// Elements are stored row-wise, that is" << std::endl
1048  << "// M[i][j] = e[i * nVariables + j] " << std::endl
1049  << "// where i and j are zero-based" << std::endl;
1050  outFile << cv_qual << "Double_t " << prefix
1051  << "gEigenVectors[] = {" << std::flush;
1052  Int_t i,j;
1053  for (i = 0; i < fNumberOfVariables; i++) {
1054  for (j = 0; j < fNumberOfVariables; j++) {
1055  Int_t index = i * fNumberOfVariables + j;
1056  outFile << (index != 0 ? "," : "" ) << std::endl
1057  << " " << fEigenVectors(i,j) << std::flush;
1058  }
1059  }
1060  outFile << "};" << std::endl << std::endl;
1061 
1062  // Assignment to eigenvalue vector. Zero-based.
1063  outFile << "// Assignment to eigen value vector. Zero-based." << std::endl;
1064  outFile << cv_qual << "Double_t " << prefix
1065  << "gEigenValues[] = {" << std::flush;
1066  for (i = 0; i < fNumberOfVariables; i++)
1067  outFile << (i != 0 ? "," : "") << std::endl
1068  << " " << fEigenValues(i) << std::flush;
1069  outFile << std::endl << "};" << std::endl << std::endl;
1070 
1071  // Assignment to mean Values vector. Zero-based.
1072  outFile << "// Assignment to mean value vector. Zero-based." << std::endl;
1073  outFile << cv_qual << "Double_t " << prefix
1074  << "gMeanValues[] = {" << std::flush;
1075  for (i = 0; i < fNumberOfVariables; i++)
1076  outFile << (i != 0 ? "," : "") << std::endl
1077  << " " << fMeanValues(i) << std::flush;
1078  outFile << std::endl << "};" << std::endl << std::endl;
1079 
1080  // Assignment to sigma values vector. Zero-based.
1081  outFile << "// Assignment to sigma value vector. Zero-based." << std::endl;
1082  outFile << cv_qual << "Double_t " << prefix
1083  << "gSigmaValues[] = {" << std::flush;
1084  for (i = 0; i < fNumberOfVariables; i++)
1085  outFile << (i != 0 ? "," : "") << std::endl
1086  << " " << (fIsNormalised ? fSigmas(i) : 1) << std::flush;
1087  // << " " << fSigmas(i) << std::flush;
1088  outFile << std::endl << "};" << std::endl << std::endl;
1089 
1090  //
1091  // Finally we reach the functions themselves
1092  //
1093  // First: void x2p(Double_t *x, Double_t *p);
1094  //
1095  outFile << "// " << std::endl
1096  << "// The "
1097  << (isMethod ? "method " : "function ")
1098  << "void " << prefix
1099  << "X2P(Double_t *x, Double_t *p)"
1100  << std::endl << "// " << std::endl;
1101  outFile << "void " << prefix
1102  << "X2P(Double_t *x, Double_t *p) {" << std::endl
1103  << " for (Int_t i = 0; i < gNVariables; i++) {" << std::endl
1104  << " p[i] = 0;" << std::endl
1105  << " for (Int_t j = 0; j < gNVariables; j++)" << std::endl
1106  << " p[i] += (x[j] - gMeanValues[j]) " << std::endl
1107  << " * gEigenVectors[j * gNVariables + i] "
1108  << "/ gSigmaValues[j];" << std::endl << std::endl << " }"
1109  << std::endl << "}" << std::endl << std::endl;
1110  //
1111  // Now: void p2x(Double_t *p, Double_t *x, Int_t nTest);
1112  //
1113  outFile << "// " << std::endl << "// The "
1114  << (isMethod ? "method " : "function ")
1115  << "void " << prefix
1116  << "P2X(Double_t *p, Double_t *x, Int_t nTest)"
1117  << std::endl << "// " << std::endl;
1118  outFile << "void " << prefix
1119  << "P2X(Double_t *p, Double_t *x, Int_t nTest) {" << std::endl
1120  << " for (Int_t i = 0; i < gNVariables; i++) {" << std::endl
1121  << " x[i] = gMeanValues[i];" << std::endl
1122  << " for (Int_t j = 0; j < nTest; j++)" << std::endl
1123  << " x[i] += p[j] * gSigmaValues[i] " << std::endl
1124  << " * gEigenVectors[i * gNVariables + j];" << std::endl
1125  << " }" << std::endl << "}" << std::endl << std::endl;
1126 
1127  // EOF
1128  outFile << "// EOF for " << filename << std::endl;
1129 
1130  // Close the file
1131  outFile.close();
1132 
1133  std::cout << "done" << std::endl;
1134 }
1135 
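The body above writes the generated file piece by piece; for orientation, here is a minimal sketch of what the emitted code looks like for two variables. All numeric values are invented, and the typedefs stand in for the ROOT types that a real generated file gets from `Rtypes.h`.

```cpp
// Sketch of a file produced by MakeRealCode (all values are invented).
// Stand-ins for the ROOT typedefs normally provided by Rtypes.h:
typedef int    Int_t;
typedef double Double_t;

// Static data variables
static Int_t gNVariables = 2;

// Eigenvector matrix, stored row-wise: M[i][j] = e[i * nVariables + j]
static Double_t gEigenVectors[] = { 1, 0,
                                    0, 1 };
static Double_t gEigenValues[]  = { 1.5, 0.5 };  // emitted, unused below
static Double_t gMeanValues[]   = { 10, 20 };
static Double_t gSigmaValues[]  = { 1, 1 };

// Pattern space -> feature space (principal components)
void X2P(Double_t *x, Double_t *p) {
  for (Int_t i = 0; i < gNVariables; i++) {
    p[i] = 0;
    for (Int_t j = 0; j < gNVariables; j++)
      p[i] += (x[j] - gMeanValues[j])
              * gEigenVectors[j * gNVariables + i] / gSigmaValues[j];
  }
}

// Feature space -> pattern space, using only nTest components
void P2X(Double_t *p, Double_t *x, Int_t nTest) {
  for (Int_t i = 0; i < gNVariables; i++) {
    x[i] = gMeanValues[i];
    for (Int_t j = 0; j < nTest; j++)
      x[i] += p[j] * gSigmaValues[i]
              * gEigenVectors[i * gNVariables + j];
  }
}
```

With an orthonormal eigenvector matrix and `nTest == gNVariables`, `P2X` inverts `X2P` exactly.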
1136 ////////////////////////////////////////////////////////////////////////////////
1137 /// Calculate x as a function of nTest of the most significant
1138 /// principal components p, and return it in x.
1139 /// It's the user's responsibility to make sure that both x and p are
1140 /// of the right size (i.e., memory must be allocated for x).
1141 
1142 void TPrincipal::P2X(const Double_t *p, Double_t *x, Int_t nTest)
1143 {
1144  for (Int_t i = 0; i < fNumberOfVariables; i++){
1145  x[i] = fMeanValues(i);
1146  for (Int_t j = 0; j < nTest; j++)
1147  x[i] += p[j] * (fIsNormalised ? fSigmas(i) : 1)
1148  * fEigenVectors(i,j);
1149  }
1150 
1151 }
1152 
1153 ////////////////////////////////////////////////////////////////////////////////
1154 /// Print the statistics.
1155 /// Options are:
1156 ///  - M: Print mean values of original data
1157 ///  - S: Print sigma values of original data
1158 ///  - E: Print eigenvalues of covariance matrix
1159 ///  - V: Print eigenvectors of covariance matrix
1160 /// The default is "MSE".
1161 
1162 void TPrincipal::Print(Option_t *opt) const
1163 {
1164  Bool_t printV = kFALSE;
1165  Bool_t printM = kFALSE;
1166  Bool_t printS = kFALSE;
1167  Bool_t printE = kFALSE;
1168 
1169  Int_t len = strlen(opt);
1170  for (Int_t i = 0; i < len; i++) {
1171  switch (opt[i]) {
1172  case 'V':
1173  case 'v':
1174  printV = kTRUE;
1175  break;
1176  case 'M':
1177  case 'm':
1178  printM = kTRUE;
1179  break;
1180  case 'S':
1181  case 's':
1182  printS = kTRUE;
1183  break;
1184  case 'E':
1185  case 'e':
1186  printE = kTRUE;
1187  break;
1188  default:
1189  Warning("Print", "Unknown option '%c'",opt[i]);
1190  break;
1191  }
1192  }
1193 
1194  if (printM||printS||printE) {
1195  std::cout << " Variable # " << std::flush;
1196  if (printM)
1197  std::cout << "| Mean Value " << std::flush;
1198  if (printS)
1199  std::cout << "| Sigma " << std::flush;
1200  if (printE)
1201  std::cout << "| Eigenvalue" << std::flush;
1202  std::cout << std::endl;
1203 
1204  std::cout << "-------------" << std::flush;
1205  if (printM)
1206  std::cout << "+------------" << std::flush;
1207  if (printS)
1208  std::cout << "+------------" << std::flush;
1209  if (printE)
1210  std::cout << "+------------" << std::flush;
1211  std::cout << std::endl;
1212 
1213  for (Int_t i = 0; i < fNumberOfVariables; i++) {
1214  std::cout << std::setw(12) << i << " " << std::flush;
1215  if (printM)
1216  std::cout << "| " << std::setw(10) << std::setprecision(4)
1217  << fMeanValues(i) << " " << std::flush;
1218  if (printS)
1219  std::cout << "| " << std::setw(10) << std::setprecision(4)
1220  << fSigmas(i) << " " << std::flush;
1221  if (printE)
1222  std::cout << "| " << std::setw(10) << std::setprecision(4)
1223  << fEigenValues(i) << " " << std::flush;
1224  std::cout << std::endl;
1225  }
1226  std::cout << std::endl;
1227  }
1228 
1229  if(printV) {
1230  for (Int_t i = 0; i < fNumberOfVariables; i++) {
1231  std::cout << "Eigenvector # " << i << std::flush;
1232  TVectorD v(fNumberOfVariables);
1233  v = TMatrixDColumn_const(fEigenVectors, i);
1234  v.Print();
1235  }
1236  }
1237 }
1238 
1239 ////////////////////////////////////////////////////////////////////////////////
1240 /// PRIVATE METHOD:
1241 /// Calculates the sum of the square residuals, that is
1242 /// \f[ E_N = \sum_{i=0}^{P-1} \left(x_i - x^\prime_i\right)^2 \f]
1243 void TPrincipal::SumOfSquareResiduals(const Double_t *x, Double_t *s)
1244 {
1245  // Here x'_i = sum_{j=i}^{N} p_i e_{n_j}, p_i is the i'th component of
1246  // the principal vector corresponding to x_i, the original data; i.e.,
1247  // E_N is the square distance to the space spanned by N eigenvectors.
1294  if (!x)
1295  return;
1296 
1297  Double_t p[100];
1298  Double_t xp[100];
1299 
1300  X2P(x,p);
1301  for (Int_t i = fNumberOfVariables-1; i >= 0; i--) {
1302  P2X(p,xp,i);
1303  for (Int_t j = 0; j < fNumberOfVariables; j++) {
1304  s[i] += (x[j] - xp[j])*(x[j] - xp[j]);
1305  }
1306  }
1307 }
1308 
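The effect of the formula is easiest to see numerically. The sketch below recomputes the residual for a single two-variable point whose principal axes are an invented 45-degree rotation: keeping both components reconstructs x exactly, and every component dropped can only increase E_N.

```cpp
#include <cmath>

// Standalone sketch of SumOfSquareResiduals for a hypothetical
// 2-variable problem (means and eigenvectors invented for illustration).
static const int kN = 2;
static const double kMean[kN] = { 0.0, 0.0 };
// Orthonormal eigenvectors stored column-wise: column j is eigenvector j.
// Here: a 45-degree rotation.
static const double kC = std::sqrt(0.5);
static const double kE[kN][kN] = { { kC, -kC },
                                   { kC,  kC } };

// Forward transform: p = E^T (x - mean)   (sigmas taken as 1 here)
void X2P(const double *x, double *p) {
  for (int i = 0; i < kN; i++) {
    p[i] = 0;
    for (int j = 0; j < kN; j++)
      p[i] += (x[j] - kMean[j]) * kE[j][i];
  }
}

// Inverse transform using only the first nTest components
void P2X(const double *p, double *x, int nTest) {
  for (int i = 0; i < kN; i++) {
    x[i] = kMean[i];
    for (int j = 0; j < nTest; j++)
      x[i] += p[j] * kE[i][j];
  }
}

// E_N for one data point: squared distance to the space spanned by
// the first nTest eigenvectors.
double SumOfSquareResiduals(const double *x, int nTest) {
  double p[kN], xp[kN], s = 0;
  X2P(x, p);
  P2X(p, xp, nTest);
  for (int j = 0; j < kN; j++)
    s += (x[j] - xp[j]) * (x[j] - xp[j]);
  return s;
}
```

For x = (1, 0) this gives E_0 = 1, E_1 = 0.5, and E_2 = 0: the residual is monotonically non-increasing in the number of retained components.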
1309 ////////////////////////////////////////////////////////////////////////////////
1310 /// Test the PCA by calculating the sum of square residuals
1311 /// (see method SumOfSquareResiduals), and display the histogram.
1312 
1313 void TPrincipal::Test(Option_t *)
1314 {
1315  MakeHistograms("pca","S");
1316 
1317  if (!fStoreData)
1318  return;
1319 
1320  TH1 *pca_s = 0;
1321  if (fHistograms) pca_s = (TH1*)fHistograms->FindObject("pca_s");
1322  if (!pca_s) {
1323  Warning("Test", "Couldn't get histogram of square residuals");
1324  return;
1325  }
1326 
1327  pca_s->Draw();
1328 }
1329 
1330 ////////////////////////////////////////////////////////////////////////////////
1331 /// Calculate the principal components from the original data vector
1332 /// x, and return it in p.
1333 /// It's the user's responsibility to make sure that both x and p are
1334 /// of the right size (i.e., memory must be allocated for p).
1335 
1336 void TPrincipal::X2P(const Double_t *x, Double_t *p)
1337 {
1338  for (Int_t i = 0; i < fNumberOfVariables; i++){
1339  p[i] = 0;
1340  for (Int_t j = 0; j < fNumberOfVariables; j++)
1341  p[i] += (x[j] - fMeanValues(j)) * fEigenVectors(j,i) /
1342  (fIsNormalised ? fSigmas(j) : 1);
1343  }
1344 
1345 }
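When the analysis was run on normalised data (`fIsNormalised`), the projection divides each centred coordinate by its sigma, so variables measured on very different scales contribute comparably. A standalone sketch of just this projection step, with invented means, sigmas, and an identity eigenvector matrix:

```cpp
// Standalone sketch of the X2P projection with normalisation
// (means, sigmas, and eigenvectors are invented for illustration).
static const int kNVar = 2;
static const double kMean[kNVar]  = { 1.0, 2.0 };
static const double kSigma[kNVar] = { 2.0, 4.0 };
// Identity eigenvector matrix; element e(j,i) is stored as kEVec[j][i].
static const double kEVec[kNVar][kNVar] = { { 1, 0 },
                                            { 0, 1 } };

// p_i = sum_j (x_j - mean_j) * e_{ji} / sigma_j
void X2P(const double *x, double *p) {
  for (int i = 0; i < kNVar; i++) {
    p[i] = 0;
    for (int j = 0; j < kNVar; j++)
      p[i] += (x[j] - kMean[j]) * kEVec[j][i] / kSigma[j];
  }
}
```

For x = (3, 6) both components come out as 1, even though the second variable's raw deviation from its mean is twice as large.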