TPrincipal.cxx
// @(#)root/hist:$Id$
// Author: Christian Holm Christensen 1/8/2000

/*************************************************************************
 * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers.               *
 * All rights reserved.                                                  *
 *                                                                       *
 * For the licensing terms see $ROOTSYS/LICENSE.                         *
 * For the list of contributors see $ROOTSYS/README/CREDITS.             *
 *************************************************************************/

/** \class TPrincipal
    \ingroup Hist
Principal Components Analysis (PCA)

The current implementation is based on the LINTRA package from CERNLIB
by R. Brun, H. Hansroul, and J. Kubler.
The class has been implemented by Christian Holm Christensen in August 2000.

## Introduction

In many fields of research, the treatment of large amounts of data
requires powerful techniques capable of rapid data reduction and
analysis. Usually, the quantities most conveniently measured by the
experimentalist are not necessarily the most significant ones for the
classification and analysis of the data. It is then useful to have a
way of selecting an optimal set of variables necessary for the
recognition process, reducing the dimensionality of the problem and
thereby simplifying the classification procedure.

This class implements one such method of feature selection, namely
principal components analysis. This multidimensional technique is
well known in the field of pattern recognition, and its use in
Particle Physics has been documented elsewhere (cf. H. Wind,
<I>Function Parameterization</I>, CERN 72-21).

## Overview
Suppose we have prototypes which are trajectories of particles,
passing through a spectrometer. If one measures the passage of the
particle at say 8 fixed planes, the trajectory is described by an
8-component vector:
\f[
  \mathbf{x} = \left(x_0, x_1, \ldots, x_7\right)
\f]
in 8-dimensional pattern space.

One proceeds by generating a representative sample of tracks and
building up the covariance matrix \f$\mathsf{C}\f$. Its eigenvectors and
eigenvalues are computed by standard methods, and thus a new basis is
obtained for the original 8-dimensional space. The expansion of the
prototypes,
\f[
  \mathbf{x}_m = \sum^7_{i=0} a_{m_i} \mathbf{e}_i
  \quad
  \mbox{where}
  \quad
  a_{m_i} = \mathbf{x}_m^T\bullet\mathbf{e}_i
\f]
allows the study of the behavior of the coefficients \f$a_{m_i}\f$ for all
the tracks of the sample. The eigenvectors which are insignificant for
the trajectory description in the expansion will have their
corresponding coefficients \f$a_{m_i}\f$ close to zero for all the
prototypes.

On one hand, a reduction of the dimensionality is then obtained by
omitting these least significant vectors in the subsequent analysis.

On the other hand, in the analysis of real data, these least
significant variables can be used for the pattern
recognition problem of extracting the valid combinations of
coordinates describing a true trajectory from the set of all possible
wrong combinations.

The program described here performs this principal components analysis
on a sample of data provided by the user. It computes the covariance
matrix, its eigenvalues and corresponding eigenvectors, and exhibits
the behavior of the principal components \f$a_{m_i}\f$, thus providing
the user with the means to understand the data.

## Principal Components Method
Let's consider a sample of \f$M\f$ prototypes, each characterized by
\f$P\f$ variables \f$x_0, x_1, \ldots, x_{P-1}\f$. Each prototype is a point, or a
column vector, in a \f$P\f$-dimensional *pattern space*.
\f[
  \mathbf{x} = \left[\begin{array}{c}
    x_0\\x_1\\\vdots\\x_{P-1}\end{array}\right]\,,
\f]
where each \f$x_n\f$ represents the particular value associated with the
\f$n\f$-th dimension.

Those \f$P\f$ variables are the quantities accessible to the
experimentalist, but are not necessarily the most significant ones for
the classification purpose.

The *Principal Components Method* consists of applying a
*linear* transformation to the original variables. This
transformation is described by an orthogonal matrix and is equivalent
to a rotation of the original pattern space into a new set of
coordinate vectors, which hopefully provide easier feature
identification and dimensionality reduction.

Let's define the covariance matrix:
\f[
  \mathsf{C} = \left\langle\mathbf{y}\mathbf{y}^T\right\rangle
  \quad\mbox{where}\quad
  \mathbf{y} = \mathbf{x} - \left\langle\mathbf{x}\right\rangle\,,
\f]
and the brackets indicate the mean value over the sample of \f$M\f$
prototypes.

This matrix \f$\mathsf{C}\f$ is real, symmetric, and positive definite, so
all its eigenvalues are greater than zero. It will now be shown that
among the family of all the complete orthonormal bases of the pattern
space, the basis formed by the eigenvectors of the covariance matrix
belonging to the largest eigenvalues corresponds to the most
significant features of the description of the original prototypes.

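As a concrete, standalone illustration of this definition (plain C++ with illustrative names, not part of the class itself), the covariance matrix \f$\mathsf{C}\f$ of a small sample can be computed directly:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Covariance matrix C = <y y^T> with y = x - <x>, averaged over the M
// prototypes; the P x P result is stored row-major in a flat vector.
std::vector<double> Covariance(const std::vector<std::vector<double>> &sample)
{
   const std::size_t M = sample.size();
   const std::size_t P = sample[0].size();
   std::vector<double> mean(P, 0.0), C(P * P, 0.0);
   for (const auto &x : sample)             // <x>
      for (std::size_t i = 0; i < P; i++)
         mean[i] += x[i] / M;
   for (const auto &x : sample)             // <y y^T>
      for (std::size_t i = 0; i < P; i++)
         for (std::size_t j = 0; j < P; j++)
            C[i * P + j] += (x[i] - mean[i]) * (x[j] - mean[j]) / M;
   return C;
}
```

The resulting matrix is symmetric with non-negative diagonal elements, as required for the eigen-decomposition below.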
Let the prototypes be expanded into a set of \f$N\f$ basis vectors
\f$\mathbf{e}_n, n=0,\ldots,N,N+1, \ldots, P-1\f$:
\f[
  \mathbf{y}_i = \sum^N_{n=0} a_{i_n} \mathbf{e}_n,
  \quad
  i = 1, \ldots, M,
  \quad
  N < P-1
\f]
The `best' feature coordinates \f$\mathbf{e}_n\f$, spanning a *feature
space*, will be obtained by minimizing the error due to this
truncated expansion, i.e.,
\f[
  \min\left(E_N\right) =
  \min\left[\left\langle\left(\mathbf{y}_i - \sum^N_{n=0} a_{i_n} \mathbf{e}_n\right)^2\right\rangle\right]
\f]
with the conditions:
\f[
  \mathbf{e}_k\bullet\mathbf{e}_j = \delta_{jk} =
  \left\{\begin{array}{rcl}
    1 & \mbox{for} & k = j\\
    0 & \mbox{for} & k \neq j
  \end{array}\right.
\f]
Multiplying the expansion by \f$\mathbf{e}^T_n\f$ and using the
orthonormality condition, we get
\f[
  a_{i_n} = \mathbf{y}_i^T\bullet\mathbf{e}_n\,,
\f]
so the error becomes
\f{eqnarray*}{
  E_N &=&
  \left\langle\left[\sum_{n=N+1}^{P-1} a_{i_n}\mathbf{e}_n\right]^2\right\rangle\nonumber\\
  &=&
  \left\langle\left[\sum_{n=N+1}^{P-1} \mathbf{y}_i^T\bullet\mathbf{e}_n\mathbf{e}_n\right]^2\right\rangle\nonumber\\
  &=&
  \left\langle\sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathbf{y}_i\mathbf{y}_i^T\mathbf{e}_n\right\rangle\nonumber\\
  &=&
  \sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathsf{C}\mathbf{e}_n
\f}
The minimization of this sum is obtained when each
term \f$\mathbf{e}_n^T\mathsf{C}\mathbf{e}_n\f$ is minimum, since \f$\mathsf{C}\f$ is
positive definite. By the method of Lagrange multipliers, and the
orthonormality condition, we get
\f[
  E_N = \sum^{P-1}_{n=N+1} \left(\mathbf{e}_n^T\mathsf{C}\mathbf{e}_n -
  l_n\mathbf{e}_n^T\bullet\mathbf{e}_n + l_n\right)
\f]
The minimum condition \f$\frac{dE_N}{d\mathbf{e}^T_n} = 0\f$ leads to the
equation
\f[
  \mathsf{C}\mathbf{e}_n = l_n\mathbf{e}_n\,,
\f]
which shows that \f$\mathbf{e}_n\f$ is an eigenvector of the covariance
matrix \f$\mathsf{C}\f$ with eigenvalue \f$l_n\f$. The estimated minimum error is
then given by
\f[
  E_N \sim \sum^{P-1}_{n=N+1} \mathbf{e}_n^T\bullet l_n\mathbf{e}_n
  = \sum^{P-1}_{n=N+1} l_n\,,
\f]
where \f$l_n,\,n=N+1,\ldots,P-1\f$ are the eigenvalues associated with the
omitted eigenvectors in the expansion. Thus, by choosing
the \f$N\f$ largest eigenvalues, and their associated eigenvectors, the
error \f$E_N\f$ is minimized.
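For the two-variable case this result can be checked in closed form. The standalone sketch below (plain C++, illustrative only) computes both eigenvalues of a symmetric 2x2 covariance matrix; keeping only the leading eigenvector leaves a truncation error \f$E_1\f$ equal to the smaller eigenvalue:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], returned as
// (largest, smallest). Truncating the expansion to the leading eigenvector
// gives an error E_1 equal to the smaller eigenvalue.
std::pair<double, double> EigenValues2x2(double a, double b, double c)
{
   const double m = 0.5 * (a + c);
   const double r = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
   return {m + r, m - r};
}
```

Note that the two eigenvalues sum to the trace \f$a + c\f$, so the relative truncation error is the fraction of the trace carried by the discarded eigenvalue.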

The transformation matrix to go from the pattern space to the feature
space consists of the ordered eigenvectors
\f$\mathbf{e}_0,\ldots,\mathbf{e}_{P-1}\f$ as its columns:
\f[
  \mathsf{T} = \left[
    \begin{array}{cccc}
      \mathbf{e}_0 &
      \mathbf{e}_1 &
      \cdots &
      \mathbf{e}_{P-1}
    \end{array}\right]
  = \left[
    \begin{array}{cccc}
      \mathbf{e}_{0_0} & \mathbf{e}_{1_0} & \cdots & \mathbf{e}_{{P-1}_0}\\
      \mathbf{e}_{0_1} & \mathbf{e}_{1_1} & \cdots & \mathbf{e}_{{P-1}_1}\\
      \vdots           & \vdots           & \ddots & \vdots \\
      \mathbf{e}_{0_{P-1}} & \mathbf{e}_{1_{P-1}} & \cdots & \mathbf{e}_{{P-1}_{P-1}}\\
    \end{array}\right]
\f]
This is an orthogonal transformation, or rotation, of the pattern
space, and feature selection results in ignoring certain coordinates
in the transformed space.

Christian Holm, August 2000, CERN
*/
209
210#include "TPrincipal.h"
211
212#include "TVectorD.h"
213#include "TMatrixD.h"
214#include "TMatrixDSymEigen.h"
215#include "TMath.h"
216#include "TList.h"
217#include "TH2.h"
218#include "TDatime.h"
219#include "TBrowser.h"
220#include "TROOT.h"
221#include "Riostream.h"
222
223
225
////////////////////////////////////////////////////////////////////////////////
/// Empty constructor. Do not use.

TPrincipal::TPrincipal()
  : fMeanValues(0),
    fSigmas(0),
    fCovarianceMatrix(1,1),
    fEigenVectors(1,1),
    fEigenValues(0),
    fOffDiagonal(0),
    fStoreData(kFALSE)
{
   fTrace              = 0;
   fHistograms         = nullptr;
   fIsNormalised       = kFALSE;
   fNumberOfDataPoints = 0;
   fNumberOfVariables  = 0;
}

////////////////////////////////////////////////////////////////////////////////
/// Constructor. The argument is the number of variables in the sample of data.
/// Options are:
///  - N       Normalize the covariance matrix (default)
///  - D       Store input data (default)
///
/// The created object is named "principal" by default.

TPrincipal::TPrincipal(Long64_t nVariables, Option_t *opt)
  : fMeanValues(nVariables),
    fSigmas(nVariables),
    fCovarianceMatrix(nVariables,nVariables),
    fEigenVectors(nVariables,nVariables),
    fEigenValues(nVariables),
    fOffDiagonal(nVariables),
    fStoreData(kFALSE)
{
   if (nVariables <= 1) {
      Error("TPrincipal", "You can't be serious - nVariables <= 1!!!");
      return;
   }
   if (nVariables > std::numeric_limits<Int_t>::max()) {
      Error("TPrincipal", "`nVariables` input parameter %lld is larger than the allowed maximum %d", nVariables, std::numeric_limits<Int_t>::max());
      return;
   }

   SetName("principal");

   fTrace              = 0;
   fHistograms         = nullptr;
   fIsNormalised       = kFALSE;
   fNumberOfDataPoints = 0;
   fNumberOfVariables  = nVariables;
   while (opt && strlen(opt) > 0) {
      switch(*opt++) {
         case 'N':
         case 'n':
            fIsNormalised = kTRUE;
            break;
         case 'D':
         case 'd':
            fStoreData = kTRUE;
            break;
         default:
            break;
      }
   }

   if (!fMeanValues.IsValid())
      Error("TPrincipal","Couldn't create vector mean values");
   if (!fSigmas.IsValid())
      Error("TPrincipal","Couldn't create vector sigmas");
   if (!fCovarianceMatrix.IsValid())
      Error("TPrincipal","Couldn't create covariance matrix");
   if (!fEigenVectors.IsValid())
      Error("TPrincipal","Couldn't create eigenvector matrix");
   if (!fEigenValues.IsValid())
      Error("TPrincipal","Couldn't create eigenvalue vector");
   if (!fOffDiagonal.IsValid())
      Error("TPrincipal","Couldn't create offdiagonal vector");
   if (fStoreData) {
      fUserData.ResizeTo(nVariables*1000);
      fUserData.Zero();
      if (!fUserData.IsValid())
         Error("TPrincipal","Couldn't create user data vector");
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Copy constructor.

TPrincipal::TPrincipal(const TPrincipal &pr) :
   TNamed(pr),
   fNumberOfDataPoints(pr.fNumberOfDataPoints),
   fNumberOfVariables(pr.fNumberOfVariables),
   fMeanValues(pr.fMeanValues),
   fSigmas(pr.fSigmas),
   fCovarianceMatrix(pr.fCovarianceMatrix),
   fEigenVectors(pr.fEigenVectors),
   fEigenValues(pr.fEigenValues),
   fOffDiagonal(pr.fOffDiagonal),
   fUserData(pr.fUserData),
   fTrace(pr.fTrace),
   fHistograms(pr.fHistograms),
   fIsNormalised(pr.fIsNormalised),
   fStoreData(pr.fStoreData)
{
}

////////////////////////////////////////////////////////////////////////////////
/// Assignment operator.

TPrincipal &TPrincipal::operator=(const TPrincipal &pr)
{
   if (this != &pr) {
      TNamed::operator=(pr);
      fNumberOfDataPoints = pr.fNumberOfDataPoints;
      fNumberOfVariables  = pr.fNumberOfVariables;
      fMeanValues         = pr.fMeanValues;
      fSigmas             = pr.fSigmas;
      fCovarianceMatrix   = pr.fCovarianceMatrix;
      fEigenVectors       = pr.fEigenVectors;
      fEigenValues        = pr.fEigenValues;
      fOffDiagonal        = pr.fOffDiagonal;
      fUserData           = pr.fUserData;
      fTrace              = pr.fTrace;
      fHistograms         = pr.fHistograms;
      fIsNormalised       = pr.fIsNormalised;
      fStoreData          = pr.fStoreData;
   }
   return *this;
}

////////////////////////////////////////////////////////////////////////////////
/// Destructor.

TPrincipal::~TPrincipal()
{
   if (fHistograms) {
      fHistograms->Delete();
      delete fHistograms;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Add a data point and update the covariance matrix. The input
/// array must be <TT>fNumberOfVariables</TT> long.
///
/// The covariance matrix and mean values of the input data are calculated
/// on the fly by the following equations:
///
/// \f[
/// \left<x_i\right>^{(0)}  = x_{i0}
/// \f]
///
/// \f[
/// \left<x_i\right>^{(n)} = \left<x_i\right>^{(n-1)}
/// + \frac1n \left(x_{in} - \left<x_i\right>^{(n-1)}\right)
/// \f]
///
/// \f[
/// C_{ij}^{(0)} = 0
/// \f]
///
/// \f[
/// C_{ij}^{(n)} = C_{ij}^{(n-1)}
/// + \frac1{n-1}\left[\left(x_{in} - \left<x_i\right>^{(n)}\right)
///   \left(x_{jn} - \left<x_j\right>^{(n)}\right)\right]
/// - \frac1n C_{ij}^{(n-1)}
/// \f]
///
/// This is a fast single-pass method with small rounding errors (please
/// refer to CERN 72-21 pp. 54-106).
///
/// The data is stored internally in a <TT>TVectorD</TT>, in the following
/// way:
///
/// \f[
/// \mathbf{x} = \left[\left(x_{0_0},\ldots,x_{{P-1}_0}\right),\ldots,
///     \left(x_{0_i},\ldots,x_{{P-1}_i}\right), \ldots\right]
/// \f]
///
/// With \f$P\f$ as defined in the class description.

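The recurrences above can be exercised in isolation. The following standalone sketch (plain C++, single variable, illustrative names) applies exactly these update formulas; the running estimate converges to the denominator-n sample variance:

```cpp
#include <cassert>
#include <cmath>

// Single-variable version of the running updates above: the mean and the
// (co)variance are maintained in one pass over the data.
struct RunningCov {
   long n = 0;
   double mean = 0.0;
   double cov = 0.0;

   void Add(double x)
   {
      if (++n == 1) {      // <x>^(0) = x_0, C^(0) = 0
         mean = x;
         return;
      }
      const double invn   = 1.0 / n;
      const double invnm1 = 1.0 / (n - 1);
      const double cor    = 1.0 - invn;
      mean = mean * cor + x * invn;                        // mean recurrence
      cov  = cov * cor + (x - mean) * (x - mean) * invnm1; // covariance recurrence
   }
};
```

Note that the covariance update uses the already-updated mean, exactly as in the formula for \f$C_{ij}^{(n)}\f$ above.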
void TPrincipal::AddRow(const Double_t *p)
{
   if (!p)
      return;
   if (fNumberOfDataPoints == std::numeric_limits<Int_t>::max()) {
      Error("AddRow", "`fNumberOfDataPoints` has reached its allowed maximum %d, cannot add new row.", fNumberOfDataPoints);
      return;
   }

   // Increment the data point counter
   Int_t i,j;
   if (++fNumberOfDataPoints == 1) {
      for (i = 0; i < fNumberOfVariables; i++)
         fMeanValues(i) = p[i];
   }
   else {

      const Double_t invnp   = 1. / Double_t(fNumberOfDataPoints);
      const Double_t invnpM1 = 1. / (Double_t(fNumberOfDataPoints - 1));
      const Double_t cor     = 1. - invnp;
      // use the vector arrays directly for faster element access
      Double_t * meanValues = fMeanValues.GetMatrixArray();
      Double_t * covMatrix  = fCovarianceMatrix.GetMatrixArray();
      for (i = 0; i < fNumberOfVariables; i++) {

         meanValues[i] *= cor;
         meanValues[i] += p[i] * invnp;
         const Double_t t1 = (p[i] - meanValues[i]) * invnpM1;

         // Setting matrix (lower triangle) elements
         for (j = 0; j < i + 1; j++) {
            const Int_t index = i * fNumberOfVariables + j;
            covMatrix[index] *= cor;
            covMatrix[index] += t1 * (p[j] - meanValues[j]);
         }
      }
   }

   // Store the data point in the internal vector.
   // If the vector isn't big enough to hold the new data, then
   // expand the vector by half its size.
   if (!fStoreData)
      return;
   Int_t size = fUserData.GetNrows();
   if (fNumberOfDataPoints * fNumberOfVariables > size)
      fUserData.ResizeTo(size + size/2);

   for (i = 0; i < fNumberOfVariables; i++) {
      j = (fNumberOfDataPoints - 1) * fNumberOfVariables + i;
      fUserData.GetMatrixArray()[j] = p[i];
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Browse the TPrincipal object in the TBrowser.

void TPrincipal::Browse(TBrowser *b)
{
   if (fHistograms) {
      TIter next(fHistograms);
      TH1* h = nullptr;
      while ((h = (TH1*)next()))
         b->Add(h,h->GetName());
   }

   if (fStoreData)
      b->Add(&fUserData,"User Data");
   b->Add(&fCovarianceMatrix,"Covariance Matrix");
   b->Add(&fMeanValues,"Mean value vector");
   b->Add(&fSigmas,"Sigma value vector");
   b->Add(&fEigenValues,"Eigenvalue vector");
   b->Add(&fEigenVectors,"Eigenvector Matrix");
}

////////////////////////////////////////////////////////////////////////////////
/// Clear the data in the object. Note that it is not possible to change
/// the dimension of the original data.

void TPrincipal::Clear(Option_t *opt)
{
   if (fHistograms) {
      fHistograms->Delete(opt);
   }

   fNumberOfDataPoints = 0;
   fTrace              = 0;
   fCovarianceMatrix.Zero();
   fEigenVectors.Zero();
   fEigenValues.Zero();
   fMeanValues.Zero();
   fSigmas.Zero();
   fOffDiagonal.Zero();

   if (fStoreData) {
      fUserData.ResizeTo(fNumberOfVariables * 1000);
      fUserData.Zero();
   }
}

////////////////////////////////////////////////////////////////////////////////
/// Return a row of the user-supplied data.
/// If the row is out of bounds, nullptr is returned.
/// The returned pointer refers to internal storage and must not be deleted.
/// Row 0 is the first row.

const Double_t *TPrincipal::GetRow(Long64_t row)
{
   if (row >= fNumberOfDataPoints)
      return nullptr;

   if (!fStoreData)
      return nullptr;

   Long64_t index = row * fNumberOfVariables;
   if (index > std::numeric_limits<Int_t>::max()) {
      Error("GetRow", "Input parameter `row` %lld x fNumberOfVariables %d goes into overflow (%lld>%d), returning nullptr.", row, fNumberOfVariables, index, std::numeric_limits<Int_t>::max());
      return nullptr;
   }
   return &fUserData(index);
}

////////////////////////////////////////////////////////////////////////////////
/// Generates the file `<filename>`, with `.C` appended if the
/// argument doesn't end in `.cxx` or `.C`.
///
/// The file contains the implementation of two functions
/// ~~~ {.cpp}
///    void X2P(Double_t *x, Double_t *p)
///    void P2X(Double_t *p, Double_t *x, Int_t nTest)
/// ~~~
/// which do the same as `TPrincipal::X2P` and `TPrincipal::P2X`,
/// respectively. Please refer to these methods.
///
/// Further, the static variables:
/// ~~~ {.cpp}
///    Int_t    gNVariables
///    Double_t gEigenValues[]
///    Double_t gEigenVectors[]
///    Double_t gMeanValues[]
///    Double_t gSigmaValues[]
/// ~~~
/// are initialized. The only ROOT header file needed is Rtypes.h.
///
/// See TPrincipal::MakeRealCode for a list of options.

void TPrincipal::MakeCode(const char *filename, Option_t *opt)
{
   TString outName(filename);
   if (!outName.EndsWith(".C") && !outName.EndsWith(".cxx"))
      outName += ".C";

   MakeRealCode(outName.Data(),"",opt);
}

////////////////////////////////////////////////////////////////////////////////
/// Make histograms of the result of the analysis.
/// The option string says which histograms to create
///  - X    Histogram original data
///  - P    Histogram principal components corresponding to
///         original data
///  - D    Histogram the difference between the original data
///         and the projection of the principal components onto a lower
///         dimensional subspace (2D histograms)
///  - E    Histogram the eigenvalues
///  - S    Histogram the square of the residues
///         (see `TPrincipal::SumOfSquareResiduals`)
///
/// The histograms will be named `<name>_<type><number>`, where `<name>`
/// is the first argument, `<type>` is one of X,P,D,E,S, and `<number>`
/// is the variable.

void TPrincipal::MakeHistograms(const char *name, Option_t *opt)
{
   Bool_t makeX = kFALSE;
   Bool_t makeD = kFALSE;
   Bool_t makeP = kFALSE;
   Bool_t makeE = kFALSE;
   Bool_t makeS = kFALSE;

   Int_t len = opt ? strlen(opt) : 0;
   Int_t i,j,k;
   for (i = 0; i < len; i++) {
      switch (opt[i]) {
         case 'X':
         case 'x':
            if (fStoreData)
               makeX = kTRUE;
            break;
         case 'd':
         case 'D':
            if (fStoreData)
               makeD = kTRUE;
            break;
         case 'P':
         case 'p':
            if (fStoreData)
               makeP = kTRUE;
            break;
         case 'E':
         case 'e':
            makeE = kTRUE;
            break;
         case 's':
         case 'S':
            if (fStoreData)
               makeS = kTRUE;
            break;
         default:
            Warning("MakeHistograms","Unknown option: %c",opt[i]);
      }
   }

   // If no option was given, then exit gracefully
   if (!makeX && !makeD && !makeP && !makeE && !makeS)
      return;

   // If the list of histograms doesn't exist, create it.
   if (!fHistograms)
      fHistograms = new TList;

   // Don't create the histograms if they are already in the TList.
   if (makeX && fHistograms->FindObject(TString::Format("%s_x000",name)))
      makeX = kFALSE;
   if (makeD && fHistograms->FindObject(TString::Format("%s_d000",name)))
      makeD = kFALSE;
   if (makeP && fHistograms->FindObject(TString::Format("%s_p000",name)))
      makeP = kFALSE;
   if (makeE && fHistograms->FindObject(TString::Format("%s_e",name)))
      makeE = kFALSE;
   if (makeS && fHistograms->FindObject(TString::Format("%s_s",name)))
      makeS = kFALSE;

   TH1F **hX = nullptr;
   TH2F **hD = nullptr;
   TH1F **hP = nullptr;
   TH1F *hE  = nullptr;
   TH1F *hS  = nullptr;

   // Initialize the arrays of histograms needed
   if (makeX)
      hX = new TH1F * [fNumberOfVariables];

   if (makeD)
      hD = new TH2F * [fNumberOfVariables];

   if (makeP)
      hP = new TH1F * [fNumberOfVariables];

   if (makeE) {
      hE = new TH1F(TString::Format("%s_e",name), "Eigenvalues of Covariance matrix",
                    fNumberOfVariables,0,fNumberOfVariables);
      hE->SetXTitle("Eigenvalue");
      fHistograms->Add(hE);
   }

   if (makeS) {
      hS = new TH1F(TString::Format("%s_s",name),"E_{N}",
                    fNumberOfVariables+1,1,fNumberOfVariables+2);
      hS->SetXTitle("N");
      hS->SetYTitle("#sum_{i=1}^{M} (x_{i} - x'_{N,i})^{2}");
      fHistograms->Add(hS);
   }

   // Initialize sub elements of the histogram arrays
   for (i = 0; i < fNumberOfVariables; i++) {
      if (makeX) {
         // We allow 4 sigma spread in the original data in our
         // histogram.
         Double_t xlowb  = fMeanValues(i) - 4 * fSigmas(i);
         Double_t xhighb = fMeanValues(i) + 4 * fSigmas(i);
         Int_t    xbins  = (fNumberOfDataPoints > 0 && fNumberOfDataPoints < 100 ? 1 : fNumberOfDataPoints/100);
         hX[i] = new TH1F(TString::Format("%s_x%03d", name, i),
                          TString::Format("Pattern space, variable %d", i),
                          xbins,xlowb,xhighb);
         hX[i]->SetXTitle(Form("x_{%d}",i));
         fHistograms->Add(hX[i]);
      }

      if (makeD) {
         // The upper limit below is arbitrary!!!
         Double_t dlowb  = 0;
         Double_t dhighb = 20;
         Int_t    dbins  = (fNumberOfDataPoints > 0 && fNumberOfDataPoints < 100 ? 1 : fNumberOfDataPoints/100);
         hD[i] = new TH2F(TString::Format("%s_d%03d", name, i),
                          TString::Format("Distance from pattern to feature space, variable %d", i),
                          dbins,dlowb,dhighb,
                          fNumberOfVariables,1,fNumberOfVariables+1);
         hD[i]->SetXTitle(TString::Format("|x_{%d} - x'_{%d,N}|/#sigma_{%d}",i,i,i));
         hD[i]->SetYTitle("N");
         fHistograms->Add(hD[i]);
      }

      if (makeP) {
         // For some reason, the trace of the non-scaled matrix
         // (see TPrincipal::MakeNormalised) should enter here. Taken
         // from the LINTRA code.
         Double_t et = TMath::Abs(fEigenValues(i) * fTrace);
         Double_t plowb  = -10 * TMath::Sqrt(et);
         Double_t phighb = -plowb;
         Int_t    pbins  = 100;
         hP[i] = new TH1F(TString::Format("%s_p%03d", name, i),
                          TString::Format("Feature space, variable %d", i),
                          pbins,plowb,phighb);
         hP[i]->SetXTitle(TString::Format("p_{%d}",i));
         fHistograms->Add(hP[i]);
      }

      if (makeE)
         // The eigenvalue histogram is easy
         hE->Fill(i,fEigenValues(i));

   }
   if (!makeX && !makeP && !makeD && !makeS) {
      if (hX)
         delete[] hX;
      if (hD)
         delete[] hD;
      if (hP)
         delete[] hP;
      return;
   }

   Double_t *x = nullptr;
   Double_t *p = new Double_t[fNumberOfVariables];
   Double_t *d = new Double_t[fNumberOfVariables];
   for (i = 0; i < fNumberOfDataPoints; i++) {

      // Zero arrays
      for (j = 0; j < fNumberOfVariables; j++)
         p[j] = d[j] = 0;

      // update the original data histogram
      x = (Double_t*)(GetRow(i));
      R__ASSERT(x);

      if (makeP||makeD||makeS)
         // calculate the corresponding principal component
         X2P(x,p);

      if (makeD || makeS) {
         // Calculate the difference between the original data and its
         // projection onto the principal components, truncated to a lower
         // dimensional sub-space
         for (j = fNumberOfVariables; j > 0; j--) {
            P2X(p,d,j);

            for (k = 0; k < fNumberOfVariables; k++) {
               // We use the absolute value of the difference!
               d[k] = x[k] - d[k];

               if (makeS)
                  hS->Fill(j,d[k]*d[k]);

               if (makeD) {
                  d[k] = TMath::Abs(d[k]) / (fIsNormalised ? fSigmas(k) : 1);
                  (hD[k])->Fill(d[k],j);
               }
            }
         }
      }

      if (makeX||makeP) {
         // If we are asked to make any of these histograms, we have to
         // go here
         for (j = 0; j < fNumberOfVariables; j++) {
            if (makeX)
               (hX[j])->Fill(x[j]);

            if (makeP)
               (hP[j])->Fill(p[j]);
         }
      }
   }
   // Clean up
   if (hX)
      delete [] hX;
   if (hD)
      delete [] hD;
   if (hP)
      delete [] hP;
   if (d)
      delete [] d;
   if (p)
      delete [] p;

   // Normalize the residues
   if (makeS)
      hS->Scale(Double_t(1.)/fNumberOfDataPoints);
}

////////////////////////////////////////////////////////////////////////////////
/// Normalize the covariance matrix

void TPrincipal::MakeNormalised()
{
   Int_t i,j;
   for (i = 0; i < fNumberOfVariables; i++) {
      fSigmas(i) = TMath::Sqrt(fCovarianceMatrix(i,i));
      if (fIsNormalised)
         for (j = 0; j <= i; j++)
            fCovarianceMatrix(i,j) /= (fSigmas(i) * fSigmas(j));

      fTrace += fCovarianceMatrix(i,i);
   }

   // Fill remaining parts of matrix, and scale.
   for (i = 0; i < fNumberOfVariables; i++)
      for (j = 0; j <= i; j++) {
         fCovarianceMatrix(i,j) /= fTrace;
         fCovarianceMatrix(j,i) = fCovarianceMatrix(i,j);
      }

}
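The sigma scaling used here (dividing each element \f$C_{ij}\f$ by \f$\sigma_i\sigma_j\f$) is the usual covariance-to-correlation scaling. A standalone sketch with illustrative names, showing only that step (the additional division by the trace done above is omitted):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Covariance-to-correlation scaling: divide C(i,j) by sigma_i * sigma_j,
// where sigma_i = sqrt(C(i,i)); the result has a unit diagonal.
std::vector<double> Normalise(std::vector<double> C, int P)
{
   std::vector<double> sigma(P);
   for (int i = 0; i < P; i++)
      sigma[i] = std::sqrt(C[i * P + i]);
   for (int i = 0; i < P; i++)
      for (int j = 0; j < P; j++)
         C[i * P + j] /= sigma[i] * sigma[j];
   return C;
}
```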

////////////////////////////////////////////////////////////////////////////////
/// Generate the file `<classname>PCA.cxx` which contains the
/// implementation of two methods:
/// ~~~ {.cpp}
///    void <classname>::X2P(Double_t *x, Double_t *p)
///    void <classname>::P2X(Double_t *p, Double_t *x, Int_t nTest)
/// ~~~
/// which do the same as TPrincipal::X2P and TPrincipal::P2X,
/// respectively. Please refer to these methods.
///
/// Further, the public static members:
/// ~~~ {.cpp}
///    Int_t    <classname>::fgNVariables
///    Double_t <classname>::fgEigenValues[]
///    Double_t <classname>::fgEigenVectors[]
///    Double_t <classname>::fgMeanValues[]
///    Double_t <classname>::fgSigmaValues[]
/// ~~~
/// are initialized, and assumed to exist. The class declaration is
/// assumed to be in `<classname>.h` and assumed to be provided by the
/// user.
///
/// See TPrincipal::MakeRealCode for a list of options.
///
/// The minimal class definition is:
/// ~~~ {.cpp}
///   class <classname> {
///   public:
///      static Int_t    fgNVariables;
///      static Double_t fgEigenVectors[];
///      static Double_t fgEigenValues[];
///      static Double_t fgMeanValues[];
///      static Double_t fgSigmaValues[];
///
///      void X2P(Double_t *x, Double_t *p);
///      void P2X(Double_t *p, Double_t *x, Int_t nTest);
///   };
/// ~~~
/// Whether the methods `<classname>::%X2P` and `<classname>::%P2X` should
/// be static or not is up to the user.

void TPrincipal::MakeMethods(const char *classname, Option_t *opt)
{
   MakeRealCode(TString::Format("%sPCA.cxx", classname), classname, opt);
}

////////////////////////////////////////////////////////////////////////////////
/// Perform the principal components analysis.
/// This is done in several stages in the TMatrix::EigenVectors method:
///  - Transform the covariance matrix into a tridiagonal matrix.
///  - Find the eigenvalues and vectors of the tridiagonal matrix.

void TPrincipal::MakePrincipals()
{
   // Normalize covariance matrix
   MakeNormalised();

   TMatrixDSym sym;
   sym.Use(fCovarianceMatrix.GetNrows(), fCovarianceMatrix.GetMatrixArray());
   TMatrixDSymEigen eigen(sym);
   fEigenVectors = eigen.GetEigenVectors();
   fEigenValues  = eigen.GetEigenValues();
   // make sure that the eigenvalues are positive
   for (Int_t i = 0; i < fNumberOfVariables; i++) {
      if (fEigenValues[i] < 0) fEigenValues[i] = -fEigenValues[i];
   }
}

////////////////////////////////////////////////////////////////////////////////
/// This is the method that actually generates the code for the
/// transformations to and from feature space and pattern space.
/// It's called by TPrincipal::MakeCode and TPrincipal::MakeMethods.
///
/// The options are: NONE so far

void TPrincipal::MakeRealCode(const char *filename, const char *classname,
                              Option_t *)
{
   Bool_t  isMethod = classname[0] == '\0' ? kFALSE : kTRUE;
   TString prefix;
   const char *cv_qual = isMethod ? "" : "static ";
   if (isMethod)
      prefix.Form("%s::", classname);

   std::ofstream outFile(filename,std::ios::out|std::ios::trunc);
   if (!outFile) {
      Error("MakeRealCode","couldn't open output file '%s'",filename);
      return;
   }

   std::cout << "Writing on file \"" << filename << "\" ... " << std::flush;
   //
   // Write header of file
   //
   // Emacs mode line ;-)
   outFile << "// -*- mode: c++ -*-" << std::endl;
   // Info about creator
   outFile << "// " << std::endl
           << "// File " << filename
           << " generated by TPrincipal::MakeCode" << std::endl;
   // Time stamp
   TDatime date;
   outFile << "// on " << date.AsString() << std::endl;
   // ROOT version info
   outFile << "// ROOT version " << gROOT->GetVersion()
           << std::endl << "//" << std::endl;
   // General information on the code
   outFile << "// This file contains the functions " << std::endl
           << "//" << std::endl
           << "//    void " << prefix
           << "X2P(Double_t *x, Double_t *p); " << std::endl
           << "//    void " << prefix
           << "P2X(Double_t *p, Double_t *x, Int_t nTest);"
           << std::endl << "//" << std::endl
           << "// The first for transforming original data x in " << std::endl
           << "// pattern space, to principal components p in " << std::endl
           << "// feature space. The second function is for the" << std::endl
           << "// inverse transformation, but using only nTest" << std::endl
           << "// of the principal components in the expansion" << std::endl
           << "// " << std::endl
           << "// See TPrincipal class documentation for more "
           << "information " << std::endl << "// " << std::endl;
   // Header files
   if (isMethod)
      // If these are methods, we need the class header
      outFile << "#include \"" << classname << ".h\"" << std::endl;
   else
      // otherwise, we need the typedefs of Int_t and Double_t
      outFile << "#include <Rtypes.h> // needed for Double_t etc" << std::endl;

   //
   // Now for the data
   //
   // We make the eigenvector matrix, eigenvalue vector, sigma vector,
   // and mean value vector static, since all are needed in both
   // functions. Also, the number of variables is stored in a static
   // variable.
   outFile << "//" << std::endl
           << "// Static data variables" << std::endl
           << "//" << std::endl;
   outFile << cv_qual << "Int_t " << prefix << "gNVariables = "
           << fNumberOfVariables << ";" << std::endl;

   // Assign the values to the eigenvector matrix. The elements are
   // stored row-wise, that is element
   //    M[i][j] = e[i * nVariables + j]
   // where i and j are zero-based.
   outFile << std::endl << "// Assignment of eigenvector matrix." << std::endl
           << "// Elements are stored row-wise, that is" << std::endl
           << "//    M[i][j] = e[i * nVariables + j] " << std::endl
           << "// where i and j are zero-based" << std::endl;
   outFile << cv_qual << "Double_t " << prefix
           << "gEigenVectors[] = {" << std::flush;
   Int_t i,j;
   for (i = 0; i < fNumberOfVariables; i++) {
      for (j = 0; j < fNumberOfVariables; j++) {
         Int_t index = i * fNumberOfVariables + j;
         outFile << (index != 0 ? "," : "" ) << std::endl
                 << "  " << fEigenVectors(i,j) << std::flush;
      }
   }
   outFile << "};" << std::endl << std::endl;
993
994 // Assignment to eigenvalue vector. Zero-based.
995 outFile << "// Assignment to eigen value vector. Zero-based." << std::endl;
996 outFile << cv_qual << "Double_t " << prefix
997 << "gEigenValues[] = {" << std::flush;
998 for (i = 0; i < fNumberOfVariables; i++)
999 outFile << (i != 0 ? "," : "") << std::endl
1000 << " " << fEigenValues(i) << std::flush;
1001 outFile << std::endl << "};" << std::endl << std::endl;
1002
1003 // Assignment to mean Values vector. Zero-based.
1004 outFile << "// Assignment to mean value vector. Zero-based." << std::endl;
1005 outFile << cv_qual << "Double_t " << prefix
1006 << "gMeanValues[] = {" << std::flush;
1007 for (i = 0; i < fNumberOfVariables; i++)
1008 outFile << (i != 0 ? "," : "") << std::endl
1009 << " " << fMeanValues(i) << std::flush;
1010 outFile << std::endl << "};" << std::endl << std::endl;
1011
1012 // Assignment to mean Values vector. Zero-based.
1013 outFile << "// Assignment to sigma value vector. Zero-based." << std::endl;
1014 outFile << cv_qual << "Double_t " << prefix
1015 << "gSigmaValues[] = {" << std::flush;
1016 for (i = 0; i < fNumberOfVariables; i++)
1017 outFile << (i != 0 ? "," : "") << std::endl
1018 << " " << (fIsNormalised ? fSigmas(i) : 1) << std::flush;
1019 // << " " << fSigmas(i) << std::flush;
1020 outFile << std::endl << "};" << std::endl << std::endl;
1021
1022 //
1023 // Finally we reach the functions themselves
1024 //
1025 // First: void x2p(Double_t *x, Double_t *p);
1026 //
1027 outFile << "// " << std::endl
1028 << "// The "
1029 << (isMethod ? "method " : "function ")
1030 << " void " << prefix
1031 << "X2P(Double_t *x, Double_t *p)"
1032 << std::endl << "// " << std::endl;
1033 outFile << "void " << prefix
1034 << "X2P(Double_t *x, Double_t *p) {" << std::endl
1035 << " for (Int_t i = 0; i < gNVariables; i++) {" << std::endl
1036 << " p[i] = 0;" << std::endl
1037 << " for (Int_t j = 0; j < gNVariables; j++)" << std::endl
1038 << " p[i] += (x[j] - gMeanValues[j]) " << std::endl
1039 << " * gEigenVectors[j * gNVariables + i] "
1040 << "/ gSigmaValues[j];" << std::endl << std::endl << " }"
1041 << std::endl << "}" << std::endl << std::endl;
1042 //
1043 // Now: void p2x(Double_t *p, Double_t *x, Int_t nTest);
1044 //
1045 outFile << "// " << std::endl << "// The "
1046 << (isMethod ? "method " : "function ")
1047 << " void " << prefix
1048 << "P2X(Double_t *p, Double_t *x, Int_t nTest)"
1049 << std::endl << "// " << std::endl;
1050 outFile << "void " << prefix
1051 << "P2X(Double_t *p, Double_t *x, Int_t nTest) {" << std::endl
1052 << " for (Int_t i = 0; i < gNVariables; i++) {" << std::endl
1053 << " x[i] = gMeanValues[i];" << std::endl
1054 << " for (Int_t j = 0; j < nTest; j++)" << std::endl
1055 << " x[i] += p[j] * gSigmaValues[i] " << std::endl
1056 << " * gEigenVectors[i * gNVariables + j];" << std::endl
1057 << " }" << std::endl << "}" << std::endl << std::endl;
1058
1059 // EOF
1060 outFile << "// EOF for " << filename << std::endl;
1061
1062 // Close the file
1063 outFile.close();
1064
1065 std::cout << "done" << std::endl;
1066}
1067
1068////////////////////////////////////////////////////////////////////////////////
1069 /// Calculate x as a function of the nTest most significant
1070 /// principal components p, and return it in x.
1071 /// It is the user's responsibility to make sure that both x and p are
1072/// of the right size (i.e., memory must be allocated for x).
1073
1075{
1076 for (Int_t i = 0; i < fNumberOfVariables; i++){
1077 x[i] = fMeanValues(i);
1078 for (Int_t j = 0; j < nTest; j++)
1079 x[i] += p[j] * (fIsNormalised ? fSigmas(i) : 1)
1080 * fEigenVectors(i,j);
1081 }
1082
1083}
1084
1085////////////////////////////////////////////////////////////////////////////////
1086/// Print the statistics
1087/// Options are
1088/// - M Print mean values of original data
1089/// - S Print sigma values of original data
1090/// - E Print eigenvalues of covariance matrix
1091/// - V Print eigenvectors of covariance matrix
1092/// Default is MSE
1093
1095{
1096 Bool_t printV = kFALSE;
1097 Bool_t printM = kFALSE;
1098 Bool_t printS = kFALSE;
1099 Bool_t printE = kFALSE;
1100
1101 Int_t len = opt ? strlen(opt) : 0;
1102 for (Int_t i = 0; i < len; i++) {
1103 switch (opt[i]) {
1104 case 'V':
1105 case 'v':
1106 printV = kTRUE;
1107 break;
1108 case 'M':
1109 case 'm':
1110 printM = kTRUE;
1111 break;
1112 case 'S':
1113 case 's':
1114 printS = kTRUE;
1115 break;
1116 case 'E':
1117 case 'e':
1118 printE = kTRUE;
1119 break;
1120 default:
1121 Warning("Print", "Unknown option '%c'",opt[i]);
1122 break;
1123 }
1124 }
1125
1126 if (printM||printS||printE) {
1127 std::cout << " Variable # " << std::flush;
1128 if (printM)
1129 std::cout << "| Mean Value " << std::flush;
1130 if (printS)
1131 std::cout << "| Sigma " << std::flush;
1132 if (printE)
1133 std::cout << "| Eigenvalue" << std::flush;
1134 std::cout << std::endl;
1135
1136 std::cout << "-------------" << std::flush;
1137 if (printM)
1138 std::cout << "+------------" << std::flush;
1139 if (printS)
1140 std::cout << "+------------" << std::flush;
1141 if (printE)
1142 std::cout << "+------------" << std::flush;
1143 std::cout << std::endl;
1144
1145 for (Int_t i = 0; i < fNumberOfVariables; i++) {
1146 std::cout << std::setw(12) << i << " " << std::flush;
1147 if (printM)
1148 std::cout << "| " << std::setw(10) << std::setprecision(4)
1149 << fMeanValues(i) << " " << std::flush;
1150 if (printS)
1151 std::cout << "| " << std::setw(10) << std::setprecision(4)
1152 << fSigmas(i) << " " << std::flush;
1153 if (printE)
1154 std::cout << "| " << std::setw(10) << std::setprecision(4)
1155 << fEigenValues(i) << " " << std::flush;
1156 std::cout << std::endl;
1157 }
1158 std::cout << std::endl;
1159 }
1160
1161 if(printV) {
1162 for (Int_t i = 0; i < fNumberOfVariables; i++) {
1163 std::cout << "Eigenvector # " << i << std::flush;
1164 TVectorD v(fNumberOfVariables);
1165 v = TMatrixDColumn_const(fEigenVectors,i);
1166 v.Print();
1167 }
1168 }
1169}
1170
1171////////////////////////////////////////////////////////////////////////////////
1172/// Calculates the sum of the square residuals, that is
1173///
1174/// \f[
1175/// E_N = \sum_{i=0}^{P-1} \left(x_i - x^\prime_i\right)^2
1176/// \f]
1177///
1178 /// where \f$x^\prime_i = \sum_{j=0}^{N-1} p_j e_{ij}\f$
1179 /// is the \f$i^{\mbox{th}}\f$ component of the reconstruction of
1180 /// \f$x_i\f$, the original data, from the first \f$N\f$ principal
1181 /// components; i.e., the square distance to the space spanned by \f$N\f$ eigenvectors.
1182
1184{
1185
1186 if (!x)
1187 return;
1188
1189 Double_t p[100];
1190 Double_t xp[100];
1191
1192 X2P(x,p);
1193 for (Int_t i = fNumberOfVariables-1; i >= 0; i--) {
1194 P2X(p,xp,i);
1195 for (Int_t j = 0; j < fNumberOfVariables; j++) {
1196 s[i] += (x[j] - xp[j])*(x[j] - xp[j]);
1197 }
1198 }
1199}
1200
1201////////////////////////////////////////////////////////////////////////////////
1202 /// Test the PCA, by calculating the sum of square residuals
1203 /// (see method SumOfSquareResiduals), and display the histogram
1204
1206{
1207 MakeHistograms("pca","S");
1208
1209 if (!fStoreData)
1210 return;
1211
1212 TH1 *pca_s = nullptr;
1213 if (fHistograms) pca_s = (TH1*)fHistograms->FindObject("pca_s");
1214 if (!pca_s) {
1215 Warning("Test", "Couldn't get histogram of square residuals");
1216 return;
1217 }
1218
1219 pca_s->Draw();
1220}
1221
1222////////////////////////////////////////////////////////////////////////////////
1223/// Calculate the principal components from the original data vector
1224 /// x, and return them in p.
1225///
1226 /// It is the user's responsibility to make sure that both x and p are
1227/// of the right size (i.e., memory must be allocated for p).
1228
1230{
1231 for (Int_t i = 0; i < fNumberOfVariables; i++){
1232 p[i] = 0;
1233 for (Int_t j = 0; j < fNumberOfVariables; j++)
1234 p[i] += (x[j] - fMeanValues(j)) * fEigenVectors(j,i) /
1235 (fIsNormalised ? fSigmas(j) : 1);
1236 }
1237
1238}