Principal Components Analysis (PCA) example
Example of using TPrincipal as a stand alone class.
I create n-dimensional data points, where c = trunc(n / 5) + 1 are correlated with the rest n - c randomly distributed variables.
Based on principal.C by Rene Brun and Christian Holm Christensen
Variable # | Mean Value | Sigma | Eigenvalue
-------------+------------+------------+------------
0 | 4.994 | 0.9926 | 0.3856
1 | 8.011 | 2.824 | 0.112
2 | 2.017 | 1.992 | 0.1031
3 | 4.998 | 0.9952 | 0.1022
4 | 8.019 | 2.794 | 0.09998
5 | 1.976 | 2.009 | 0.0992
6 | 4.996 | 0.9996 | 0.09794
7 | 35.01 | 5.147 | 1.409e-16
8 | 30.01 | 5.041 | 2.723e-16
9 | 28.04 | 4.644 | 4.578e-16
Writing on file "pca.C" ... done
*************************************************
* Principal Component Analysis *
* *
* Number of variables: 10 *
* Number of data points: 10000 *
* Number of dependent variables: 3 *
* *
*************************************************
from ROOT import TPrincipal, gRandom, TBrowser, vector
n = 10
m = 10000
print ("""*************************************************
* Principal Component Analysis *
* *
* Number of variables: {0:4d} *
* Number of data points: {1:8d} *
* Number of dependent variables: {2:4d} *
* *
*************************************************""".format(n, m, c))
randumNum = gRandom
data = vector('double')()
for i in range(m):
for j in range(n - c):
if j % 3 == 0:
data.push_back(randumNum.Gaus(5, 1))
elif j % 3 == 1:
data.push_back(randumNum.Poisson(8))
else:
data.push_back(randumNum.Exp(2))
for j in range(c):
data.push_back(0)
for k in range(n - c - j):
data[n - c + j] += data[k]
principal.AddRow(data.data())
data.clear()
principal.MakePrincipals()
principal.Print()
principal.Test()
principal.MakeHistograms()
principal.MakeCode()
b =
TBrowser(
"principalBrowser", principal)
Using a TBrowser one can browse all ROOT objects.
Principal Components Analysis (PCA)
- Authors
- Juan Fernando Jaramillo Botero
Definition in file principal.py.