Principal Components Analysis (PCA)
The current implementation is based on the LINTRA package from CERNLIB by R. Brun, H. Hansroul, and J. Kubler. The class has been implemented by Christian Holm Christensen in August 2000.
Introduction
In many fields of research, the treatment of large amounts of data requires powerful techniques capable of rapid data reduction and analysis. Usually, the quantities most conveniently measured by the experimentalist are not necessarily the most significant ones for classification and analysis of the data. It is then useful to have a way of selecting an optimal set of variables necessary for the recognition process, reducing the dimensionality of the problem and resulting in an easier classification procedure.
This section describes the implementation of one such method of feature selection, namely principal components analysis. This multidimensional technique is well known in the field of pattern recognition, and its use in particle physics has been documented elsewhere (cf. H. Wind, Function Parameterization, CERN 72-21).
Overview
Suppose we have prototypes which are trajectories of particles passing through a spectrometer. If one measures the passage of the particle at, say, 8 fixed planes, the trajectory is described by an 8-component vector:
\mathbf{x} = \left(x_0, x_1, \ldots, x_7\right)
in 8-dimensional pattern space.
One proceeds by generating a representative sample of tracks and building up the covariance matrix \mathsf{C}. Its eigenvectors and eigenvalues are computed by standard methods, and thus a new basis is obtained for the original 8-dimensional space. The expansion of the prototypes,
\mathbf{x}_m = \sum^7_{i=0} a_{m_i} \mathbf{e}_i \quad \mbox{where} \quad a_{m_i} = \mathbf{x}_m^T\bullet\mathbf{e}_i
allows the study of the behavior of the coefficients a_{m_i} for all the tracks of the sample. The eigenvectors which are insignificant for the trajectory description in the expansion will have their corresponding coefficients a_{m_i} close to zero for all the prototypes.
On one hand, a reduction of the dimensionality is then obtained by omitting these least significant vectors in the subsequent analysis.
On the other hand, in the analysis of real data, these least significant variables can be used for the pattern recognition problem of extracting the valid combinations of coordinates describing a true trajectory from the set of all possible wrong combinations.
The class described here performs this principal components analysis on a sample of data provided by the user. It computes the covariance matrix, its eigenvalues and corresponding eigenvectors, and exhibits the behavior of the principal components a_{m_i}, thus providing the user with all the means of understanding the data.
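As an illustration, here is a minimal sketch of how such an analysis might be driven through the TPrincipal interface; the 8-variable toy track sample, the runPCA macro name, and the sample size are invented for the example.

```cpp
// Minimal sketch of a principal components analysis with TPrincipal.
// The correlated toy data below stands in for real track measurements.
#include "TPrincipal.h"
#include "TRandom3.h"

void runPCA()
{
   const Int_t nVars = 8;
   TPrincipal principal(nVars, "ND");   // "N": normalise the covariance matrix, "D": store the rows

   TRandom3 rng(0);
   Double_t row[nVars];
   for (Int_t i = 0; i < 1000; ++i) {
      // Toy "track": each plane measurement follows a common slope plus noise.
      Double_t slope = rng.Gaus(0, 1);
      for (Int_t j = 0; j < nVars; ++j)
         row[j] = slope * j + rng.Gaus(0, 0.1);
      principal.AddRow(row);
   }

   principal.MakePrincipals();   // covariance matrix, eigenvalues and eigenvectors
   principal.Print();            // mean values, sigmas and eigenvalues

   // Project one pattern-space vector onto the principal components.
   Double_t p[nVars];
   principal.X2P(row, p);
}
```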
Principal Components Method
Let's consider a sample of M prototypes, each characterized by P variables x_0, x_1, \ldots, x_{P-1}. Each prototype is a point, or a column vector, in a P-dimensional pattern space.
\mathbf{x} = \left[\begin{array}{c} x_0\\x_1\\\vdots\\x_{P-1}\end{array}\right]\,,
where each x_n represents the particular value associated with the n-th dimension.
Those P variables are the quantities accessible to the experimentalist, but are not necessarily the most significant ones for classification purposes.
The Principal Components Method consists of applying a linear transformation to the original variables. This transformation is described by an orthogonal matrix and is equivalent to a rotation of the original pattern space into a new set of coordinate vectors, which hopefully provide easier feature identification and dimensionality reduction.
Let's define the covariance matrix:
\mathsf{C} = \left\langle\mathbf{y}\mathbf{y}^T\right\rangle \quad\mbox{where}\quad \mathbf{y} = \mathbf{x} - \left\langle\mathbf{x}\right\rangle\,,
and the brackets indicate mean value over the sample of M prototypes.
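For concreteness, here is a small standalone sketch of this definition (plain C++; the covariance helper and its signature are invented for illustration), averaging the outer products of the mean-subtracted prototypes:

```cpp
#include <cstddef>
#include <vector>

// C_{jk} = <y_j y_k> with y = x - <x>, averaged over the M prototypes.
std::vector<std::vector<double>>
covariance(const std::vector<std::vector<double>> &x)   // x[m][p]: M prototypes, P variables
{
   const std::size_t M = x.size(), P = x.front().size();

   // Sample mean <x>.
   std::vector<double> mean(P, 0.0);
   for (const auto &row : x)
      for (std::size_t p = 0; p < P; ++p) mean[p] += row[p] / M;

   // Average of the outer products y y^T.
   std::vector<std::vector<double>> C(P, std::vector<double>(P, 0.0));
   for (const auto &row : x)
      for (std::size_t j = 0; j < P; ++j)
         for (std::size_t k = 0; k < P; ++k)
            C[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / M;
   return C;
}
```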
This matrix \mathsf{C} is real, positive definite, and symmetric, and will have all its eigenvalues greater than zero. It will now be shown that, among the family of all the complete orthonormal bases of the pattern space, the basis formed by the eigenvectors of the covariance matrix belonging to the largest eigenvalues corresponds to the most significant features of the description of the original prototypes.
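In ROOT, such an eigen-decomposition of a symmetric matrix is available, for example, through TMatrixDSymEigen; the small 3x3 matrix below is an arbitrary example, and TPrincipal::MakePrincipals() performs the equivalent step on the covariance matrix of the sample.

```cpp
// Illustrative eigen-decomposition of a small symmetric matrix.
#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include <cstdio>

void eigenSketch()
{
   TMatrixDSym C(3);
   C(0,0) = 2.0; C(0,1) = C(1,0) = 0.5; C(0,2) = C(2,0) = 0.1;
   C(1,1) = 1.0; C(1,2) = C(2,1) = 0.3;
   C(2,2) = 0.5;

   TMatrixDSymEigen eigen(C);
   const TVectorD &values  = eigen.GetEigenValues();   // sorted in descending order
   const TMatrixD &vectors = eigen.GetEigenVectors();  // eigenvectors as columns

   for (Int_t n = 0; n < values.GetNrows(); ++n)
      printf("l_%d = %g\n", n, values[n]);
   (void)vectors;
}
```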
Let the prototypes be expanded on a set of basis vectors \mathbf{e}_n, n=0,\ldots,N,N+1, \ldots, P-1, keeping only the first N+1 terms:
\mathbf{y}_i = \sum^N_{n=0} a_{i_n} \mathbf{e}_n, \quad i = 1, \ldots, M, \quad N < P-1
The ‘best’ feature coordinates \mathbf{e}_n, spanning a feature space, will be obtained by minimizing the error due to this truncated expansion, i.e.,
\min\left(E_N\right) = \min\left[\left\langle\left(\mathbf{y}_i - \sum^N_{n=0} a_{i_n} \mathbf{e}_n\right)^2\right\rangle\right]
with the conditions:
\mathbf{e}_k\bullet\mathbf{e}_j = \delta_{jk} = \left\{\begin{array}{rcl} 1 & \mbox{for} & k = j\\ 0 & \mbox{for} & k \neq j \end{array}\right.
Multiplying the expansion of \mathbf{y}_i by \mathbf{e}^T_n and using the orthonormality condition, we get
a_{i_n} = \mathbf{y}_i^T\bullet\mathbf{e}_n\,,
so the error becomes
\begin{eqnarray*} E_N &=& \left\langle\left[\sum_{n=N+1}^{P-1} a_{i_n}\mathbf{e}_n\right]^2\right\rangle\nonumber\\ &=& \left\langle\left[\sum_{n=N+1}^{P-1} \mathbf{y}_i^T\bullet\mathbf{e}_n\mathbf{e}_n\right]^2\right\rangle\nonumber\\ &=& \left\langle\sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathbf{y}_i\mathbf{y}_i^T\mathbf{e}_n\right\rangle\nonumber\\ &=& \sum_{n=N+1}^{P-1} \mathbf{e}_n^T\mathsf{C}\mathbf{e}_n \end{eqnarray*}
The minimization of the sum above is obtained when each term \mathbf{e}_n^T\mathsf{C}\mathbf{e}_n is a minimum, since \mathsf{C} is positive definite. Using the method of Lagrange multipliers together with the orthonormality condition, we get
E_N = \sum^{P-1}_{n=N+1} \left(\mathbf{e}_n^T\mathsf{C}\mathbf{e}_n - l_n\mathbf{e}_n^T\bullet\mathbf{e}_n + l_n\right)
The minimum condition \frac{dE_N}{d\mathbf{e}^T_n} = 0 leads to the equation
\mathsf{C}\mathbf{e}_n = l_n\mathbf{e}_n\,,
which shows that \mathbf{e}_n is an eigenvector of the covariance matrix \mathsf{C} with eigenvalue l_n. The estimated minimum error is then given by
E_N \sim \sum^{P-1}_{n=N+1} \mathbf{e}_n^T\bullet l_n\mathbf{e}_n = \sum^{P-1}_{n=N+1} l_n\,,
where l_n,\,n=N+1,\ldots,P-1 are the eigenvalues associated with the eigenvectors omitted from the truncated expansion. Thus, by choosing the N largest eigenvalues, and their associated eigenvectors, the error E_N is minimized.
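Since E_N is simply the sum of the omitted eigenvalues, one common way to choose the number of retained components is from the cumulative eigenvalue fraction. The helper below and its 99% threshold are illustrative choices, not something prescribed by the method itself.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Given eigenvalues sorted in decreasing order, return how many components
// to keep so that the retained fraction of the total variance reaches
// keepFraction; the omitted eigenvalues sum to the truncation error E_N.
std::size_t componentsToKeep(const std::vector<double> &eigenvalues,
                             double keepFraction = 0.99)
{
   const double total = std::accumulate(eigenvalues.begin(), eigenvalues.end(), 0.0);
   double retained = 0.0;
   for (std::size_t n = 0; n < eigenvalues.size(); ++n) {
      retained += eigenvalues[n];
      if (retained / total >= keepFraction)
         return n + 1;
   }
   return eigenvalues.size();
}
```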
The transformation matrix to go from the pattern space to the feature space consists of the ordered eigenvectors \mathbf{e}_0,\ldots,\mathbf{e}_{P-1} as its columns
\mathsf{T} = \left[ \begin{array}{cccc} \mathbf{e}_0 & \mathbf{e}_1 & \cdots & \mathbf{e}_{P-1} \end{array}\right] = \left[ \begin{array}{cccc} \mathbf{e}_{0_0} & \mathbf{e}_{1_0} & \cdots & \mathbf{e}_{{P-1}_0}\\ \mathbf{e}_{0_1} & \mathbf{e}_{1_1} & \cdots & \mathbf{e}_{{P-1}_1}\\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{e}_{0_{P-1}} & \mathbf{e}_{1_{P-1}} & \cdots & \mathbf{e}_{{P-1}_{P-1}}\\ \end{array}\right]
This is an orthogonal transformation, or rotation, of the pattern space, and feature selection results in ignoring certain coordinates in the transformed space.
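Equivalently, the feature-space coordinates of a mean-subtracted pattern vector are a_n = \mathbf{e}_n^T\bullet\mathbf{y}, i.e. \mathbf{a} = \mathsf{T}^T\mathbf{y}. The sketch below spells this out with an invented helper; in TPrincipal, the method X2P() performs the corresponding projection, taking care of the mean subtraction (and optional normalisation) internally.

```cpp
#include <cstddef>
#include <vector>

// Rotate a mean-subtracted pattern vector y into the feature space:
// a_n = e_n^T . y, with the eigenvectors e_n stored as the columns of T.
std::vector<double>
patternToFeature(const std::vector<std::vector<double>> &T,   // T[row][col]
                 const std::vector<double> &y)
{
   const std::size_t P = y.size();
   std::vector<double> a(P, 0.0);
   for (std::size_t n = 0; n < P; ++n)      // n-th feature coordinate
      for (std::size_t j = 0; j < P; ++j)
         a[n] += T[j][n] * y[j];            // column n of T dotted with y
   return a;
}
```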
Christian Holm August 2000, CERN
Definition at line 20 of file TPrincipal.h.