

Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Principal components are guaranteed to be independent only if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD).
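The properties described above can be checked numerically. The following is a minimal sketch using NumPy, not a reference implementation: the helper name `pca_eig` and the synthetic two-variable data set are assumptions made for illustration. It shows that the component scores are uncorrelated, that the components are ordered by decreasing variance, and that rescaling one variable changes the direction of the first component.

```python
import numpy as np

def pca_eig(X):
    """Hypothetical helper: PCA of X (rows = observations, columns = variables).

    Returns the component variances (descending), the component directions
    as columns, and the scores of the observations.
    """
    Xc = X - X.mean(axis=0)                  # mean-center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # coordinates in the rotated basis
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): y is correlated with x.
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
X = np.column_stack([x, y])

variances, components, scores = pca_eig(X)

# Covariance of the scores: off-diagonal entries are zero up to rounding.
score_cov = np.cov(scores, rowvar=False)

# Rescale the second variable by 100 (e.g. a change of units) and repeat.
X_scaled = X * np.array([1.0, 100.0])
variances_scaled, components_scaled, _ = pca_eig(X_scaled)
```

After rescaling, the first component points almost entirely along the second variable, which is why variables are often standardized before PCA when their units are not comparable.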
PCA was invented in 1901 by Karl Pearson.[1] It is now mostly used as a tool in exploratory data analysis and for making predictive models. PCA can be done by eigenvalue decomposition of a data covariance matrix or by singular value decomposition of a data matrix, usually after mean centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores (the transformed variable values corresponding to a particular case in the data) and loadings (the weight by which each standardized original variable should be multiplied to get the component score) (Shaw, 2003).
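The two computational routes mentioned above give the same result, which the following NumPy sketch illustrates. The data matrix, the mixing matrix `A`, and all variable names are assumptions chosen for the example; the component directions agree only up to sign, since an eigenvector and its negative are equally valid.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 observations of 3 correlated variables.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.5, 0.2]])
X = rng.normal(size=(200, 3)) @ A.T

# Mean-center each attribute (column).
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigenvalue decomposition of the sample covariance matrix.
cov = (Xc.T @ Xc) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
order = np.argsort(eigvals)[::-1]            # sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = s**2 / (n - 1)                    # singular values -> variances

# Component scores: each observation expressed in the component basis.
scores = Xc @ Vt.T                           # same as U * s
```

In practice the SVD route is often preferred because it avoids forming the covariance matrix explicitly, which can lose precision when variables are nearly collinear.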
PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way which best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some sense) most informative viewpoint. This is done by using only the first few principal components so that the dimensionality of the transformed data is reduced.
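As a rough illustration of this reduction, the sketch below keeps only the first k components of a synthetic data set. The function name `reduce_dimensions`, the choice k = 2, and the data (five observed variables driven mostly by two hidden factors) are all assumptions made for the example.

```python
import numpy as np

def reduce_dimensions(X, k):
    """Hypothetical helper: project X onto its first k principal components.

    Returns the reduced data (n_observations x k) and the fraction of total
    variance explained by every component, in decreasing order.
    """
    Xc = X - X.mean(axis=0)                          # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # variance share per component
    reduced = Xc @ Vt[:k].T                          # keep the first k directions
    return reduced, explained

rng = np.random.default_rng(2)

# Synthetic data (an assumption): 5 variables, mostly determined by 2 factors.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

reduced, explained = reduce_dimensions(X, k=2)
kept = explained[:2].sum()   # share of variance retained by the 2-D "shadow"
```

Here the two retained components carry nearly all of the variance because the data were constructed that way; on real data the explained-variance fractions are usually inspected before choosing k.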
PCA is closely related to factor analysis; indeed, some statistical packages (such as Stata) deliberately conflate the two techniques. True factor analysis makes different assumptions about the underlying structure and solves for the eigenvectors of a slightly different matrix.


