  • Essay / PCA Principal Component Analysis - 1617

    Principal component analysis (PCA) is a multivariate technique that reduces the dimensionality of a multivariate data set so that its underlying structure can be recognized. In other words, PCA is a pattern recognition technique that attempts to explain the variance of a large set of intercorrelated variables. It reveals the associations between variables, and those associations are what allow the dimensionality of the data set to be reduced (Helena et al., 2000; Wunderlin et al., 2001; Singh et al., 2004). PCA transforms the original variables into a new set of variables, the principal components, which are (1) linear combinations of the original variables, (2) uncorrelated with each other, and (3) ordered according to the amount of variation in the original variables that they explain (Everitt and Hothorn, 2011).

    The assumptions of PCA:

    Linearity - Each reduced dimension must be a linear combination of the original variables.

    The importance of mean and covariance - PCA summarizes the data using only its mean and covariance, and there is no guarantee that the directions of maximum variance will contain good discrimination characteristics.

    Large variances have important dynamics - PCA assumes that components with larger variances correspond to interesting dynamics and that those with smaller variances correspond to noise.

    Important terminology for PCA:

    Dimension: In principal component analysis, each random variable is treated as an individual dimension.

    Standard deviation: Standard deviation is a measure of the spread of a set of numbers. It describes the dispersion of a data set relative to its mean: the farther the data points lie from the mean, the higher the standard deviation. It is denoted by the Greek letter sigma (σ)....... middle of paper ......ferré because it produces meaningful information about each data point and where it falls in its normal distribution, and also provides an indicator of outliers (Ben Etzkorn, 2011).
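Returning to the three properties of principal components listed earlier (linear combinations, mutually uncorrelated, ordered by explained variance), a small numerical check may make them more concrete. The sketch below is only an illustration under stated assumptions: it uses NumPy, synthetic data, and a hypothetical helper named `pca_eig`, none of which come from the essay.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    X has shape (n_samples, n_features). Returns eigenvalues, loadings
    (eigenvectors as columns) and scores, sorted by decreasing variance.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # (1) linear combinations of the variables
    return eigvals, eigvecs, scores

# Synthetic, intercorrelated data (an assumption made for illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigvals, loadings, scores = pca_eig(X)

# (2) the components are uncorrelated: their covariance matrix is diagonal
score_cov = np.cov(scores, rowvar=False)

# (3) the components are ordered by the share of variance they explain
explained_ratio = eigvals / eigvals.sum()
print(np.round(score_cov, 6))
print(np.round(explained_ratio, 3))
```

The off-diagonal entries of `score_cov` should be zero up to floating-point error, and `explained_ratio` should be non-increasing and sum to one.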
If we do not standardize the data before principal component analysis, the result will tend to give more importance to variables with higher variances, so the analysis will depend entirely on the units in which the data were measured. The choice of matrix matters as well: if we use the covariance matrix for principal component analysis, we need to standardize the data first, but if we use the correlation matrix, the raw data can be used. The two routes agree because the covariance matrix of the standardized data is equal to the correlation matrix of the unstandardized data (https://onlinecourses.science.psu.edu/stat505/node/55).

Working procedure: Data from the same date we used.......
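The point made above about standardization, that the covariance matrix of standardized data equals the correlation matrix of the raw data and that unstandardized PCA is dominated by the high-variance variable, can be checked numerically. This is a minimal sketch assuming NumPy and made-up data; the variable names (`height_m`, `weight_g`) and the helper `first_pc` are hypothetical and not part of the essay.

```python
import numpy as np

# Made-up data: two related variables recorded in very different units.
rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, size=300)                      # metres
weight_g = 40000 * height_m + rng.normal(0, 5000, size=300)    # grams
raw = np.column_stack([height_m, weight_g])

# Standardize: subtract the mean and divide by the sample standard deviation.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

cov_of_standardized = np.cov(z, rowvar=False)
corr_of_raw = np.corrcoef(raw, rowvar=False)

def first_pc(matrix):
    """Return the eigenvector belonging to the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(matrix)   # eigh returns ascending eigenvalues
    return vecs[:, -1]

# Without standardization, the variable with the larger variance dominates.
pc1_cov_raw = first_pc(np.cov(raw, rowvar=False))
# With the correlation matrix, both variables contribute equally here.
pc1_corr = first_pc(corr_of_raw)

print(np.allclose(cov_of_standardized, corr_of_raw))
print(np.round(np.abs(pc1_cov_raw), 4))
print(np.round(np.abs(pc1_corr), 4))
```

On the raw covariance matrix the first component is almost entirely the weight axis, simply because grams produce a much larger variance than metres. Re-expressing weight in kilograms would change that result, which is the unit dependence the paragraph describes.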