Principal Component Analysis

Statistics > Multivariate Analysis > Principal Component

 

Used to understand the underlying data structure and/or form a smaller number of uncorrelated variables.

Principal components analysis forms a smaller number of uncorrelated variables from a large set of variables. The goal is to explain the maximum amount of variance with the fewest principal components. Principal components analysis is commonly used in the social sciences, market research, and other fields that work with large data sets. You can use it when you have too many predictors relative to the number of observations.

 

Dialog box items

Variables:
Choose the columns containing the variables to be included in the analysis.

Report:
Controls the display of the VisualStat output.

 

Data

Set up your datasheet so that each row contains measurements on a single item or subject. You must have two or more numeric columns, with each column representing a different measurement (response). If a missing value exists in any column, VisualStat ignores the whole row, so rows with missing values are excluded from all calculations, including the correlation and covariance matrices.
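The row-wise exclusion described above is often called listwise deletion. The following is a minimal sketch of that idea in plain Python, not VisualStat's own code; the function name drop_incomplete_rows and the sample values are made up for illustration, and missing values are assumed to be represented as None or NaN.

```python
import math


def drop_incomplete_rows(rows):
    """Keep only the rows in which every value is present (listwise deletion)."""
    def is_missing(value):
        # Assumption: a missing cell is stored as None or as a float NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(v) for v in row)]


# Made-up data: rows 2 and 3 each contain one missing value.
rows = [
    [12.0, 3.1, 7.5],
    [9.4, None, 6.8],
    [10.2, 2.7, float("nan")],
    [11.1, 2.9, 7.0],
]

complete = drop_incomplete_rows(rows)
print(len(complete), "of", len(rows), "rows kept")
```

Only the complete rows would then enter the correlation or covariance calculation.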

 

Plot

Displays plots for judging the importance of the different principal components. This section is available only when the number of factors to be fitted is greater than one.

Score plot:
Check to plot the scores for the selected components.

Loading plot:
Check to display a scatter plot of the component loadings.

Biplot:
Check to produce a plot in which both the scores and the loadings are represented in a two-dimensional space. The biplot shows the relation of the factors to both the original variables and the original data.

Plot Score Headers:
Check to plot the score headers.

Axis pairs to plot (X,Y):
Enter the two components to be plotted in the form c(factor1, factor2). By default, a biplot of the first two factors is created.

 

Storage

Stores the coefficients and scores.

Eigenvalues: Check to store eigenvalues.

Eigenvectors: Check to store eigenvectors.

Scores: Check to store columns for the principal component scores.

Loadings: Check to store columns for the principal component loadings.

 

Options

Select Centered and/or Scaled (to unit variance) to define the centering and scaling on which the computation of the principal components is based. The Printed Results group controls which results are displayed.

Centered: Check to shift the data to be zero-centered before the components are computed.

Scaled: Check to scale the data to unit variance before the components are computed.

Eigenvalues: Check to print out the eigenvalues.

Eigenvectors: Check to print out the eigenvectors.

Scores: Check to print out the columns for the principal component scores.

Loadings: Check to print out the columns for the principal component loadings.
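To illustrate what the Centered and Scaled options change, here is a minimal NumPy sketch of PCA by eigenanalysis. It is not VisualStat's implementation: the function name pca, the sample data, and the use of the n - 1 divisor for the variance are all assumptions made for this example.

```python
import numpy as np


def pca(data, center=True, scale=True):
    """Sketch of PCA by eigenanalysis; returns eigenvalues, eigenvectors, scores."""
    x = np.asarray(data, dtype=float)
    if center:
        x = x - x.mean(axis=0)          # shift each column to mean zero
    if scale:
        x = x / x.std(axis=0, ddof=1)   # assumption: unit variance with n - 1 divisor
    # With centering only this is the covariance matrix;
    # with centering and scaling it is the correlation matrix.
    m = x.T @ x / (x.shape[0] - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(m)   # eigh: m is symmetric
    order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = x @ eigenvectors                       # project rows onto the components
    return eigenvalues, eigenvectors, scores


# Made-up data: 6 observations on 3 variables.
data = [
    [2.0, 4.1, 1.0],
    [3.5, 3.9, 2.2],
    [1.2, 5.0, 0.7],
    [4.8, 2.5, 3.1],
    [3.0, 3.3, 1.9],
    [2.6, 4.4, 2.5],
]

eigenvalues, eigenvectors, scores = pca(data, center=True, scale=True)
print(eigenvalues)
```

With both options on, every variable contributes a variance of one, so the eigenvalues add up to the number of variables. That is one reason to scale when the variables are measured on very different ranges.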

 

Example

The data are the percentages employed in different industries in European countries during 1979. The job categories are agriculture, mining, manufacturing, power supplies, construction, service industries, finance, social and personal services, and transport and communications. It is important to note that these data were collected during the Cold War. Principal components analysis may be used to examine which countries have similar employment patterns.

Source: http://lib.stat.cmu.edu/DASL/Datafiles/EuropeanJobs.html

1. Open the DataBook mva.vstz.

2. Select the sheet EuropeanJobs.

3. Choose the tab Statistics, the group Multivariate Analysis and the command Principal Component.

4. In Variables, select Agr, Min, Man, PS, Con, SI, Fin, SPS and TC.

5. Click the Plot page.

6. Check Loading plot, Biplot and Plot Score Headers.

7. Click the Options page.

8. Check Centered, Scaled, Eigenvalues, Eigenvectors, Scores, and Loadings.

9. In Number of components, enter 5.

10. Click OK.

 

Report window output

Principal Component Analysis

 

 

 

Eigenanalysis of the matrix (Centered, Scaled)

-----*----------*--------*--------*-----------------------------------------------------
Axis | Inertia  |  Ratio |  Cumul |      Histogram of Eigenvalues of Matrix
-----*----------*--------*--------*-----------------------------------------------------
    1|    3.4872|  0.3875|  0.3875|**********|*********
    2|    2.1302|  0.2367|  0.6241|**********|**
    3|    1.0990|  0.1221|  0.7463|******
    4|    0.9945|  0.1105|  0.8568|******
    5|    0.5432|  0.0604|  0.9171|***
    6|    0.3834|  0.0426|  0.9597|**
    7|    0.2258|  0.0251|  0.9848|*
    8|    0.1368|  0.0152|  1.0000|*
    9|    0.0000|  0.0000|  1.0000|
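The Ratio and Cumul columns can be reproduced from the Inertia (eigenvalue) column: each ratio is an eigenvalue divided by the sum of all eigenvalues, and Cumul is the running total of the ratios. The short sketch below uses the printed eigenvalues; because they are rounded to four decimals, the last digit may differ slightly from the report.

```python
from itertools import accumulate

# Eigenvalues (Inertia column) copied from the report above.
eigenvalues = [3.4872, 2.1302, 1.0990, 0.9945, 0.5432, 0.3834, 0.2258, 0.1368, 0.0000]

total = sum(eigenvalues)                      # close to 9, the number of scaled variables
ratios = [ev / total for ev in eigenvalues]   # share of variance per component
cumulative = list(accumulate(ratios))         # running share of variance

for axis, (ev, ratio, cum) in enumerate(zip(eigenvalues, ratios, cumulative), start=1):
    print(f"{axis:>5}|{ev:10.4f}|{ratio:8.4f}|{cum:8.4f}")
```

Reading the cumulative column is a common way to decide how many components to keep; here the first two account for roughly 62% of the variance.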

 

 

 

Table of Eigenvectors

---------*----------------*----------------*----------------*----------------*----------------*
Variable |  Eigenvector 1 |  Eigenvector 2 |  Eigenvector 3 |  Eigenvector 4 |  Eigenvector 5 |
---------*----------------*----------------*----------------*----------------*----------------*
     Agr |    0.523790989 |    0.053593894 |    0.048674388 |   -0.028792848 |    0.212702633 |
     Min |    0.001323458 |    0.617807137 |   -0.201100209 |   -0.064084952 |   -0.163743057 |
     Man |   -0.347495131 |    0.355053603 |   -0.150463083 |    0.346088211 |   -0.384957553 |
      PS |   -0.255716182 |    0.261096058 |   -0.561083250 |   -0.393308968 |    0.295171537 |
     Con |   -0.325179319 |    0.051288448 |    0.153321137 |    0.668323954 |    0.471593431 |
      SI |   -0.378919663 |   -0.350172064 |   -0.115095507 |    0.050156509 |   -0.283568077 |
     Fin |   -0.074373583 |   -0.453697854 |   -0.587361304 |    0.051566520 |    0.279568181 |
     SPS |   -0.387408806 |   -0.221521203 |    0.311903500 |   -0.412230190 |   -0.220351396 |
      TC |   -0.366822713 |    0.202591851 |    0.375106007 |   -0.314371879 |    0.512935593 |

 

 

 

Table of Scores

-----*--------*--------------*--------------*--------------*--------------*--------------*
 NUM |    Row |  Component 1 |  Component 2 |  Component 3 |  Component 4 |  Component 5 |
-----*--------*--------------*--------------*--------------*--------------*--------------*
   1 |Belgium |   -1.6772810 |   -1.1980648 |   -0.1125361 |   -0.3328993 |   -0.3182334 |
   2 |Denmark |   -0.9343975 |   -2.0864648 |    0.9322598 |   -0.5824148 |    0.1006675 |
   3 | France |   -0.7399751 |   -1.0994344 |   -0.4882838 |    0.4906130 |   -0.2938984 |
   4 |W. Germ |   -0.8359965 |   -0.0111557 |   -0.5682728 |    0.1083246 |   -1.1425989 |
   5 |Ireland |    0.1014918 |   -0.4059478 |   -0.3765899 |   -0.9086688 |    0.0149257 |
   6 |  Italy |   -0.3681164 |   -0.7546033 |    1.0400018 |    1.4485444 |   -0.6326536 |
   7 |Luxembo |   -1.0388688 |    0.7411495 |   -0.6388286 |    0.8189379 |   -0.8491208 |
   8 |Netherl |   -1.6554330 |   -1.9659121 |    0.0625041 |    0.0230576 |    0.6228449 |
   9 |United  |   -1.5987869 |   -0.3658837 |   -1.1187476 |   -1.2422767 |   -0.7971389 |
  10 |Austria |   -1.1536026 |    0.1403217 |   -1.0231024 |    0.1546841 |    0.5108637 |
  11 |Finland |   -0.9726772 |   -0.7337736 |   -0.3531286 |   -1.1575079 |    0.5014692 |
  12 | Greece |    2.0723643 |   -0.3465972 |    0.8609706 |    0.2969531 |    0.9504974 |
  13 | Norway |   -1.6537457 |   -1.0548269 |    1.2942914 |   -0.9159234 |    0.9147518 |
  14 |Portuga |    0.9955766 |   -0.7423437 |    0.6430624 |    0.7116872 |   -0.1122109 |
  15 |  Spain |    0.4273981 |   -0.6043692 |   -0.8230923 |    2.4477895 |    1.3687200 |
  16 | Sweden |   -1.0627579 |   -1.5478696 |    0.4094978 |   -1.0055124 |   -0.5303858 |
  17 |Switzer |   -1.0375801 |   -0.7279744 |   -0.5896469 |    1.8222982 |   -0.5881609 |
  18 | Turkey |    6.2242751 |   -1.0454410 |    0.9080548 |   -1.2220470 |   -1.1554566 |
  19 |Bulgari |    0.7115094 |    1.4676998 |    0.9741275 |    0.3845951 |   -0.5620266 |
  20 |Czechos |   -0.4176781 |    2.6128124 |   -0.1492857 |    0.1545210 |   -0.2379228 |
  21 |E. Germ |   -1.7462437 |    2.7603568 |   -0.1056843 |   -0.6348577 |   -0.6696099 |
  22 |Hungary |   -0.5671132 |    3.0824016 |   -0.9057690 |   -1.2760553 |    0.9083005 |
  23 | Poland |    1.0920848 |    1.8682486 |    0.5428310 |   -0.0256849 |    0.3729212 |
  24 |Rumania |    2.0116616 |    1.5738631 |    0.2613181 |    1.3314992 |   -0.3208307 |
  25 |   USSR |   -0.0494557 |    1.2419373 |    2.3759454 |   -0.3134590 |    1.1354054 |
  26 |Yugosla |    3.8733475 |   -0.7981284 |   -3.0518966 |   -0.5761979 |    0.8088810 |

 

 

 

Table of Loadings

-----*--------*--------------*--------------*--------------*--------------*--------------*
 NUM | Column |  Component 1 |  Component 2 |  Component 3 |  Component 4 |  Component 5 |
-----*--------*--------------*--------------*--------------*--------------*--------------*
   1 |    Agr |    0.9781229 |    0.0782209 |    0.0510259 |   -0.0287133 |    0.1567689 |
   2 |    Min |    0.0024714 |    0.9016965 |   -0.2108157 |   -0.0639079 |   -0.1206840 |
   3 |    Man |   -0.6489095 |    0.5182047 |   -0.1577322 |    0.3451322 |   -0.2837264 |
   4 |     PS |   -0.4775222 |    0.3810726 |   -0.5881902 |   -0.3922225 |    0.2175512 |
   5 |    Con |   -0.6072371 |    0.0748561 |    0.1607284 |    0.6664778 |    0.3475799 |
   6 |     SI |   -0.7075914 |   -0.5110801 |   -0.1206560 |    0.0500180 |   -0.2089990 |
   7 |    Fin |   -0.1388846 |   -0.6621771 |   -0.6157378 |    0.0514241 |    0.2060510 |
   8 |    SPS |   -0.7234439 |   -0.3233127 |    0.3269721 |   -0.4110915 |   -0.1624063 |
   9 |     TC |   -0.6850016 |    0.2956851 |    0.3932280 |   -0.3135035 |    0.3780505 |
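The printed loadings are consistent with the common convention in which each loading equals the eigenvector entry multiplied by the square root of the corresponding eigenvalue. The help text does not state VisualStat's formula, so treat this as an observation about the numbers rather than a description of the implementation. The sketch below checks the convention for the Agr row, using values copied from the report.

```python
import math

# First three eigenvalues (Inertia column) and the Agr row of the eigenvector table.
eigenvalues = [3.4872, 2.1302, 1.0990]
agr_eigenvector = [0.523790989, 0.053593894, 0.048674388]

# Assumed convention: loading = eigenvector entry * sqrt(eigenvalue).
agr_loadings = [v * math.sqrt(ev) for v, ev in zip(agr_eigenvector, eigenvalues)]

# Agr row of the Table of Loadings, for comparison.
reported = [0.9781229, 0.0782209, 0.0510259]

for computed, printed in zip(agr_loadings, reported):
    print(f"computed {computed:.5f}   reported {printed:.5f}")
```

Under this convention, and with scaled data, a loading can be read as the correlation between a variable and a component, which is why Agr at about 0.98 dominates the first component.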

 

 

 

 

 

Chart window output

 

Plot 1

 

 

 

Plot 2

 

 

 

 

 

Interpreting the results

 

A principal components analysis of the data yields two main principal components (PCs) that explain 38.7% and 23.7% of the variability in the data, respectively. The PCs are mathematical constructs and do not necessarily have an interpretation in terms of the observed variables. Bearing this in mind, it is often valuable to look at the loadings of the major PCs to see what our interpretations might be.

The first PC (denoted Component 1 in Plot 2 above) has a high positive loading on the agriculture variable and negative or near-zero loadings on all other variables. This PC may be interpreted as distinguishing between countries with agricultural and industrial economies. The plot shows that Turkey and Yugoslavia have much higher values on the X axis than the other countries, which suggests that their economies are more agricultural than those of the other nations.

The second PC (denoted Component 2 in Plot 1) has negative loadings on service industries, finance, and social and personal services, but positive loadings on all other variables. This PC may be interpreted as distinguishing between nations with large and small service sectors. The capitalist Western nations generally have lower values on the Y axis than the communist Eastern nations, which suggests that the Western economies have larger service sectors.
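The East/West contrast on the second component can be checked against the Table of Scores. The sketch below averages the Component 2 scores for two groups. The grouping is an assumption made for this illustration (rows 19 to 26 as the Eastern group, all other rows as the comparison group) and is not part of the VisualStat output.

```python
from statistics import mean

# Component 2 scores copied from the Table of Scores (row labels as printed).
component2 = {
    "Belgium": -1.1980648, "Denmark": -2.0864648, "France": -1.0994344,
    "W. Germ": -0.0111557, "Ireland": -0.4059478, "Italy": -0.7546033,
    "Luxembo": 0.7411495, "Netherl": -1.9659121, "United": -0.3658837,
    "Austria": 0.1403217, "Finland": -0.7337736, "Greece": -0.3465972,
    "Norway": -1.0548269, "Portuga": -0.7423437, "Spain": -0.6043692,
    "Sweden": -1.5478696, "Switzer": -0.7279744, "Turkey": -1.0454410,
    "Bulgari": 1.4676998, "Czechos": 2.6128124, "E. Germ": 2.7603568,
    "Hungary": 3.0824016, "Poland": 1.8682486, "Rumania": 1.5738631,
    "USSR": 1.2419373, "Yugosla": -0.7981284,
}

# Assumption for illustration: rows 19-26 form the Eastern group.
eastern = {"Bulgari", "Czechos", "E. Germ", "Hungary",
           "Poland", "Rumania", "USSR", "Yugosla"}

east_mean = mean(s for name, s in component2.items() if name in eastern)
west_mean = mean(s for name, s in component2.items() if name not in eastern)

print(f"Eastern group mean: {east_mean:.3f}")
print(f"Other countries mean: {west_mean:.3f}")
```

The group means have opposite signs, in line with the interpretation above, although individual countries such as Yugoslavia and Luxembourg fall on the other side of zero from most of their group.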

 

 

 

See Also:


Report | Numeric Formats