Principal component analysis

PCA derivation via spectral decomposition

Suppose we have a random vector \[ \begin{gather} X = \begin{pmatrix}X_1\\X_2\\ \vdots\\ X_p\end{pmatrix},\qquad \operatorname{var}(X) = \Sigma=\begin{bmatrix}\sigma_1^2 & \sigma_{12}& \cdots & \sigma_{1p}\\ \sigma_{21}& \sigma_2^2 & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1}&\sigma_{p2}& \cdots & \sigma_p^2 \end{bmatrix} \end{gather} \] We want directions that maximize the spread of the data, so we consider linear combinations \[ Y_i = e_{i1}X_1+e_{i2}X_2+\cdots+e_{ip}X_p=\mathbf{e}_i^TX, \] where \(\mathbf{e}_i\) is a unit vector, and maximize the variance of the combination: \[ \begin{gather} \operatorname{var}(Y_i)=E[(\mathbf{e}_i^TX-E(\mathbf{e}_i^TX))^2]\\ =E[(\mathbf{e}_i^T(X-EX))(\mathbf{e}_i^T(X-EX))^T]\quad(\text{the variance is a scalar})\\ =E[\mathbf{e}_i^T(X-EX)(X-EX)^T\mathbf{e}_i]\\ =\mathbf{e}_i^T\,E[(X-EX)(X-EX)^T]\,\mathbf{e}_i\\ =\mathbf{e}_i^T\Sigma \mathbf{e}_i \end{gather} \] Since \(\Sigma \mathbf{e}_i=\lambda_i\mathbf{e}_i\) when \(\mathbf{e}_i\) is an eigenvector in the spectral decomposition of \(\Sigma\), \[ \operatorname{var}(Y_i)=\mathbf{e}_i^T\lambda_i\mathbf{e}_i=\lambda_i\mathbf{e}_i^T\mathbf{e}_i=\lambda_i, \] because the eigenvectors are normalized so that \(\mathbf{e}_i^T\mathbf{e}_i=1\).
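The derivation above can be checked numerically: estimate \(\Sigma\) from a sample, take its spectral decomposition, and verify that the variance of each projected component \(Y_i = \mathbf{e}_i^T X\) equals the corresponding eigenvalue \(\lambda_i\). A minimal sketch in NumPy, with an arbitrary 3-dimensional covariance matrix chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample data: n observations of a p-dimensional random vector X
# (the covariance matrix here is an arbitrary illustrative choice).
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[4.0, 2.0, 0.0],
                                 [2.0, 3.0, 1.0],
                                 [0.0, 1.0, 2.0]],
                            size=5000)

# Estimate Sigma = var(X) from the sample (rows are observations).
Sigma = np.cov(X, rowvar=False)

# Spectral decomposition: Sigma e_i = lambda_i e_i.
# eigh is appropriate because Sigma is symmetric; it returns
# eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal components: column i of Y holds Y_i = e_i^T X per observation.
Y = X @ eigvecs

# var(Y_i) should match lambda_i exactly, since both are computed
# from the same estimated Sigma.
print(np.var(Y, axis=0, ddof=1))  # sample variances of the components
print(eigvals)                    # eigenvalues of Sigma
```

The two printed vectors agree because `np.cov` and `np.var(..., ddof=1)` use the same (unbiased) normalization, so the identity \(\operatorname{var}(Y_i)=\mathbf{e}_i^T\Sigma\mathbf{e}_i=\lambda_i\) holds exactly for the sample covariance, not just in expectation.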