%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Build $X \in \mathbb{R}^{m \times n}$: $m$ data points in $\mathbb{R}^n$.
Let $m = 400$ and $n = 2$ (i.e., each row of $A$ represents a point in $\mathbb{R}^2$).
m = 400
n = 2
# Build an m-by-n matrix (m rows, n columns) whose elements follow a standard normal distribution
A = np.random.randn(m,n)
# Create a matrix that will modify the initial matrix (rotation, scale change)
T = np.array([[-0.5,0.3],
[1,2]])
# Apply the linear transformation matrix T to the data matrix A
X = np.dot(A,T)
# Plot the 1st column vector of A vs. the 2nd column vector of A
plt.plot(A[:,0],A[:,1],"o",alpha=0.1,label="$A$")
# Plot the 1st column vector of X vs. the 2nd column vector of X
plt.plot(X[:,0],X[:,1],"o",alpha=0.1,label="$X$")
# Set limits to x and y axis
plt.xlim(-15, 15)
plt.ylim(-15, 15)
# Plot the legend
plt.legend()
# Compute the mean of each column vector of X. Notice that X.mean(axis=1) would compute the mean of each row, and X.mean() the overall mean
col_mean = X.mean(axis=0)
# NumPy broadcasting subtracts the mean vector from each row
Xc = X - col_mean
print(X.mean(axis=0), Xc.mean(axis=0))
$cov [X_{.,i},X_{.,j}]= \sum_{k=1}^m \frac{(X_{k,i}-E[X_{.,i}])(X_{k,j}-E[X_{.,j}])}{m}$
If the columns are centered (zero mean), this simplifies to:
$cov [X_{.,i},X_{.,j}]= \sum_{k=1}^m \frac{X_{k,i}X_{k,j}}{m}$
which is just a scaled dot product between the two column vectors:
$cov [X_{.,i},X_{.,j}] = \frac{1}{m} X_{.,i}^* \cdot X_{.,j}$
# If Xc is a numpy array, you can obtain its transpose by typing Xc.T or Xc.transpose()
Xc_transpose = Xc.T
# Xc_transpose.dot(Xc) computes the matrix product of Xc_transpose and Xc
Q = Xc_transpose.dot(Xc)
# Xc.shape is a tuple containing the size of the matrix, the 1st element is the number of rows and the 2nd the nb. of cols.
print(Xc.shape)
# Here we simply multiply by 1/m to get the variances and covariances
Q*1/Xc.shape[0]
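As a quick sanity check (an addition, not part of the original exercise), np.cov should return the same matrix; rowvar=False treats the columns as variables and bias=True divides by $m$ instead of $m-1$.
# Sanity check: compare the manual covariance matrix with np.cov (should print True)
print(np.allclose(Q/Xc.shape[0], np.cov(Xc, rowvar=False, bias=True)))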
(recall: $Q\cdot V_{.,i} = \lambda_i V_{.,i}$)
# np.linalg.eig returns a tuple with 2 elements, the first one is a np.array with the eigen values, and the 2nd a matrix with the eigen vectors (in columns)
lamb,V = np.linalg.eig(Q)
# Check the property of eigen vectors and eigen values
Q.dot(V), V*lamb
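Rather than comparing the two matrices by eye, the eigen property can also be verified numerically (a small added check):
# Q V_{.,i} = lambda_i V_{.,i} for every column i (should print True)
print(np.allclose(Q.dot(V), V*lamb))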
# lamb.argsort() returns the indices that would sort lamb by value (ascending order)
idx = lamb.argsort()
# We want them in descending order, so we reverse the index list
idx = idx[::-1]
# We get the eigen values in the right order
lamb = lamb[idx]
# We get the eigen vectors in the right order
V = V[:,idx]
print(lamb)
print(V)
Let $PC_i = X \cdot V_{.,i}$ be the $i$-th Principal Component.
Compute the covariance between $PC_j$ and $PC_k$
$\begin{aligned} cov[PC_j,PC_k] &\propto (X \cdot V_{.,j})^{*} \cdot (X \cdot V_{.,k})\\&= V_{.,j}^*\cdot X^*\cdot X\cdot V_{.,k}\\&=V_{.,j}^*\cdot (\lambda_k V_{.,k})\\&=\lambda_k V_{.,j}^{*}\cdot V_{.,k}\end{aligned}$
If $j = k$: $cov[PC_k,PC_k] \propto \lambda_k$ (the eigenvectors are orthonormal, so $V_{.,k}^* \cdot V_{.,k} = 1$).
If $j \neq k$: $cov[PC_j,PC_k] = 0$ (since $V_{.,j}^* \cdot V_{.,k} = 0$).
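A quick numerical illustration of this result (an addition, assuming Xc, V and lamb computed above): the covariance matrix of the principal components should be approximately diagonal, with the eigenvalues divided by $m$ on the diagonal.
# Covariance matrix of the principal components: off-diagonal terms ~ 0
PC_check = Xc.dot(V)
print(np.cov(PC_check, rowvar=False, bias=True))
# The diagonal should match lamb / m
print(lamb/Xc.shape[0])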
# lamb is the sum of squared coordinates along each principal axis. We divide these values by the number of rows to get the variances
plt.plot(lamb/Xc.shape[0], "o")
plt.xlabel("Principal Axis",fontsize=15)
plt.ylabel('Variance explained')
# lamb.sum() returns the sum of the lambda values.
plt.plot(lamb/lamb.sum()*100, "o")
plt.xlabel("Principal Axis",fontsize=15)
plt.ylabel('Variance explained (%)')
Question 3.D
# Compute the principal components
PC = Xc.dot(V)
# Canonical axes (2D identity matrix)
canonical_axis = np.eye(2)
# Project the canonical axes using matrix V
axis_proj = canonical_axis.dot(V)
plt.plot(PC[:,0],PC[:,1],"o",alpha=0.1)
plt.quiver(*axis_proj[:,0],color="y",angles='xy', scale_units='xy', scale=1)
plt.quiver(*axis_proj[:,1],color="k",angles='xy', scale_units='xy', scale=1)
plt.xlim(-5,5)
plt.ylim(-5,5)
plt.plot(Xc[:,0],Xc[:,1],"o",alpha=0.1)
plt.quiver(*V.T[0,:],color="r",angles='xy', scale_units='xy', scale=1)
plt.quiver(*V.T[1,:],color="g",angles='xy', scale_units='xy', scale=1)
plt.xlim(-5,5)
plt.ylim(-5,5)
Question 3.E: Given the structure of the dataset in the new space, and the percentage of variance explained along each principal axis, does it seem necessary to keep a 2D representation?
The singular value decomposition (SVD) of $X$ is:
$X = U \cdot \Sigma \cdot V^*$
Using the SVD, $X^* X$ can be written as:
$\begin{aligned}X^*X &=V \Sigma^* U^* U \Sigma V^*\\&= V \Sigma^*\Sigma V^* \\&= V \hat {\Sigma}^{2}V^*\end{aligned}$
Therefore the columns of $V$ (the right singular vectors of $X$) are the eigenvectors of $X^* X$, and the eigenvalues are the squared singular values: $\lambda_i = \sigma_i^2$.
Question 4: Compute the Principal Components of $X$ using the SVD
# Compute the SVD of Xc, this function returns a tuple with 3 elements: matrix U, an array with the singular values, and matrix V transposed
U,s,Vt = np.linalg.svd(Xc)
# Compute the principal components
V = Vt.T
PC = Xc.dot(V)
plt.plot(PC[:,0],PC[:,1],"o", alpha=0.1)
plt.xlim(-5,5)
plt.ylim(-5,5)
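As a consistency check (an addition, assuming lamb from the eigendecomposition above is still in memory), the squared singular values should match the eigenvalues of $Q$, both being sorted in descending order; the columns of V may only differ from the eigenvectors by their sign.
# Squared singular values vs. eigenvalues of Q (should print True)
print(np.allclose(s**2, lamb))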
# Load the dataset from a library
from sklearn.datasets import fetch_olivetti_faces
dataset = fetch_olivetti_faces(shuffle=True)
faces = dataset.data
#If the sklearn library is not installed, you can load the faces dataset (attached to the email) uncommenting and running the following commands:
#import pandas as pd
#faces = pd.read_csv("faces.csv",header=0,index_col=0)
#faces = faces.values
Each row of the faces dataset represents a human face picture in gray scale.
The dataset contains 400 pictures in total.
In practice, each picture is $64 \times 64$ pixels, and in order to fill the matrix, each picture has been flattened into a row vector with 4096 ($=64 \times 64$) elements.
Therefore the faces matrix represents 400 pictures described in a 4096-dimensional space of pixels.
If we transpose the faces dataset, we have 4096 pixels described in a 400-dimensional space of human faces.
In this practical work we are going to use the SVD to find the "Principal Faces" in the dataset.
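A quick look at the shape confirms this (assuming faces was loaded as above):
# Expected output: (400, 4096) -- 400 pictures, 64*64 = 4096 pixels each
print(faces.shape)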
Question 1: Reshape the first row of the dataset to create a $64\times 64$ image, and plot the first face using plt.imshow
# We get the first row (the first face)
first_face = faces[0,:]
# We reshape it, to have a 64x64 matrix
reshaped_first_face = first_face.reshape(64,64)
# We plot it
plt.imshow(reshaped_first_face)
Question 2: Transpose the faces matrix (we are going to work with the transpose only) and center each column
Question 3: Compute its SVD
Question 4: Plot the percentage of standard deviation explained along the first 20 principal axes
Question 5: Compute the principal components
Question 6: Plot the first and the last 4 "Principal Faces"
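One possible sketch for Questions 2 to 6 (an illustrative solution, not the official one; the variable names Xf, Xfc, Uf, sf, Vf and PCf are assumptions):
# Question 2 (sketch): transpose the matrix (rows = pixels, columns = faces) and center each column
Xf = faces.T
Xfc = Xf - Xf.mean(axis=0)
# Question 3 (sketch): SVD of the centered matrix
Uf, sf, Vft = np.linalg.svd(Xfc, full_matrices=False)
Vf = Vft.T
# Question 4 (sketch): percentage of standard deviation explained along the first 20 principal axes
plt.plot(sf[:20]/sf.sum()*100, "o")
plt.xlabel("Principal Axis", fontsize=15)
plt.ylabel("Standard deviation explained (%)")
# Question 5 (sketch): principal components; each column has 4096 elements, i.e. one "Principal Face"
PCf = Xfc.dot(Vf)
# Question 6 (sketch): plot the first 4 and the last 4 principal faces as 64x64 images
for i, c in enumerate(list(range(4)) + list(range(PCf.shape[1]-4, PCf.shape[1]))):
    plt.subplot(2, 4, i+1)
    plt.imshow(PCf[:, c].reshape(64, 64))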
Let $V^k$ be a matrix containing the first $k$ columns of matrix $V$.
Let $\hat X = X \cdot V^k \cdot (V^k)^*$ be an approximation of $X$ using only the first $k$ principal axes.
Question 7: Compute $\hat X$ for the faces dataset (for $k = 3$ and $k = 50$) and plot the first 4 faces of the new dataset
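A possible sketch for this reconstruction (the approximate helper and the Xfc, Vf arrays from the previous sketch are assumptions, not part of the original notebook):
def approximate(Xc_mat, V_mat, k):
    # Keep only the first k principal axes: X_hat = X . V^k . (V^k)^*
    Vk = V_mat[:, :k]
    return Xc_mat.dot(Vk).dot(Vk.T)

for j, k in enumerate([3, 50]):
    X_hat = approximate(Xfc, Vf, k)
    for i in range(4):
        plt.subplot(2, 4, j*4 + i + 1)
        # Since the matrix was transposed, faces are stored as columns
        plt.imshow(X_hat[:, i].reshape(64, 64))
        plt.title("k=%d" % k)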
Question 8: Let $|| X - \hat X ||_2$ be the approximation error. Compute the error for k = 2 to 400
def SSE(X, X_hat):
    # Sum of squared errors: the squared Frobenius norm of (X - X_hat)
    return ((X - X_hat)**2).sum().sum()
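A possible usage sketch for Question 8, assuming Xfc, Vf and the approximate helper defined in the sketches above:
# Approximation error as a function of k
ks = range(2, Xfc.shape[1] + 1)
errors = [SSE(Xfc, approximate(Xfc, Vf, k)) for k in ks]
plt.plot(list(ks), errors)
plt.xlabel("k")
plt.ylabel("SSE")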