class: left, bottom, inverse, title-slide

# Bayesian Statistics and Computing
## Lecture 16: Numerical Algorithms for Eigenanalysis
### Yanfei Kang
### 2020/04/01 (updated: 2020-04-16)

---
# Finding eigenvalues and eigenvectors

- So far, we have only considered the characteristic polynomial approach to finding the eigenvalues of a matrix.

- Once we have the eigenvalues, we solve the homogeneous equation to find the corresponding eigenvectors.

- The process is: find the eigenvalues first, then find the corresponding eigenvectors.

---
# Practicalities

- Unfortunately, this approach is impractical for `\(n > 4\)`.

- We will bypass the characteristic polynomial and take a different approach:
    - The Power Method
    - QR decomposition

---
class: inverse, center, middle

# The Power Method

---
# The Power Method

- The Power Method finds the dominant eigenvalue `\(\lambda_1\)` and the corresponding dominant eigenvector `\(v_1\)` of a matrix `\(A\)`.

- The dominant eigenvalue is the one with the largest modulus (absolute value for real eigenvalues).

- The Power Method is an iterative approach that generates a sequence of scalars converging to `\(\lambda_1\)` and a sequence of vectors converging to `\(v_1\)`.

- The Power Method works well (converges quickly) when the dominant eigenvalue is clearly dominant.

---
# The Power Method

- It works by starting with an initial vector `\(x_0\)`, transforming it to `\(x_1 = Ax_0\)`, transforming `\(x_1\)` to `\(x_2 = Ax_1\)`, and so on:

`$$x_1 = Ax_0;~x_2 = Ax_1 = A^2x_0; \cdots; x_k = A^kx_0.$$`

- As `\(k\rightarrow \infty\)`, the direction of `\(x_k\)` converges to that of `\(v_1\)`.

---
# Estimation of eigenvalues

Once we have an eigenvector estimate `\(x\)`, we can quickly estimate the corresponding eigenvalue. Writing `\(q = x/||x||\)` for the normalised estimate:

`$$Ax = \lambda x \Rightarrow x^T Ax = \lambda x^Tx \Rightarrow \lambda = \frac{x^TAx}{x^Tx} = \frac{x^TAx}{||x||^2} = q^TAq.$$`

---
# Practical Power Method (normalised)

Since the components of `\(x_k\)` just get larger and larger as the Power Method iterates, and we really only want the direction of `\(v_1\)` (not its magnitude), we normalise each `\(x_k\)`. The Power Method can then be summarised as:

- Set the initial vector `\(q_0 = x_0/||x_0||\)`.
- Repeat for `\(k = 1, 2, \cdots\)`:
    - Compute `\(x_k = Aq_{k-1}\)`.
    - Normalise `\(q_k = x_k/||x_k||\)`.
    - Estimate `\(\lambda_k = q_k^TAq_k\)`.

An R sketch of this loop is given two slides ahead.

---
# Comments

- The Power Method is not expected to converge if the matrix `\(A\)` is not diagonalisable.

- The convergence rate depends on how dominant `\(\lambda_1\)` is, i.e., on the ratio `\(|\lambda_2|/|\lambda_1|\)`.

- Google uses it to compute PageRank, and Twitter uses it to recommend whom to follow.

- And for non-dominant eigenvalues/eigenvectors?
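---
# The Power Method in R

A minimal sketch of the normalised Power Method above, assuming a real matrix with a clearly dominant eigenvalue; the function name `power_method` and the stopping rule are illustrative choices, not part of the lecture.

```r
# Iterate q <- Aq / ||Aq|| until the eigenvalue estimate stabilises
power_method <- function(A, x0, tol = 1e-10, maxit = 1000) {
  q <- x0 / sqrt(sum(x0^2))               # q_0 = x_0 / ||x_0||
  lambda <- 0
  for (k in seq_len(maxit)) {
    x <- A %*% q                          # x_k = A q_{k-1}
    q <- x / sqrt(sum(x^2))               # q_k = x_k / ||x_k||
    lambda_new <- drop(t(q) %*% A %*% q)  # lambda_k = q_k' A q_k
    if (abs(lambda_new - lambda) < tol) break
    lambda <- lambda_new
  }
  list(value = lambda_new, vector = drop(q))
}

A <- matrix(c(2, 1, 1, 3), 2, 2)
power_method(A, c(1, 1))  # compare with eigen(A)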
```

---
class: inverse, center, middle

# QR decomposition

---
# Eigenvalue Revealing Decomposition

- It would be nice if we could get our matrix `\(A\)` into an **eigenvalue revealing decomposition** like the Schur decomposition `\(A = QSQ^T\)`.

- Then we could read off the eigenvalues (all of them) from the diagonal of `\(S\)`.

- We will do it iteratively using the QR decomposition `\(A = QR\)`.

- QR decomposition is not an eigenvalue revealing decomposition, but it will help us with our aim.

- QR decomposition can be done with the Gram-Schmidt Orthogonalisation (GSO) algorithm.

---
# GSO algorithm

Any set of basis vectors `\((a_1, a_2, \cdots, a_n)\)` can be transformed to an orthonormal basis `\((q_1,q_2,\cdots,q_n)\)` by:

<center>
<img src="./figs/gso.png" height="400px" />
</center>

An R sketch of GSO is given in the appendix.

---
# GSO `\(\rightarrow\)` QR

<center>
<img src="./figs/QR.png" height="200px" />
</center>

---
# QR

For any `\(m \times n\)` matrix `\(A\)`, we can express `\(A\)` as `\(A = QR\)`, where

- `\(Q\)` is `\(m\times m\)` and orthogonal;
- `\(R\)` is `\(m\times n\)` and upper triangular.

For (non-square) tall matrices `\(A\)` with `\(m > n\)`, the last `\(m-n\)` rows of `\(R\)` are all zero, so we can express `\(A\)` in the economy form:

`$$A = QR = (Q_1, Q_2) \left(\begin{array}{c}R_1 \\ \mathbf{0} \end{array}\right) = Q_1R_1.$$`

---
# QR algorithm

This algorithm computes an upper triangular matrix `\(S\)` and a unitary matrix `\(Q\)` such that `\(A = QSQ^T\)` is the Schur decomposition of `\(A\)`.

1. Set `\(A_0 := A\)`.
2. For `\(k = 1, 2, \cdots\)`:
    - Factor `\(A_{k-1} = Q_{k-1}R_{k-1}\)`.
    - Set `\(A_k := R_{k-1}Q_{k-1}\)`.
3. Set `\(S := A_{\infty}\)`.

See the appendix for an R sketch of this iteration.

---
# Solution to `\(Ax=b\)` by QR

If we have `\(A=QR\)` or (even better) the economy form `\(A=Q_1R_1\)`, then the linear system `\(Ax = b\)` can be easily solved:

`$$Ax = b$$`
`$$(Q_1R_1)x = b$$`
`$$R_1x = Q_1^Tb,$$`

using `\(Q_1^TQ_1 = I\)`, and `\(x\)` is found through back substitution (an R sketch is in the appendix).

---
# Computation of the SVD by QR

Just as we can use `\(A=QR\)` to avoid forming `\(A^TA\)` in the normal equations, we can also use QR decomposition to solve the eigenvalue problems for `\(A^TA\)` and `\(AA^T\)` to obtain the SVD of `\(A\)`.

**Check out `lm()` in R.**

---
# Lab 12

Write your own `lm()` function using QR and test your function on the `bodyfat` data.

---
# Further readings

- Maysum Panju (University of Waterloo), *Iterative Methods for Computing Eigenvalues and Eigenvectors*. Available [here](http://yanfei.site/docs/bsc/svdnum.pdf).
- James E. Gentle (2009), *Computational Statistics*.
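---
class: inverse, center, middle

# Appendix: R sketches

---
# GSO in R

A minimal sketch of classical Gram-Schmidt, assuming the columns of `A` are linearly independent; the function name `gso_qr` is an illustrative choice.

```r
# Classical Gram-Schmidt: orthonormalise the columns of A so that A = QR
gso_qr <- function(A) {
  n <- ncol(A)
  Q <- matrix(0, nrow(A), n)
  R <- matrix(0, n, n)
  for (j in seq_len(n)) {
    v <- A[, j]
    for (i in seq_len(j - 1)) {
      R[i, j] <- sum(Q[, i] * A[, j])  # r_ij = q_i' a_j
      v <- v - R[i, j] * Q[, i]        # remove the component along q_i
    }
    R[j, j] <- sqrt(sum(v^2))          # length of the residual
    Q[, j] <- v / R[j, j]              # normalise
  }
  list(Q = Q, R = R)
}

A <- matrix(rnorm(12), 4, 3)
f <- gso_qr(A)
max(abs(f$Q %*% f$R - A))  # should be ~ 0
```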
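---
# QR algorithm in R

A sketch of the plain, unshifted QR iteration from the QR algorithm slide, using base R's `qr()`; the fixed iteration count is illustrative (practical implementations use shifts and monitor the off-diagonal entries).

```r
# QR iteration: factor A_{k-1} = Q_{k-1} R_{k-1}, then set A_k = R_{k-1} Q_{k-1}
qr_eigen <- function(A, iters = 100) {
  for (k in seq_len(iters)) {
    f <- qr(A)
    A <- qr.R(f) %*% qr.Q(f)
  }
  A  # approximates S; the eigenvalues appear on the diagonal
}

A <- matrix(c(2, 1, 1, 3), 2, 2)
diag(qr_eigen(A))  # compare with eigen(A)$values
```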
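---
# Solving `\(Ax=b\)` by QR in R

A sketch of the back-substitution solve from the `\(Ax=b\)` slide, again with base R's `qr()`; `backsolve()` solves the upper triangular system `\(R_1x = Q_1^Tb\)`.

```r
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
b <- rnorm(3)

f  <- qr(A)
Q1 <- qr.Q(f)                          # economy-form Q_1
R1 <- qr.R(f)                          # upper triangular R_1
x  <- backsolve(R1, crossprod(Q1, b))  # back-substitute R_1 x = Q_1' b

max(abs(A %*% x - b))                  # should be ~ 0; compare with solve(A, b)
```

Base R's `qr.solve(A, b)` wraps this same computation.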