M1399.000200 Homework 3

Due Nov 16, 2021 @ 11:59PM

Instructions: submit only README.md, this .ipynb file with your answers added (do not rename it), and any .jl files that you include in this .ipynb file. In particular, do not put data files on Git. No other files will be graded.

Q1. Eigenvalue computation

1) Let $\mathbf{A}$ be a symmetric $n\times n$ matrix. Show that a real number $z \in \mathbb{R}$ is a Rayleigh quotient of $\mathbf{A}$ if and only if $z$ is a diagonal entry of $\mathbf{Q}^T\mathbf{A}\mathbf{Q}$ for some orthogonal matrix $\mathbf{Q} \in \mathbb{R}^{n\times n}$.

2) Recall that the shifted inverse iteration depends on solving a linear system of the form $(\mathbf{A} - \sigma\mathbf{I})\mathbf{x} = \mathbf{v}$, where $\sigma$ is close to an eigenvalue of a symmetric matrix $\mathbf{A}$. Hence the linear system is likely ill-conditioned. How, then, can the algorithm work? \
Suppose without loss of generality that $\mathbf{A}$ has an eigenvalue that is very small in absolute value. Take the shift $\sigma=0$. Suppose $\mathbf{v}$ is a vector with components in the directions of all the eigenvectors $\mathbf{u}_1, \dotsc, \mathbf{u}_n$ of $\mathbf{A}$, associated with eigenvalues $|\lambda_1| \ll |\lambda_2| \le \dotsc \le |\lambda_n|$. Also suppose that the linear system $\mathbf{A}\mathbf{x} = \mathbf{v}$ is solved backward stably, yielding a computed solution $\tilde{\mathbf{x}}$. Show that although $\tilde{\mathbf{x}}$ may be far from $\mathbf{x} = \mathbf{A}^{-1}\mathbf{v}$, $\tilde{\mathbf{x}}/\Vert \tilde{\mathbf{x}} \Vert$ will not be far from $\mathbf{x} / \Vert \mathbf{x} \Vert$. \ (Hint. Use the matrix calculus rule $d(\mathbf{A}^{-1}) = -\mathbf{A}^{-1}(d\mathbf{A})\mathbf{A}^{-1}$, or $$ (\mathbf{A} + \delta\mathbf{A})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}(\delta\mathbf{A})\mathbf{A}^{-1} + O(\Vert\delta\mathbf{A}\Vert^2) , $$ where $\delta\mathbf{A}$ is the backward error due to roundoff.)
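
The following is a small numerical illustration of this claim (an aside, not the requested proof), assuming a synthetic symmetric matrix with one eigenvalue of magnitude about $10^{-14}$: despite the enormous condition number, the normalized computed solution points accurately in the direction of the eigenvector of the smallest eigenvalue.

```julia
# Numerical illustration only: one inverse iteration step with shift σ = 0 on a
# symmetric matrix that has an eigenvalue of tiny magnitude.
using LinearAlgebra, Random
Random.seed!(123)

n = 100
Q = Matrix(qr(randn(n, n)).Q)            # random orthogonal matrix
lam = [1e-14; 0.5 .+ rand(n - 1)]        # |λ₁| ≪ |λ₂| ≤ ... ≤ |λₙ|
A = Symmetric(Q * Diagonal(lam) * Q')    # symmetric matrix with known eigenpairs
u1 = Q[:, 1]                             # eigenvector of the tiny eigenvalue
v = normalize(randn(n))                  # components in all eigenvector directions

xtilde = A \ v                           # backward-stable solve of the ill-conditioned system
w = xtilde / norm(xtilde)                # normalized iterate
println("cond(A) ≈ ", maximum(lam) / minimum(lam))
println("distance from normalized solution to ±u1: ", norm(w - sign(dot(w, u1)) * u1))
```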

Q2. MNIST and PCA

This problem is about automatic recognition of handwritten digits. The dataset to be used is the famous MNIST dataset available at

https://www.kaggle.com/oddrationale/mnist-in-csv

The dataset mnist_train.csv contains $n = 60{,}000$ images for training. An image is a $28 \times 28$ gray-level array. Each image is stored as a line of the file, in the format

label, pix(1, 1), pix(1, 2), pix(1, 3), ..., pix(28, 28)

where label is the digit represented by the image (0, 1, ..., 9), and pix(r, c) is the pixel intensity at row r, column c, an integer between 0 and 255 inclusive.

The $i$-th image in a dataset is considered a vector $\mathbf{x}_i \in \mathbb{R}^p$, $p = 28\times 28 = 784$.

1) A widely used approach towards digit recognition is principal component analysis. After standardizing the images, compute the sample covariance matrix $$ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T. $$ \ Let $\mathbf{V} \in \mathbb{R}^{p\times r}$ be the matrix whose columns are the eigenvectors of $\hat{\Sigma}$ associated with the $r = 10$ largest eigenvalues. \ Plot the $r$ principal components as $28 \times 28$-pixel images. Scale them so as to make them visible.
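
A minimal sketch of part 1 in Julia, assuming mnist_train.csv sits in the working directory and that the CSV.jl, DataFrames.jl, and Plots.jl packages are installed; the reshape/transpose used to display the pixel order may need adjusting.

```julia
# Sketch for part 1: standardize, form the covariance matrix, and plot the top PCs.
using CSV, DataFrames, LinearAlgebra, Statistics, Plots

train = CSV.read("mnist_train.csv", DataFrame)
X  = Matrix{Float64}(train[:, 2:end])        # n × p; the first column is the label
mu = mean(X, dims = 1)
sd = std(X, dims = 1); sd[sd .== 0] .= 1.0   # guard against constant (all-zero) pixels
Z  = (X .- mu) ./ sd                         # standardized images
n, p = size(Z)

Shat = Z' * Z / n                            # p × p sample covariance (p = 784)
E = eigen(Symmetric(Shat))                   # eigenvalues in ascending order
V = E.vectors[:, end:-1:end - 9]             # eigenvectors of the 10 largest eigenvalues

pcs = [heatmap(reshape(V[:, k], 28, 28)', yflip = true, color = :grays,
               colorbar = false, title = "PC $k") for k in 1:10]
plot(pcs..., layout = (2, 5))
```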

2) Consider the case $r = 10$ and pick a random image $\mathbf{x}_i$ for each digit 0,1,...,9 from mnist_test.csv, which contains $n=10{,}000$ images for testing. Plot $$ \mathbf{y}_i = \mathbf{V}^T\mathbf{x}_i, $$ the loadings of each image $\mathbf{x}_i$ in terms of the principal components $\mathbf{v}_1,\dotsc,\mathbf{v}_r$. Discuss the possibility of using the loadings for digit recognition.
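
Continuing the sketch above (it reuses mu, sd, and V), a hedged way to compute and plot the loadings; for brevity it takes the first occurrence of each digit in mnist_test.csv rather than a random one.

```julia
# Sketch for part 2: loadings of one test image per digit in the PC basis.
test = CSV.read("mnist_test.csv", DataFrame)
labels = test[:, 1]
Zt = (Matrix{Float64}(test[:, 2:end]) .- mu) ./ sd   # apply the *training* standardization
idx = [findfirst(==(d), labels) for d in 0:9]        # one image per digit (replace with a random pick if preferred)
Y = V' * Zt[idx, :]'                                 # r × 10 matrix of loadings y_i = Vᵀ x_i
bars = [bar(Y[:, j], title = "digit $(j - 1)", legend = false) for j in 1:10]
plot(bars..., layout = (2, 5))
```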

Q3. PageRank

Perron-Frobenius theory states that if a matrix $\mathbf{A}$ has positive entries, then there exists a positive dominant eigenvalue-eigenvector pair, i.e., $\mathbf{A}\mathbf{v}_0 = \lambda_0\mathbf{v}_0$ for some $\lambda_0 > 0$ and $\mathbf{v}_0 > 0$ (elementwise), and if $\lambda\neq\lambda_0$ is an eigenvalue of $\mathbf{A}$, then $|\lambda| < \lambda_0$. Also, the geometric and algebraic multiplicities of $\lambda_0$ are 1. That is, $\text{dim}(\text{span}\{\mathbf{v}: \mathbf{A}\mathbf{v}=\lambda_0\mathbf{v}\})=1$, and $\lambda_0$ is a simple root of $\text{det}(\mathbf{A}-\lambda\mathbf{I})=0$.

In class, we defined the PageRank problem as an eigenvalue problem associated with the transition matrix $\mathbf{P}$ of a Markov chain. We assume the random jump probability $p > 0$, so that all entries of $\mathbf{P}$ are positive.

1) Express $\mathbf{P}$ as a sum of a sparse matrix and a rank-1 matrix.

2) Show that the equation $(\mathbf{I}-\mathbf{P}^T)\mathbf{x}=\mathbf{0}$ has a unique solution up to scaling. Also show that the solution can be chosen to satisfy $\mathbf{x} > \mathbf{0}$ elementwise.

3) Obtain the connectivity matrix $\mathbf{A}$ from the SNAP/web-Google data in the MatrixDepot package.

4) Compute summary statistics (see the code sketch after this list):

- number of pages
- number of edges
- number of dangling nodes (pages with no out links)
- histogram of in-degrees
- which page has the maximum in-degree?
- histogram of out-degrees
- which page has the maximum out-degree?
- visualize the sparsity pattern of $\mathbf{A}$ using `UnicodePlots.jl`.    
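
A sketch covering parts 3 and 4. It assumes a recent MatrixDepot.jl in which `mdopen` fetches matrices from the SuiteSparse collection (older versions use `matrixdepot("SNAP/web-Google")`), and it assumes rows of $\mathbf{A}$ index source pages and columns index destination pages; check the collection's documentation for the actual orientation.

```julia
# Sketch for parts 3 and 4: download the web-Google connectivity matrix and summarize it.
using MatrixDepot, SparseArrays, UnicodePlots

md = mdopen("SNAP/web-Google")       # downloads the matrix on first use
A  = md.A                            # sparse connectivity matrix
n  = size(A, 1)                      # number of pages
ne = nnz(A)                          # number of edges (links)

outdeg = vec(sum(A, dims = 2))       # out-degree of each page (row sums)
indeg  = vec(sum(A, dims = 1))       # in-degree of each page (column sums)
ndangling = count(iszero, outdeg)    # dangling nodes: pages with no out links

println("pages: $n, edges: $ne, dangling nodes: $ndangling")
println("max in-degree at page ",  argmax(indeg),  " (", maximum(indeg),  ")")
println("max out-degree at page ", argmax(outdeg), " (", maximum(outdeg), ")")

display(histogram(indeg))            # histogram of in-degrees
display(histogram(outdeg))           # histogram of out-degrees
display(spy(A))                      # sparsity pattern via UnicodePlots
```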

5) Set $p=0.8$ and solve the PageRank problem with the above $\mathbf{A}$ using

- An iterative linear system solver such as [GMRES](https://iterativesolvers.julialinearalgebra.org/dev/linear_systems/gmres/).
- An iterative EVD solver such as Arnoldi method. (Check out the `Lanczos/Arnoldi iterative method for top eigen-pairs` section in the lecture notes on EVD.)

\ For iterative methods, use the IterativeSolvers.jl and Arpack.jl packages. Make sure to utilize the special structure of $\mathbf{P}$ to speed up the matrix-vector multiplication. (Hint: LinearMaps.jl)
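
A hedged sketch of part 5, reusing `A`, `n`, and `outdeg` from the previous sketch. It assumes one common convention for $\mathbf{P}$ (follow an out-link with probability $1-p$, jump uniformly with probability $p$, and jump uniformly from dangling pages); adjust the constants to whatever definition of $\mathbf{P}$ the lecture notes use. The nonsingular reformulation fed to GMRES is a standard trick, not something prescribed by the problem.

```julia
# Sketch for part 5: PageRank via GMRES and via an Arnoldi eigensolver, applying
# Pᵀv through its sparse-plus-rank-1 structure instead of forming P.
using LinearAlgebra, SparseArrays, LinearMaps, IterativeSolvers, Arpack

p = 0.8
invdeg = [d > 0 ? 1.0 / d : 0.0 for d in outdeg]
dangling = outdeg .== 0

# z = Pᵀ v without forming P: sparse part plus a rank-1 teleportation/dangling correction
function Ptv(v)
    z = (1 - p) .* (A' * (invdeg .* v))                    # follow links
    z .+= ((1 - p) * dot(dangling, v) + p * sum(v)) / n    # dangling pages + random jumps
    return z
end
Pt = LinearMap(Ptv, n)

# (a) Iterative linear system solver (GMRES): the PageRank vector solves (I - Pᵀ)x = 0
#     with the normalization 1ᵀx = 1; one standard reformulation is the nonsingular
#     system (I - Pᵀ + (1/n) * ones * onesᵀ) x = (1/n) * ones.
M = LinearMap(v -> v .- Ptv(v) .+ sum(v) / n, n)
x_gmres = gmres(M, fill(1.0 / n, n))
x_gmres ./= sum(x_gmres)

# (b) Iterative EVD solver (Arnoldi via ARPACK): dominant eigenpair of Pᵀ.
vals, vecs = eigs(Pt; nev = 1, which = :LM)
x_eig = real.(vec(vecs))
x_eig ./= sum(x_eig)
```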

Q4. Fenchel conjugate

For an extended real-valued function $f: \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, its Fenchel conjugate is defined as $$ f^*(\mathbf{y}) := \sup_{\mathbf{x}\in\mathbb{R}^n}\{\mathbf{x}^T\mathbf{y} - f(\mathbf{x})\} . $$
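
As a warm-up example (not part of the problem), for $f(\mathbf{x}) = \tfrac{1}{2}\Vert\mathbf{x}\Vert_2^2$ the supremum is attained at $\mathbf{x} = \mathbf{y}$, so $$ f^*(\mathbf{y}) = \sup_{\mathbf{x}}\left\{\mathbf{x}^T\mathbf{y} - \tfrac{1}{2}\Vert\mathbf{x}\Vert_2^2\right\} = \mathbf{y}^T\mathbf{y} - \tfrac{1}{2}\Vert\mathbf{y}\Vert_2^2 = \tfrac{1}{2}\Vert\mathbf{y}\Vert_2^2 , $$ i.e., this $f$ is its own conjugate.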

1) Show that $f^*$ is closed and convex regardless of the closedness and convexity of $f$. (Hint. $f^*$ is closed iff $\text{epi}f^*$ is a closed set. A set is closed iff every convergent sequence in the set has its limit in the set.)

2) Prove Fenchel's inequality: $f(\mathbf{x}) + f^*(\mathbf{y}) \ge \mathbf{x}^T\mathbf{y}$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.

3) For a given norm $\Vert \cdot \Vert$ on $\mathbb{R}^n$, the dual norm $\Vert \cdot \Vert_*$ is defined as $\Vert \mathbf{y} \Vert_* = \sup_{\Vert \mathbf{x} \Vert \le 1} \mathbf{x}^T\mathbf{y}$.

1. Show that $\Vert \cdot \Vert_*$ is a norm.
2. Show that the Fenchel conjugate of the norm function $f(\mathbf{x}) = \Vert \mathbf{x} \Vert$ is the indicator function of the unit ball with respect to the dual norm $\Vert \cdot \Vert_*$, i.e.,
$$ f^*(\mathbf{y}) = \begin{cases} 0, & \Vert \mathbf{y} \Vert_* \le 1, \\ \infty, & \text{otherwise}. \end{cases} $$

4) The biconjugate $f^{**}$ of $f$ is the Fenchel conjugate of $f^*$. Show that, if $f$ is closed and convex, then $f^{**} = f$. (Hint. Show $\text{epi}f^{**} = \text{epi}f$ by using the separating hyperplane theorem.)

5) Show that $\Vert \mathbf{x} \Vert = \sup_{\Vert \mathbf{y} \Vert_* \le 1} \mathbf{x}^T\mathbf{y}$. That is, the dual of the dual norm is the original norm.

6) Show that for any $\mathbf{x}\in\mathbb{R}^n$, there exists a nonzero $\mathbf{z}\in\mathbb{R}^n$ such that $|\mathbf{z}^T\mathbf{x}| = \Vert \mathbf{z}\Vert_*\Vert\mathbf{x}\Vert$.

7) Show that the condition number of the problem $f: \mathbf{A} \mapsto \mathbf{A}^{-1}\mathbf{b}$, where $\mathbf{A}$ is an invertible matrix and $\mathbf{b}$ is held fixed, is $$ \kappa = \Vert\mathbf{A}\Vert\Vert\mathbf{A}^{-1}\Vert. $$ (Hint. Denote $\mathbf{A}^{-1}\mathbf{b}$ by $\mathbf{x}$. Show that there is a rank-1 matrix $\delta \mathbf{A} \in \mathbb{R}^{n\times n}$ such that $$ \Vert\mathbf{A}^{-1}(\delta \mathbf{A})\mathbf{x} \Vert = \Vert\mathbf{A}^{-1}\Vert\Vert\delta\mathbf{A}\Vert\Vert\mathbf{x}\Vert . $$ Use part 6.)

Q5. LP

We want to solve an LP $$ \begin{array}{ll} \text{minimize} & \mathbf{c}^T\mathbf{x} \\ \text{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b} \\ & \mathbf{x} \ge \mathbf{0} \end{array} $$ with $$ \mathbf{A} = \begin{bmatrix} 2 & 0 & 0 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{c} = (-1, -1, -1, 0, 0, 0)^T. $$

1) Find the solution to the above LP by hand.

2) Find the solution to the above LP using the linprog R package (your code should still run in Julia).
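
A hedged sketch of part 2 using RCall.jl to drive R from Julia; it assumes R and the linprog package are installed, and the `solveLP` argument names and the `"="` constraint direction should be double-checked against `?solveLP`.

```julia
# Sketch for part 2: call R's linprog::solveLP from Julia via RCall.jl.
using RCall

A = [2.0 0 0 1 0 0;
     0 2 0 0 1 0;
     0 0 2 0 0 1]
b = [1.0, 1.0, 1.0]
cvec = [-1.0, -1.0, -1.0, 0.0, 0.0, 0.0]

@rput A b cvec
R"""
library(linprog)
# equality constraints; argument names and direction strings per the linprog documentation
res <- solveLP(cvec = cvec, bvec = b, Amat = A,
               const.dir = rep("=", length(b)), maximum = FALSE)
print(res)
"""
```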

3) Find the solution to the above LP using Convex.jl with your favorite solver.
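
A minimal sketch of part 3 with Convex.jl, reusing `A`, `b`, and `cvec` from the previous sketch; SCS is assumed as the solver, but any LP-capable solver works the same way.

```julia
# Sketch for part 3: solve the LP with Convex.jl.
using Convex, SCS, LinearAlgebra

x = Variable(6)
problem = minimize(dot(cvec, x), [A * x == b, x >= 0])
solve!(problem, SCS.Optimizer)

println("optimal value: ", problem.optval)
println("x = ", round.(evaluate(x), digits = 4))
```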

Q6. Lasso and Dantzig

1) Formulate the Dantzig selector problem as an LP.

2) Characterize $t_{\max}$ such that for all $t \ge t_{\max}$, the solution to the Dantzig selector problem does not vary.

3) Download the prostate cancer dataset from https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data. The response variable is the log of prostate-specific antigen (lpsa). The predictors are log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). More information can be found at https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.info.txt; see also Section 3.2.1 of Elements of Statistical Learning by Hastie et al. (2009). \ Standardize the predictors to have mean 0 and variance 1 and add an intercept. Then split the data into training and test sets in an 8:2 ratio. Fit the Dantzig selector model for $t=0, 0.05t_{\max}, 0.1 t_{\max}, \dotsc, t_{\max}$ on the training data and plot the solution path. Also plot the prediction error on the test data as a function of $t$.
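
A sketch of the data preparation shared by parts 3 and 4, assuming the file is tab-delimited with a header row, a leading row-index column, and a trailing train/test indicator column (as on the ESL site); inspect the file and adjust if it differs. The 8:2 split below is a simple random split.

```julia
# Sketch for parts 3 and 4: download, standardize, and split the prostate data.
using CSV, DataFrames, Statistics, Random

url = "https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data"
df  = CSV.read(download(url), DataFrame; delim = '\t')

predictors = [:lcavol, :lweight, :age, :lbph, :svi, :lcp, :gleason, :pgg45]
X = Matrix{Float64}(df[:, predictors])
y = Vector{Float64}(df[:, :lpsa])

X = (X .- mean(X, dims = 1)) ./ std(X, dims = 1)   # standardize predictors
X = [ones(size(X, 1)) X]                           # add an intercept column

Random.seed!(2021)
n = size(X, 1)
perm = randperm(n)
ntrain = round(Int, 0.8 * n)
train_idx, test_idx = perm[1:ntrain], perm[(ntrain + 1):end]
Xtrain, ytrain = X[train_idx, :], y[train_idx]
Xtest,  ytest  = X[test_idx, :],  y[test_idx]
```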

4) Repeat 3 for the lasso.