Submit only a README.md file that identifies you, this notebook file (hw2.ipynb) with your answers, and the HTML export of the notebook (hw2.html), along with all code (*.jl) necessary to reproduce the results. No other files will be graded.
Before the due date, commit your master branch. The teaching assistant and the instructor will check out your committed master branch for grading. The commit time will be used as your submission time; if you commit your Homework 2 submission after the deadline, penalty points will be deducted for late submission according to the syllabus.
To save storage, do not put data files on Git.
Start early. The time to be spent on coding is always underestimated.
Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. It can be shown that for any $x\in\mathbb{R}^n$, there exists a nonzero $z\in\mathbb{R}^n$ such that $|z^Tx| = \|z\|_*\|x\|$. Here, $\|\cdot\|_{*}$ is the dual norm of $\|\cdot\|$, defined as $$ \|z\|_* = \sup_{x \neq 0} \frac{|z^Tx|}{\|x\|} = \sup_{\|x\|\le 1} z^Tx . $$
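As a numerical illustration (not part of the proof), take $\|\cdot\|$ to be the $\ell_1$ norm, whose dual is the $\ell_\infty$ norm; then $z = \operatorname{sign}(x)$ attains the equality:

```julia
using LinearAlgebra

x = randn(5)
z = sign.(x)                                  # nonzero almost surely
abs(dot(z, x)) ≈ norm(z, Inf) * norm(x, 1)    # true: |z'x| = ‖z‖_∞ ‖x‖_1
```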
Let $A\in\mathbb{R}^{n\times n}$ be nonsingular. Show that $A$ has an LU decomposition if and only if for each $k$ with $1\le k \le n$, the upper left $k\times k$ block A[1:k,1:k] is nonsingular. Prove that this LU decomposition is unique.
Suppose the $A$ above is strictly column diagonally dominant, i.e., $$ |a_{kk}| > \sum_{j\neq k}|a_{jk}|. $$ Show that if Gaussian elimination with partial pivoting is applied to $A$, no row interchanges take place.
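As a quick sanity check (not a substitute for the proof), Julia's built-in lu, which uses partial pivoting, applied to a small strictly column diagonally dominant matrix returns the identity permutation:

```julia
using LinearAlgebra

# Strictly column diagonally dominant: |4| > 1 + 2, |5| > 1 + 1, |3| > 0.5 + 1.
A = [4.0 1.0 0.5;
     1.0 5.0 1.0;
     2.0 1.0 3.0]
F = lu(A)       # LU with partial pivoting
F.p             # [1, 2, 3]: no rows were interchanged
```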
Implement the LU decomposition in Julia.
Write function LUdecomp! with the interface
LUdecomp!(A::Matrix{T}; tol::T=convert(T, 1e-8)) where T <: AbstractFloat
The decomposition must be done in place. That is, if $A=LU \in \mathbb{R}^{n\times n}$, then $U$ should overwrite the upper triangular part of the input matrix A, and the strictly lower triangular part of A should be overwritten by the corresponding part of the computed $L$ matrix. (Where does the diagonal part of $L$ go?)
You must also implement partial pivoting: function LUdecomp! must return
Tuple{Array{Int64,1},Int64}
where the first element of the tuple is the array of the permutation indices of the rows, and the second element is an indicator of success: if A is (numerically) singular, the function must return the row index at which singularity occurs (where may a singularity occur in the LU decomposition?) as the second return value; otherwise return 0. Use tol to determine singularity.
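For concreteness, here is one possible skeleton under the stated interface; a sketch only, with the pivot search and the convention for storing the multipliers as illustrative choices:

```julia
function LUdecomp!(A::Matrix{T}; tol::T=convert(T, 1e-8)) where T <: AbstractFloat
    n = size(A, 1)
    piv = collect(1:n)                            # permutation indices of the rows
    for k in 1:n
        p = k - 1 + argmax(abs.(view(A, k:n, k))) # partial pivoting: largest |A[i,k]|
        abs(A[p, k]) < tol && return piv, k       # numerically singular at row k
        if p != k                                 # interchange rows k and p
            A[k, :], A[p, :] = A[p, :], A[k, :]
            piv[k], piv[p] = piv[p], piv[k]
        end
        for i in k+1:n
            A[i, k] /= A[k, k]                    # multiplier, stored in L's slot
            for j in k+1:n
                A[i, j] -= A[i, k] * A[k, j]
            end
        end
    end
    return piv, 0                                 # success
end
```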
Using the LUdecomp! function written above, write function LUsolve! that solves the linear equation $Ax = b$, with the interface
LUsolve!(A::Matrix{T}, b::Vector{T}) where T <: AbstractFloat
again in place. That is, in addition to A being overwritten by LUdecomp!, vector b should be overwritten by the solution $A^{-1}b$. Then, write a wrapper function
LUsolve(A::Matrix{T}, b::Vector{T}) where T <: AbstractFloat
which alters neither A nor b but solves $Ax=b$ by calling LUsolve!.
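One possible shape for the pair, assuming LUdecomp! behaves as specified above (the substitution details are illustrative):

```julia
using LinearAlgebra

function LUsolve!(A::Matrix{T}, b::Vector{T}) where T <: AbstractFloat
    piv, info = LUdecomp!(A)
    info == 0 || error("matrix is numerically singular at row $info")
    b .= b[piv]                    # apply the row permutation to b
    n = length(b)
    for k in 2:n                   # forward substitution (L has unit diagonal)
        b[k] -= dot(A[k, 1:k-1], b[1:k-1])
    end
    for k in n:-1:1                # backward substitution with U
        b[k] = (b[k] - dot(A[k, k+1:n], b[k+1:n])) / A[k, k]
    end
    return b
end

# Non-mutating wrapper: copy the inputs, then reuse the in-place version.
LUsolve(A::Matrix{T}, b::Vector{T}) where T <: AbstractFloat =
    LUsolve!(copy(A), copy(b))
```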
Compare your results with the Julia expression A \ b.
Write your functions in a separate .jl file within your branch and have your IJulia notebook include this file.
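In the notebook, something like the following (the file name is illustrative):

```julia
include("hw2.jl")   # defines LUdecomp!, LUsolve!, LUsolve, ...
```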
Show that for every $\mathbf{X}\in\mathbb{R}^{n\times p}$, its full QR decomposition exists. Also show that the reduced QR decomposition of $\mathbf{X}$ is unique if $\mathbf{X}$ is of full column rank and the diagonal entries of $\mathbf{R}$ are required to be positive.
From the lecture note on QR decomposition, explain why classical Gram-Schmidt (cgs()) fails with the given matrix A.
From the same lecture note, explain why both classical and modified Gram-Schmidt (cgs() and mgs()) fail with the given matrix B.
Implement the Householder QR decomposition in Julia.
The algorithm should be in-place: let the $\mathbf{R}$ matrix occupy the upper triangular part of the input $\mathbf{X}\in\mathbb{R}^{n\times p}$. Below the diagonal, place the vectors $\mathbf{u}_k$ that define the Householder transformation matrices $\mathbf{H}_k=\mathbf{I}-2\mathbf{u}_k\mathbf{u}_k^T/\mathbf{u}_k^T\mathbf{u}_k$. By setting the first element of $\mathbf{u}_k$ to 1, you can fit these vectors in $\mathbf{X}$. The algorithm should return an additional vector storing the values of $\mathbf{u}_k^T\mathbf{u}_k$. This is how the LAPACK routine geqrf is designed. Note that $\mathbf{Q}$ can be recovered from $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p$. The function interface should be
function householder!(X::Matrix{T}) where T<:AbstractFloat
returning
Tuple{Array{T,2},Array{T,1}}
Write a separate routine
function recoverQ(X::Matrix{T}) where T<:AbstractFloat
that recovers $\mathbf{Q}$. (Hint: for $\mathbf{P}_k=\mathbf{I}-2\mathbf{v}_k\mathbf{v}_k^T$ with $\|\mathbf{v}_k\|=1$, $\mathbf{P}_1\mathbf{P}_2 \cdots \mathbf{P}_{n} = \mathbf{I}- \mathbf{V}\mathbf{T}\mathbf{V}^T$, where $\mathbf{V}=[\mathbf{v}_1 | \mathbf{v}_2 | \dotsb | \mathbf{v}_n]$, for some upper triangular matrix $\mathbf{T}$.)
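One possible shape for the two routines, a sketch under the storage convention above (the sign choice and loop organization are illustrative, and a full-column-rank $\mathbf{X}$ is assumed). Note that this recoverQ applies the reflectors one at a time rather than forming the compact $\mathbf{I}-\mathbf{V}\mathbf{T}\mathbf{V}^T$ representation from the hint:

```julia
using LinearAlgebra

function householder!(X::Matrix{T}) where T <: AbstractFloat
    n, p = size(X)
    w = zeros(T, p)                            # w[k] will hold u_k' * u_k
    for k in 1:p
        x = view(X, k:n, k)
        α = -copysign(norm(x), x[1])           # R[k,k]; sign chosen to avoid cancellation
        u1 = x[1] - α                          # first entry of the unscaled u_k
        x[1] = α                               # store R[k,k] in place
        X[k+1:n, k] ./= u1                     # scale so that u_k[1] == 1 (implicit)
        w[k] = 1 + sum(abs2, view(X, k+1:n, k))
        for j in k+1:p                         # apply H_k to the remaining columns
            u = view(X, k+1:n, k)
            s = (X[k, j] + dot(u, view(X, k+1:n, j))) * 2 / w[k]
            X[k, j] -= s
            X[k+1:n, j] .-= s .* u
        end
    end
    return X, w
end

function recoverQ(X::Matrix{T}) where T <: AbstractFloat
    n, p = size(X)
    Q = Matrix{T}(I, n, p)                     # first p columns of I_n
    for k in p:-1:1                            # Q = H_1 H_2 ... H_p I[:, 1:p]
        u = view(X, k+1:n, k)                  # tail of u_k; u_k[1] == 1 implicitly
        wk = 1 + sum(abs2, u)                  # recompute u_k' * u_k
        for j in 1:p
            s = (Q[k, j] + dot(u, view(Q, k+1:n, j))) * 2 / wk
            Q[k, j] -= s
            Q[k+1:n, j] .-= s .* u
        end
    end
    return Q
end
```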
For a big data problem, you must choose the right method to make the analysis possible at all. Visit https://doi.org/10.7910/DVN/HG7NV7. The website hosts data on 123 million flights over 22 years.
The data for each year contain 29 variables. Assuming that all variables are double-precision floating-point numbers, estimate the size of the design matrix ($\mathbf{X}$) needed to analyze the entire 22 years of flights. Will it fit into your computer's memory?
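A back-of-envelope sketch of the estimate, using the record and variable counts quoted above (each Float64 entry takes 8 bytes):

```julia
n = 123_000_000          # flights over the 22 years, as quoted above
p = 29                   # variables per flight record
bytes = n * p * 8        # Float64 entries take 8 bytes each
gib = bytes / 2^30       # convert to GiB for comparison with RAM sizes
```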
We are interested in how the time gain of a flight, defined as Gain = DepDelay - ArrDelay, depends on the distance traveled (Distance), departure delay (DepDelay), and carrier (UniqueCarrier).
We want to fit the linear regression Gain ~ 1 + Distance + DepDelay + UniqueCarrier using data from 2001-2008. Note that UniqueCarrier is a factor with 23 levels: "9E", "AA", "AQ", "AS", "B6", "CO", "DH", "DL", "EV", "F9", "FL", "HA", "HP", "MQ", "NW", "OH", "OO", "TZ", "UA", "US", "WN", "XE", and "YV". We use dummy coding with "9E" as the base level.
Estimate the size of the design matrix $\mathbf{X}$ in double precision.
Review the lecture note on Linear Regression and choose one method in the table to solve the linear regression. The goal is to report the estimated regression coefficients $\widehat \beta$, estimated variance $\widehat \sigma^2 = \sum_i (y_i - \widehat y_i)^2 / (n - p)$, and standard errors of $\widehat \beta$.
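For instance, if one chooses the normal-equations route (a sketch, not the prescribed method), the sufficient statistics $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$ can be accumulated in a single pass so the full design matrix never needs to be in memory; the names below are illustrative:

```julia
using LinearAlgebra

# Accumulate X'X (upper triangle) and X'y one record at a time, given a
# parsed design row x and response y for that record.
function accumulate!(xtx::Matrix{Float64}, xty::Vector{Float64},
                     x::Vector{Float64}, y::Float64)
    BLAS.syr!('U', 1.0, x, xtx)   # rank-one update: xtx += x * x' (upper part)
    axpy!(y, x, xty)              # xty += y * x
    return nothing
end

# After the pass: β̂ = cholesky(Symmetric(xtx, :U)) \ xty.
# Accumulating y'y and n as well yields σ̂² and hence the standard errors.
```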
Download data for years 2001-2008. Each year's data set is compressed in bz2 format (e.g., 2008.csv.bz2). Do not uncompress the files into your directory. Instead, load each compressed file into Julia's memory and uncompress it on the fly to read it line by line. Use TranscodingStreams.jl in conjunction with CodecBzip2.jl to accomplish this. Your code should look like
```julia
using CodecBzip2

for year in 2001:2008
    fn = string(year) * ".csv.bz2"
    stream = Bzip2DecompressorStream(open(fn))
    for line in eachline(stream)
        # do something...
    end
    close(stream)
end
# do something else...
```
Please do not put data files in Git. For grading purposes (reproducibility), we will assume that the data files are in the same directory as the Jupyter notebook file.
Find out how many data points are in each year.
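A minimal sketch reusing the streaming pattern above (it assumes each file has one header row):

```julia
using CodecBzip2

for year in 2001:2008
    fn = string(year) * ".csv.bz2"
    stream = Bzip2DecompressorStream(open(fn))
    n = countlines(stream) - 1   # subtract the header row
    close(stream)
    println(year, ": ", n, " flights")
end
```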
Hint: It took my laptop less than 10 minutes to import the data and fit the linear regression.
This problem is about automatic recognition of handwritten digits. The dataset to be used is the famous MNIST dataset available at
https://www.kaggle.com/oddrationale/mnist-in-csv
The dataset mnist_train.csv contains $n = 60{,}000$ images for training. An image is a $28 \times 28$ gray-level array. Each image is stored as a line of the file in the format
label, pix(1, 1), pix(1, 2), pix(1, 3), ..., pix(28, 28)
where label is the digit represented by the image (0, 1, ..., 9), and pix(r, c) is the pixel intensity at row r, column c (an integer between 0 and 255, inclusive).
The $i$-th image in a dataset is considered a vector $\mathbf{x}_i \in \mathbb{R}^p$, $p = 28\times 28 = 784$.
A widely used approach to digit recognition is principal component analysis. After standardization, compute the sample covariance matrix $$ \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T. $$ Let $\mathbf{V} \in \mathbb{R}^{p\times r}$ be the matrix whose columns are the eigenvectors of $\hat{\Sigma}$ associated with the $r = 10$ largest eigenvalues.
Plot the $r$ principal components as $28 \times 28$ pixel images. Scale them so as to make them visible.
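A possible sketch of the eigendecomposition step, assuming the training images have been loaded as the rows of an $n \times p$ matrix X (the standardization guard and names are illustrative):

```julia
using Statistics, LinearAlgebra

# Standardize the columns of X, form the sample covariance, and return the
# eigenvectors belonging to the r largest eigenvalues.
function top_components(X::Matrix{Float64}, r::Int)
    μ = mean(X, dims=1)
    σ = std(X, dims=1)
    σ[σ .== 0] .= 1                        # constant pixels (e.g., corners) stay zero
    Z = (X .- μ) ./ σ
    Σ̂ = Symmetric(Z' * Z ./ size(X, 1))    # sample covariance after standardization
    F = eigen(Σ̂)                           # eigenvalues come back in ascending order
    return F.vectors[:, end:-1:end-r+1]    # p × r matrix V
end

# Each column can then be displayed via reshape(V[:, k], 28, 28).
```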
Consider the case $r = 10$ and pick a random image $\mathbf{x}_i$ for each digit 0, 1, ..., 9 from mnist_test.csv, which contains $n=10{,}000$ images for testing. Plot
$$
\mathbf{y}_i = \mathbf{V}^T\mathbf{x}_i,
$$
the loadings of each image $\mathbf{x}_i$ in terms of the principal components $\mathbf{v}_1,\dotsc,\mathbf{v}_r$. Discuss the possibility of using the loadings for digit recognition.