Introduction

Advanced Statistical Computing

Joong-Ho Won

Seoul National University

Fall 2023

Course Overview

What is this course about?

  • Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).

  • This entails at least two essential skills: programming and fundamental knowledge of algorithms.

I would in either case call algorithms the statistical skeleton.

Bühlmann, P., & Van De Geer, S. (2018). Statistics for big data: A perspective. Statistics & Probability Letters, 136, 37–41.

  • This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.

What is this course NOT about?

  • Not a course on statistical packages. It does not answer questions such as How to fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?

  • Not a pure programming course, although programming is important and we do homework in Julia.
    Undergraduate course 326.312 (Statistical Computing and Labs) focuses on programming in R.

  • Not a course on data science. My previous course 326.621a-2018 (Introduction to Data Science) focused on some software tools for data scientists.

Course goals

The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

  • For a common numerical task in statistics, say solving the weighted least squares problem \[ \widehat \beta = ({\bf X}^T {\bf W} {\bf X})^{-1} {\bf X}^T{\bf W}{\bf y}, \] where \(\mathbf{W}=\text{diag}(w)\) is diagonal, we need to know which methods/algorithms are out there and what are their advantages and disadvantages.

  • You will fail this course if you (syntactically correctly) code inv(X' * W * X) * X' * W * y.
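A minimal sketch of why the two evaluations differ in practice (toy sizes and variable names chosen here for illustration, not those used in the experiment below): forming the inverse explicitly is both slower and less numerically stable than a factorization-based solve.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, p = 200, 5                       # toy sizes for illustration only
X = randn(n, p); y = randn(n); w = rand(n)

# What not to do: form the inverse explicitly
beta_bad  = inv(X' * Diagonal(w) * X) * (X' * (w .* y))
# Preferred: factorize (Cholesky, since the Gram matrix is symmetric
# positive definite), then solve via two triangular solves
beta_good = cholesky(Symmetric(X' * Diagonal(w) * X)) \ (X' * (w .* y))

norm(beta_bad - beta_good)          # agreement up to rounding error
```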

Experiment

versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 8 virtual cores
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
Status `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/01-intro/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [505f98c9] InplaceOps v0.3.0

Simulate data

using LinearAlgebra, Random

# Random seed for reproducibility
Random.seed!(2023)
# samples, features
n, p = 5000, 100
# design matrix
X = [ones(n) randn(n, p - 1)]
# responses
y = randn(n)
# weights
w = 0.25 * rand(n);

Method 1

# method 1 
res1 = inv(X' * diagm(w) * X) * X' * diagm(w) * y
100-element Vector{Float64}:
  0.03893958882251126
  0.03173648898397523
  0.016186796334722572
  0.00358850210061397
 -0.0044022378766381785
  0.024896291106910624
  0.03953100089520529
  0.0013222671721829441
 -0.014988787442492648
 -0.009560194750918379
  0.0011105379474297915
  0.01540484587563391
  0.0126857414133708
  ⋮
  0.011669004887653427
 -0.02802344657262936
  0.03177302433513665
  0.014037511751114791
  0.004110124419340505
 -0.018736555772605728
 -0.008352804824917221
  0.03330393468803309
 -0.004453763082797064
  0.022698106424626115
 -0.004316953749865081
 -0.008959026076773076

Benchmark Method 1

using BenchmarkTools

bm1 = @benchmark ((inv($X' * diagm($w) * $X) * $X') * diagm($w)) * $y
bm1
BenchmarkTools.Trial: 15 samples with 1 evaluation.
 Range (min … max):  296.434 ms … 395.980 ms   GC (min … max): 4.00% … 6.71%
 Time  (median):     348.685 ms                GC (median):    7.62%
 Time  (mean ± σ):   348.192 ms ±  27.163 ms   GC (mean ± σ):  7.51% ± 1.09%
  ▁       ▁        █   ▁     ▁   ▁▁▁          ▁▁▁ ▁          ▁  
  █▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁█▁▁▁███▁▁▁▁▁▁▁▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  296 ms           Histogram: frequency by time          396 ms <
 Memory estimate: 393.12 MiB, allocs estimate: 18.

Method 2

# preallocation
XtWt = Matrix{Float64}(undef, p, n)
XtX = Matrix{Float64}(undef, p, p)
Xty = Vector{Float64}(undef, p)

using InplaceOps

function inplaceWLS(X, y, w, XtWt, XtX, Xty)
    # XtWt = X' * W
    @! XtWt = transpose(X) * Diagonal(w);
    # XtX = X' * W * X
    @! XtX = XtWt * X
    # Xty = X' * W * y
    @! Xty = XtWt * y
    # Cholesky on XtX
    LAPACK.potrf!('U', XtX)
    # Two triangular solves to solve (XtX) \ (Xty)
    BLAS.trsv!('U', 'T', 'N', XtX, Xty)
    BLAS.trsv!('U', 'N', 'N', XtX, Xty)
end

Benchmark Method 2

First, check the correctness of Method 2 against Method 1.

res2 = inplaceWLS(X, y, w, XtWt, XtX, Xty)
norm(res2 - res1)
2.7405547252184867e-16
bm2 = @benchmark inplaceWLS($X, $y, $w, $XtWt, $XtX, $Xty)
bm2
BenchmarkTools.Trial: 1919 samples with 1 evaluation.
 Range (min … max):  2.100 ms … 4.082 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     2.390 ms                GC (median):    0.00%
 Time  (mean ± σ):   2.592 ms ± 386.829 μs   GC (mean ± σ):  0.00% ± 0.00%
            ▆█                                                
  ▂▂▃▄▅▇▅▄▃▇██▅▄▃▃▂▂▂▂▂▁▁▁▁▁▁▂▁▂▂▂▂▂▂▂▃▂▃▃▃▃▃▃▂▂▂▂▃▂▂▂▂▂▂▁▂ ▂
  2.1 ms          Histogram: frequency by time        3.49 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.
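BenchmarkTools can also compare the two trials directly; a quick sketch, assuming the Trial objects bm1 and bm2 from the benchmarks above are still in scope:

```julia
using BenchmarkTools

# Judge Method 2's median time and memory against Method 1's;
# here this reports an improvement of roughly two orders of magnitude
judge(median(bm2), median(bm1))
```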

Course goals (cont’d)

  • A “do not reinvent the wheel” mindset: if existing software tools readily solve the problem, use them. But be aware of their inner workings.
    • Least squares: inplaceWLS(). Will it work for a sample of \(n=10^7\) and \(p=10^9\)?
    • Optimization: recognize major classes of optimization problems and choose an appropriate solver. Will it work for a sample of \(n=10^5\) and \(p=10^6\)?

Course logistics