Introduction

Advanced Statistical Computing

Joong-Ho Won

Seoul National University

Fall 2023

Course Overview

What is this course about?

  • Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).

  • This entails at least two essential skills: programming and fundamental knowledge of algorithms.

I would in either case call algorithms the statistical skeleton.

Bühlmann, P., & Van De Geer, S. (2018). Statistics for big data: A perspective. Statistics & Probability Letters, 136, 37–41.

  • This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.

What is this course NOT about?

  • Not a course on statistical packages. It does not answer questions such as How to fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?

  • Not a pure programming course, although programming is important and we do homework in Julia.
    Undergraduate course 326.312 (Statistical Computing and Labs) focuses on programming in R.

  • Not a course on data science. My previous course 326.621a-2018 (Introduction to Data Science) focused on some software tools for data scientists.

Course goals

The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

  • For a common numerical task in statistics, say solving the weighted least squares problem \[ \widehat \beta = ({\bf X}^T {\bf W} {\bf X})^{-1} {\bf X}^T{\bf W}{\bf y}, \] where \(\mathbf{W}=\text{diag}(w)\) is diagonal, we need to know which methods/algorithms are out there and what are their advantages and disadvantages.

  • You will fail this course if you (syntactically correctly) code inv(X' * W * X) * X' * W * y.
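A minimal sketch of why the two evaluations differ in practice (toy sizes and variable names chosen here for illustration, not those used in the experiment below): forming the inverse explicitly is both slower and less numerically stable than a factorization-based solve.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, p = 200, 5                       # toy sizes for illustration only
X = randn(n, p); y = randn(n); w = rand(n)

# What not to do: form the inverse explicitly
beta_bad  = inv(X' * Diagonal(w) * X) * (X' * (w .* y))
# Preferred: factorize (Cholesky, since the Gram matrix is symmetric
# positive definite), then solve via two triangular solves
beta_good = cholesky(Symmetric(X' * Diagonal(w) * X)) \ (X' * (w .* y))

norm(beta_bad - beta_good)          # agreement up to rounding error
```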

Experiment

versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 8 virtual cores
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
Status `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/01-intro/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [505f98c9] InplaceOps v0.3.0

Simulate data

using LinearAlgebra, Random

# Random seed for reproducibility
Random.seed!(2023)
# samples, features
n, p = 5000, 100
# design matrix
X = [ones(n) randn(n, p - 1)]
# responses
y = randn(n)
# weights
w = 0.25 * rand(n);

Method 1

# method 1 
res1 = inv(X' * diagm(w) * X) * X' * diagm(w) * y
100-element Vector{Float64}:
  0.03893958882251126
  0.03173648898397523
  0.016186796334722572
  0.00358850210061397
 -0.0044022378766381785
  0.024896291106910624
  0.03953100089520529
  0.0013222671721829441
 -0.014988787442492648
 -0.009560194750918379
  0.0011105379474297915
  0.01540484587563391
  0.0126857414133708
  ⋮
  0.011669004887653427
 -0.02802344657262936
  0.03177302433513665
  0.014037511751114791
  0.004110124419340505
 -0.018736555772605728
 -0.008352804824917221
  0.03330393468803309
 -0.004453763082797064
  0.022698106424626115
 -0.004316953749865081
 -0.008959026076773076

Benchmark Method 1

using BenchmarkTools

bm1 = @benchmark ((inv($X' * diagm($w) * $X) * $X') * diagm($w)) * $y
bm1
BenchmarkTools.Trial: 15 samples with 1 evaluation.
 Range (min … max):  296.434 ms … 395.980 ms   GC (min … max): 4.00% … 6.71%
 Time  (median):     348.685 ms                GC (median):    7.62%
 Time  (mean ± σ):   348.192 ms ±  27.163 ms   GC (mean ± σ):  7.51% ± 1.09%
  ▁       ▁        █   ▁     ▁   ▁▁▁          ▁▁▁ ▁          ▁  
  █▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁█▁▁▁███▁▁▁▁▁▁▁▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  296 ms           Histogram: frequency by time          396 ms <
 Memory estimate: 393.12 MiB, allocs estimate: 18.

Method 2

# preallocation
XtWt = Matrix{Float64}(undef, p, n)
XtX = Matrix{Float64}(undef, p, p)
Xty = Vector{Float64}(undef, p)

using InplaceOps

function inplaceWLS(X, y, w, XtWt, XtX, Xty)
    # XtWt = X' * W
    @! XtWt = transpose(X) * Diagonal(w);
    # XtX = X' * W * X
    @! XtX = XtWt * X
    # Xty = X' * W * y
    @! Xty = XtWt * y
    # Cholesky on XtX
    LAPACK.potrf!('U', XtX)
    # Two triangular solves to solve (XtX) \ (Xty)
    BLAS.trsv!('U', 'T', 'N', XtX, Xty)
    BLAS.trsv!('U', 'N', 'N', XtX, Xty)
end

Benchmark Method 2

First, check the correctness of Method 2 against Method 1.

res2 = inplaceWLS(X, y, w, XtWt, XtX, Xty)
norm(res2 - res1)
2.7405547252184867e-16
bm2 = @benchmark inplaceWLS($X, $y, $w, $XtWt, $XtX, $Xty)
bm2
BenchmarkTools.Trial: 1919 samples with 1 evaluation.
 Range (min … max):  2.100 ms … 4.082 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     2.390 ms                GC (median):    0.00%
 Time  (mean ± σ):   2.592 ms ± 386.829 μs   GC (mean ± σ):  0.00% ± 0.00%
            ▆█                                                
  ▂▂▃▄▅▇▅▄▃▇██▅▄▃▃▂▂▂▂▂▁▁▁▁▁▁▂▁▂▂▂▂▂▂▂▃▂▃▃▃▃▃▃▂▂▂▂▃▂▂▂▂▂▂▁▂ ▂
  2.1 ms          Histogram: frequency by time        3.49 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.
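BenchmarkTools can also compare the two trials directly; a quick sketch, assuming the Trial objects bm1 and bm2 from the benchmarks above are still in scope:

```julia
using BenchmarkTools

# Judge Method 2's median time and memory against Method 1's;
# here this reports an improvement of roughly two orders of magnitude
judge(median(bm2), median(bm1))
```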

Course goals (cont’d)

  • A “do not reinvent the wheel” mindset: if existing software tools readily solve the problem, use them. But be aware of their inner workings.
    • Least squares: inplaceWLS(). Will it work for a sample of \(n=10^7\) and \(p=10^9\)?
    • Optimization: recognize major classes of optimization problems and choose an appropriate solver. Will it work for a sample of \(n=10^5\) and \(p=10^6\)?

Course logistics