Statistics, the science of data analysis, is the applied mathematics of the 21st century.
People (scientists, government, health professionals, companies) collect data in order to answer certain questions. A statistician's job is to help them extract knowledge and insights from data.
Must-read for students of statistics:
If existing software tools readily solve the problem, use them.
Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming).
This entails at least two essential skills: programming and fundamental knowledge of algorithms.
Not a course on statistical packages. It does not answer questions such as "How do I fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?"
Not a pure programming course, although programming is important and we do homework in Julia.
Undergraduate course 326.312 (Statistical Computing and Labs), taught concurrently this semester, focuses on programming in R.
Not a course on data science. My previous course 326.621a-2018 (Introduction to Data Science) focused on some software tools for data scientists.
This course focuses on algorithms, mostly those in numerical linear algebra and numerical optimization.
To quote James Gentle:
The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.
For a common numerical task in statistics, say solving the least squares problem, whose solution is $$ \widehat \beta = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf y}, $$ we need to know which methods/algorithms are out there and what their advantages and disadvantages are. You will fail this course if you use `inv(X' * X) * X' * y`. Using `X \ y` in Julia/Matlab (or `qr.solve(X, y)` in R) is correct but not the purpose of this course. We want to understand what the computer is doing when calling `X \ y`.
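As a rough illustration of the point above, the following Julia sketch computes the same least squares estimate in four ways: the textbook formula with an explicit inverse, the backslash operator, an explicit QR factorization, and a Cholesky factorization of the normal equations. The simulated data, the seed, the problem size, and all variable names are assumptions made for this example only; they do not come from the course materials.

```julia
using LinearAlgebra, Random

Random.seed!(123)                     # hypothetical seed, for reproducibility only
n, p = 100, 5                         # hypothetical problem size
X = randn(n, p)                       # simulated, well-conditioned design matrix
y = X * ones(p) + 0.1 * randn(n)      # simulated response

# 1. Textbook formula: forms an explicit inverse (the approach to avoid)
beta_inv = inv(X' * X) * X' * y

# 2. Backslash: for a rectangular dense matrix, Julia uses a pivoted QR factorization
beta_bs = X \ y

# 3. Explicit QR: X = QR, then solve the triangular system R * beta = Q' * y
F = qr(X)
beta_qr = UpperTriangular(F.R) \ (F.Q' * y)[1:p]

# 4. Cholesky on the normal equations: X'X = LL', then two triangular solves
C = cholesky(Symmetric(X' * X))
beta_chol = C \ (X' * y)

# Largest discrepancy of each method from the backslash solution
for (name, b) in [("inv", beta_inv), ("qr", beta_qr), ("chol", beta_chol)]
    println(name, ": ", maximum(abs.(b - beta_bs)))
end
```

On a small, well-conditioned problem like this one, all four answers should agree to many digits. The methods differ in computational cost and in how they behave when the columns of `X` are nearly collinear, which is the kind of question this course addresses.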
Course webpage: https://won-j.github.io/M1399_000200-2020fall/.
Check the Schedule and Announcements sections frequently.
Jupyter notebooks will be posted before each lecture.
These lecture notes have evolved from Dr. Hua Zhou's 2019 Spring Statistical Computing course notes, available at http://hua-zhou.github.io/teaching/biostatm280-2019spring/index.html.