\(\DeclareMathOperator*{\argmin}{arg\,min}\)
These lecture notes are based on Dr. Hua Zhou’s Winter 2018 Statistical Computing course notes, available at http://hua-zhou.github.io/teaching/biostatm280-2018winter/index.html
This course introduces some computing skills and software tools for handling potentially big data.
Statistics, the science of data analysis, is the applied mathematics of the 21st century.
Data is increasing in volume, velocity, and variety.
Data Size | Bytes | Storage Mode |
---|---|---|
tiny | \(10^2\) | piece of paper |
small | \(10^4\) | a few pieces of paper |
medium | \(10^6\) (MB) | a floppy disk |
large | \(10^8\) | hard disk |
huge | \(10^9\) (GB) | hard disk(s) |
massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |
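To make the scale of the table concrete, here is a minimal Python sketch that maps a byte count to one of the labels above. The helper name `size_category` and the rule it uses (pick the largest nominal size that does not exceed the byte count, and call anything below \(10^2\) bytes "tiny") are assumptions made for illustration; the table itself lists only nominal sizes, not cutoffs.

```python
import bisect

# Nominal sizes in bytes and their labels, copied from the table above.
SIZES = [10**2, 10**4, 10**6, 10**8, 10**9, 10**12]
LABELS = ["tiny", "small", "medium", "large", "huge", "massive"]


def size_category(n_bytes: int) -> str:
    """Return the label of the largest nominal size not exceeding n_bytes.

    Assumption (not stated in the table): counts below 10**2 are "tiny".
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    # bisect_right counts how many nominal sizes are <= n_bytes.
    idx = bisect.bisect_right(SIZES, n_bytes) - 1
    return LABELS[max(idx, 0)]


# A double-precision matrix with 10,000 rows and 1,000 columns
# occupies 10_000 * 1_000 * 8 = 80,000,000 bytes.
matrix_bytes = 10_000 * 1_000 * 8
print(matrix_bytes, size_category(matrix_bytes))
```

Under this assumed rule, the 80,000,000-byte matrix falls just short of the \(10^8\) mark and is therefore labeled "medium"; a different cutoff convention (for example, rounding to the nearest power of ten) would label it "large".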
Read the syllabus for a tentative list of topics and course logistics.
Huber, Peter J. 1994. “Huge Data Sets.” In COMPSTAT 1994 (Vienna), 3–13. Heidelberg: Physica.
———. 1996. “Massive Data Sets Workshop: The Morning After.” In Massive Data Sets: Proceedings of a Workshop, 169–84. Washington: National Academy Press.