Statistics as a discipline exists to extract meaningful information from data. Thus statistics and statisticians now play an ever more central role in a data-driven world. Statistics is the data science. Applied statistics aims to answer domain questions by collecting and analyzing data, using statistical critical thinking and statistical methodology. Machine learning is a recent and thriving field of statistics that explicitly takes computation into account; its researchers are affiliated with departments of computer science, statistics, electrical engineering, and mathematics.
This two-week course will introduce participants to a broad array of modern statistical concepts and techniques with a focus on critical thinking and practical data analysis. Each morning will comprise two lectures while the afternoon sessions will take place in a lab setting. We will make extensive use of the statistical software R and students are expected to have at least rudimentary knowledge of R prior to the course. The course will cover exploratory data analysis (visualization, dimension reduction, clustering), statistical modeling (linear models, generalized linear models, logistic regression, graphical models), and statistical computation (Monte Carlo, Markov chain Monte Carlo, convex optimization). We will also cover regularized and large-scale modeling techniques such as boosting and the lasso as well as model averaging techniques. We will consider both frequentist and Bayesian perspectives. Specific applications we will consider include inference from large-scale observational healthcare data, localization in wireless networks, fMRI brain signals evoked by natural stimuli, remote sensing, and text analysis.
The intended audience is mathematical scientists with interests in data analysis but with a limited statistical background. We assume participants will have had an introductory course in statistics and will be familiar with elementary statistical concepts such as sampling, confidence intervals, and hypothesis tests. We will make extensive use of R and will provide tutorial materials in advance for participants who are not familiar with basic R.
The organizers of the IMA New Directions Short Course on Applied Statistics and Machine Learning have provided a recommended reading list that introduces the topics to be discussed in the course (ordered from more introductory to more advanced):
- Statistics, by David Freedman, Robert Pisani, and Roger Purves (2007)
- Seeing Through Statistics, by Jessica Utts (2005)
- Mathematical Statistics and Data Analysis, by John A. Rice (2008)
Online tutorials as an introduction to R:
Books as an introduction to R: