| Day | Topic | Reading | Optional Reading |
|---|---|---|---|
| 1/04/12 | Introduction | B 1-2; HTF 1 | The discipline of machine learning by T. Mitchell; Statistical modeling: The two cultures in Statistical Science, 16(3):199-231, 2001 by L. Breiman; Wald Lecture 2002: Machine learning by L. Breiman |
| 1/09/12 | Linear prediction | B 3.1.1-2 | Norms in A.1.2-3, Convex Optimization by Boyd and Vandenberghe |
| 1/11/12 | Generalized linear prediction | HTF 5.1-2, 5.7, 5.9, 6.1-3 | |
| 1/16/12 | Martin Luther King Day, University closed | | |
| 1/18/12 | Regularization, neural networks | B 3.1, 5; HTF 3.4, 5.4, 11 | Lecture notes by David McAllester; Neural network at Wikipedia; The next generation of neural networks, video by Geoffrey Hinton |
| 1/23/12 | Learning theory: Bias-variance | B 3.2; HTF 2.9, 7 | Lecture notes by David McAllester; Neural networks and the bias/variance dilemma in Neural Computation by Geman, Bienenstock and Doursat |
| 1/25/12 | Automated complexity control | HTF 7 | |
| 1/30/12 | Linear classification, support vector machines | B 7.1.1-2; HTF 4.5, 12.1-3 | Tutorial by Ron Meir; Excerpt from Vapnik's The Nature of Statistical Learning Theory; Support vector machines with applications; SVM at Wikipedia |
| 2/01/12 | Duality | HTF 4.5, 12.1-3 | Chapter 5, Convex Optimization by Boyd and Vandenberghe; The entire regularization path for the support vector machine in NIPS 2004 by Hastie et al. |
| 2/06/12 | Kernels | B 6.1-3; HTF 5.8, 6.1-2, 6.7 | Lecture notes by David McAllester |
| 2/08/12 | Kernels; Multiclass prediction (skipped) | B 7.1.3 | On the algorithmic implementation of multiclass kernel-based vector machines in JMLR 2001 by Crammer and Singer |
| 2/13/12 | Learning theory: Uniform convergence, Vapnik-Chervonenkis dimension | HTF 7.9 | Lecture notes by Andrew Ng; Lecture notes by David McAllester; VC dimension at Wikipedia; On the uniform convergence of relative frequencies of events to their probabilities by Vapnik and Chervonenkis, 1971 |
| 2/15/12 | Combining classifiers, boosting | B 14.3; HTF 10 | Toy example, training error proof, overview and slides and video by Schapire; Lecture notes by David McAllester; Evidence contrary to the statistical view of boosting in JMLR 2008 by Mease and Wyner |
| 2/20/12 | Probability models, Bayes decision theory | B 1.5 | II.B and IV.B, An Introduction to Signal Detection and Estimation, by H. V. Poor; Decision theory at Wikipedia |
| 2/22/12 | Bayesian networks and Markov random fields | B 8.1, 8.3; HTF 17 | Graphical models; Bayesian networks; Markov random fields at Wikipedia |
| 2/27/12 | Maximum likelihood estimation | B 9.2 | Generative and discriminative classifiers: naive Bayes and logistic regression by Tom Mitchell |
| 2/29/12 | Expectation-Maximization algorithm | B 9.2-4; HTF 8.5 | Lecture notes, Lecture notes by Andrew Ng; Lecture notes by David McAllester; EM algorithm at Wikipedia |
| 3/05/12 | Hidden Markov models | B 13.1-3 | HMM tutorial and examples by L. Rabiner; Lecture notes by David McAllester; HMM at Wikipedia |
| 3/07/12 | Structured prediction: conditional random fields | CRF paper and video by John Lafferty; Lecture notes by David McAllester; Conditional random fields at Wikipedia | |
| 3/12/12 | Structured prediction: max-margin Markov networks (skipped) | M3N paper and slides and video by Taskar et al.; M3N tutorial by S. Lacoste-Julien | |
| 3/12/12 | Challenges in statistical machine learning | Challenges in statistical machine learning and video by J. Lafferty | |
| 3/14/12 | Project presentation | | |