CS771: Natural Language Processing Techniques
Winter 2010



Tentative Syllabus

Day Topic Reading Optional Reading
1/04/10 Introduction, basics of information theory JM: 1; MS: 1-3 Statistical methods in Encyclopedia of Cognitive Science by Steve Abney; The dawn of statistical ASR and MT in Computational Linguistics, 35(4): 483-494, 2009 by Fred Jelinek.
1/06/10 Source-channel model, Zipf's law and n-grams JM: 9.1, 25.3, 4.4, 4.10-11; MS: 2.2.4, 1.4.3 Information retrieval as statistical translation by Adam Berger and John Lafferty, SIGIR 1999
1/11/10 Smoothing techniques: Laplace, Good-Turing JM: 4.5; MS: 6.2 Good, Jelinek, Mercer, and Robbins on Turing's estimate of probabilities in American Journal of Mathematical and Management Sciences, 11, 229-308, 1991 by Arthur Nadas; Always Good Turing: asymptotically optimal probability estimation in Science, 302(5644):427-431 by Alon Orlitsky et al.
1/13/10 Deleted linear interpolation; EM algorithm for linear interpolation JM: 4.6 Interpolated estimation of Markov source parameters from sparse data in Pattern Recognition in Practice, 381-397, 1981 by F. Jelinek and R. Mercer; Convexity, maximum likelihood and all that by Adam Berger
1/18/10 Martin Luther King Day, University closed
1/20/10 Back-off: Katz, Kneser-Ney; Large n-gram models; Limitations of n-grams JM: 4.7-9; MS: 6.3 A hierarchical Bayesian language model based on Pitman-Yor processes by Yee Whye Teh, ACL 2006; Large language models in machine translation by Thorsten Brants et al., EMNLP 2007; An empirical study of smoothing techniques for language modeling in Computer Speech & Language, 13(4):319-358, 1999 by S. Chen and J. Goodman; Up from trigrams! - the struggle for improved language models in EUROSPEECH, 1037-1040, 1991 by Fred Jelinek
1/25/10 EM algorithm; Hidden Markov models JM: 6; MS: 9
1/27/10 Part-of-speech tagging, named entity recognition; Conditional random fields JM: 6, 22 Conditional random fields: probabilistic models for segmenting and labeling sequence data by Lafferty et al., ICML 2001; Identifying gene and protein mentions in text using conditional random fields in Bioinformatics, 2005 by McDonald and Pereira
2/01/10 Semi-supervised conditional random fields JM: 6 Semi-supervised conditional random fields for improved sequence segmentation and labeling by Jiao et al., ACL 2006; Efficient computation of entropy gradient for semi-supervised conditional random fields in NAACL/HLT, 2007 by Mann and McCallum
2/03/10 Grammars and parsing JM: 12-13
2/08/10 Probabilistic context-free grammars: inside-outside algorithm JM: 14.1-5; MS: 11 Lafferty's notes
2/10/10 Probabilistic lexicalized CFGs JM: 14.6 Head-driven statistical models for natural language parsing in Computational Linguistics, 29(4): 589-637, 2003 by Collins; Intricacies of Collins' parsing model in Computational Linguistics, 30(4): 479-511, 2004 by Bikel
2/15/10 Parser-based language modeling: structured language model, generalized inside-outside algorithm Structured language modeling in Computer Speech and Language, 14(4):283-332, 2000 by Chelba and Jelinek; Stochastic analysis of structured language modeling in Mathematical Foundations of Speech and Language Processing, 37-72, 2004 by Jelinek
2/17/10 Word sense disambiguation JM: 20.1-5 Unsupervised word sense disambiguation rivaling supervised methods in ACL 1995 by Yarowsky
2/22/10 Topic models: LSA; Topic-based language modeling MS: 15.4 Indexing by latent semantic analysis in Journal of the American Society for Information Science, 41(6):391-407, 1990 by Deerwester et al.; Exploiting latent semantic information in statistical language modeling in Proceedings of the IEEE, 88(8):1279-1296, 2000 by Bellegarda
2/24/10 Topic models: PLSA; Composite language model Unsupervised learning by probabilistic latent semantic analysis in Machine Learning, 42(1):177-196, 2001 by Hofmann
3/01/10 Topic models: LDA Latent Dirichlet allocation in Journal of Machine Learning Research, 3:993-1022, 2003 by Blei et al.
3/03/10 Machine translation: IBM models 1-5; phrase- and parsing-based models JM: 25.5, 25.11 The mathematics of statistical machine translation: parameter estimation in Computational Linguistics, 19(2): 263-311, 1993 by Brown et al.; Hierarchical phrase-based translation in Computational Linguistics, 33(2):201-228, 2007 by Chiang; Statistical machine translation at Wikipedia; video: Statistical machine translation by Och at Google
3/08/10 Machine translation: decoding algorithms JM: 25.8
3/10/10 Machine translation: BLEU JM: 25.9 BLEU: a Method for Automatic Evaluation of Machine Translation in ACL 2002 by Papineni et al.; BLEU at Wikipedia
3/15/10 Project presentations