CompLing1 Fall 2011

Computational Linguistics 1

Schedule

Please note that this is a somewhat approximate schedule, and is subject to change.

Readings are to be completed before class. "SaLP1" and "SaLP2" refer to the first and second editions, respectively, of Jurafsky & Martin's textbook Speech and Language Processing. Other readings are linked from within the schedule and listed under References at the bottom of the page.


Finite-State Models
Date	Topic	Reading	Homework Due
1 Sept	Introduction to Computational Linguistics: class administrivia, programming tutorials (Kristy's Unix scripting tutorial and Jimmy Lin's Python tutorial), NLP applications, statistical modeling	SaLP 1.1-1.4; IBM's Watson on Jeopardy!
6 Sept	Introduction to Finite-State Models: regular expressions, Chomsky hierarchy, automata and transducers	SaLP Ch 2
8 Sept	N-grams: orthography and morphology	SaLP 3.1-3.3	HW0
13 Sept	N-grams with Ambiguity: phonology and pronunciation	SaLP1 4.1-4.3, 4.6, 5.7; SaLP2 7.1-7.3, 7.5
15 Sept	Probabilistic N-grams: language modeling, log-likelihood, backoff	SaLP1 5.4, 6.1-6.2; SaLP2 9.2, 4.1-4.2, 4.5; optional: CG98
20 Sept	LMs, Class-based Models: backoff models for LMs, OOV handling, class-based modeling for OOVs	SaLP1 6.3, 8.7; SaLP2 5.1-5.5, 5.8	HW1
22 Sept	Part-of-Speech (POS) Tagging: word classes, part-of-speech tagging	SaLP1 8.1-8.4; SaLP2 5.1-5.4
27 Sept	Markov Chains, Hidden Markov Models (HMMs): first-order Markov models, decoding	SaLP1 7.2-7.3; SaLP2 6.1-6.2
29 Sept	Forward Algorithm, Viterbi: dynamic programming, minimum edit distance (tentative)	SaLP1 8.5; SaLP2 6.4	HW2
4 Oct	More Tagging: "shallow" parsing, other finite-state tagging tasks	SP03
6 Oct	More Dynamic Programming: forward-backward algorithm	Wikipedia
11 Oct	Machine Learning in NLP: unsupervised learning, Expectation-Maximization (EM) algorithm	Bil97
13 Oct	More Machine Learning in NLP: supervised learning with perceptrons, conditional random fields (CRFs), and SVMs	Col02; LMP01; optional: SM06	HW3
Context-Free Models
18 Oct	Introduction to Context-Free Models: trees, the Chomsky hierarchy revisited, O(n³) parsing complexity. Midterm handed out (take-home; open-book, open-note, open-laptop, but not open-internet).	SaLP1 Ch 9; SaLP2 12.1-12.3, 12.8-12.9
20 Oct	Context-Free Grammars (CFGs) Treebanks, probabilistic CFGs	SaLP1 9.1-9.8; SaLP2 12.1-12.4; PTB93
25 Oct	Context-Free Parsing: CYK algorithm, grammar transformations; NLTK demo by Prof. Jordan Boyd-Graber	SaLP1 12.1-12.3; SaLP2 13.4, 14.1-14.6; optional: JR00, Res92	Midterm Due
27 Oct	Class cancelled (instructor out of town)
Beyond Context-Free; Semantics and Meaning
1 Nov	More on Context-Free Grammars: left-corner grammar transformations, Earley parsing algorithm	optional: JR00, Res92
3 Nov	Other Parsing Models: context-sensitive models (unification, TAG, CCG)	SaLP1 11.1-11.3, 13.3; SaLP2 15.1-15.3, 16.3	HW4
8 Nov	Semantics: word sense disambiguation (WSD), coreference, semantic role labeling (SRL)	SaLP1 16.1-16.3, 17.1-17.2; SaLP2 19.1-19.4, 20.1, 20.4, 20.6
10 Nov	More Semantics (continued from previous lecture)
15 Nov	Statistics and Noise: normalization, pre- and post-processing	Norm01
17 Nov	More Text Normalization (continued from previous lecture)		HW5
Applications
22 Nov	Machine Translation: translation and language models, bitext parsing	Chi05; CKtut06
24 Nov	No class (Thanksgiving)
29 Nov	Speech Recognition (ASR)	MPR08 Sec 4
1 Dec	Speech Synthesis (TTS)	SaLP1 4.6-4.7; SaLP2 8.2-8.3
6 Dec	Information Retrieval (IR), Question Answering (QA)	Lin06; optional: RH02, DFL07	HW6
8 Dec	Automatic Summarization, Information Extraction (IE)	GKM05; D3M05
TBD	Health Applications: automated diagnostics and assistive technology	RMH07; SaLP1 6.7; SaLP2 4.11
13 Dec	Class Summary: algorithms, "toolkit"		HW7
15 Dec	No class (finals week begins)
19 Dec (Mon)	Final Exam (take-home), due 12:30pm

References

Bil97		Jeff Bilmes. A Gentle Tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-021, ICSI, 1997.
CJ05		Eugene Charniak and Mark Johnson. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), 2005.
CG98		Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, 1998.
Chi05		David Chiang. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the Annual Meeting of the ACL, pp. 263-270, 2005.
CKtut06		David Chiang and Kevin Knight. An introduction to synchronous grammars. Part of a tutorial given at ACL 2006.
Col02		Michael Collins. Discriminative training methods for Hidden Markov Models: theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1-8, 2002.
PRank01		Koby Crammer and Yoram Singer. Pranking with Ranking. In Advances in Neural Information Processing Systems 14 (NIPS), 2001.
D3M05		Hal Daumé III and Daniel Marcu. Induction of word and phrase alignments for automatic document summarization. Computational Linguistics, 31(4):505-530, December 2005.
DFL07		Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007.
GKM05		Trond Grenager, Dan Klein, and Chris Manning. Unsupervised Learning of Field Segmentation Models for Information Extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 371-378, 2005.
HC05		Liang Huang and David Chiang. Better k-best parsing. In Proceedings of the Ninth International Workshop on Parsing Technology (IWPT), pp. 53-64, 2005.
JR00		Mark Johnson and Brian Roark. Compact non-left-recursive grammars using the selective left-corner transform and factoring. In Proceedings of COLING, pp. 355-361, 2000.
LMP01		John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional Random Fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML), 2001.
Lin06		Jimmy Lin. The Role of Information Retrieval in Answering Complex Questions. In Proceedings of COLING/ACL Poster Sessions, pp. 523-530, 2006.
PTB93		Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330, June 1993.
Mil05		Rada Mihalcea. Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data Labeling. In Proceedings of HLT-EMNLP, 2005.
MPR08		Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech recognition with weighted finite-state transducers. In Handbook on Speech Processing and Speech Communication, Part E: Speech Recognition, 2008.
OER05		Jahna Otterbacher, Gunes Erkan, and Dragomir R. Radev. Using Random Walks for Question-focused Sentence Retrieval. In Proceedings of HLT-EMNLP, 2005.
RH02		Deepak Ravichandran and Eduard Hovy. Learning Surface Text Patterns for a Question Answering System. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 41-47, 2002.
Res92		Philip Resnik. Left-Corner Parsing and Psychological Plausibility. In Proceedings of COLING, 1992.
RMH07		Brian Roark, Margaret Mitchell, and Kristy Hollingshead. Syntactic complexity measures for detecting Mild Cognitive Impairment. In Proceedings of the ACL 2007 Workshop on Biomedical Natural Language Processing (BioNLP), pp. 1-8, 2007.
SP03		Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL, pp. 134-141, 2003.
Norm01		Richard Sproat, Alan Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, and Christopher Richards. Normalization of non-standard words. Computer Speech and Language, 15(3):287-333, 2001.
SM06		Charles Sutton and Andrew McCallum. An introduction to Conditional Random Fields for relational learning. In Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning. MIT Press, 2006.
