Schedule | CS 5624

Note: Since this is the first time the class is being taught, the schedule may change if we need more or less time for certain topics.

Date	Lecture	Readings	Logistics
1/21	Introduction [ slides ]	No associated readings
Language Modeling
1/23	Language modeling [ slides ]	Jurafsky and Martin, Chapter 3.1-3.5	Homework 0 released on Piazza (due 2/7)
1/28	Neural language models [ slides ]	Jurafsky and Martin, Chapter 7.1-7.4 and 7.6; Bengio et al. (2003) A Neural Probabilistic Language Model
1/30	Backpropagation [ slides ]	Jurafsky and Martin, Chapter 7.5 and 7.7	Quiz 0 released on Piazza (due 2/7)
2/4	Class canceled because Tu was sick
2/6	Word Embeddings [ slides ]	Jurafsky and Martin, Chapter 6; Mikolov et al. (2013a) Distributed Representations of Words and Phrases and their Compositionality; Mikolov et al. (2013b) Efficient Estimation of Word Representations in Vector Space; [optional] Pennington et al. (2014) GloVe: Global Vectors for Word Representation
Transformers and the Evolution of LLMs
2/11	Class canceled due to inclement weather
2/13	Transformers [ slides ]	Bahdanau et al. (2014) Neural Machine Translation by Jointly Learning to Align and Translate; Vaswani et al. (2017) Attention Is All You Need; Jay Alammar's blog: Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention); Jay Alammar's blog: The Illustrated GPT-2 (Visualizing Transformer Language Models)
2/18	The Era of BERT [ slides ]	Devlin et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Raffel et al. (2019) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2/20	Scaling LLM Pretraining [ slides ]	Kaplan et al. (2020) Scaling Laws for Neural Language Models; Hoffmann et al. (2022) Training Compute-Optimal Large Language Models; Li et al. (2025) (Mis)Fitting: A Survey of Scaling Laws
LLM Capabilities and Evaluation
2/25	LLM Prompting [ slides ]	Brown et al. (2020) Language Models are Few-Shot Learners; Wei et al. (2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2/27	LLM Decoding [ slides ]	Holtzman et al. (2019) The Curious Case of Neural Text Degeneration
3/4	Instruction tuning [ slides ]	Wei et al. (2021) Finetuned Language Models Are Zero-Shot Learners; Chung et al. (2022) Scaling Instruction-Finetuned Language Models; [optional] Sanh et al. (2021) Multitask Prompted Training Enables Zero-Shot Task Generalization; [optional] Longpre et al. (2023) The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
3/6	LLM Alignment [ slides ]	Ouyang et al. (2022) Training language models to follow instructions with human feedback; Bai et al. (2022) Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback; Rafailov et al. (2023) Direct Preference Optimization: Your Language Model is Secretly a Reward Model
3/11	No classes (Spring break)
3/13	No classes (Spring break)
3/18	LLM Evaluation [ slides ]	Zheng et al. (2023) Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena; [optional] Vu et al. (2024) Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Improving LLM Efficiency and Adaptability
3/20	Parameter-efficient fine-tuning [ slides ]	Lester et al. (2021) The Power of Scale for Parameter-Efficient Prompt Tuning; Hu et al. (2021) LoRA: Low-Rank Adaptation of Large Language Models; [optional] Vu et al. (2022) SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
3/25	Mixture of Experts [ slides ]	Fedus et al. (2021) Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity; Shen et al. (2023) Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models; [optional] Zoph et al. (2022) ST-MoE: Designing Stable and Transferable Sparse Expert Models; [optional] Lepikhin et al. (2020) GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
3/27	Model Merging [ slides ]	Ilharco et al. (2022) Editing Models with Task Arithmetic; Yadav et al. (2023) TIES-Merging: Resolving Interference When Merging Models
4/1	Distillation, quantization, and pruning [ slides ]	Hinton et al. (2015) Distilling the Knowledge in a Neural Network; Maarten Grootendorst's blog: A Visual Guide to Quantization; Frankle and Carbin (2018) The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
4/3	Long-context LLMs [ slides ]	Dao et al. (2022) FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness; Gao et al. (2024) How to Train Long-Context Language Models (Effectively)
Advanced LLMs and Compound AI Systems
4/8	Advanced reasoning & Test-time scaling [ slides ]	DeepSeek-AI (2025) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning; Muennighoff et al. (2025) s1: Simple test-time scaling; Brown et al. (2024) Large Language Monkeys: Scaling Inference Compute with Repeated Sampling; [optional] Geiping et al. (2025) Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
4/10	Advanced reasoning & Test-time scaling (cont'd) [ slides ]	Ye et al. (2025) LIMO: Less is More for Reasoning; Yu et al. (2025) Z1: Efficient Test-time Scaling with Code; [optional] Xiang et al. (2025) Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
4/15	Retrieval-augmented generation (RAG) & Tool-use LLMs [ slides ]	Lewis et al. (2020) Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; Schick et al. (2023) Toolformer: Language Models Can Teach Themselves to Use Tools; [optional] Jin et al. (2025) Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning; [optional] OpenAI (2025) Introducing deep research
4/17	LLM Agents [ slides ]	Yao et al. (2022) ReAct: Synergizing Reasoning and Acting in Language Models; Wu et al. (2023) AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation; Andrew Ng: Agentic Design Patterns Parts 1-5; [optional] Madaan et al. (2023) Self-Refine: Iterative Refinement with Self-Feedback; [optional] Shinn et al. (2023) Reflexion: Language Agents with Verbal Reinforcement Learning; [optional] Shen et al. (2023) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face; [optional] Qian et al. (2023) ChatDev: Communicative Agents for Software Development
Other topics
4/22	Multimodal LLMs [ slides ]	Wang et al. (2024) Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution; McKinzie et al. (2024) MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training; Sebastian Raschka: Understanding Multimodal LLMs; [optional] Deitke et al. (2024) Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models; [optional] Wang et al. (2024) Emu3: Next-Token Prediction is All You Need; [optional] Jiang et al. (2025) Token-Efficient Long Video Understanding for Multimodal LLMs
4/24	LLM Safety and Security [ slides ]	Qi et al. (2025) Safety Alignment Should be Made More Than Just a Few Tokens Deep; Nasr et al. (2023) Scalable Extraction of Training Data from (Production) Language Models; Wei et al. (2023) Jailbroken: How Does LLM Safety Training Fail?; [optional] Shen et al. (2023) "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models; [optional] Kumar et al. (2025) OverThink: Slowdown Attacks on Reasoning LLMs
4/29	No classes
5/1	No classes
5/6	Project presentations [ slides ]
5/8	Project presentations [ slides ]