schedule | CS 6804

Note: Since this is the first time the class is being taught, the schedule may change if we need more or less time on certain topics.

Date	Lecture	Readings
Transformers & Pretraining Scaling
1/21	Introduction & Transformers [ slides ]	No associated readings
1/26	No classes (canceled due to inclement weather)
1/28	Transformers (cont'd) [ slides ]	Vaswani et al. (2017) Attention Is All You Need Jay Alammar's blog The Illustrated GPT-2 (Visualizing Transformer Language Models)
2/2	Transformers (cont'd) [ slides ]	Devlin et al. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Raffel et al. (2019) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Radford et al. (2019) Language Models are Unsupervised Multitask Learners
2/4	Pretraining scaling [ slides ]	Hoffmann et al. (2022) Training Compute-Optimal Large Language Models [optional] Li et al. (2025) (Mis)Fitting: A Survey of Scaling Laws
2/9	Decoding and inference [ slides ]	Holtzman et al. (2019) The Curious Case of Neural Text Degeneration Brown et al. (2020) Language Models are Few-Shot Learners Wei et al. (2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Anthropic blog Effective context engineering for AI agents
2/11	Multimodal models [ slides ]	Dosovitskiy et al. (2020) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Bai et al. (2025) Qwen3-VL Technical Report
Efficient training & inference
2/16	Efficient training [ slides ]	Fedus et al. (2021) Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Hu et al. (2021) LoRA: Low-Rank Adaptation of Large Language Models
2/18	Efficient training (cont'd) & inference [ slides ]	Dao et al. (2022) FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Tandon et al. (2025) End-to-End Test-Time Training for Long Context
Post-training & Reinforcement Learning
2/23	Post-training [ slides ]	Wei et al. (2021) Finetuned Language Models Are Zero-Shot Learners Longpre et al. (2023) The Flan Collection: Designing Data and Methods for Effective Instruction Tuning Ouyang et al. (2022) Training language models to follow instructions with human feedback Bai et al. (2022) Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
2/25	Post-training (cont'd) [ slides ]	Rafailov et al. (2023) Direct Preference Optimization: Your Language Model is Secretly a Reward Model Lambert (2026) (Chapter 6)
3/2	Policy gradient algorithms [ slides ]	Lambert (2026) (Chapter 6) Shao et al. (2024) DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Policy gradient algorithms (cont'd) + Large Reasoning Models
3/4	Policy gradient algorithms (cont'd) + Test-time scaling [ slides ]	DeepSeek-AI (2025) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Muennighoff et al. (2025) s1: Simple test-time scaling Brown et al. (2024) Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [optional] Jolicoeur-Martineau (2025) Less is More: Recursive Reasoning with Tiny Networks
Spring break
3/9	No classes
3/11	No classes
Agents & Compound AI systems
3/16	Test-time scaling (cont'd) + Compound AI systems [ slides ]	Venkatraman et al. (2025) Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models Schick et al. (2023) Toolformer: Language Models Can Teach Themselves to Use Tools Jin et al. (2025) Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Agrawal et al. (2025) GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
3/18	Agents [ slides ]	Andrew Ng Agentic Design Patterns Part 1-5 Wang et al. (2024) OpenHands: An Open Platform for AI Software Developers as Generalist Agents Xie et al. (2024) OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Huang et al. (2025) Deep Research Agents: A Systematic Examination And Roadmap
Student presentations & discussions
3/23	(Kiymet & Enoch): LLM-as-a-Judge & LLM Evaluation [ slides ]	Zheng et al. (2023) Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena Chen et al. (2024) MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark [optional] Liu et al. (2026) Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
3/25	(Neelesh & Sriram): Context Engineering & Agentic Memory [ slides ]	Xu et al. (2025) A-MEM: Agentic Memory for LLM Agents Zhang et al. (2025) Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [optional] Anthropic's blog Effective context engineering for AI agents
3/30	(Mokshitha & Ishtiaque & Muhammad): Long-context processing [ slides ]	Sun et al. (2025) Scaling Long-Horizon LLM Agent via Context-Folding Zhang et al. (2025) Recursive Language Models [optional] Alizadeh et al. (2026) Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
4/1	(Alexa & Simon & Hajra): AI for scientific research [ slides ]	Novikov et al. (2025) AlphaEvolve: A coding agent for scientific and algorithmic discovery Lu et al. (2026) Towards end-to-end automation of AI research [optional] Andrej Karpathy's autoresearch
4/6	(Caleb & Sangwook): Test-time discovery and optimization [ slides ]	Yuksekgonul et al. (2026) Learning to Discover at Test Time Lee et al. (2026) Meta-Harness: End-to-End Optimization of Model Harnesses [optional] optimize_anything: A Universal API for Optimizing any Text Parameter
4/8	(Rituraj & Jing): Deep Think [ slides ]	Venkatraman et al. (2025) Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models Sharma et al. (2026) PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference [optional] Singh et al. (2026) V1: Unifying Generation and Self-Verification for Parallel Reasoners
4/13	(Yu-Min & Yeana & Pin-Jie): Model interpretability [ slides ]	Anthropic's blog (2026) Emotion Concepts and their Function in a Large Language Model Sofroniew et al. (2026) Emotion Concepts and their Function in a Large Language Model
4/15	(Farhana & Bikash & Heajun): Diffusion models & Diffusion LLMs [ slides ]	Nie et al. (2025) Large Language Diffusion Models Miles et al. (2026) Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching [optional] Isaac Ke's video Diffusion Models for AI Image Generation
4/20	(Cameron & Briana): AI security & privacy [ slides ]	Betley et al. (2025) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs Panfilov et al. (2026) Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs [optional] Wang et al. (2025) Persona Features Control Emergent Misalignment [optional] Liu et al. (2026) Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models [optional] Goel et al. (2026) Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
4/22	(Umid & Jafar): AI bias [ slides ]	Cloud et al. (2025) Subliminal Learning: Language models transmit behavioral traits via hidden signals in data Schrodi et al. (2025) Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
4/27	(Sneha & Aravinda): Test-time scaling for evaluation [ slides ]	Kwok et al. (2026) LLM-as-a-Verifier: A General-Purpose Verification Framework Zhang et al. (2026) Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
4/29	(Hani & Najibul): Chain-of-Thought monitorability and controllability [ slides ]	Guan et al. (2025) Monitoring Monitorability Yueh-Han et al. (2026) Reasoning Models Struggle to Control their Chains of Thought
Final exam
5/4	No classes
5/6	Exam (in-class) [ slides ]