The Work You Do
Component  Weight  Deadline  PDF and LaTeX

Assignment 0 (Not Graded)  0%  January 15, 2023 11:55pm  PDF and LaTeX
Assignment 1  10%  January 29, 2023 11:55pm  PDF and LaTeX
Assignment 2  10%  February 12, 2023 11:55pm  PDF and LaTeX
Midterm  20%  February 26, 2023 11:55pm  PDF and LaTeX
Project (Proposal)  10%  March 5, 2023 11:55pm
Assignment 3  10%  March 12, 2023 11:55pm  PDF and LaTeX
Assignment 4  10%  March 26, 2023 11:55pm  PDF and LaTeX
Project (Presentation)  10%  April 11 and 12, 2023 (in class)  
Project (Report)  20%  April 21, 2023 11:55pm 
Late Policy
Late submissions will not be graded. Please submit your work by the deadline.
Course Project
Deadlines
 Proposal: March 5, 2023 11:55pm
 Presentations: April 11 and 12, 2023, in class
 Report: April 18, 2023 11:55pm
Objective and evaluation
The project should be done individually or in a group of at most two.
The goal is for everyone to get a taste of what it is like to work on theoretical aspects of reinforcement learning. In the project, you do not actually need to produce research-paper-quality results (although if you do, no one will complain!). It is sufficient to demonstrate a thorough understanding of some aspect of the theory literature, such as:
 What are the interesting questions to ask (and what are less interesting questions)?
 What is known about a given topic (and what is not known)?
 Sorting out whether some assumption is critical for some result (or not).
When evaluating the reports, we will care less about originality (new results) than about coherence, soundness, and the quality of writing. In fact, a typical report is expected to be a readable (and possibly entertaining) summary of a topic in the area. Reports that contain original results are also welcome, but originality is absolutely not required to earn a full grade.
We strongly recommend starting small: aim to write a review of some results of interest. If time permits and you see fit, add new results.
Having said this, if you score a new result early on, it is also OK to start by writing that result down.
How to choose a topic?
 Choose a theory paper and rewrite it to make it better. Pick and choose what you include in your report. It may be better proofs. It may be a better exposition of the results. Be critical about assumptions (but not overly critical). It may be putting the results into perspective. Aim for readable (but technically correct) writeups.
 Choose a problem that you care about in the area. Ask what is known. Write a summary about it. Be specific about what problems you are writing about. If time permits and with some luck, add new results. Aim for small things, for example: such and such is known in topic A, but only under condition B. Do these results extend to condition C? What conditions are necessary? How about slightly changing the problem, for example switching from finite horizon to infinite horizon objectives? Multicriteria?
 Choose an open question and try to answer it. Loads of open questions are mentioned in the class. When there is an upper bound, ask whether there is a matching lower bound. If not quite, try to reduce the gap. Ditto for lower bounds. Any time you see a bound you can ask: Is this tight? The endnotes of the lectures on this website list some of the open questions.
 A bit more risky, but possibly more rewarding, is to choose a non-theory paper and look at it through the eyes of a theoretician. Are there any hard claims that could be formulated (and possibly proved) in the context of the paper? If the paper is proposing algorithms, are there any conditions under which the proposed algorithm will work “well”? How well? Put the results into the context of what is known. Example: Is TRPO a sound algorithm? Say, in the tabular setting?
Formatting
The reports should be typeset in LaTeX and sent as a PDF document. The template is available here. The report should be at most 9 pages long and the proposal at most 2 pages long. They should have the standard structure:
 Introduction (what is the problem studied, why do we study it)
 Results (the “meat”)
 Conclusions/summary (what did we learn? what is the short takeaway from all of this? what’s next if anything?)
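The structure above can be sketched as a minimal LaTeX skeleton. This is only an illustration that assumes the plain article class; it is not the course template, which should be used for the actual submission. The title, author names, and package choices are placeholders.

```latex
% Minimal sketch of the report structure (assumption: plain article class,
% NOT the official course template).
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm} % common math packages (placeholder choice)

\title{Project Report Title}        % placeholder
\author{Author One \and Author Two} % placeholder; groups of at most two
\date{}

\begin{document}
\maketitle

\section{Introduction}
% What is the problem studied, and why do we study it?

\section{Results}
% The "meat": statements, proofs, and discussion.

\section{Conclusions}
% What did we learn? What is the short takeaway? What is next, if anything?

\end{document}
```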
Examples of topics

Read “Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?” (link) by Qiwen Cui and Lin F. Yang. Place the result in the framework of the class. Which problem are they solving? What are the pros and cons of what they do? What is the significance of the various assumptions? Are there assumptions that could be dropped? Relaxed? Would this extend to other model classes? What is the general lesson?

Read “Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP” (link) by Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon S. Du, which is a follow-up to “Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon” (link) by Zihan Zhang, Xiangyang Ji, Simon S. Du. Same questions as before. Do we expect the techniques to be practically useful? When? If not, why not? Can things be fixed up? Extensions to other settings, batch, planning? Infinite horizon problems would really test the limits.

We could go on and list all papers that appeared recently on RL theory (a long list, check out the RL Theory Seminar pages for some starting points). An alternative is to consider specific topics such as (1) how to deal with generalization, (2) multiple criteria, (3) robustness, (4) long horizons beyond variance reduction, (5) various model classes beyond linear and linear mixture MDPs, (6) non-stationarity, (7) value-aware model fitting: is it a good idea? (8) better ways of exploring: is Information Directed Sampling the way to go? (9) what are the limits of adaptive algorithms in RL?

For further inspiration, visit the project page of the class that Nan Jiang taught recently, or this page by Wen Sun and Sam Kakade.