| # | Deadline | PDF and LaTeX |
|---|---|---|
| Assignment 0 | January 10, 2022 11:55pm | PDF and LaTeX |
| Assignment 1 | January 25, 2022 11:55pm | PDF and LaTeX |
| Assignment 2 | February 14, 2022 11:55pm | PDF and LaTeX |
| Midterm | February 21, 2022 11:55pm | PDF and LaTeX |
| Assignment 3 | March 14, 2022 11:55pm | PDF and LaTeX |
| Assignment 4 | March 30, 2022 11:55pm | PDF and LaTeX |
Class participation marks (questions and voting)
An assignment may ask you to prepare questions for upcoming classes and to vote on the questions of others. This will happen at least for Assignment 1, but it may happen for other assignments as well. The purpose is to get everyone involved and to make the class interactive and fun! The most-voted questions will be answered in class, and we may even have a longer discussion around them. Questions will be posted and voted on in the public Slack channel, in a dedicated thread. The thread will be bookmarked for easy identification (look at the top of the Slack channel). It will be created 5 days before the class and “closed” at 8am on the day of the class.
- Proposal: March 7, 2022 11:55pm
- Presentations: April 4 and 6, 2022, in class
- Report: April 25, 2022 11:55pm
Objective and evaluation
The project can be done individually or in a group of two (groups of more than two are not allowed).
The goal is for everyone to get a taste of what it is like to work on the theoretical aspects of reinforcement learning. You do not actually need to produce research-paper-quality results in the project (although if you do, no one will complain!). It is sufficient to demonstrate a thorough understanding of some aspect of the theory literature, such as:
- What are the interesting questions to ask (and which questions are less interesting)?
- What is known about a given topic (and what is not known)?
- Sorting out whether some assumption is critical for some result (or not).
When evaluating the reports, we will care less about originality (new results) than about coherence, soundness and the quality of writing. In fact, a typical report is expected to be a readable (and possibly entertaining) summary of a topic in the area. Reports that contain original results are also welcome, but originality is absolutely not required to earn a full grade.
We strongly recommend starting small: aim to write a review of some results of interest. If time permits and you feel up to it, add new results.
That said, if you obtain a new result early on, it is also fine to start writing it up.
How to choose a topic?
- Choose a theory paper and rewrite it to make it better. Pick and choose what to include in your report. The improvement may come from better proofs, a better exposition of the results, or putting the results into perspective. Be critical about assumptions (but not overly critical). Aim for readable (but technically correct) write-ups.
- Choose a problem that you care about in the area. Ask what is known and write a summary about it. Be specific about which problems you are writing about. If time permits and with some luck, add new results. Aim for small things, for example: such and such is known about topic A, but only under condition B. Do these results extend to condition C? Which conditions are necessary? What happens if the problem is changed slightly, for example by switching from finite-horizon to infinite-horizon objectives? Or to multicriteria objectives?
- Choose an open question and try to answer it. Many open questions are mentioned in class. When there is an upper bound, ask whether there is a matching lower bound. If the bounds do not quite match, try to reduce the gap. Ditto for lower bounds. Any time you see a bound, you can ask: is this tight? The endnotes of the lectures on this website list some of the open questions.
- A bit more risky, but possibly more rewarding, option is to choose a non-theory paper and look at it through the eyes of a theoretician. Are there any hard claims that could be formulated (and possibly proved) in the context of the paper? If the paper proposes algorithms, are there any conditions under which the proposed algorithm will work “well”? How well? Put the results into the context of what is known. For example: is TRPO a sound algorithm, say, in the tabular setting?
The reports should be typeset in LaTeX and submitted as a PDF document. The template is available here. The report should be at most 9 pages long, and the proposal at most 2 pages long. Both should have the standard structure:
- Introduction (what is the problem studied, why do we study it)
- Results (the “meat”)
- Conclusions/summary (what did we learn? what is the short take-away from all of this? what’s next, if anything?)
Examples of topics
Read “Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?” (link) by Qiwen Cui and Lin F. Yang. Place the result in the framework of the class. Which problem are they solving? What are the pros and cons of their approach? What is the significance of the various assumptions? Are there assumptions that could be dropped? Relaxed? Would this extend to other model classes? What is the general lesson?
Read “Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP” (link) by Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon S. Du, which is a follow-up to “Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon” (link) by Zihan Zhang, Xiangyang Ji, Simon S. Du. Same questions as before. Do we expect the techniques to be practically useful? When? If not, why not? Can things be fixed? Do the results extend to other settings, such as batch RL or planning? Infinite-horizon problems would really test the limits.
We could go on and list all the papers on RL theory that appeared recently (a long list; check out the RL Theory Seminar pages for some starting points). An alternative is to consider specific topics such as (1) how to deal with generalization, (2) multiple criteria, (3) robustness, (4) long horizons beyond variance reduction, (5) model classes beyond linear and linear mixture MDPs, (6) nonstationarity, (7) value-aware model fitting: is it a good idea?, (8) better ways of exploring: is Information Directed Sampling the way to go?, (9) what are the limits of adaptive algorithms in RL?