
CMPUT 605: Theoretical Foundations of Reinforcement Learning W2023

The purpose of this course is to let students acquire a solid understanding of the theoretical foundations of reinforcement learning, as well as to give students a glimpse of what theoretical research looks like in the context of computer science. The topics will range from building up foundations (Markovian Decision Processes and their various special cases) to discussing solutions to two core problem settings:

  • planning/simulation optimization, and
  • online reinforcement learning

In each of these settings, we cover the key algorithmic challenges and the core ideas for addressing them. Specific topics, ideas and algorithms covered include, for each setting:

  • complexity of planning/simulation optimization; large scale planning with function approximation;
  • efficient online learning: the role (and limits) of optimism; scaling up with function approximation.

While we will explore connections to (some) deep RL methods, mainly seeking an answer to the question of when we expect them to work well, the course will not focus on deep RL.


Students taking the course are expected to have an understanding of basic probability, the basics of concentration inequalities, linear algebra and convex optimization. This background is covered in Chapters 2, 3, 5, 7, 26, and 38 of the Bandit Algorithms book. A very nice, highly recommended book that covers more than is needed is A Second Course in Probability Theory. The book is available online and also in print. Chapters 1, 3, 4, and 5 of that book are the most useful.

It will also be useful to recall the foundations of mathematical analysis, such as completeness, metric spaces and the like, as we will start off with results that require Banach’s fixed point theorem. This is covered, for example, in Appendix A of Csaba’s “little” RL book. The Wikipedia page on Banach’s fixed point theorem is not that bad either.

Instruction Team

Lecture Time

Mondays and Wednesdays from 1:00 PM - 2:30 PM (MST) in CSC 3-33.

Office Hours

Office hours are held from 2:00pm to 4:00pm on the Friday before each assignment is due (exceptions are indicated below). Location: Breakout Room in CSC (the exact room will be announced around 1:55pm on the Friday).

  • Vlad Tkachuk: Jan 27 (Time changed to 10:00am - 12:00pm (noon)) and Feb 10
  • Alex Ayoub: Mar 10 and Mar 24

Slack Channel

We will use Slack for everything. We have a channel called #cmput-605-students on the Amii Slack to discuss all topics related to this course. If you would like to join the channel, please message Vlad Tkachuk for an invitation. All announcements will be made on #cmput-605-students. We strongly encourage all students to ask questions regarding course content on the Slack channel!

Lecture Notes

The lecture notes for this year’s class are under the heading LECTURE NOTES. The lecture notes for this year serve as the required text for this course. Lecture notes for the last two years are available on this site under headings WINTER 2022 LECTURE NOTES and WINTER 2021 LECTURE NOTES.


The work you will be required to do for this course includes 4 (marked) assignments, 1 midterm, and 1 final project. For all coursework submissions, please send your completed work to Vlad Tkachuk via private message on Slack before the due date. More details can be found on The Work You Do page.

Keywords: RL theory, Reinforcement Learning, Theoretical Reinforcement Learning