Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces

Sean Sinclair, Siddhartha Banerjee, Christina Lee Yu

May 2020


Abstract

We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive, data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal Q-function and the joint space without sacrificing worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require an optimal discretization as input, access to a simulation oracle, or both. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance than both heuristics and Q-learning with uniform discretization.
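
To make the idea of data-driven refinement concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it collapses the state-action space to the interval [0, 1), and it splits a cell purely on visit count, whereas the abstract also ties refinement to payoff estimates. The names `Cell`, `AdaptivePartition`, and the `split_after` threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float
    hi: float
    visits: int = 0
    q: float = 0.0  # running payoff estimate for this cell


class AdaptivePartition:
    """Toy 1-D adaptive partition of [0, 1); an illustration, not the paper's method."""

    def __init__(self, split_after: int = 4):
        self.split_after = split_after  # hypothetical visit threshold (assumption)
        self.cells = [Cell(0.0, 1.0)]

    def find(self, x: float) -> Cell:
        # Linear scan for the cell containing x.
        for c in self.cells:
            if c.lo <= x < c.hi:
                return c
        raise ValueError("x must lie in [0, 1)")

    def update(self, x: float, payoff: float) -> None:
        c = self.find(x)
        c.visits += 1
        c.q += (payoff - c.q) / c.visits  # incremental mean of observed payoffs
        if c.visits >= self.split_after:
            mid = (c.lo + c.hi) / 2
            self.cells.remove(c)
            # Children keep the parent's estimate until their own first visit.
            self.cells += [Cell(c.lo, mid, 0, c.q), Cell(mid, c.hi, 0, c.q)]
```

Feeding this sketch many samples near a single point refines only the cells around that point, while rarely visited regions stay coarse, which is the qualitative behavior the abstract describes.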

Type

Conference paper

Publication

2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems

Journal version: Sinclair et al. (2020).

online decision-making, reinforcement learning

Siddhartha Banerjee

Associate Professor

Sid Banerjee is an associate professor in the School of Operations Research at Cornell, working on topics at the intersection of data-driven decision-making, market design, and algorithms for large-scale networks.
