We consider the problem of dividing limited resources to individuals arriving over $T$ rounds. Each round has a random number of individuals arrive, and individuals can be characterized by their type (i.e. preferences over the different resources). A …
Optimizing shared vehicle systems (bike/scooter/car/ride-sharing) is more challenging compared to traditional resource allocation settings due to the presence of *complex network externalities* -- changes in the demand/supply at any location affect …
We introduce the technique of adaptive discretization to design efficient model-based episodic reinforcement learning algorithms in large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration …
We develop a framework for designing simple and efficient policies for a family of online allocation and pricing problems, that includes online packing, budget-constrained probing, dynamic pricing, and online contextual bandits with knapsacks. In …
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive data-driven discretization. The central idea is …
We consider a general class of finite-horizon online decision-making problems, where in each period a controller is presented a stochastic arrival and must choose an action from a set of permissible actions, and the final objective depends only on …
We consider a general class of finite-horizon online decision-making problems, where in each period a controller is presented a stochastic arrival and must choose an action from a set of permissible actions, and the final objective depends only on …
We consider a general class of finite-horizon online decision-making problems, where in each period a controller is presented a stochastic arrival and must choose an action from a set of permissible actions, and the final objective depends only on …
We develop a new framework for designing online policies given access to an oracle providing statistical information about an offline benchmark. Having access to such prediction oracles enables simple and natural Bayesian selection policies, and …
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive data-driven discretization. The central idea is …