Kianté Brantley

I am a fourth-year PhD student in Computer Science at the University of Maryland, College Park, advised by Hal Daumé III. I am a member of the CLIP Lab and the CORAL Lab. I did my undergraduate degree in Computer Science at the University of Maryland, Baltimore County, where I also completed my Master's degree in Computer Science, advised by Tim Oates.

Email  /  CV  /  Google Scholar  /  Semantic Scholar  /  Github  /  Twitter

profile photo
Research

I'm interested in designing algorithms that efficiently integrate domain knowledge into sequential decision-making problems (e.g., reinforcement learning, imitation learning, and structured prediction for natural language processing).

News

  • September 2020: Invited to Microsoft Research AI Breakthroughs Workshop 2020
  • June 2020: Awarded Microsoft Dissertation Grant
  • June 2020: Internship at Microsoft Research Montréal

Publications
Constrained episodic reinforcement learning in concave-convex and knapsack settings
Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
Conference on Neural Information Processing Systems (NeurIPS), 2020
[paper] [code]

We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either the feasibility question or settings with a single episode. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
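
As a rough illustration of the knapsack setting only (not the paper's algorithm), the sketch below runs episodes under a hard resource budget; the environment interface and its per-step resource cost are hypothetical placeholders.

# Toy sketch of the hard-constraint ("knapsack") setting: each step consumes
# some resource, and the agent must respect a total budget across episodes.
# `env` is a hypothetical environment whose step() also returns a resource cost.
def run_within_budget(env, policy, budget, num_episodes):
    total_reward, consumed = 0.0, 0.0
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        while not done and consumed < budget:
            obs, reward, cost, done = env.step(policy(obs))
            total_reward += reward
            consumed += cost            # counted against the knapsack budget
    return total_reward, consumed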

Active Imitation Learning with Noisy Guidance
Kianté Brantley, Amr Sharaf, Hal Daumé III
Association for Computational Linguistics (ACL), 2020
[paper] [code] [poster] [slides]

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexity, we consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labeling tasks, demonstrating significantly fewer queries to the expert and comparable (or better) accuracies over a passive approach.
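
A minimal sketch of the query rule described above, assuming scikit-learn is available; the DifferenceQuerier class, the probability threshold, and the classifier choice are illustrative simplifications rather than the exact LEAQI algorithm.

import numpy as np
from sklearn.linear_model import SGDClassifier

class DifferenceQuerier:
    """Query the expert only when the heuristic is predicted to be wrong."""

    def __init__(self, threshold=0.3):
        self.clf = SGDClassifier(loss="log_loss")  # predicts expert/heuristic disagreement
        self.fitted = False
        self.threshold = threshold

    def label(self, state_features, heuristic_label, query_expert):
        x = np.asarray(state_features, dtype=float).reshape(1, -1)
        p_disagree = self.clf.predict_proba(x)[0, 1] if self.fitted else 1.0
        if p_disagree < self.threshold:
            return heuristic_label                 # trust the cheap noisy heuristic
        expert_label = query_expert()              # expensive expert query
        disagree = int(expert_label != heuristic_label)
        self.clf.partial_fit(x, [disagree], classes=[0, 1])
        self.fitted = True
        return expert_label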

Disagreement-Regularized Imitation Learning
Kianté Brantley, Wen Sun, Mikael Henaff
International Conference on Learning Representations (ICLR), 2020 (Spotlight)
[paper] [code] [poster] [slides] [talk]

We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning. It operates by training an ensemble of policies on the expert demonstration data, and using the variance of their predictions as a cost which is minimized with RL together with a supervised behavioral cloning cost. Unlike adversarial imitation methods, it uses a fixed reward function which is easy to optimize. We prove a regret bound for the algorithm that is linear in the time horizon, multiplied by a coefficient that we show to be low for certain problems on which behavioral cloning fails. We evaluate our algorithm empirically across multiple pixel-based Atari environments and continuous control tasks, and show that it matches or significantly outperforms behavioral cloning and generative adversarial imitation learning.
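
The core cost from the description above can be sketched in a few lines, assuming an ensemble of behavioral-cloning policies that each return action probabilities for a discrete action space; the quantile clipping shown is a simplified rendering, not the paper's exact cost.

import numpy as np

def disagreement_cost(policies, state, action, q=None):
    # Variance across the ensemble of the probability each policy assigns to
    # `action` at `state`; high variance suggests the state is far from the
    # expert's distribution. Optionally clipped to +/-1 around threshold `q`.
    p = np.array([pi(state)[action] for pi in policies])  # ensemble probabilities
    var = p.var()
    if q is None:
        return var
    return 1.0 if var > q else -1.0

This cost is then minimized with a standard RL algorithm alongside the behavioral cloning loss, as described in the abstract.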

Non-monotonic sequential text generation
Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho
International Conference on Machine Learning (ICML), 2019
[paper] [code] [poster] [slides]

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods that generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy’s own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order, while achieving competitive performance with conventional left-to-right generation.
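
The tree-structured generation order can be made concrete with a short sketch; `policy` is a stand-in for the learned model and "<end>" marks an empty subtree, so this shows only the data flow, not the imitation-learning training itself.

END = "<end>"

def generate_tree(policy, context, depth=0, max_depth=8):
    # Emit a token for the current slot, then recursively fill the left and
    # right subtrees; <end> (or the depth cap) terminates a branch.
    token = policy(context)
    if token == END or depth >= max_depth:
        return None
    left = generate_tree(policy, context + [(token, "left")], depth + 1, max_depth)
    right = generate_tree(policy, context + [(token, "right")], depth + 1, max_depth)
    return (left, token, right)

def to_sentence(tree):
    # An in-order traversal of the binary generation tree yields the text.
    if tree is None:
        return []
    left, token, right = tree
    return to_sentence(left) + [token] + to_sentence(right)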

Reinforcement Learning with Convex Constraints
Sobhan Miryoosefi*, Kianté Brantley*, Hal Daumé III, Miro Dudik, Robert Schapire
Conference on Neural Information Processing Systems (NeurIPS), 2019
[paper] [code] [poster] [slides]

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks, specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms cannot incorporate, such as diversity.
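
To make the constraint format concrete, the sketch below checks an empirical measurement vector against a box (one particular convex set) via Euclidean projection; this illustrates the problem setup, not the paper's algorithm.

import numpy as np

def distance_to_box(trajectories, lower, upper):
    # Empirical mean of per-trajectory measurement vectors (e.g., unsafe-action
    # counts, distance to expert states) and its distance to the box
    # [lower, upper]; a distance of 0 means the constraint is satisfied.
    mean = np.mean([t["measurements"] for t in trajectories], axis=0)
    projection = np.clip(mean, lower, upper)   # Euclidean projection onto the box
    return mean, float(np.linalg.norm(mean - projection))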

The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
Second Conference on Machine Translation (WMT), 2017
[paper] [poster]

We describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task. The task is to adapt a translation system to a new domain, using only bandit feedback: the system receives a German sentence to translate, produces an English sentence, and only gets a scalar score as feedback. Targeting these two challenges (adaptation and bandit learning), we built a standard neural machine translation system and extended it in two ways: (1) robust reinforcement learning techniques to learn effectively from the bandit feedback, and (2) domain adaptation using data selection from a large corpus of parallel data.
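
The bandit feedback loop can be sketched as a REINFORCE-style update with a running baseline; the model, optimizer, and feedback interface below are hypothetical placeholders (PyTorch-style), not the submitted system's exact estimator.

def bandit_translation_loop(model, optimizer, stream, baseline=0.0, beta=0.9):
    for src_sentence in stream:                     # German source sentences
        hyp, log_prob = model.sample(src_sentence)  # sampled English translation
        score = stream.send_feedback(hyp)           # scalar bandit feedback only
        loss = -(score - baseline) * log_prob       # policy-gradient surrogate
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        baseline = beta * baseline + (1 - beta) * score  # running-average baseline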

BCAP: An Artificial Neural Network Pruning Technique to Reduce Overfitting
Kianté Brantley
University of Maryland, Baltimore County, Master's Thesis, 2016
[paper] [slides]

Determining the optimal size of a neural network is complicated. Neural networks, with many free parameters, can be used to solve very complex problems. However, these neural networks are susceptible to overfitting. BCAP (Brantley-Clark Artificial Neural Network Pruning Technique) addresses overfitting by combining duplicate neurons in a neural network hidden layer, thereby forcing the network to learn more distinct features. We compare hidden units using cosine similarity, and combine those that are similar to each other within a threshold ϵ. By doing so, the co-adaptation of the neurons in the network is reduced, because hidden units that are highly correlated (i.e., similar) are combined. In this paper we show evidence that BCAP successfully reduces network size while maintaining, or even improving, the accuracy of neural networks during and after training.
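
A rough sketch of the merging step, assuming a single hidden layer with incoming weights W_in and outgoing weights W_out; the exact similarity test and weight-combination rule in the thesis may differ.

import numpy as np

def merge_similar_units(W_in, W_out, eps=0.05):
    # Units whose incoming weight vectors are nearly parallel (cosine similarity
    # above 1 - eps) are treated as duplicates: their outgoing weights are summed
    # into the retained unit so the layer's output is approximately preserved.
    # W_in: (n_hidden, n_inputs); W_out: (n_outputs, n_hidden).
    keep = list(range(W_in.shape[0]))
    W_out = W_out.copy()
    i = 0
    while i < len(keep):
        j = i + 1
        while j < len(keep):
            a, b = W_in[keep[i]], W_in[keep[j]]
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            if cos > 1.0 - eps:
                W_out[:, keep[i]] += W_out[:, keep[j]]
                keep.pop(j)
            else:
                j += 1
        i += 1
    return W_in[keep], W_out[:, keep]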

LDAExplore: Visualizing topic models generated using Latent Dirichlet Allocation
Ashwinkumar Ganesan, Kianté Brantley, Shimei Pan, Jian Chen
TextVis Workshop - Intelligent User Interfaces (IUI), 2015
[paper] [code] [slides]

We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using topic modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods predominantly used to generate topics. One problem with methods like LDA is that users who apply them may not understand the topics that are generated. Users may also find it difficult to search for correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing the topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users who have minimal knowledge of LDA or topic modeling methods. To evaluate our design, we run a pilot study that uses the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.
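
The distributions such a tool consumes can be produced with standard libraries; the scikit-learn snippet below is a generic stand-in, not LDAExplore's own pipeline, and the example corpus is made up.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = ["visualizing large graphs with force-directed layouts",
             "probabilistic topic models for scientific text corpora"]  # one string per document

counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)

doc_topic = lda.transform(counts)   # per-document topic distribution (what the tool visualizes)
topic_word = lda.components_        # per-topic word weights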

Template from here