Contextual Bandit with Active Learning

This is an implementation of the preference-based active learning algorithm for contextual bandits described in Contextual Bandits and Imitation Learning via Preference-Based Active Queries. The paper considers contextual bandit and imitation learning problems in which the learner cannot directly observe the reward of the executed action. Under the assumption that the learner has access to a function class that can represent the expert's preference model under an appropriate link function, the paper proposes an algorithm that uses an online regression oracle over this function class to choose actions and to decide when to query the expert.
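To make the idea concrete, here is a minimal sketch (not the repository's implementation) of a preference-based active-query loop: a linear "online regression oracle" with a logistic link predicts pairwise preferences, the learner plays the action it currently ranks highest, and it queries the expert only when the preference gap between the top two actions is small. All names, the margin rule, and the hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, T = 4, 3, 200
w_star = rng.normal(size=d)   # hidden expert utility (simulated; assumption)
w_hat = np.zeros(d)           # oracle's running estimate of the preference model
lr, margin = 0.5, 0.05        # hypothetical hyperparameters
queries = 0

for t in range(T):
    X = rng.normal(size=(n_actions, d))   # context: one feature row per action
    scores = X @ w_hat
    a, b = np.argsort(scores)[-2:][::-1]  # current top two candidate actions
    gap = scores[a] - scores[b]
    if gap < margin:                      # uncertain: query the expert's preference
        queries += 1
        pref = 1.0 if (X[a] - X[b]) @ w_star > 0 else 0.0   # expert prefers a over b?
        z = X[a] - X[b]
        p = 1.0 / (1.0 + np.exp(-(z @ w_hat)))  # logistic link (assumption)
        w_hat += lr * (pref - p) * z            # online logistic-regression update
    # otherwise: exploit the current best action without spending a query
```

The point of the sketch is the query rule: queries are only spent where the oracle is uncertain, so the query count grows much more slowly than the number of rounds.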

Installation

git clone https://github.com/Cornell-RL/active_CB.git
cd active_CB
pip install numpy pandas torch ucimlrepo

Usage

python algo.py

This runs the preference learning algorithm on the Iris dataset. To run the reward learning algorithm or use a different dataset, pass the options below:

usage: algo.py [-h] [--dataset DATASET] [--query QUERY] [--model MODEL]

optional arguments:
  -h, --help         show this help message and exit
  --dataset DATASET  Name of the dataset (iris/car/knowledge)
  --query QUERY      Query type (active/passive)
  --model MODEL      Model type (reward/preference)
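The options above map onto a standard argparse setup. The sketch below is a hedged reconstruction, not the repository's actual parser; the defaults for `--dataset` and `--model` follow the documented default run (Iris, preference), while the `--query` default of `active` is an assumption.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="algo.py")
    parser.add_argument("--dataset", default="iris",
                        choices=["iris", "car", "knowledge"],
                        help="Name of the dataset (iris/car/knowledge)")
    parser.add_argument("--query", default="active",
                        choices=["active", "passive"],
                        help="Query type (active/passive)")
    parser.add_argument("--model", default="preference",
                        choices=["reward", "preference"],
                        help="Model type (reward/preference)")
    return parser

# Example: run reward learning on the Car Evaluation dataset
args = build_parser().parse_args(["--dataset", "car", "--model", "reward"])
```

Using `choices` makes the parser reject unsupported dataset or model names with a clear error message instead of failing later in the training loop.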

Running 1000 training iterations on the Iris dataset takes roughly three hours including evaluation. Multi-class classification datasets with a large number of classes are expected to need more episodes to converge and therefore a longer runtime.

Results

Here are the results on the Iris, Car Evaluation, and User Knowledge Modeling datasets. The hyperparameters required by the algorithm are set in the training loop based on the dataset.
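A dispatch on dataset name like the following could express that selection; the names and values here are purely hypothetical placeholders for illustration, and the real settings live inside the training loop in algo.py.

```python
# Hypothetical per-dataset settings; the actual values are set in algo.py.
HYPERPARAMS = {
    "iris":      {"episodes": 1000, "query_margin": 0.05},
    "car":       {"episodes": 3000, "query_margin": 0.05},
    "knowledge": {"episodes": 2000, "query_margin": 0.05},
}

def get_hyperparams(dataset: str) -> dict:
    """Look up settings for a dataset, falling back to the Iris configuration."""
    return HYPERPARAMS.get(dataset, HYPERPARAMS["iris"])
```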

Iris

Car

User Knowledge

Citation

@misc{sekhari2023contextual,
      title={Contextual Bandits and Imitation Learning via Preference-Based Active Queries}, 
      author={Ayush Sekhari and Karthik Sridharan and Wen Sun and Runzhe Wu},
      year={2023},
      eprint={2307.12926},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
