32nd International Conference on Machine Learning (ICML), Lille 2015

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

author: Pratik Gajane, SequeL lab, INRIA Lille - Nord Europe
published: Sept. 27, 2015, recorded: July 2015

Description

We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm, the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Ω(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications.
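
For readers unfamiliar with the base algorithm named in the description, the following is a minimal sketch of standard EXP3 for the classical (non-dueling) adversarial bandit setting. It is not REX3 and is not taken from the lecture: the function name `exp3`, the `reward_fn` callback, and the parameter names are assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(K, T, gamma, reward_fn, rng=None):
    """Sketch of standard EXP3 (illustrative only, not the REX3 algorithm).

    K: number of arms; T: number of rounds; gamma: exploration rate in (0, 1].
    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1].
    """
    rng = rng or random.Random(0)
    weights = [1.0] * K
    total_reward = 0.0
    for t in range(T):
        total_w = sum(weights)
        # Mix the exponential-weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: unbiased for the pulled arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / K)
        # Rescale to avoid overflow; the probabilities are unaffected.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return weights, total_reward


# Usage: arm 2 always pays 1, the other arms pay 0.
weights, total = exp3(K=3, T=2000, gamma=0.1,
                      reward_fn=lambda t, arm: 1.0 if arm == 2 else 0.0)
```

In the dueling setting studied in the lecture, the learner selects a pair of arms each round and sees only relative feedback, so the absolute `reward` observed here is not available; handling that difference is what the description refers to as a non-trivial extension.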
