Our article “Balancing Exploration and Exploitation in Online Learning to Rank for IR” (to appear in Information Retrieval) addresses the question of whether online learning to rank systems need to balance exploration and exploitation.
Intuitively, when learning online from natural user interactions with a search system, a search engine needs to exploit what it has learned so far, in order to meet users' expectations of search performance as well as possible. At the same time, it needs to explore, to ensure that the collected feedback is as informative as possible for future learning. Often, the results that would be most useful for the current search are not the same as those that would be most useful for learning.
In our article we formalize the above problem as an exploration-exploitation dilemma. We develop two approaches for balancing exploration and exploitation in an online learning to rank for IR setting, one based on a pairwise and one based on a listwise learning approach. We show that, as hypothesized, balancing exploration and exploitation improves online performance for both types of approaches. However, the optimal balance depends on the approach, and on other factors, such as the amount of noise in user feedback.
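One simple way to picture such a balance (a minimal sketch, not the method from the article): build each result list by drawing, at every rank, either from an exploitative ranking (the current best guess) or from an exploratory ranking, with an exploration rate `epsilon` controlling the mix. The function name and both ranking inputs are illustrative assumptions.

```python
import random

def mixed_result_list(exploit_ranking, explore_ranking, epsilon, k=10):
    """Illustrative sketch: blend an exploitative and an exploratory ranking.

    At each rank, take the next unseen document from the exploratory
    ranking with probability epsilon, otherwise from the exploitative
    one. epsilon = 0 is pure exploitation; epsilon = 1 is pure
    exploration. Assumes both rankings cover the same document set.
    """
    result, seen = [], set()
    exploit_iter, explore_iter = iter(exploit_ranking), iter(explore_ranking)
    while len(result) < k:
        source = explore_iter if random.random() < epsilon else exploit_iter
        for doc in source:
            if doc not in seen:
                result.append(doc)
                seen.add(doc)
                break
        else:
            break  # chosen source exhausted; stop early
    return result
```

Clicks on the resulting list then feed the learner: documents injected for exploration yield the informative feedback, while the exploitative documents protect the quality of the list the user actually sees.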