Query Formulation with Query Auto-Completion

In our recent CIKM paper An Eye-tracking Study of User Interactions with Query Auto Completion (QAC), we looked at how searchers examine and interact with query completions, and what this tells us about measuring QAC ranking quality. Accurately interpreting user interactions would allow us to optimize search for each individual user. However, measuring performance is a real bottleneck, because searcher behaviour is itself shaped by the searcher’s previous experience, expectations, and the search engine.

Here’s an example video that shows how searchers examine QAC:


For full details, take a look at the paper here.


RUSSIR 2014: Lectures on Online Experimentation for Information Retrieval

This summer, I was invited to teach at RuSSIR – the Russian Summer School in Information Retrieval in Nizhny Novgorod. I very much enjoyed the enthusiasm and insightful questions of the students. Thanks again to all who attended my lectures, and to the organizers!

My five lectures broadly covered online evaluation and learning to rank for information retrieval. Topics included A/B testing, interleaved comparisons, estimation of online metrics from exploration data, and bandit approaches to online learning to rank. If you want to review some of the material, you can now access my slides here: http://1drv.ms/1yYLNlp.
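As a small taste of the A/B testing material, here is a minimal sketch (my own illustration, not code from the lectures) of comparing the click-through rates of two rankers with a two-proportion z-test. All counts are hypothetical, and real online experiments need to account for issues such as multiple testing and non-independent impressions.

```python
# Sketch: two-sided two-proportion z-test on click-through rates.
# Counts below are made up for illustration.
from math import sqrt, erf

def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Return (CTR difference B - A, two-sided p-value)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p = (clicks_a + clicks_b) / (views_a + views_b)   # pooled rate
    se = sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # two-sided p-value via the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, p_value

diff, p_value = ab_test_ctr(clicks_a=480, views_a=10_000,
                            clicks_b=540, views_b=10_000)
```

With these hypothetical counts the observed CTR difference is small, and the p-value shows why large sample sizes are needed before declaring a winner.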

To get you started, here are the slides from the first lecture:

Fast and Reliable Online Learning to Rank for Information Retrieval

Fast and Reliable Online Learning to Rank for Information Retrieval. Download a copy (PDF, 2.9 MB).

My PhD thesis is out! Download a copy from this page, or get in touch if you would like a printed copy. The public defense will be on May 28 at the Agnietenkapel in Amsterdam.


The amount of digital data we produce every day far surpasses our ability to process this data, and finding useful information in this constant flow of data has become one of the major challenges of the 21st century. Search engines are one way of accessing large data collections. Their algorithms have evolved far beyond simply matching search queries to sets of documents. Today’s most sophisticated search engines combine hundreds of relevance signals to provide the best possible results for each searcher.

Current approaches for tuning the parameters of search engines can be highly effective. However, they typically require considerable expertise and manual effort. They rely on supervised learning to rank, meaning that they learn from manually annotated examples of relevant documents for given queries. Obtaining large quantities of sufficiently accurate manual annotations is becoming increasingly difficult, especially for personalized search, access to sensitive data, or search in settings that change over time.

In this thesis, I develop new online learning to rank techniques, based on insights from reinforcement learning. In contrast to supervised approaches, these methods allow search engines to learn directly from users’ interactions. User interactions can typically be observed easily and cheaply, and reflect the preferences of real users. Interpreting user interactions and learning from them is challenging, because they can be biased and noisy. The contributions of this thesis include a novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback.

The obtained analytical and experimental results show how search engines can effectively learn from user interactions. In the future, these and similar techniques can open up new ways for gaining useful information from ever larger amounts of data.

Practical Online Retrieval Evaluation

Thank you to all those who attended the ECIR 2013 tutorial on Practical Online Retrieval Evaluation that Filip Radlinski and I gave today. We updated some of the slides from the version that was provided on the conference USB stick. Below are the updated slides and links to additional resources.

For additional details and further reading, please see the references cited in the slides. A detailed list of references is included at the end of the presentation.

Balancing Exploration and Exploitation in Online Learning to Rank for IR

Our article “Balancing Exploration and Exploitation in Online Learning to Rank for IR” (to appear in Information Retrieval) addresses the question of whether online learning to rank systems need to balance exploration and exploitation.

Intuitively, when learning online from natural user interactions with a search system, a search engine needs to exploit what it has learned so far, to meet users’ expectations of search performance as well as possible. At the same time, it needs to explore, to ensure that the collected feedback is as informative as possible for future learning. Often, the results that would be most useful for the current search are not the same as those most useful for learning.

In our article we formalize the above problem as an exploration-exploitation dilemma. We develop two approaches for balancing exploration and exploitation in an online learning to rank for IR setting, one based on a pairwise and one on a listwise learning approach. We show that, as hypothesized, balancing exploration and exploitation improves online performance in both types of approaches. However, the optimal balance depends on the approach, and on other factors, such as the amount of noise in user feedback.
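To make the idea concrete, here is a minimal sketch (my own illustration, not the article’s algorithm) of one way to balance the two: build a result list by mixing an exploitative ranking (the current best ranker) with an exploratory one, choosing the source of each result slot with a tunable exploration rate.

```python
# Sketch: mix an exploitative and an exploratory ranking.
# explore_rate controls the balance (0.0 = pure exploitation).
import random

def mix_rankings(exploit, explore, explore_rate, k=10, rng=random):
    """Build a result list of length k, drawing each slot from the
    exploratory ranking with probability explore_rate."""
    result = []
    while len(result) < k:
        source = explore if rng.random() < explore_rate else exploit
        candidates = [d for d in source if d not in result]
        if not candidates:  # chosen list exhausted; fall back to the other
            other = exploit if source is explore else explore
            candidates = [d for d in other if d not in result]
            if not candidates:
                break
        result.append(candidates[0])
    return result
```

With explore_rate = 0 this reproduces the exploitative ranking unchanged; the interesting question, studied in the article, is where between the two extremes online performance is maximized.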

A Probabilistic Method for Inferring Preferences from Clicks

At CIKM 2011, I presented our paper “A Probabilistic Method for Inferring Preferences from Clicks”. It proposes a new method, called Probabilistic Interleaving, that infers preferences between rankers from click data.

Interleaved comparison methods have been developed recently to compare two candidate rankings (e.g., two similar versions of the ranking algorithm of a search engine) by observing user behavior (in particular clicks on displayed result documents). This form of evaluation is highly scalable, as click data can be collected at very low cost whenever the search engine is used. It is also assumed to capture true user preferences more accurately than the most common form of evaluation, manual editorial judgments.

Our proposed method addresses problems of sensitivity and fidelity that we identify in previous methods. It is based on a probabilistic interleaving process, where a result list that combines documents from both candidate rankers is generated in such a way that documents that were ranked highly by either ranker are likely to be displayed towards the top of the interleaved list. This interleaved list is displayed to the user, instead of showing either original ranking. User clicks on documents are then assigned to the original rankers in a way that reflects how highly each ranker initially placed these documents. The result is a comparison method that naturally re-weights clicks according to the distance between the original rankings. It is sensitive to even small changes between rankings and at the same time only infers large differences when the distance between the original rankings is large.
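To illustrate the process described above, here is a heavily simplified sketch (my own illustration, not the paper’s exact algorithm; in particular, the paper infers the comparison outcome by marginalizing over possible ranker assignments, whereas this sketch simply credits each click to the ranker that placed the clicked document higher):

```python
# Simplified sketch of probabilistic interleaving:
# at each position, pick one of the two rankers at random, then draw a
# not-yet-shown document with probability decaying in that ranker's rank.
import random

def rank_decay_probs(ranking, shown, tau=3.0):
    """Drawing probabilities over remaining documents, decaying with
    the document's rank in `ranking` (rank 1 is most likely)."""
    docs = [d for d in ranking if d not in shown]
    weights = [1.0 / (ranking.index(d) + 1) ** tau for d in docs]
    total = sum(weights)
    return docs, [w / total for w in weights]

def probabilistic_interleave(ranking_a, ranking_b, k=10, rng=random):
    interleaved, shown = [], set()
    while len(interleaved) < k:
        ranking = rng.choice([ranking_a, ranking_b])
        docs, probs = rank_decay_probs(ranking, shown)
        if not docs:  # chosen ranking exhausted; try the other
            other = ranking_b if ranking is ranking_a else ranking_a
            docs, probs = rank_decay_probs(other, shown)
            if not docs:
                break
        doc = rng.choices(docs, weights=probs)[0]
        interleaved.append(doc)
        shown.add(doc)
    return interleaved

def credit_clicks(clicked, ranking_a, ranking_b):
    """Naive credit assignment: each click goes to the ranker that
    ranked the clicked document higher."""
    score_a = score_b = 0
    for doc in clicked:
        rank_a = ranking_a.index(doc) if doc in ranking_a else len(ranking_a)
        rank_b = ranking_b.index(doc) if doc in ranking_b else len(ranking_b)
        if rank_a < rank_b:
            score_a += 1
        elif rank_b < rank_a:
            score_b += 1
    return score_a, score_b
```

Because documents ranked highly by either ranker are likely to appear near the top, the interleaved list remains a plausible result page, while the click credits give a signal about which ranker users prefer.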

An intriguing direction for follow-up work is to investigate how effective the inferred feedback is beyond evaluation setups, e.g., for online learning to rank.