Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?
- Gabriella Kazai,
- Bhaskar Mitra,
- Anlei Dong,
- Nick Craswell,
- Linjun Yang
European Conference on Information Retrieval | Published by Springer | Organized by BCS-IRSG
Traditional information retrieval (IR) ranking models process the full text of documents. Newer models based on Transformers, however, would incur a high computational cost when processing full text, so they typically use query-biased snippets instead. The model's summary of a document based on URL, title, and snippet (UTS) is akin to the summaries that appear on a search engine results page (SERP) to help searchers decide which result to click. This raises questions about when such summaries are sufficient for relevance estimation, both for the ranking model and for the human assessor, and whether humans and machines benefit from the full text in similar cases and in similar ways. To answer these questions, we study human and neural model based relevance assessments on 12k query-document pairs sampled from a commercial search engine log, where we expose either only the document summaries or also the full documents to the assessors. We compare changes in the relevance assessments over a range of query and document properties, such as the length or type of the query (e.g., navigational, question-answering). Our findings show that the full text of documents is more beneficial to humans than to a BERT model, with most of the benefit seen for tail and long queries, implying that more work is necessary on BERT-style models.
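As a minimal illustrative sketch of the contrast the abstract describes, the snippet below scores the same query once against a UTS-style summary and once against the full document text using an off-the-shelf BERT cross-encoder. The model name, truncation settings, and example strings are assumptions for illustration and are not the paper's exact experimental setup.

```python
# Hypothetical sketch: scoring a query-document pair with a BERT cross-encoder,
# once using only the URL/title/snippet (UTS) summary and once using full text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed off-the-shelf passage ranker; the paper's model may differ.
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def relevance_score(query: str, doc_text: str) -> float:
    """Return a scalar relevance score for a (query, document text) pair."""
    inputs = tokenizer(query, doc_text, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()

# Illustrative inputs only.
query = "example tail query"
uts_summary = "example.com | Example page title | query-biased snippet ..."
full_text = "Full body text of the document, far longer than the snippet ..."

print("UTS score:      ", relevance_score(query, uts_summary))
print("Full-text score:", relevance_score(query, full_text))
```

Note that even when the full text is supplied, the 512-token input limit means a Transformer ranker still sees only a truncated view of long documents, which is one reason query-biased snippets are a common practical compromise.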