Toward Understanding WH-Questions: A Statistical Analysis

Workshop on Machine Learning, Information Retrieval, and User Modeling, Sonthofen, Germany

We describe research on the statistical analysis of WH-questions, motivated by the long-term goal of enhancing the performance of information retrieval systems. We identified the informational goals associated with users' queries posed to an Internet resource, and built a statistical model that infers these goals from shallow linguistic features of the queries. The model was built by applying supervised machine learning techniques: the linguistic features were extracted from the queries and from the output of a natural language parser, and the high-level informational goals were identified by professional taggers.