WS17: Human-Centered Evaluation and Auditing of Language Models
11:10 – 12:40 | 14:10 – 15:40 | 16:20 – 17:50
Yu Lu Liu, Wesley Hanwen Deng, Michelle S. Lam, Motahhare Eslami, Juho Kim, Q. Vera Liao, Wei Xu, Jekaterina Novikova, Ziang Xiao
Recent advancements in Large Language Models (LLMs) have significantly impacted numerous real-world applications and will impact many more. However, these models also pose substantial risks to individuals and society. To mitigate these issues and guide future model development, responsible evaluation and auditing of LLMs are essential. This workshop aims to address the current "evaluation crisis" in LLM research and practice by bringing together HCI and AI researchers and practitioners to rethink LLM evaluation and auditing from a human-centered perspective. The workshop will explore topics around understanding stakeholders' needs and goals in evaluating and auditing LLMs, establishing human-centered evaluation and auditing methods, developing tools and resources to support these methods, and building community and fostering collaboration. By soliciting papers, organizing an invited keynote and panel, and facilitating group discussions, this workshop aims to develop a future research agenda for addressing the challenges in LLM evaluation and auditing. Following a successful first iteration of this workshop at CHI 2024, we introduce the theme of "mind the context" for this second iteration, in which participants will be encouraged to tackle the challenges and nuances of LLM evaluation and auditing in specific contexts.