Evals

Measuring the performance of LLM products
| Title | Speaker | Date | Description |
|-------|---------|------|-------------|
| A Deep Dive on LLM Evaluation | Hailey Schoelkopf | Jul 8, 2024 | Doing LLM evaluation right is crucial, but very challenging! We'll cover the basics of how LLM evaluation can be performed and many (but not all) of the ways it can go wrong. We'll also discuss tools available to make life easier, including the LM Evaluation Harness, along with domain-specific use cases. |
| Inspect, an OSS Framework for LLM Evals | JJ Allaire | Jul 6, 2024 | This talk covers using and extending Inspect, a new OSS Python framework for LLM evals. We'll walk through the core concepts and design of Inspect and demonstrate its use for a variety of evaluation tasks. Inspect makes it straightforward to implement simple "benchmark"-style evaluations, but also has the depth and flexibility to support highly complex evals. Inspect supports both… |