Evals

Measuring the performance of LLM products
| Title | Speaker | Date | Description |
|-------|---------|------|-------------|
| A Deep Dive on LLM Evaluation | Hailey Schoelkopf | Jul 8, 2024 | Doing LLM evaluation right is crucial, but very challenging! We'll cover the basics of how LLM evaluation can be performed and many (but not all) of the ways it can go wrong. We'll also discuss tools available to make life easier, including the LM Evaluation Harness, along with domain-specific use cases. |
| Inspect, an OSS Framework for LLM Evals | JJ Allaire | Jul 6, 2024 | This talk covers using and extending Inspect, a new OSS Python framework for LLM evals. We'll walk through the core concepts and design of Inspect and demonstrate its use for a variety of evaluation tasks. Inspect makes it straightforward to implement simple "benchmark"-style evaluations, but also has the depth and flexibility to support highly complex evals. Inspect supports both… |