Evals
Measuring the performance of LLM products
| Title | Speaker | Date | Description |
|---|---|---|---|
| A Deep Dive on LLM Evaluation | Hailey Schoelkopf | Jul 8, 2024 | Doing LLM evaluation right is crucial, but very challenging! We’ll cover the basics of how LLM evaluation can be performed, and many (but not all) of the ways it can go wrong. We’ll also discuss tools available to make life easier, including the LM Evaluation Harness, along with domain-specific use cases. |
| Inspect, An OSS framework for LLM evals | JJ Allaire | Jul 6, 2024 | This talk will cover using and extending Inspect, a new OSS Python framework for LLM evals. We’ll walk through the core concepts and design of Inspect and demonstrate its use for a variety of evaluation tasks. Inspect makes it very straightforward to implement simple “benchmark” style evaluations, but also has the depth and flexibility to support highly complex evals. Inspect supports both… |
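Both talks center on the same benchmark-style evaluation pattern: a dataset of samples, a solver that produces a model answer, and a scorer that grades the answer against a target. Below is a minimal sketch of that loop in plain Python with a stubbed model. The names (`Sample`, `run_eval`, `echo_solver`) are illustrative only, not the API of Inspect or the LM Evaluation Harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    input: str    # prompt shown to the model
    target: str   # reference answer used for grading


def run_eval(
    dataset: list[Sample],
    solver: Callable[[str], str],
    scorer: Callable[[str, str], bool],
) -> float:
    """Run each sample through the solver, grade with the scorer,
    and return the fraction of samples scored correct."""
    correct = sum(scorer(solver(s.input), s.target) for s in dataset)
    return correct / len(dataset)


# Stub "model" standing in for a real LLM call.
def echo_solver(prompt: str) -> str:
    return {"2+2=": "4", "capital of France?": "Paris"}.get(prompt, "")


def exact_match(answer: str, target: str) -> bool:
    return answer.strip() == target.strip()


dataset = [Sample("2+2=", "4"), Sample("capital of France?", "Paris")]
accuracy = run_eval(dataset, echo_solver, exact_match)
print(accuracy)  # 1.0
```

In a real harness, the solver wraps an LLM API call and the scorer may itself be model-graded; the frameworks above add dataset loading, logging, and reporting around this core loop.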