chat
expand_more

GenAI Offline Evaluation

Using LLMs for large scale conversation evaluation.

Shrivu Thumbnail v1 1x1

What is the item?

  • Using GenAI models to evaluate GenAI products.

  • We provide a rubric and a judge grades responses to provide feedback on the safety and improvement of a change.

Why is it helpful to our customers?

  • Large scale automated evaluations allow Abnormal to ship updates faster and safer for customers.

Why is it interesting?

  • Traditional ML or software evaluations don’t work well for open-ended responses. The alternative is to use GenAI itself to help review and translate outputs into quantitative performance numbers.