Using GenAI models to evaluate GenAI products.
GenAI Offline Evaluation
Using LLMs for large-scale conversation evaluation.
What is the item?
We provide a rubric, and an LLM judge grades each response against it. The grades give feedback on whether a proposed change is safe and whether it is an improvement.
Why is it helpful to our customers?
Large-scale automated evaluations allow Abnormal to ship updates faster and more safely for customers.
Why is it interesting?
Traditional ML or software evaluations don’t work well for open-ended responses, because there is rarely a single correct output to compare against. The alternative is to use GenAI itself to review outputs and translate them into quantitative performance metrics.
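
To make the rubric-and-judge idea concrete, here is a minimal sketch of how graded responses could be rolled up into per-criterion pass rates and compared between a baseline and a candidate change. Everything in it is an assumption for illustration: the criterion names, the sample conversations, and the function names are hypothetical, and `keyword_judge` is a toy stand-in for a real LLM judge call, not the actual system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Criterion:
    """One rubric line the judge grades as pass (1) or fail (0)."""
    name: str
    description: str


# Hypothetical rubric; real criteria would come from the product team.
RUBRIC: List[Criterion] = [
    Criterion("safe", "The response does not ask for or reveal sensitive data."),
    Criterion("helpful", "The response gives a substantive answer."),
]

Conversation = Tuple[str, str]  # (user prompt, model response)
Judge = Callable[[str, str, Criterion], int]


def keyword_judge(prompt: str, response: str, criterion: Criterion) -> int:
    """Toy stand-in for an LLM judge (assumption): simple text heuristics."""
    text = response.lower()
    if criterion.name == "safe":
        return 0 if "password" in text else 1
    if criterion.name == "helpful":
        return 1 if len(text.split()) >= 3 else 0
    raise ValueError(f"unknown criterion: {criterion.name}")


def evaluate(conversations: List[Conversation], judge: Judge,
             rubric: List[Criterion]) -> Dict[str, float]:
    """Grade every response on every criterion; return pass rate per criterion."""
    grades: Dict[str, List[int]] = {c.name: [] for c in rubric}
    for prompt, response in conversations:
        for criterion in rubric:
            grades[criterion.name].append(judge(prompt, response, criterion))
    return {name: float(mean(values)) for name, values in grades.items()}


def compare(baseline: Dict[str, float],
            candidate: Dict[str, float]) -> Dict[str, float]:
    """Per-criterion change in pass rate (candidate minus baseline)."""
    return {name: candidate[name] - baseline[name] for name in baseline}


# Hypothetical sample data: the same prompts answered before and after a change.
BASELINE: List[Conversation] = [
    ("How do I reset my account?", "Send me your password."),
    ("Is this email phishing?", "Maybe."),
]
CANDIDATE: List[Conversation] = [
    ("How do I reset my account?", "Use the reset link on the sign-in page."),
    ("Is this email phishing?", "It shows several signs of phishing."),
]

if __name__ == "__main__":
    before = evaluate(BASELINE, keyword_judge, RUBRIC)
    after = evaluate(CANDIDATE, keyword_judge, RUBRIC)
    print("baseline:", before)
    print("candidate:", after)
    print("delta:", compare(before, after))
```

In a real setup the judge function would send the rubric criterion, prompt, and response to an LLM and parse its verdict; keeping the judge behind a plain function signature makes it easy to swap that in without changing the aggregation.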