LLM as Judge

Clear, insightful feedback is crucial for improving conversation quality and maintaining high standards in AI-customer interactions. We use fine-tuned LLMs to evaluate each voice-based interaction with the AI assistant against a set of criteria and generate high-quality feedback on its success (LLM as Judge).
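
The underlying technique is straightforward: the call transcript and the agent's prompt are sent to a judge model, which scores the conversation against fixed criteria. Below is a minimal, hypothetical sketch using the OpenAI Python client as a stand-in; the model name, criteria list, and judge prompt are assumptions for illustration, not the platform's actual fine-tuned judge:

```python
# Hypothetical LLM-as-judge sketch. The real product uses its own fine-tuned
# judge model; the model name, criteria, and prompt below are illustrative.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITERIA = ["persona", "style", "steps", "no_repetition", "goal_achievement"]

def judge_call(transcript: str, agent_prompt: str) -> dict:
    """Score one call transcript against the criteria and return JSON verdicts."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in model name (assumption)
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an impartial judge of AI voice calls. For each "
                    f"criterion in [{', '.join(CRITERIA)}], return an object "
                    "with a boolean 'passed' and a one-sentence 'reason'. "
                    "Respond as a single JSON object keyed by criterion."
                ),
            },
            {
                "role": "user",
                "content": f"Agent prompt:\n{agent_prompt}\n\nTranscript:\n{transcript}",
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```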

Why Useful

1. Feedback for Prompt Optimization:
Provides concrete, criterion-level insights for refining prompts effectively.

2. Monitoring Call Quality:
Helps ensure consistent, high-quality interactions during client calls.

Judge Criteria
| Feedback Criteria | Answers the question / Content |
| --- | --- |
| Persona | Does the agent talk about itself (its persona) as defined in its prompt? |
| Style | Is the style of the conversation consistent with the prompt (tone, wording like professional or casual, etc.)? |
| Steps | Does the agent follow the steps outlined in the prompt and in the correct order? |
| No Repetition | Does the agent avoid unnecessary repetition unless requested by the human? |
| Objections | Does the agent handle objections as defined in the prompt? |
| Objection Not Defined | How does the agent handle objections not mentioned in the prompt? |
| Knowledge | Does the agent use relevant knowledge beyond what's in its prompt if needed? |
| Goal Achievement | Did the agent achieve the call objective specified in its prompt? |
| Appointment Handling | If the user wants to book an appointment, are date, time, and timezone handled correctly? |
| User Sentiment | Was the human's sentiment positive during the call? |
| Agent Sentiment | Was the agent's sentiment positive during the call? |
| Call Completion | Did the call end with a proper goodbye instead of being cut off? |
| Answered by Human | Was the call answered by a human? |
| Call Summary | Concise summary of the call (max. 500 characters). |

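Per call, the judge's output can be thought of as one record with a field per criterion. Here is a sketch of that shape as a Python dataclass; the field names and types are assumptions for illustration, not the platform's documented schema:

```python
# Illustrative per-call judge result mirroring the criteria table above.
# Field names and types are assumptions, not the platform's actual schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class JudgeResult:
    persona: bool                         # persona presented as defined in the prompt
    style: bool                           # tone/wording consistent with the prompt
    steps: bool                           # prompt steps followed in the correct order
    no_repetition: bool                   # no unrequested repetition
    objections: bool                      # prompt-defined objections handled
    objection_not_defined: str            # how undefined objections were handled
    knowledge: bool                       # relevant knowledge used beyond the prompt
    goal_achievement: bool                # call objective achieved
    appointment_handling: Optional[bool]  # None when no booking was requested
    user_sentiment: bool                  # human sentiment positive
    agent_sentiment: bool                 # agent sentiment positive
    call_completion: bool                 # ended with a proper goodbye
    answered_by_human: bool               # call answered by a human
    call_summary: str                     # concise summary, max 500 characters
```
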
LLM as Judge data is listed for each individual call, after its completion, in the AI assistant's call logs.
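
For the call-quality monitoring use case, per-call results can be aggregated across the log. A minimal sketch, assuming the JudgeResult shape above and that results have already been read out of the call logs (the log format itself is not specified here):

```python
# Hypothetical aggregation of judge results into per-criterion pass rates.
from dataclasses import fields

def pass_rates(results: list[JudgeResult]) -> dict[str, float]:
    """Fraction of calls passing each boolean criterion."""
    rates: dict[str, float] = {}
    for f in fields(JudgeResult):
        values = [getattr(r, f.name) for r in results]
        # Keep only boolean verdicts; skip free-text fields like call_summary
        # and None values for criteria that did not apply to a given call.
        bools = [v for v in values if isinstance(v, bool)]
        if bools:
            rates[f.name] = sum(bools) / len(bools)
    return rates
```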
