Inspect AI
- Framework for evaluating and monitoring LLMs
- To run an evaluation, you first create a task
Create a task
- dataset: test cases pairing inputs with target responses
- solver: generates a response for each input (here, a chat instance plugged into the Inspect AI evaluation framework)
- scorer: grades each response against its target
Task: dataset
Contents of my_eval_dataset.csv, one test case per row:
input, target
What is 2 + 2?, 4
What is 10 * 5?, 50
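The dataset file above can be generated with Python's csv module. A minimal sketch (it writes to the local filename my_eval_dataset.csv; the task later loads it from code/lecture06/evals/):

```python
import csv

# The two example test cases: each row pairs an input question with its target answer.
rows = [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "What is 10 * 5?", "target": "50"},
]

# Write a CSV with the column headers Inspect AI's csv_dataset() expects by default.
with open("my_eval_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "target"])
    writer.writeheader()
    writer.writerows(rows)
```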
Task: solver + scorer
from chatlas import ChatOpenAI
from inspect_ai import Task, task
from inspect_ai.dataset import csv_dataset
from inspect_ai.scorer import model_graded_qa

chat = ChatOpenAI()

@task
def my_eval():
    return Task(
        dataset=csv_dataset("code/lecture06/evals/my_eval_dataset.csv"),
        solver=chat.to_solver(),
        scorer=model_graded_qa(model="openai/gpt-4o-mini"),
    )
Get eval results
Run the Inspect CLI on the file that defines the task:
inspect eval code/lecture06/evals/evals.py