How do I write an eval using Braintrust?
To write an eval using Braintrust, you'll need to follow a specific structure and use the Braintrust API. I'll guide you through the process step by step, based on the Braintrust documentation.

First, import the Eval function from braintrust and the scorer you want to use from autoevals:
import { Eval } from "braintrust";
import { someScorer } from "autoevals"; // Import the scorer you want to use
Next, call the Eval function with the following structure:

Eval(
  "Your Project Name",
  {
    data: () => {
      // Your dataset goes here
    },
    task: (input) => {
      // Your LLM call or task function goes here
    },
    scores: [
      // Your scoring functions go here
    ],
    // Optional: Add metadata about the experiment
  }
);
Let's break down each part:
a. Project Name: Replace "Your Project Name" with the name of your project.
b. Data: This function should return an array of objects, each representing a test case. For example:
data: () => {
  return [
    { input: "Sample input 1", expected: "Expected output 1" },
    { input: "Sample input 2", expected: "Expected output 2" },
    // Add more test cases as needed
  ];
},
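If your test cases already live in a hosted Braintrust dataset, you can point data at the dataset instead of hard-coding an array. A minimal sketch, assuming a dataset named "greeting-cases" already exists in your project:

import { initDataset } from "braintrust";

// Replaces the inline data function above; "greeting-cases" is a
// placeholder for a dataset you've already created.
data: initDataset("Your Project Name", { dataset: "greeting-cases" }),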
c. Task: This function takes an input and returns an output. This is where you'd typically make your LLM call. For example:
task: (input) => {
  return "Hi " + input; // Replace with your actual LLM call
},
d. Scores: An array of scoring functions. You can use pre-defined scorers from autoevals or create your own. For example:

scores: [someScorer],
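A custom scorer is a plain function that receives the test case's input, output, and expected values and returns a name plus a score between 0 and 1. A minimal sketch (the scorer name and logic here are made up for illustration):

// A hand-rolled scorer that checks the output begins with "Hi ".
const startsWithHi = ({ output }: { output: string }) => ({
  name: "starts_with_hi",
  score: output.startsWith("Hi ") ? 1 : 0,
});

// Built-in and custom scorers can be mixed:
scores: [someScorer, startsWithHi],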
e. Metadata (optional): You can add metadata about the experiment, such as the model you're using or configuration values.
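For instance (the values here are illustrative, echoing the complete example below):

metadata: {
  model: "gpt-3.5-turbo",
  temperature: 0, // any JSON-serializable key/value pairs work
},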
Finally, to run your eval, add it to your run-evals.ts file with a line like this:

await runEval("path/to/your/eval.ts");
Here's a complete example of a simple eval:
import { Eval } from "braintrust";
import { ExactMatch } from "autoevals";

Eval(
  "Greeting Bot",
  {
    data: () => {
      return [
        { input: "Alice", expected: "Hi Alice" },
        { input: "Bob", expected: "Hi Bob" },
      ];
    },
    task: (input) => {
      return "Hi " + input; // Replace with your LLM call
    },
    scores: [ExactMatch],
    metadata: {
      model: "gpt-3.5-turbo",
    },
  }
);
Remember to add this to your run-evals.ts file:

await runEval("path/to/greeting-bot-eval.ts");
This eval tests a simple greeting bot, checking that it correctly prefixes each input name with "Hi ".
When you run this eval, Braintrust will execute the task for each input in your dataset, compare the output to the expected output using the specified scoring function(s), and log the results.
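(As an aside: if you aren't using a custom runner, the Braintrust SDK also ships a CLI that can execute files calling Eval() directly, typically via npx braintrust eval path/to/greeting-bot-eval.ts with BRAINTRUST_API_KEY set in your environment.)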
Is there any specific part of writing evals that you'd like me to explain further?
Can you retrieve my run-evals.ts file and show me how I'd add this line to run a new eval? It's the file that has an EVAL_MAP in it I believe.
I'll retrieve the content of your run-evals.ts file and show you how to add a line to run a new eval. Let me analyze the file structure first.

After analyzing the run-evals.ts file, I can see that it does indeed contain an EVAL_MAP. Here's how you would add a new eval to this file:
First, import your new eval function. Assuming you've named it greetingBotEval in a file named greeting-bot-eval.ts, add this import statement at the top of run-evals.ts:

import { greetingBotEval } from './path/to/greeting-bot-eval'
Next, add your eval to the EVAL_MAP. The EVAL_MAP is an object where the keys are the names of the evals, and the values are the corresponding eval functions. Add your new eval like this:

const EVAL_MAP = {
  // ... existing evals ...
  GREETING_BOT: greetingBotEval,
}
The complete run-evals.ts file would look something like this (with your new eval added):

import { runRefusalsEval } from './refusals'
import { runV0BrainEval } from './v0-brain'
import { greetingBotEval } from './path/to/greeting-bot-eval'

const EVAL_MAP = {
  REFUSALS: runRefusalsEval,
  V0_BRAIN: runV0BrainEval,
  GREETING_BOT: greetingBotEval,
}

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
})

async function main(only: string, options: CommandOptions) {
  const evalKeys = only === 'all' ? Object.keys(EVAL_MAP) : only.split(',')
  const evals: RunEvalFn[] = []

  for (const key of evalKeys) {
    if (EVAL_MAP[key] === undefined) {
      throw new Error(`Unknown eval key: ${key}`)
    }
    evals.push(EVAL_MAP[key])
  }

  // ... rest of the file ...
}

// ... rest of the file ...
By adding your new eval to the EVAL_MAP, you make it available to be run when the script is executed. The main function will automatically include your new eval when running all evals or when it is specifically requested.

Remember to export your greetingBotEval function from your greeting-bot-eval.ts file so it can be imported into run-evals.ts.
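This is a sketch under an assumption: judging by names like runRefusalsEval, the runner likely expects each EVAL_MAP value to be a callable function, so the Eval() call is wrapped in an exported function rather than run at import time. Adjust the shape to match what the other entries in your file actually do:

// greeting-bot-eval.ts — hypothetical wrapper shape
import { Eval } from 'braintrust'
import { ExactMatch } from 'autoevals'

export async function greetingBotEval() {
  // Eval() runs the experiment and resolves with its results
  return Eval('Greeting Bot', {
    data: () => [
      { input: 'Alice', expected: 'Hi Alice' },
      { input: 'Bob', expected: 'Hi Bob' },
    ],
    task: (input: string) => 'Hi ' + input,
    scores: [ExactMatch],
  })
}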
Is there anything else you'd like me to explain or clarify about adding new evals to your run-evals.ts file?
Okay, I want to add a new eval for response quality. I think Braintrust might have some autoscorers I can use?