How do I write an eval using Braintrust?
To write an eval using Braintrust, you'll need to follow a specific structure and use the Braintrust API. I'll guide you through the process step by step, based on the Braintrust documentation.

First, import the Eval function from braintrust and the scorer you want to use from autoevals:
import { Eval } from "braintrust";
import { someScorer } from "autoevals"; // Import the scorer you want to use
Next, call the Eval function with the following structure:

Eval(
  "Your Project Name",
  {
    data: () => {
      // Your dataset goes here
    },
    task: (input) => {
      // Your LLM call or task function goes here
    },
    scores: [
      // Your scoring functions go here
    ],
    // Optional: Add metadata about the experiment
  }
);
Let's break down each part:
a. Project Name: Replace "Your Project Name" with the name of your project.
b. Data: This function should return an array of objects, each representing a test case. For example:
data: () => {
  return [
    { input: "Sample input 1", expected: "Expected output 1" },
    { input: "Sample input 2", expected: "Expected output 2" },
    // Add more test cases as needed
  ];
},
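If your test cases already live in a hosted Braintrust dataset, you can point data at the dataset instead of hard-coding an array. A minimal sketch, assuming a dataset named "greeting-cases" already exists in your project:

import { initDataset } from "braintrust";

// Replaces the inline data function above; "greeting-cases" is a
// placeholder for a dataset you've already created.
data: initDataset("Your Project Name", { dataset: "greeting-cases" }),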
c. Task: This function takes an input and returns an output. This is where you'd typically make your LLM call. For example:
task: (input) => {
  return "Hi " + input; // Replace with your actual LLM call
},
d. Scores: An array of scoring functions. You can use pre-defined scorers from autoevals or create your own. For example:

scores: [someScorer],
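A custom scorer is a plain function that receives the test case's input, output, and expected values and returns a name plus a score between 0 and 1. A minimal sketch (the scorer name and logic here are made up for illustration):

// A hand-rolled scorer that checks the output begins with "Hi ".
const startsWithHi = ({ output }: { output: string }) => ({
  name: "starts_with_hi",
  score: output.startsWith("Hi ") ? 1 : 0,
});

// Built-in and custom scorers can be mixed:
scores: [someScorer, startsWithHi],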
e. Metadata (optional): You can add metadata about the experiment, such as the model you're using or configuration values.
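For instance (the values here are illustrative, echoing the complete example below):

metadata: {
  model: "gpt-3.5-turbo",
  temperature: 0, // any JSON-serializable key/value pairs work
},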
Finally, to run your eval, add it to your run-evals.ts file with a line like this:

await runEval("path/to/your/eval.ts");
Here's a complete example of a simple eval:
import { Eval } from "braintrust";
import { ExactMatch } from "autoevals";

Eval(
  "Greeting Bot",
  {
    data: () => {
      return [
        { input: "Alice", expected: "Hi Alice" },
        { input: "Bob", expected: "Hi Bob" },
      ];
    },
    task: (input) => {
      return "Hi " + input; // Replace with your LLM call
    },
    scores: [ExactMatch],
    metadata: {
      model: "gpt-3.5-turbo",
    },
  }
);
Remember to add this to your run-evals.ts file:

await runEval("path/to/greeting-bot-eval.ts");
This eval tests a simple greeting bot, checking that it correctly prefixes each input name with "Hi ".
When you run this eval, Braintrust will execute the task for each input in your dataset, compare the output to the expected output using the specified scoring function(s), and log the results.
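(As an aside: if you aren't using a custom runner, the Braintrust SDK also ships a CLI that can execute files calling Eval() directly, typically via npx braintrust eval path/to/greeting-bot-eval.ts with BRAINTRUST_API_KEY set in your environment.)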
Is there any specific part of writing evals that you'd like me to explain further?
Can you retrieve my run-evals.ts file and show me how I'd add this line to run a new eval? It's the file that has an EVAL_MAP in it I believe.
I'll retrieve the content of your run-evals.ts file and show you how to add a line to run a new eval. Let me analyze the file structure first.

After analyzing the run-evals.ts file, I can see that it does indeed contain an EVAL_MAP. Here's how you would add a new eval to this file:
First, import your new eval function. Assuming you've named it greetingBotEval in a file named greeting-bot-eval.ts, add this import statement at the top of run-evals.ts:

import { greetingBotEval } from './path/to/greeting-bot-eval'
Next, add your eval to the EVAL_MAP. The EVAL_MAP is an object where the keys are the names of the evals, and the values are the corresponding eval functions. Add your new eval like this:

const EVAL_MAP = {
  // ... existing evals ...
  GREETING_BOT: greetingBotEval,
}
The complete run-evals.ts file would look something like this (with your new eval added):

import { runRefusalsEval } from './refusals'
import { runV0BrainEval } from './v0-brain'
import { greetingBotEval } from './path/to/greeting-bot-eval'

const EVAL_MAP = {
  REFUSALS: runRefusalsEval,
  V0_BRAIN: runV0BrainEval,
  GREETING_BOT: greetingBotEval,
}

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
})

async function main(only: string, options: CommandOptions) {
  const evalKeys = only === 'all' ? Object.keys(EVAL_MAP) : only.split(',')
  const evals: RunEvalFn[] = []

  for (const key of evalKeys) {
    if (EVAL_MAP[key] === undefined) {
      throw new Error(`Unknown eval key: ${key}`)
    }
    evals.push(EVAL_MAP[key])
  }

  // ... rest of the file ...
}

// ... rest of the file ...
By adding your new eval to the EVAL_MAP, you make it available to be run when the script is executed. The main function will automatically include your new eval when running all evals or when it is specifically requested.

Remember to export your greetingBotEval function from your greeting-bot-eval.ts file so it can be imported into run-evals.ts.
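This is a sketch under an assumption: judging by names like runRefusalsEval, the runner likely expects each EVAL_MAP value to be a callable function, so the Eval() call is wrapped in an exported function rather than run at import time. Adjust the shape to match what the other entries in your file actually do:

// greeting-bot-eval.ts — hypothetical wrapper shape
import { Eval } from 'braintrust'
import { ExactMatch } from 'autoevals'

export async function greetingBotEval() {
  // Eval() runs the experiment and resolves with its results
  return Eval('Greeting Bot', {
    data: () => [
      { input: 'Alice', expected: 'Hi Alice' },
      { input: 'Bob', expected: 'Hi Bob' },
    ],
    task: (input: string) => 'Hi ' + input,
    scores: [ExactMatch],
  })
}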
Is there anything else you'd like me to explain or clarify about adding new evals to your run-evals.ts file?
Okay, I want to add a new eval for response quality. I think Braintrust might have some autoscorers I can use?