Hardcoding datapoints in the evaluation file is fine for a quickstart. Once you’re actually iterating, the dataset lives in Laminar and the eval file just points at it. That way anyone on the team can edit the dataset without touching code, new datapoints added from production traces flow in automatically, and the dataset versioning lives with the data instead of with the script.

Point evaluate() at a Laminar dataset

Wrap the dataset name in LaminarDataset and pass that as data.
import { evaluate, LaminarDataset } from '@lmnr-ai/lmnr';

evaluate({
  data: new LaminarDataset('capitals-of-the-world'),
  executor: capitalOfCountry,
  evaluators: { accuracy, lengthOk },
  groupName: 'capitals',
});
The dataset name matches what you see in the Datasets section of Laminar. Each dataset row’s data, target, and metadata fields become the datapoint’s data, target, and metadata.
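For instance, a row in that dataset might look like this (illustrative values only):
// An illustrative dataset row. data is passed to the executor; target is
// passed to the evaluators; metadata is attached to the resulting datapoint.
const row = {
  data: { country: 'France' },
  target: 'Paris',
  metadata: { source: 'manual' },
};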
LaminarDataset takes an optional fetchSize (fetch_size in Python) parameter that controls how many datapoints are fetched per network round-trip (default 25). For best performance, set it to a multiple of the evaluation's concurrency, i.e. its batch size (default 5).
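For example, if you run the evaluation with a concurrency of 10, fetching 50 datapoints per round-trip keeps the two aligned. A minimal sketch, assuming fetchSize is the second constructor argument (check the SDK reference for the exact signature):
// Sketch: fetch 50 datapoints per round-trip to match a concurrency of 10.
// Assumption: fetchSize is the second positional argument to LaminarDataset.
const data = new LaminarDataset('capitals-of-the-world', 50);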

Getting datapoints into the dataset

Three paths, in rough order of how often they get used:
  • Add from a trace. On any trace in Laminar, click Add to dataset. This is how you turn production failures into regression fixtures. See Adding data.
  • Upload a CSV or JSONL. Useful for bootstrapping with external data.
  • Push via the SDK. See the client reference for LaminarClient.datasets; a rough sketch follows this list.
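The sketch below illustrates the SDK path. The exact method name and payload live in the client reference for LaminarClient.datasets; createDatapoints is a placeholder here, and the client construction assumes a project API key in the environment.
import { LaminarClient } from '@lmnr-ai/lmnr';

const client = new LaminarClient({ projectApiKey: process.env.LMNR_PROJECT_API_KEY });

// Placeholder method name; substitute the actual call from the
// LaminarClient.datasets reference. Each datapoint mirrors the
// data / target / metadata fields described above.
await client.datasets.createDatapoints('capitals-of-the-world', [
  { data: { country: 'Japan' }, target: 'Tokyo', metadata: { source: 'sdk' } },
]);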

Custom dataset sources

If the data lives somewhere Laminar doesn’t own (a production database, an S3 bucket, a remote API), extend EvaluationDataset and implement two methods: size and get.
import { EvaluationDataset, evaluate } from '@lmnr-ai/lmnr';

class DatabaseDataset extends EvaluationDataset {
  constructor(private rows: Array<{ country: string; capital: string }>) {
    super();
  }

  // Total number of datapoints; the runner calls this once before the run.
  async size() {
    return this.rows.length;
  }

  // Called for each index; returns the { data, target } shape the
  // executor and evaluators expect.
  async get(index: number) {
    const row = this.rows[index];
    return { data: { country: row.country }, target: row.capital };
  }
}

evaluate({
  // loadFromDb() stands in for however you query your own data source.
  data: new DatabaseDataset(await loadFromDb()),
  executor: capitalOfCountry,
  evaluators: { accuracy },
});
The class is modelled on PyTorch’s Dataset. Laminar’s evaluation runner calls size() once, then get(i) for every index, and fans execution out in parallel.
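The in-memory example loads every row up front, but because both methods are async a dataset can also fetch lazily. A sketch against a hypothetical HTTP API (the endpoint paths and response shapes below are made up):
import { EvaluationDataset } from '@lmnr-ai/lmnr';

// Hypothetical remote source: GET /datapoints/count returns { count } and
// GET /datapoints/:index returns { data, target }. Swap in your own API.
class RemoteApiDataset extends EvaluationDataset {
  constructor(private baseUrl: string) {
    super();
  }

  async size() {
    const res = await fetch(`${this.baseUrl}/datapoints/count`);
    const { count } = await res.json();
    return count as number;
  }

  async get(index: number) {
    const res = await fetch(`${this.baseUrl}/datapoints/${index}`);
    const row = await res.json();
    return { data: row.data, target: row.target };
  }
}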

Export evaluator outputs back into a dataset

After an evaluation completes, you can turn any subset of its rows into a new dataset. From the SQL editor, write a query against evaluation_datapoints or spans, then click Export to Dataset. That dataset becomes your regression set for the next iteration of the prompt or model.

Next steps

Compare runs

Keep the dataset constant so comparisons across runs are apples-to-apples.

Datasets overview

How datasets are modelled in Laminar and how to populate them.

SDK reference

Full parameters for evaluate, LaminarDataset, and EvaluationDataset.