Hardcoding datapoints in the evaluation file is fine for a quickstart. Once you’re actually iterating, the dataset lives in Laminar and the eval file just points at it. That way anyone on the team can edit the dataset without touching code, new datapoints added from production traces flow in automatically, and the dataset versioning lives with the data instead of with the script.
Point evaluate() at a Laminar dataset
Wrap the dataset name in LaminarDataset and pass that as data.
The dataset’s data, target, and metadata fields become the datapoint’s data, target, and metadata, respectively.
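A minimal sketch in TypeScript, assuming the `@lmnr-ai/lmnr` SDK; the dataset name, executor body, and evaluator are illustrative placeholders, not the definitive API usage:

```typescript
import { evaluate, LaminarDataset } from '@lmnr-ai/lmnr';

evaluate({
  // "support-tickets" is an illustrative dataset name living in Laminar.
  data: new LaminarDataset('support-tickets'),
  // The executor receives each datapoint's `data` field.
  executor: async (data) => String(data.question).toUpperCase(),
  // Each evaluator receives the executor output and the datapoint's `target`.
  evaluators: {
    exactMatch: (output, target) => (output === target?.answer ? 1 : 0),
  },
});
```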
LaminarDataset takes an optional fetchSize / fetch_size parameter controlling how many datapoints are fetched per network round-trip (default 25). For performance, set it to a multiple of the evaluation concurrency / batch size (default 5).
Getting datapoints into the dataset
Three paths, in rough order of how often they get used:
- Add from a trace. On any trace in Laminar, click Add to dataset. This is how you turn production failures into regression fixtures. See Adding data.
- Upload a CSV or JSONL. Useful for bootstrapping with external data.
- Push via the SDK. See the client reference for LaminarClient.datasets.
Custom dataset sources
If the data lives somewhere Laminar doesn’t own (a production database, an S3 bucket, a remote API), extend EvaluationDataset and implement two methods: size and get.
Laminar’s evaluation runner calls size() once, then get(i) for every index, fanning execution out in parallel.
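As a self-contained sketch of that contract: the base class below is a stand-in for the SDK’s EvaluationDataset, stubbed locally so the example runs on its own (in real code, import the base class and datapoint type from the SDK), and the array-backed source is a hypothetical placeholder for a database or API:

```typescript
// Stand-in for the SDK's EvaluationDataset base class, stubbed so this
// example is self-contained. The real one comes from the Laminar SDK.
interface Datapoint {
  data: Record<string, unknown>;
  target?: Record<string, unknown>;
}

abstract class EvaluationDataset {
  abstract size(): Promise<number>;
  abstract get(index: number): Promise<Datapoint>;
}

// Hypothetical source: rows already loaded from an external store.
// A real implementation would query the database or API inside get().
class ArrayBackedDataset extends EvaluationDataset {
  constructor(private rows: Datapoint[]) {
    super();
  }
  async size(): Promise<number> {
    return this.rows.length;
  }
  async get(index: number): Promise<Datapoint> {
    return this.rows[index];
  }
}
```

Because the runner calls get(i) for many indices in parallel, implementations should be safe to invoke concurrently.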
Export evaluator outputs back into a dataset
After an evaluation completes, you can turn any subset of its rows into a new dataset. From the SQL editor, write a query against evaluation_datapoints or spans, then click Export to Dataset. That dataset becomes your regression set for the next iteration of the prompt or model.
Next steps
Compare runs
Keep the dataset constant so comparisons across runs are apples-to-apples.
Datasets overview
How datasets are modelled in Laminar and how to populate them.
SDK reference
Full parameters for evaluate, LaminarDataset, and EvaluationDataset.