How Debugger Caching Works

The cache lookup

The cache is served from the Laminar backend. Each cached response is keyed by the trace id of the run being replayed and a hash of the span’s input.

On a replay run, a supported integration does the following for each LLM call:

Computes the input hash for the call.

Asks the Laminar backend whether a cached response exists for that (trace_id, input_hash) pair.

On a hit, serves the recorded response and never calls the model.

On a miss, calls the model live, and from that point on the run stays in live mode: every subsequent call goes live too.

So a replay run is cached up to the first miss and live after it. You control where that boundary lands with LMNR_DEBUG_CACHE_UNTIL (see setup).

The same input-hash algorithm is implemented in every supported SDK and in the Laminar backend, so a response recorded by one matches the lookup done by another.

The input hash excludes system messages

The hash is computed over all input messages except any system message.

This is deliberate. The system prompt is the thing you iterate on most while debugging an agent, and if it were part of the hash, every edit to the system prompt would change the hash of every call and break the entire cache. Excluding it means you can rewrite the system prompt and still replay the rest of the run from cache, which is exactly the workflow the debugger is built for.

Supported caching integrations

Caching is supported in these integrations:

OpenAI (Python)

Anthropic (Python)

Google GenAI (Python)

LiteLLM (Python)

AI SDK (TypeScript, all providers): requires wrapLanguageModel, see setup

In TypeScript, the AI SDK is the only caching integration today; it covers all providers the AI SDK supports. The other integrations are Python-only.

The debugger itself runs with many more integrations than this. Recording, inspecting the transcript, and replaying work with everything Laminar integrates with and with manual LLM instrumentation. Caching is the only part limited to the list above: with an unsupported integration, every LLM call on a replay run goes live rather than serving from the recorded trace.

The full loop your coding agent runs: record, inspect, replay, and evaluate.

Overview

Tracing

Signals

Debugger

Evaluations

Datasets

Platform

The cache lookup

The input hash excludes system messages

Supported caching integrations

What’s next

The debugger process

Setup

​The cache lookup

​The input hash excludes system messages

​Supported caching integrations

​What’s next

The debugger process

Setup

The cache lookup

The input hash excludes system messages

Supported caching integrations

What’s next