The cache lookup
The cache is served from the Laminar backend. Each cached response is keyed by the trace id of the run being replayed and a hash of the span’s input. On a replay run, a supported integration does the following for each LLM call:- Computes the input hash for the call.
- Asks the Laminar backend whether a cached response exists for that
(trace_id, input_hash)pair. - On a hit, serves the recorded response and never calls the model.
- On a miss, calls the model live, and from that point on the run stays in live mode: every subsequent call goes live too.
LMNR_DEBUG_CACHE_UNTIL (see setup).
The same input-hash algorithm is implemented in every supported SDK and in the Laminar backend, so a response recorded by one matches the lookup done by another.
The input hash excludes system messages
The hash is computed over all input messages except any system message. This is deliberate. The system prompt is the thing you iterate on most while debugging an agent, and if it were part of the hash, every edit to the system prompt would change the hash of every call and break the entire cache. Excluding it means you can rewrite the system prompt and still replay the rest of the run from cache, which is exactly the workflow the debugger is built for.Supported caching integrations
Caching is supported in these integrations:- OpenAI (Python)
- Anthropic (Python)
- Google GenAI (Python)
- LiteLLM (Python)
- AI SDK (TypeScript, all providers): requires
wrapLanguageModel, see setup
What’s next
The debugger process
The full loop your coding agent runs: record, inspect, and replay.
Setup
Environment variables and the AI SDK special case.
