English · 简体中文
Translated to English for v0.3.2.
How-to: Error Handling Patterns¶
Every error in PAR is ultimately one of the Types.error_category variants:
type error_category =
| Timeout
| Invalid_input of string
| External_failure of string
| Rate_limited
| Permission_denied of string
| Internal of string
[@@deriving yojson]
Each category has different semantics and requires a different handling strategy.
Pattern 1: Tool errors — retry vs. escalate¶
Tool handlers return a handler_result:
type handler_result =
| Success of Yojson.Safe.t
| Error of {
category : error_category;
message : string;
retryable : bool;
metadata : (string * Yojson.Safe.t) list;
}
Decide whether to retry based on the retryable field or category:
| Error category | Retryable? | Recommended handling |
|---|---|---|
Timeout |
true | Retry (if it is a network timeout) |
Rate_limited |
true | Retry with backoff |
External_failure |
true | Immediate retry 1-2 times |
Invalid_input |
false | Do not retry — the schema is wrong, retrying won't help |
Permission_denied |
false | Do not retry — policy rejection, retrying is still rejection |
Internal |
depends | Check message; retry if transient |
Pattern 2: LLM errors — backoff¶
OpenAI / Anthropic return 429 during peak traffic. lib/middleware/rate_limit.ml handles this automatically (per-token throttling). For HTTP 5xx errors, use the retry_policy middleware (lib/middleware/retry.ml):
let config = {
...
llm_providers = [...];
retry_policy = Some {
max_attempts = 3;
initial_delay = 1.0;
backoff = Exponential { base = 2.0; max_delay = 30.0 };
retry_on = [Timeout; Rate_limited; External_failure];
jitter = Some 0.1;
};
...
}
3 retries, exponential backoff 1s → 2s → 4s, max 30s. Note the retry_on list — do not retry Invalid_input (pointless) or Internal (could be a bug; more retries makes it worse).
Pattern 3: Cancellation propagation¶
PAR's entire stack uses Eio.Switch.t for cooperative cancellation. Runtime.close triggers the entire switch to cancel:
Eio.Switch.run (fun sw ->
let rt = Runtime.create ~config ... sw in
(* ... start a long-running task ... *)
let cancel_switch = Eio.Switch.empty () in (* nested switch *)
Eio.Fiber.fork ~sw:cancel_switch (fun () ->
ignore (Runtime.invoke rt "agent-1" "long task" : _));
Eio.Time.sleep (Time.Span.of_sec 30.0);
Eio.Switch.turn_off cancel_switch; (* interrupt agent-1 *)
(* ... continue with other work ... *)
)
If a tool handler wraps its work with Cancellation.with_timeout, it automatically responds to cancellation.
Pattern 4: Custom error metadata¶
The metadata field on Error { ...; metadata = [...] } is a [("key", json)] association list. It gets persisted on the event bus for debugging:
Error {
category = External_failure "API timeout";
message = "openai call timed out after 30s";
retryable = true;
metadata = [
("http_status", `Int 504);
("request_id", `String "req_abc123");
("endpoint", `String "/v1/chat/completions");
];
}
Downstream monitoring can read event.metadata[0] to get the request_id and look it up in the provider dashboard.
Pattern 5: Do not swallow errors¶
(* Wrong: silently discard all errors *)
match Runtime.invoke rt "agent" input with
| Ok result -> process result
| Error _ -> () (* silent failure *)
(* Right: distinguish error categories *)
match Runtime.invoke rt "agent" input with
| Ok result -> process result
| Error e ->
match e with
| Timeout -> retry_with_backoff ()
| Rate_limited -> sleep 60.0; retry ()
| Invalid_input msg -> log_input_error msg; fail ()
| _ -> failwith (Printf.sprintf "unhandled: %s" (error_to_string e))
The 6 error_category variants in lib/core/types.ml are exhaustive — missing one produces an OCaml warning. Use exhaustiveness as a safety net:
Pattern 6: Use the event bus for observability¶
Do not rely on logs alone. Subscribe to Bash_invoked / Bash_completed / Tool_failed events and automatically export metrics:
Event_bus.subscribe rt.event_bus (function
| Bash_invoked { argv; risk; _ } ->
Metrics.increment_tool_call "bash" ~tags:["risk", risk]
| Bash_completed { exit_code; duration; _ } ->
Metrics.record_tool_latency "bash" duration;
if exit_code <> 0 then Metrics.increment_tool_error "bash"
| _ -> ()
);
The Runtime.metrics_snapshot API (v0.3.0+) is also available — you don't have to subscribe to events manually.