Cross-cutting concerns
These are concerns that don’t belong to any single component; they show up at multiple layers.
Anonymisation round-trip
For every operation that reaches the LLM, the orchestrator wraps the user-facing text in an anonymise → call → de-anonymise sandwich:
sequenceDiagram
participant route as Route handler
participant orch as Orchestrator
participant anon as PresidioAnonymizer
participant llm as LLMPort
route->>orch: analyze(request, deadline)
orch->>anon: anonymize(user_message)
anon-->>orch: (anonymised_text, mapping)
orch->>llm: complete(system_message, anonymised_text, …)
llm-->>orch: structured response
orch->>anon: deanonymize(response_json, mapping)
anon-->>orch: response_with_pii_restored
orch-->>route: result
Notes:
The mapping lives in memory for the request and is discarded when the orchestrator method returns.
The de-anonymise step runs over the serialised response — substitutions are textual, so the round-trip is a string replacement, not a structured walk.
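The round-trip above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the real `PresidioAnonymizer` detects entities itself, whereas this sketch assumes the caller supplies the PII values. The placeholder format, `fake_llm`, and `round_trip` are hypothetical names introduced only for the example.

```python
import json


def anonymize(text: str, pii_values: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each supplied PII value with a placeholder (assumed format)."""
    mapping: dict[str, str] = {}
    for index, value in enumerate(pii_values):
        placeholder = f"<PII_{index}>"
        if value in text:
            text = text.replace(value, placeholder)
            mapping[placeholder] = value
    return text, mapping


def deanonymize(serialised: str, mapping: dict[str, str]) -> str:
    """Textual substitution over the serialised response, not a structured walk."""
    for placeholder, original in mapping.items():
        serialised = serialised.replace(placeholder, original)
    return serialised


def fake_llm(anonymised_text: str) -> dict:
    """Hypothetical stand-in for LLMPort.complete; it only ever sees placeholders."""
    return {"echo": anonymised_text, "length": len(anonymised_text)}


def round_trip(user_message: str, pii_values: list[str]) -> tuple[str, dict]:
    anonymised, mapping = anonymize(user_message, pii_values)
    response_json = json.dumps(fake_llm(anonymised))
    restored = deanonymize(response_json, mapping)
    # The mapping goes out of scope here, mirroring "discarded on return".
    return anonymised, json.loads(restored)
```

Because the substitution is textual, a restored value that contains characters such as double quotes could break the serialised JSON; a real implementation would need to account for that.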
Call context and usage tracking
qfa.services.call_context defines a ContextVar[CallContext | None] and a call_scope(tenant_id, operation) async context manager. Every public orchestrator method enters a scope; the TrackingLLMAdapter reads the context to attribute each LLM call to a tenant and operation.
Consequence: any new code path that calls LLMPort.complete outside an orchestrator method will raise MissingCallScopeError at runtime. The tracking adapter refuses to record a call it cannot attribute to a tenant and operation.
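The scope mechanism can be sketched as follows. This is an assumed shape, not the code in `qfa.services.call_context`: the fields of `CallContext`, the variable name `_current_call`, the `records` list, and the adapter's `complete` signature are all hypothetical.

```python
import asyncio
from contextlib import asynccontextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallContext:
    tenant_id: str
    operation: str


class MissingCallScopeError(RuntimeError):
    """Raised when an LLM call happens outside any call scope."""


# Assumed variable name; the default of None means "no scope entered".
_current_call: ContextVar[Optional[CallContext]] = ContextVar(
    "current_call", default=None
)


@asynccontextmanager
async def call_scope(tenant_id: str, operation: str):
    token = _current_call.set(CallContext(tenant_id, operation))
    try:
        yield
    finally:
        # Restore the previous value so nested scopes unwind correctly.
        _current_call.reset(token)


class TrackingLLMAdapter:
    """Hypothetical adapter that attributes each call to the active scope."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []

    async def complete(self, system_message: str, user_message: str) -> str:
        ctx = _current_call.get()
        if ctx is None:
            raise MissingCallScopeError("LLM call made outside call_scope")
        self.records.append((ctx.tenant_id, ctx.operation))
        return "stub response"
```

A `ContextVar` fits here because its value follows the current asyncio task, so concurrent requests do not see each other's tenant.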
Deadlines, timeouts, retries
| Layer | Concern | Mechanism |
|---|---|---|
| Route handler | Per-request deadline | |
| Orchestrator | Deadline check | Before each LLM call: if remaining time is negative, raise … |
| Adapter (…) | Retry on transient errors | |
| Adapter (…) | Per-call timeout | Passed through to … |
| Adapter (…) | Token budget guard | Estimates … |
Retry policy and token budget belong to the adapter because both are model-specific (different LiteLLM-routed models have different context windows and rate-limit behaviour).
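As a rough sketch of how the orchestrator's deadline check and the adapter's retry and per-call timeout could fit together: the exception names `DeadlineExceededError` and `TransientLLMError`, the helper names, and the backoff values below are hypothetical, and the token budget guard is left out.

```python
import asyncio
import time


class DeadlineExceededError(Exception):
    """Hypothetical name for the error raised when the deadline has passed."""


class TransientLLMError(Exception):
    """Hypothetical stand-in for a retryable provider error."""


def remaining(deadline: float) -> float:
    """Seconds left until a deadline expressed on the monotonic clock."""
    return deadline - time.monotonic()


async def call_with_retry(call, *, timeout_s: float, max_attempts: int = 3,
                          backoff_s: float = 0.01):
    """Adapter side: per-call timeout plus retry on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except TransientLLMError:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts (assumed policy).
            await asyncio.sleep(backoff_s * 2 ** (attempt - 1))


async def analyze(call, deadline: float, per_call_timeout_s: float = 1.0):
    """Orchestrator side: check the deadline before each LLM call."""
    left = remaining(deadline)
    if left < 0:
        raise DeadlineExceededError("request deadline already passed")
    # Never wait longer than the time the request has left.
    return await call_with_retry(call, timeout_s=min(per_call_timeout_s, left))
```

Using `time.monotonic()` for the deadline avoids surprises from wall-clock adjustments during a request.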
Error → HTTP mapping
The exception handlers in qfa.api.app translate domain errors into HTTP responses:
| Exception | HTTP | |
|---|---|---|
| Missing / invalid bearer token | 401 | |
| Pydantic validation failure | 422 | |
| | 413 | |
| | 504 | |
| | 422 | |
| | 502 | |
| | 502 | |
| | 503 | |
| Usage tracking disabled | 503 | |
| Unhandled | 500 | |
All responses share the same envelope shape with a server-generated request_id.
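A framework-free sketch of this kind of mapping is shown below. It does not reproduce the handlers in `qfa.api.app`: the two exception classes, their pairing with 413 and 504, and every envelope field other than `request_id` are hypothetical and exist only to make the example runnable.

```python
import uuid


class PayloadTooLargeError(Exception):
    """Hypothetical stand-in for whichever domain error maps to 413."""


class DeadlineExceededError(Exception):
    """Hypothetical stand-in for whichever domain error maps to 504."""


# Assumed lookup table; the real handlers may be registered differently.
STATUS_BY_EXCEPTION: dict[type[Exception], int] = {
    PayloadTooLargeError: 413,
    DeadlineExceededError: 504,
}


def to_http_response(exc: Exception) -> tuple[int, dict]:
    """Translate an exception into (status, envelope) with a fresh request_id."""
    status = next(
        (code for exc_type, code in STATUS_BY_EXCEPTION.items()
         if isinstance(exc, exc_type)),
        500,  # anything unmapped is treated as unhandled
    )
    if status == 500:
        # Do not echo details of unhandled errors back to the client.
        error = {"type": "InternalServerError", "message": "Internal server error"}
    else:
        error = {"type": type(exc).__name__, "message": str(exc)}
    return status, {"request_id": str(uuid.uuid4()), "error": error}
```

Generating `request_id` on the server keeps the envelope identical across every status code, so clients can quote one identifier when reporting a failure.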
Logging policy
Hard prohibitions — never log at any level:
Feedback record content (record.text / record.content)
User prompt (request.prompt): log the character count instead
Assembled system or user messages sent to the LLM
LLM response text
API key values (protected by SecretStr)
Safe to log: request_id, tenant_id, operation, record counts, estimated tokens, attempt numbers, model name, durations, HTTP status codes, prompt_tokens, completion_tokens, cost.
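One way to follow this policy is to build the log fields in a helper that never receives anything it could leak. The sketch below is illustrative only; `safe_log_fields`, the logger name, and the `prompt_chars` and `record_count` keys are assumptions, not names from the codebase.

```python
import logging

logger = logging.getLogger("qfa.example")  # hypothetical logger name


def safe_log_fields(request_id: str, tenant_id: str, operation: str,
                    prompt: str, record_count: int) -> dict:
    """Return only fields from the safe list; the prompt becomes a length."""
    return {
        "request_id": request_id,
        "tenant_id": tenant_id,
        "operation": operation,
        "prompt_chars": len(prompt),  # character count, never the text itself
        "record_count": record_count,
    }


def log_analysis_started(fields: dict) -> None:
    # Fields travel as structured extras rather than being formatted into the message.
    logger.info("analysis started", extra=fields)
```

Keeping the prompt out of the returned dictionary means a later change to the log format cannot accidentally expose it.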
See Observability for what each log statement looks like at runtime.