ADR-003: Fully Async Concurrency Model
Status
Accepted
Context
The backend uses FastAPI, an async-native ASGI framework. The primary I/O operation is calling the OpenAI API, which may take up to 2 minutes. The orchestrator includes retry logic with backoff delays between attempts.
The architect initially proposed a synchronous orchestrator using time.sleep() for backoff, called from async route handlers via loop.run_in_executor(None, ...).
The domain expert and devil’s advocate identified several problems with this approach:
Cancellation does not propagate. If the client disconnects or a gateway timeout fires, the async task is cancelled but the thread running the synchronous orchestrator continues, holding an LLM connection and burning resources.
Thread pool sizing. Each in-flight request occupies a thread. The default executor used by run_in_executor(None, ...) is capped at min(32, os.cpu_count() + 4) threads (12 on an 8-core host), so with a 2-minute timeout budget a modest burst of 20 concurrent requests can exhaust it, causing queuing delays that erode the timeout budget before the orchestrator even runs.
Sync/async mixing. If the LLM adapter uses the async OpenAI client (AsyncOpenAI), calling it from a synchronous orchestrator requires asyncio.run() inside the thread, which creates a new event loop per call and is an antipattern.
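The cancellation problem can be reproduced with the standard library alone. The following is a minimal sketch, not the project's code: sync_orchestrator and handler are hypothetical stand-ins, the delays are shortened for illustration, and task.cancel() is used to simulate a client disconnect.

```python
import asyncio
import threading
import time


def sync_orchestrator(finished: threading.Event) -> None:
    """Hypothetical synchronous orchestrator: one blocking backoff, then done."""
    time.sleep(0.3)   # blocking sleep; the thread never sees the cancellation
    finished.set()    # still runs after the awaiting task was cancelled


async def handler(finished: threading.Event) -> None:
    """Stand-in for an async route handler that offloads work to a thread."""
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, sync_orchestrator, finished)


async def main() -> tuple[bool, bool]:
    finished = threading.Event()
    task = asyncio.create_task(handler(finished))
    await asyncio.sleep(0.05)   # give the worker thread time to start
    task.cancel()               # simulates the client disconnecting
    try:
        await task
        task_cancelled = False
    except asyncio.CancelledError:
        task_cancelled = True
    await asyncio.sleep(0.5)    # wait past the worker's blocking sleep
    return task_cancelled, finished.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both flags should come back true: the task is cancelled, yet the thread still runs to completion, which is the resource leak described above.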
Decision
The orchestrator, LLM client, and all I/O operations are fully async.
LLMPort.complete is async def.
LLMClient uses openai.AsyncOpenAI / openai.AsyncAzureOpenAI.
StandardOrchestrator.analyze is async def and uses asyncio.sleep for backoff.
Route handlers call await orchestrator.analyze(...) directly.
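A rough sketch of the shape this decision implies is shown below. Only the names LLMPort.complete and StandardOrchestrator.analyze come from this ADR; the method signatures, the retry count, the exponential delay, the use of ConnectionError as the retryable error, and the FlakyLLM test double are all assumptions made for illustration.

```python
import asyncio
from typing import Protocol


class LLMPort(Protocol):
    """Port interface; the (prompt) -> str signature is assumed."""

    async def complete(self, prompt: str) -> str: ...


class StandardOrchestrator:
    def __init__(self, llm: LLMPort, max_attempts: int = 3, base_delay: float = 0.01) -> None:
        self._llm = llm
        self._max_attempts = max_attempts   # assumed policy; must be >= 1
        self._base_delay = base_delay       # assumed exponential backoff base

    async def analyze(self, prompt: str) -> str:
        for attempt in range(self._max_attempts):
            try:
                return await self._llm.complete(prompt)
            except ConnectionError:          # assumed retryable error type
                if attempt == self._max_attempts - 1:
                    raise
                # Non-blocking backoff: the event loop serves other requests.
                await asyncio.sleep(self._base_delay * 2 ** attempt)
        raise RuntimeError("max_attempts must be at least 1")


class FlakyLLM:
    """Hypothetical adapter that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"analysis of {prompt!r}"


if __name__ == "__main__":
    orchestrator = StandardOrchestrator(FlakyLLM(failures=2))
    print(asyncio.run(orchestrator.analyze("quarterly report")))
```

In a route handler the call would simply be awaited, so a cancellation of the handler task reaches whichever await is currently pending inside analyze.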
Options Considered
Option A: Sync orchestrator + run_in_executor (rejected)
Pro: Simpler to reason about sequentially. time.sleep and synchronous exception handling are straightforward.
Con: Cancellation issues, thread pool exhaustion, sync/async mixing bugs. The “simplicity” is illusory: the impedance mismatch creates subtle correctness problems.
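The thread pool exhaustion named in the Con can be illustrated at small scale. This is a sketch under assumed numbers (a pool of 2 threads, 4 requests, a 0.2-second blocking call), not a measurement of the real service; run_in_threads and run_async are hypothetical helpers.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(delay: float) -> None:
    time.sleep(delay)   # occupies one pool thread for the whole delay


async def run_in_threads(pool_size: int, requests: int, delay: float) -> float:
    """Elapsed time when each request needs its own executor thread."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        start = time.perf_counter()
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_call, delay) for _ in range(requests))
        )
        return time.perf_counter() - start


async def run_async(requests: int, delay: float) -> float:
    """Elapsed time when the same waits are non-blocking coroutines."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.sleep(delay) for _ in range(requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    threaded = asyncio.run(run_in_threads(pool_size=2, requests=4, delay=0.2))
    native = asyncio.run(run_async(requests=4, delay=0.2))
    print(f"threads: {threaded:.2f}s, async: {native:.2f}s")
```

With more requests than threads, the later requests wait in the executor queue before they start, so the threaded run takes roughly two rounds of the delay while the async run takes roughly one.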
Option B: Fully async (chosen)
Pro: Native cancellation propagation via asyncio.Task.cancel(). No thread pool sizing concerns. asyncio.sleep is non-blocking, so other requests can be served during backoff. Idiomatic FastAPI.
Con: Async test fixtures require pytest-asyncio. Slightly more ceremony in test setup.
Mitigation: pytest-asyncio is lightweight and widely used.
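To make the cancellation benefit concrete, here is a small sketch of a coroutine being cancelled in the middle of its backoff. The function analyze_with_backoff is a hypothetical stand-in for the orchestrator body, and the log list exists only to make the order of events observable.

```python
import asyncio


async def analyze_with_backoff(log: list[str]) -> None:
    """Hypothetical orchestrator body: an attempt failed, so back off and retry."""
    try:
        log.append("backoff started")
        await asyncio.sleep(60)      # non-blocking; also a cancellation point
        log.append("retrying")       # not reached once the task is cancelled
    except asyncio.CancelledError:
        log.append("cleaned up")     # e.g. release the LLM connection here
        raise                        # re-raise so the task ends as cancelled


async def main() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(analyze_with_backoff(log))
    await asyncio.sleep(0.01)        # let the task reach its backoff sleep
    task.cancel()                    # stands in for a disconnect or timeout
    try:
        await task
    except asyncio.CancelledError:
        log.append("task cancelled")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The 60-second sleep is interrupted immediately: CancelledError is raised at the pending await, the cleanup branch runs, and no retry is attempted.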
Option C: Hybrid: async route, sync LLM call in executor (not chosen)
Pro: Keeps the orchestrator simple.
Con: Same cancellation and thread pool problems as Option A, just with less code in the executor.
Consequences
All port interfaces define async methods.
Tests for the orchestrator use pytest-asyncio and async def test functions. asyncio.sleep is patched in tests (not time.sleep).
The openai SDK’s async client (AsyncOpenAI) is used, which returns the same response types as the sync client.
No thread pool is used for request handling. Uvicorn’s event loop handles all concurrency.
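Patching asyncio.sleep in a test might look like the sketch below. It uses only the standard library so that it runs on its own; under pytest-asyncio the body of run_backoff_check would sit in an async def test function instead. The helper call_with_backoff, its delay policy, and the use of ConnectionError are assumptions, not the project's actual retry code.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def call_with_backoff(fn, attempts: int = 3, base_delay: float = 1.0):
    """Hypothetical retry helper using asyncio.sleep for exponential backoff."""
    for attempt in range(attempts):
        try:
            return await fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def run_backoff_check() -> list[float]:
    # Fails twice, then succeeds, so two backoff sleeps are expected.
    fn = AsyncMock(side_effect=[ConnectionError(), ConnectionError(), "ok"])
    # Replace asyncio.sleep so the check does not actually wait.
    with patch("asyncio.sleep", new_callable=AsyncMock) as fake_sleep:
        result = await call_with_backoff(fn)
    assert result == "ok"
    assert fn.await_count == 3
    # Return the delays that would have been slept, in order.
    return [call.args[0] for call in fake_sleep.await_args_list]


if __name__ == "__main__":
    print(asyncio.run(run_backoff_check()))
```

Because the helper looks up asyncio.sleep at call time, patching the attribute on the asyncio module is enough; patching time.sleep would have no effect on this code path.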
Participants
Domain expert (identified cancellation propagation issue)
Devil’s advocate (proposed async as strictly simpler)
Architect (accepted the async model)