ADR-003: Fully Async Concurrency Model

Status

Accepted

Context

The backend uses FastAPI, an async-native ASGI framework. The primary I/O operation is calling the OpenAI API, which may take up to 2 minutes. The orchestrator includes retry logic with backoff delays between attempts.

The architect initially proposed a synchronous orchestrator using time.sleep() for backoff, called from async route handlers via loop.run_in_executor(None, ...).

The domain expert and devil’s advocate identified several problems with this approach:

Cancellation does not propagate. If the client disconnects or a gateway timeout fires, the async task is cancelled but the thread running the synchronous orchestrator continues, holding an LLM connection and burning resources.
Thread pool sizing. Each in-flight request occupies a thread for its full duration. The default executor behind run_in_executor(None, ...) is capped at min(32, CPU count + 4) threads, so on a 4-core host (8 threads) a modest burst of 20 concurrent requests exhausts it. With a 2-minute timeout budget, the resulting queuing delays erode the budget before the orchestrator even runs.
Sync/async mixing. If the LLM adapter uses the async OpenAI client (AsyncOpenAI), calling it from a synchronous orchestrator requires asyncio.run() inside the thread — creating a new event loop per call, which is an antipattern.
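
The cancellation problem can be reproduced in a few lines. The sketch below is illustrative only: sync_orchestrator and handler are hypothetical stand-ins rather than the project's code, and the short sleeps stand in for a blocking LLM call. It shows that cancelling the awaiting task does not stop the worker thread.

```python
import asyncio
import threading
import time

# Set by the worker thread only if it runs to completion.
finished = threading.Event()


def sync_orchestrator() -> str:
    """Hypothetical stand-in for the rejected synchronous orchestrator."""
    time.sleep(0.3)  # stands in for a blocking LLM call plus backoff
    finished.set()
    return "done"


async def handler() -> str:
    """Hypothetical route handler that offloads to the default executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, sync_orchestrator)


async def main() -> tuple[bool, bool]:
    task = asyncio.create_task(handler())
    await asyncio.sleep(0.05)  # give the worker thread time to start
    task.cancel()  # simulates a client disconnect or gateway timeout
    try:
        await task
    except asyncio.CancelledError:
        pass
    # The task is cancelled, but nothing interrupted the thread.
    thread_finished = await asyncio.to_thread(finished.wait, 2.0)
    return task.cancelled(), thread_finished


task_cancelled, thread_finished = asyncio.run(main())
print(task_cancelled, thread_finished)
```

Both values come out true: the task reports itself as cancelled while the thread still runs its work to the end.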

Decision

The orchestrator, LLM client, and all I/O operations are fully async.

LLMPort.complete is async def.
LLMClient uses openai.AsyncOpenAI / openai.AsyncAzureOpenAI.
StandardOrchestrator.analyze is async def, uses asyncio.sleep for backoff.
Route handlers call await orchestrator.analyze(...) directly.
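
A minimal sketch of how these pieces might fit together is shown below. The names LLMPort.complete and StandardOrchestrator.analyze come from this ADR; the signatures (a prompt string in, a string out), the retry policy, the use of ConnectionError as the retryable error, and the FlakyLLM test double are assumptions made for illustration.

```python
import asyncio
from typing import Protocol


class LLMPort(Protocol):
    # Assumed signature: the ADR only states that complete is async def.
    async def complete(self, prompt: str) -> str: ...


class StandardOrchestrator:
    def __init__(self, llm: LLMPort, max_attempts: int = 3, base_delay: float = 0.01) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._llm = llm
        self._max_attempts = max_attempts
        self._base_delay = base_delay

    async def analyze(self, prompt: str) -> str:
        for attempt in range(self._max_attempts):
            try:
                return await self._llm.complete(prompt)
            except ConnectionError:  # assumed retryable error type
                if attempt == self._max_attempts - 1:
                    raise
                # Non-blocking exponential backoff; the event loop stays free.
                await asyncio.sleep(self._base_delay * 2**attempt)
        raise AssertionError("unreachable")


class FlakyLLM:
    """Hypothetical test double: fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"ok: {prompt}"


llm = FlakyLLM(failures=2)
result = asyncio.run(StandardOrchestrator(llm).analyze("hello"))
print(result, llm.calls)
```

In a route handler the final call would simply be await orchestrator.analyze(...), with no executor in between.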

Options Considered

Option A: Sync orchestrator + run_in_executor (rejected)

Pro: Simpler to reason about sequentially. time.sleep and synchronous exception handling are straightforward.
Con: Cancellation issues, thread pool exhaustion, sync/async mixing bugs. The “simplicity” is illusory — the impedance mismatch creates subtle correctness problems.

Option B: Fully async (chosen)

Pro: Native cancellation propagation via asyncio.Task.cancel(). No thread pool sizing concerns. asyncio.sleep is non-blocking — other requests can be served during backoff. Idiomatic FastAPI.
Con: Async test fixtures require pytest-asyncio. Slightly more ceremony in test setup.
Mitigation: pytest-asyncio is lightweight and widely used.
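
The two main advantages, cancellation during backoff and a free event loop while a request waits, can be illustrated with a small sketch. The function names and the 60-second delay are hypothetical and are not taken from the project.

```python
import asyncio

events: list[str] = []


async def analyze_with_backoff() -> None:
    """Hypothetical orchestrator call that is waiting out a backoff delay."""
    events.append("attempt failed, backing off")
    try:
        await asyncio.sleep(60)  # non-blocking backoff
        events.append("retrying")
    except asyncio.CancelledError:
        events.append("cancelled during backoff")
        raise  # re-raise so the task is marked as cancelled


async def other_request() -> None:
    events.append("other request served")


async def main() -> bool:
    task = asyncio.create_task(analyze_with_backoff())
    await asyncio.sleep(0)  # yield once so the task reaches its backoff sleep
    await other_request()  # the loop is free while the first task sleeps
    task.cancel()  # e.g. the client disconnected
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(was_cancelled, events)
```

The cancellation is delivered at the await point inside the backoff sleep, so the coroutine stops immediately instead of waiting out the delay.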

Option C: Hybrid, async route with sync LLM call in executor (not chosen)

Pro: Keeps the orchestrator simple.
Con: Same cancellation and thread pool problems as Option A, just with less code in the executor.

Consequences

All port interfaces define async methods.
Tests for the orchestrator use pytest-asyncio and async def test functions.
asyncio.sleep is patched in tests (not time.sleep).
The openai SDK’s async client (AsyncOpenAI) is used, which returns the same response types as the sync client.
No thread pool is used for request handling. Uvicorn’s event loop handles all concurrency.
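
As an illustration of patching asyncio.sleep in a test, the sketch below uses only the standard library. call_with_backoff is a hypothetical helper standing in for the orchestrator, and the delays are invented. Under pytest-asyncio the test function would typically carry @pytest.mark.asyncio (or rely on auto mode) instead of being driven by asyncio.run.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def call_with_backoff(operation, delays=(1.0, 2.0)):
    """Hypothetical retry helper standing in for the orchestrator."""
    for delay in delays:
        try:
            return await operation()
        except ConnectionError:
            await asyncio.sleep(delay)
    return await operation()  # final attempt; errors propagate to the caller


async def test_backoff_sleeps_are_patched() -> None:
    # Fails twice, then succeeds on the third call.
    operation = AsyncMock(side_effect=[ConnectionError(), ConnectionError(), "ok"])
    with patch("asyncio.sleep", new_callable=AsyncMock) as fake_sleep:
        result = await call_with_backoff(operation)
    assert result == "ok"
    assert operation.await_count == 3
    # The backoff delays were requested but never actually waited for.
    assert [call.args[0] for call in fake_sleep.await_args_list] == [1.0, 2.0]


# Stdlib-only driver so the sketch runs without pytest installed.
asyncio.run(test_backoff_sleeps_are_patched())
```

Patching time.sleep here would have no effect, because the code under test never calls it.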

Participants

Domain expert (identified cancellation propagation issue)
Devil’s advocate (proposed async as strictly simpler)
Architect (accepted the async model)