# ADR-003: Fully Async Concurrency Model

## Status

Accepted

## Context

The backend uses FastAPI, an async-native ASGI framework. The primary I/O operation is calling the OpenAI API, which may take up to 2 minutes. The orchestrator includes retry logic with backoff delays between attempts.

The architect initially proposed a synchronous orchestrator using `time.sleep()` for backoff, called from async route handlers via `loop.run_in_executor(None, ...)`. The domain expert and devil's advocate identified several problems with this approach:

1. **Cancellation does not propagate.** If the client disconnects or a gateway timeout fires, the async task is cancelled, but the thread running the synchronous orchestrator keeps going, holding an LLM connection and burning resources.
2. **Thread pool sizing.** Each in-flight request occupies a thread for up to the full 2-minute timeout budget. The default executor behind `run_in_executor(None, ...)` is sized at `min(32, cpu_count + 4)` threads in current CPython (8 threads on a 4-core host), so a modest burst of 20 concurrent requests can exhaust it, causing queuing delays that erode the timeout budget before the orchestrator even runs.
3. **Sync/async mixing.** If the LLM adapter uses the async OpenAI client (`AsyncOpenAI`), calling it from a synchronous orchestrator requires `asyncio.run()` inside the thread, which creates a new event loop per call and is an antipattern.

## Decision

The orchestrator, LLM client, and all I/O operations are fully async.

- `LLMPort.complete` is `async def`.
- `LLMClient` uses `openai.AsyncOpenAI` / `openai.AsyncAzureOpenAI`.
- `StandardOrchestrator.analyze` is `async def` and uses `asyncio.sleep` for backoff.
- Route handlers call `await orchestrator.analyze(...)` directly.

## Options Considered

### Option A: Sync orchestrator + run_in_executor (rejected)

- **Pro**: Simpler to reason about sequentially. `time.sleep` and synchronous exception handling are straightforward.
- **Con**: Cancellation issues, thread pool exhaustion, sync/async mixing bugs.
  The "simplicity" is illusory: the impedance mismatch creates subtle correctness problems.

### Option B: Fully async (chosen)

- **Pro**: Native cancellation propagation via `asyncio.Task.cancel()`. No thread pool sizing concerns. `asyncio.sleep` is non-blocking, so other requests can be served during backoff. Idiomatic FastAPI.
- **Con**: Async test fixtures require `pytest-asyncio`. Slightly more ceremony in test setup.
- **Mitigation**: `pytest-asyncio` is lightweight and widely used.

### Option C: Hybrid, async route with sync LLM call in executor (not chosen)

- **Pro**: Keeps the orchestrator simple.
- **Con**: Same cancellation and thread pool problems as Option A, just with less code in the executor.

## Consequences

- All port interfaces define `async` methods.
- Tests for the orchestrator use `pytest-asyncio` and `async def` test functions.
- `asyncio.sleep` is patched in tests (not `time.sleep`).
- The `openai` SDK's async client (`AsyncOpenAI`) is used, which returns the same response types as the sync client.
- No thread pool is used for request handling. Uvicorn's event loop handles all concurrency.

## Participants

- Domain expert (identified cancellation propagation issue)
- Devil's advocate (proposed async as strictly simpler)
- Architect (accepted the async model)
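
## Illustrative Sketch

The decision can be illustrated with a minimal sketch of an async orchestrator that retries with `asyncio.sleep` backoff and lets cancellation reach the in-flight LLM call. This is not the project's actual code. The `complete(prompt: str) -> str` signature, the `TransientLLMError` type, the `max_attempts` and `base_delay` parameters, the exponential schedule, and the `FlakyLLM` and `SlowLLM` stand-ins are all assumptions made for illustration.

```python
import asyncio
from typing import Protocol


class LLMPort(Protocol):
    """Port named in this ADR; the exact signature is an assumption."""

    async def complete(self, prompt: str) -> str: ...


class TransientLLMError(Exception):
    """Hypothetical error type standing in for retryable LLM failures."""


class StandardOrchestrator:
    def __init__(self, llm: LLMPort, max_attempts: int = 3, base_delay: float = 0.5) -> None:
        self._llm = llm
        self._max_attempts = max_attempts
        self._base_delay = base_delay

    async def analyze(self, prompt: str) -> str:
        for attempt in range(1, self._max_attempts + 1):
            try:
                return await self._llm.complete(prompt)
            except TransientLLMError:
                if attempt == self._max_attempts:
                    raise
                # Non-blocking backoff (assumed exponential schedule): the event
                # loop can serve other requests while this coroutine waits.
                await asyncio.sleep(self._base_delay * 2 ** (attempt - 1))
        # Only reached when max_attempts < 1.
        raise RuntimeError("max_attempts must be at least 1")


class FlakyLLM:
    """Hypothetical adapter that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.calls = 0
        self._failures = failures

    async def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self._failures:
            raise TransientLLMError("temporary failure")
        return f"analysis of {prompt!r}"


class SlowLLM:
    """Hypothetical adapter that simulates a 2-minute call and records cancellation."""

    def __init__(self) -> None:
        self.cancelled = False

    async def complete(self, prompt: str) -> str:
        try:
            await asyncio.sleep(120)
        except asyncio.CancelledError:
            self.cancelled = True  # the in-flight call observes the cancellation
            raise
        return "finished without being cancelled"


async def demo() -> tuple[str, int, bool]:
    # Retry path: two transient failures, then success on the third attempt.
    flaky = FlakyLLM(failures=2)
    result = await StandardOrchestrator(flaky, base_delay=0.01).analyze("doc")

    # Cancellation path: cancelling the task interrupts the LLM call itself.
    slow = SlowLLM()
    task = asyncio.create_task(StandardOrchestrator(slow).analyze("doc"))
    await asyncio.sleep(0.01)  # give the task time to start the LLM call
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result, flaky.calls, slow.cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because `asyncio.CancelledError` derives from `BaseException`, the `except TransientLLMError` clause does not swallow it, so a cancelled request stops at whichever `await` it is currently suspended on. With a thread started through `run_in_executor`, the equivalent `time.sleep` or blocking call would simply run to completion.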