# ADR-003: Fully Async Concurrency Model

## Status

Accepted

## Context

The backend uses FastAPI, an async-native ASGI framework. The primary I/O
operation is calling the OpenAI API, which may take up to 2 minutes. The
orchestrator includes retry logic with backoff delays between attempts.

The architect initially proposed a synchronous orchestrator using
`time.sleep()` for backoff, called from async route handlers via
`asyncio.run_in_executor(None, ...)`.

The domain expert and devil's advocate identified several problems with
this approach:

1. **Cancellation does not propagate.** If the client disconnects or a
   gateway timeout fires, the async task is cancelled but the thread
   running the synchronous orchestrator continues, holding an LLM
   connection and burning resources.
2. **Thread pool sizing.** Each in-flight request occupies a thread. With
   a 2-minute timeout budget, a modest burst of 20 concurrent requests
   exhausts the default thread pool (40 threads), causing queuing delays
   that erode the timeout budget before the orchestrator even runs.
3. **Sync/async mixing.** If the LLM adapter uses the async OpenAI client
   (`AsyncOpenAI`), calling it from a synchronous orchestrator requires
   `asyncio.run()` inside the thread — creating a new event loop per call,
   which is an antipattern.

## Decision

The orchestrator, LLM client, and all I/O operations are fully async.

- `LLMPort.complete` is `async def`.
- `LLMClient` uses `openai.AsyncOpenAI` / `openai.AsyncAzureOpenAI`.
- `StandardOrchestrator.analyze` is `async def`, uses `asyncio.sleep` for
  backoff.
- Route handlers call `await orchestrator.analyze(...)` directly.

## Options Considered

### Option A: Sync orchestrator + run_in_executor (rejected)

- **Pro**: Simpler to reason about sequentially. `time.sleep` and synchronous
  exception handling are straightforward.
- **Con**: Cancellation issues, thread pool exhaustion, sync/async mixing
  bugs. The "simplicity" is illusory — the impedance mismatch creates subtle
  correctness problems.

### Option B: Fully async (chosen)

- **Pro**: Native cancellation propagation via `asyncio.Task.cancel()`.
  No thread pool sizing concerns. `asyncio.sleep` is non-blocking — other
  requests can be served during backoff. Idiomatic FastAPI.
- **Con**: Async test fixtures require `pytest-asyncio`. Slightly more
  ceremony in test setup.
- **Mitigation**: `pytest-asyncio` is lightweight and widely used.

### Option C: Hybrid — async route, sync LLM call in executor (not chosen)

- **Pro**: Keeps the orchestrator simple.
- **Con**: Same cancellation and thread pool problems as Option A, just
  with less code in the executor.

## Consequences

- All port interfaces define `async` methods.
- Tests for the orchestrator use `pytest-asyncio` and `async def` test
  functions.
- `asyncio.sleep` is patched in tests (not `time.sleep`).
- The `openai` SDK's async client (`AsyncOpenAI`) is used, which returns
  the same response types as the sync client.
- No thread pool is used for request handling. Uvicorn's event loop handles
  all concurrency.

## Participants

- Domain expert (identified cancellation propagation issue)
- Devil's advocate (proposed async as strictly simpler)
- Architect (accepted the async model)