qfa.adapters.llm_client

LLM client adapter using LiteLLM for unified provider access.

Classes

LiteLLMClient(model, api_key, api_base, ...)

LLM adapter satisfying LLMPort via LiteLLM.

class qfa.adapters.llm_client.LiteLLMClient(model: str, api_key: str, api_base: str, api_version: str, chars_per_token: int, max_total_tokens: int)

Bases: LLMPort

LLM adapter satisfying LLMPort via LiteLLM.

Routes to any LLM provider based on the model string prefix (e.g. "azure/gpt-4", "azure_ai/mistral-large-2411"). Calculates per-call cost using LiteLLM’s built-in cost map or custom pricing registered via litellm.register_model().
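The prefix convention and the per-call cost arithmetic described above can be illustrated with a small standalone sketch. It does not call LiteLLM and is not the adapter's implementation: split_model_id, Pricing, and call_cost are hypothetical names, and the prices are made-up illustrative values.

```python
from dataclasses import dataclass


def split_model_id(model: str) -> tuple[str, str]:
    """Split a "provider/model-name" identifier into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix present: this sketch reports an empty provider
        # instead of guessing one.
        return "", model
    return provider, name


@dataclass(frozen=True)
class Pricing:
    # Hypothetical per-token prices in USD (illustrative values only).
    input_cost_per_token: float
    output_cost_per_token: float


def call_cost(pricing: Pricing, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one call: input tokens and output tokens are priced separately."""
    return (
        prompt_tokens * pricing.input_cost_per_token
        + completion_tokens * pricing.output_cost_per_token
    )


provider, name = split_model_id("azure_ai/mistral-large-2411")
print(provider, name)
print(call_cost(Pricing(2e-6, 6e-6), prompt_tokens=1200, completion_tokens=300))
```

In the real adapter, the price lookup is delegated to LiteLLM's cost map or to pricing registered via litellm.register_model(), so a table like Pricing would not be maintained by hand.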

Parameters:

model (str) – LiteLLM model identifier (e.g. "azure_ai/mistral-large-2411").
api_key (str) – API key for the provider.
api_base (str) – Base URL for the provider endpoint. Empty string if not needed.
api_version (str) – API version string. Empty string if not needed.
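The signature also lists chars_per_token and max_total_tokens, which this section does not describe. Going by the names alone, one plausible reading is a character-based token estimate checked against a budget. The sketch below shows that reading only; it is an assumption, not the adapter's documented behaviour, and estimate_tokens and fits_budget are hypothetical helpers.

```python
import math


def estimate_tokens(text: str, chars_per_token: int) -> int:
    """Rough token estimate: character count divided by chars_per_token, rounded up."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def fits_budget(
    system_message: str,
    user_message: str,
    chars_per_token: int,
    max_total_tokens: int,
) -> bool:
    """Return True when the estimated size of both messages is within the budget."""
    total = estimate_tokens(system_message, chars_per_token) + estimate_tokens(
        user_message, chars_per_token
    )
    return total <= max_total_tokens


# Assumed values for illustration: about 4 characters per token, 8000-token budget.
print(fits_budget("You are a helpful assistant.", "Summarise this feedback.", 4, 8000))
```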

async complete(system_message: str, user_message: str, tenant_id: str, response_model: type[T_Response], timeout: float = 20.0) → LLMResponse

Send a completion request via LiteLLM.

Parameters:

system_message (str) – The system-level instruction for the model.
user_message (str) – The user-level message to complete.
tenant_id (str) – Tenant identifier, passed as the user parameter for the audit trail.
timeout (float) – Maximum time in seconds to wait for a response. Defaults to 20.0.

Returns:

The model’s response including token usage and cost.

Return type:

LLMResponse

Raises:

LLMTimeoutError – When the provider does not respond in time.
LLMRateLimitError – When the provider returns a rate-limit response.
LLMError – For any other provider error or empty response.
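A caller might handle the three exceptions above differently, for example by retrying only on rate limits and letting timeouts and other errors propagate. The sketch below is self-contained, so it uses stand-ins throughout: the exception classes are local copies of the names listed above (their subclass hierarchy is an assumption), FakeClient is a hypothetical substitute for LiteLLMClient, and complete_with_retry is an illustrative helper, not part of qfa.

```python
import asyncio


# Local stand-ins for the exception names listed above. The real classes live
# in the qfa package; the subclass relationship shown here is an assumption.
class LLMError(Exception):
    pass


class LLMTimeoutError(LLMError):
    pass


class LLMRateLimitError(LLMError):
    pass


class FakeClient:
    """Hypothetical stand-in with the same complete() signature as the adapter."""

    def __init__(self, failures):
        self._failures = list(failures)  # exceptions to raise, one per call

    async def complete(self, system_message, user_message, tenant_id,
                       response_model, timeout=20.0):
        if self._failures:
            raise self._failures.pop(0)
        return {"text": "ok"}  # the real method returns an LLMResponse


async def complete_with_retry(client, *, attempts=3, base_delay=0.5, **kwargs):
    """Retry only on rate limits, with exponential backoff between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await client.complete(**kwargs)
        except LLMRateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            await asyncio.sleep(base_delay * 2 ** attempt)
        # LLMTimeoutError and any other LLMError are not caught and propagate.


result = asyncio.run(
    complete_with_retry(
        FakeClient([LLMRateLimitError("429")]),  # first call fails, second succeeds
        base_delay=0.0,  # no waiting in this demo
        system_message="You are a helpful assistant.",
        user_message="Summarise this feedback.",
        tenant_id="tenant-123",
        response_model=dict,
    )
)
print(result)
```

Whether a timeout should also be retried depends on the caller's latency budget; this sketch leaves it to propagate so the caller decides.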