qfa.adapters.llm_client
LLM client adapter using LiteLLM for unified provider access.
Classes
| LiteLLMClient | LLM adapter satisfying LLMPort via LiteLLM. |
- class qfa.adapters.llm_client.LiteLLMClient(model: str, api_key: str, api_base: str, api_version: str, chars_per_token: int, max_total_tokens: int)
Bases: LLMPort
LLM adapter satisfying LLMPort via LiteLLM.
Routes to any LLM provider based on the model string prefix (e.g. "azure/gpt-4", "azure_ai/mistral-large-2411"). Calculates per-call cost using LiteLLM’s built-in cost map or custom pricing registered via litellm.register_model().
- Parameters:
- async complete(system_message: str, user_message: str, tenant_id: str, response_model: type[T_Response], timeout: float = 20.0) -> LLMResponse
Send a completion request via LiteLLM.
- Parameters:
- Returns:
The model’s response including token usage and cost.
- Return type:
LLMResponse
- Raises:
LLMTimeoutError – When the provider does not respond in time.
LLMRateLimitError – When the provider returns a rate-limit response.
LLMError – For any other provider error or empty response.
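A caller might treat timeouts and rate limits as transient and retry them, while letting any other LLMError propagate. The sketch below illustrates that pattern under stated assumptions: the three exception classes are local stand-ins (the hierarchy, with both specific errors deriving from LLMError, is assumed rather than taken from qfa), and FakeClient, FakeResponse, and complete_with_retry are hypothetical names that only mimic the complete() signature documented above.

```python
import asyncio
from dataclasses import dataclass


# Local stand-ins for the errors listed under "Raises". The real classes
# live in the qfa package; this inheritance hierarchy is an assumption.
class LLMError(Exception):
    pass


class LLMTimeoutError(LLMError):
    pass


class LLMRateLimitError(LLMError):
    pass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for LLMResponse; the field name is assumed."""
    text: str


class FakeClient:
    """Hypothetical stub that mimics the documented complete() signature."""

    def __init__(self, failures):
        # Exceptions to raise, in order, before a call finally succeeds.
        self._failures = list(failures)

    async def complete(self, system_message, user_message, tenant_id,
                       response_model, timeout=20.0):
        if self._failures:
            raise self._failures.pop(0)
        return FakeResponse(text=f"echo: {user_message}")


async def complete_with_retry(client, user_message, max_attempts=3, backoff=0.0):
    """Retry timeouts and rate limits; let every other LLMError propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await client.complete(
                system_message="You are a helpful assistant.",
                user_message=user_message,
                tenant_id="tenant-123",  # illustrative value only
                response_model=FakeResponse,
                timeout=20.0,
            )
        except (LLMTimeoutError, LLMRateLimitError):
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            await asyncio.sleep(backoff * attempt)  # linear backoff


def demo():
    # Two transient failures, then the third attempt succeeds.
    client = FakeClient([LLMRateLimitError("429"), LLMTimeoutError("slow")])
    return asyncio.run(complete_with_retry(client, "hello"))
```

With a real LiteLLMClient, the same wrapper would presumably apply unchanged, since it relies only on the complete() signature and the exception names listed above. Whether retrying is appropriate, and with what backoff, depends on the provider and is not specified by this page.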