Commit 72d001fa authored by Your Name's avatar Your Name

Completed refactoring of the providers module

parent d1079827
......@@ -44,13 +44,34 @@ AISBF is a modular proxy server for managing multiple AI provider integrations.
## Directory Structure
```
geminiproxy/
aisbf/
├── aisbf/ # Main Python module
│ ├── __init__.py # Module initialization with exports
│ ├── config.py # Configuration management
│ ├── models.py # Pydantic models
│ ├── providers.py # Provider handlers
│ └── handlers.py # Request handlers
│ ├── handlers.py # Request handlers
│ ├── auth/ # Authentication modules
│ │ ├── __init__.py # Auth module exports
│ │ ├── kiro.py # Kiro auth manager
│ │ ├── claude.py # Claude OAuth2 auth (was claude_auth.py)
│ │ └── kilo.py # Kilo OAuth2 auth (was kilo_oauth2.py)
│ └── providers/ # Provider handlers (split into individual modules)
│ ├── __init__.py # Re-exports, PROVIDER_HANDLERS, get_provider_handler()
│ ├── base.py # BaseProviderHandler, AnthropicFormatConverter, AdaptiveRateLimiter
│ ├── google.py # GoogleProviderHandler
│ ├── openai.py # OpenAIProviderHandler
│ ├── anthropic.py # AnthropicProviderHandler
│ ├── claude.py # ClaudeProviderHandler (OAuth2-based)
│ ├── kilo.py # KiloProviderHandler (OAuth2-based)
│ ├── ollama.py # OllamaProviderHandler
│ └── kiro/ # Kiro provider (subpackage)
│ ├── __init__.py
│ ├── handler.py
│ ├── converters.py
│ ├── converters_openai.py
│ ├── models.py
│ ├── parsers.py
│ └── utils.py
├── config/ # Configuration files directory
│ ├── providers.json # Default provider configurations
│ └── rotations.json # Default rotation configurations
......@@ -63,7 +84,6 @@ geminiproxy/
├── start_proxy.sh # Development start script
├── aisbf.sh # Alternative start script
├── requirements.txt # Python dependencies
├── INSTALL.md # Installation guide
├── PYPI.md # PyPI publishing guide
├── DOCUMENTATION.md # Complete API documentation
└── README.md # Project documentation
......@@ -95,14 +115,23 @@ Pydantic models for data validation:
- `Provider` - Provider information
- `ErrorTracking` - Error tracking data
### aisbf/providers.py
Provider handler implementations:
- `BaseProviderHandler` - Base class with rate limiting and error tracking
- `GoogleProviderHandler` - Google GenAI integration
- `OpenAIProviderHandler` - OpenAI API integration
- `AnthropicProviderHandler` - Anthropic API integration
- `OllamaProviderHandler` - Ollama local integration
- `get_provider_handler()` - Factory function to get appropriate handler
### aisbf/providers/ (package)
Provider handler implementations, split into individual modules:
- `base.py` - `BaseProviderHandler`, `AnthropicFormatConverter`, `AdaptiveRateLimiter`, shared utilities
- `google.py` - `GoogleProviderHandler` - Google GenAI integration
- `openai.py` - `OpenAIProviderHandler` - OpenAI API integration
- `anthropic.py` - `AnthropicProviderHandler` - Anthropic API key integration
- `claude.py` - `ClaudeProviderHandler` - Claude OAuth2 integration
- `kilo.py` - `KiloProviderHandler` - Kilo Gateway OAuth2 integration
- `ollama.py` - `OllamaProviderHandler` - Ollama local integration
- `kiro/` - `KiroProviderHandler` - Kiro/Amazon Q Developer integration (subpackage)
- `__init__.py` - Re-exports all handlers, `PROVIDER_HANDLERS` dict, `get_provider_handler()` factory
### aisbf/auth/ (package)
Authentication modules:
- `kiro.py` - `KiroAuthManager` for Kiro/Amazon Q Developer auth
- `claude.py` - `ClaudeAuth` for Claude OAuth2 PKCE flow (was `aisbf/claude_auth.py`)
- `kilo.py` - `KiloOAuth2` for Kilo Device Authorization Grant (was `aisbf/kilo_oauth2.py`)
### aisbf/handlers.py
Request handling logic:
......@@ -327,8 +356,8 @@ When making changes:
## Common Tasks
### Adding a New Provider
1. Create handler class in `aisbf/providers.py` inheriting from `BaseProviderHandler`
2. Add to `PROVIDER_HANDLERS` dictionary
1. Create a new file `aisbf/providers/<provider_name>.py` with a handler class inheriting from `BaseProviderHandler`
2. Import and add the handler to `PROVIDER_HANDLERS` in `aisbf/providers/__init__.py`
3. Add provider configuration to `config/providers.json`
### Configuration Architecture
......@@ -538,8 +567,8 @@ Claude Code is an OAuth2-based authentication method for accessing Claude models
- Supports all Claude features: streaming, tools, vision, extended thinking
**Integration Architecture:**
- [`ClaudeAuth`](aisbf/claude_auth.py) class handles OAuth2 PKCE flow
- [`ClaudeProviderHandler`](aisbf/providers.py) in [`aisbf/providers.py`](aisbf/providers.py) manages API requests
- [`ClaudeAuth`](aisbf/auth/claude.py) class handles OAuth2 PKCE flow
- [`ClaudeProviderHandler`](aisbf/providers/claude.py) in [`aisbf/providers/claude.py`](aisbf/providers/claude.py) manages API requests
- Supports all standard AISBF features: streaming, tools, rate limiting, error tracking
**Configuration:**
......@@ -702,6 +731,18 @@ This AI.PROMPT file is automatically updated when significant changes are made t
### Recent Updates
**2026-04-03 - Provider Module Refactoring (Phase 2)**
- Split monolithic `aisbf/providers/__init__.py` (301K chars) into individual module files
- Created `aisbf/providers/base.py` with shared utilities: `BaseProviderHandler`, `AnthropicFormatConverter`, `AdaptiveRateLimiter`, `AISBF_DEBUG`
- Created individual handler files: `google.py`, `openai.py`, `anthropic.py`, `claude.py`, `kilo.py`, `ollama.py`
- Moved auth modules: `aisbf/claude_auth.py` → `aisbf/auth/claude.py`, `aisbf/kilo_oauth2.py` → `aisbf/auth/kilo.py`
- Updated `aisbf/auth/__init__.py` to export `ClaudeAuth` and `KiloOAuth2`
- Rewrote `aisbf/providers/__init__.py` as slim re-export module with `PROVIDER_HANDLERS` dict and `get_provider_handler()` factory
- Updated `aisbf/__init__.py` to import new handler classes (`ClaudeProviderHandler`, `KiloProviderHandler`)
- Updated `pyproject.toml` and `setup.py` with new file paths
- All existing `from aisbf.providers import X` imports continue to work (backward compatible)
- Deleted old `aisbf/claude_auth.py` and `aisbf/kilo_oauth2.py` (now in `aisbf/auth/`)
**2026-03-23 - TOR Hidden Service Support**
- Added full TOR hidden service support for exposing AISBF over TOR network
- Created aisbf/tor.py module with TorHiddenService class for managing TOR connections
......
......@@ -39,12 +39,16 @@ from .providers import (
GoogleProviderHandler,
OpenAIProviderHandler,
AnthropicProviderHandler,
ClaudeProviderHandler,
KiloProviderHandler,
OllamaProviderHandler,
get_provider_handler,
PROVIDER_HANDLERS
)
from .providers.kiro import KiroProviderHandler
from .auth.kiro import KiroAuthManager
from .auth.claude import ClaudeAuth
from .auth.kilo import KiloOAuth2
from .handlers import RequestHandler, RotationHandler, AutoselectHandler
from .utils import count_messages_tokens, split_messages_into_chunks, get_max_request_tokens_for_model
......@@ -73,11 +77,15 @@ __all__ = [
"OpenAIProviderHandler",
"AnthropicProviderHandler",
"OllamaProviderHandler",
"ClaudeProviderHandler",
"KiloProviderHandler",
"KiroProviderHandler",
"get_provider_handler",
"PROVIDER_HANDLERS",
# Auth
"KiroAuthManager",
"ClaudeAuth",
"KiloOAuth2",
# Handlers
"RequestHandler",
"RotationHandler",
......
......@@ -22,8 +22,12 @@ Why did the programmer quit his job? Because he didn't get arrays!
"""
from .kiro import KiroAuthManager, AuthType
from .claude import ClaudeAuth
from .kilo import KiloOAuth2
__all__ = [
"KiroAuthManager",
"AuthType",
"ClaudeAuth",
"KiloOAuth2",
]
This source diff could not be displayed because it is too large. You can view the blob instead.
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Anthropic provider handler (API key-based).
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import time
from typing import Dict, List, Optional, Union
from anthropic import Anthropic
from ..models import Model
from ..config import config
from ..utils import count_messages_tokens
from .base import BaseProviderHandler, AnthropicFormatConverter, AISBF_DEBUG
class AnthropicProviderHandler(BaseProviderHandler):
def __init__(self, provider_id: str, api_key: str):
super().__init__(provider_id, api_key)
self.client = Anthropic(api_key=api_key)
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Dict:
if self.is_rate_limited():
raise Exception("Provider rate limited")
try:
import logging
logging.info(f"AnthropicProviderHandler: Handling request for model {model}")
if AISBF_DEBUG:
logging.info(f"AnthropicProviderHandler: Messages: {messages}")
else:
logging.info(f"AnthropicProviderHandler: Messages count: {len(messages)}")
# Apply rate limiting
await self.apply_rate_limit()
# Check if native caching is enabled for this provider
provider_config = config.providers.get(self.provider_id)
enable_native_caching = getattr(provider_config, 'enable_native_caching', False)
min_cacheable_tokens = getattr(provider_config, 'min_cacheable_tokens', 1000)
logging.info(f"AnthropicProviderHandler: Native caching enabled: {enable_native_caching}")
if enable_native_caching:
logging.info(f"AnthropicProviderHandler: Min cacheable tokens: {min_cacheable_tokens}")
# Convert OpenAI messages to Anthropic format
system_message = None
anthropic_messages = []
for msg in messages:
role = msg.get('role')
content = msg.get('content')
if role == 'system':
system_message = content
logging.info(f"AnthropicProviderHandler: Extracted system message ({len(content) if content else 0} chars)")
elif role == 'tool':
tool_call_id = msg.get('tool_call_id', msg.get('name', 'unknown'))
tool_result_block = {
'type': 'tool_result',
'tool_use_id': tool_call_id,
'content': content or ""
}
if anthropic_messages and anthropic_messages[-1]['role'] == 'user':
last_content = anthropic_messages[-1]['content']
if isinstance(last_content, str):
anthropic_messages[-1]['content'] = [
{'type': 'text', 'text': last_content},
tool_result_block
]
elif isinstance(last_content, list):
anthropic_messages[-1]['content'].append(tool_result_block)
logging.info(f"AnthropicProviderHandler: Appended tool_result to existing user message")
else:
anthropic_messages.append({
'role': 'user',
'content': [tool_result_block]
})
logging.info(f"AnthropicProviderHandler: Created new user message with tool_result")
elif role == 'assistant':
tool_calls = msg.get('tool_calls')
if tool_calls:
content_blocks = []
if content and isinstance(content, str) and content.strip():
content_blocks.append({'type': 'text', 'text': content})
elif content and isinstance(content, list):
content_blocks.extend(content)
import json as _json
for tc in tool_calls:
tool_id = tc.get('id', f"toolu_{len(content_blocks)}")
function = tc.get('function', {})
tool_name = function.get('name', '')
arguments = function.get('arguments', {})
if isinstance(arguments, str):
try:
arguments = _json.loads(arguments)
except _json.JSONDecodeError:
logging.warning(f"AnthropicProviderHandler: Failed to parse tool arguments: {arguments}")
arguments = {}
content_blocks.append({
'type': 'tool_use',
'id': tool_id,
'name': tool_name,
'input': arguments
})
logging.info(f"AnthropicProviderHandler: Converted tool_call to tool_use: {tool_name}")
if content_blocks:
anthropic_messages.append({
'role': 'assistant',
'content': content_blocks
})
else:
if content is not None:
anthropic_messages.append({
'role': 'assistant',
'content': content
})
else:
logging.info(f"AnthropicProviderHandler: Skipping assistant message with None content")
elif role == 'user':
if isinstance(content, list):
content_blocks = []
for block in content:
if isinstance(block, dict):
block_type = block.get('type', '')
if block_type == 'text':
content_blocks.append(block)
elif block_type == 'image_url':
image_url_obj = block.get('image_url', {})
url = image_url_obj.get('url', '') if isinstance(image_url_obj, dict) else ''
if url.startswith('data:'):
try:
header, data = url.split(',', 1)
media_type = header.split(';')[0].replace('data:', '')
content_blocks.append({
'type': 'image',
'source': {
'type': 'base64',
'media_type': media_type,
'data': data
}
})
except (ValueError, IndexError) as e:
logging.warning(f"AnthropicProviderHandler: Failed to parse data URL: {e}")
elif url.startswith(('http://', 'https://')):
content_blocks.append({
'type': 'image',
'source': {
'type': 'url',
'url': url
}
})
else:
content_blocks.append(block)
elif isinstance(block, str):
content_blocks.append({'type': 'text', 'text': block})
anthropic_messages.append({
'role': 'user',
'content': content_blocks if content_blocks else content or ""
})
else:
anthropic_messages.append({
'role': 'user',
'content': content or ""
})
else:
logging.warning(f"AnthropicProviderHandler: Unknown message role '{role}', treating as user")
anthropic_messages.append({
'role': 'user',
'content': content or ""
})
logging.info(f"AnthropicProviderHandler: Converted {len(messages)} OpenAI messages to {len(anthropic_messages)} Anthropic messages")
if system_message:
logging.info(f"AnthropicProviderHandler: System message extracted ({len(system_message)} chars)")
# Apply cache_control if native caching is enabled
if enable_native_caching:
cumulative_tokens = 0
for i, msg in enumerate(anthropic_messages):
message_tokens = count_messages_tokens([{'role': msg['role'], 'content': msg['content'] if isinstance(msg['content'], str) else str(msg['content'])}], model)
cumulative_tokens += message_tokens
if i < len(anthropic_messages) - 2 and cumulative_tokens >= min_cacheable_tokens:
content = msg.get('content')
if isinstance(content, str) and content.strip():
msg['content'] = [
{
'type': 'text',
'text': content,
'cache_control': {'type': 'ephemeral'}
}
]
elif isinstance(content, list) and content:
content[-1]['cache_control'] = {'type': 'ephemeral'}
logging.info(f"AnthropicProviderHandler: Applied cache_control to message {i} ({message_tokens} tokens, cumulative: {cumulative_tokens})")
# Also apply cache_control to system message if present
if system_message:
system_message_param = [{
'type': 'text',
'text': system_message,
'cache_control': {'type': 'ephemeral'}
}]
else:
system_message_param = None
else:
system_message_param = system_message
# Convert OpenAI tools to Anthropic format
anthropic_tools = None
if tools:
anthropic_tools = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
anthropic_tools.append({
"name": function.get("name", ""),
"description": function.get("description", ""),
"input_schema": function.get("parameters", {})
})
logging.info(f"AnthropicProviderHandler: Converted tool to Anthropic format: {function.get('name')}")
if not anthropic_tools:
anthropic_tools = None
# Convert OpenAI tool_choice to Anthropic format
anthropic_tool_choice = None
if tool_choice and anthropic_tools:
if isinstance(tool_choice, str):
if tool_choice == "auto":
anthropic_tool_choice = {"type": "auto"}
elif tool_choice == "required":
anthropic_tool_choice = {"type": "any"}
elif tool_choice == "none":
anthropic_tool_choice = None
elif isinstance(tool_choice, dict):
if tool_choice.get("type") == "function":
func_name = tool_choice.get("function", {}).get("name")
if func_name:
anthropic_tool_choice = {"type": "tool", "name": func_name}
# Build API call parameters
api_params = {
'model': model,
'messages': anthropic_messages,
'max_tokens': max_tokens or 4096,
'temperature': temperature,
}
if system_message_param:
api_params['system'] = system_message_param
if anthropic_tools:
api_params['tools'] = anthropic_tools
if anthropic_tool_choice:
api_params['tool_choice'] = anthropic_tool_choice
if AISBF_DEBUG:
import json as _json
logging.info(f"=== ANTHROPIC API REQUEST PAYLOAD ===")
debug_params = dict(api_params)
logging.info(f"Request keys: {list(debug_params.keys())}")
logging.info(f"Model: {debug_params.get('model')}")
logging.info(f"Messages count: {len(debug_params.get('messages', []))}")
logging.info(f"Tools count: {len(debug_params.get('tools', []) or [])}")
logging.info(f"Tool choice: {debug_params.get('tool_choice')}")
logging.info(f"System: {'present' if debug_params.get('system') else 'none'}")
logging.info(f"Full payload: {_json.dumps(debug_params, indent=2, default=str)}")
logging.info(f"=== END ANTHROPIC API REQUEST PAYLOAD ===")
response = self.client.messages.create(**api_params)
logging.info(f"AnthropicProviderHandler: Response received: {response}")
self.record_success()
# Dump raw response if AISBF_DEBUG is enabled
if AISBF_DEBUG:
logging.info(f"=== RAW ANTHROPIC RESPONSE ===")
logging.info(f"Raw response type: {type(response)}")
logging.info(f"Raw response: {response}")
logging.info(f"Raw response dir: {dir(response)}")
logging.info(f"=== END RAW ANTHROPIC RESPONSE ===")
logging.info(f"=== ANTHROPIC RESPONSE PARSING START ===")
logging.info(f"Response type: {type(response)}")
logging.info(f"Response attributes: {dir(response)}")
content_text = ""
tool_calls = None
try:
if hasattr(response, 'content') and response.content:
logging.info(f"Response has 'content' attribute")
logging.info(f"Content blocks: {response.content}")
logging.info(f"Content blocks count: {len(response.content)}")
text_parts = []
openai_tool_calls = []
call_id = 0
for idx, block in enumerate(response.content):
logging.info(f"Processing block {idx}")
logging.info(f"Block type: {type(block)}")
logging.info(f"Block attributes: {dir(block)}")
if hasattr(block, 'text') and block.text:
logging.info(f"Block {idx} has 'text' attribute")
text_parts.append(block.text)
logging.info(f"Block {idx} text length: {len(block.text)}")
if hasattr(block, 'type') and block.type == 'tool_use':
logging.info(f"Block {idx} is a tool_use block")
logging.info(f"Tool use block: {block}")
try:
import json as _json_tc
raw_input = block.input if hasattr(block, 'input') else {}
arguments_str = _json_tc.dumps(raw_input) if isinstance(raw_input, dict) else str(raw_input)
openai_tool_call = {
"id": block.id if hasattr(block, 'id') else f"call_{call_id}",
"type": "function",
"function": {
"name": block.name if hasattr(block, 'name') else "",
"arguments": arguments_str
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Converted tool_use to OpenAI format: {openai_tool_call}")
except Exception as e:
logging.error(f"Error converting tool_use: {e}", exc_info=True)
content_text = "\n".join(text_parts)
logging.info(f"Combined text length: {len(content_text)}")
logging.info(f"Combined text (first 200 chars): {content_text[:200] if content_text else 'None'}")
if openai_tool_calls:
tool_calls = openai_tool_calls
logging.info(f"Total tool calls: {len(tool_calls)}")
for tc in tool_calls:
logging.info(f" - {tc}")
else:
logging.info(f"No tool calls found")
else:
logging.warning(f"Response does NOT have 'content' attribute or content is empty")
stop_reason_map = {
'end_turn': 'stop',
'max_tokens': 'length',
'stop_sequence': 'stop',
'tool_use': 'tool_calls'
}
stop_reason = getattr(response, 'stop_reason', 'stop')
finish_reason = stop_reason_map.get(stop_reason, 'stop')
logging.info(f"Anthropic stop_reason: {stop_reason}")
logging.info(f"Mapped finish_reason: {finish_reason}")
except Exception as e:
logging.error(f"AnthropicProviderHandler: Exception during response parsing: {e}", exc_info=True)
content_text = ""
logging.info(f"=== ANTHROPIC RESPONSE PARSING END ===")
# Build OpenAI-style response
openai_response = {
"id": f"anthropic-{model}-{int(time.time())}",
"object": "chat.completion",
"created": int(time.time()),
"model": f"{self.provider_id}/{model}",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": content_text if content_text else None
},
"finish_reason": finish_reason
}],
"usage": {
"prompt_tokens": getattr(getattr(response, "usage", None), "input_tokens", 0) or 0,
"completion_tokens": getattr(getattr(response, "usage", None), "output_tokens", 0) or 0,
"total_tokens": (getattr(getattr(response, "usage", None), "input_tokens", 0) or 0) + (getattr(getattr(response, "usage", None), "output_tokens", 0) or 0)
}
}
if tool_calls:
openai_response["choices"][0]["message"]["tool_calls"] = tool_calls
openai_response["choices"][0]["message"]["content"] = None
logging.info(f"Added tool_calls to response message")
logging.info(f"=== FINAL ANTHROPIC RESPONSE STRUCTURE ===")
logging.info(f"Response id: {openai_response['id']}")
logging.info(f"Response model: {openai_response['model']}")
logging.info(f"Response choices[0] message content: {openai_response['choices'][0]['message']['content']}")
logging.info(f"Response choices[0] message tool_calls: {openai_response['choices'][0]['message'].get('tool_calls')}")
logging.info(f"Response choices[0] finish_reason: {openai_response['choices'][0]['finish_reason']}")
logging.info(f"Response usage: {openai_response['usage']}")
logging.info(f"=== END FINAL ANTHROPIC RESPONSE STRUCTURE ===")
logging.info(f"AnthropicProviderHandler: Returning response dict (no validation)")
logging.info(f"Response dict keys: {openai_response.keys()}")
if AISBF_DEBUG:
logging.info(f"=== FINAL ANTHROPIC RESPONSE DICT ===")
logging.info(f"Final response: {openai_response}")
logging.info(f"=== END FINAL ANTHROPIC RESPONSE DICT ===")
return openai_response
except Exception as e:
import logging
logging.error(f"AnthropicProviderHandler: Error: {str(e)}", exc_info=True)
self.record_failure()
raise e
async def get_models(self) -> List[Model]:
"""
Return list of available Anthropic models.
"""
try:
import logging
logging.info("=" * 80)
logging.info("AnthropicProviderHandler: Starting model list retrieval")
logging.info("=" * 80)
await self.apply_rate_limit()
try:
logging.info("AnthropicProviderHandler: Attempting to fetch models from API...")
logging.info("AnthropicProviderHandler: Note: Anthropic doesn't currently provide a public models endpoint")
logging.info("AnthropicProviderHandler: Checking if endpoint is now available...")
response = self.client.models.list()
if response:
logging.info(f"AnthropicProviderHandler: ✓ API call successful!")
logging.info(f"AnthropicProviderHandler: Retrieved models from API")
models = [Model(id=model.id, name=model.id, provider_id=self.provider_id) for model in response]
for model in models:
logging.info(f"AnthropicProviderHandler: - {model.id}")
logging.info("=" * 80)
logging.info(f"AnthropicProviderHandler: ✓ SUCCESS - Returning {len(models)} models from API")
logging.info(f"AnthropicProviderHandler: Source: Dynamic API retrieval")
logging.info("=" * 80)
return models
except AttributeError as attr_error:
logging.info(f"AnthropicProviderHandler: ✗ API endpoint not available")
logging.info(f"AnthropicProviderHandler: Error: {type(attr_error).__name__} - {str(attr_error)}")
logging.info("AnthropicProviderHandler: Reason: Anthropic SDK doesn't expose models.list() method")
logging.info("AnthropicProviderHandler: Action: Falling back to static list")
except Exception as api_error:
logging.warning(f"AnthropicProviderHandler: ✗ Exception during API call")
logging.warning(f"AnthropicProviderHandler: Error type: {type(api_error).__name__}")
logging.warning(f"AnthropicProviderHandler: Error message: {str(api_error)}")
logging.warning("AnthropicProviderHandler: Action: Falling back to static list")
if AISBF_DEBUG:
logging.warning(f"AnthropicProviderHandler: Full traceback:", exc_info=True)
logging.info("-" * 80)
logging.info("AnthropicProviderHandler: Using static fallback model list")
logging.info("AnthropicProviderHandler: Note: This is the expected behavior for Anthropic provider")
static_models = [
Model(id="claude-3-7-sonnet-20250219", name="Claude 3.7 Sonnet", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-5-sonnet-20241022", name="Claude 3.5 Sonnet", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-5-haiku-20241022", name="Claude 3.5 Haiku", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-opus-20240229", name="Claude 3 Opus", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-haiku-20240307", name="Claude 3 Haiku", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-sonnet-20240229", name="Claude 3 Sonnet", provider_id=self.provider_id, context_size=200000, context_length=200000),
]
for model in static_models:
logging.info(f"AnthropicProviderHandler: - {model.id} ({model.name})")
logging.info("=" * 80)
logging.info(f"AnthropicProviderHandler: ✓ Returning {len(static_models)} models from static list")
logging.info(f"AnthropicProviderHandler: Source: Static fallback configuration")
logging.info("=" * 80)
return static_models
except Exception as e:
import logging
logging.error("=" * 80)
logging.error(f"AnthropicProviderHandler: ✗ FATAL ERROR getting models: {str(e)}")
logging.error("=" * 80)
logging.error(f"AnthropicProviderHandler: Error details:", exc_info=True)
raise e
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Base provider handler and shared utilities for all provider implementations.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import asyncio
import time
import os
import random
from typing import Dict, List, Optional, Union
from ..models import Provider, Model, ErrorTracking
from ..config import config
from ..utils import count_messages_tokens
from ..database import get_database
from ..batching import get_request_batcher
# Check if debug mode is enabled
AISBF_DEBUG = os.environ.get('AISBF_DEBUG', '').lower() in ('true', '1', 'yes')
class AnthropicFormatConverter:
"""
Shared utility class for converting between OpenAI and Anthropic message formats.
Used by both AnthropicProviderHandler and ClaudeProviderHandler.
All methods are static to allow usage without instantiation.
"""
# Anthropic stop_reason → OpenAI finish_reason mapping
STOP_REASON_MAP = {
'end_turn': 'stop',
'max_tokens': 'length',
'stop_sequence': 'stop',
'tool_use': 'tool_calls'
}
@staticmethod
def sanitize_tool_call_id(tool_call_id: str) -> str:
"""Sanitize tool call ID for Anthropic API (alphanumeric, underscore, hyphen only)."""
import re
return re.sub(r'[^a-zA-Z0-9_-]', '_', tool_call_id)
@staticmethod
def filter_empty_content(content) -> Union[str, list, None]:
"""Filter empty content from messages for Anthropic API compatibility."""
if content is None:
return None
if isinstance(content, str):
return None if content.strip() == "" else content
if isinstance(content, list):
filtered = []
for block in content:
if isinstance(block, dict):
if block.get('type') == 'text':
text = block.get('text', '')
if text and text.strip():
filtered.append(block)
else:
filtered.append(block)
else:
filtered.append(block)
return filtered if filtered else None
return content
@staticmethod
def extract_images_from_content(content) -> list:
"""
Convert OpenAI image_url content blocks to Anthropic image source format.
Handles:
- data:image/jpeg;base64,... → {"type": "image", "source": {"type": "base64", ...}}
- https://... → {"type": "image", "source": {"type": "url", ...}}
"""
import logging
logger = logging.getLogger(__name__)
if not isinstance(content, list):
return []
images = []
max_image_size = 5 * 1024 * 1024 # 5MB
for block in content:
if not isinstance(block, dict):
continue
if block.get('type') != 'image_url':
continue
image_url_obj = block.get('image_url', {})
url = image_url_obj.get('url', '') if isinstance(image_url_obj, dict) else ''
if not url:
continue
if url.startswith('data:'):
try:
header, data = url.split(',', 1)
media_type = header.split(';')[0].replace('data:', '')
if len(data) > max_image_size:
logger.warning(f"Image too large ({len(data)} bytes), skipping")
continue
images.append({
'type': 'image',
'source': {'type': 'base64', 'media_type': media_type, 'data': data}
})
except (ValueError, IndexError) as e:
logger.warning(f"Failed to parse data URL: {e}")
elif url.startswith(('http://', 'https://')):
images.append({
'type': 'image',
'source': {'type': 'url', 'url': url}
})
elif block.get('type') == 'image' and 'source' in block:
images.append(block)
return images
@staticmethod
def convert_messages_to_anthropic(messages: list, sanitize_ids: bool = True) -> tuple:
"""
Convert OpenAI messages to Anthropic format.
Handles:
- System message extraction (separate 'system' parameter)
- Tool role → user message with tool_result content blocks
- Assistant tool_calls → tool_use content blocks
- Multimodal content (images)
- Empty content filtering
Args:
messages: OpenAI format messages
sanitize_ids: Whether to sanitize tool call IDs
Returns:
Tuple of (system_message: str|None, anthropic_messages: list)
"""
import logging
import json
system_message = None
anthropic_messages = []
for msg in messages:
role = msg.get('role')
content = msg.get('content')
if role == 'system':
system_message = content
logging.info(f"AnthropicFormatConverter: Extracted system message ({len(content) if content else 0} chars)")
elif role == 'tool':
tool_call_id = msg.get('tool_call_id', msg.get('name', 'unknown'))
tool_result_block = {
'type': 'tool_result',
'tool_use_id': tool_call_id,
'content': content or ""
}
if anthropic_messages and anthropic_messages[-1]['role'] == 'user':
last_content = anthropic_messages[-1]['content']
if isinstance(last_content, str):
anthropic_messages[-1]['content'] = [
{'type': 'text', 'text': last_content},
tool_result_block
]
elif isinstance(last_content, list):
anthropic_messages[-1]['content'].append(tool_result_block)
else:
anthropic_messages.append({
'role': 'user',
'content': [tool_result_block]
})
elif role == 'assistant':
tool_calls = msg.get('tool_calls')
if tool_calls:
content_blocks = []
filtered = AnthropicFormatConverter.filter_empty_content(content)
if filtered:
if isinstance(filtered, str):
content_blocks.append({'type': 'text', 'text': filtered})
elif isinstance(filtered, list):
content_blocks.extend(filtered)
for tc in tool_calls:
raw_id = tc.get('id', f"toolu_{len(content_blocks)}")
tool_id = AnthropicFormatConverter.sanitize_tool_call_id(raw_id) if sanitize_ids else raw_id
function = tc.get('function', {})
arguments = function.get('arguments', {})
if isinstance(arguments, str):
try:
arguments = json.loads(arguments)
except json.JSONDecodeError:
arguments = {}
content_blocks.append({
'type': 'tool_use',
'id': tool_id,
'name': function.get('name', ''),
'input': arguments
})
if content_blocks:
anthropic_messages.append({
'role': 'assistant',
'content': content_blocks
})
else:
filtered = AnthropicFormatConverter.filter_empty_content(content)
if filtered is None:
continue
if isinstance(filtered, list):
text_parts = []
for block in filtered:
if isinstance(block, dict):
text_parts.append(block.get('text', ''))
elif isinstance(block, str):
text_parts.append(block)
content_str = '\n'.join(text_parts)
else:
content_str = filtered or ""
if content_str:
anthropic_messages.append({
'role': 'assistant',
'content': content_str
})
elif role == 'user':
if isinstance(content, list):
content_blocks = []
images = AnthropicFormatConverter.extract_images_from_content(content)
for block in content:
if isinstance(block, dict):
btype = block.get('type', '')
if btype == 'text':
content_blocks.append(block)
elif btype not in ('image_url', 'image'):
content_blocks.append(block)
elif isinstance(block, str):
content_blocks.append({'type': 'text', 'text': block})
content_blocks.extend(images)
anthropic_messages.append({
'role': 'user',
'content': content_blocks if content_blocks else content or ""
})
else:
anthropic_messages.append({
'role': 'user',
'content': content or ""
})
else:
logging.warning(f"AnthropicFormatConverter: Unknown role '{role}', treating as user")
anthropic_messages.append({
'role': 'user',
'content': content or ""
})
logging.info(f"AnthropicFormatConverter: Converted {len(messages)} OpenAI → {len(anthropic_messages)} Anthropic messages")
return system_message, anthropic_messages
@staticmethod
def convert_tools_to_anthropic(tools: list) -> Optional[list]:
"""
Convert OpenAI tools to Anthropic format with schema normalization.
Normalizes:
- ["string", "null"] → "string"
- Removes additionalProperties: false
- Cleans up required array for nullable fields
"""
import logging
if not tools:
return None
def normalize_schema(schema):
if not isinstance(schema, dict):
return schema
result = {}
for key, value in schema.items():
if key == "type" and isinstance(value, list):
non_null = [t for t in value if t != "null"]
result[key] = non_null[0] if len(non_null) == 1 else (non_null if non_null else "string")
elif key == "properties" and isinstance(value, dict):
result[key] = {k: normalize_schema(v) for k, v in value.items()}
elif key == "items" and isinstance(value, dict):
result[key] = normalize_schema(value)
elif key == "additionalProperties" and value is False:
continue
elif key == "required" and isinstance(value, list):
props = schema.get("properties", {})
cleaned = [f for f in value if f in props and not (isinstance(props.get(f, {}), dict) and isinstance(props[f].get("type"), list) and "null" in props[f]["type"])]
if cleaned:
result[key] = cleaned
else:
result[key] = value
return result
anthropic_tools = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
anthropic_tools.append({
"name": function.get("name", ""),
"description": function.get("description", ""),
"input_schema": normalize_schema(function.get("parameters", {}))
})
logging.info(f"AnthropicFormatConverter: Converted tool: {function.get('name')}")
return anthropic_tools if anthropic_tools else None
@staticmethod
def convert_tool_choice_to_anthropic(tool_choice) -> Optional[dict]:
"""
Convert OpenAI tool_choice to Anthropic format.
"auto" → {"type": "auto"}
"none" → None
"required" → {"type": "any"}
{"type": "function", "function": {"name": "X"}} → {"type": "tool", "name": "X"}
"""
import logging
if not tool_choice:
return None
if isinstance(tool_choice, str):
if tool_choice == "auto":
return {"type": "auto"}
elif tool_choice == "none":
return None
elif tool_choice == "required":
return {"type": "any"}
else:
logging.warning(f"Unknown tool_choice: {tool_choice}")
return {"type": "auto"}
if isinstance(tool_choice, dict):
if tool_choice.get("type") == "function":
name = tool_choice.get("function", {}).get("name")
return {"type": "tool", "name": name} if name else {"type": "auto"}
return tool_choice
return {"type": "auto"}
@staticmethod
def convert_anthropic_response_to_openai(response_data: dict, provider_id: str, model: str) -> dict:
"""
Convert Anthropic API response (dict) to OpenAI chat completion format.
Handles text blocks, tool_use blocks, thinking blocks, usage metadata, stop reasons.
"""
import json
import logging
logger = logging.getLogger(__name__)
content_text = ""
tool_calls = []
thinking_text = ""
for block in response_data.get('content', []):
btype = block.get('type', '')
if btype == 'text':
content_text += block.get('text', '')
elif btype == 'tool_use':
tool_calls.append({
'id': block.get('id', f"call_{len(tool_calls)}"),
'type': 'function',
'function': {
'name': block.get('name', ''),
'arguments': json.dumps(block.get('input', {}))
}
})
elif btype == 'thinking':
thinking_text = block.get('thinking', '')
elif btype == 'redacted_thinking':
logger.debug("Found redacted_thinking block")
stop_reason = response_data.get('stop_reason', 'end_turn')
finish_reason = AnthropicFormatConverter.STOP_REASON_MAP.get(stop_reason, 'stop')
usage = response_data.get('usage', {})
input_tokens = usage.get('input_tokens', 0)
output_tokens = usage.get('output_tokens', 0)
cache_read = usage.get('cache_read_input_tokens', 0)
cache_creation = usage.get('cache_creation_input_tokens', 0)
openai_response = {
'id': f"{provider_id}-{model}-{int(time.time())}",
'object': 'chat.completion',
'created': int(time.time()),
'model': f'{provider_id}/{model}',
'choices': [{
'index': 0,
'message': {
'role': 'assistant',
'content': content_text if content_text else None
},
'finish_reason': finish_reason
}],
'usage': {
'prompt_tokens': input_tokens,
'completion_tokens': output_tokens,
'total_tokens': input_tokens + output_tokens,
'prompt_tokens_details': {'cached_tokens': cache_read, 'audio_tokens': 0},
'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0}
}
}
if tool_calls:
openai_response['choices'][0]['message']['tool_calls'] = tool_calls
if thinking_text:
openai_response['choices'][0]['message']['provider_options'] = {
'anthropic': {'thinking': thinking_text}
}
return openai_response
@staticmethod
def convert_anthropic_sdk_response_to_openai(response, provider_id: str, model: str) -> dict:
"""
Convert Anthropic SDK response object (with attributes) to OpenAI format.
"""
import json
import logging
logger = logging.getLogger(__name__)
content_text = ""
tool_calls = []
thinking_text = ""
for block in getattr(response, 'content', []):
btype = getattr(block, 'type', '')
if btype == 'text' or hasattr(block, 'text'):
content_text += getattr(block, 'text', '')
elif btype == 'tool_use':
raw_input = getattr(block, 'input', {})
tool_calls.append({
'id': getattr(block, 'id', f"call_{len(tool_calls)}"),
'type': 'function',
'function': {
'name': getattr(block, 'name', ''),
'arguments': json.dumps(raw_input) if isinstance(raw_input, dict) else str(raw_input)
}
})
elif btype == 'thinking':
thinking_text = getattr(block, 'thinking', '')
stop_reason = getattr(response, 'stop_reason', 'end_turn') or 'end_turn'
finish_reason = AnthropicFormatConverter.STOP_REASON_MAP.get(stop_reason, 'stop')
usage_obj = getattr(response, 'usage', None)
input_tokens = getattr(usage_obj, 'input_tokens', 0) or 0 if usage_obj else 0
output_tokens = getattr(usage_obj, 'output_tokens', 0) or 0 if usage_obj else 0
cache_read = getattr(usage_obj, 'cache_read_input_tokens', 0) or 0 if usage_obj else 0
cache_creation = getattr(usage_obj, 'cache_creation_input_tokens', 0) or 0 if usage_obj else 0
openai_response = {
'id': getattr(response, 'id', f"{provider_id}-{model}-{int(time.time())}"),
'object': 'chat.completion',
'created': int(time.time()),
'model': f'{provider_id}/{model}',
'choices': [{
'index': 0,
'message': {
'role': 'assistant',
'content': content_text if content_text else None
},
'finish_reason': finish_reason
}],
'usage': {
'prompt_tokens': input_tokens,
'completion_tokens': output_tokens,
'total_tokens': input_tokens + output_tokens,
'prompt_tokens_details': {'cached_tokens': cache_read, 'audio_tokens': 0},
'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0}
}
}
if tool_calls:
openai_response['choices'][0]['message']['tool_calls'] = tool_calls
if thinking_text:
openai_response['choices'][0]['message']['provider_options'] = {
'anthropic': {'thinking': thinking_text}
}
return openai_response
class AdaptiveRateLimiter:
"""
Adaptive Rate Limiter that learns optimal rate limits from 429 responses.
Features:
- Tracks 429 patterns per provider
- Implements exponential backoff with jitter for retries
- Learns optimal rate limits from historical 429 data
- Adds rate limit headroom (stays below limits)
- Gradually recovers rate limits after cooldown periods
"""
def __init__(self, provider_id: str, config: Dict = None):
self.provider_id = provider_id
# Configuration with defaults
self.enabled = config.get('enabled', True) if config else True
self.initial_rate_limit = config.get('initial_rate_limit', 0) if config else 0
self.learning_rate = config.get('learning_rate', 0.1) if config else 0.1
self.headroom_percent = config.get('headroom_percent', 10) if config else 10 # Stay 10% below learned limit
self.recovery_rate = config.get('recovery_rate', 0.05) if config else 0.05 # 5% recovery per successful request
self.max_rate_limit = config.get('max_rate_limit', 60) if config else 60 # Max 60 seconds between requests
self.min_rate_limit = config.get('min_rate_limit', 0.1) if config else 0.1 # Min 0.1 seconds between requests
self.backoff_base = config.get('backoff_base', 2) if config else 2
self.jitter_factor = config.get('jitter_factor', 0.25) if config else 0.25 # 25% jitter
self.history_window = config.get('history_window', 3600) if config else 3600 # 1 hour history window
self.consecutive_successes_for_recovery = config.get('consecutive_successes_for_recovery', 10) if config else 10
# Learned rate limit (starts with configured value)
self.current_rate_limit = self.initial_rate_limit
self.base_rate_limit = self.initial_rate_limit # Original configured limit
# 429 tracking
self._429_history = [] # List of (timestamp, wait_seconds) tuples
self._consecutive_429s = 0
self._consecutive_successes = 0
# Statistics
self.total_429_count = 0
self.total_requests = 0
self.last_429_time = None
def record_429(self, wait_seconds: int):
"""Record a 429 response and adjust rate limit accordingly."""
import logging
logger = logging.getLogger(__name__)
current_time = time.time()
# Record this 429 in history
self._429_history.append((current_time, wait_seconds))
self.total_429_count += 1
self._consecutive_429s += 1
self._consecutive_successes = 0
self.last_429_time = current_time
# Clean old history
self._cleanup_history()
# Calculate new rate limit using exponential backoff
# New limit = current_limit * backoff_base + wait_seconds from server
new_limit = self.current_rate_limit * self.backoff_base + wait_seconds
# Apply learning rate adjustment
new_limit = self.current_rate_limit + (new_limit - self.current_rate_limit) * self.learning_rate
# Apply headroom (stay below the limit)
new_limit = new_limit * (1 - self.headroom_percent / 100)
# Clamp to min/max
self.current_rate_limit = max(self.min_rate_limit, min(self.max_rate_limit, new_limit))
logger.info(f"[AdaptiveRateLimiter {self.provider_id}] 429 recorded: wait_seconds={wait_seconds}, "
f"new_rate_limit={self.current_rate_limit:.2f}s, consecutive_429s={self._consecutive_429s}")
def record_success(self):
"""Record a successful request and gradually recover rate limit."""
import logging
logger = logging.getLogger(__name__)
self.total_requests += 1
self._consecutive_successes += 1
self._consecutive_429s = 0
# Gradually recover rate limit after successful requests
if self._consecutive_successes >= self.consecutive_successes_for_recovery:
# Recovery: move back towards base rate limit
if self.current_rate_limit < self.base_rate_limit:
old_limit = self.current_rate_limit
self.current_rate_limit = self.current_rate_limit + (self.base_rate_limit - self.current_rate_limit) * self.recovery_rate
# Clamp to not exceed base
self.current_rate_limit = min(self.current_rate_limit, self.base_rate_limit)
if old_limit != self.current_rate_limit:
logger.info(f"[AdaptiveRateLimiter {self.provider_id}] Rate limit recovery: "
f"{old_limit:.2f}s -> {self.current_rate_limit:.2f}s")
# Reset consecutive successes counter after recovery
self._consecutive_successes = 0
def get_rate_limit(self) -> float:
"""Get the current adaptive rate limit."""
return self.current_rate_limit
def get_wait_time(self) -> float:
"""Get the wait time before next request based on adaptive rate limiting."""
if not self.enabled or self.current_rate_limit <= 0:
return 0
# Use current adaptive rate limit
return self.current_rate_limit
def calculate_backoff_with_jitter(self, attempt: int, base_wait: int = None) -> float:
"""
Calculate exponential backoff wait time with jitter.
Args:
attempt: Current retry attempt number (0-indexed)
base_wait: Optional base wait time from server response
Returns:
Wait time in seconds with jitter applied
"""
import logging
logger = logging.getLogger(__name__)
# Calculate exponential backoff
if base_wait is not None and base_wait > 0:
# Use server-provided wait time as base
wait_time = base_wait
else:
# Use exponential backoff: base * 2^attempt
wait_time = self.backoff_base ** attempt
# Apply jitter: random factor between (1 - jitter_factor) and (1 + jitter_factor)
jitter_multiplier = 1 + random.uniform(-self.jitter_factor, self.jitter_factor)
wait_time = wait_time * jitter_multiplier
# Clamp to reasonable limits (1 second to 300 seconds)
wait_time = max(1, min(300, wait_time))
logger.info(f"[AdaptiveRateLimiter {self.provider_id}] Backoff calculation: attempt={attempt}, "
f"base_wait={base_wait}, jitter_multiplier={jitter_multiplier:.2f}, "
f"final_wait={wait_time:.2f}s")
return wait_time
def _cleanup_history(self):
"""Remove old entries from 429 history."""
current_time = time.time()
cutoff_time = current_time - self.history_window
self._429_history = [(ts, ws) for ts, ws in self._429_history if ts > cutoff_time]
def get_stats(self) -> Dict:
"""Get rate limiter statistics."""
self._cleanup_history()
return {
'provider_id': self.provider_id,
'enabled': self.enabled,
'current_rate_limit': self.current_rate_limit,
'base_rate_limit': self.base_rate_limit,
'total_429_count': self.total_429_count,
'total_requests': self.total_requests,
'consecutive_429s': self._consecutive_429s,
'consecutive_successes': self._consecutive_successes,
'recent_429_count': len(self._429_history),
'last_429_time': self.last_429_time
}
def reset(self):
"""Reset the adaptive rate limiter to initial state."""
import logging
logger = logging.getLogger(__name__)
self.current_rate_limit = self.initial_rate_limit
self._429_history = []
self._consecutive_429s = 0
self._consecutive_successes = 0
self.total_429_count = 0
self.total_requests = 0
self.last_429_time = None
logger.info(f"[AdaptiveRateLimiter {self.provider_id}] Reset to initial state")
# Global adaptive rate limiters registry
_adaptive_rate_limiters: Dict[str, AdaptiveRateLimiter] = {}
def get_adaptive_rate_limiter(provider_id: str, config: Dict = None) -> AdaptiveRateLimiter:
"""Get or create an adaptive rate limiter for a provider."""
global _adaptive_rate_limiters
if provider_id not in _adaptive_rate_limiters:
_adaptive_rate_limiters[provider_id] = AdaptiveRateLimiter(provider_id, config)
return _adaptive_rate_limiters[provider_id]
def get_all_adaptive_rate_limiters() -> Dict[str, AdaptiveRateLimiter]:
"""Get all adaptive rate limiters."""
global _adaptive_rate_limiters
return _adaptive_rate_limiters
class BaseProviderHandler:
def __init__(self, provider_id: str, api_key: Optional[str] = None):
self.provider_id = provider_id
self.api_key = api_key
self.error_tracking = config.error_tracking[provider_id]
self.last_request_time = 0
self.rate_limit = config.providers[provider_id].rate_limit
# Add model-level rate limit tracking
self.model_last_request_time = {} # {model_name: timestamp}
# Token usage tracking for rate limits
self.token_usage = {} # {model_name: {"TPM": [], "TPH": [], "TPD": []}}
# Initialize batcher
self.batcher = get_request_batcher()
# Initialize adaptive rate limiter
adaptive_config = None
if config.aisbf and config.aisbf.adaptive_rate_limiting:
adaptive_config = config.aisbf.adaptive_rate_limiting.dict()
self.adaptive_limiter = get_adaptive_rate_limiter(provider_id, adaptive_config)
def parse_429_response(self, response_data: Union[Dict, str], headers: Dict = None) -> Optional[int]:
"""
Parse 429 rate limit response to extract wait time in seconds.
Checks multiple sources:
1. Retry-After header (seconds or HTTP date)
2. X-RateLimit-Reset header (Unix timestamp)
3. Response body fields (retry_after, reset_time, etc.)
4. X-RateLimit-* headers for auto-configuration
Returns:
Wait time in seconds, or None if cannot be determined
"""
import logging
import re
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone
logger = logging.getLogger(__name__)
logger.info("=== PARSING 429 RATE LIMIT RESPONSE ===")
wait_seconds = None
rate_limit_headers = {} # Store rate limit headers for auto-configuration
# Check for rate limit headers (for auto-configuration)
if headers:
rate_limit_headers = {
'limit': headers.get('X-RateLimit-Limit') or headers.get('x-ratelimit-limit'),
'remaining': headers.get('X-RateLimit-Remaining') or headers.get('x-ratelimit-remaining'),
'reset': headers.get('X-RateLimit-Reset') or headers.get('x-ratelimit-reset'),
'reset_at': headers.get('X-RateLimit-Reset-After') or headers.get('x-ratelimit-reset-after')
}
logger.info(f"Rate limit headers found: {rate_limit_headers}")
# Check Retry-After header
if headers:
retry_after = headers.get('Retry-After') or headers.get('retry-after')
if retry_after:
logger.info(f"Found Retry-After header: {retry_after}")
try:
# Try parsing as integer (seconds)
wait_seconds = int(retry_after)
logger.info(f"Parsed Retry-After as seconds: {wait_seconds}")
except ValueError:
# Try parsing as HTTP date
try:
retry_date = parsedate_to_datetime(retry_after)
now = datetime.now(timezone.utc)
wait_seconds = int((retry_date - now).total_seconds())
logger.info(f"Parsed Retry-After as date, wait seconds: {wait_seconds}")
except Exception as e:
logger.warning(f"Failed to parse Retry-After header: {e}")
# Check X-RateLimit-Reset header (Unix timestamp)
if not wait_seconds:
reset_time = headers.get('X-RateLimit-Reset') or headers.get('x-ratelimit-reset')
if reset_time:
logger.info(f"Found X-RateLimit-Reset header: {reset_time}")
try:
reset_timestamp = int(reset_time)
now_timestamp = int(time.time())
wait_seconds = reset_timestamp - now_timestamp
logger.info(f"Calculated wait from reset timestamp: {wait_seconds} seconds")
except Exception as e:
logger.warning(f"Failed to parse X-RateLimit-Reset header: {e}")
# Check response body
if not wait_seconds and isinstance(response_data, dict):
logger.info(f"Checking response body for rate limit info: {response_data}")
# Common field names for retry/reset time
retry_fields = [
'retry_after', 'retryAfter', 'retry_after_seconds',
'wait_seconds', 'waitSeconds', 'retry_in'
]
reset_fields = [
'reset_time', 'resetTime', 'reset_at', 'resetAt',
'reset_timestamp', 'resetTimestamp'
]
# Check retry fields (direct seconds)
for field in retry_fields:
if field in response_data:
try:
wait_seconds = int(response_data[field])
logger.info(f"Found {field} in response body: {wait_seconds} seconds")
break
except (ValueError, TypeError) as e:
logger.warning(f"Failed to parse {field}: {e}")
# Check reset fields (timestamp)
if not wait_seconds:
for field in reset_fields:
if field in response_data:
try:
reset_timestamp = int(response_data[field])
now_timestamp = int(time.time())
wait_seconds = reset_timestamp - now_timestamp
logger.info(f"Found {field} in response body, calculated wait: {wait_seconds} seconds")
break
except (ValueError, TypeError) as e:
logger.warning(f"Failed to parse {field}: {e}")
# Check for error message with time information
if not wait_seconds:
error_msg = response_data.get('error', {})
if isinstance(error_msg, dict):
message = error_msg.get('message', '')
elif isinstance(error_msg, str):
message = error_msg
else:
message = response_data.get('message', '')
if message:
logger.info(f"Checking error message for time info: {message}")
# Look for patterns like "try again in X seconds/minutes/hours"
patterns = [
r'try again in (\d+)\s*(second|minute|hour|day)s?',
r'retry after (\d+)\s*(second|minute|hour|day)s?',
r'wait (\d+)\s*(second|minute|hour|day)s?',
r'available in (\d+)\s*(second|minute|hour|day)s?',
]
for pattern in patterns:
match = re.search(pattern, message, re.IGNORECASE)
if match:
value = int(match.group(1))
unit = match.group(2).lower()
# Convert to seconds
multipliers = {
'second': 1,
'minute': 60,
'hour': 3600,
'day': 86400
}
wait_seconds = value * multipliers.get(unit, 1)
logger.info(f"Extracted wait time from message: {value} {unit}(s) = {wait_seconds} seconds")
break
# Ensure wait_seconds is positive and reasonable
if wait_seconds:
if wait_seconds < 0:
logger.warning(f"Calculated negative wait time: {wait_seconds}, setting to 60 seconds")
wait_seconds = 60
elif wait_seconds > 86400: # More than 1 day
logger.warning(f"Calculated very long wait time: {wait_seconds}, capping at 1 day")
wait_seconds = 86400
logger.info(f"Final parsed wait time: {wait_seconds} seconds")
else:
logger.warning("Could not determine wait time from 429 response, using default 60 seconds")
wait_seconds = 60
logger.info("=== END PARSING 429 RATE LIMIT RESPONSE ===")
return wait_seconds
def handle_429_error(self, response_data: Union[Dict, str] = None, headers: Dict = None):
"""
Handle 429 rate limit error by parsing the response and disabling provider
for the appropriate duration. Also records the 429 in the adaptive rate limiter.
"""
import logging
logger = logging.getLogger(__name__)
logger.error("=== 429 RATE LIMIT ERROR DETECTED ===")
logger.error(f"Provider: {self.provider_id}")
# Parse the response to get wait time
wait_seconds = self.parse_429_response(response_data, headers)
# Record 429 in adaptive rate limiter for learning
self.adaptive_limiter.record_429(wait_seconds)
# Check for rate limit headers and auto-configure if not already set
if headers:
self._auto_configure_rate_limits(headers)
# Disable provider for the calculated duration
self.error_tracking['disabled_until'] = time.time() + wait_seconds
logger.error(f"!!! PROVIDER DISABLED DUE TO RATE LIMIT !!!")
logger.error(f"Provider: {self.provider_id}")
logger.error(f"Reason: 429 Too Many Requests")
logger.error(f"Disabled for: {wait_seconds} seconds ({wait_seconds / 60:.1f} minutes)")
logger.error(f"Disabled until: {self.error_tracking['disabled_until']}")
logger.error(f"Adaptive rate limit: {self.adaptive_limiter.current_rate_limit:.2f}s")
logger.error(f"Provider will be automatically re-enabled after cooldown")
logger.error("=== END 429 RATE LIMIT ERROR ===")
def _auto_configure_rate_limits(self, headers: Dict = None):
"""
Auto-configure rate limits from response headers if not already configured.
"""
import logging
from ..config import config
logger = logging.getLogger(__name__)
if not headers:
return
# Extract rate limit headers
rate_limit_header = headers.get('X-RateLimit-Limit') or headers.get('x-ratelimit-limit')
remaining_header = headers.get('X-RateLimit-Remaining') or headers.get('x-ratelimit-remaining')
reset_header = headers.get('X-RateLimit-Reset') or headers.get('x-ratelimit-reset')
if not rate_limit_header:
logger.debug("No X-RateLimit-Limit header found, skipping auto-configuration")
return
try:
rate_limit_value = int(rate_limit_header)
logger.info(f"Found rate limit header: {rate_limit_value} requests")
# Get current provider config
provider_config = config.providers.get(self.provider_id)
if not provider_config:
logger.debug(f"Provider {self.provider_id} not found in config")
return
# Check if we don't have a rate limit configured
current_rate_limit = getattr(provider_config, 'rate_limit', None)
if current_rate_limit is None or current_rate_limit == 0:
# Calculate: use 80% of the limit to stay below it
auto_rate_limit = rate_limit_value * 0.8
logger.info(f"Auto-configuring rate limit for {self.provider_id}: {auto_rate_limit:.1f}s (from header limit: {rate_limit_value})")
# Try to save to config (this may not persist if config is immutable)
try:
# Update the in-memory config
if hasattr(provider_config, 'rate_limit'):
provider_config.rate_limit = auto_rate_limit
logger.info(f"✓ Auto-configured rate_limit: {auto_rate_limit:.1f}s for provider {self.provider_id}")
except Exception as e:
logger.debug(f"Could not auto-configure rate limit: {e}")
else:
logger.debug(f"Rate limit already configured ({current_rate_limit}), skipping auto-configuration")
except (ValueError, TypeError) as e:
logger.debug(f"Could not parse rate limit header: {e}")
def is_rate_limited(self) -> bool:
if self.error_tracking['disabled_until'] and self.error_tracking['disabled_until'] > time.time():
return True
return False
def _get_model_config(self, model: str) -> Optional[Dict]:
"""Get model configuration from provider config"""
provider_config = config.providers.get(self.provider_id)
if provider_config and hasattr(provider_config, 'models') and provider_config.models:
for model_config in provider_config.models:
# Handle both Pydantic objects and dictionaries
model_name_value = model_config.name if hasattr(model_config, 'name') else model_config.get('name')
if model_name_value == model:
# Convert Pydantic object to dict if needed
if hasattr(model_config, 'model_dump'):
return model_config.model_dump()
elif hasattr(model_config, 'dict'):
return model_config.dict()
else:
return model_config
return None
def _check_token_rate_limit(self, model: str, token_count: int) -> bool:
"""
Check if a request would exceed token rate limits.
Returns True if any rate limit would be exceeded, False otherwise.
"""
model_config = self._get_model_config(model)
if not model_config:
return False
import logging
logger = logging.getLogger(__name__)
current_time = time.time()
# Check TPM (tokens per minute)
if model_config.get('rate_limit_TPM'):
tpm = model_config['rate_limit_TPM']
# Get tokens used in the last minute
tokens_used_tpm = self.token_usage.get(model, {}).get('TPM', [])
# Filter to only include requests from the last 60 seconds
one_minute_ago = current_time - 60
recent_tokens_tpm = [t for t in tokens_used_tpm if t > one_minute_ago]
total_tpm = sum(recent_tokens_tpm)
if total_tpm + token_count > tpm:
logger.warning(f"TPM limit would be exceeded: {total_tpm + token_count}/{tpm}")
return True
# Check TPH (tokens per hour)
if model_config.get('rate_limit_TPH'):
tph = model_config['rate_limit_TPH']
# Get tokens used in the last hour
tokens_used_tph = self.token_usage.get(model, {}).get('TPH', [])
# Filter to only include requests from the last 3600 seconds
one_hour_ago = current_time - 3600
recent_tokens_tph = [t for t in tokens_used_tph if t > one_hour_ago]
total_tph = sum(recent_tokens_tph)
if total_tph + token_count > tph:
logger.warning(f"TPH limit would be exceeded: {total_tph + token_count}/{tph}")
return True
# Check TPD (tokens per day)
if model_config.get('rate_limit_TPD'):
tpd = model_config['rate_limit_TPD']
# Get tokens used in the last day
tokens_used_tpd = self.token_usage.get(model, {}).get('TPD', [])
# Filter to only include requests from the last 86400 seconds
one_day_ago = current_time - 86400
recent_tokens_tpd = [t for t in tokens_used_tpd if t > one_day_ago]
total_tpd = sum(recent_tokens_tpd)
if total_tpd + token_count > tpd:
logger.warning(f"TPD limit would be exceeded: {total_tpd + token_count}/{tpd}")
return True
return False
def _record_token_usage(self, model: str, token_count: int):
"""Record token usage for rate limit tracking"""
import logging
logger = logging.getLogger(__name__)
if model not in self.token_usage:
self.token_usage[model] = {"TPM": [], "TPH": [], "TPD": []}
current_time = time.time()
# Record for all three time windows
self.token_usage[model]["TPM"].append((current_time, token_count))
self.token_usage[model]["TPH"].append((current_time, token_count))
self.token_usage[model]["TPD"].append((current_time, token_count))
logger.debug(f"Recorded token usage for model {model}: {token_count} tokens")
def _disable_provider_for_duration(self, duration: str):
"""
Disable provider for a specific duration.
Args:
duration: "1m" (1 minute), "1h" (1 hour), or "1d" (1 day)
"""
import logging
logger = logging.getLogger(__name__)
duration_map = {
"1m": 60,
"1h": 3600,
"1d": 86400
}
if duration not in duration_map:
logger.error(f"Invalid duration: {duration}")
return
disable_seconds = duration_map[duration]
self.error_tracking['disabled_until'] = time.time() + disable_seconds
logger.error(f"!!! PROVIDER DISABLED !!!")
logger.error(f"Provider: {self.provider_id}")
logger.error(f"Reason: Token rate limit exceeded")
logger.error(f"Disabled for: {duration}")
logger.error(f"Disabled until: {self.error_tracking['disabled_until']}")
logger.error(f"Provider will be automatically re-enabled after cooldown")
async def apply_rate_limit(self, rate_limit: Optional[float] = None):
"""Apply rate limiting by waiting if necessary, using adaptive rate limiting."""
import logging
logger = logging.getLogger(__name__)
# Use adaptive rate limiter if enabled
if self.adaptive_limiter.enabled:
adaptive_limit = self.adaptive_limiter.get_rate_limit()
if rate_limit is None:
rate_limit = adaptive_limit
else:
# Use the higher of the two (more conservative)
rate_limit = max(rate_limit, adaptive_limit)
elif rate_limit is None:
rate_limit = self.rate_limit
if rate_limit and rate_limit > 0:
current_time = time.time()
time_since_last_request = current_time - self.last_request_time
required_wait = rate_limit - time_since_last_request
if required_wait > 0:
logger.info(f"[RateLimit] Provider {self.provider_id}: waiting {required_wait:.2f}s (adaptive: {self.adaptive_limiter.enabled})")
await asyncio.sleep(required_wait)
self.last_request_time = time.time()
async def apply_model_rate_limit(self, model: str, rate_limit: Optional[float] = None):
"""Apply rate limiting for a specific model, using adaptive rate limiting."""
import logging
logger = logging.getLogger(__name__)
# Use adaptive rate limiter if enabled
if self.adaptive_limiter.enabled:
adaptive_limit = self.adaptive_limiter.get_rate_limit()
if rate_limit is None:
rate_limit = adaptive_limit
else:
rate_limit = max(rate_limit, adaptive_limit)
elif rate_limit is None:
rate_limit = self.rate_limit
if rate_limit and rate_limit > 0:
current_time = time.time()
last_time = self.model_last_request_time.get(model, 0)
time_since_last_request = current_time - last_time
required_wait = rate_limit - time_since_last_request
if required_wait > 0:
logger.info(f"[RateLimit] Model {model}: waiting {required_wait:.2f}s (adaptive: {self.adaptive_limiter.enabled})")
await asyncio.sleep(required_wait)
self.model_last_request_time[model] = time.time()
def record_failure(self):
import logging
logger = logging.getLogger(__name__)
self.error_tracking['failures'] += 1
self.error_tracking['last_failure'] = time.time()
failure_count = self.error_tracking['failures']
logger.warning(f"=== PROVIDER FAILURE RECORDED ===")
logger.warning(f"Provider: {self.provider_id}")
logger.warning(f"Failure count: {failure_count}/3")
logger.warning(f"Last failure time: {self.error_tracking['last_failure']}")
if self.error_tracking['failures'] >= 3:
# Get cooldown period from provider config, default to 300 seconds (5 minutes)
provider_config = config.providers.get(self.provider_id)
cooldown_seconds = 300 # System default
if provider_config and hasattr(provider_config, 'default_error_cooldown') and provider_config.default_error_cooldown is not None:
cooldown_seconds = provider_config.default_error_cooldown
logger.info(f"Using provider-configured cooldown: {cooldown_seconds} seconds")
else:
logger.info(f"Using system default cooldown: {cooldown_seconds} seconds")
self.error_tracking['disabled_until'] = time.time() + cooldown_seconds
disabled_until_time = self.error_tracking['disabled_until']
cooldown_remaining = int(disabled_until_time - time.time())
logger.error(f"!!! PROVIDER DISABLED !!!")
logger.error(f"Provider: {self.provider_id}")
logger.error(f"Reason: 3 consecutive failures reached")
logger.error(f"Disabled until: {disabled_until_time}")
logger.error(f"Cooldown period: {cooldown_remaining} seconds ({cooldown_seconds / 60:.1f} minutes)")
logger.error(f"Provider will be automatically re-enabled after cooldown")
else:
remaining_failures = 3 - failure_count
logger.warning(f"Provider still active. {remaining_failures} more failure(s) will disable it")
logger.warning(f"=== END FAILURE RECORDING ===")
def record_success(self):
import logging
logger = logging.getLogger(__name__)
was_disabled = self.error_tracking['disabled_until'] is not None
previous_failures = self.error_tracking['failures']
self.error_tracking['failures'] = 0
self.error_tracking['disabled_until'] = None
# Record success in adaptive rate limiter
self.adaptive_limiter.record_success()
logger.info(f"=== PROVIDER SUCCESS RECORDED ===")
logger.info(f"Provider: {self.provider_id}")
logger.info(f"Previous failure count: {previous_failures}")
logger.info(f"Failure count reset to: 0")
logger.info(f"Adaptive rate limit: {self.adaptive_limiter.current_rate_limit:.2f}s")
if was_disabled:
logger.info(f"!!! PROVIDER RE-ENABLED !!!")
logger.info(f"Provider: {self.provider_id}")
logger.info(f"Reason: Successful request after cooldown period")
logger.info(f"Provider is now active and available for requests")
else:
logger.info(f"Provider remains active")
logger.info(f"=== END SUCCESS RECORDING ===")
async def handle_request_with_batching(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
"""
Handle a request with optional batching.
"""
# Check if batching is enabled and not streaming
if self.batcher.enabled and not stream:
# Prepare request data
request_data = {
"model": model,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
"stream": stream,
"tools": tools,
"tool_choice": tool_choice,
"api_key": self.api_key
}
# Submit request for batching
batched_result = await self.batcher.submit_request(
provider_id=self.provider_id,
model=model,
request_data=request_data
)
# If batching returned None, it means batching is disabled or we should process directly
if batched_result is not None:
return batched_result
# Fall back to direct processing (either batching disabled, streaming, or batching returned None)
return await self._handle_request_direct(model, messages, max_tokens, temperature, stream, tools, tool_choice)
async def _handle_request_direct(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
"""
Direct request handling without batching (original handle_request logic).
This method should be overridden by subclasses with their specific implementation.
"""
raise NotImplementedError("_handle_request_direct must be implemented by subclasses")
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Claude Code OAuth2 provider handler.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import httpx
import asyncio
import time
import random
from typing import Dict, List, Optional, Union
from anthropic import Anthropic
from ..models import Model
from ..config import config
from .base import BaseProviderHandler, AnthropicFormatConverter, AISBF_DEBUG
class ClaudeProviderHandler(BaseProviderHandler):
"""
Handler for Claude Code OAuth2 integration using Anthropic SDK.
This handler uses OAuth2 authentication to access Claude models through
the official Anthropic Python SDK. OAuth2 access tokens are passed as
the api_key parameter to the SDK, which handles proper message formatting,
retries, and streaming.
"""
# NOTE: OAuth2 API uses its own model naming scheme that differs from standard Anthropic API
def __init__(self, provider_id: str, api_key: Optional[str] = None):
super().__init__(provider_id, api_key)
self.provider_config = config.get_provider(provider_id)
# Get credentials file path from config
claude_config = getattr(self.provider_config, 'claude_config', None)
credentials_file = None
if claude_config and isinstance(claude_config, dict):
credentials_file = claude_config.get('credentials_file')
# Initialize ClaudeAuth with credentials file (handles OAuth2 flow)
from ..auth.claude import ClaudeAuth
self.auth = ClaudeAuth(credentials_file=credentials_file)
# HTTP client for direct API requests (OAuth2 requires direct HTTP, not SDK)
self.client = httpx.AsyncClient(timeout=httpx.Timeout(300.0, connect=30.0))
# Streaming idle watchdog configuration (Phase 1.3)
self.stream_idle_timeout = 90.0 # seconds - matches vendors/claude
# Cache token tracking for analytics (Phase 2.3)
self.cache_stats = {
'cache_hits': 0,
'cache_misses': 0,
'cache_tokens_read': 0,
'cache_tokens_created': 0,
'total_requests': 0,
}
# Session management for quota tracking
self.session_state = {
'initialized': False,
'session_id': None,
'device_id': None,
'account_uuid': None,
'organization_id': None,
'last_initialized': None,
'quota_5h_reset': None,
'quota_5h_utilization': None,
'quota_7d_reset': None,
'quota_7d_utilization': None,
'representative_claim': None,
'status': None,
'session_timeout': 3600, # 1 hour session timeout
}
# Initialize persistent identifiers for metadata
self._init_session_identifiers()
def _init_session_identifiers(self):
"""Initialize persistent session identifiers (device_id, account_uuid, session_id)."""
import uuid
import hashlib
if not self.session_state.get('device_id'):
device_seed = f"{self.provider_id}-{time.time()}"
self.session_state['device_id'] = hashlib.sha256(device_seed.encode()).hexdigest()
if not self.session_state.get('account_uuid'):
account_id = self.auth.get_account_id()
if account_id:
self.session_state['account_uuid'] = account_id
else:
self.session_state['account_uuid'] = str(uuid.uuid4())
async def _initialize_session(self):
"""Initialize session by sending a quota request to get rate limit information."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info("ClaudeProviderHandler: Initializing session for quota tracking")
try:
headers = self._get_auth_headers(stream=False)
payload = {
'model': 'claude-haiku-4-5-20251001',
'max_tokens': 1,
'messages': [
{
'role': 'user',
'content': 'quota'
}
],
'metadata': {
'user_id': json.dumps({
'device_id': self.session_state['device_id'],
'account_uuid': self.session_state['account_uuid'],
'session_id': self.session_state['session_id']
})
}
}
api_url = 'https://api.anthropic.com/v1/messages?beta=true'
response = await self.client.post(api_url, headers=headers, json=payload)
if response.status_code == 200:
headers_dict = dict(response.headers)
self.session_state.update({
'initialized': True,
'last_initialized': time.time(),
'organization_id': headers_dict.get('anthropic-organization-id'),
'quota_5h_reset': headers_dict.get('anthropic-ratelimit-unified-5h-reset'),
'quota_5h_utilization': headers_dict.get('anthropic-ratelimit-unified-5h-utilization'),
'quota_7d_reset': headers_dict.get('anthropic-ratelimit-unified-7d-reset'),
'quota_7d_utilization': headers_dict.get('anthropic-ratelimit-unified-7d-utilization'),
'representative_claim': headers_dict.get('anthropic-ratelimit-unified-representative-claim'),
'status': headers_dict.get('anthropic-ratelimit-unified-status'),
})
logger.info(f"ClaudeProviderHandler: Session initialized successfully")
logger.info(f" Organization ID: {self.session_state['organization_id']}")
logger.info(f" 5h utilization: {self.session_state['quota_5h_utilization']}")
logger.info(f" 7d utilization: {self.session_state['quota_7d_utilization']}")
logger.info(f" Representative claim: {self.session_state['representative_claim']}")
logger.info(f" Status: {self.session_state['status']}")
return True
else:
logger.warning(f"ClaudeProviderHandler: Session initialization failed: {response.status_code}")
return False
except Exception as e:
logger.error(f"ClaudeProviderHandler: Session initialization error: {e}", exc_info=True)
return False
def _should_refresh_session(self) -> bool:
"""Check if session should be refreshed based on timeout or rate limit status."""
if not self.session_state['initialized']:
return True
if self.session_state['last_initialized']:
age = time.time() - self.session_state['last_initialized']
if age > self.session_state['session_timeout']:
return True
if self.session_state['status'] != 'allowed':
return True
return False
async def _ensure_session(self):
"""Ensure session is initialized and valid before making requests."""
if self._should_refresh_session():
import logging
logger = logging.getLogger(__name__)
logger.info("ClaudeProviderHandler: Session needs refresh, initializing...")
await self._initialize_session()
def _update_session_from_headers(self, headers: Dict):
"""Update session state from response headers."""
import logging
logger = logging.getLogger(__name__)
if 'anthropic-ratelimit-unified-5h-utilization' in headers:
old_util = self.session_state.get('quota_5h_utilization')
new_util = headers.get('anthropic-ratelimit-unified-5h-utilization')
self.session_state.update({
'quota_5h_reset': headers.get('anthropic-ratelimit-unified-5h-reset'),
'quota_5h_utilization': new_util,
'quota_7d_reset': headers.get('anthropic-ratelimit-unified-7d-reset'),
'quota_7d_utilization': headers.get('anthropic-ratelimit-unified-7d-utilization'),
'representative_claim': headers.get('anthropic-ratelimit-unified-representative-claim'),
'status': headers.get('anthropic-ratelimit-unified-status'),
})
if old_util != new_util:
logger.debug(f"ClaudeProviderHandler: Quota utilization updated: {old_util} -> {new_util}")
def _get_sdk_client(self):
"""Get or create an Anthropic SDK client configured with OAuth2 auth token."""
import logging
logger = logging.getLogger(__name__)
access_token = self.auth.get_valid_token()
if not access_token:
logger.error("ClaudeProviderHandler: No OAuth2 access token available")
raise Exception("No OAuth2 access token. Please re-authenticate with /login")
self._sdk_client = Anthropic(
auth_token=access_token,
max_retries=3,
timeout=httpx.Timeout(300.0, connect=30.0),
)
logger.info("ClaudeProviderHandler: Created SDK client with OAuth2 auth token")
return self._sdk_client
def _get_auth_headers(self, stream: bool = False):
"""Get HTTP headers with OAuth2 Bearer token."""
import logging
import uuid
import platform
logger = logging.getLogger(__name__)
access_token = self.auth.get_valid_token()
if not self.session_state.get('session_id'):
self.session_state['session_id'] = str(uuid.uuid4())
session_id = self.session_state['session_id']
request_id = str(uuid.uuid4())
headers = {
'accept': 'application/json',
'anthropic-beta': 'oauth-2025-04-20,interleaved-thinking-2025-05-14,redact-thinking-2026-02-12,context-management-2025-06-27,prompt-caching-scope-2026-01-05,structured-outputs-2025-12-15',
'anthropic-dangerous-direct-browser-access': 'true',
'anthropic-version': '2023-06-01',
'authorization': f'Bearer {access_token}',
'content-type': 'application/json',
'user-agent': 'claude-cli/99.0.0 (undefined, cli)',
'x-app': 'cli',
'x-claude-code-session-id': session_id,
'x-client-request-id': request_id,
'x-stainless-arch': platform.machine() or 'x64',
'x-stainless-lang': 'js',
'x-stainless-os': platform.system() or 'Linux',
'x-stainless-package-version': '0.81.0',
'x-stainless-retry-count': '0',
'x-stainless-runtime': 'node',
'x-stainless-runtime-version': 'v22.22.0',
'x-stainless-timeout': '600',
}
if stream:
headers['accept'] = 'text/event-stream'
headers['accept-encoding'] = 'identity'
else:
headers['accept-encoding'] = 'gzip, deflate, br, zstd'
logger.info("ClaudeProviderHandler: Created auth headers matching claude-cli client")
logger.debug(f"ClaudeProviderHandler: Session ID: {session_id}, Request ID: {request_id}")
import json
logger.debug(f"ClaudeProviderHandler: Full headers: {json.dumps(headers, indent=2)}")
return headers
def _sanitize_tool_call_id(self, tool_call_id: str) -> str:
"""Sanitize tool call ID for Claude API compatibility."""
import re
sanitized = re.sub(r'[^a-zA-Z0-9_-]', '_', tool_call_id)
return sanitized
def _filter_empty_content(self, content: Union[str, List, None]) -> Union[str, List, None]:
"""Filter empty content from messages for Claude API compatibility."""
if content is None:
return None
if isinstance(content, str):
if content.strip() == "":
return None
return content
if isinstance(content, list):
filtered = []
for block in content:
if isinstance(block, dict):
block_type = block.get('type', '')
if block_type == 'text':
text = block.get('text', '')
if text and text.strip():
filtered.append(block)
else:
filtered.append(block)
else:
filtered.append(block)
if not filtered:
return None
return filtered
return content
def _apply_cache_control(self, anthropic_messages: List[Dict], enable_caching: bool = True) -> List[Dict]:
"""Apply ephemeral cache_control to messages for prompt caching."""
if not enable_caching or not anthropic_messages:
return anthropic_messages
import logging
logger = logging.getLogger(__name__)
if len(anthropic_messages) < 4:
logger.debug(f"ClaudeProviderHandler: Skipping cache control (only {len(anthropic_messages)} messages)")
return anthropic_messages
cache_indices = []
for i in range(max(0, len(anthropic_messages) - 2), len(anthropic_messages)):
cache_indices.append(i)
for idx in cache_indices:
msg = anthropic_messages[idx]
content = msg.get('content')
if isinstance(content, str):
if content.strip():
msg['content'] = [
{
'type': 'text',
'text': content,
'cache_control': {'type': 'ephemeral'}
}
]
logger.debug(f"ClaudeProviderHandler: Applied cache_control to message {idx} (string content)")
elif isinstance(content, list) and content:
last_block = content[-1]
if isinstance(last_block, dict):
last_block['cache_control'] = {'type': 'ephemeral'}
logger.debug(f"ClaudeProviderHandler: Applied cache_control to message {idx} (list content)")
logger.info(f"ClaudeProviderHandler: Applied cache_control to {len(cache_indices)} messages for prompt caching")
return anthropic_messages
def _validate_messages(self, messages: List[Dict]) -> List[Dict]:
"""Validate and normalize message roles for Claude API compatibility."""
import logging
logger = logging.getLogger(__name__)
if not messages:
return messages
valid_roles = {'user', 'assistant', 'system', 'tool'}
normalized = []
issues_found = 0
for i, msg in enumerate(messages):
role = msg.get('role', '')
content = msg.get('content', '')
if role not in valid_roles:
logger.warning(f"ClaudeProviderHandler: Unknown message role '{role}' at index {i}, treating as 'user'")
msg['role'] = 'user'
role = 'user'
issues_found += 1
if role == 'system' and i > 0:
logger.warning(f"ClaudeProviderHandler: System message at index {i} (not at start), converting to user")
msg['role'] = 'user'
role = 'user'
issues_found += 1
if role == 'tool':
tool_call_id = msg.get('tool_call_id') or msg.get('name')
if not tool_call_id:
logger.warning(f"ClaudeProviderHandler: Tool message at index {i} missing tool_call_id, adding placeholder")
msg['tool_call_id'] = f"placeholder_{i}"
issues_found += 1
if normalized:
last_role = normalized[-1].get('role', '')
if role == 'user' and last_role == 'user':
logger.debug(f"ClaudeProviderHandler: Inserting synthetic assistant message between consecutive user messages at index {i}")
normalized.append({
'role': 'assistant',
'content': '(empty)'
})
issues_found += 1
elif role == 'assistant' and last_role == 'assistant':
logger.debug(f"ClaudeProviderHandler: Merging consecutive assistant messages at index {i}")
prev_content = normalized[-1].get('content', '')
if isinstance(prev_content, str) and isinstance(content, str):
normalized[-1]['content'] = f"{prev_content}\n{content}"
else:
normalized[-1]['content'] = content
issues_found += 1
continue
normalized.append(msg.copy())
if issues_found:
logger.info(f"ClaudeProviderHandler: Message validation fixed {issues_found} issue(s)")
return normalized
def _truncate_tool_result(self, content: str, max_chars: int = 100000) -> tuple:
"""Truncate tool result content if it exceeds the size limit."""
import logging
logger = logging.getLogger(__name__)
if not content or len(content) <= max_chars:
return content, False
truncation_notice = f"\n\n[Tool result truncated: exceeded {max_chars} character limit. Original length: {len(content)} characters.]"
truncated = content[:max_chars - len(truncation_notice)] + truncation_notice
logger.warning(f"ClaudeProviderHandler: Tool result truncated from {len(content)} to {max_chars} characters")
return truncated, True
def _get_cache_config(self) -> Dict:
"""Get prompt caching configuration from provider config."""
cache_config = {
'enabled': False,
'min_messages': 4,
}
if self.provider_config:
claude_config = getattr(self.provider_config, 'claude_config', None)
if claude_config and isinstance(claude_config, dict):
cache_config['enabled'] = claude_config.get('enable_prompt_caching', False)
cache_config['min_messages'] = claude_config.get('cache_min_messages', 4)
return cache_config
def _get_fallback_models(self) -> List[str]:
"""Get list of fallback models from provider config."""
fallback_models = []
if self.provider_config:
claude_config = getattr(self.provider_config, 'claude_config', None)
if claude_config and isinstance(claude_config, dict):
fallback_models = claude_config.get('fallback_models', [])
return fallback_models
def _convert_tool_choice_to_anthropic(self, tool_choice: Optional[Union[str, Dict]]) -> Optional[Dict]:
"""Convert OpenAI tool_choice format to Anthropic format."""
import logging
if not tool_choice:
return None
if isinstance(tool_choice, str):
if tool_choice == "auto":
return {"type": "auto"}
elif tool_choice == "none":
return None
elif tool_choice == "required":
return {"type": "any"}
else:
logging.warning(f"Unknown tool_choice string: {tool_choice}, using auto")
return {"type": "auto"}
if isinstance(tool_choice, dict):
if tool_choice.get("type") == "function":
function = tool_choice.get("function", {})
tool_name = function.get("name")
if tool_name:
return {"type": "tool", "name": tool_name}
else:
logging.warning(f"tool_choice dict missing function name: {tool_choice}")
return {"type": "auto"}
else:
logging.warning(f"Unknown tool_choice dict format: {tool_choice}, passing through")
return tool_choice
logging.warning(f"Unknown tool_choice type: {type(tool_choice)}, using auto")
return {"type": "auto"}
def _convert_tools_to_anthropic(self, tools: Optional[List[Dict]]) -> Optional[List[Dict]]:
"""Convert OpenAI tools format to Anthropic format."""
import logging
if not tools:
return None
def normalize_schema(schema: Dict) -> Dict:
"""Recursively normalize JSON Schema for Anthropic compatibility."""
if not isinstance(schema, dict):
return schema
result = {}
for key, value in schema.items():
if key == "type" and isinstance(value, list):
non_null_types = [t for t in value if t != "null"]
if len(non_null_types) == 1:
result[key] = non_null_types[0]
elif len(non_null_types) > 1:
result[key] = non_null_types
else:
result[key] = "string"
elif key == "properties" and isinstance(value, dict):
result[key] = {k: normalize_schema(v) for k, v in value.items()}
elif key == "items" and isinstance(value, dict):
result[key] = normalize_schema(value)
elif key == "additionalProperties" and value is False:
continue
elif key == "required" and isinstance(value, list):
properties = schema.get("properties", {})
cleaned_required = []
for field in value:
if field in properties:
field_schema = properties[field]
if isinstance(field_schema, dict):
field_type = field_schema.get("type")
if isinstance(field_type, list) and "null" in field_type:
continue
cleaned_required.append(field)
if cleaned_required:
result[key] = cleaned_required
else:
result[key] = value
return result
anthropic_tools = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
parameters = function.get("parameters", {})
normalized_schema = normalize_schema(parameters)
anthropic_tool = {
"name": function.get("name", ""),
"description": function.get("description", ""),
"input_schema": normalized_schema
}
anthropic_tools.append(anthropic_tool)
logging.info(f"Converted tool to Anthropic format: {anthropic_tool['name']}")
else:
logging.warning(f"Unknown tool type: {tool.get('type')}, skipping")
return anthropic_tools if anthropic_tools else None
def _extract_images_from_content(self, content: Union[str, List, None]) -> List[Dict]:
"""Extract images from OpenAI message content format."""
import logging
logger = logging.getLogger(__name__)
if not isinstance(content, list):
return []
images = []
max_image_size = 5 * 1024 * 1024
for block in content:
if not isinstance(block, dict):
continue
block_type = block.get('type', '')
if block_type == 'image_url':
image_url_obj = block.get('image_url', {})
url = image_url_obj.get('url', '') if isinstance(image_url_obj, dict) else ''
if not url:
logger.warning("ClaudeProviderHandler: Empty image URL in content block")
continue
if url.startswith('data:'):
try:
header, data = url.split(',', 1)
media_part = header.split(';')[0]
media_type = media_part.replace('data:', '')
if len(data) > max_image_size:
logger.warning(f"ClaudeProviderHandler: Image too large ({len(data)} bytes), skipping")
continue
image_block = {
'type': 'image',
'source': {
'type': 'base64',
'media_type': media_type,
'data': data
}
}
images.append(image_block)
logger.debug(f"ClaudeProviderHandler: Extracted base64 image ({media_type}, {len(data)} bytes)")
except (ValueError, IndexError) as e:
logger.warning(f"ClaudeProviderHandler: Failed to parse data URL: {e}")
elif url.startswith(('http://', 'https://')):
image_block = {
'type': 'image',
'source': {
'type': 'url',
'url': url
}
}
images.append(image_block)
logger.debug(f"ClaudeProviderHandler: Extracted URL image: {url[:80]}...")
else:
logger.warning(f"ClaudeProviderHandler: Unsupported image URL format: {url[:80]}...")
elif block_type == 'image':
if 'source' in block:
images.append(block)
logger.debug("ClaudeProviderHandler: Passed through existing image block")
return images
def _convert_messages_to_anthropic(self, messages: List[Dict]) -> tuple:
"""
Convert OpenAI messages format to Anthropic format.
Delegates to shared AnthropicFormatConverter.convert_messages_to_anthropic().
"""
return AnthropicFormatConverter.convert_messages_to_anthropic(messages, sanitize_ids=True)
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
if self.is_rate_limited():
raise Exception("Provider rate limited")
# Get fallback models from config (Phase 3.3)
fallback_models = self._get_fallback_models()
models_to_try = [model] + fallback_models
last_error = None
for attempt, current_model in enumerate(models_to_try):
try:
if attempt > 0:
import logging
logger = logging.getLogger(__name__)
logger.warning(f"ClaudeProviderHandler: Retrying with fallback model: {current_model} (original: {model})")
result = await self._handle_request_with_model(
model=current_model,
messages=messages,
max_tokens=max_tokens,
temperature=temperature,
stream=stream,
tools=tools,
tool_choice=tool_choice
)
if stream:
return self._wrap_streaming_with_retry(result, current_model, messages, max_tokens, temperature, tools, tool_choice, models_to_try, attempt)
return result
except Exception as e:
last_error = e
import logging
logger = logging.getLogger(__name__)
error_str = str(e).lower()
is_retryable = any(keyword in error_str for keyword in [
'rate limit', 'overloaded', 'too many requests', '429', '529', '503'
])
if is_retryable and attempt < len(models_to_try) - 1:
logger.warning(f"ClaudeProviderHandler: Retryable error with {current_model}, trying next fallback model")
wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
logger.info(f"ClaudeProviderHandler: Waiting {wait_time:.1f}s before retry")
await asyncio.sleep(wait_time)
continue
logger.error(f"ClaudeProviderHandler: Error with model {current_model}: {str(e)}", exc_info=True)
self.record_failure()
raise e
import logging
logger = logging.getLogger(__name__)
logger.error(f"ClaudeProviderHandler: All models failed (tried: {models_to_try})")
raise last_error
async def _wrap_streaming_with_retry(self, stream_generator, current_model, messages, max_tokens, temperature, tools, tool_choice, models_to_try, attempt):
"""Wrapper that consumes the streaming generator and catches errors."""
import logging
logger = logging.getLogger(__name__)
try:
async for chunk in stream_generator:
yield chunk
except Exception as e:
last_error = e
error_str = str(e).lower()
is_retryable = any(keyword in error_str for keyword in [
'rate limit', 'overloaded', 'too many requests', '429', '529', '503'
])
if is_retryable and attempt < len(models_to_try) - 1:
next_model = models_to_try[attempt + 1]
logger.warning(f"ClaudeProviderHandler: Streaming error with {current_model}, retrying with {next_model}")
wait_time = min(2 ** (attempt + 1) + random.uniform(0, 1), 30)
logger.info(f"ClaudeProviderHandler: Waiting {wait_time:.1f}s before retry")
await asyncio.sleep(wait_time)
try:
result = await self._handle_request_with_model(
model=next_model,
messages=messages,
max_tokens=max_tokens,
temperature=temperature,
stream=True,
tools=tools,
tool_choice=tool_choice
)
async for chunk in self._wrap_streaming_with_retry(result, next_model, messages, max_tokens, temperature, tools, tool_choice, models_to_try, attempt + 1):
yield chunk
return
except Exception as retry_error:
logger.error(f"ClaudeProviderHandler: Retry with {next_model} also failed: {str(retry_error)}")
raise retry_error
logger.error(f"ClaudeProviderHandler: Streaming error: {str(e)}", exc_info=True)
raise e
async def _handle_request_with_model(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
"""Handle request with a specific model using direct HTTP requests."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"ClaudeProviderHandler: Handling request for model {model} (Direct HTTP mode)")
if AISBF_DEBUG:
logger.info(f"ClaudeProviderHandler: Messages: {messages}")
else:
logger.info(f"ClaudeProviderHandler: Messages count: {len(messages)}")
await self._ensure_session()
await self.apply_rate_limit()
validated_messages = self._validate_messages(messages)
system_message, anthropic_messages = self._convert_messages_to_anthropic(validated_messages)
payload = {
'model': model,
'messages': anthropic_messages,
'max_tokens': max_tokens or 4096,
}
if temperature is not None and temperature > 0:
payload['temperature'] = temperature
if system_message:
billing_header = {
'type': 'text',
'text': 'x-anthropic-billing-header: cc_version=99.0.0.e8c; cc_entrypoint=cli;'
}
claude_intro = {
'type': 'text',
'text': 'You are Claude Code, Anthropic\'s official CLI for Claude.'
}
user_system = {
'type': 'text',
'text': system_message
}
payload['system'] = [billing_header, claude_intro, user_system]
payload['metadata'] = {
'user_id': json.dumps({
'device_id': self.session_state['device_id'],
'account_uuid': self.session_state['account_uuid'],
'session_id': self.session_state['session_id']
})
}
if tools:
anthropic_tools = self._convert_tools_to_anthropic(tools)
if anthropic_tools:
payload['tools'] = anthropic_tools
if tool_choice and tools:
anthropic_tool_choice = self._convert_tool_choice_to_anthropic(tool_choice)
if anthropic_tool_choice:
payload['tool_choice'] = anthropic_tool_choice
headers = self._get_auth_headers(stream=stream)
api_url = 'https://api.anthropic.com/v1/messages?beta=true'
logger.info(f"ClaudeProviderHandler: Request payload keys: {list(payload.keys())}")
if AISBF_DEBUG:
logger.info(f"ClaudeProviderHandler: Full payload: {json.dumps(payload, indent=2)}")
try:
if stream:
payload['stream'] = True
logger.info(f"ClaudeProviderHandler: Using direct HTTP streaming mode")
return self._handle_streaming_request_with_retry(api_url, payload, headers, model)
else:
logger.info(f"ClaudeProviderHandler: Using direct HTTP non-streaming mode")
response = await self._request_with_retry(api_url, headers, payload, max_retries=3)
logger.info(f"ClaudeProviderHandler: HTTP response received successfully")
self._update_session_from_headers(dict(response.headers))
self.record_success()
response_data = response.json()
if AISBF_DEBUG:
logger.info(f"=== RAW CLAUDE RESPONSE ===")
logger.info(f"Raw response data: {json.dumps(response_data, indent=2, default=str)}")
logger.info(f"=== END RAW CLAUDE RESPONSE ===")
openai_response = self._convert_to_openai_format(response_data, model)
if AISBF_DEBUG:
logger.info(f"=== FINAL CLAUDE RESPONSE DICT ===")
logger.info(f"Final response: {json.dumps(openai_response, indent=2, default=str)}")
logger.info(f"=== END FINAL CLAUDE RESPONSE DICT ===")
return openai_response
except Exception as e:
logger.error(f"ClaudeProviderHandler: HTTP request failed: {e}", exc_info=True)
raise
async def _request_with_retry(self, api_url: str, headers: Dict, payload: Dict, max_retries: int = 3):
"""Non-streaming request with automatic retry for transient errors."""
import logging
import json
logger = logging.getLogger(__name__)
last_error = None
for attempt in range(max_retries):
try:
response = await self.client.post(api_url, headers=headers, json=payload)
logger.info(f"ClaudeProviderHandler: Response status: {response.status_code} (attempt {attempt + 1}/{max_retries})")
if response.status_code in (429, 529, 503):
should_retry = response.headers.get('x-should-retry', 'false').lower() == 'true'
if should_retry or response.status_code in (529, 503):
if attempt < max_retries - 1:
wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
try:
error_data = response.json()
error_message = error_data.get('error', {}).get('message', '')
logger.warning(f"ClaudeProviderHandler: Retryable error: {error_message}")
except Exception:
pass
logger.info(f"ClaudeProviderHandler: Retrying in {wait_time:.1f}s (attempt {attempt + 1}/{max_retries})")
await asyncio.sleep(wait_time)
continue
else:
try:
response_data = response.json()
except Exception:
response_data = response.text
self.handle_429_error(response_data, dict(response.headers))
response.raise_for_status()
if response.status_code >= 400:
try:
error_body = response.json()
error_message = error_body.get('error', {}).get('message', 'Unknown error')
error_type = error_body.get('error', {}).get('type', 'unknown')
logger.error(f"ClaudeProviderHandler: API error response: {json.dumps(error_body, indent=2)}")
logger.error(f"ClaudeProviderHandler: Error type: {error_type}")
logger.error(f"ClaudeProviderHandler: Error message: {error_message}")
except Exception:
logger.error(f"ClaudeProviderHandler: API error response (text): {response.text}")
response.raise_for_status()
return response
except httpx.TimeoutException as e:
last_error = e
if attempt < max_retries - 1:
wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
logger.warning(f"ClaudeProviderHandler: Request timeout, retrying in {wait_time:.1f}s")
await asyncio.sleep(wait_time)
continue
else:
logger.error(f"ClaudeProviderHandler: Request timeout after {max_retries} attempts")
raise
except httpx.HTTPError as e:
last_error = e
if attempt < max_retries - 1:
wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
logger.warning(f"ClaudeProviderHandler: HTTP error, retrying in {wait_time:.1f}s: {e}")
await asyncio.sleep(wait_time)
continue
else:
logger.error(f"ClaudeProviderHandler: HTTP error after {max_retries} attempts: {e}")
raise
raise last_error or Exception("Request failed after max retries")
async def _handle_streaming_request_with_retry(self, api_url: str, payload: Dict, headers: Dict, model: str):
"""Wrapper for streaming request that catches rate limit errors at the call site."""
import logging
logger = logging.getLogger(__name__)
try:
async for chunk in self._handle_streaming_request(api_url, payload, headers, model):
yield chunk
except Exception as e:
error_str = str(e).lower()
if '429' in error_str or 'rate limit' in error_str or 'too many requests' in error_str:
logger.error(f"ClaudeProviderHandler: Streaming rate limit error: {e}")
raise Exception(f"Rate limit error: {e}")
raise
async def _handle_streaming_request(self, api_url: str, payload: Dict, headers: Dict, model: str):
"""Handle streaming request to Claude API using direct HTTP."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"ClaudeProviderHandler: Starting streaming request to {api_url}")
if AISBF_DEBUG:
logger.info(f"=== STREAMING REQUEST DETAILS ===")
logger.info(f"URL: {api_url}")
logger.info(f"Headers (auth redacted): {json.dumps({k: v for k, v in headers.items() if k.lower() != 'authorization'}, indent=2)}")
logger.info(f"Payload: {json.dumps(payload, indent=2)}")
logger.info(f"=== END STREAMING REQUEST DETAILS ===")
async with httpx.AsyncClient(timeout=httpx.Timeout(300.0, connect=30.0)) as streaming_client:
async with streaming_client.stream(
"POST",
api_url,
headers=headers,
json=payload
) as response:
logger.info(f"ClaudeProviderHandler: Streaming response status: {response.status_code}")
self._update_session_from_headers(dict(response.headers))
if response.status_code >= 400:
error_text = await response.aread()
logger.error(f"ClaudeProviderHandler: Streaming error response: {error_text}")
try:
error_json = json.loads(error_text)
error_message = error_json.get('error', {}).get('message', 'Unknown error')
error_type = error_json.get('error', {}).get('type', 'unknown')
logger.error(f"ClaudeProviderHandler: Error type: {error_type}")
logger.error(f"ClaudeProviderHandler: Error message: {error_message}")
if response.status_code == 429:
raise Exception(f"Rate limit error (429): {error_type} - {error_message}")
else:
raise Exception(f"Claude API error ({response.status_code}): {error_message}")
except json.JSONDecodeError:
logger.error(f"ClaudeProviderHandler: Could not parse error response as JSON")
if response.status_code == 429:
raise Exception(f"Rate limit error (429): Too Many Requests - {error_text.decode() if isinstance(error_text, bytes) else error_text}")
else:
raise Exception(f"Claude API error: {response.status_code} - {error_text.decode() if isinstance(error_text, bytes) else error_text}")
completion_id = f"claude-{int(time.time())}"
created_time = int(time.time())
first_chunk = True
accumulated_content = ""
accumulated_tool_calls = []
accumulated_thinking = ""
thinking_signature = ""
is_redacted_thinking = False
content_block_index = 0
current_tool_calls = []
last_event_time = time.time()
idle_timeout = self.stream_idle_timeout
stream_stop_reason = None
async for line in response.aiter_lines():
if time.time() - last_event_time > idle_timeout:
logger.error(f"ClaudeProviderHandler: Stream idle timeout ({idle_timeout}s)")
raise TimeoutError(f"Stream idle for {idle_timeout}s")
if not line or not line.startswith('data: '):
continue
data_str = line[6:]
if data_str == '[DONE]':
break
try:
chunk_data = json.loads(data_str)
last_event_time = time.time()
event_type = chunk_data.get('type')
if event_type == 'content_block_start':
content_block = chunk_data.get('content_block', {})
block_type = content_block.get('type', '')
if block_type == 'tool_use':
tool_call = {
'index': content_block_index,
'id': content_block.get('id', ''),
'type': 'function',
'function': {
'name': content_block.get('name', ''),
'arguments': ''
}
}
current_tool_calls.append(tool_call)
logger.debug(f"ClaudeProviderHandler: Tool use block started: {tool_call['function']['name']}")
elif block_type == 'thinking':
accumulated_thinking = ""
is_redacted_thinking = False
thinking_signature = ""
logger.debug(f"ClaudeProviderHandler: Thinking block started")
elif block_type == 'redacted_thinking':
accumulated_thinking = ""
is_redacted_thinking = True
thinking_signature = ""
logger.debug(f"ClaudeProviderHandler: Redacted thinking block started")
content_block_index += 1
elif event_type == 'content_block_delta':
delta = chunk_data.get('delta', {})
delta_type = delta.get('type', '')
if delta_type == 'text_delta':
text = delta.get('text', '')
accumulated_content += text
openai_delta = {'content': text}
if first_chunk:
openai_delta['role'] = 'assistant'
first_chunk = False
openai_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': openai_delta,
'finish_reason': None
}]
}
yield f"data: {json.dumps(openai_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
elif delta_type == 'input_json_delta':
partial_json = delta.get('partial_json', '')
if current_tool_calls:
current_tool_calls[-1]['function']['arguments'] += partial_json
elif delta_type == 'thinking_delta':
thinking_text = delta.get('thinking', '')
accumulated_thinking += thinking_text
logger.debug(f"ClaudeProviderHandler: Thinking delta: {len(thinking_text)} chars")
elif delta_type == 'signature_delta':
signature = delta.get('signature', '')
thinking_signature = signature
logger.debug(f"ClaudeProviderHandler: Thinking signature received")
elif event_type == 'content_block_stop':
if current_tool_calls:
tool_call = current_tool_calls[-1]
try:
args = json.loads(tool_call['function']['arguments']) if tool_call['function']['arguments'] else {}
tool_call['function']['arguments'] = json.dumps(args)
except json.JSONDecodeError:
logger.warning(f"ClaudeProviderHandler: Invalid tool call arguments JSON")
tool_call['function']['arguments'] = '{}'
tool_call_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': {
'tool_calls': [{
'index': tool_call['index'],
'id': tool_call['id'],
'type': tool_call['type'],
'function': tool_call['function']
}]
},
'finish_reason': None
}]
}
yield f"data: {json.dumps(tool_call_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
logger.debug(f"ClaudeProviderHandler: Emitted tool call: {tool_call['function']['name']}")
elif accumulated_thinking:
block_type = "redacted_thinking" if is_redacted_thinking else "thinking"
logger.info(f"ClaudeProviderHandler: {block_type} block completed ({len(accumulated_thinking)} chars)")
accumulated_thinking = ""
is_redacted_thinking = False
thinking_signature = ""
elif event_type == 'message_delta':
delta_data = chunk_data.get('delta', {})
usage = chunk_data.get('usage', {})
stream_stop_reason = delta_data.get('stop_reason')
if stream_stop_reason:
logger.debug(f"ClaudeProviderHandler: Stream stop_reason: {stream_stop_reason}")
if usage:
logger.debug(f"ClaudeProviderHandler: Streaming usage update: {usage}")
cache_read = usage.get('cache_read_input_tokens', 0)
cache_creation = usage.get('cache_creation_input_tokens', 0)
if cache_read > 0:
self.cache_stats['cache_hits'] += 1
self.cache_stats['cache_tokens_read'] += cache_read
if cache_creation > 0:
self.cache_stats['cache_misses'] += 1
self.cache_stats['cache_tokens_created'] += cache_creation
elif event_type == 'message_stop':
stop_reason_map = {
'end_turn': 'stop',
'max_tokens': 'length',
'stop_sequence': 'stop',
'tool_use': 'tool_calls'
}
if stream_stop_reason:
finish_reason = stop_reason_map.get(stream_stop_reason, 'stop')
elif current_tool_calls:
finish_reason = 'tool_calls'
else:
finish_reason = 'stop'
logger.debug(f"ClaudeProviderHandler: Final finish_reason: {finish_reason}")
final_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': {},
'finish_reason': finish_reason
}]
}
yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
yield b"data: [DONE]\n\n"
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse streaming chunk: {e}")
continue
def _convert_to_openai_format(self, claude_response: Dict, model: str) -> Dict:
"""Convert Claude API response to OpenAI format."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"ClaudeProviderHandler: Converting response to OpenAI format")
content_text = ""
tool_calls = []
thinking_text = ""
if 'content' in claude_response:
for block in claude_response['content']:
block_type = block.get('type', '')
if block_type == 'text':
content_text += block.get('text', '')
elif block_type == 'tool_use':
tool_calls.append({
'id': block.get('id', f"call_{len(tool_calls)}"),
'type': 'function',
'function': {
'name': block.get('name', ''),
'arguments': json.dumps(block.get('input', {}))
}
})
elif block_type == 'thinking':
thinking_text = block.get('thinking', '')
logger.debug(f"ClaudeProviderHandler: Extracted thinking block ({len(thinking_text)} chars)")
elif block_type == 'redacted_thinking':
logger.debug(f"ClaudeProviderHandler: Found redacted_thinking block")
stop_reason_map = {
'end_turn': 'stop',
'max_tokens': 'length',
'stop_sequence': 'stop',
'tool_use': 'tool_calls'
}
stop_reason = claude_response.get('stop_reason', 'end_turn')
finish_reason = stop_reason_map.get(stop_reason, 'stop')
usage = claude_response.get('usage', {})
input_tokens = usage.get('input_tokens', 0)
output_tokens = usage.get('output_tokens', 0)
cache_read_tokens = usage.get('cache_read_input_tokens', 0)
cache_creation_tokens = usage.get('cache_creation_input_tokens', 0)
if cache_read_tokens or cache_creation_tokens:
logger.info(f"ClaudeProviderHandler: Cache usage - read: {cache_read_tokens}, creation: {cache_creation_tokens}")
openai_response = {
'id': f"claude-{model}-{int(time.time())}",
'object': 'chat.completion',
'created': int(time.time()),
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'message': {
'role': 'assistant',
'content': content_text if content_text else None
},
'finish_reason': finish_reason
}],
'usage': {
'prompt_tokens': input_tokens,
'completion_tokens': output_tokens,
'total_tokens': input_tokens + output_tokens,
'prompt_tokens_details': {
'cached_tokens': cache_read_tokens,
'audio_tokens': 0
},
'completion_tokens_details': {
'reasoning_tokens': 0,
'audio_tokens': 0
}
}
}
if tool_calls:
openai_response['choices'][0]['message']['tool_calls'] = tool_calls
if thinking_text:
openai_response['choices'][0]['message']['provider_options'] = {
'anthropic': {
'thinking': thinking_text
}
}
logger.debug(f"ClaudeProviderHandler: Added thinking content to response ({len(thinking_text)} chars)")
return openai_response
def _convert_sdk_response_to_openai(self, response, model: str) -> Dict:
"""Convert Anthropic SDK response object to OpenAI format."""
import logging
import json
logger = logging.getLogger(__name__)
message_content = ""
tool_calls = []
thinking_text = ""
for block in response.content:
block_type = getattr(block, 'type', '')
if block_type == 'text' or hasattr(block, 'text'):
message_content += getattr(block, 'text', '')
elif block_type == 'tool_use':
tool_calls.append({
'id': getattr(block, 'id', f"call_{len(tool_calls)}"),
'type': 'function',
'function': {
'name': getattr(block, 'name', ''),
'arguments': json.dumps(getattr(block, 'input', {}))
}
})
elif block_type == 'thinking':
thinking_text = getattr(block, 'thinking', '')
logger.debug(f"ClaudeProviderHandler: Extracted thinking block ({len(thinking_text)} chars)")
elif block_type == 'redacted_thinking':
logger.debug(f"ClaudeProviderHandler: Found redacted_thinking block")
stop_reason_map = {
'end_turn': 'stop',
'max_tokens': 'length',
'stop_sequence': 'stop',
'tool_use': 'tool_calls'
}
stop_reason = getattr(response, 'stop_reason', 'end_turn') or 'end_turn'
finish_reason = stop_reason_map.get(stop_reason, 'stop')
usage = getattr(response, 'usage', None)
input_tokens = getattr(usage, 'input_tokens', 0) if usage else 0
output_tokens = getattr(usage, 'output_tokens', 0) if usage else 0
cache_read_tokens = getattr(usage, 'cache_read_input_tokens', 0) if usage else 0
cache_creation_tokens = getattr(usage, 'cache_creation_input_tokens', 0) if usage else 0
if cache_read_tokens or cache_creation_tokens:
logger.info(f"ClaudeProviderHandler: Cache usage - read: {cache_read_tokens}, creation: {cache_creation_tokens}")
openai_response = {
'id': getattr(response, 'id', f"claude-{model}-{int(time.time())}"),
'object': 'chat.completion',
'created': int(time.time()),
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'message': {
'role': 'assistant',
'content': message_content if message_content else None,
},
'finish_reason': finish_reason
}],
'usage': {
'prompt_tokens': input_tokens,
'completion_tokens': output_tokens,
'total_tokens': input_tokens + output_tokens,
'prompt_tokens_details': {
'cached_tokens': cache_read_tokens,
'audio_tokens': 0
},
'completion_tokens_details': {
'reasoning_tokens': 0,
'audio_tokens': 0
}
}
}
if tool_calls:
openai_response['choices'][0]['message']['tool_calls'] = tool_calls
if thinking_text:
openai_response['choices'][0]['message']['provider_options'] = {
'anthropic': {
'thinking': thinking_text
}
}
logger.debug(f"ClaudeProviderHandler: Added thinking content to response ({len(thinking_text)} chars)")
return openai_response
async def _handle_streaming_request_sdk(self, client, request_kwargs: Dict, model: str):
"""Handle streaming request using Anthropic SDK's async streaming API."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"ClaudeProviderHandler: Starting SDK streaming request")
completion_id = f"claude-{int(time.time())}"
created_time = int(time.time())
first_chunk = True
accumulated_content = ""
accumulated_thinking = ""
thinking_signature = ""
is_redacted_thinking = False
content_block_index = 0
current_tool_calls = []
last_event_time = time.time()
idle_timeout = self.stream_idle_timeout
try:
stream = await client.messages.create(**request_kwargs, stream=True)
async for event in stream:
last_event_time = time.time()
if time.time() - last_event_time > idle_timeout:
logger.error(f"ClaudeProviderHandler: Stream idle timeout ({idle_timeout}s)")
raise TimeoutError(f"Stream idle for {idle_timeout}s")
event_type = getattr(event, 'type', None)
if event_type == 'content_block_start':
content_block = getattr(event, 'content_block', None)
if content_block:
block_type = getattr(content_block, 'type', '')
if block_type == 'tool_use':
tool_call = {
'index': content_block_index,
'id': getattr(content_block, 'id', ''),
'type': 'function',
'function': {
'name': getattr(content_block, 'name', ''),
'arguments': ''
}
}
current_tool_calls.append(tool_call)
logger.debug(f"ClaudeProviderHandler: Tool use block started: {tool_call['function']['name']}")
elif block_type == 'thinking':
accumulated_thinking = ""
is_redacted_thinking = False
thinking_signature = ""
logger.debug(f"ClaudeProviderHandler: Thinking block started")
elif block_type == 'redacted_thinking':
accumulated_thinking = ""
is_redacted_thinking = True
thinking_signature = ""
logger.debug(f"ClaudeProviderHandler: Redacted thinking block started")
content_block_index += 1
elif event_type == 'content_block_delta':
delta = getattr(event, 'delta', None)
if delta:
delta_type = getattr(delta, 'type', '')
if delta_type == 'text_delta':
text = getattr(delta, 'text', '')
accumulated_content += text
openai_delta = {'content': text}
if first_chunk:
openai_delta['role'] = 'assistant'
first_chunk = False
openai_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': openai_delta,
'finish_reason': None
}]
}
yield f"data: {json.dumps(openai_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
elif delta_type == 'input_json_delta':
partial_json = getattr(delta, 'partial_json', '')
if current_tool_calls:
current_tool_calls[-1]['function']['arguments'] += partial_json
elif delta_type == 'thinking_delta':
thinking_text = getattr(delta, 'thinking', '')
accumulated_thinking += thinking_text
logger.debug(f"ClaudeProviderHandler: Thinking delta: {len(thinking_text)} chars")
elif delta_type == 'signature_delta':
signature = getattr(delta, 'signature', '')
thinking_signature = signature
logger.debug(f"ClaudeProviderHandler: Thinking signature received")
elif event_type == 'content_block_stop':
if current_tool_calls:
tool_call = current_tool_calls[-1]
try:
args = json.loads(tool_call['function']['arguments']) if tool_call['function']['arguments'] else {}
tool_call['function']['arguments'] = json.dumps(args)
except json.JSONDecodeError:
logger.warning(f"ClaudeProviderHandler: Invalid tool call arguments JSON")
tool_call['function']['arguments'] = '{}'
tool_call_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': {
'tool_calls': [{
'index': tool_call['index'],
'id': tool_call['id'],
'type': tool_call['type'],
'function': tool_call['function']
}]
},
'finish_reason': None
}]
}
yield f"data: {json.dumps(tool_call_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
logger.debug(f"ClaudeProviderHandler: Emitted tool call: {tool_call['function']['name']}")
elif accumulated_thinking:
block_type = "redacted_thinking" if is_redacted_thinking else "thinking"
logger.info(f"ClaudeProviderHandler: {block_type} block completed ({len(accumulated_thinking)} chars)")
accumulated_thinking = ""
is_redacted_thinking = False
thinking_signature = ""
elif event_type == 'message_delta':
usage = getattr(event, 'usage', None)
if usage:
logger.debug(f"ClaudeProviderHandler: Streaming usage update: {usage}")
cache_read = getattr(usage, 'cache_read_input_tokens', 0)
cache_creation = getattr(usage, 'cache_creation_input_tokens', 0)
if cache_read > 0:
self.cache_stats['cache_hits'] += 1
self.cache_stats['cache_tokens_read'] += cache_read
if cache_creation > 0:
self.cache_stats['cache_misses'] += 1
self.cache_stats['cache_tokens_created'] += cache_creation
elif event_type == 'message_stop':
final_chunk = {
'id': completion_id,
'object': 'chat.completion.chunk',
'created': created_time,
'model': f'{self.provider_id}/{model}',
'choices': [{
'index': 0,
'delta': {},
'finish_reason': 'stop'
}]
}
yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n".encode('utf-8')
yield b"data: [DONE]\n\n"
logger.info(f"ClaudeProviderHandler: SDK streaming completed successfully")
self.record_success()
except Exception as e:
logger.error(f"ClaudeProviderHandler: SDK streaming error: {str(e)}", exc_info=True)
raise
def get_cache_stats(self) -> Dict:
"""Get cache usage statistics (Phase 2.3)."""
total = self.cache_stats['cache_hits'] + self.cache_stats['cache_misses']
hit_rate = (self.cache_stats['cache_hits'] / total * 100) if total > 0 else 0
return {
**self.cache_stats,
'total_cache_events': total,
'cache_hit_rate_percent': round(hit_rate, 2),
}
def _get_models_cache_path(self) -> str:
"""Get the path to the models cache file."""
import os
cache_dir = os.path.expanduser("~/.aisbf")
os.makedirs(cache_dir, exist_ok=True)
return os.path.join(cache_dir, f"claude_models_cache_{self.provider_id}.json")
def _save_models_cache(self, models: List[Model]) -> None:
"""Save models to cache file."""
import logging
import json
try:
cache_path = self._get_models_cache_path()
cache_data = {
'timestamp': time.time(),
'models': []
}
for m in models:
model_dict = {'id': m.id, 'name': m.name}
if m.context_size:
model_dict['context_size'] = m.context_size
if m.context_length:
model_dict['context_length'] = m.context_length
if m.description:
model_dict['description'] = m.description
if m.pricing:
model_dict['pricing'] = m.pricing
if m.top_provider:
model_dict['top_provider'] = m.top_provider
if m.supported_parameters:
model_dict['supported_parameters'] = m.supported_parameters
cache_data['models'].append(model_dict)
with open(cache_path, 'w') as f:
json.dump(cache_data, f, indent=2)
logging.info(f"ClaudeProviderHandler: ✓ Saved {len(models)} models to cache: {cache_path}")
except Exception as e:
logging.warning(f"ClaudeProviderHandler: Failed to save models cache: {e}")
def _load_models_cache(self) -> Optional[List[Model]]:
"""Load models from cache file if available and not too old."""
import logging
import json
import os
try:
cache_path = self._get_models_cache_path()
if not os.path.exists(cache_path):
logging.info(f"ClaudeProviderHandler: No cache file found at {cache_path}")
return None
with open(cache_path, 'r') as f:
cache_data = json.load(f)
cache_age = time.time() - cache_data.get('timestamp', 0)
cache_age_hours = cache_age / 3600
logging.info(f"ClaudeProviderHandler: Found cache file (age: {cache_age_hours:.1f} hours)")
if cache_age > 86400:
logging.info(f"ClaudeProviderHandler: Cache is too old (>{cache_age_hours:.1f} hours), ignoring")
return None
models = []
for m in cache_data.get('models', []):
models.append(Model(
id=m['id'],
name=m['name'],
provider_id=self.provider_id,
context_size=m.get('context_size'),
context_length=m.get('context_length'),
description=m.get('description'),
pricing=m.get('pricing'),
top_provider=m.get('top_provider'),
supported_parameters=m.get('supported_parameters')
))
if models:
logging.info(f"ClaudeProviderHandler: ✓ Loaded {len(models)} models from cache")
return models
else:
logging.info(f"ClaudeProviderHandler: Cache file is empty")
return None
except Exception as e:
logging.warning(f"ClaudeProviderHandler: Failed to load models cache: {e}")
return None
async def get_models(self) -> List[Model]:
"""Return list of available Claude models by querying the API."""
try:
import logging
import json
logging.info("=" * 80)
logging.info("ClaudeProviderHandler: Starting model list retrieval")
logging.info("=" * 80)
await self.apply_rate_limit()
try:
logging.info("ClaudeProviderHandler: [1/3] Attempting primary API endpoint...")
headers = self._get_auth_headers(stream=False)
api_endpoint = 'https://api.anthropic.com/v1/models'
logging.info(f"ClaudeProviderHandler: Calling API endpoint: {api_endpoint}")
logging.info(f"ClaudeProviderHandler: Using OAuth2 authentication with full headers")
response = await self.client.get(api_endpoint, headers=headers)
logging.info(f"ClaudeProviderHandler: API response status: {response.status_code}")
if response.status_code == 200:
models_data = response.json()
logging.info(f"ClaudeProviderHandler: ✓ Primary API call successful!")
logging.info(f"ClaudeProviderHandler: Response data keys: {list(models_data.keys())}")
logging.info(f"ClaudeProviderHandler: Retrieved {len(models_data.get('data', []))} models from API")
if AISBF_DEBUG:
logging.info(f"ClaudeProviderHandler: Full API response: {models_data}")
models = []
for model_data in models_data.get('data', []):
model_id = model_data.get('id', '')
display_name = model_data.get('display_name') or model_data.get('name') or model_id
context_size = (
model_data.get('max_input_tokens') or
model_data.get('context_window') or
model_data.get('context_length') or
model_data.get('max_tokens')
)
description = model_data.get('description')
models.append(Model(
id=model_id,
name=display_name,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size,
description=description
))
logging.info(f"ClaudeProviderHandler: - {model_id} ({display_name}, context: {context_size})")
if models:
self._save_models_cache(models)
logging.info("=" * 80)
logging.info(f"ClaudeProviderHandler: ✓ SUCCESS - Returning {len(models)} models from primary API")
logging.info(f"ClaudeProviderHandler: Source: Dynamic API retrieval (Anthropic)")
logging.info("=" * 80)
return models
else:
logging.warning("ClaudeProviderHandler: ✗ Primary API returned empty model list")
else:
logging.warning(f"ClaudeProviderHandler: ✗ Primary API call failed with status {response.status_code}")
try:
error_body = response.json()
logging.warning(f"ClaudeProviderHandler: Error response: {error_body}")
except:
logging.warning(f"ClaudeProviderHandler: Error response (text): {response.text[:200]}")
except Exception as api_error:
logging.warning(f"ClaudeProviderHandler: ✗ Exception during primary API call")
logging.warning(f"ClaudeProviderHandler: Error type: {type(api_error).__name__}")
logging.warning(f"ClaudeProviderHandler: Error message: {str(api_error)}")
if AISBF_DEBUG:
logging.warning(f"ClaudeProviderHandler: Full traceback:", exc_info=True)
try:
logging.info("-" * 80)
logging.info("ClaudeProviderHandler: [2/3] Attempting fallback endpoint...")
fallback_endpoint = 'http://lisa.nexlab.net:5000/claude/models'
logging.info(f"ClaudeProviderHandler: Calling fallback endpoint: {fallback_endpoint}")
fallback_client = httpx.AsyncClient(timeout=httpx.Timeout(10.0, connect=5.0))
try:
fallback_response = await fallback_client.get(fallback_endpoint)
logging.info(f"ClaudeProviderHandler: Fallback response status: {fallback_response.status_code}")
if fallback_response.status_code == 200:
fallback_data = fallback_response.json()
logging.info(f"ClaudeProviderHandler: ✓ Fallback API call successful!")
if AISBF_DEBUG:
logging.info(f"ClaudeProviderHandler: Fallback response: {fallback_data}")
models_list = fallback_data if isinstance(fallback_data, list) else fallback_data.get('data', fallback_data.get('models', []))
models = []
for model_data in models_list:
if isinstance(model_data, str):
models.append(Model(id=model_data, name=model_data, provider_id=self.provider_id))
elif isinstance(model_data, dict):
model_id = model_data.get('id', model_data.get('model', ''))
display_name = model_data.get('name', model_data.get('display_name', model_id))
context_size = (
model_data.get('max_input_tokens') or
model_data.get('context_window') or
model_data.get('context_length') or
model_data.get('context_size') or
model_data.get('max_tokens')
)
description = model_data.get('description')
models.append(Model(
id=model_id,
name=display_name,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size,
description=description
))
if models:
for model in models:
logging.info(f"ClaudeProviderHandler: - {model.id} ({model.name})")
self._save_models_cache(models)
logging.info("=" * 80)
logging.info(f"ClaudeProviderHandler: ✓ SUCCESS - Returning {len(models)} models from fallback API")
logging.info(f"ClaudeProviderHandler: Source: Dynamic API retrieval (Fallback)")
logging.info("=" * 80)
return models
else:
logging.warning("ClaudeProviderHandler: ✗ Fallback API returned empty model list")
else:
logging.warning(f"ClaudeProviderHandler: ✗ Fallback API call failed with status {fallback_response.status_code}")
try:
error_body = fallback_response.json()
logging.warning(f"ClaudeProviderHandler: Fallback error response: {error_body}")
except:
logging.warning(f"ClaudeProviderHandler: Fallback error response (text): {fallback_response.text[:200]}")
finally:
await fallback_client.aclose()
except Exception as fallback_error:
logging.warning(f"ClaudeProviderHandler: ✗ Exception during fallback API call")
logging.warning(f"ClaudeProviderHandler: Error type: {type(fallback_error).__name__}")
logging.warning(f"ClaudeProviderHandler: Error message: {str(fallback_error)}")
if AISBF_DEBUG:
logging.warning(f"ClaudeProviderHandler: Full traceback:", exc_info=True)
logging.info("-" * 80)
logging.info("ClaudeProviderHandler: [3/3] Attempting to load from cache...")
cached_models = self._load_models_cache()
if cached_models:
for model in cached_models:
logging.info(f"ClaudeProviderHandler: - {model.id} ({model.name})")
logging.info("=" * 80)
logging.info(f"ClaudeProviderHandler: ✓ Returning {len(cached_models)} models from cache")
logging.info(f"ClaudeProviderHandler: Source: Cached model list")
logging.info("=" * 80)
return cached_models
logging.info("-" * 80)
logging.info("ClaudeProviderHandler: Using static fallback model list")
static_models = [
Model(id="claude-3-7-sonnet-20250219", name="Claude 3.7 Sonnet", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-5-sonnet-20241022", name="Claude 3.5 Sonnet", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-5-haiku-20241022", name="Claude 3.5 Haiku", provider_id=self.provider_id, context_size=200000, context_length=200000),
Model(id="claude-3-opus-20240229", name="Claude 3 Opus", provider_id=self.provider_id, context_size=200000, context_length=200000),
]
for model in static_models:
logging.info(f"ClaudeProviderHandler: - {model.id} ({model.name})")
logging.info("=" * 80)
logging.info(f"ClaudeProviderHandler: ✓ Returning {len(static_models)} models from static list")
logging.info(f"ClaudeProviderHandler: Source: Static fallback configuration")
logging.info("=" * 80)
return static_models
except Exception as e:
import logging
logging.error("=" * 80)
logging.error(f"ClaudeProviderHandler: ✗ FATAL ERROR getting models: {str(e)}")
logging.error("=" * 80)
logging.error(f"ClaudeProviderHandler: Error details:", exc_info=True)
raise e
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Google (Gemini) provider handler.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import time
from typing import Dict, List, Optional, Union
from google import genai
from ..models import Model
from ..config import config
from ..utils import count_messages_tokens
from .base import BaseProviderHandler, AISBF_DEBUG
class GoogleProviderHandler(BaseProviderHandler):
def __init__(self, provider_id: str, api_key: str):
super().__init__(provider_id, api_key)
# Initialize google-genai library
from google import genai
self.client = genai.Client(api_key=api_key)
# Cache storage for Google Context Caching
self._cached_content_refs = {} # {cache_key: (cached_content_name, expiry_time)}
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
if self.is_rate_limited():
raise Exception("Provider rate limited")
try:
import logging
logging.info(f"GoogleProviderHandler: Handling request for model {model}")
logging.info(f"GoogleProviderHandler: Stream: {stream}")
if AISBF_DEBUG:
logging.info(f"GoogleProviderHandler: Messages: {messages}")
else:
logging.info(f"GoogleProviderHandler: Messages count: {len(messages)}")
if tools:
logging.info(f"GoogleProviderHandler: Tools provided: {len(tools)} tools")
if AISBF_DEBUG:
logging.info(f"GoogleProviderHandler: Tools: {tools}")
if tool_choice:
logging.info(f"GoogleProviderHandler: Tool choice: {tool_choice}")
# Apply rate limiting
await self.apply_rate_limit()
# Check if native caching is enabled for this provider
provider_config = config.providers.get(self.provider_id)
enable_native_caching = getattr(provider_config, 'enable_native_caching', False)
cache_ttl = getattr(provider_config, 'cache_ttl', None)
min_cacheable_tokens = getattr(provider_config, 'min_cacheable_tokens', 1000)
logging.info(f"GoogleProviderHandler: Native caching enabled: {enable_native_caching}")
# Initialize cached_content_name for this request (will be set if we use caching)
cached_content_name = None
if enable_native_caching:
logging.info(f"GoogleProviderHandler: Cache TTL: {cache_ttl} seconds, min_cacheable_tokens: {min_cacheable_tokens}")
# Calculate total token count to determine if caching is beneficial
total_tokens = count_messages_tokens(messages, model)
logging.info(f"GoogleProviderHandler: Total message tokens: {total_tokens}")
# Only use caching if total tokens exceed minimum threshold
if total_tokens >= min_cacheable_tokens:
# Generate a cache key based on system message and early conversation
cache_key = self._generate_cache_key(messages, model)
logging.info(f"GoogleProviderHandler: Generated cache_key: {cache_key}")
# Check if we have a valid cached content
if cache_key in self._cached_content_refs:
cached_content_name, expiry_time = self._cached_content_refs[cache_key]
current_time = time.time()
if current_time < expiry_time:
logging.info(f"GoogleProviderHandler: Using cached content: {cached_content_name} (expires in {expiry_time - current_time:.0f}s)")
else:
# Cache expired, remove it
logging.info(f"GoogleProviderHandler: Cache expired, removing: {cached_content_name}")
del self._cached_content_refs[cache_key]
cached_content_name = None
else:
logging.info(f"GoogleProviderHandler: No cached content found for cache_key")
# If no cached content, and we have a TTL, mark to create cache after first request
if cached_content_name is None and cache_ttl:
self._pending_cache_key = (cache_key, cache_ttl, messages)
logging.info(f"GoogleProviderHandler: Will create cached content after first request")
else:
self._pending_cache_key = None
else:
logging.info(f"GoogleProviderHandler: Total tokens ({total_tokens}) below min_cacheable_tokens ({min_cacheable_tokens}), skipping cache")
self._pending_cache_key = None
else:
self._pending_cache_key = None
# Build content from messages
content = "\n\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])
# Build config with only non-None values
config_params = {"temperature": temperature}
if max_tokens is not None:
config_params["max_output_tokens"] = max_tokens
# Convert OpenAI tools to Google's function calling format
google_tools = None
if tools:
function_declarations = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
# Use Google's SDK types for proper validation
from google.genai import types as genai_types
function_declaration = genai_types.FunctionDeclaration(
name=function.get("name"),
description=function.get("description", ""),
parameters=function.get("parameters", {})
)
function_declarations.append(function_declaration)
logging.info(f"GoogleProviderHandler: Converted tool to Google format: {function_declaration}")
if function_declarations:
from google.genai import types as genai_types
google_tools = genai_types.Tool(function_declarations=function_declarations)
logging.info(f"GoogleProviderHandler: Added {len(function_declarations)} tools to google_tools")
config_params["tools"] = google_tools
logging.info(f"GoogleProviderHandler: Added tools to config")
# Handle streaming request
if stream:
logging.info(f"GoogleProviderHandler: Using streaming API")
from google import genai
stream_client = genai.Client(api_key=self.api_key)
chunks = []
for chunk in stream_client.models.generate_content_stream(
model=model,
contents=content,
config=config_params
):
chunks.append(chunk)
logging.info(f"GoogleProviderHandler: Streaming response received (total chunks: {len(chunks)})")
self.record_success()
# After successful streaming response, create cached content if pending
if hasattr(self, '_pending_cache_key') and self._pending_cache_key:
cache_key, cache_ttl, cache_messages = self._pending_cache_key
try:
new_cached_name = self._create_cached_content(cache_messages, model, cache_ttl)
if new_cached_name:
expiry_time = time.time() + cache_ttl
self._cached_content_refs[cache_key] = (new_cached_name, expiry_time)
logging.info(f"GoogleProviderHandler: Cached content stored (streaming): {new_cached_name}, expires in {cache_ttl}s")
except Exception as e:
logging.warning(f"GoogleProviderHandler: Failed to create cache after streaming: {e}")
self._pending_cache_key = None
async def async_generator():
for chunk in chunks:
yield chunk
return async_generator()
else:
# Non-streaming request
use_cached = cached_content_name is not None
if use_cached and cached_content_name:
last_msg_count = min(3, len(messages))
last_messages = messages[-last_msg_count:] if messages else []
content = "\n\n".join([f"{msg['role']}: {msg['content']}" for msg in last_messages])
logging.info(f"GoogleProviderHandler: Using cached content, sending last {last_msg_count} messages")
else:
content = "\n\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])
config_params = {"temperature": temperature}
if max_tokens is not None:
config_params["max_output_tokens"] = max_tokens
google_tools = None
if tools:
function_declarations = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
from google.genai import types as genai_types
function_declaration = genai_types.FunctionDeclaration(
name=function.get("name"),
description=function.get("description", ""),
parameters=function.get("parameters", {})
)
function_declarations.append(function_declaration)
logging.info(f"GoogleProviderHandler: Converted tool to Google format: {function_declaration}")
if function_declarations:
from google.genai import types as genai_types
google_tools = genai_types.Tool(function_declarations=function_declarations)
logging.info(f"GoogleProviderHandler: Added {len(function_declarations)} tools to google_tools")
config_params["tools"] = google_tools
logging.info(f"GoogleProviderHandler: Added tools to config")
if use_cached and cached_content_name:
try:
logging.info(f"GoogleProviderHandler: Making request with cached_content: {cached_content_name}")
response = self.client.models.generate_content(
model=model,
contents=content,
config=config_params,
cached_content=cached_content_name
)
except TypeError as e:
logging.warning(f"GoogleProviderHandler: cached_content param not supported, using regular request: {e}")
response = self.client.models.generate_content(
model=model,
contents=content,
config=config_params
)
else:
response = self.client.models.generate_content(
model=model,
contents=content,
config=config_params
)
logging.info(f"GoogleProviderHandler: Response received: {response}")
self.record_success()
# After successful response, create cached content if pending
if hasattr(self, '_pending_cache_key') and self._pending_cache_key:
cache_key, cache_ttl, cache_messages = self._pending_cache_key
try:
new_cached_name = self._create_cached_content(cache_messages, model, cache_ttl)
if new_cached_name:
expiry_time = time.time() + cache_ttl
self._cached_content_refs[cache_key] = (new_cached_name, expiry_time)
logging.info(f"GoogleProviderHandler: Cached content stored: {new_cached_name}, expires in {cache_ttl}s")
except Exception as e:
logging.warning(f"GoogleProviderHandler: Failed to create cache after response: {e}")
self._pending_cache_key = None
if AISBF_DEBUG:
logging.info(f"=== RAW GOOGLE RESPONSE ===")
logging.info(f"Raw response type: {type(response)}")
logging.info(f"Raw response: {response}")
logging.info(f"Raw response dir: {dir(response)}")
logging.info(f"=== END RAW GOOGLE RESPONSE ===")
response_text = ""
tool_calls = None
finish_reason = "stop"
logging.info(f"=== GOOGLE RESPONSE PARSING START ===")
logging.info(f"Response type: {type(response)}")
logging.info(f"Response attributes: {dir(response)}")
try:
if hasattr(response, 'candidates'):
logging.info(f"Response has 'candidates' attribute")
logging.info(f"Candidates: {response.candidates}")
logging.info(f"Candidates type: {type(response.candidates)}")
logging.info(f"Candidates length: {len(response.candidates) if hasattr(response.candidates, '__len__') else 'N/A'}")
if response.candidates:
logging.info(f"Candidates is not empty, getting first candidate")
candidate = response.candidates[0]
logging.info(f"Candidate type: {type(candidate)}")
logging.info(f"Candidate attributes: {dir(candidate)}")
if hasattr(candidate, 'finish_reason'):
logging.info(f"Candidate has 'finish_reason' attribute")
logging.info(f"Finish reason: {candidate.finish_reason}")
finish_reason_map = {
'STOP': 'stop',
'MAX_TOKENS': 'length',
'SAFETY': 'content_filter',
'RECITATION': 'content_filter',
'OTHER': 'stop'
}
google_finish_reason = str(candidate.finish_reason)
finish_reason = finish_reason_map.get(google_finish_reason, 'stop')
logging.info(f"Mapped finish reason: {finish_reason}")
else:
logging.warning(f"Candidate does NOT have 'finish_reason' attribute")
if hasattr(candidate, 'content'):
logging.info(f"Candidate has 'content' attribute")
logging.info(f"Content: {candidate.content}")
logging.info(f"Content type: {type(candidate.content)}")
logging.info(f"Content attributes: {dir(candidate.content)}")
if candidate.content:
logging.info(f"Content is not empty")
if hasattr(candidate.content, 'parts'):
logging.info(f"Content has 'parts' attribute")
logging.info(f"Parts: {candidate.content.parts}")
logging.info(f"Parts type: {type(candidate.content.parts)}")
logging.info(f"Parts length: {len(candidate.content.parts) if hasattr(candidate.content.parts, '__len__') else 'N/A'}")
if candidate.content.parts:
logging.info(f"Parts is not empty, processing all parts")
text_parts = []
openai_tool_calls = []
call_id = 0
for idx, part in enumerate(candidate.content.parts):
logging.info(f"Processing part {idx}")
logging.info(f"Part type: {type(part)}")
logging.info(f"Part attributes: {dir(part)}")
if hasattr(part, 'text') and part.text:
logging.info(f"Part {idx} has 'text' attribute")
text_parts.append(part.text)
logging.info(f"Part {idx} text length: {len(part.text)}")
if hasattr(part, 'function_call') and part.function_call:
logging.info(f"Part {idx} has 'function_call' attribute")
logging.info(f"Function call: {part.function_call}")
try:
function_call = part.function_call
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": function_call.name,
"arguments": function_call.args if hasattr(function_call, 'args') else {}
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Converted function call to OpenAI format: {openai_tool_call}")
except Exception as e:
logging.error(f"Error converting function call: {e}", exc_info=True)
if hasattr(part, 'function_response') and part.function_response:
logging.info(f"Part {idx} has 'function_response' attribute")
logging.info(f"Function response: {part.function_response}")
response_text = "\n".join(text_parts)
logging.info(f"Combined text length: {len(response_text)}")
logging.info(f"Combined text (first 200 chars): {response_text[:200] if response_text else 'None'}")
if response_text and not openai_tool_calls:
import json
import re
google_tool_intent_patterns = [
r"(?:^|\n)\s*(?:Let(?:'s|s us)?)\s*(?:execite|execute|use|call)\s*(?:the\s+)?(\w+)\s*(?:tool)?",
r"(?:^|\n)\s*(?:I(?:'m| am)?|We(?:'re| are)?)\s*(?:going to |just )?(?:execite|execut(?:e|ed)?|use|call)\s*(?:the\s+)?(\w+)\s*(?:tool)?",
r"(?:^|\n)\s*(?:I(?:'m| am)?|We(?:'re| are)?)\s*(?:going to |just )?read\s+(?:the\s+)?file[s]?",
r"(?:^|\n)\s*Using\s+(?:the\s+)?(\w+)\s*(?:tool)?",
]
for pattern in google_tool_intent_patterns:
tool_intent_match = re.search(pattern, response_text, re.IGNORECASE)
if tool_intent_match:
tool_name = tool_intent_match.group(1).lower() if tool_intent_match.lastindex else ''
if tool_name in ['read', 'file', 'files', 'execite', '']:
tool_name = 'read'
elif tool_name in ['write', 'create']:
tool_name = 'write'
elif tool_name in ['exec', 'command', 'shell', 'run']:
tool_name = 'exec'
elif tool_name in ['edit', 'modify']:
tool_name = 'edit'
elif tool_name in ['search', 'find']:
tool_name = 'web_search'
elif tool_name in ['browser', 'browse', 'navigate']:
tool_name = 'browser'
logging.warning(f"Google model indicated intent to use '{tool_name}' tool (matched pattern: {pattern})")
params = {}
path_patterns = [
r"['\"]([^'\"]+\.md)['\"]",
r"(?:file|path)s?\s*[:=]\s*['\"]([^'\"]+)['\"]",
r"(?:path|file)\s*[:=]\s*['\"]([^'\"]+)['\"]",
r"(?:open|read)\s+['\"]([^'\"]+)['\"]",
]
for pp in path_patterns:
path_match = re.search(pp, response_text, re.IGNORECASE)
if path_match:
params['path'] = path_match.group(1)
break
offset_match = re.search(r'(?:offset|start|line\s*#?)\s*[:=]?\s*(\d+)', response_text, re.IGNORECASE)
if offset_match:
params['offset'] = int(offset_match.group(1))
limit_match = re.search(r'(?:limit|lines?|count)\s*[:=]?\s*(\d+)', response_text, re.IGNORECASE)
if limit_match:
params['limit'] = int(limit_match.group(1))
if tool_name == 'exec':
cmd_patterns = [
r"(?:command|cmd|run)\s*[:=]?\s*['\"]([^'\"]+)['\"]",
r"(?:run|execute)\s+(?:command\s+)?(['\"]([^'\"]+)['\"]|\S+)",
]
for cp in cmd_patterns:
cmd_match = re.search(cp, response_text, re.IGNORECASE)
if cmd_match:
params['command'] = cmd_match.group(1) or cmd_match.group(2)
break
if params:
openai_tool_calls.append({
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": tool_name,
"arguments": json.dumps(params)
}
})
call_id += 1
logging.info(f"Converted Google tool intent to OpenAI format: {openai_tool_calls[-1]}")
break
if response_text and not openai_tool_calls:
import json
import re
outer_assistant_pattern = r"^assistant:\s*(\[.*\])\s*$"
outer_assistant_match = re.match(outer_assistant_pattern, response_text.strip(), re.DOTALL)
if outer_assistant_match:
try:
outer_content = json.loads(outer_assistant_match.group(1))
if isinstance(outer_content, list) and len(outer_content) > 0:
for item in outer_content:
if isinstance(item, dict) and item.get('type') == 'text':
inner_text = item.get('text', '')
inner_tool_pattern = r'tool:\s*(\{.*?\})\s*(?:assistant:\s*(\[.*\]))?\s*$'
inner_tool_match = re.search(inner_tool_pattern, inner_text, re.DOTALL)
if inner_tool_match:
tool_json_str = inner_tool_match.group(1)
try:
tool_start = inner_text.find('tool:')
if tool_start != -1:
json_start = inner_text.find('{', tool_start)
brace_count = 0
json_end = json_start
for i, c in enumerate(inner_text[json_start:], json_start):
if c == '{':
brace_count += 1
elif c == '}':
brace_count -= 1
if brace_count == 0:
json_end = i + 1
break
tool_json_str = inner_text[json_start:json_end]
parsed_tool = json.loads(tool_json_str)
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": parsed_tool.get('action', parsed_tool.get('name', 'unknown')),
"arguments": json.dumps({k: v for k, v in parsed_tool.items() if k not in ['action', 'name']})
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Converted nested 'tool:' format to OpenAI tool_calls: {openai_tool_call}")
if inner_tool_match.group(2):
try:
final_assistant = json.loads(inner_tool_match.group(2))
if isinstance(final_assistant, list) and len(final_assistant) > 0:
for final_item in final_assistant:
if isinstance(final_item, dict) and final_item.get('type') == 'text':
response_text = final_item.get('text', '')
break
else:
response_text = ""
else:
response_text = ""
except json.JSONDecodeError:
response_text = ""
else:
response_text = ""
except (json.JSONDecodeError, Exception) as e:
logging.debug(f"Failed to parse nested tool JSON: {e}")
break
except (json.JSONDecodeError, Exception) as e:
logging.debug(f"Failed to parse outer assistant format: {e}")
elif not openai_tool_calls:
tool_pattern = r'tool:\s*(\{[^}]*\})'
tool_match = re.search(tool_pattern, response_text, re.DOTALL)
try:
tool_json_str = tool_match.group(1)
parsed_json = json.loads(tool_json_str)
logging.info(f"Detected 'tool:' format in text content: {parsed_json}")
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": parsed_json.get('action', parsed_json.get('name', 'unknown')),
"arguments": json.dumps({k: v for k, v in parsed_json.items() if k not in ['action', 'name']})
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Converted 'tool:' format to OpenAI tool_calls: {openai_tool_call}")
assistant_pattern = r"assistant:\s*(\[.*\])"
assistant_match = re.search(assistant_pattern, response_text, re.DOTALL)
if assistant_match:
try:
assistant_content = json.loads(assistant_match.group(1))
if isinstance(assistant_content, list) and len(assistant_content) > 0:
for item in assistant_content:
if isinstance(item, dict) and item.get('type') == 'text':
response_text = item.get('text', '')
break
else:
response_text = ""
else:
response_text = ""
except json.JSONDecodeError:
response_text = ""
else:
response_text = ""
except (json.JSONDecodeError, Exception) as e:
logging.debug(f"Failed to parse 'tool:' format: {e}")
elif content_assistant_match:
try:
tool_content = content_assistant_match.group(1)
assistant_json_str = content_assistant_match.group(2)
logging.info(f"Detected 'content/assistant:' format - tool content length: {len(tool_content)}")
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": "write",
"arguments": json.dumps({"content": tool_content})
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Converted 'content/assistant:' format to OpenAI tool_calls")
try:
assistant_content = json.loads(assistant_json_str)
if isinstance(assistant_content, list) and len(assistant_content) > 0:
for item in assistant_content:
if isinstance(item, dict) and item.get('type') == 'text':
response_text = item.get('text', '')
break
else:
response_text = ""
else:
response_text = ""
except json.JSONDecodeError:
response_text = ""
except Exception as e:
logging.debug(f"Failed to parse 'content/assistant:' format: {e}")
elif not openai_tool_calls:
try:
parsed_json = json.loads(response_text.strip())
if isinstance(parsed_json, dict):
if 'action' in parsed_json or 'function' in parsed_json or 'name' in parsed_json:
if 'action' in parsed_json:
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": parsed_json.get('action', 'unknown'),
"arguments": json.dumps({k: v for k, v in parsed_json.items() if k != 'action'})
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Detected tool call in text content: {parsed_json}")
response_text = ""
elif 'function' in parsed_json or 'name' in parsed_json:
openai_tool_call = {
"id": f"call_{call_id}",
"type": "function",
"function": {
"name": parsed_json.get('name', parsed_json.get('function', 'unknown')),
"arguments": json.dumps(parsed_json.get('arguments', parsed_json.get('parameters', {})))
}
}
openai_tool_calls.append(openai_tool_call)
call_id += 1
logging.info(f"Detected tool call in text content: {parsed_json}")
response_text = ""
except (json.JSONDecodeError, Exception) as e:
logging.debug(f"Response text is not valid JSON: {e}")
if openai_tool_calls:
tool_calls = openai_tool_calls
logging.info(f"Total tool calls: {len(tool_calls)}")
for tc in tool_calls:
logging.info(f" - {tc}")
else:
logging.info(f"No tool calls found")
else:
logging.error(f"Parts is empty")
else:
logging.error(f"Content does NOT have 'parts' attribute")
else:
logging.error(f"Content is empty")
else:
logging.error(f"Candidate does NOT have 'content' attribute")
else:
logging.error(f"Candidates is empty")
else:
logging.error(f"Response does NOT have 'candidates' attribute")
logging.info(f"Final response_text length: {len(response_text)}")
logging.info(f"Final response_text (first 200 chars): {response_text[:200] if response_text else 'None'}")
logging.info(f"Final tool_calls: {tool_calls}")
logging.info(f"Final finish_reason: {finish_reason}")
except Exception as e:
logging.error(f"GoogleProviderHandler: Exception during response parsing: {e}", exc_info=True)
response_text = ""
logging.info(f"=== GOOGLE RESPONSE PARSING END ===")
prompt_tokens = 0
completion_tokens = 0
total_tokens = 0
try:
if hasattr(response, 'usage_metadata') and response.usage_metadata:
usage_metadata = response.usage_metadata
prompt_tokens = getattr(usage_metadata, 'prompt_token_count', 0)
completion_tokens = getattr(usage_metadata, 'candidates_token_count', 0)
total_tokens = getattr(usage_metadata, 'total_token_count', 0)
logging.info(f"GoogleProviderHandler: Usage metadata - prompt: {prompt_tokens}, completion: {completion_tokens}, total: {total_tokens}")
except Exception as e:
logging.warning(f"GoogleProviderHandler: Could not extract usage metadata: {e}")
openai_response = {
"id": f"google-{model}-{int(time.time())}",
"object": "chat.completion",
"created": int(time.time()),
"model": f"{self.provider_id}/{model}",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": response_text if response_text else None
},
"finish_reason": finish_reason
}],
"usage": {
"prompt_tokens": prompt_tokens,
"completion_tokens": completion_tokens,
"total_tokens": total_tokens
}
}
if tool_calls:
openai_response["choices"][0]["message"]["tool_calls"] = tool_calls
openai_response["choices"][0]["message"]["content"] = None
logging.info(f"Added tool_calls to response message")
logging.info(f"=== FINAL OPENAI RESPONSE STRUCTURE ===")
logging.info(f"Response type: {type(openai_response)}")
logging.info(f"Response keys: {openai_response.keys()}")
logging.info(f"Response id: {openai_response['id']}")
logging.info(f"Response object: {openai_response['object']}")
logging.info(f"Response created: {openai_response['created']}")
logging.info(f"Response model: {openai_response['model']}")
logging.info(f"Response choices count: {len(openai_response['choices'])}")
logging.info(f"Response choices[0] index: {openai_response['choices'][0]['index']}")
logging.info(f"Response choices[0] message role: {openai_response['choices'][0]['message']['role']}")
logging.info(f"Response choices[0] message content length: {len(openai_response['choices'][0]['message']['content'])}")
logging.info(f"Response choices[0] message content (first 200 chars): {openai_response['choices'][0]['message']['content'][:200]}")
logging.info(f"Response choices[0] finish_reason: {openai_response['choices'][0]['finish_reason']}")
logging.info(f"Response usage: {openai_response['usage']}")
logging.info(f"=== END FINAL OPENAI RESPONSE STRUCTURE ===")
logging.info(f"GoogleProviderHandler: Returning response dict (no validation)")
logging.info(f"Response dict keys: {openai_response.keys()}")
if AISBF_DEBUG:
logging.info(f"=== FINAL GOOGLE RESPONSE DICT ===")
logging.info(f"Final response: {openai_response}")
logging.info(f"=== END FINAL GOOGLE RESPONSE DICT ===")
return openai_response
except Exception as e:
import logging
logging.error(f"GoogleProviderHandler: Error: {str(e)}", exc_info=True)
self.record_failure()
raise e
async def get_models(self) -> List[Model]:
try:
import logging
logging.info("GoogleProviderHandler: Getting models list")
await self.apply_rate_limit()
models = self.client.models.list()
logging.info(f"GoogleProviderHandler: Models received: {models}")
result = []
for model in models:
context_size = None
if hasattr(model, 'context_window') and model.context_window:
context_size = model.context_window
elif hasattr(model, 'context_length') and model.context_length:
context_size = model.context_length
elif hasattr(model, 'max_context_length') and model.max_context_length:
context_size = model.max_context_length
result.append(Model(
id=model.name,
name=model.display_name or model.name,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size
))
return result
except Exception as e:
import logging
logging.error(f"GoogleProviderHandler: Error getting models: {str(e)}", exc_info=True)
raise e
def _generate_cache_key(self, messages: List[Dict], model: str) -> str:
"""Generate a cache key based on the early messages."""
import hashlib
import json
cacheable_messages = []
for i, msg in enumerate(messages):
if msg.get('role') == 'system' or i < max(0, len(messages) - 3):
cacheable_messages.append({
'role': msg.get('role'),
'content': msg.get('content', '')[:1000]
})
cache_data = json.dumps({
'model': model,
'messages': cacheable_messages
}, sort_keys=True)
return hashlib.sha256(cache_data.encode()).hexdigest()[:32]
def _create_cached_content(self, messages: List[Dict], model: str, cache_ttl: int) -> Optional[str]:
"""Create a cached content object in Google API."""
import logging
try:
cacheable_parts = []
for i, msg in enumerate(messages):
if msg.get('role') == 'system' or i < max(0, len(messages) - 3):
role = msg.get('role', 'user')
content = msg.get('content', '')
cacheable_parts.append(f"{role}: {content}")
if not cacheable_parts:
logging.info("GoogleProviderHandler: No cacheable content to create")
return None
cached_content_text = "\n\n".join(cacheable_parts)
cache_name = f"cached_content_{int(time.time())}"
logging.info(f"GoogleProviderHandler: Creating cached content: {cache_name}")
logging.info(f"GoogleProviderHandler: Cached content length: {len(cached_content_text)} chars")
from google.genai import types as genai_types
try:
cached_content = self.client.cached_contents.create(
model=model,
display_name=cache_name,
system_instruction=cached_content_text,
ttl=f"{cache_ttl}s"
)
logging.info(f"GoogleProviderHandler: Cached content created: {cached_content.name}")
return cached_content.name
except AttributeError as e:
logging.info(f"GoogleProviderHandler: Cached content API not available in this SDK: {e}")
return None
except Exception as e:
logging.warning(f"GoogleProviderHandler: Failed to create cached content: {e}")
return None
except Exception as e:
logging.error(f"GoogleProviderHandler: Error creating cached content: {e}")
return None
def _use_cached_content_in_request(self, cached_content_name: str, model: str,
last_messages: List[Dict], max_tokens: Optional[int],
temperature: float, tools: Optional[List[Dict]]) -> Union[Dict, object]:
"""Make a request using cached content."""
import logging
from google.genai import types as genai_types
logging.info(f"GoogleProviderHandler: Using cached content: {cached_content_name}")
content = "\n\n".join([f"{msg['role']}: {msg['content']}" for msg in last_messages])
config_params = {"temperature": temperature}
if max_tokens is not None:
config_params["max_output_tokens"] = max_tokens
if tools:
function_declarations = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
function_declaration = genai_types.FunctionDeclaration(
name=function.get("name"),
description=function.get("description", ""),
parameters=function.get("parameters", {})
)
function_declarations.append(function_declaration)
if function_declarations:
google_tools = genai_types.Tool(function_declarations=function_declarations)
config_params["tools"] = google_tools
response = self.client.models.generate_content(
model=model,
contents=content,
config=config_params,
cached_content=cached_content_name
)
return response
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Kilo Gateway (OpenAI-compatible with OAuth2) provider handler.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import httpx
import time
from typing import Dict, List, Optional, Union
from openai import OpenAI
from ..models import Model
from ..config import config
from .base import BaseProviderHandler, AISBF_DEBUG
class KiloProviderHandler(BaseProviderHandler):
"""
Handler for Kilo Gateway (OpenAI-compatible with OAuth2 support).
"""
def __init__(self, provider_id: str, api_key: Optional[str] = None):
super().__init__(provider_id, api_key)
self.provider_config = config.get_provider(provider_id)
kilo_config = getattr(self.provider_config, 'kilo_config', None)
credentials_file = None
api_base = None
if kilo_config and isinstance(kilo_config, dict):
credentials_file = kilo_config.get('credentials_file')
api_base = kilo_config.get('api_base')
from ..auth.kilo import KiloOAuth2
self.oauth2 = KiloOAuth2(credentials_file=credentials_file, api_base=api_base)
configured_endpoint = getattr(self.provider_config, 'endpoint', None)
if configured_endpoint:
endpoint = configured_endpoint.rstrip('/')
if not endpoint.endswith('/v1'):
endpoint = endpoint + '/v1'
else:
endpoint = 'https://kilo.ai/api/openrouter/v1'
self._kilo_endpoint = endpoint
self.client = OpenAI(base_url=endpoint, api_key=api_key or "placeholder")
async def _ensure_authenticated(self) -> str:
"""Ensure user is authenticated and return valid token."""
import logging
logger = logging.getLogger(__name__)
token = self.oauth2.get_valid_token()
if token:
logger.info("KiloProviderHandler: Using existing OAuth2 token")
return token
if self.api_key and self.api_key != "placeholder":
logger.info("KiloProviderHandler: Using API key authentication")
return self.api_key
logger.info("KiloProviderHandler: No valid token, initiating OAuth2 flow")
result = await self.oauth2.authenticate_with_device_flow()
if result.get("type") == "success":
token = result.get("token")
logger.info(f"KiloProviderHandler: OAuth2 authentication successful")
return token
raise Exception("OAuth2 authentication failed")
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
if self.is_rate_limited():
raise Exception("Provider rate limited")
try:
import logging
import json
logging.info(f"KiloProviderHandler: Handling request for model {model}")
if AISBF_DEBUG:
logging.info(f"KiloProviderHandler: Messages: {messages}")
logging.info(f"KiloProviderHandler: Tools: {tools}")
else:
logging.info(f"KiloProviderHandler: Messages count: {len(messages)}")
logging.info(f"KiloProviderHandler: Tools count: {len(tools) if tools else 0}")
token = await self._ensure_authenticated()
self.client.api_key = token
await self.apply_rate_limit()
request_params = {
"model": model,
"messages": [],
"temperature": temperature,
"stream": stream
}
if max_tokens is not None:
request_params["max_tokens"] = max_tokens
for msg in messages:
message = {"role": msg["role"]}
if msg["role"] == "tool":
if "tool_call_id" in msg and msg["tool_call_id"] is not None:
message["tool_call_id"] = msg["tool_call_id"]
else:
logging.warning(f"Skipping tool message without tool_call_id: {msg}")
continue
if "content" in msg and msg["content"] is not None:
message["content"] = msg["content"]
if "tool_calls" in msg and msg["tool_calls"] is not None:
message["tool_calls"] = msg["tool_calls"]
if "name" in msg and msg["name"] is not None:
message["name"] = msg["name"]
request_params["messages"].append(message)
if tools is not None:
request_params["tools"] = tools
if tool_choice is not None:
request_params["tool_choice"] = tool_choice
if stream:
logging.info(f"KiloProviderHandler: Using async httpx streaming mode")
return await self._handle_streaming_request(request_params, token, model)
response = self.client.chat.completions.create(**request_params)
logging.info(f"KiloProviderHandler: Response received: {response}")
self.record_success()
if AISBF_DEBUG:
logging.info(f"=== RAW KILO RESPONSE ===")
logging.info(f"Raw response type: {type(response)}")
logging.info(f"Raw response: {response}")
logging.info(f"=== END RAW KILO RESPONSE ===")
logging.info(f"KiloProviderHandler: Returning raw response without parsing")
return response
except Exception as e:
import logging
logging.error(f"KiloProviderHandler: Error: {str(e)}", exc_info=True)
self.record_failure()
raise e
async def _handle_streaming_request(self, request_params: Dict, token: str, model: str):
"""Handle streaming request to Kilo API using httpx async streaming."""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"KiloProviderHandler: Starting async streaming request to {self._kilo_endpoint}")
api_url = f"{self._kilo_endpoint}/chat/completions"
headers = {
'Authorization': f'Bearer {token}',
'Content-Type': 'application/json',
'Accept': 'text/event-stream',
}
if AISBF_DEBUG:
logger.info(f"=== KILO STREAMING REQUEST DETAILS ===")
logger.info(f"URL: {api_url}")
logger.info(f"Payload: {json.dumps(request_params, indent=2)}")
logger.info(f"=== END KILO STREAMING REQUEST DETAILS ===")
streaming_client = httpx.AsyncClient(timeout=httpx.Timeout(300.0, connect=30.0))
try:
request = streaming_client.build_request("POST", api_url, headers=headers, json=request_params)
response = await streaming_client.send(request, stream=True)
logger.info(f"KiloProviderHandler: Streaming response status: {response.status_code}")
if response.status_code >= 400:
error_text = await response.aread()
await response.aclose()
await streaming_client.aclose()
logger.error(f"KiloProviderHandler: Streaming error response: {error_text}")
try:
error_json = json.loads(error_text)
error_message = error_json.get('error', {}).get('message', 'Unknown error') if isinstance(error_json.get('error'), dict) else str(error_json.get('error', 'Unknown error'))
except (json.JSONDecodeError, Exception):
error_message = error_text.decode('utf-8') if isinstance(error_text, bytes) else str(error_text)
if response.status_code == 429:
self.handle_429_error(
error_json if 'error_json' in locals() else error_message,
dict(response.headers)
)
self.record_failure()
raise Exception(f"Kilo API streaming error ({response.status_code}): {error_message}")
except Exception:
await streaming_client.aclose()
raise
return self._stream_kilo_response(streaming_client, response, model)
async def _stream_kilo_response(self, streaming_client, response, model: str):
"""Yield SSE chunks from an already-validated Kilo streaming response."""
import logging
import json
logger = logging.getLogger(__name__)
try:
async for line in response.aiter_lines():
if not line:
continue
if line.startswith('data: '):
data_str = line[6:]
if data_str.strip() == '[DONE]':
yield b"data: [DONE]\n\n"
break
try:
chunk_data = json.loads(data_str)
yield f"data: {json.dumps(chunk_data, ensure_ascii=False)}\n\n".encode('utf-8')
except json.JSONDecodeError as e:
logger.warning(f"KiloProviderHandler: Failed to parse streaming chunk: {e}")
continue
elif line.startswith(':'):
continue
logger.info(f"KiloProviderHandler: Streaming completed successfully")
self.record_success()
finally:
await response.aclose()
await streaming_client.aclose()
async def get_models(self) -> List[Model]:
try:
import logging
import json
logging.info("KiloProviderHandler: Getting models list")
token = await self._ensure_authenticated()
await self.apply_rate_limit()
base_endpoint = self._kilo_endpoint.rstrip('/')
if base_endpoint.endswith('/v1'):
models_url = base_endpoint[:-3] + '/models'
else:
models_url = base_endpoint + '/models'
logging.info(f"KiloProviderHandler: Fetching models from {models_url}")
headers = {
'Authorization': f'Bearer {token}',
'Content-Type': 'application/json',
}
async with httpx.AsyncClient(timeout=httpx.Timeout(30.0, connect=10.0)) as client:
response = await client.get(models_url, headers=headers)
logging.info(f"KiloProviderHandler: Models response status: {response.status_code}")
if response.status_code != 200:
logging.warning(f"KiloProviderHandler: Models endpoint returned {response.status_code}")
try:
error_body = response.json()
logging.warning(f"KiloProviderHandler: Error response: {error_body}")
except Exception:
logging.warning(f"KiloProviderHandler: Error response (text): {response.text[:200]}")
response.raise_for_status()
models_data = response.json()
logging.info(f"KiloProviderHandler: Models received: {models_data}")
models_list = models_data.get('data', []) if isinstance(models_data, dict) else models_data
result = []
for model_entry in models_list:
if isinstance(model_entry, dict):
model_id = model_entry.get('id', '')
model_name = model_entry.get('name', model_id) or model_id
context_size = (
model_entry.get('context_window') or
model_entry.get('context_length') or
model_entry.get('max_context_length')
)
if model_id:
result.append(Model(
id=model_id,
name=model_name,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size
))
elif hasattr(model_entry, 'id'):
context_size = None
if hasattr(model_entry, 'context_window') and model_entry.context_window:
context_size = model_entry.context_window
elif hasattr(model_entry, 'context_length') and model_entry.context_length:
context_size = model_entry.context_length
result.append(Model(
id=model_entry.id,
name=model_entry.id,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size
))
logging.info(f"KiloProviderHandler: Parsed {len(result)} models")
return result
except Exception as e:
import logging
logging.error(f"KiloProviderHandler: Error getting models: {str(e)}", exc_info=True)
raise e
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
Ollama provider handler.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import httpx
import time
from typing import Dict, List, Optional, Union
from ..models import Model
from ..config import config
from .base import BaseProviderHandler, AISBF_DEBUG
class OllamaProviderHandler(BaseProviderHandler):
def __init__(self, provider_id: str, api_key: Optional[str] = None):
super().__init__(provider_id, api_key)
timeout = httpx.Timeout(
connect=60.0,
read=300.0,
write=60.0,
pool=60.0
)
self.client = httpx.AsyncClient(base_url=config.providers[provider_id].endpoint, timeout=timeout)
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Dict:
"""
Handle request for Ollama provider.
Note: Ollama doesn't support tools/tool_choice, so these parameters are accepted but ignored.
"""
import logging
import json
logger = logging.getLogger(__name__)
logger.info(f"=== OllamaProviderHandler.handle_request START ===")
logger.info(f"Provider ID: {self.provider_id}")
logger.info(f"Endpoint: {self.client.base_url}")
logger.info(f"Model: {model}")
logger.info(f"Messages count: {len(messages)}")
logger.info(f"Max tokens: {max_tokens}")
logger.info(f"Temperature: {temperature}")
logger.info(f"Stream: {stream}")
logger.info(f"API key provided: {bool(self.api_key)}")
if self.is_rate_limited():
logger.error("Provider is rate limited")
raise Exception("Provider rate limited")
try:
logger.info("Testing Ollama connection...")
try:
health_response = await self.client.get("/api/tags", timeout=10.0)
logger.info(f"Ollama health check passed: {health_response.status_code}")
logger.info(f"Available models: {health_response.json().get('models', [])}")
except Exception as e:
logger.error(f"Ollama health check failed: {str(e)}")
logger.error(f"Cannot connect to Ollama at {self.client.base_url}")
logger.error(f"Please ensure Ollama is running and accessible")
raise Exception(f"Cannot connect to Ollama at {self.client.base_url}: {str(e)}")
logger.info("Applying rate limiting...")
await self.apply_rate_limit()
logger.info("Rate limiting applied")
prompt = "\n\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])
logger.info(f"Prompt length: {len(prompt)} characters")
options = {"temperature": temperature}
if max_tokens is not None:
options["num_predict"] = max_tokens
request_data = {
"model": model,
"prompt": prompt,
"options": options,
"stream": False
}
headers = {}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
logger.info("API key added to request headers for Ollama cloud")
logger.info(f"Sending POST request to {self.client.base_url}/api/generate")
logger.info(f"Request data: {request_data}")
logger.info(f"Request headers: {headers}")
logger.info(f"Client timeout: {self.client.timeout}")
response = await self.client.post("/api/generate", json=request_data, headers=headers)
logger.info(f"Response status code: {response.status_code}")
logger.info(f"Response content type: {response.headers.get('content-type')}")
logger.info(f"Response content length: {len(response.content)} bytes")
logger.info(f"Raw response content (first 500 chars): {response.text[:500]}")
if response.status_code == 429:
try:
response_data = response.json()
except Exception:
response_data = response.text
self.handle_429_error(response_data, dict(response.headers))
response.raise_for_status()
response.raise_for_status()
content = response.text
logger.info(f"Attempting to parse response as JSON...")
try:
response_json = response.json()
logger.info(f"Response parsed as single JSON: {response_json}")
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse as single JSON: {e}")
logger.info(f"Attempting to parse as multiple JSON objects...")
responses = []
for line in content.strip().split('\n'):
if line.strip():
try:
obj = json.loads(line)
responses.append(obj)
except json.JSONDecodeError as line_error:
logger.error(f"Failed to parse line: {line}")
logger.error(f"Error: {line_error}")
if not responses:
raise Exception("No valid JSON objects found in response")
response_json = responses[-1]
logger.info(f"Parsed {len(responses)} JSON objects, using last one: {response_json}")
logger.info(f"Final response: {response_json}")
self.record_success()
if AISBF_DEBUG:
logging.info(f"=== RAW OLLAMA RESPONSE ===")
logging.info(f"Raw response JSON: {response_json}")
logging.info(f"=== END RAW OLLAMA RESPONSE ===")
logger.info(f"=== OllamaProviderHandler.handle_request END ===")
openai_response = {
"id": f"ollama-{model}-{int(time.time())}",
"object": "chat.completion",
"created": int(time.time()),
"model": f"{self.provider_id}/{model}",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": response_json.get("response", "")
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": response_json.get("prompt_eval_count", 0),
"completion_tokens": response_json.get("eval_count", 0),
"total_tokens": response_json.get("prompt_eval_count", 0) + response_json.get("eval_count", 0)
}
}
if AISBF_DEBUG:
logging.info(f"=== FINAL OLLAMA RESPONSE DICT ===")
logging.info(f"Final response: {openai_response}")
logging.info(f"=== END FINAL OLLAMA RESPONSE DICT ===")
return openai_response
except Exception as e:
self.record_failure()
raise e
async def get_models(self) -> List[Model]:
await self.apply_rate_limit()
response = await self.client.get("/api/tags")
response.raise_for_status()
models = response.json().get('models', [])
return [Model(id=model, name=model, provider_id=self.provider_id) for model in models]
"""
Copyright (C) 2026 Stefy Lanza <stefy@nexlab.net>
AISBF - AI Service Broker Framework || AI Should Be Free
OpenAI provider handler.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Why did the programmer quit his job? Because he didn't get arrays!
"""
import time
from typing import Dict, List, Optional, Union
from openai import OpenAI
from ..models import Model
from ..config import config
from ..utils import count_messages_tokens
from .base import BaseProviderHandler, AISBF_DEBUG
class OpenAIProviderHandler(BaseProviderHandler):
def __init__(self, provider_id: str, api_key: str):
super().__init__(provider_id, api_key)
self.client = OpenAI(base_url=config.providers[provider_id].endpoint, api_key=api_key)
async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
if self.is_rate_limited():
raise Exception("Provider rate limited")
try:
import logging
logging.info(f"OpenAIProviderHandler: Handling request for model {model}")
if AISBF_DEBUG:
logging.info(f"OpenAIProviderHandler: Messages: {messages}")
else:
logging.info(f"OpenAIProviderHandler: Messages count: {len(messages)}")
if AISBF_DEBUG:
logging.info(f"OpenAIProviderHandler: Tools: {tools}")
logging.info(f"OpenAIProviderHandler: Tool choice: {tool_choice}")
# Apply rate limiting
await self.apply_rate_limit()
# Check if native caching is enabled for this provider
provider_config = config.providers.get(self.provider_id)
enable_native_caching = getattr(provider_config, 'enable_native_caching', False)
min_cacheable_tokens = getattr(provider_config, 'min_cacheable_tokens', 1024)
prompt_cache_key = getattr(provider_config, 'prompt_cache_key', None)
logging.info(f"OpenAIProviderHandler: Native caching enabled: {enable_native_caching}")
if enable_native_caching:
logging.info(f"OpenAIProviderHandler: Min cacheable tokens: {min_cacheable_tokens}, prompt_cache_key: {prompt_cache_key}")
# Build request parameters
request_params = {
"model": model,
"messages": [],
"temperature": temperature,
"stream": stream
}
# Only add max_tokens if it's not None
if max_tokens is not None:
request_params["max_tokens"] = max_tokens
# Add prompt_cache_key if provided (for OpenAI's load balancer routing optimization)
if enable_native_caching and prompt_cache_key:
request_params["prompt_cache_key"] = prompt_cache_key
logging.info(f"OpenAIProviderHandler: Added prompt_cache_key to request")
# Build messages with all fields (including tool_calls, tool_call_id, and cache_control)
if enable_native_caching:
# Count cumulative tokens for cache decision
cumulative_tokens = 0
for i, msg in enumerate(messages):
# Count tokens in this message
message_tokens = count_messages_tokens([msg], model)
cumulative_tokens += message_tokens
message = {"role": msg["role"]}
# For tool role, tool_call_id is required
if msg["role"] == "tool":
if "tool_call_id" in msg and msg["tool_call_id"] is not None:
message["tool_call_id"] = msg["tool_call_id"]
else:
# Skip tool messages without tool_call_id
logging.warning(f"Skipping tool message without tool_call_id: {msg}")
continue
if "content" in msg and msg["content"] is not None:
message["content"] = msg["content"]
if "tool_calls" in msg and msg["tool_calls"] is not None:
message["tool_calls"] = msg["tool_calls"]
if "name" in msg and msg["name"] is not None:
message["name"] = msg["name"]
# Apply cache_control based on position and token count
if (msg["role"] == "system" or
(i < len(messages) - 2 and cumulative_tokens >= min_cacheable_tokens)):
message["cache_control"] = {"type": "ephemeral"}
logging.info(f"OpenAIProviderHandler: Applied cache_control to message {i} ({message_tokens} tokens, cumulative: {cumulative_tokens})")
else:
logging.info(f"OpenAIProviderHandler: Not caching message {i} ({message_tokens} tokens, cumulative: {cumulative_tokens})")
request_params["messages"].append(message)
else:
# Standard message formatting without caching
for msg in messages:
message = {"role": msg["role"]}
# For tool role, tool_call_id is required
if msg["role"] == "tool":
if "tool_call_id" in msg and msg["tool_call_id"] is not None:
message["tool_call_id"] = msg["tool_call_id"]
else:
# Skip tool messages without tool_call_id
logging.warning(f"Skipping tool message without tool_call_id: {msg}")
continue
if "content" in msg and msg["content"] is not None:
message["content"] = msg["content"]
if "tool_calls" in msg and msg["tool_calls"] is not None:
message["tool_calls"] = msg["tool_calls"]
if "name" in msg and msg["name"] is not None:
message["name"] = msg["name"]
request_params["messages"].append(message)
# Add tools and tool_choice if provided
if tools is not None:
request_params["tools"] = tools
if tool_choice is not None:
request_params["tool_choice"] = tool_choice
response = self.client.chat.completions.create(**request_params)
logging.info(f"OpenAIProviderHandler: Response received: {response}")
self.record_success()
# Dump raw response if AISBF_DEBUG is enabled
if AISBF_DEBUG:
logging.info(f"=== RAW OPENAI RESPONSE ===")
logging.info(f"Raw response type: {type(response)}")
logging.info(f"Raw response: {response}")
logging.info(f"=== END RAW OPENAI RESPONSE ===")
# Return raw response without any parsing or modification
logging.info(f"OpenAIProviderHandler: Returning raw response without parsing")
return response
except Exception as e:
import logging
logging.error(f"OpenAIProviderHandler: Error: {str(e)}", exc_info=True)
self.record_failure()
raise e
async def get_models(self) -> List[Model]:
try:
import logging
logging.info("OpenAIProviderHandler: Getting models list")
# Apply rate limiting
await self.apply_rate_limit()
models = self.client.models.list()
logging.info(f"OpenAIProviderHandler: Models received: {models}")
result = []
for model in models:
# Extract context size if available - check multiple field names
context_size = None
if hasattr(model, 'context_window') and model.context_window:
context_size = model.context_window
elif hasattr(model, 'context_length') and model.context_length:
context_size = model.context_length
elif hasattr(model, 'max_context_length') and model.max_context_length:
context_size = model.max_context_length
# Extract pricing if available (OpenRouter-style)
pricing = None
if hasattr(model, 'pricing') and model.pricing:
pricing = model.pricing
elif hasattr(model, 'top_provider') and model.top_provider:
# Try to extract from top_provider
top_provider = model.top_provider
if hasattr(top_provider, 'dict'):
top_provider = top_provider.dict()
if isinstance(top_provider, dict):
# Check for pricing in top_provider
tp_pricing = top_provider.get('pricing')
if tp_pricing:
pricing = tp_pricing
result.append(Model(
id=model.id,
name=model.id,
provider_id=self.provider_id,
context_size=context_size,
context_length=context_size,
pricing=pricing
))
return result
except Exception as e:
import logging
logging.error(f"OpenAIProviderHandler: Error getting models: {str(e)}", exc_info=True)
raise e
......@@ -49,6 +49,7 @@ Documentation = "https://git.nexlab.net/nexlab/aisbf.git"
[tool.setuptools]
packages = ["aisbf", "aisbf.auth", "aisbf.providers", "aisbf.providers.kiro"]
# Note: Provider handler modules (base, google, openai, anthropic, claude, kilo, ollama) are in aisbf.providers package
py-modules = ["cli"]
[tool.setuptools.package-data]
......
......@@ -99,6 +99,13 @@ setup(
'aisbf/config.py',
'aisbf/models.py',
'aisbf/providers/__init__.py',
'aisbf/providers/base.py',
'aisbf/providers/google.py',
'aisbf/providers/openai.py',
'aisbf/providers/anthropic.py',
'aisbf/providers/claude.py',
'aisbf/providers/kilo.py',
'aisbf/providers/ollama.py',
'aisbf/handlers.py',
'aisbf/context.py',
'aisbf/utils.py',
......@@ -107,6 +114,8 @@ setup(
'aisbf/tor.py',
'aisbf/auth/__init__.py',
'aisbf/auth/kiro.py',
'aisbf/auth/claude.py',
'aisbf/auth/kilo.py',
'aisbf/providers/kiro/__init__.py',
'aisbf/providers/kiro/handler.py',
'aisbf/providers/kiro/converters.py',
......@@ -114,7 +123,6 @@ setup(
'aisbf/providers/kiro/models.py',
'aisbf/providers/kiro/parsers.py',
'aisbf/providers/kiro/utils.py',
'aisbf/claude_auth.py',
'aisbf/semantic_classifier.py',
'aisbf/batching.py',
'aisbf/cache.py',
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment