Commit fbb49301 authored by Your Name

Release v0.9.2: Documentation updates and version bump

- Updated README.md with comprehensive documentation for new features:
  * User-Specific API Endpoints with Bearer token authentication
  * Adaptive Rate Limiting with learning from 429 responses
  * Model Metadata Extraction with automatic pricing/rate limit detection
  * Enhanced Analytics Filtering by provider/model/rotation
  * Updated Web Dashboard feature list

- Updated DOCUMENTATION.md with detailed sections:
  * Adaptive Rate Limiting configuration and benefits
  * Model Metadata Extraction features and dashboard integration

- Updated CHANGELOG.md:
  * Moved Unreleased section to version 0.9.2 (2026-04-03)
  * Added comprehensive list of new features and changes

- Version bump to 0.9.2:
  * Updated pyproject.toml version
  * Updated aisbf/__init__.py version

This release focuses on improving documentation coverage for recently
added features including user-specific API endpoints, adaptive rate
limiting, model metadata extraction, and analytics filtering.
parent 252d45e4
......@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
## [0.9.2] - 2026-04-03
### Added
- **User-Specific API Endpoints**: New API endpoints for authenticated users to access their own configurations
- `GET /api/user/models` - List user's own models
......@@ -56,6 +58,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Per-provider reset functionality and reset-all button
- Configurable via aisbf.json with learning_rate, headroom_percent, recovery_rate, etc.
- Integration with BaseProviderHandler.apply_rate_limit() and handle_429_error()
### Changed
- **Documentation Updates**: Updated README.md and DOCUMENTATION.md with comprehensive coverage of new features
- Enhanced User-Specific API Endpoints documentation
- Added Adaptive Rate Limiting configuration guide
- Updated Web Dashboard feature list
- Added Model Metadata Extraction details
- Improved Analytics Filtering documentation
## [0.9.1] - 2026-03-XX
- **Token Usage Analytics**: Comprehensive analytics dashboard for tracking token usage, costs, and performance
- Analytics module (`aisbf/analytics.py`) with token usage tracking, cost estimation, and optimization recommendations
- Dashboard page with charts for token usage over time (1h, 6h, 24h, 7d)
......
......@@ -630,6 +630,30 @@ User tokens authenticate MCP requests, with admin users getting full access and
AISBF supports the following AI providers:
### Model Metadata Extraction
AISBF automatically extracts and tracks model metadata from provider responses:
**Automatic Extraction:**
- **Pricing Information**: `rate_multiplier`, `rate_unit` (e.g., "per million tokens")
- **Token Usage**: `prompt_tokens`, `completion_tokens` from API responses
- **Rate Limits**: Auto-configures rate limits from 429 responses with retry-after headers
- **Model Details**: `description`, `context_length`, `architecture`, `supported_parameters`
**Dashboard Features:**
- **"Get Models" Button**: Fetches and displays comprehensive model metadata
- **Real-time Display**: Shows pricing, rate limits, and capabilities for each model
- **Extended Fields**: OpenRouter-style metadata including top_provider, pricing details, and architecture
**Configuration:**
Model metadata is automatically extracted from provider responses and stored in the database. No manual configuration required.
**Benefits:**
- Automatic rate limit configuration from provider responses
- Cost estimation based on actual pricing data
- Better model selection with detailed capability information
- Reduced manual configuration overhead
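For illustration, an extracted metadata record might look like the following. Field names follow the list above; the exact schema AISBF stores may differ:

```json
{
  "model": "some-provider/example-model",
  "description": "Example chat model",
  "context_length": 128000,
  "architecture": {"modality": "text"},
  "pricing": {"rate_multiplier": 0.5, "rate_unit": "per million tokens"},
  "rate_limit": {"requests_per_minute": 60, "learned_from": "429 retry-after"},
  "supported_parameters": ["temperature", "top_p", "max_tokens"]
}
```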
### Google
- Uses google-genai SDK
- Requires API key
......@@ -1179,6 +1203,58 @@ In this example:
```
### Rate Limiting
#### Adaptive Rate Limiting
AISBF includes intelligent rate limit management that learns from provider 429 responses and automatically adjusts request rates:
**Features:**
- **Learning from 429 Responses**: Automatically detects rate limits from provider error responses
- **Exponential Backoff with Jitter**: Configurable backoff strategy to avoid thundering herd
- **Rate Limit Headroom**: Stays 10% below learned limits to prevent hitting rate limits
- **Gradual Recovery**: Slowly increases rate after consecutive successful requests
- **Per-Provider Tracking**: Independent rate limiters for each provider
- **Dashboard Monitoring**: Real-time view of current limits, 429 counts, success rates, and recovery progress
**Configuration:**
Via Dashboard:
1. Navigate to Dashboard → Rate Limits
2. View current rate limits and statistics for each provider
3. Reset individual provider limits or reset all
4. Monitor 429 response patterns and success rates
Via Configuration File (`~/.aisbf/aisbf.json`):
```json
{
"adaptive_rate_limiting": {
"enabled": true,
"learning_rate": 0.1,
"headroom_percent": 10,
"recovery_rate": 0.05,
"base_backoff": 1.0,
"jitter_factor": 0.1,
"history_window": 100
}
}
```
**Configuration Fields:**
- `enabled`: Enable adaptive rate limiting (default: true)
- `learning_rate`: How quickly to adjust limits (0.0-1.0, default: 0.1)
- `headroom_percent`: Safety margin below learned limit (default: 10%)
- `recovery_rate`: Rate of limit increase after successes (default: 0.05)
- `base_backoff`: Base backoff time in seconds (default: 1.0)
- `jitter_factor`: Random jitter to prevent synchronized retries (default: 0.1)
- `history_window`: Number of recent requests to track (default: 100)
**Benefits:**
- Automatic optimization without manual rate limit configuration
- Reduced 429 errors by learning optimal request rates
- Better resource utilization by maximizing throughput while respecting limits
- Provider-specific tracking for independent rate limit management
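The learning behavior described above can be sketched as follows. This is a hypothetical illustration of how `learning_rate`, `headroom_percent`, and `recovery_rate` could interact, not AISBF's actual implementation:

```python
class AdaptiveLimiter:
    """Illustrative sketch of an adaptive per-provider limit (not AISBF's code)."""

    def __init__(self, learning_rate=0.1, headroom_percent=10, recovery_rate=0.05):
        self.learning_rate = learning_rate
        self.headroom = 1 - headroom_percent / 100  # stay below the learned limit
        self.recovery_rate = recovery_rate
        self.limit_rpm = 60.0  # current learned requests-per-minute limit

    def on_429(self, observed_rpm):
        # Pull the learned limit toward the rate that triggered the 429,
        # then apply headroom so requests stay safely below it.
        self.limit_rpm += self.learning_rate * (observed_rpm - self.limit_rpm)
        self.limit_rpm *= self.headroom

    def on_success(self):
        # Gradual recovery: slowly raise the limit after successes.
        self.limit_rpm *= 1 + self.recovery_rate
```

With the defaults above, a 429 observed at 50 requests/minute lowers the learned limit from 60 to roughly 53, and each subsequent success raises it by 5%.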
#### Traditional Rate Limiting
- Automatic provider disabling when rate limited
- Intelligent parsing of 429 responses to determine wait time
- Graceful error handling
......
......@@ -8,14 +8,17 @@ A modular proxy server for managing multiple AI provider integrations with unifi
AISBF includes a comprehensive web-based dashboard for easy configuration and management:
- **Provider Management**: Configure API keys, endpoints, and model settings
- **Provider Management**: Configure API keys, endpoints, and model settings with automatic metadata extraction
- **Rotation Configuration**: Set up weighted load balancing across providers
- **Autoselect Configuration**: Configure AI-powered model selection
- **Server Settings**: Manage SSL/TLS, authentication, and TOR hidden service
- **User Management**: Create/manage users with role-based access control (admin users only)
- **Multi-User Support**: Isolated configurations per user with API token management
- **Real-time Monitoring**: View provider status and configuration
- **Token Usage Analytics**: Track token usage, costs, and performance with charts and export functionality
- **Token Usage Analytics**: Track token usage, costs, and performance with charts, filtering by provider/model/rotation, and export functionality
- **Rate Limits Dashboard**: Monitor adaptive rate limiting with real-time statistics, 429 counts, success rates, and recovery progress
- **Model Metadata Display**: View detailed model information including pricing, rate limits, and supported parameters
- **Cache Management**: View cache statistics and clear cache via dashboard endpoints
Access the dashboard at `http://localhost:17765/dashboard` (default credentials: admin/admin)
......@@ -29,7 +32,7 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
- **Content Classification**: NSFW/privacy content filtering with configurable classification windows
- **Streaming Support**: Full support for streaming responses from all providers
- **Error Tracking**: Automatic provider disabling after consecutive failures with configurable cooldown periods
- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff and gradual recovery
- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff, gradual recovery, and dashboard monitoring
- **Rate Limiting**: Built-in rate limiting and graceful error handling
- **Request Splitting**: Automatic splitting of large requests when exceeding `max_request_tokens` limit
- **Token Rate Limiting**: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
......@@ -48,14 +51,15 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
- Dashboard endpoints for cache management
- **Smart Request Batching**: 15-25% latency reduction by batching similar requests within 100ms window with provider-specific configurations
- **Streaming Response Optimization**: 10-20% memory reduction with chunk pooling, backpressure handling, and provider-specific streaming optimizations for Google and Kiro providers
- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, and export functionality
- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, filtering by provider/model/rotation, and export functionality
- **Model Metadata Extraction**: Automatic extraction of pricing, rate limits, and model information from provider responses with dashboard display
- **SSL/TLS Support**: Built-in HTTPS support with Let's Encrypt integration and automatic certificate renewal
- **Self-Signed Certificates**: Automatic generation of self-signed certificates for development/testing
- **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service (ephemeral and persistent)
- **MCP Server**: Model Context Protocol server for remote agent configuration and model access (SSE and HTTP streaming)
- **Persistent Database**: SQLite/MySQL-based tracking of token usage, context dimensions, and model embeddings with automatic cleanup
- **Multi-User Support**: User management with isolated configurations, role-based access control, and API token management
- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations
- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations with Bearer token authentication
- **Database Integration**: SQLite/MySQL-based persistent storage for user configurations, token usage tracking, and context management
- **User-Specific Configurations**: Each user can have their own providers, rotations, and autoselect configurations stored in the database
- **Flexible Caching System**: Multi-backend caching for model embeddings and performance optimization
......@@ -613,6 +617,57 @@ Users can create and manage their own:
- **User Dashboard**: Personal configuration management and usage statistics
- **API Token Management**: Create, view, and delete API tokens with usage analytics
### Adaptive Rate Limiting
AISBF includes intelligent rate limit management that learns from provider 429 responses and automatically adjusts request rates:
#### Features
- **Learning from 429 Responses**: Automatically detects rate limits from provider error responses
- **Exponential Backoff with Jitter**: Configurable backoff strategy to avoid thundering herd
- **Rate Limit Headroom**: Stays 10% below learned limits to prevent hitting rate limits
- **Gradual Recovery**: Slowly increases rate after consecutive successful requests
- **Per-Provider Tracking**: Independent rate limiters for each provider
- **Dashboard Monitoring**: Real-time view of current limits, 429 counts, success rates, and recovery progress
#### Configuration
**Via Dashboard:**
1. Navigate to Dashboard → Rate Limits
2. View current rate limits and statistics for each provider
3. Reset individual provider limits or reset all
4. Monitor 429 response patterns and success rates
**Via Configuration File:**
Edit `~/.aisbf/aisbf.json`:
```json
{
"adaptive_rate_limiting": {
"enabled": true,
"learning_rate": 0.1,
"headroom_percent": 10,
"recovery_rate": 0.05,
"base_backoff": 1.0,
"jitter_factor": 0.1,
"history_window": 100
}
}
```
#### Configuration Fields
- **`enabled`**: Enable adaptive rate limiting (default: true)
- **`learning_rate`**: How quickly to adjust limits (0.0-1.0, default: 0.1)
- **`headroom_percent`**: Safety margin below learned limit (default: 10%)
- **`recovery_rate`**: Rate of limit increase after successes (default: 0.05)
- **`base_backoff`**: Base backoff time in seconds (default: 1.0)
- **`jitter_factor`**: Random jitter to prevent synchronized retries (default: 0.1)
- **`history_window`**: Number of recent requests to track (default: 100)
#### Benefits
- **Automatic Optimization**: No manual rate limit configuration needed
- **Reduced 429 Errors**: Learns optimal request rates for each provider
- **Better Resource Utilization**: Maximizes throughput while respecting limits
- **Provider-Specific**: Each provider has independent rate limit tracking
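The `base_backoff` and `jitter_factor` fields combine in a standard exponential-backoff-with-jitter scheme. A minimal sketch (the helper name and the 30-second cap are assumptions for illustration):

```python
import random

def backoff_delay(attempt, base=1.0, jitter_factor=0.1, cap=30.0):
    """Exponential backoff with proportional jitter, capped (illustrative)."""
    delay = base * (2 ** attempt)                      # 1s, 2s, 4s, 8s, ...
    jitter = delay * jitter_factor * random.random()   # de-synchronize retries
    return min(delay + jitter, cap)
```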
### Content Classification and Semantic Selection
AISBF provides advanced content filtering and intelligent model selection based on content analysis:
......@@ -785,12 +840,23 @@ Authorization: Bearer YOUR_API_TOKEN
| `POST /api/user/chat/completions` | Chat completions using user's own models |
| `GET /api/user/{config_type}/models` | List models for specific config type (provider, rotation, autoselect) |
#### Access Control
**Admin Users** have access to both global and user configurations when using user API endpoints.
**Regular Users** can only access their own configurations.
**Global Tokens** (configured in aisbf.json) have full access to all configurations.
#### Token Management
Users can create and manage API tokens through the dashboard:
1. Navigate to Dashboard → User Dashboard → API Tokens
2. Click "Generate New Token" to create a token
3. Copy the token immediately (it won't be shown again)
4. Use the token in API requests via Bearer authentication
5. View token usage statistics and delete tokens as needed
#### Example: Using User API with cURL
```bash
......@@ -802,8 +868,21 @@ curl -X POST -H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model": "your-rotation/model", "messages": [{"role": "user", "content": "Hello"}]}' \
http://localhost:17765/api/user/chat/completions
# List user's providers
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/providers
# List user's rotations
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/rotations
```
#### MCP Integration
User tokens also work with MCP (Model Context Protocol) endpoints:
- Admin users get access to both global and user-specific MCP tools
- Regular users get access to user-only MCP tools
- Tools include model access, configuration management, and usage statistics
### MCP (Model Context Protocol)
AISBF provides an MCP server for remote agent configuration and model access:
......
......@@ -46,7 +46,7 @@ from .providers import (
from .handlers import RequestHandler, RotationHandler, AutoselectHandler
from .utils import count_messages_tokens, split_messages_into_chunks, get_max_request_tokens_for_model
__version__ = "0.3.3"
__version__ = "0.9.2"
__all__ = [
# Config
"config",
......
......@@ -77,7 +77,7 @@ def _generate_client_id():
# Claude OAuth2 Configuration
# These values match the official claude-cli implementation
CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e" # Official Claude Code client ID
AUTH_URL = "https://claude.ai/oauth/authorize" # Authorization endpoint
AUTH_URL = "https://claude.com/cai/oauth/authorize" # Authorization endpoint (note: /cai path is required)
TOKEN_URL = "https://api.anthropic.com/v1/oauth/token" # Token exchange endpoint
REDIRECT_URI = "http://localhost:54545/callback" # OAuth2 callback URI
CLI_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
......@@ -141,6 +141,9 @@ class ClaudeAuth:
"""Save credentials to file with file locking to prevent race conditions."""
try:
self.tokens = data
# Store id_token if received (contains account info)
if 'id_token' in data:
self.tokens['id_token'] = data['id_token']
# Add local expiry timestamp for easier checking
self.tokens['expires_at'] = time.time() + data.get('expires_in', 3600)
......@@ -281,14 +284,24 @@ class ClaudeAuth:
logger.error(f"Token refresh failed after {max_retries} attempts")
return False
def get_valid_token(self) -> str:
def get_valid_token(self, auto_login: bool = False) -> str:
"""
Get a valid access token, refreshing it if necessary.
Args:
auto_login: If True, automatically trigger login flow when no credentials exist.
If False, raise an exception instead (default: False for security).
Returns:
Valid access token
Raises:
Exception: If no credentials exist and auto_login is False
"""
if not self.tokens:
if not auto_login:
logger.error("No Claude credentials available. Please authenticate via dashboard or MCP.")
raise Exception("Claude authentication required. Please authenticate via /dashboard/claude/auth/start or MCP tool.")
logger.info("No tokens available, starting login flow")
self.login()
......@@ -296,10 +309,51 @@ class ClaudeAuth:
if time.time() > (self.tokens.get('expires_at', 0) - 300):
logger.info("Token expiring soon, refreshing...")
if not self.refresh_token():
if not auto_login:
logger.error("Token refresh failed and auto_login is disabled")
raise Exception("Claude token refresh failed. Please re-authenticate via /dashboard/claude/auth/start or MCP tool.")
logger.warning("Refresh failed, re-authenticating...")
self.login()
return self.tokens['access_token']
def get_account_id(self) -> Optional[str]:
"""
Get account_id from OAuth2 credentials.
Returns:
Account ID if available, None otherwise
"""
if not self.tokens:
return None
# First check for account.uuid in token response (Claude OAuth2 format)
account = self.tokens.get('account')
if account and isinstance(account, dict):
account_uuid = account.get('uuid')
if account_uuid:
return account_uuid
# Then try to get from id_token (JWT claim)
id_token = self.tokens.get('id_token')
if id_token:
try:
import base64
import json
# Decode JWT payload (second part of JWT)
parts = id_token.split('.')
if len(parts) >= 2:
# Add padding if needed
payload = parts[1] + '=' * (-len(parts[1]) % 4)
decoded = base64.urlsafe_b64decode(payload)
claims = json.loads(decoded)
# Try sub claim first, then account_id
return claims.get('sub') or claims.get('account_id')
except Exception:
pass
# Fall back to direct account_id field in token response
return self.tokens.get('account_id') or self.tokens.get('account_uuid')
def login(self, use_local_server=True):
"""
......@@ -333,7 +387,7 @@ class ClaudeAuth:
"client_id": CLIENT_ID,
"response_type": "code",
"redirect_uri": REDIRECT_URI,
"scope": "org:create_api_key user:profile user:inference",
"scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
"code_challenge": challenge,
"code_challenge_method": "S256",
"state": state
......@@ -442,7 +496,7 @@ class ClaudeAuth:
"client_id": CLIENT_ID,
"response_type": "code",
"redirect_uri": REDIRECT_URI,
"scope": "org:create_api_key user:profile user:inference",
"scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
"code_challenge": challenge,
"code_challenge_method": "S256",
"state": state
......
# Claude Provider: Improvements & SDK Migration Analysis
**Date:** 2026-04-01
**Author:** AI Assistant
---
## Executive Summary
This document analyzes potential improvements for the AISBF Claude provider and evaluates the trade-offs of migrating from direct HTTP (`httpx`) to the official Anthropic Python SDK.
---
## 1. Current Architecture Assessment
### What We Do Well:
- **Direct HTTP control**: Full control over request/response lifecycle
- **OAuth2 integration**: Custom auth flow matching Claude Code's OAuth2
- **Streaming SSE parsing**: Manual SSE parsing gives fine-grained control
- **OpenAI format conversion**: Complete OpenAI ↔ Anthropic translation
- **Fallback retry logic**: Model fallback with exponential backoff
### Current Limitations:
- Manual message format conversion (error-prone)
- No automatic retry on transient errors
- Missing advanced SDK features (automatic token counting, etc.)
- Temperature/thinking conflict handling (just fixed)
---
## 2. Recommended Improvements (Without SDK Migration)
### 2.1 Message Validation Pipeline
**Priority:** HIGH
**Effort:** MEDIUM
Implement a comprehensive message validation pipeline similar to vendors/kilocode:
```python
def validate_and_normalize_messages(self, messages: List[Dict]) -> List[Dict]:
"""Complete message validation pipeline."""
# 1. Empty content filtering
messages = self._filter_empty_content_blocks(messages)
# 2. Tool call ID sanitization
messages = self._sanitize_tool_call_ids(messages)
# 3. Role alternation enforcement
messages = self._ensure_alternating_roles(messages)
# 4. Tool result pairing
messages = self._ensure_tool_result_pairing(messages)
# 5. Thinking block preservation
messages = self._preserve_thinking_blocks(messages)
# 6. Media limit enforcement (100 items max)
messages = self._enforce_media_limits(messages)
return messages
```
**Benefits:**
- Prevents 400 errors from malformed messages
- Matches vendors/kilocode robustness
- Reduces API rejection rate
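As a concrete sketch of the first step, a standalone version of empty-content filtering might look like this (the helper name and message shapes are illustrative, not the pipeline's actual code):

```python
from typing import Dict, List

def filter_empty_content_blocks(messages: List[Dict]) -> List[Dict]:
    """Drop empty messages and empty text blocks (illustrative sketch)."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            if not content.strip():
                continue  # whole message is empty text
        elif isinstance(content, list):
            # Keep non-text blocks; drop text blocks with blank text
            content = [
                b for b in content
                if b.get("type") != "text" or b.get("text", "").strip()
            ]
            if not content:
                continue  # nothing left after filtering
            msg = {**msg, "content": content}
        cleaned.append(msg)
    return cleaned
```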
### 2.2 Automatic Retry with Exponential Backoff
**Priority:** HIGH
**Effort:** LOW
Add automatic retry for transient errors (529, 503, rate limits):
```python
async def _request_with_retry(self, api_url, payload, headers, max_retries=3):
    """Request with automatic retry and exponential backoff."""
    response = None
    for attempt in range(max_retries):
        try:
            response = await self.client.post(api_url, headers=headers, json=payload)
            if response.status_code == 429:
                # Honor the provider's Retry-After before retrying
                wait_time = self._parse_retry_after(response.headers)
                await asyncio.sleep(wait_time)
                continue
            if response.status_code in (529, 503):
                # Exponential backoff with jitter, capped at 30s
                wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
                await asyncio.sleep(wait_time)
                continue
            return response
        except httpx.TimeoutException:
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)
                continue
            raise
    # Retries exhausted: surface the last error response to the caller
    return response
```
**Benefits:**
- Handles transient overload errors automatically
- Respects `x-should-retry: true` header
- Reduces user-facing errors
### 2.3 Temperature/Thinking Conflict Resolution
**Priority:** HIGH (ALREADY FIXED)
**Effort:** DONE
Fixed in commit 2559e2f - skip temperature 0.0 when thinking beta is active.
### 2.4 Streaming Idle Watchdog
**Priority:** MEDIUM
**Effort:** LOW
Add timeout detection for hung streams (matching vendors/claude):
```python
STREAM_IDLE_TIMEOUT = 90.0  # seconds

async def _stream_with_watchdog(self, response):
    """Stream with idle timeout detection."""
    # aiter_lines() blocks while waiting for data, so the timeout must
    # bound each read via wait_for rather than be checked after a line
    # arrives (a hung stream would never reach that check).
    lines = response.aiter_lines()
    while True:
        try:
            line = await asyncio.wait_for(lines.__anext__(), timeout=STREAM_IDLE_TIMEOUT)
        except StopAsyncIteration:
            break
        except asyncio.TimeoutError:
            raise TimeoutError(f"Stream idle for {STREAM_IDLE_TIMEOUT}s")
        yield line
```
**Benefits:**
- Detects hung connections quickly
- Prevents indefinite hangs
- Matches vendors/claude behavior
### 2.5 Token Counting and Context Management
**Priority:** MEDIUM
**Effort:** MEDIUM
Add automatic token counting for context window management:
```python
def _count_tokens(self, messages: List[Dict], model: str) -> int:
    """Estimate tokens for context window management (rough heuristic)."""
    # ~4 chars/token for English text; swap in tiktoken or Anthropic's
    # token counting endpoint for exact counts.
    total = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(content)
        elif isinstance(content, list):
            total += sum(len(b.get("text", "")) for b in content if isinstance(b, dict))
    return total // 4
```
**Benefits:**
- Prevents context window exceeded errors
- Enables automatic compaction decisions
- Better resource management
### 2.6 Cache Token Tracking
**Priority:** LOW
**Effort:** LOW
Track cache hit/miss rates for analytics:
```python
def _track_cache_usage(self, usage: Dict):
"""Track prompt cache usage for analytics."""
cache_read = usage.get('cache_read_input_tokens', 0)
cache_creation = usage.get('cache_creation_input_tokens', 0)
if cache_read > 0:
self.cache_hits += 1
self.cache_tokens_read += cache_read
if cache_creation > 0:
self.cache_misses += 1
self.cache_tokens_created += cache_creation
```
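A small helper can then report the hit rate from those counters (illustrative, assuming the counters are initialized to zero in `__init__`):

```python
def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """Fraction of cache-eligible requests served from the prompt cache."""
    total = cache_hits + cache_misses
    return cache_hits / total if total else 0.0
```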
---
## 3. SDK Migration Analysis
### 3.1 Official Anthropic Python SDK
**Package:** `anthropic` (already in requirements.txt)
**Current Usage:** Only for `AnthropicProviderHandler`, not for `ClaudeProviderHandler`
#### Pros of SDK Migration:
1. **Automatic Message Validation**
- SDK validates messages before sending
- Catches format errors early
- Reduces 400 errors
2. **Built-in Retry Logic**
- SDK has automatic retry for transient errors
- Configurable retry strategies
- Handles rate limits gracefully
3. **Token Counting**
- SDK can count tokens automatically
- No need for external token counting
- Accurate token usage tracking
4. **Streaming Abstraction**
- SDK handles SSE parsing internally
- Cleaner streaming code
- Automatic event type handling
5. **Type Safety**
- Pydantic models for all request/response types
- Better IDE support
- Compile-time error detection
6. **Future-Proof**
- SDK updates with new API features
- Less maintenance burden
- Official support from Anthropic
#### Cons of SDK Migration:
1. **OAuth2 Token Handling**
- SDK expects API keys, not OAuth2 tokens
- May need custom auth implementation
- Current direct HTTP works well with OAuth2
2. **Loss of Fine-Grained Control**
- SDK abstracts away some control
- Custom headers may be harder to set
- Beta header management through SDK
3. **Dependency on SDK Version**
- SDK updates may break compatibility
- Need to track SDK releases
- Potential breaking changes
4. **Streaming Differences**
- SDK streaming uses different abstraction
- May need to rewrite streaming logic
- Current SSE parsing works well
### 3.2 Hybrid Approach (Recommended If Migrating)
Use SDK for non-streaming requests, keep direct HTTP for streaming:
```python
class ClaudeProviderHandler(BaseProviderHandler):
def __init__(self, ...):
# SDK client for non-streaming
self.sdk_client = Anthropic(
api_key=self._get_oauth_token(),
base_url="https://api.anthropic.com"
)
# HTTP client for streaming
self.http_client = httpx.AsyncClient(...)
async def handle_request(self, ..., stream=False):
if stream:
return await self._handle_streaming_http(...)
else:
return await self._handle_non_streaming_sdk(...)
```
**Benefits:**
- Best of both worlds
- SDK validation for non-streaming
- Full control for streaming
- Gradual migration path
---
## 4. Implementation Priority
### Phase 1: Quick Wins (1-2 days)
1. ✅ Temperature/thinking conflict fix (DONE)
2. Automatic retry with exponential backoff
3. Streaming idle watchdog
### Phase 2: Robustness (3-5 days)
4. Message validation pipeline
5. Token counting and context management
6. Cache token tracking
### Phase 3: SDK Evaluation (1-2 weeks)
7. Prototype SDK integration for non-streaming
8. Compare error rates and performance
9. Decide on full migration or hybrid approach
---
## 5. Recommendation
**Do NOT migrate to SDK immediately.** Instead:
1. **Implement the quick wins first** - These provide immediate value with minimal effort
2. **Build the message validation pipeline** - This addresses the most common error source
3. **Evaluate SDK after Phase 2** - Once our implementation is robust, evaluate if SDK adds value
**Rationale:**
- Our direct HTTP approach gives us full control over OAuth2
- We've already implemented most SDK features manually
- SDK migration would be a significant rewrite with uncertain benefits
- The hybrid approach adds complexity without clear advantages
**When to reconsider SDK:**
- If Anthropic adds features we can't easily implement manually
- If SDK becomes the only way to access new API features
- If maintenance burden of manual implementation becomes too high
---
## 6. Comparison: Our Implementation vs SDK
| Feature | Our Implementation | SDK | Gap |
|---------|-------------------|-----|-----|
| Message Validation | Manual (Phase 2) | Automatic | Medium |
| Retry Logic | Manual fallback | Built-in | Low |
| Token Counting | External | Built-in | Medium |
| Streaming | Manual SSE | SDK abstraction | Low |
| OAuth2 Support | Custom | Requires workaround | High |
| Type Safety | Dict-based | Pydantic models | Medium |
| Beta Headers | Manual | SDK config | Low |
| Error Handling | Custom | SDK exceptions | Low |
**Overall Assessment:** Our implementation is roughly 80% as robust as the SDK, with better OAuth2 support. The remaining 20% can be achieved with the recommended improvements, without migrating to the SDK.
......@@ -5699,12 +5699,13 @@ async def dashboard_claude_auth_start(request: Request):
# Build OAuth2 URL (Claude requires full scope set)
auth_params = {
"code": "true",
"client_id": auth.CLIENT_ID,
"response_type": "code",
"code_challenge": challenge,
"code_challenge_method": "S256",
"redirect_uri": auth.REDIRECT_URI,
"scope": "user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
"scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
"state": state
}
auth_url = f"{auth.AUTH_URL}?{'&'.join(f'{k}={v}' for k, v in auth_params.items())}"
......
......@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "aisbf"
version = "0.9.1"
version = "0.9.2"
description = "AISBF - AI Service Broker Framework || AI Should Be Free - A modular proxy server for managing multiple AI provider integrations"
readme = "README.md"
license = "GPL-3.0-or-later"
......