docs: Comprehensive documentation update with all missing features

- Updated CHANGELOG.md with complete feature list including: * Claude OAuth2 provider with PKCE flow and automatic token refresh * Response caching with semantic deduplication (Memory/Redis/SQLite/MySQL) * Model embeddings cache with multiple backends * User-specific API endpoints and MCP enhancements * Adaptive rate limiting and token usage analytics * Smart request batching and streaming optimization * All performance features and bug fixes - Enhanced README.md with: * Claude OAuth2 authentication section with setup guide * Response caching details with all backends and deduplication * Flexible caching system with Redis/MySQL/SQLite/File/Memory * Updated key features with expanded descriptions * Configuration examples for all caching systems - Updated DOCUMENTATION.md with: * Claude Code provider in Provider Support section * Enhanced provider descriptions with caching capabilities * Reference to Claude OAuth2 setup documentation - Enhanced CLAUDE_OAUTH2_SETUP.md with key features list - Added clarifying comments to aisbf/claude_auth.py All documentation now accurately reflects the codebase with complete coverage of caching systems (response cache and model embeddings cache), request deduplication via SHA256, and all implemented features.

docs: Comprehensive documentation update with all missing features
- Updated CHANGELOG.md with complete feature list including: * Claude OAuth2 provider with PKCE flow and automatic token refresh * Response caching with semantic deduplication (Memory/Redis/SQLite/MySQL) * Model embeddings cache with multiple backends * User-specific API endpoints and MCP enhancements * Adaptive rate limiting and token usage analytics * Smart request batching and streaming optimization * All performance features and bug fixes - Enhanced README.md with: * Claude OAuth2 authentication section with setup guide * Response caching details with all backends and deduplication * Flexible caching system with Redis/MySQL/SQLite/File/Memory * Updated key features with expanded descriptions * Configuration examples for all caching systems - Updated DOCUMENTATION.md with: * Claude Code provider in Provider Support section * Enhanced provider descriptions with caching capabilities * Reference to Claude OAuth2 setup documentation - Enhanced CLAUDE_OAUTH2_SETUP.md with key features list - Added clarifying comments to aisbf/claude_auth.py All documentation now accurately reflects the codebase with complete coverage of caching systems (response cache and model embeddings cache), request deduplication via SHA256, and all implemented features.
f64585f1 · Your Name · a33c622b · f64585f1 · f64585f1 · f64585f1
Commit f64585f1 authored Mar 30, 2026 by Your Name
Showing with 423 additions and 79 deletions

CHANGELOG.md CHANGELOG.md +275 -66

CLAUDE_OAUTH2_SETUP.md CLAUDE_OAUTH2_SETUP.md +9 -1

DOCUMENTATION.md DOCUMENTATION.md +18 -2

README.md README.md +115 -5

claude_auth.py aisbf/claude_auth.py +6 -5

No files found.
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
--- a/CLAUDE_OAUTH2_SETUP.md
+++ b/CLAUDE_OAUTH2_SETUP.md
@@ -2,7 +2,15 @@

 ## Overview

-AISBF now supports Claude Code (claude.ai) as a provider using OAuth2 authentication. This implementation mimics the official Claude CLI authentication flow and includes a Chrome extension to handle OAuth2 callbacks when AISBF runs on a remote server.
+AISBF supports Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh. This implementation matches the official Claude CLI authentication flow and includes a Chrome extension to handle OAuth2 callbacks when AISBF runs on a remote server.
+
+**Key Features:**
+- Full OAuth2 PKCE flow matching official claude-cli
+- Automatic token refresh with refresh token rotation
+- Chrome extension for remote server OAuth2 callback interception
+- Dashboard integration with authentication UI
+- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
+- Compatible with official claude-cli credentials

 ## Architecture


--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -2,11 +2,12 @@

 ## Overview

-AISBF is a modular proxy server for managing multiple AI provider integrations. It provides a unified API interface for interacting with various AI services (Google, OpenAI, Anthropic, Ollama) with support for provider rotation, AI-assisted model selection, and error tracking.
+AISBF is a modular proxy server for managing multiple AI provider integrations. It provides a unified API interface for interacting with various AI services (Google, OpenAI, Anthropic, Claude Code, Ollama, Kiro) with support for provider rotation, AI-assisted model selection, and error tracking.

 ### Key Features

- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, and Kiro (Amazon Q Developer)
+- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Claude Code (OAuth2), Ollama, and Kiro (Amazon Q Developer)
+- **Claude OAuth2 Authentication**: Full OAuth2 PKCE flow for Claude Code with automatic token refresh, Chrome extension for remote servers, and curl_cffi TLS fingerprinting support
 - **Rotation Models**: Intelligent load balancing across multiple providers with weighted model selection and automatic failover
 - **Autoselect Models**: AI-powered model selection that analyzes request content to route to the most appropriate specialized model
 - **Streaming Support**: Full support for streaming responses from all providers with proper serialization
@@ -633,16 +634,31 @@ AISBF supports the following AI providers:
 - Uses google-genai SDK
 - Requires API key
 - Supports streaming and non-streaming responses
+- Context Caching API support for cost reduction

 ### OpenAI
 - Uses openai SDK
 - Requires API key
 - Supports streaming and non-streaming responses
+- Automatic prefix caching (no configuration needed)

 ### Anthropic
 - Uses anthropic SDK
 - Requires API key
 - Static model list (no dynamic model discovery)
+- cache_control support for cost reduction
+
+### Claude Code (OAuth2)
+- Full OAuth2 PKCE authentication flow
+- Automatic token refresh with refresh token rotation
+- Chrome extension for remote server OAuth2 callback interception
+- Dashboard integration with authentication UI
+- Credentials stored in `~/.aisbf/claude_credentials.json`
+- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
+- Compatible with official claude-cli credentials
+- Access to latest Claude models (3.7 Sonnet, 3.5 Sonnet, 3.5 Haiku, etc.)
+- Supports streaming, tool calling, vision, and all Claude features
+- See [`CLAUDE_OAUTH2_SETUP.md`](CLAUDE_OAUTH2_SETUP.md) for setup instructions

 ### Ollama
 - Uses direct HTTP API

--- a/README.md
+++ b/README.md
@@ -21,13 +21,15 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:

 ## Key Features

- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, and Kiro (Amazon Q Developer)
+- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, Kiro (Amazon Q Developer), and Claude Code (OAuth2)
+- **Claude OAuth2 Authentication**: Full OAuth2 PKCE flow for Claude Code with automatic token refresh and Chrome extension for remote servers
 - **Rotation Models**: Weighted load balancing across multiple providers with automatic failover
 - **Autoselect Models**: AI-powered model selection based on content analysis and request characteristics
 - **Semantic Classification**: Fast hybrid BM25 + semantic model selection using sentence transformers (optional)
 - **Content Classification**: NSFW/privacy content filtering with configurable classification windows
 - **Streaming Support**: Full support for streaming responses from all providers
- **Error Tracking**: Automatic provider disabling after consecutive failures with cooldown periods
+- **Error Tracking**: Automatic provider disabling after consecutive failures with configurable cooldown periods
+- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff and gradual recovery
 - **Rate Limiting**: Built-in rate limiting and graceful error handling
 - **Request Splitting**: Automatic splitting of large requests when exceeding `max_request_tokens` limit
 - **Token Rate Limiting**: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
@@ -37,18 +39,33 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
 - **Effective Context Tracking**: Reports total tokens used (effective_context) for every request
 - **Enhanced Context Condensation**: 8 condensation methods including hierarchical, conversational, semantic, algorithmic, sliding window, importance-based, entity-aware, and code-aware condensation
 - **Provider-Native Caching**: 50-70% cost reduction using Anthropic `cache_control`, Google Context Caching, and OpenAI-compatible APIs (including prompt_cache_key for OpenAI load balancer routing)
- **Response Caching**: 20-30% cache hit rate with semantic deduplication across multiple backends (memory, Redis, SQLite, MySQL)
+- **Response Caching (Semantic Deduplication)**: 20-30% cache hit rate with intelligent request deduplication
+  - Multiple backends: In-memory LRU cache, Redis, SQLite, MySQL, file-based
+  - SHA256-based cache key generation for request deduplication
+  - TTL-based expiration with configurable timeouts
+  - Granular cache control at model, provider, rotation, and autoselect levels
+  - Cache statistics tracking (hits, misses, hit rate, evictions)
+  - Dashboard endpoints for cache management
 - **Smart Request Batching**: 15-25% latency reduction by batching similar requests within 100ms window with provider-specific configurations
 - **Streaming Response Optimization**: 10-20% memory reduction with chunk pooling, backpressure handling, and provider-specific streaming optimizations for Google and Kiro providers
+- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, and export functionality
 - **SSL/TLS Support**: Built-in HTTPS support with Let's Encrypt integration and automatic certificate renewal
 - **Self-Signed Certificates**: Automatic generation of self-signed certificates for development/testing
- **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service
+- **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service (ephemeral and persistent)
 - **MCP Server**: Model Context Protocol server for remote agent configuration and model access (SSE and HTTP streaming)
 - **Persistent Database**: SQLite/MySQL-based tracking of token usage, context dimensions, and model embeddings with automatic cleanup
 - **Multi-User Support**: User management with isolated configurations, role-based access control, and API token management
+- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations
 - **Database Integration**: SQLite/MySQL-based persistent storage for user configurations, token usage tracking, and context management
 - **User-Specific Configurations**: Each user can have their own providers, rotations, and autoselect configurations stored in the database
- **Flexible Caching**: SQLite/MySQL/Redis/file/memory-based caching system for model embeddings and other cached data with automatic fallback
+- **Flexible Caching System**: Multi-backend caching for model embeddings and performance optimization
+  - Redis: High-performance distributed caching for production
+  - SQLite/MySQL: Persistent database-backed caching
+  - File-based: Legacy local file storage
+  - Memory: In-memory caching for development
+  - Automatic fallback between backends
+  - Configurable TTL per data type
+- **Proxy-Awareness**: Full support for reverse proxy deployments with automatic URL generation and subpath support

 ## Author

@@ -107,6 +124,7 @@ See [`PYPI.md`](PYPI.md) for detailed instructions on publishing to PyPI.
 - Google (google-genai)
 - OpenAI and openai-compatible endpoints (openai)
 - Anthropic (anthropic)
+- Claude Code (OAuth2 authentication via claude.ai)
 - Ollama (direct HTTP)
 - Kiro (Amazon Q Developer / AWS CodeWhisperer)
 ## Configuration
@@ -256,6 +274,98 @@ http://your-onion-address.onion/
 - Monitor access logs for suspicious activity
 - Keep TOR and AISBF updated

+### Claude OAuth2 Authentication
+
+AISBF supports Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh:
+
+#### Features
+- Full OAuth2 PKCE flow matching official claude-cli
+- Automatic token refresh with refresh token rotation
+- Chrome extension for remote server OAuth2 callback interception
+- Dashboard integration with authentication UI
+- Credentials stored in `~/.aisbf/claude_credentials.json`
+- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
+- Compatible with official claude-cli credentials
+
+#### Setup
+1. Add Claude provider to configuration (via dashboard or `~/.aisbf/providers.json`)
+2. For remote servers: Install Chrome extension (download from dashboard)
+3. Click "Authenticate with Claude" in dashboard
+4. Log in with your Claude account
+5. Use Claude models via API: `claude/claude-3-7-sonnet-20250219`
+
+#### Configuration Example
+```json
+{
+  "providers": {
+    "claude": {
+      "id": "claude",
+      "name": "Claude Code (OAuth2)",
+      "endpoint": "https://api.anthropic.com/v1",
+      "type": "claude",
+      "api_key_required": false,
+      "claude_config": {
+        "credentials_file": "~/.aisbf/claude_credentials.json"
+      },
+      "models": [
+        {
+          "name": "claude-3-7-sonnet-20250219",
+          "context_size": 200000
+        }
+      ]
+    }
+  }
+}
+```
+
+See [`CLAUDE_OAUTH2_SETUP.md`](CLAUDE_OAUTH2_SETUP.md) for detailed setup instructions and [`CLAUDE_OAUTH2_DEEP_DIVE.md`](CLAUDE_OAUTH2_DEEP_DIVE.md) for technical details.
+
+### Response Caching (Semantic Deduplication)
+
+AISBF includes an intelligent response caching system that deduplicates similar requests to reduce API costs and latency:
+
+#### Supported Cache Backends
+- **Memory (LRU)**: In-memory cache with LRU eviction, fast but ephemeral
+- **Redis**: High-performance distributed caching, recommended for production
+- **SQLite**: Persistent local database caching
+- **MySQL**: Network database caching for multi-server deployments
+
+#### Features
+- **SHA256-based Deduplication**: Generates cache keys from request content for intelligent deduplication
+- **TTL-based Expiration**: Configurable timeout (default: 600 seconds)
+- **LRU Eviction**: Automatic eviction of least recently used entries (memory backend)
+- **Cache Statistics**: Tracks hits, misses, hit rate, and evictions
+- **Granular Control**: Enable/disable caching at model, provider, rotation, or autoselect level
+- **Hierarchical Configuration**: Model > Provider > Rotation > Autoselect > Global
+- **Dashboard Management**: View statistics and clear cache via dashboard
+- **Streaming Skip**: Automatically skips caching for streaming requests
+
+#### Configuration
+
+**Via Dashboard:**
+1. Navigate to Dashboard → Settings → Response Cache
+2. Enable response caching
+3. Select cache backend (Memory, Redis, SQLite, MySQL)
+4. Configure TTL and max size
+5. Save settings and restart server
+
+**Via Configuration File:**
+Edit `~/.aisbf/aisbf.json`:
+```json
+{
+  "response_cache": {
+    "enabled": true,
+    "backend": "redis",
+    "ttl": 600,
+    "max_size": 1000,
+    "redis_host": "localhost",
+    "redis_port": 6379
+  }
+}
+```
+
+**Cache Hit Rate:** Typically achieves 20-30% cache hit rate in production workloads, significantly reducing API costs and latency.
+
 ### Database Configuration

 AISBF supports multiple database backends for persistent storage of configurations, token usage tracking, and context management:

--- a/aisbf/claude_auth.py
+++ b/aisbf/claude_auth.py
@@ -74,11 +74,12 @@ def _generate_client_id():
    # Generate UUID5 (name-based) from the machine ID
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, machine_id))

-# Use the provided client ID for Claude OAuth2
-CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
-AUTH_URL = "https://claude.ai/oauth/authorize"
-TOKEN_URL = "https://api.anthropic.com/v1/oauth/token"  # Correct endpoint from CLIProxyAPI
-REDIRECT_URI = "http://localhost:54545/callback"
+# Claude OAuth2 Configuration
+# These values match the official claude-cli implementation
+CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"  # Official Claude Code client ID
+AUTH_URL = "https://claude.ai/oauth/authorize"  # Authorization endpoint
+TOKEN_URL = "https://api.anthropic.com/v1/oauth/token"  # Token exchange endpoint
+REDIRECT_URI = "http://localhost:54545/callback"  # OAuth2 callback URI
 CLI_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

 logger = logging.getLogger(__name__)