Commit f64585f1 authored by Your Name's avatar Your Name

docs: Comprehensive documentation update with all missing features

- Updated CHANGELOG.md with complete feature list including:
  * Claude OAuth2 provider with PKCE flow and automatic token refresh
  * Response caching with semantic deduplication (Memory/Redis/SQLite/MySQL)
  * Model embeddings cache with multiple backends
  * User-specific API endpoints and MCP enhancements
  * Adaptive rate limiting and token usage analytics
  * Smart request batching and streaming optimization
  * All performance features and bug fixes

- Enhanced README.md with:
  * Claude OAuth2 authentication section with setup guide
  * Response caching details with all backends and deduplication
  * Flexible caching system with Redis/MySQL/SQLite/File/Memory
  * Updated key features with expanded descriptions
  * Configuration examples for all caching systems

- Updated DOCUMENTATION.md with:
  * Claude Code provider in Provider Support section
  * Enhanced provider descriptions with caching capabilities
  * Reference to Claude OAuth2 setup documentation

- Enhanced CLAUDE_OAUTH2_SETUP.md with key features list

- Added clarifying comments to aisbf/claude_auth.py

All documentation now accurately reflects the codebase with complete
coverage of caching systems (response cache and model embeddings cache),
request deduplication via SHA256, and all implemented features.
parent a33c622b
# AISBF Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- **User-Specific API Endpoints**: New API endpoints for authenticated users to access their own configurations
- `GET /api/user/models` - List user's own models
......@@ -17,6 +23,29 @@
- Admin users get access to both global and user tools
- Regular users get access to user-only tools
- **Dashboard API Documentation**: User dashboard now includes comprehensive API endpoint documentation
- **Model Metadata Extraction**: Automatic extraction of pricing and rate limit information from provider responses
- `rate_multiplier` - Cost multiplier for the model
- `rate_unit` - Pricing unit (e.g., "per million tokens")
- `prompt_tokens` - Tokens used in prompt
- `completion_tokens` - Tokens used in completion
- Auto-configure rate limits on 429 responses with retry-after headers
- **Enhanced Model Metadata**: Extended model information fields
- `top_provider` - Primary provider for the model
- `pricing` - Detailed pricing information (prompt/completion costs)
- `description` - Model description
- `supported_parameters` - List of supported API parameters
- `architecture` - Model architecture details
- Dashboard "Get Models" button to fetch and display model metadata
- **Analytics Filtering**: Filter analytics by provider, model, rotation, and autoselect
- Dropdown filters in analytics dashboard
- Real-time chart updates based on selected filters
- Export filtered data to JSON/CSV
- **Admin User Management**: Complete user management system in dashboard
- Create, edit, and delete users
- Role-based access control (admin/user roles)
- Password management
- User token management
- View user statistics and usage
- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses
- Per-provider adaptive rate limiters with learning capability
- Exponential backoff with jitter (configurable base and jitter factor)
......@@ -38,33 +67,20 @@
- Support for rotation_id and autoselect_id tracking
- Real-time request counts and latency tracking
- Error rates and types tracking
- OpenRouter-style extended fields to Model class (description, context_length, architecture, pricing, top_provider, supported_parameters, default_parameters)
- Web dashboard section to README with screenshot reference
- Comprehensive dashboard documentation including features and access information
- Kiro AWS Event Stream parsing, converters, and TODO roadmap
- Credential validation for kiro/kiro-cli providers
- TOR hidden service support with persistent/ephemeral options
- MCP (Model Context Protocol) server endpoint
- Proxy-awareness with configurable error cooldown features
- Kiro provider integration
- **Database Configuration**: Support for SQLite and MySQL backends with automatic table creation and migration
- **Flexible Caching System**: Redis, file-based, and memory caching backends for model embeddings and API responses
- **Cache Abstraction Layer**: Unified caching interface with automatic fallback and configurable TTL
- **Redis Cache Support**: High-performance distributed caching for production deployments
- **Database Manager Updates**: Multi-database support with SQL syntax adaptation between SQLite and MySQL
- **Cache Manager**: Configurable cache backends with SQLite, MySQL, Redis, file-based, and memory options with automatic fallback
- **Response Caching (Semantic Deduplication)**: Intelligent response caching system with multiple backend support
- Multiple backends: In-memory LRU cache, Redis, SQLite, MySQL
- SHA256-based cache key generation for request deduplication
- TTL-based expiration (default: 600 seconds)
- LRU eviction for memory backend with configurable max size
- Cache statistics tracking (hits, misses, hit rate, evictions)
- Dashboard endpoints for cache statistics and clearing
- Granular cache control at model, provider, rotation, and autoselect levels
- Hierarchical configuration: Model > Provider > Rotation > Autoselect > Global
- Automatic cache initialization on startup
- Skip caching for streaming requests
- Comprehensive test suite with 6 test scenarios
- **Streaming Response Optimization**: Memory-efficient streaming with provider-specific optimizations
- Chunk Pooling: Reuses chunk objects to reduce memory allocations
- Backpressure Handling: Flow control to prevent overwhelming consumers
- Google Delta Calculation: Only sends new text since last chunk
- Kiro SSE Parsing: Optimized SSE parser with reduced string allocations
- OptimizedTextAccumulator: Memory-efficient text accumulation with truncation
- Configurable optimization settings via StreamingConfig
- 10-20% memory reduction in streaming operations
- **Smart Request Batching**: Intelligent request batching for improved performance
- Batches similar requests within configurable time window (default: 100ms)
- Provider-specific batch configurations
- Automatic batch size optimization
- 15-25% latency reduction for similar concurrent requests
- Configurable via aisbf.json with batch_window, max_batch_size, etc.
- **Enhanced Context Condensation**: 8 condensation methods for intelligent token reduction
- Hierarchical: Separates context into persistent, middle (summarized), and active sections
- Conversational: Summarizes old messages using LLM or internal model
......@@ -80,64 +96,257 @@
- Adaptive condensation based on context size
- Condensation method chaining
- Condensation bypass for short contexts
- **Streaming Response Optimization**: Memory-efficient streaming with provider-specific optimizations
- Chunk Pooling: Reuses chunk objects to reduce memory allocations
- Backpressure Handling: Flow control to prevent overwhelming consumers
- Google Delta Calculation: Only sends new text since last chunk
- Kiro SSE Parsing: Optimized SSE parser with reduced string allocations
- OptimizedTextAccumulator: Memory-efficient text accumulation with truncation
- Configurable optimization settings via StreamingConfig
- **Response Caching (Semantic Deduplication)**: Intelligent response caching system with multiple backend support
- Multiple backends: In-memory LRU cache, Redis, SQLite, MySQL
- SHA256-based cache key generation for request deduplication
- TTL-based expiration (default: 600 seconds)
- LRU eviction for memory backend with configurable max size
- Cache statistics tracking (hits, misses, hit rate, evictions)
- Dashboard endpoints for cache statistics and clearing
- Granular cache control at model, provider, rotation, and autoselect levels
- Hierarchical configuration: Model > Provider > Rotation > Autoselect > Global
- Automatic cache initialization on startup
- Skip caching for streaming requests
- 20-30% cache hit rate in typical usage
- **Provider-Native Caching**: 50-70% cost reduction using provider-specific caching mechanisms
- Anthropic `cache_control` with `{"type": "ephemeral"}`
- Google Context Caching API (`cached_contents.create`)
- OpenAI automatic prefix caching (no code change needed)
- OpenRouter `cache_control` (wraps Anthropic)
- DeepSeek automatic caching in 64-token chunks
- Configurable via `enable_native_caching`, `min_cacheable_tokens`, `cache_ttl`
- Optional `prompt_cache_key` for OpenAI load balancer routing optimization
- **Claude OAuth2 Provider**: Full OAuth2 PKCE authentication for Claude Code (claude.ai)
- ClaudeAuth class (`aisbf/claude_auth.py`) implementing OAuth2 PKCE flow
- ClaudeProviderHandler for Claude API integration
- Automatic token refresh with refresh token rotation
- Chrome extension for remote server OAuth2 callback interception
- Dashboard integration with authentication UI
- Credentials stored in `~/.aisbf/claude_credentials.json`
- Support for curl_cffi TLS fingerprinting (optional, for Cloudflare bypass)
- Compatible with official claude-cli credentials
- OAuth2 endpoints: `/dashboard/claude/auth/start`, `/dashboard/claude/auth/complete`, `/dashboard/claude/auth/status`
- Extension endpoints: `/dashboard/extension/download`, `/dashboard/oauth2/callback`
- Comprehensive documentation in CLAUDE_OAUTH2_SETUP.md and CLAUDE_OAUTH2_DEEP_DIVE.md
- **Kiro Provider Integration**: Native support for Kiro (Amazon Q Developer / AWS CodeWhisperer)
- KiroAuth class (`aisbf/kiro_auth.py`) for AWS credential management
- Support for multiple authentication methods:
- Kiro IDE credentials file (`~/.config/Code/User/globalStorage/amazon.q/credentials.json`)
- kiro-cli SQLite database
- Direct refresh token with AWS SSO OIDC
- Kiro converters (`aisbf/kiro_converters.py`, `aisbf/kiro_converters_openai.py`) for request/response transformation
- Kiro parsers (`aisbf/kiro_parsers.py`) for AWS Event Stream parsing
- Kiro models (`aisbf/kiro_models.py`) for model definitions
- Kiro utilities (`aisbf/kiro_utils.py`) for helper functions
- Dashboard support for kiro-specific configuration fields
- Credential validation for kiro/kiro-cli providers
- Streaming support with AWS Event Stream parsing
- Tool calling support with proper finalization
- **TOR Hidden Service Support**: Full support for exposing AISBF over TOR network
- TorHiddenService class (`aisbf/tor.py`) for managing TOR connections
- TorConfig model in config.py for TOR configuration management
- Support for both ephemeral (temporary) and persistent (fixed onion address) hidden services
- Dashboard TOR configuration UI with real-time status display
- "Create Persistent" button to convert ephemeral to persistent service
- MCP `get_tor_status` tool for monitoring TOR hidden service status (fullconfig access required)
- Automatic TOR service initialization on startup when enabled
- Proper cleanup on shutdown to remove ephemeral services
- All AISBF endpoints (API, dashboard, MCP) accessible over TOR network
- Configurable via aisbf.json or dashboard settings
- **MCP (Model Context Protocol) Server**: Complete MCP server implementation
- SSE endpoint: `GET /mcp` - Server-Sent Events for MCP communication
- HTTP endpoint: `POST /mcp` - Direct HTTP transport for MCP
- MCP tools for model access, configuration management, and system control
- User authentication support with role-based tool access
- Admin users get full access to global and user tools
- Regular users get access to user-only tools
- Dashboard MCP settings and documentation
- **Multi-User Database Integration**: Comprehensive multi-user support with persistent storage
- SQLite/MySQL database backends with automatic table creation and migration
- User management with role-based access control (admin/user roles)
- Isolated configurations per user (providers, rotations, autoselects)
- API token management with usage tracking
- Token usage tracking and analytics per user
- Automatic database cleanup with configurable retention periods
- Dashboard user management interface (admin only)
- User dashboard for personal configuration and usage statistics
- **Flexible Caching System**: Multi-backend caching for improved performance
- Redis cache support for high-performance distributed caching
- SQLite/MySQL cache backends for persistent caching
- File-based cache for legacy compatibility
- Memory cache for ephemeral caching
- Automatic fallback between cache backends
- Configurable TTL per data type
- Cache for model embeddings, provider models, and other cached data
- **NSFW/Privacy Content Filtering**: Automatic content classification and model routing
- Models can be flagged with `nsfw` and `privacy` boolean flags
- Automatic analysis of last 3 user messages for content classification
- Routes requests only to appropriate models based on content
- Returns 404 if no suitable models available
- Configurable classification windows
- Global enable/disable via `classify_nsfw` and `classify_privacy` settings
- **Semantic Model Selection**: Fast hybrid BM25 + semantic search for autoselect
- Uses sentence transformers for content understanding
- Combines keyword matching with semantic similarity
- Automatic model library indexing and caching
- Faster than AI-based selection (no API calls)
- Lower costs (no tokens consumed)
- Deterministic results based on content similarity
- Automatic fallback to AI selection if semantic fails
- Enable via `classify_semantic: true` in autoselect config
- **OpenRouter-Style Extended Fields**: Enhanced model metadata
- `description` - Model description
- `context_length` - Maximum context size
- `architecture` - Model architecture details
- `pricing` - Detailed pricing information
- `top_provider` - Primary provider
- `supported_parameters` - List of supported API parameters
- `default_parameters` - Default parameter values
- **Proxy-Awareness**: Full support for reverse proxy deployments
- ProxyHeadersMiddleware for automatic proxy header detection
- Supports X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Prefix, X-Forwarded-For
- Automatic URL generation based on proxy configuration
- Template integration with proxy-aware url_for() function
- Support for subpath deployments
- Comprehensive nginx configuration examples in DOCUMENTATION.md
- **Configurable Error Cooldown**: Customizable cooldown periods after consecutive failures
- `error_cooldown` field in Model class for model-specific cooldown
- `default_error_cooldown` field in ProviderConfig for provider-level defaults
- `default_error_cooldown` field in RotationConfig for rotation-level defaults
- Cascading configuration: model > provider > rotation > system default (300 seconds)
- Replaces hardcoded 5-minute cooldown with flexible configuration
- **SSL/TLS Support**: Built-in HTTPS support with automatic certificate management
- Self-signed certificate generation for development/testing
- Let's Encrypt integration with automatic certificate generation and renewal
- Automatic certificate expiry checking on startup
- Renewal when certificates expire within 30 days
- Dashboard SSL/TLS configuration UI
- **Web Dashboard Enhancements**: Comprehensive web-based management interface
- Provider management with API key configuration
- Rotation configuration with weighted load balancing
- Autoselect configuration with AI-powered selection
- Server settings management (SSL/TLS, authentication, TOR)
- User management (admin only)
- Token usage analytics with charts and export
- Rate limits dashboard with adaptive learning
- Cache statistics and management
- Real-time monitoring and status display
- Collapsible UI sections for better organization
- **CLI Argument Support**: Command-line arguments for server configuration
- Port configuration via `--port` argument
- Host binding via `--host` argument
- Default port changed to 17765
- **Intelligent 429 Rate Limit Handling**: Automatic rate limit detection and configuration
- Parses retry-after headers from 429 responses
- Auto-configures rate limits based on provider responses
- Exponential backoff with jitter
- Configurable retry strategies
### Fixed
- Model class now supports OpenRouter metadata fields preventing crashes in models list API
- Aligned Model class with ProviderModelConfig, RotationConfig, and AutoselectConfig field definitions
- Premature tool call finalization in Kiro streaming responses
- Kiro credential validation to handle dict-based config
- Template session references for Python 3.13 compatibility
- Python 3.13 compatibility issue with Jinja2 template caching
- Ollama Provider Handler initialization
- PyPI package: include mcp.py, tor.py and kiro modules in distribution
- **Model Class Compatibility**: Model class now supports OpenRouter metadata fields preventing crashes in models list API
- **Field Alignment**: Aligned Model class with ProviderModelConfig, RotationConfig, and AutoselectConfig field definitions
- **Kiro Streaming**: Fixed premature tool call finalization in Kiro streaming responses
- **Kiro Credentials**: Fixed credential validation to handle dict-based config
- **Python 3.13 Compatibility**: Fixed template session references and Jinja2 template caching for Python 3.13
- **Ollama Provider**: Fixed Ollama Provider Handler initialization
- **PyPI Package**: Include mcp.py, tor.py and kiro modules in distribution
- **Google Tool Calling**: Fixed Google provider tool formatting and tool call extraction in streaming responses
- **Streaming Error Responses**: Fixed error response handling in streaming mode
- **Rotation Error Messages**: Improved error messages when no models are available in rotation
- **Assistant Wrapper Pattern**: Handle assistant wrapper pattern in streaming responses
- **Tool Call Parsing**: Robust JSON extraction for tool calls in streaming responses
- **Unicode Handling**: Decode unicode escape sequences in tool JSON
- **Error Message Formatting**: Improved error message formatting with bold text and JSON pretty printing
- **HTTP Status Codes**: Use appropriate status codes (429 vs 503) based on notifyerrors configuration
- **Duplicate Error Messages**: Skip first line of error_details to avoid duplication
### Changed
- Improved venv handling to use system-installed aisbf package
- Auto-update venv feature on pip package upgrade
- Default port changed to 17765
- Intelligent 429 rate limit handling and improved configuration
- Automatic --break-system-packages detection in build.sh
- **Virtual Environment Handling**: Improved venv handling to use system-installed aisbf package
- **Auto-Update Feature**: Auto-update venv on pip package upgrade
- **Default Port**: Changed default port from 8000 to 17765
- **Build Script**: Automatic --break-system-packages detection in build.sh
- **Configuration Architecture**: Centralized API key storage in providers.json
- API keys stored only in provider definitions
- Rotation and autoselect configurations reference providers by name only
- Provider-only entries in rotations (no models specified) randomly select from provider's models
- Default settings support at provider and rotation levels
- Settings priority: model-specific > rotation defaults > provider defaults
- **Error Handling**: Always return formatted error responses for rotation providers with appropriate status codes
## [0.8.0] - 2026-03-XX
### Added
- Smart Request Batching with 15-25% latency reduction
- Provider-specific batch configurations
- Automatic batch size optimization
## [0.7.0] - 2026-03-XX
### Added
- Enhanced Context Condensation with 8 methods
- Condensation analytics tracking
- Internal model improvements with warm-up functionality
## [0.6.0] - 2026-03-XX
### Added
- Response Caching with semantic deduplication
- Multiple cache backends (memory, Redis, SQLite, MySQL)
- Cache statistics and management dashboard
## [0.5.0] - 2026-03-XX
### Added
- TOR Hidden Service support
- Ephemeral and persistent hidden services
- Dashboard TOR configuration UI
## [0.4.0] - 2026-02-XX
### Added
- Configuration refactoring with centralized API key storage
- Autoselect enhancements with improved prompt structure
- Provider-level default settings
## [0.3.3] - 2026-02-XX
### Added
- Improved error messages when no models are available in rotation
- notifyerrors configuration to rotations
## [0.2.7] - 2026-02-07
### Added
- max_request_tokens support for automatic request splitting
- Token counting utilities using tiktoken and langchain-text-splitters
- Automatic request splitting when exceeding token limits
## [0.2.6] - 2026-02-06
### Added
- Comprehensive API endpoint documentation in README.md and DOCUMENTATION.md
- Detailed sections for General, Provider, Rotation, and Autoselect endpoints
- Documentation for rotation load balancing and AI-assisted autoselect
## [0.1.2] - 2026-02-06
### Changed
- Updated version from 0.1.1 to 0.1.2 for PyPI release
- Changed system installation path from /usr/local/share/aisbf to /usr/share/aisbf
- Updated aisbf.sh script to dynamically determine correct paths at runtime
- Script now checks for /usr/share/aisbf first, then falls back to ~/.local/share/aisbf
- Updated setup.py to install script with dynamic path detection
- Updated config.py to check for /usr/share/aisbf instead of /usr/local/share/aisbf
- Updated AI.PROMPT documentation to reflect new installation paths
- Script creates venv in appropriate location based on installation type
- Ensures proper main.py location is used regardless of who launches the script
- System installation path from /usr/local/share/aisbf to /usr/share/aisbf
- aisbf.sh script to dynamically determine correct paths at runtime
- Script checks for /usr/share/aisbf first, then falls back to ~/.local/share/aisbf
- config.py to check for /usr/share/aisbf instead of /usr/local/share/aisbf
### Added
- Comprehensive logging module with rotating file handlers
- Log files stored in /var/log/aisbf when launched by root
- Log files stored in ~/.local/var/log/aisbf when launched by user
- Automatic log directory creation if it doesn't exist
- Log files stored in /var/log/aisbf (root) or ~/.local/var/log/aisbf (user)
- Automatic log directory creation
- Rotating file handlers with 50MB max file size and 5 backup files
- Separate log files for general logs (aisbf.log) and error logs (aisbf_error.log)
- stdout and stderr output duplicated to rotating log files
- Console logging for immediate feedback
- Logging configuration in main.py with proper setup function
- Updated aisbf.sh script to redirect output to log files
- Updated setup.py to include logging configuration in installed script
## [0.1.1] - 2026-02-06
### Changed
- Updated version from 0.1.0 to 0.1.1 for PyPI release
- Version bump for PyPI release
## [0.1.0] - 2026-02-06
### Initial Release
- First public release of AISBF
- Complete AI Service Broker Framework
- Support for multiple AI providers
- Support for multiple AI providers (Google, OpenAI, Anthropic, Ollama)
- Provider rotation and error tracking
- Comprehensive configuration management
\ No newline at end of file
- Comprehensive configuration management
- Web dashboard for configuration
- Streaming support
- Rate limiting and error handling
- OpenAI-compatible API endpoints
......@@ -2,7 +2,15 @@
## Overview
AISBF now supports Claude Code (claude.ai) as a provider using OAuth2 authentication. This implementation mimics the official Claude CLI authentication flow and includes a Chrome extension to handle OAuth2 callbacks when AISBF runs on a remote server.
AISBF supports Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh. This implementation matches the official Claude CLI authentication flow and includes a Chrome extension to handle OAuth2 callbacks when AISBF runs on a remote server.
**Key Features:**
- Full OAuth2 PKCE flow matching official claude-cli
- Automatic token refresh with refresh token rotation
- Chrome extension for remote server OAuth2 callback interception
- Dashboard integration with authentication UI
- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
- Compatible with official claude-cli credentials
## Architecture
......
......@@ -2,11 +2,12 @@
## Overview
AISBF is a modular proxy server for managing multiple AI provider integrations. It provides a unified API interface for interacting with various AI services (Google, OpenAI, Anthropic, Ollama) with support for provider rotation, AI-assisted model selection, and error tracking.
AISBF is a modular proxy server for managing multiple AI provider integrations. It provides a unified API interface for interacting with various AI services (Google, OpenAI, Anthropic, Claude Code, Ollama, Kiro) with support for provider rotation, AI-assisted model selection, and error tracking.
### Key Features
- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, and Kiro (Amazon Q Developer)
- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Claude Code (OAuth2), Ollama, and Kiro (Amazon Q Developer)
- **Claude OAuth2 Authentication**: Full OAuth2 PKCE flow for Claude Code with automatic token refresh, Chrome extension for remote servers, and curl_cffi TLS fingerprinting support
- **Rotation Models**: Intelligent load balancing across multiple providers with weighted model selection and automatic failover
- **Autoselect Models**: AI-powered model selection that analyzes request content to route to the most appropriate specialized model
- **Streaming Support**: Full support for streaming responses from all providers with proper serialization
......@@ -633,16 +634,31 @@ AISBF supports the following AI providers:
- Uses google-genai SDK
- Requires API key
- Supports streaming and non-streaming responses
- Context Caching API support for cost reduction
### OpenAI
- Uses openai SDK
- Requires API key
- Supports streaming and non-streaming responses
- Automatic prefix caching (no configuration needed)
### Anthropic
- Uses anthropic SDK
- Requires API key
- Static model list (no dynamic model discovery)
- cache_control support for cost reduction
### Claude Code (OAuth2)
- Full OAuth2 PKCE authentication flow
- Automatic token refresh with refresh token rotation
- Chrome extension for remote server OAuth2 callback interception
- Dashboard integration with authentication UI
- Credentials stored in `~/.aisbf/claude_credentials.json`
- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
- Compatible with official claude-cli credentials
- Access to latest Claude models (3.7 Sonnet, 3.5 Sonnet, 3.5 Haiku, etc.)
- Supports streaming, tool calling, vision, and all Claude features
- See [`CLAUDE_OAUTH2_SETUP.md`](CLAUDE_OAUTH2_SETUP.md) for setup instructions
### Ollama
- Uses direct HTTP API
......
......@@ -21,13 +21,15 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
## Key Features
- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, and Kiro (Amazon Q Developer)
- **Multi-Provider Support**: Unified interface for Google, OpenAI, Anthropic, Ollama, Kiro (Amazon Q Developer), and Claude Code (OAuth2)
- **Claude OAuth2 Authentication**: Full OAuth2 PKCE flow for Claude Code with automatic token refresh and Chrome extension for remote servers
- **Rotation Models**: Weighted load balancing across multiple providers with automatic failover
- **Autoselect Models**: AI-powered model selection based on content analysis and request characteristics
- **Semantic Classification**: Fast hybrid BM25 + semantic model selection using sentence transformers (optional)
- **Content Classification**: NSFW/privacy content filtering with configurable classification windows
- **Streaming Support**: Full support for streaming responses from all providers
- **Error Tracking**: Automatic provider disabling after consecutive failures with cooldown periods
- **Error Tracking**: Automatic provider disabling after consecutive failures with configurable cooldown periods
- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff and gradual recovery
- **Rate Limiting**: Built-in rate limiting and graceful error handling
- **Request Splitting**: Automatic splitting of large requests when exceeding `max_request_tokens` limit
- **Token Rate Limiting**: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
......@@ -37,18 +39,33 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
- **Effective Context Tracking**: Reports total tokens used (effective_context) for every request
- **Enhanced Context Condensation**: 8 condensation methods including hierarchical, conversational, semantic, algorithmic, sliding window, importance-based, entity-aware, and code-aware condensation
- **Provider-Native Caching**: 50-70% cost reduction using Anthropic `cache_control`, Google Context Caching, and OpenAI-compatible APIs (including prompt_cache_key for OpenAI load balancer routing)
- **Response Caching**: 20-30% cache hit rate with semantic deduplication across multiple backends (memory, Redis, SQLite, MySQL)
- **Response Caching (Semantic Deduplication)**: 20-30% cache hit rate with intelligent request deduplication
- Multiple backends: In-memory LRU cache, Redis, SQLite, MySQL, file-based
- SHA256-based cache key generation for request deduplication
- TTL-based expiration with configurable timeouts
- Granular cache control at model, provider, rotation, and autoselect levels
- Cache statistics tracking (hits, misses, hit rate, evictions)
- Dashboard endpoints for cache management
- **Smart Request Batching**: 15-25% latency reduction by batching similar requests within 100ms window with provider-specific configurations
- **Streaming Response Optimization**: 10-20% memory reduction with chunk pooling, backpressure handling, and provider-specific streaming optimizations for Google and Kiro providers
- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, and export functionality
- **SSL/TLS Support**: Built-in HTTPS support with Let's Encrypt integration and automatic certificate renewal
- **Self-Signed Certificates**: Automatic generation of self-signed certificates for development/testing
- **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service
- **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service (ephemeral and persistent)
- **MCP Server**: Model Context Protocol server for remote agent configuration and model access (SSE and HTTP streaming)
- **Persistent Database**: SQLite/MySQL-based tracking of token usage, context dimensions, and model embeddings with automatic cleanup
- **Multi-User Support**: User management with isolated configurations, role-based access control, and API token management
- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations
- **Database Integration**: SQLite/MySQL-based persistent storage for user configurations, token usage tracking, and context management
- **User-Specific Configurations**: Each user can have their own providers, rotations, and autoselect configurations stored in the database
- **Flexible Caching**: SQLite/MySQL/Redis/file/memory-based caching system for model embeddings and other cached data with automatic fallback
- **Flexible Caching System**: Multi-backend caching for model embeddings and performance optimization
- Redis: High-performance distributed caching for production
- SQLite/MySQL: Persistent database-backed caching
- File-based: Legacy local file storage
- Memory: In-memory caching for development
- Automatic fallback between backends
- Configurable TTL per data type
- **Proxy-Awareness**: Full support for reverse proxy deployments with automatic URL generation and subpath support
## Author
......@@ -107,6 +124,7 @@ See [`PYPI.md`](PYPI.md) for detailed instructions on publishing to PyPI.
- Google (google-genai)
- OpenAI and openai-compatible endpoints (openai)
- Anthropic (anthropic)
- Claude Code (OAuth2 authentication via claude.ai)
- Ollama (direct HTTP)
- Kiro (Amazon Q Developer / AWS CodeWhisperer)
## Configuration
......@@ -256,6 +274,98 @@ http://your-onion-address.onion/
- Monitor access logs for suspicious activity
- Keep TOR and AISBF updated
### Claude OAuth2 Authentication
AISBF supports Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh:
#### Features
- Full OAuth2 PKCE flow matching official claude-cli
- Automatic token refresh with refresh token rotation
- Chrome extension for remote server OAuth2 callback interception
- Dashboard integration with authentication UI
- Credentials stored in `~/.aisbf/claude_credentials.json`
- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
- Compatible with official claude-cli credentials
#### Setup
1. Add Claude provider to configuration (via dashboard or `~/.aisbf/providers.json`)
2. For remote servers: Install Chrome extension (download from dashboard)
3. Click "Authenticate with Claude" in dashboard
4. Log in with your Claude account
5. Use Claude models via API: `claude/claude-3-7-sonnet-20250219`
#### Configuration Example
```json
{
"providers": {
"claude": {
"id": "claude",
"name": "Claude Code (OAuth2)",
"endpoint": "https://api.anthropic.com/v1",
"type": "claude",
"api_key_required": false,
"claude_config": {
"credentials_file": "~/.aisbf/claude_credentials.json"
},
"models": [
{
"name": "claude-3-7-sonnet-20250219",
"context_size": 200000
}
]
}
}
}
```
See [`CLAUDE_OAUTH2_SETUP.md`](CLAUDE_OAUTH2_SETUP.md) for detailed setup instructions and [`CLAUDE_OAUTH2_DEEP_DIVE.md`](CLAUDE_OAUTH2_DEEP_DIVE.md) for technical details.
### Response Caching (Semantic Deduplication)
AISBF includes an intelligent response caching system that deduplicates similar requests to reduce API costs and latency:
#### Supported Cache Backends
- **Memory (LRU)**: In-memory cache with LRU eviction, fast but ephemeral
- **Redis**: High-performance distributed caching, recommended for production
- **SQLite**: Persistent local database caching
- **MySQL**: Network database caching for multi-server deployments
#### Features
- **SHA256-based Deduplication**: Generates cache keys from request content for intelligent deduplication
- **TTL-based Expiration**: Configurable timeout (default: 600 seconds)
- **LRU Eviction**: Automatic eviction of least recently used entries (memory backend)
- **Cache Statistics**: Tracks hits, misses, hit rate, and evictions
- **Granular Control**: Enable/disable caching at model, provider, rotation, or autoselect level
- **Hierarchical Configuration**: Model > Provider > Rotation > Autoselect > Global
- **Dashboard Management**: View statistics and clear cache via dashboard
- **Streaming Skip**: Automatically skips caching for streaming requests
#### Configuration
**Via Dashboard:**
1. Navigate to Dashboard → Settings → Response Cache
2. Enable response caching
3. Select cache backend (Memory, Redis, SQLite, MySQL)
4. Configure TTL and max size
5. Save settings and restart server
**Via Configuration File:**
Edit `~/.aisbf/aisbf.json`:
```json
{
"response_cache": {
"enabled": true,
"backend": "redis",
"ttl": 600,
"max_size": 1000,
"redis_host": "localhost",
"redis_port": 6379
}
}
```
**Cache Hit Rate:** Typically achieves 20-30% cache hit rate in production workloads, significantly reducing API costs and latency.
### Database Configuration
AISBF supports multiple database backends for persistent storage of configurations, token usage tracking, and context management:
......
......@@ -74,11 +74,12 @@ def _generate_client_id():
# Generate UUID5 (name-based) from the machine ID
return str(uuid.uuid5(uuid.NAMESPACE_DNS, machine_id))
# Use the provided client ID for Claude OAuth2
CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
AUTH_URL = "https://claude.ai/oauth/authorize"
TOKEN_URL = "https://api.anthropic.com/v1/oauth/token" # Correct endpoint from CLIProxyAPI
REDIRECT_URI = "http://localhost:54545/callback"
# Claude OAuth2 Configuration
# These values match the official claude-cli implementation
CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e" # Official Claude Code client ID
AUTH_URL = "https://claude.ai/oauth/authorize" # Authorization endpoint
TOKEN_URL = "https://api.anthropic.com/v1/oauth/token" # Token exchange endpoint
REDIRECT_URI = "http://localhost:54545/callback" # OAuth2 callback URI
CLI_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
logger = logging.getLogger(__name__)
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment