Release v0.9.2: Documentation updates and version bump

- Updated README.md with comprehensive documentation for new features: * User-Specific API Endpoints with Bearer token authentication * Adaptive Rate Limiting with learning from 429 responses * Model Metadata Extraction with automatic pricing/rate limit detection * Enhanced Analytics Filtering by provider/model/rotation * Updated Web Dashboard feature list - Updated DOCUMENTATION.md with detailed sections: * Adaptive Rate Limiting configuration and benefits * Model Metadata Extraction features and dashboard integration - Updated CHANGELOG.md: * Moved Unreleased section to version 0.9.2 (2026-04-03) * Added comprehensive list of new features and changes - Version bump to 0.9.2: * Updated pyproject.toml version * Updated aisbf/__init__.py version This release focuses on improving documentation coverage for recently added features including user-specific API endpoints, adaptive rate limiting, model metadata extraction, and analytics filtering.

Release v0.9.2: Documentation updates and version bump
- Updated README.md with comprehensive documentation for new features: * User-Specific API Endpoints with Bearer token authentication * Adaptive Rate Limiting with learning from 429 responses * Model Metadata Extraction with automatic pricing/rate limit detection * Enhanced Analytics Filtering by provider/model/rotation * Updated Web Dashboard feature list - Updated DOCUMENTATION.md with detailed sections: * Adaptive Rate Limiting configuration and benefits * Model Metadata Extraction features and dashboard integration - Updated CHANGELOG.md: * Moved Unreleased section to version 0.9.2 (2026-04-03) * Added comprehensive list of new features and changes - Version bump to 0.9.2: * Updated pyproject.toml version * Updated aisbf/__init__.py version This release focuses on improving documentation coverage for recently added features including user-specific API endpoints, adaptive rate limiting, model metadata extraction, and analytics filtering.
fbb49301 · Your Name · 252d45e4 · fbb49301 · 252d45e4 · 252d45e4
Commit fbb49301 authored Apr 03, 2026 by Your Name
13 changed files
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.9.2] - 2026-04-03
+
 ### Added
 - **User-Specific API Endpoints**: New API endpoints for authenticated users to access their own configurations
  - `GET /api/user/models` - List user's own models
@@ -56,6 +58,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Per-provider reset functionality and reset-all button
  - Configurable via aisbf.json with learning_rate, headroom_percent, recovery_rate, etc.
  - Integration with BaseProviderHandler.apply_rate_limit() and handle_429_error()
+
+### Changed
+- **Documentation Updates**: Updated README.md and DOCUMENTATION.md with comprehensive coverage of new features
+  - Enhanced User-Specific API Endpoints documentation
+  - Added Adaptive Rate Limiting configuration guide
+  - Updated Web Dashboard feature list
+  - Added Model Metadata Extraction details
+  - Improved Analytics Filtering documentation
+
+## [0.9.1] - 2026-03-XX
 - **Token Usage Analytics**: Comprehensive analytics dashboard for tracking token usage, costs, and performance
  - Analytics module (`aisbf/analytics.py`) with token usage tracking, cost estimation, and optimization recommendations
  - Dashboard page with charts for token usage over time (1h, 6h, 24h, 7d)

--- a/CLAUDE_OAUTH2_DEEP_DIVE.md
+++ b/CLAUDE_OAUTH2_DEEP_DIVE.md
-# Claude OAuth2 Authentication Deep Dive
-
-If you have ever typed `claude auth login` into a terminal and watched a browser tab pop open, you already know the surface-level experience. You sign in, something happens behind the scenes, and a moment later your terminal says you are authenticated. But what actually happened during those few seconds is a surprisingly detailed chain of cryptographic handshakes, HTTP exchanges, and local file writes that together form a complete OAuth 2.0 authorization-code flow with PKCE. This essay pulls that chain apart, link by link, so that when something inevitably breaks you will know exactly where to look.
-
-## The Problem
-
-Claude Code is a command-line tool. It runs in your terminal. But the credentials that prove your identity live on Anthropic's servers, and the only trusted way to prove you are who you say you are is through the same login page you would use in a browser on claude.ai. The terminal cannot render that login page. It cannot handle CAPTCHAs, two-factor prompts, or account selection screens. So the CLI needs a way to delegate the authentication step to a browser, get the result back, and then store that result locally for future use.
-
-OAuth 2.0 is the protocol that makes this delegation possible. It was designed for exactly this kind of situation: one application needs to act on behalf of a user, but the user's actual credentials should never pass through that application. Instead of your password, the CLI ends up with a pair of tokens, one short-lived access token and one longer-lived refresh token, that together let it make authenticated requests to Anthropic's API without ever knowing your password.
-
-PKCE, which stands for Proof Key for Code Exchange and is pronounced "pixy," is an extension to OAuth that protects against a specific class of attack. Without PKCE, if someone intercepted the authorization code during the redirect back to your machine, they could exchange it for tokens themselves. PKCE prevents that by tying the token exchange to a secret that only the original client knows.
-
-## Preparation
-
-Before the browser opens, the CLI has some prep work to do. It generates three values.
-
-The first is a PKCE verifier. This is a high-entropy random string, typically between 43 and 128 characters, drawn from the unreserved URI character set. Think of it as a one-time secret that the CLI creates and keeps to itself. In most implementations this is generated using a cryptographically secure random number generator, then base64url-encoded.
-
-The second value is the PKCE challenge. This is derived from the verifier by taking its SHA-256 hash and then base64url-encoding the result. The relationship between verifier and challenge is one-way: given the challenge, you cannot recover the verifier, but given the verifier, anyone can recompute the challenge. That asymmetry is the whole point.
-
-```python
-import hashlib, base64, os
-verifier = base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode()
-challenge = base64.urlsafe_b64encode(
-    hashlib.sha256(verifier.encode()).digest()
-).rstrip(b"=").decode()
-```
-
-The third value is a random state parameter. This is an anti-CSRF measure. The CLI generates it, includes it in the authorize request, and later checks that the same value comes back in the callback. If it does not match, the response is discarded.
-
-## The Authorization Request
-
-With those three values ready, the CLI constructs a URL that points to Anthropic's authorization endpoint. For Claude Code, that endpoint is:
-
-```
-https://claude.ai/oauth/authorize
-```
-
-The URL includes several query parameters. The `client_id` identifies the application requesting access. For Claude Code, the observed client ID is `9d1c250a-e61b-44d9-88ed-5944d1962f5e`. The `response_type` is set to `code`, which tells the server this is an authorization-code flow rather than an implicit flow. The `redirect_uri` tells the server where to send the user after they authenticate. The `scope` parameter lists the permissions being requested. The `code_challenge` carries the PKCE challenge computed a moment ago, and `code_challenge_method` is set to `S256` to indicate that SHA-256 was used. Finally, the `state` parameter carries the random anti-CSRF value.
-
-A fully assembled authorize URL might look something like:
-
-```
-https://claude.ai/oauth/authorize?client_id=9d1c250a-e61b-44d9-88ed-5944d1962f5e&response_type=code&redirect_uri=http://localhost:54545/callback&code_challenge=...&code_challenge_method=S256&state=xyz123&scope=user:profile+user:inference+user:sessions:claude_code+user:mcp_servers
-```
-
-The CLI opens this URL in the user's default browser. From this point on, the CLI is waiting. It has started a tiny HTTP server on localhost, listening on a specific port (typically 54545), ready to catch the callback.
-
-## The Browser Flow
-
-What happens next is entirely in the browser. The user sees Anthropic's login page. They might enter an email and password, they might use a social login, they might go through a two-factor authentication step. The CLI has no visibility into any of this. It does not need to.
-
-Once the user successfully authenticates and grants consent, Anthropic's server constructs a redirect response. The redirect URL points back to the localhost address the CLI registered as its `redirect_uri`, and it includes two query parameters: `code` and `state`.
-
-```
-http://localhost:54545/callback?code=SplxlOBeZQQYbYS6WxSbIA&state=xyz123
-```
-
-The authorization code in that URL is short-lived, typically valid for only a few minutes, and can only be used once. It is also useless on its own. To turn it into actual tokens, you need the PKCE verifier that matches the challenge sent earlier. This is why intercepting the code alone is not enough for an attacker.
-
-## The Token Exchange
-
-The CLI's localhost server receives the callback, extracts the `code` and `state`, and immediately verifies that the state matches what it originally generated. If the state does not match, the whole flow is aborted.
-
-Then the CLI makes a POST request to Anthropic's token endpoint. For Claude Code, that endpoint is:
-
-```
-https://platform.claude.com/v1/oauth/token
-```
-
-**One detail worth highlighting here:** this endpoint expects JSON in the request body, not `application/x-www-form-urlencoded`. Many OAuth implementations use form encoding for the token exchange, so if you are building or debugging tooling around this, sending form data will silently fail or return an unhelpful error.
-
-The request body contains:
-
-```json
-{
-  "grant_type": "authorization_code",
-  "code": "SplxlOBeZQQYbYS6WxSbIA",
-  "redirect_uri": "http://localhost:54545/callback",
-  "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
-  "code_verifier": "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk",
-  "state": "xyz123"
-}
-```
-
-The server receives this, recomputes the SHA-256 hash of the provided `code_verifier`, and checks that it matches the `code_challenge` from the original authorize request. If it matches, the server knows that whoever is making this token exchange is the same party that initiated the flow. The authorization code is consumed and a token set is returned.
-
-The response typically includes:
-
-```json
-{
-  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
-  "refresh_token": "tGzv3JOkF0XG5Qx2TlKWIA...",
-  "expires_in": 3600,
-  "scope": "user:profile user:inference user:sessions:claude_code user:mcp_servers"
-}
-```
-
-The access token is what gets sent with every authenticated API request. The refresh token is what gets used later to obtain new access tokens without going through the browser flow again. The `expires_in` value tells you how many seconds the access token will remain valid.
-
-## Local Storage
-
-Once the tokens are in hand, Claude Code writes them to disk. The primary storage location is `~/.claude/.credentials.json`. The token data sits under a key called `claudeAiOauth`:
-
-```json
-{
-  "claudeAiOauth": {
-    "accessToken": "<bearer-token>",
-    "refreshToken": "<refresh-token>",
-    "expiresAt": 1760000000000,
-    "scopes": [
-      "user:profile",
-      "user:inference",
-      "user:sessions:claude_code",
-      "user:mcp_servers"
-    ],
-    "subscriptionType": "max",
-    "rateLimitTier": "default_claude_max_20x"
-  }
-}
-```
-
-Note that `expiresAt` is stored as a Unix timestamp in milliseconds. Comparing it against `Date.now()` in JavaScript or `time.time() * 1000` in Python tells you whether the token is still valid.
-
-A second file, `~/.claude.json`, holds account-level metadata: the account UUID, email address, organization UUID, organization name, and billing type. This file is used by Claude Code to display status information and to set context for API requests, but it does not contain the actual bearer tokens.
-
-On macOS, there is an additional storage layer. Claude Code may store credentials in the system keychain under a service name like `Claude Code-credentials`. When reading credentials programmatically, the keychain entry can be fresher than what is on disk, especially if a recent re-login updated the keychain but the file write was interrupted or delayed. On Linux, the file is generally the authoritative source.
-
-## Using the Tokens
-
-With a valid access token stored locally, every subsequent request to Anthropic's API includes it as a bearer token in the Authorization header:
-
-```bash
-curl -H "Authorization: Bearer <access-token>" \
-  -H "Content-Type: application/json" \
-  https://api.anthropic.com/v1/messages \
-  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,"messages":[{"role":"user","content":"hello"}]}'
-```
-
-The API server validates the token, checks its scopes and expiration, and either processes the request or returns a 401 if something is wrong.
-
-## Diagnostic Endpoints
-
-Two diagnostic endpoints are worth knowing about when you are troubleshooting. The profile endpoint tells you which account and organization the token resolves to:
-
-```bash
-curl -H "Authorization: Bearer <access-token>" \
-  -H "Content-Type: application/json" \
-  https://api.anthropic.com/api/oauth/profile
-```
-
-The CLI roles endpoint reveals what permissions and rate-limit tier the token carries, though it requires a beta header:
-
-```bash
-curl -H "Authorization: Bearer <access-token>" \
-  -H "anthropic-beta: oauth-2025-04-20" \
-  -H "Content-Type: application/json" \
-  https://api.anthropic.com/api/oauth/claude_cli/roles
-```
-
-These are invaluable when you can see that authentication is succeeding but the behavior is not what you expect—maybe you are hitting an unexpected rate limit, or the token is resolving to a different organization than intended.
-
-## Token Refresh
-
-Access tokens expire. The `expires_in` value from the original token response tells you the window, and once that window closes, any request using the old access token will fail. The refresh token exists so you do not have to send the user through the browser flow every time this happens.
-
-The refresh request is simpler than the initial token exchange:
-
-```json
-{
-  "grant_type": "refresh_token",
-  "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
-  "refresh_token": "<refresh-token>"
-}
-```
-
-This goes to the same token endpoint, `https://platform.claude.com/v1/oauth/token`, and the response has the same shape as the original token response: a new access token, a new expiration, and potentially a new refresh token as well.
-
-That last point is important. When the server issues a new refresh token alongside the new access token, the old refresh token is typically invalidated. This is called refresh token rotation, and it is a security measure. If an attacker somehow captured an old refresh token, it would already be dead by the time they tried to use it. But it also means that any system holding a copy of the old refresh token is now holding a useless string.
-
-## The Refresh Token Problem
-
-This is where things get interesting in practice. The initial login is rarely the problem. The problems come later, when multiple processes or tools share the same identity and tokens start rotating out from under each other.
-
-Consider this scenario. You log in with `claude auth login`. Refresh token A is stored in `~/.claude/.credentials.json`. Some time later, maybe an hour, maybe a day, the access token expires. Claude Code transparently refreshes it, receiving a new access token and a new refresh token B. Refresh token A is now dead.
-
-But what if another process, maybe a long-running automation script, read the credentials file earlier and cached refresh token A in memory? When that process tries to refresh, it sends the revoked token A to Anthropic's token endpoint and gets back an error:
-
-```json
-{
-  "error": "invalid_grant",
-  "error_description": "Refresh token not found or invalid"
-}
-```
-
-The same thing happens if you log in a second time from a different terminal session, or if you log in from a different machine using the same Anthropic account. Each new login can rotate the refresh token, killing whatever was stored before.
-
-This is not a bug in OAuth. It is the intended security behavior. But it creates a real operational challenge for any system that caches credentials.
-
-## Headless Authentication
-
-The entire PKCE flow described above assumes the CLI can open a browser and listen on a localhost port for the callback. On a normal desktop, that works. On a headless server, an SSH session, or a remote container, it does not.
-
-There are a few workarounds. One approach is to use a manual PKCE flow. The idea is to separate the steps that need a browser from the steps that need a terminal. You generate the PKCE verifier and challenge on the headless machine, construct the authorize URL, copy that URL to a machine that does have a browser, complete the login there, and then paste the resulting authorization code back into the headless machine's terminal. The headless machine already has the verifier, so it can complete the token exchange.
-
-When using the manual approach with Claude's OAuth, the redirect URI is typically set to `https://platform.claude.com/oauth/code/callback`, which displays the authorization code on a web page instead of redirecting to localhost. You then copy the code and paste it back.
-
-A minimal Python implementation of the verifier and challenge generation looks like this:
-
-```python
-import hashlib, base64, secrets
-
-def generate_pkce():
-    verifier = secrets.token_urlsafe(32)
-    challenge = base64.urlsafe_b64encode(
-        hashlib.sha256(verifier.encode("ascii")).digest()
-    ).rstrip(b"=").decode("ascii")
-    return verifier, challenge
-
-verifier, challenge = generate_pkce()
-state = secrets.token_urlsafe(16)
-```
-
-You would then assemble the authorize URL with these values, open it in any browser you have access to, complete the login, grab the code from the callback, and run the token exchange from the headless machine using curl or a script:
-
-```bash
-curl -X POST https://platform.claude.com/v1/oauth/token \
-  -H "Content-Type: application/json" \
-  -d '{
-    "grant_type": "authorization_code",
-    "code": "<paste-code-here>",
-    "redirect_uri": "https://platform.claude.com/oauth/code/callback",
-    "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
-    "code_verifier": "<your-verifier>",
-    "state": "<your-state>"
-  }'
-```
-
-If that returns a valid token set, you write it into `~/.claude/.credentials.json` in the shape described earlier, and Claude Code will pick it up.
-
-## Scopes
-
-The `scope` parameter in the authorize request determines what the resulting tokens are allowed to do. For Claude Code, the observed scopes include `user:profile`, `user:inference`, `user:sessions:claude_code`, and `user:mcp_servers`.
-
-The `user:profile` scope allows reading account information. The `user:inference` scope is what actually grants permission to send messages to Claude models. The `user:sessions:claude_code` scope ties the session specifically to Claude Code usage. The `user:mcp_servers` scope allows interaction with MCP (Model Context Protocol) server configurations associated with the account.
-
-If you request fewer scopes, you get a token that can do less. If you request scopes that the account or organization does not permit, the authorization server will either strip them silently or reject the request.
-
-## Token Lifetime
-
-Access tokens from Anthropic's OAuth flow are short-lived. The exact duration can vary, but a common value is around one hour. This is a deliberate design choice. Short-lived access tokens limit the damage if one is leaked: an attacker who steals an access token only has a narrow window to use it.
-
-Refresh tokens last longer, but they are not immortal. They can be revoked explicitly by the server, rotated during a refresh operation, or invalidated by a new login. In practice, a refresh token that is not used for a long period may also expire, though the exact policy is up to Anthropic's implementation.
-
-The `expiresAt` field stored in the credentials file is your best guide. Before making an API call, check whether the current time has passed that value. If it has, refresh first. A simple check in JavaScript:
-
-```javascript
-const creds = JSON.parse(fs.readFileSync(
-  path.join(os.homedir(), ".claude", ".credentials.json"),
-  "utf8"
-));
-const oauth = creds.claudeAiOauth;
-if (Date.now() >= oauth.expiresAt) {
-  // refresh needed
-}
-```
-
-## Security Properties
-
-A few security properties of this flow are worth calling out explicitly.
-
-The PKCE verifier never leaves the client machine during the authorize phase. Only the challenge, which is a one-way hash of the verifier, is sent to the server. An attacker who intercepts the authorize request sees the challenge but cannot derive the verifier from it. When the token exchange happens, the verifier is sent directly to the token endpoint over HTTPS, so it is protected by TLS.
-
-The state parameter protects against CSRF attacks. Without it, an attacker could craft a malicious authorize URL and trick a user into completing the login, then intercept the callback. With a random state value that the client checks, this attack fails because the attacker cannot predict the state.
-
-Refresh token rotation means that even if a refresh token leaks, it becomes useless after the next legitimate refresh operation. The tradeoff is the synchronization complexity described earlier, but the security benefit is substantial.
-
-The credentials file at `~/.claude/.credentials.json` should be treated like any other secret on disk. Its permissions should be restricted to the owning user. On a shared machine, anyone who can read that file can impersonate the authenticated user against Anthropic's API.
-
-## Debugging
-
-When authentication stops working, a methodical approach saves time. Start by checking whether Claude Code itself thinks it is logged in:
-
-```bash
-claude auth status --json
-```
-
-If that shows a valid session, the problem is probably not in the login flow itself. If it shows expired or missing credentials, check the file directly:
-
-```bash
-cat ~/.claude/.credentials.json | python3 -m json.tool
-```
-
-Look for the `claudeAiOauth` object. Is the `accessToken` present? Is `expiresAt` in the future? Is the `refreshToken` present and non-empty?
-
-On macOS, also check whether the keychain has a different (possibly fresher) credential:
-
-```bash
-security find-generic-password -s "Claude Code-credentials" -w 2>/dev/null
-```
-
-If the access token is expired but the refresh token looks valid, try a manual refresh:
-
-```bash
-curl -X POST https://platform.claude.com/v1/oauth/token \
-  -H "Content-Type: application/json" \
-  -d '{
-    "grant_type": "refresh_token",
-    "client_id": "9d1c250a-e61b-44d9-88ed-5944d1962f5e",
-    "refresh_token": "<your-refresh-token>"
-  }'
-```
-
-If that returns `invalid_grant`, the refresh token has been revoked. You need to log in again from scratch with `claude auth login` or the manual PKCE flow.
-
-If the refresh succeeds but API calls still fail, hit the profile endpoint to confirm the token resolves to the expected account and organization. A token that works but belongs to a different org than you expect is a surprisingly common source of confusion, especially on machines where multiple accounts have been used.
-
-## Conclusion
-
-This is, at its core, a well-trodden OAuth flow. The PKCE extension adds a small amount of complexity at the start, but it meaningfully raises the security bar for CLI-based authentication. The local storage model is straightforward once you know which files to look at. And the refresh mechanics, while they can create headaches when multiple consumers share a single identity, follow standard OAuth 2.0 conventions. When something goes wrong, the debugging path is almost always the same: check the files, check the expiration, try a manual refresh, and if all else fails, log in again.
--- a/CLAUDE_OAUTH2_SETUP.md
+++ b/CLAUDE_OAUTH2_SETUP.md
-# Claude OAuth2 Provider Setup Guide
-
-## ⚠️ IMPORTANT: Current Implementation Status
-
-**The Claude provider implementation is currently NOT WORKING as documented.**
-
-### The Problem
-
-The Anthropic API at `api.anthropic.com/v1/messages` **does NOT support OAuth2 Bearer token authentication**. When attempting to use OAuth2 tokens, the API returns:
-
-```json
-{
-  "type": "error",
-  "error": {
-    "type": "authentication_error",
-    "message": "OAuth authentication is currently not supported."
-  }
-}
-```
-
-### What Actually Works
-
-The working implementation in `vendors/opencode-claude-max-proxy` uses a **completely different approach**:
-
-1. **Uses Claude SDK** (`@anthropic-ai/claude-agent-sdk`) - NOT direct API calls
-2. **Authenticates via Claude CLI** (`claude auth status`) - NOT OAuth2 Bearer tokens
-3. **Session-based authentication** through Claude Code infrastructure
-4. **Acts as a proxy** that translates OpenAI-format requests to Claude SDK calls
-
-### Required Changes
-
-The current implementation in [`aisbf/providers.py`](aisbf/providers.py:2300) needs to be completely rewritten to:
- Use the Claude SDK instead of direct HTTP calls
- Authenticate using Claude CLI credentials (stored in `~/.claude/.credentials.json`)
- Proxy requests through the Claude Code infrastructure
- NOT use OAuth2 Bearer tokens against the public API
-
-## Overview (Original Documentation - OUTDATED)
-
-AISBF **attempts** to support Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh. This implementation matches the official Claude CLI authentication flow and includes a Chrome extension to handle OAuth2 callbacks when AISBF runs on a remote server.
-
-**Intended Features (NOT CURRENTLY WORKING):**
- Full OAuth2 PKCE flow matching official claude-cli
- Automatic token refresh with refresh token rotation
- Chrome extension for remote server OAuth2 callback interception
- Dashboard integration with authentication UI
- Optional curl_cffi TLS fingerprinting for Cloudflare bypass
- Compatible with official claude-cli credentials
-
-## Architecture (OUTDATED)
-
-### Components
-
-1. **ClaudeAuth Class** (`aisbf/claude_auth.py`)
-   - Handles OAuth2 PKCE flow ✅ (Works)
-   - Manages token storage and refresh ✅ (Works)
-   - Stores credentials in `~/.claude_credentials.json` by default ✅ (Works)
-
-2. **ClaudeProviderHandler** (`aisbf/providers.py`)
-   - ❌ **BROKEN**: Attempts to use OAuth2 Bearer tokens against `api.anthropic.com`
-   - ❌ **BROKEN**: API explicitly rejects OAuth authentication
-   - ❌ **NEEDS REWRITE**: Should use Claude SDK like the working proxy implementation
-
-3. **Chrome Extension** (`static/extension/`)
-   - Intercepts localhost OAuth2 callbacks (port 54545) ✅ (Works)
-   - Redirects callbacks to remote AISBF server ✅ (Works)
-   - Auto-configures with server URL ✅ (Works)
-
-4. **Dashboard Integration** (`templates/dashboard/providers.html`)
-   - Extension detection and installation prompt ✅ (Works)
-   - OAuth2 flow initiation ✅ (Works)
-   - Authentication status checking ✅ (Works)
-
-5. **Backend Endpoints** (`main.py`)
-   - `/dashboard/extension/download` - Download extension ZIP ✅ (Works)
-   - `/dashboard/oauth2/callback` - Receive OAuth2 callbacks ✅ (Works)
-   - `/dashboard/claude/auth/start` - Start OAuth2 flow ✅ (Works)
-   - `/dashboard/claude/auth/complete` - Complete token exchange ✅ (Works)
-   - `/dashboard/claude/auth/status` - Check authentication status ✅ (Works)
-
-**Summary**: OAuth2 authentication flow works perfectly. The problem is that the obtained tokens **cannot be used** to call the Anthropic API because the API doesn't support OAuth2 Bearer authentication.
-
-## Why This Doesn't Work
-
-The fundamental issue is an **architectural mismatch**:
-
-### What We Implemented (WRONG)
-```python
-# aisbf/providers.py - ClaudeProviderHandler
-headers = {
-    'Authorization': f'Bearer {access_token}',  # ❌ API rejects this
-    'anthropic-version': '2023-06-01',
-    'anthropic-beta': 'claude-code-20250219',
-    'Content-Type': 'application/json'
-}
-
-response = await self.client.post(
-    'https://api.anthropic.com/v1/messages',  # ❌ This endpoint doesn't support OAuth2
-    headers=headers,
-    json=request_payload
-)
-```
-
-### What Actually Works (vendors/opencode-claude-max-proxy)
-```typescript
-// Uses Claude SDK, NOT direct API calls
-import { query } from "@anthropic-ai/claude-agent-sdk"
-
-// Authenticates via Claude CLI credentials
-const { stdout } = await exec("claude auth status", { timeout: 5000 })
-const auth = JSON.parse(stdout)
-
-// Makes SDK calls (session-based, NOT Bearer tokens)
-for await (const event of query({
-    prompt: makePrompt(),
-    model,
-    workingDirectory,
-    // ... SDK-specific options
-})) {
-    // Process SDK events
-}
-```
-
-## What Needs to Be Fixed
-
-To make the Claude provider work, the implementation needs to:
-
-1. **Install Claude SDK** as a dependency (Node.js package)
-2. **Use Claude CLI credentials** from `~/.claude/.credentials.json`
-3. **Call Claude SDK** instead of making direct HTTP requests
-4. **Proxy SDK responses** back to OpenAI format
-5. **Remove OAuth2 Bearer token usage** from API calls
-
-This is a **major architectural change** that requires:
- Adding Node.js/TypeScript dependencies
- Rewriting ClaudeProviderHandler to use the SDK
- Potentially running a separate Node.js proxy process
- Or using the existing `opencode-claude-max-proxy` as a subprocess
-
-## Setup Instructions (OUTDATED - DO NOT FOLLOW)
-
-⚠️ **WARNING**: The following instructions will allow you to authenticate via OAuth2, but the resulting tokens **will not work** for API calls. The provider will fail with "OAuth authentication is currently not supported."
-
-### 1. Add Claude Provider to Configuration (OUTDATED)
-
-Edit `~/.aisbf/providers.json` or use the dashboard:
-
-```json
-{
-  "providers": {
-    "claude": {
-      "id": "claude",
-      "name": "Claude Code (OAuth2)",
-      "endpoint": "https://api.anthropic.com/v1",
-      "type": "claude",
-      "api_key_required": false,
-      "rate_limit": 0,
-      "claude_config": {
-        "credentials_file": "~/.claude_credentials.json"
-      },
-      "models": [
-        {
-          "name": "claude-3-7-sonnet-20250219",
-          "context_size": 200000,
-          "rate_limit": 0
-        }
-      ]
-    }
-  }
-}
-```
-
-⚠️ **This configuration will NOT work** because the endpoint `https://api.anthropic.com/v1` does not support OAuth2 Bearer authentication.
-
-### 2. Install Chrome Extension (For Remote Servers) (STILL WORKS)
-
-If AISBF runs on a remote server (not localhost), you need the OAuth2 redirect extension:
-
-1. **Download Extension**:
-   - Go to AISBF Dashboard → Providers
-   - Expand the Claude provider
-   - Click "Authenticate with Claude"
-   - If extension is not detected, click "Download Extension"
-
-2. **Install in Chrome**:
-   - Extract the downloaded ZIP file
-   - Open Chrome and go to `chrome://extensions/`
-   - Enable "Developer mode" (toggle in top-right)
-   - Click "Load unpacked"
-   - Select the extracted extension folder
-
-3. **Verify Installation**:
-   - Extension icon should appear in toolbar
-   - Click "Check Status" in dashboard to verify
-
-✅ **This part works correctly** - the extension successfully intercepts OAuth2 callbacks.
-
-### 3. Authenticate with Claude (WORKS BUT TOKENS ARE UNUSABLE)
-
-1. Go to AISBF Dashboard → Providers
-2. Expand the Claude provider
-3. Click "🔐 Authenticate with Claude"
-4. A browser window will open to claude.ai
-5. Log in with your Claude account
-6. Authorize the application
-7. The window will close automatically
-8. Dashboard will show "✓ Authentication successful!"
-
-✅ **OAuth2 flow works perfectly** - you will successfully obtain access and refresh tokens.
-
-❌ **BUT**: These tokens cannot be used to call the Anthropic API because the API doesn't support OAuth2 Bearer authentication.
-
-### 4. Use Claude Provider (DOES NOT WORK)
-
-Once authenticated, attempting to use Claude models via the API will fail:
-
-```bash
-curl -X POST http://your-server:17765/api/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer YOUR_TOKEN" \
-  -d '{
-    "model": "claude/claude-3-7-sonnet-20250219",
-    "messages": [
-      {"role": "user", "content": "Hello, Claude!"}
-    ]
-  }'
-```
-
-**Result**: 401 Unauthorized with error message:
-```json
-{
-  "type": "error",
-  "error": {
-    "type": "authentication_error",
-    "message": "OAuth authentication is currently not supported."
-  }
-}
-```
-
-## How It Works (OAuth2 Flow - This Part Works)
-
-### OAuth2 Flow (FUNCTIONAL)
-
-1. **Initiation**:
-   - User clicks "Authenticate" in dashboard ✅
-   - Dashboard calls `/dashboard/claude/auth/start` ✅
-   - Server generates PKCE challenge and returns OAuth2 URL ✅
-   - Dashboard opens URL in new window ✅
-
-2. **Authorization**:
-   - User logs in to claude.ai ✅
-   - Claude redirects to `http://localhost:54545/callback?code=...` ✅
-
-3. **Callback Interception** (Remote Server):
-   - Chrome extension intercepts localhost callback ✅
-   - Extension redirects to `https://your-server/dashboard/oauth2/callback?code=...` ✅
-   - Server stores code in session ✅
-
-4. **Token Exchange**:
-   - Dashboard detects window closed ✅
-   - Calls `/dashboard/claude/auth/complete` ✅
-   - Server exchanges code for access/refresh tokens ✅
-   - Tokens saved to credentials file ✅
-
-5. **API Usage** (❌ THIS IS WHERE IT FAILS):
-   - ClaudeProviderHandler loads tokens from file ✅
-   - Automatically refreshes expired tokens ✅
-   - Injects Bearer token in API requests ✅
-   - **API rejects OAuth2 Bearer tokens** ❌
-   - **Returns "OAuth authentication is currently not supported"** ❌
-
-### Extension Configuration (WORKS CORRECTLY)
-
-The extension automatically configures itself with your AISBF server URL. It intercepts requests to:
- `http://localhost:54545/*`
- `http://127.0.0.1:54545/*`
-
-And redirects them to:
- `https://your-server/dashboard/oauth2/callback?...`
-
-✅ **This works perfectly** - the extension successfully handles OAuth2 callback redirection.
-
-## Troubleshooting
-
-### The Real Problem: API Doesn't Support OAuth2
-
-**Problem**: All API requests fail with 401 Unauthorized:
-```json
-{
-  "type": "error",
-  "error": {
-    "type": "authentication_error",
-    "message": "OAuth authentication is currently not supported."
-  }
-}
-```
-
-**Root Cause**: The Anthropic API at `api.anthropic.com/v1/messages` does **NOT** support OAuth2 Bearer token authentication. This is a fundamental architectural issue, not a configuration problem.
-
-**Solution**: The implementation needs to be completely rewritten to use the Claude SDK (like `vendors/opencode-claude-max-proxy`) instead of direct API calls.
-
-### Extension Not Detected (STILL RELEVANT)
-
-**Problem**: Dashboard shows "OAuth2 Redirect Extension Required"
-
-**Solution**:
-1. Verify extension is installed in Chrome ✅
-2. Check extension is enabled in `chrome://extensions/` ✅
-3. Refresh the dashboard page ✅
-4. Try clicking "Check Status" button ✅
-
-✅ **This troubleshooting is still valid** - extension detection works correctly.
-
-### Authentication Timeout (STILL RELEVANT)
-
-**Problem**: "Authentication timeout. Please try again."
-
-**Solution**:
-1. Ensure extension is installed and enabled ✅
-2. Check browser console for errors ✅
-3. Verify server is accessible from browser ✅
-4. Try authentication again ✅
-
-✅ **This troubleshooting is still valid** - OAuth2 flow works correctly.
-
-### Token Expired (MISLEADING - TOKENS DON'T WORK AT ALL)
-
-**Problem**: API requests fail with 401 Unauthorized
-
-**Original Solution** (WRONG):
-1. Click "Check Status" in dashboard
-2. If expired, click "Authenticate with Claude" again
-3. Tokens are automatically refreshed on API calls
-
-**Actual Problem**: The API doesn't support OAuth2 Bearer tokens at all. Token expiration is irrelevant because even fresh tokens are rejected with "OAuth authentication is currently not supported."
-
-### Credentials File Not Found (STILL RELEVANT)
-
-**Problem**: "Provider 'claude' credentials not available"
-
-**Solution**:
-1. Check credentials file path in provider config ✅
-2. Ensure file exists: `ls -la ~/.claude_credentials.json` ✅
-3. Re-authenticate if file is missing or corrupted ✅
-
-✅ **This troubleshooting is still valid** - credentials file management works correctly.
-
-## Security Considerations (STILL VALID)
-
-1. **Credentials Storage**:
-   - Tokens stored in `~/.claude_credentials.json` ✅
-   - File should have restricted permissions (600) ✅
-   - Contains access_token, refresh_token, and expiry ✅
-
-2. **Extension Permissions**:
-   - Extension only intercepts localhost:54545 ✅
-   - Does not access or store any data ✅
-   - Only redirects OAuth2 callbacks ✅
-
-3. **Token Refresh**:
-   - Access tokens expire after ~1 hour ✅
-   - Automatically refreshed using refresh_token ✅
-   - Refresh tokens are long-lived ✅
-
-✅ **All security considerations are still valid** - the OAuth2 implementation is secure.
-
-## API Compatibility (INCORRECT - NOTHING WORKS)
-
-The Claude provider **claims** to support:
- ❌ Chat completions (`/v1/chat/completions`) - **FAILS: OAuth not supported**
- ❌ Streaming responses - **FAILS: OAuth not supported**
- ❌ System messages - **FAILS: OAuth not supported**
- ❌ Multi-turn conversations - **FAILS: OAuth not supported**
- ❌ Tool/function calling - **FAILS: OAuth not supported**
- ❌ Vision (image inputs) - **FAILS: OAuth not supported**
- ❌ Audio transcription - **Not supported by Claude API**
- ❌ Text-to-speech - **Not supported by Claude API**
- ❌ Image generation - **Not supported by Claude API**
-
-**Reality**: Nothing works because the API rejects OAuth2 Bearer tokens.
-
-## Required Headers (CORRECT BUT INEFFECTIVE)
-
-When using Claude provider, the following headers are automatically added:
-
-```
-Authorization: Bearer <access_token>  # ❌ API rejects this
-anthropic-version: 2023-06-01
-anthropic-beta: claude-code-20250219
-Content-Type: application/json
-```
-
-✅ **Headers are correctly formatted** - the implementation properly constructs the headers.
-
-❌ **But the API rejects them** - the `Authorization: Bearer` header causes the API to return "OAuth authentication is currently not supported."
-
-## Example Configuration (WILL NOT WORK)
-
-Complete provider configuration with multiple models:
-
-```json
-{
-  "providers": {
-    "claude": {
-      "id": "claude",
-      "name": "Claude Code",
-      "endpoint": "https://api.anthropic.com/v1",
-      "type": "claude",
-      "api_key_required": false,
-      "rate_limit": 0,
-      "default_rate_limit_TPM": 40000,
-      "default_rate_limit_TPH": 400000,
-      "default_context_size": 200000,
-      "claude_config": {
-        "credentials_file": "~/.claude_credentials.json"
-      },
-      "models": [
-        {
-          "name": "claude-3-7-sonnet-20250219",
-          "context_size": 200000,
-          "rate_limit": 0,
-          "rate_limit_TPM": 40000,
-          "rate_limit_TPH": 400000
-        },
-        {
-          "name": "claude-3-5-sonnet-20241022",
-          "context_size": 200000,
-          "rate_limit": 0,
-          "rate_limit_TPM": 40000,
-          "rate_limit_TPH": 400000
-        }
-      ]
-    }
-  }
-}
-```
-
-⚠️ **This configuration is syntactically correct but functionally broken** - all API calls will fail with "OAuth authentication is currently not supported."
-
-## Files Modified/Created (ACCURATE)
-
-### New Files
- `aisbf/claude_auth.py` - OAuth2 authentication handler ✅ (Works correctly)
- `static/extension/manifest.json` - Extension manifest ✅ (Works correctly)
- `static/extension/background.js` - Extension service worker ✅ (Works correctly)
- `static/extension/popup.html` - Extension popup UI ✅ (Works correctly)
- `static/extension/popup.js` - Popup logic ✅ (Works correctly)
- `static/extension/options.html` - Extension options page ✅ (Works correctly)
- `static/extension/options.js` - Options logic ✅ (Works correctly)
- `static/extension/icons/*.svg` - Extension icons ✅ (Works correctly)
- `static/extension/README.md` - Extension documentation ✅ (Works correctly)
- `CLAUDE_OAUTH2_SETUP.md` - This guide ⚠️ (Now updated with reality)
-
-### Modified Files
- `aisbf/providers.py` - Added ClaudeProviderHandler ❌ (Broken - uses wrong auth method)
- `aisbf/config.py` - Added claude provider type support ✅ (Works correctly)
- `main.py` - Added OAuth2 endpoints ✅ (Works correctly)
- `templates/dashboard/providers.html` - Added OAuth2 UI ✅ (Works correctly)
- `templates/dashboard/user_providers.html` - Added OAuth2 UI ✅ (Works correctly)
- `config/providers.json` - Added example configuration ⚠️ (Config is correct but won't work)
- `AI.PROMPT` - Added Claude provider documentation ⚠️ (Needs updating)
-
-## Summary: What Works and What Doesn't
-
-### ✅ What Works Perfectly
-
-1. **OAuth2 Authentication Flow**
-   - PKCE challenge generation
-   - Authorization URL creation
-   - Chrome extension callback interception
-   - Token exchange
-   - Token storage and refresh
-   - Dashboard UI integration
-
-2. **Infrastructure**
-   - Chrome extension (fully functional)
-   - Backend OAuth2 endpoints (fully functional)
-   - Credentials file management (fully functional)
-   - Token refresh mechanism (fully functional)
-
-### ❌ What Doesn't Work At All
-
-1. **API Calls**
-   - All requests to `api.anthropic.com/v1/messages` fail
-   - API explicitly rejects OAuth2 Bearer tokens
-   - Error: "OAuth authentication is currently not supported"
-   - No workaround available with current architecture
-
-2. **ClaudeProviderHandler**
-   - Correctly formats requests
-   - Correctly adds headers
-   - But uses wrong authentication method
-   - Needs complete rewrite to use Claude SDK
-
-### 🔧 What Needs to Be Fixed
-
-To make the Claude provider actually work, the implementation needs to:
-
-1. **Use Claude SDK** (`@anthropic-ai/claude-agent-sdk`)
-   - Install as Node.js dependency
-   - Call SDK methods instead of HTTP API
-   - Handle SDK event stream format
-
-2. **Use Claude CLI Credentials**
-   - Read from `~/.claude/.credentials.json` (not `~/.claude_credentials.json`)
-   - Use session-based authentication
-   - Not OAuth2 Bearer tokens
-
-3. **Implement Proxy Architecture**
-   - Run Node.js subprocess with Claude SDK
-   - Translate OpenAI format → Claude SDK format
-   - Translate Claude SDK events → OpenAI format
-   - Or use existing `opencode-claude-max-proxy` as subprocess
-
-4. **Update Documentation**
-   - Clarify this is a proxy to Claude Code, not direct API
-   - Document Claude CLI requirement
-   - Explain session-based authentication
-   - Remove misleading OAuth2 Bearer token claims
-
-## Support (UPDATED)
-
-For issues or questions:
-1. **OAuth2 Flow Issues**: Check the troubleshooting section above - OAuth2 works correctly
-2. **API Call Failures**: This is expected - the API doesn't support OAuth2 Bearer tokens
-3. **Extension Issues**: Review extension console logs - extension works correctly
-4. **Server Issues**: Check AISBF server logs - backend endpoints work correctly
-5. **Implementation Issues**: See "What Needs to Be Fixed" section above
-
-**Known Issue**: The Claude provider implementation is fundamentally broken because it attempts to use OAuth2 Bearer tokens against an API that doesn't support them. This requires a complete architectural rewrite to use the Claude SDK instead of direct API calls.
-
-## References
-
- Claude API Documentation: https://docs.anthropic.com/
- OAuth2 PKCE Flow: https://oauth.net/2/pkce/
- Chrome Extension Development: https://developer.chrome.com/docs/extensions/
- **Working Implementation**: See `vendors/opencode-claude-max-proxy` for a functional Claude Code proxy using the Claude SDK
- **Claude SDK**: `@anthropic-ai/claude-agent-sdk` (Node.js package required for working implementation)
-
-## Conclusion
-
-This documentation has been updated to reflect the **actual state** of the Claude provider implementation:
-
- ✅ **OAuth2 authentication works perfectly** - you can successfully obtain tokens
- ❌ **API calls don't work at all** - the API rejects OAuth2 Bearer tokens
- 🔧 **Major rewrite required** - needs to use Claude SDK instead of direct API calls
-
-The implementation in [`aisbf/providers.py`](aisbf/providers.py:2300) (ClaudeProviderHandler) needs to be completely rewritten to match the working implementation in [`vendors/opencode-claude-max-proxy`](vendors/opencode-claude-max-proxy/src/proxy/server.ts:1), which uses the Claude SDK with session-based authentication instead of OAuth2 Bearer tokens against the public API.
-
-**DO NOT attempt to use this provider** until the implementation is fixed. All API calls will fail with "OAuth authentication is currently not supported."
--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -630,6 +630,30 @@ User tokens authenticate MCP requests, with admin users getting full access and

 AISBF supports the following AI providers:

+### Model Metadata Extraction
+
+AISBF automatically extracts and tracks model metadata from provider responses:
+
+**Automatic Extraction:**
+- **Pricing Information**: `rate_multiplier`, `rate_unit` (e.g., "per million tokens")
+- **Token Usage**: `prompt_tokens`, `completion_tokens` from API responses
+- **Rate Limits**: Auto-configures rate limits from 429 responses with retry-after headers
+- **Model Details**: `description`, `context_length`, `architecture`, `supported_parameters`
+
+**Dashboard Features:**
+- **"Get Models" Button**: Fetches and displays comprehensive model metadata
+- **Real-time Display**: Shows pricing, rate limits, and capabilities for each model
+- **Extended Fields**: OpenRouter-style metadata including top_provider, pricing details, and architecture
+
+**Configuration:**
+Model metadata is automatically extracted from provider responses and stored in the database. No manual configuration required.
+
+**Benefits:**
+- Automatic rate limit configuration from provider responses
+- Cost estimation based on actual pricing data
+- Better model selection with detailed capability information
+- Reduced manual configuration overhead
+
 ### Google
 - Uses google-genai SDK
 - Requires API key
@@ -1179,6 +1203,58 @@ In this example:
 ```

 ### Rate Limiting
+
+#### Adaptive Rate Limiting
+
+AISBF includes intelligent rate limit management that learns from provider 429 responses and automatically adjusts request rates:
+
+**Features:**
+- **Learning from 429 Responses**: Automatically detects rate limits from provider error responses
+- **Exponential Backoff with Jitter**: Configurable backoff strategy to avoid thundering herd
+- **Rate Limit Headroom**: Stays 10% below learned limits to prevent hitting rate limits
+- **Gradual Recovery**: Slowly increases rate after consecutive successful requests
+- **Per-Provider Tracking**: Independent rate limiters for each provider
+- **Dashboard Monitoring**: Real-time view of current limits, 429 counts, success rates, and recovery progress
+
+**Configuration:**
+
+Via Dashboard:
+1. Navigate to Dashboard → Rate Limits
+2. View current rate limits and statistics for each provider
+3. Reset individual provider limits or reset all
+4. Monitor 429 response patterns and success rates
+
+Via Configuration File (`~/.aisbf/aisbf.json`):
+```json
+{
+  "adaptive_rate_limiting": {
+    "enabled": true,
+    "learning_rate": 0.1,
+    "headroom_percent": 10,
+    "recovery_rate": 0.05,
+    "base_backoff": 1.0,
+    "jitter_factor": 0.1,
+    "history_window": 100
+  }
+}
+```
+
+**Configuration Fields:**
+- `enabled`: Enable adaptive rate limiting (default: true)
+- `learning_rate`: How quickly to adjust limits (0.0-1.0, default: 0.1)
+- `headroom_percent`: Safety margin below learned limit (default: 10%)
+- `recovery_rate`: Rate of limit increase after successes (default: 0.05)
+- `base_backoff`: Base backoff time in seconds (default: 1.0)
+- `jitter_factor`: Random jitter to prevent synchronized retries (default: 0.1)
+- `history_window`: Number of recent requests to track (default: 100)
+
+**Benefits:**
+- Automatic optimization without manual rate limit configuration
+- Reduced 429 errors by learning optimal request rates
+- Better resource utilization by maximizing throughput while respecting limits
+- Provider-specific tracking for independent rate limit management
+
+#### Traditional Rate Limiting
 - Automatic provider disabling when rate limited
 - Intelligent parsing of 429 responses to determine wait time
 - Graceful error handling

--- a/README.md
+++ b/README.md
@@ -8,14 +8,17 @@ A modular proxy server for managing multiple AI provider integrations with unifi

 AISBF includes a comprehensive web-based dashboard for easy configuration and management:

- **Provider Management**: Configure API keys, endpoints, and model settings
+- **Provider Management**: Configure API keys, endpoints, and model settings with automatic metadata extraction
 - **Rotation Configuration**: Set up weighted load balancing across providers
 - **Autoselect Configuration**: Configure AI-powered model selection
 - **Server Settings**: Manage SSL/TLS, authentication, and TOR hidden service
 - **User Management**: Create/manage users with role-based access control (admin users only)
 - **Multi-User Support**: Isolated configurations per user with API token management
 - **Real-time Monitoring**: View provider status and configuration
- **Token Usage Analytics**: Track token usage, costs, and performance with charts and export functionality
+- **Token Usage Analytics**: Track token usage, costs, and performance with charts, filtering by provider/model/rotation, and export functionality
+- **Rate Limits Dashboard**: Monitor adaptive rate limiting with real-time statistics, 429 counts, success rates, and recovery progress
+- **Model Metadata Display**: View detailed model information including pricing, rate limits, and supported parameters
+- **Cache Management**: View cache statistics and clear cache via dashboard endpoints

 Access the dashboard at `http://localhost:17765/dashboard` (default credentials: admin/admin)

@@ -29,7 +32,7 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
 - **Content Classification**: NSFW/privacy content filtering with configurable classification windows
 - **Streaming Support**: Full support for streaming responses from all providers
 - **Error Tracking**: Automatic provider disabling after consecutive failures with configurable cooldown periods
- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff and gradual recovery
+- **Adaptive Rate Limiting**: Intelligent rate limit management that learns from 429 responses with exponential backoff, gradual recovery, and dashboard monitoring
 - **Rate Limiting**: Built-in rate limiting and graceful error handling
 - **Request Splitting**: Automatic splitting of large requests when exceeding `max_request_tokens` limit
 - **Token Rate Limiting**: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
@@ -48,14 +51,15 @@ Access the dashboard at `http://localhost:17765/dashboard` (default credentials:
  - Dashboard endpoints for cache management
 - **Smart Request Batching**: 15-25% latency reduction by batching similar requests within 100ms window with provider-specific configurations
 - **Streaming Response Optimization**: 10-20% memory reduction with chunk pooling, backpressure handling, and provider-specific streaming optimizations for Google and Kiro providers
- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, and export functionality
+- **Token Usage Analytics**: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, filtering by provider/model/rotation, and export functionality
+- **Model Metadata Extraction**: Automatic extraction of pricing, rate limits, and model information from provider responses with dashboard display
 - **SSL/TLS Support**: Built-in HTTPS support with Let's Encrypt integration and automatic certificate renewal
 - **Self-Signed Certificates**: Automatic generation of self-signed certificates for development/testing
 - **TOR Hidden Service**: Full support for exposing AISBF over TOR network as a hidden service (ephemeral and persistent)
 - **MCP Server**: Model Context Protocol server for remote agent configuration and model access (SSE and HTTP streaming)
 - **Persistent Database**: SQLite/MySQL-based tracking of token usage, context dimensions, and model embeddings with automatic cleanup
 - **Multi-User Support**: User management with isolated configurations, role-based access control, and API token management
- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations
+- **User-Specific API Endpoints**: Dedicated API endpoints for authenticated users to access their own configurations with Bearer token authentication
 - **Database Integration**: SQLite/MySQL-based persistent storage for user configurations, token usage tracking, and context management
 - **User-Specific Configurations**: Each user can have their own providers, rotations, and autoselect configurations stored in the database
 - **Flexible Caching System**: Multi-backend caching for model embeddings and performance optimization
@@ -613,6 +617,57 @@ Users can create and manage their own:
 - **User Dashboard**: Personal configuration management and usage statistics
 - **API Token Management**: Create, view, and delete API tokens with usage analytics

+### Adaptive Rate Limiting
+
+AISBF includes intelligent rate limit management that learns from provider 429 responses and automatically adjusts request rates:
+
+#### Features
+- **Learning from 429 Responses**: Automatically detects rate limits from provider error responses
+- **Exponential Backoff with Jitter**: Configurable backoff strategy to avoid thundering herd
+- **Rate Limit Headroom**: Stays 10% below learned limits to prevent hitting rate limits
+- **Gradual Recovery**: Slowly increases rate after consecutive successful requests
+- **Per-Provider Tracking**: Independent rate limiters for each provider
+- **Dashboard Monitoring**: Real-time view of current limits, 429 counts, success rates, and recovery progress
+
+#### Configuration
+
+**Via Dashboard:**
+1. Navigate to Dashboard → Rate Limits
+2. View current rate limits and statistics for each provider
+3. Reset individual provider limits or reset all
+4. Monitor 429 response patterns and success rates
+
+**Via Configuration File:**
+Edit `~/.aisbf/aisbf.json`:
+```json
+{
+  "adaptive_rate_limiting": {
+    "enabled": true,
+    "learning_rate": 0.1,
+    "headroom_percent": 10,
+    "recovery_rate": 0.05,
+    "base_backoff": 1.0,
+    "jitter_factor": 0.1,
+    "history_window": 100
+  }
+}
+```
+
+#### Configuration Fields
+- **`enabled`**: Enable adaptive rate limiting (default: true)
+- **`learning_rate`**: How quickly to adjust limits (0.0-1.0, default: 0.1)
+- **`headroom_percent`**: Safety margin below learned limit (default: 10%)
+- **`recovery_rate`**: Rate of limit increase after successes (default: 0.05)
+- **`base_backoff`**: Base backoff time in seconds (default: 1.0)
+- **`jitter_factor`**: Random jitter to prevent synchronized retries (default: 0.1)
+- **`history_window`**: Number of recent requests to track (default: 100)
+
+#### Benefits
+- **Automatic Optimization**: No manual rate limit configuration needed
+- **Reduced 429 Errors**: Learns optimal request rates for each provider
+- **Better Resource Utilization**: Maximizes throughput while respecting limits
+- **Provider-Specific**: Each provider has independent rate limit tracking
+
 ### Content Classification and Semantic Selection

 AISBF provides advanced content filtering and intelligent model selection based on content analysis:
@@ -785,12 +840,23 @@ Authorization: Bearer YOUR_API_TOKEN
 | `POST /api/user/chat/completions` | Chat completions using user's own models |
 | `GET /api/user/{config_type}/models` | List models for specific config type (provider, rotation, autoselect) |

+#### Access Control
+
 **Admin Users** have access to both global and user configurations when using user API endpoints.

 **Regular Users** can only access their own configurations.

 **Global Tokens** (configured in aisbf.json) have full access to all configurations.

+#### Token Management
+
+Users can create and manage API tokens through the dashboard:
+1. Navigate to Dashboard → User Dashboard → API Tokens
+2. Click "Generate New Token" to create a token
+3. Copy the token immediately (it won't be shown again)
+4. Use the token in API requests via Bearer authentication
+5. View token usage statistics and delete tokens as needed
+
 #### Example: Using User API with cURL

 ```bash
@@ -802,8 +868,21 @@ curl -X POST -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-rotation/model", "messages": [{"role": "user", "content": "Hello"}]}' \
  http://localhost:17765/api/user/chat/completions
+
+# List user's providers
+curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/providers
+
+# List user's rotations
+curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/rotations
 ```

+#### MCP Integration
+
+User tokens also work with MCP (Model Context Protocol) endpoints:
+- Admin users get access to both global and user-specific MCP tools
+- Regular users get access to user-only MCP tools
+- Tools include model access, configuration management, and usage statistics
+
 ### MCP (Model Context Protocol)

 AISBF provides an MCP server for remote agent configuration and model access:

--- a/aisbf/__init__.py
+++ b/aisbf/__init__.py
@@ -46,7 +46,7 @@ from .providers import (
 from .handlers import RequestHandler, RotationHandler, AutoselectHandler
 from .utils import count_messages_tokens, split_messages_into_chunks, get_max_request_tokens_for_model

-__version__ = "0.3.3"
+__version__ = "0.9.2"
 __all__ = [
    # Config
    "config",

--- a/aisbf/claude_auth.py
+++ b/aisbf/claude_auth.py
@@ -77,7 +77,7 @@ def _generate_client_id():
 # Claude OAuth2 Configuration
 # These values match the official claude-cli implementation
 CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"  # Official Claude Code client ID
-AUTH_URL = "https://claude.ai/oauth/authorize"  # Authorization endpoint
+AUTH_URL = "https://claude.com/cai/oauth/authorize"  # Authorization endpoint (note: /cai path is required)
 TOKEN_URL = "https://api.anthropic.com/v1/oauth/token"  # Token exchange endpoint
 REDIRECT_URI = "http://localhost:54545/callback"  # OAuth2 callback URI
 CLI_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
@@ -141,6 +141,9 @@ class ClaudeAuth:
        """Save credentials to file with file locking to prevent race conditions."""
        try:
            self.tokens = data
+            # Store id_token if received (contains account info)
+            if 'id_token' in data:
+                self.tokens['id_token'] = data['id_token']
            # Add local expiry timestamp for easier checking
            self.tokens['expires_at'] = time.time() + data.get('expires_in', 3600)
            
@@ -281,14 +284,24 @@ class ClaudeAuth:
        logger.error(f"Token refresh failed after {max_retries} attempts")
        return False
    
-    def get_valid_token(self) -> str:
+    def get_valid_token(self, auto_login: bool = False) -> str:
        """
        Get a valid access token, refreshing it if necessary.
        
+        Args:
+            auto_login: If True, automatically trigger login flow when no credentials exist.
+                       If False, raise an exception instead (default: False for security).
+        
        Returns:
            Valid access token
+            
+        Raises:
+            Exception: If no credentials exist and auto_login is False
        """
        if not self.tokens:
+            if not auto_login:
+                logger.error("No Claude credentials available. Please authenticate via dashboard or MCP.")
+                raise Exception("Claude authentication required. Please authenticate via /dashboard/claude/auth/start or MCP tool.")
            logger.info("No tokens available, starting login flow")
            self.login()
        
@@ -296,10 +309,51 @@ class ClaudeAuth:
        if time.time() > (self.tokens.get('expires_at', 0) - 300):
            logger.info("Token expiring soon, refreshing...")
            if not self.refresh_token():
+                if not auto_login:
+                    logger.error("Token refresh failed and auto_login is disabled")
+                    raise Exception("Claude token refresh failed. Please re-authenticate via /dashboard/claude/auth/start or MCP tool.")
                logger.warning("Refresh failed, re-authenticating...")
                self.login()
        
        return self.tokens['access_token']
+
+    def get_account_id(self) -> Optional[str]:
+        """
+        Get account_id from OAuth2 credentials.
+        
+        Returns:
+            Account ID if available, None otherwise
+        """
+        if not self.tokens:
+            return None
+        
+        # First check for account.uuid in token response (Claude OAuth2 format)
+        account = self.tokens.get('account')
+        if account and isinstance(account, dict):
+            account_uuid = account.get('uuid')
+            if account_uuid:
+                return account_uuid
+        
+        # Then try to get from id_token (JWT claim)
+        id_token = self.tokens.get('id_token')
+        if id_token:
+            try:
+                import base64
+                import json
+                # Decode JWT payload (second part of JWT)
+                parts = id_token.split('.')
+                if len(parts) >= 2:
+                    # Add padding if needed
+                    payload = parts[1] + '=' * (4 - len(parts[1]) % 4)
+                    decoded = base64.urlsafe_b64decode(payload)
+                    claims = json.loads(decoded)
+                    # Try sub claim first, then account_id
+                    return claims.get('sub') or claims.get('account_id')
+            except Exception:
+                pass
+        
+        # Fall back to direct account_id field in token response
+        return self.tokens.get('account_id') or self.tokens.get('account_uuid')
    
    def login(self, use_local_server=True):
        """
@@ -333,7 +387,7 @@ class ClaudeAuth:
                "client_id": CLIENT_ID,
                "response_type": "code",
                "redirect_uri": REDIRECT_URI,
-                "scope": "org:create_api_key user:profile user:inference",
+                "scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
                "code_challenge": challenge,
                "code_challenge_method": "S256",
                "state": state
@@ -442,7 +496,7 @@ class ClaudeAuth:
            "client_id": CLIENT_ID,
            "response_type": "code",
            "redirect_uri": REDIRECT_URI,
-            "scope": "org:create_api_key user:profile user:inference",
+            "scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
            "code_challenge": challenge,
            "code_challenge_method": "S256",
            "state": state

--- a/aisbf/providers.py
+++ b/aisbf/providers.py
@@ -43,6 +43,482 @@ from .batching import get_request_batcher
 AISBF_DEBUG = os.environ.get('AISBF_DEBUG', '').lower() in ('true', '1', 'yes')


+class AnthropicFormatConverter:
+    """
+    Shared utility class for converting between OpenAI and Anthropic message formats.
+    Used by both AnthropicProviderHandler and ClaudeProviderHandler.
+    
+    All methods are static to allow usage without instantiation.
+    """
+    
+    # Anthropic stop_reason → OpenAI finish_reason mapping
+    STOP_REASON_MAP = {
+        'end_turn': 'stop',
+        'max_tokens': 'length',
+        'stop_sequence': 'stop',
+        'tool_use': 'tool_calls'
+    }
+    
+    @staticmethod
+    def sanitize_tool_call_id(tool_call_id: str) -> str:
+        """Sanitize tool call ID for Anthropic API (alphanumeric, underscore, hyphen only)."""
+        import re
+        return re.sub(r'[^a-zA-Z0-9_-]', '_', tool_call_id)
+    
+    @staticmethod
+    def filter_empty_content(content) -> Union[str, list, None]:
+        """Filter empty content from messages for Anthropic API compatibility."""
+        if content is None:
+            return None
+        if isinstance(content, str):
+            return None if content.strip() == "" else content
+        if isinstance(content, list):
+            filtered = []
+            for block in content:
+                if isinstance(block, dict):
+                    if block.get('type') == 'text':
+                        text = block.get('text', '')
+                        if text and text.strip():
+                            filtered.append(block)
+                    else:
+                        filtered.append(block)
+                else:
+                    filtered.append(block)
+            return filtered if filtered else None
+        return content
+    
+    @staticmethod
+    def extract_images_from_content(content) -> list:
+        """
+        Convert OpenAI image_url content blocks to Anthropic image source format.
+        
+        Handles:
+        - data:image/jpeg;base64,... → {"type": "image", "source": {"type": "base64", ...}}
+        - https://... → {"type": "image", "source": {"type": "url", ...}}
+        """
+        import logging
+        logger = logging.getLogger(__name__)
+        
+        if not isinstance(content, list):
+            return []
+        
+        images = []
+        max_image_size = 5 * 1024 * 1024  # 5MB
+        
+        for block in content:
+            if not isinstance(block, dict):
+                continue
+            if block.get('type') != 'image_url':
+                continue
+            
+            image_url_obj = block.get('image_url', {})
+            url = image_url_obj.get('url', '') if isinstance(image_url_obj, dict) else ''
+            if not url:
+                continue
+            
+            if url.startswith('data:'):
+                try:
+                    header, data = url.split(',', 1)
+                    media_type = header.split(';')[0].replace('data:', '')
+                    if len(data) > max_image_size:
+                        logger.warning(f"Image too large ({len(data)} bytes), skipping")
+                        continue
+                    images.append({
+                        'type': 'image',
+                        'source': {'type': 'base64', 'media_type': media_type, 'data': data}
+                    })
+                except (ValueError, IndexError) as e:
+                    logger.warning(f"Failed to parse data URL: {e}")
+            elif url.startswith(('http://', 'https://')):
+                images.append({
+                    'type': 'image',
+                    'source': {'type': 'url', 'url': url}
+                })
+            elif block.get('type') == 'image' and 'source' in block:
+                images.append(block)
+        
+        return images
+    
+    @staticmethod
+    def convert_messages_to_anthropic(messages: list, sanitize_ids: bool = True) -> tuple:
+        """
+        Convert OpenAI messages to Anthropic format.
+        
+        Handles:
+        - System message extraction (separate 'system' parameter)
+        - Tool role → user message with tool_result content blocks
+        - Assistant tool_calls → tool_use content blocks
+        - Multimodal content (images)
+        - Empty content filtering
+        
+        Args:
+            messages: OpenAI format messages
+            sanitize_ids: Whether to sanitize tool call IDs
+            
+        Returns:
+            Tuple of (system_message: str|None, anthropic_messages: list)
+        """
+        import logging
+        import json
+        
+        system_message = None
+        anthropic_messages = []
+        
+        for msg in messages:
+            role = msg.get('role')
+            content = msg.get('content')
+            
+            if role == 'system':
+                system_message = content
+                logging.info(f"AnthropicFormatConverter: Extracted system message ({len(content) if content else 0} chars)")
+            
+            elif role == 'tool':
+                tool_call_id = msg.get('tool_call_id', msg.get('name', 'unknown'))
+                tool_result_block = {
+                    'type': 'tool_result',
+                    'tool_use_id': tool_call_id,
+                    'content': content or ""
+                }
+                
+                if anthropic_messages and anthropic_messages[-1]['role'] == 'user':
+                    last_content = anthropic_messages[-1]['content']
+                    if isinstance(last_content, str):
+                        anthropic_messages[-1]['content'] = [
+                            {'type': 'text', 'text': last_content},
+                            tool_result_block
+                        ]
+                    elif isinstance(last_content, list):
+                        anthropic_messages[-1]['content'].append(tool_result_block)
+                else:
+                    anthropic_messages.append({
+                        'role': 'user',
+                        'content': [tool_result_block]
+                    })
+            
+            elif role == 'assistant':
+                tool_calls = msg.get('tool_calls')
+                
+                if tool_calls:
+                    content_blocks = []
+                    filtered = AnthropicFormatConverter.filter_empty_content(content)
+                    if filtered:
+                        if isinstance(filtered, str):
+                            content_blocks.append({'type': 'text', 'text': filtered})
+                        elif isinstance(filtered, list):
+                            content_blocks.extend(filtered)
+                    
+                    for tc in tool_calls:
+                        raw_id = tc.get('id', f"toolu_{len(content_blocks)}")
+                        tool_id = AnthropicFormatConverter.sanitize_tool_call_id(raw_id) if sanitize_ids else raw_id
+                        function = tc.get('function', {})
+                        arguments = function.get('arguments', {})
+                        if isinstance(arguments, str):
+                            try:
+                                arguments = json.loads(arguments)
+                            except json.JSONDecodeError:
+                                arguments = {}
+                        
+                        content_blocks.append({
+                            'type': 'tool_use',
+                            'id': tool_id,
+                            'name': function.get('name', ''),
+                            'input': arguments
+                        })
+                    
+                    if content_blocks:
+                        anthropic_messages.append({
+                            'role': 'assistant',
+                            'content': content_blocks
+                        })
+                else:
+                    filtered = AnthropicFormatConverter.filter_empty_content(content)
+                    if filtered is None:
+                        continue
+                    
+                    if isinstance(filtered, list):
+                        text_parts = []
+                        for block in filtered:
+                            if isinstance(block, dict):
+                                text_parts.append(block.get('text', ''))
+                            elif isinstance(block, str):
+                                text_parts.append(block)
+                        content_str = '\n'.join(text_parts)
+                    else:
+                        content_str = filtered or ""
+                    
+                    if content_str:
+                        anthropic_messages.append({
+                            'role': 'assistant',
+                            'content': content_str
+                        })
+            
+            elif role == 'user':
+                if isinstance(content, list):
+                    content_blocks = []
+                    images = AnthropicFormatConverter.extract_images_from_content(content)
+                    
+                    for block in content:
+                        if isinstance(block, dict):
+                            btype = block.get('type', '')
+                            if btype == 'text':
+                                content_blocks.append(block)
+                            elif btype not in ('image_url', 'image'):
+                                content_blocks.append(block)
+                        elif isinstance(block, str):
+                            content_blocks.append({'type': 'text', 'text': block})
+                    
+                    content_blocks.extend(images)
+                    anthropic_messages.append({
+                        'role': 'user',
+                        'content': content_blocks if content_blocks else content or ""
+                    })
+                else:
+                    anthropic_messages.append({
+                        'role': 'user',
+                        'content': content or ""
+                    })
+            
+            else:
+                logging.warning(f"AnthropicFormatConverter: Unknown role '{role}', treating as user")
+                anthropic_messages.append({
+                    'role': 'user',
+                    'content': content or ""
+                })
+        
+        logging.info(f"AnthropicFormatConverter: Converted {len(messages)} OpenAI → {len(anthropic_messages)} Anthropic messages")
+        return system_message, anthropic_messages
+    
+    @staticmethod
+    def convert_tools_to_anthropic(tools: list) -> Optional[list]:
+        """
+        Convert OpenAI tools to Anthropic format with schema normalization.
+        
+        Normalizes:
+        - ["string", "null"] → "string"
+        - Removes additionalProperties: false
+        - Cleans up required array for nullable fields
+        """
+        import logging
+        
+        if not tools:
+            return None
+        
+        def normalize_schema(schema):
+            if not isinstance(schema, dict):
+                return schema
+            result = {}
+            for key, value in schema.items():
+                if key == "type" and isinstance(value, list):
+                    non_null = [t for t in value if t != "null"]
+                    result[key] = non_null[0] if len(non_null) == 1 else (non_null if non_null else "string")
+                elif key == "properties" and isinstance(value, dict):
+                    result[key] = {k: normalize_schema(v) for k, v in value.items()}
+                elif key == "items" and isinstance(value, dict):
+                    result[key] = normalize_schema(value)
+                elif key == "additionalProperties" and value is False:
+                    continue
+                elif key == "required" and isinstance(value, list):
+                    props = schema.get("properties", {})
+                    cleaned = [f for f in value if f in props and not (isinstance(props.get(f, {}), dict) and isinstance(props[f].get("type"), list) and "null" in props[f]["type"])]
+                    if cleaned:
+                        result[key] = cleaned
+                else:
+                    result[key] = value
+            return result
+        
+        anthropic_tools = []
+        for tool in tools:
+            if tool.get("type") == "function":
+                function = tool.get("function", {})
+                anthropic_tools.append({
+                    "name": function.get("name", ""),
+                    "description": function.get("description", ""),
+                    "input_schema": normalize_schema(function.get("parameters", {}))
+                })
+                logging.info(f"AnthropicFormatConverter: Converted tool: {function.get('name')}")
+        
+        return anthropic_tools if anthropic_tools else None
+    
+    @staticmethod
+    def convert_tool_choice_to_anthropic(tool_choice) -> Optional[dict]:
+        """
+        Convert OpenAI tool_choice to Anthropic format.
+        
+        "auto" → {"type": "auto"}
+        "none" → None
+        "required" → {"type": "any"}
+        {"type": "function", "function": {"name": "X"}} → {"type": "tool", "name": "X"}
+        """
+        import logging
+        
+        if not tool_choice:
+            return None
+        
+        if isinstance(tool_choice, str):
+            if tool_choice == "auto":
+                return {"type": "auto"}
+            elif tool_choice == "none":
+                return None
+            elif tool_choice == "required":
+                return {"type": "any"}
+            else:
+                logging.warning(f"Unknown tool_choice: {tool_choice}")
+                return {"type": "auto"}
+        
+        if isinstance(tool_choice, dict):
+            if tool_choice.get("type") == "function":
+                name = tool_choice.get("function", {}).get("name")
+                return {"type": "tool", "name": name} if name else {"type": "auto"}
+            return tool_choice
+        
+        return {"type": "auto"}
+    
+    @staticmethod
+    def convert_anthropic_response_to_openai(response_data: dict, provider_id: str, model: str) -> dict:
+        """
+        Convert Anthropic API response (dict) to OpenAI chat completion format.
+        
+        Handles text blocks, tool_use blocks, thinking blocks, usage metadata, stop reasons.
+        """
+        import json
+        import logging
+        logger = logging.getLogger(__name__)
+        
+        content_text = ""
+        tool_calls = []
+        thinking_text = ""
+        
+        for block in response_data.get('content', []):
+            btype = block.get('type', '')
+            if btype == 'text':
+                content_text += block.get('text', '')
+            elif btype == 'tool_use':
+                tool_calls.append({
+                    'id': block.get('id', f"call_{len(tool_calls)}"),
+                    'type': 'function',
+                    'function': {
+                        'name': block.get('name', ''),
+                        'arguments': json.dumps(block.get('input', {}))
+                    }
+                })
+            elif btype == 'thinking':
+                thinking_text = block.get('thinking', '')
+            elif btype == 'redacted_thinking':
+                logger.debug("Found redacted_thinking block")
+        
+        stop_reason = response_data.get('stop_reason', 'end_turn')
+        finish_reason = AnthropicFormatConverter.STOP_REASON_MAP.get(stop_reason, 'stop')
+        
+        usage = response_data.get('usage', {})
+        input_tokens = usage.get('input_tokens', 0)
+        output_tokens = usage.get('output_tokens', 0)
+        cache_read = usage.get('cache_read_input_tokens', 0)
+        cache_creation = usage.get('cache_creation_input_tokens', 0)
+        
+        openai_response = {
+            'id': f"{provider_id}-{model}-{int(time.time())}",
+            'object': 'chat.completion',
+            'created': int(time.time()),
+            'model': f'{provider_id}/{model}',
+            'choices': [{
+                'index': 0,
+                'message': {
+                    'role': 'assistant',
+                    'content': content_text if content_text else None
+                },
+                'finish_reason': finish_reason
+            }],
+            'usage': {
+                'prompt_tokens': input_tokens,
+                'completion_tokens': output_tokens,
+                'total_tokens': input_tokens + output_tokens,
+                'prompt_tokens_details': {'cached_tokens': cache_read, 'audio_tokens': 0},
+                'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0}
+            }
+        }
+        
+        if tool_calls:
+            openai_response['choices'][0]['message']['tool_calls'] = tool_calls
+        
+        if thinking_text:
+            openai_response['choices'][0]['message']['provider_options'] = {
+                'anthropic': {'thinking': thinking_text}
+            }
+        
+        return openai_response
+    
+    @staticmethod
+    def convert_anthropic_sdk_response_to_openai(response, provider_id: str, model: str) -> dict:
+        """
+        Convert Anthropic SDK response object (with attributes) to OpenAI format.
+        """
+        import json
+        import logging
+        logger = logging.getLogger(__name__)
+        
+        content_text = ""
+        tool_calls = []
+        thinking_text = ""
+        
+        for block in getattr(response, 'content', []):
+            btype = getattr(block, 'type', '')
+            if btype == 'text' or hasattr(block, 'text'):
+                content_text += getattr(block, 'text', '')
+            elif btype == 'tool_use':
+                raw_input = getattr(block, 'input', {})
+                tool_calls.append({
+                    'id': getattr(block, 'id', f"call_{len(tool_calls)}"),
+                    'type': 'function',
+                    'function': {
+                        'name': getattr(block, 'name', ''),
+                        'arguments': json.dumps(raw_input) if isinstance(raw_input, dict) else str(raw_input)
+                    }
+                })
+            elif btype == 'thinking':
+                thinking_text = getattr(block, 'thinking', '')
+        
+        stop_reason = getattr(response, 'stop_reason', 'end_turn') or 'end_turn'
+        finish_reason = AnthropicFormatConverter.STOP_REASON_MAP.get(stop_reason, 'stop')
+        
+        usage_obj = getattr(response, 'usage', None)
+        input_tokens = getattr(usage_obj, 'input_tokens', 0) or 0 if usage_obj else 0
+        output_tokens = getattr(usage_obj, 'output_tokens', 0) or 0 if usage_obj else 0
+        cache_read = getattr(usage_obj, 'cache_read_input_tokens', 0) or 0 if usage_obj else 0
+        cache_creation = getattr(usage_obj, 'cache_creation_input_tokens', 0) or 0 if usage_obj else 0
+        
+        openai_response = {
+            'id': getattr(response, 'id', f"{provider_id}-{model}-{int(time.time())}"),
+            'object': 'chat.completion',
+            'created': int(time.time()),
+            'model': f'{provider_id}/{model}',
+            'choices': [{
+                'index': 0,
+                'message': {
+                    'role': 'assistant',
+                    'content': content_text if content_text else None
+                },
+                'finish_reason': finish_reason
+            }],
+            'usage': {
+                'prompt_tokens': input_tokens,
+                'completion_tokens': output_tokens,
+                'total_tokens': input_tokens + output_tokens,
+                'prompt_tokens_details': {'cached_tokens': cache_read, 'audio_tokens': 0},
+                'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0}
+            }
+        }
+        
+        if tool_calls:
+            openai_response['choices'][0]['message']['tool_calls'] = tool_calls
+        
+        if thinking_text:
+            openai_response['choices'][0]['message']['provider_options'] = {
+                'anthropic': {'thinking': thinking_text}
+            }
+        
+        return openai_response
+
+
 class AdaptiveRateLimiter:
    """
    Adaptive Rate Limiter that learns optimal rate limits from 429 responses.
@@ -2031,39 +2507,261 @@ class AnthropicProviderHandler(BaseProviderHandler):
            if enable_native_caching:
                logging.info(f"AnthropicProviderHandler: Min cacheable tokens: {min_cacheable_tokens}")

-            # Prepare messages with cache_control if enabled
+            # Convert OpenAI messages to Anthropic format
+            # Key differences:
+            # 1. System messages extracted to separate 'system' parameter
+            # 2. Tool role messages → user messages with tool_result content blocks
+            # 3. Assistant messages with tool_calls → tool_use content blocks
+            # 4. Images: OpenAI image_url → Anthropic image source format
+            system_message = None
            anthropic_messages = []
+            
+            for msg in messages:
+                role = msg.get('role')
+                content = msg.get('content')
+                
+                if role == 'system':
+                    # Extract system message (Anthropic uses separate 'system' parameter)
+                    system_message = content
+                    logging.info(f"AnthropicProviderHandler: Extracted system message ({len(content) if content else 0} chars)")
+                
+                elif role == 'tool':
+                    # Convert tool message to user message with tool_result content block
+                    tool_call_id = msg.get('tool_call_id', msg.get('name', 'unknown'))
+                    tool_result_block = {
+                        'type': 'tool_result',
+                        'tool_use_id': tool_call_id,
+                        'content': content or ""
+                    }
+                    
+                    # Merge into existing user message if last message is user
+                    if anthropic_messages and anthropic_messages[-1]['role'] == 'user':
+                        last_content = anthropic_messages[-1]['content']
+                        if isinstance(last_content, str):
+                            anthropic_messages[-1]['content'] = [
+                                {'type': 'text', 'text': last_content},
+                                tool_result_block
+                            ]
+                        elif isinstance(last_content, list):
+                            anthropic_messages[-1]['content'].append(tool_result_block)
+                        logging.info(f"AnthropicProviderHandler: Appended tool_result to existing user message")
+                    else:
+                        anthropic_messages.append({
+                            'role': 'user',
+                            'content': [tool_result_block]
+                        })
+                        logging.info(f"AnthropicProviderHandler: Created new user message with tool_result")
+                
+                elif role == 'assistant':
+                    tool_calls = msg.get('tool_calls')
+                    
+                    if tool_calls:
+                        # Convert to Anthropic format with tool_use content blocks
+                        content_blocks = []
+                        
+                        # Add text content if present
+                        if content and isinstance(content, str) and content.strip():
+                            content_blocks.append({'type': 'text', 'text': content})
+                        elif content and isinstance(content, list):
+                            content_blocks.extend(content)
+                        
+                        # Add tool_use blocks
+                        import json as _json
+                        for tc in tool_calls:
+                            tool_id = tc.get('id', f"toolu_{len(content_blocks)}")
+                            function = tc.get('function', {})
+                            tool_name = function.get('name', '')
+                            arguments = function.get('arguments', {})
+                            if isinstance(arguments, str):
+                                try:
+                                    arguments = _json.loads(arguments)
+                                except _json.JSONDecodeError:
+                                    logging.warning(f"AnthropicProviderHandler: Failed to parse tool arguments: {arguments}")
+                                    arguments = {}
+                            
+                            content_blocks.append({
+                                'type': 'tool_use',
+                                'id': tool_id,
+                                'name': tool_name,
+                                'input': arguments
+                            })
+                            logging.info(f"AnthropicProviderHandler: Converted tool_call to tool_use: {tool_name}")
+                        
+                        if content_blocks:
+                            anthropic_messages.append({
+                                'role': 'assistant',
+                                'content': content_blocks
+                            })
+                    else:
+                        # Regular assistant message - handle potentially None content
+                        if content is not None:
+                            anthropic_messages.append({
+                                'role': 'assistant',
+                                'content': content
+                            })
+                        else:
+                            # Skip assistant messages with None content (tool_calls-only messages
+                            # that were already handled above shouldn't reach here)
+                            logging.info(f"AnthropicProviderHandler: Skipping assistant message with None content")
+                
+                elif role == 'user':
+                    # Handle multimodal content (images)
+                    if isinstance(content, list):
+                        content_blocks = []
+                        for block in content:
+                            if isinstance(block, dict):
+                                block_type = block.get('type', '')
+                                if block_type == 'text':
+                                    content_blocks.append(block)
+                                elif block_type == 'image_url':
+                                    # Convert OpenAI image_url to Anthropic image source
+                                    image_url_obj = block.get('image_url', {})
+                                    url = image_url_obj.get('url', '') if isinstance(image_url_obj, dict) else ''
+                                    if url.startswith('data:'):
+                                        try:
+                                            header, data = url.split(',', 1)
+                                            media_type = header.split(';')[0].replace('data:', '')
+                                            content_blocks.append({
+                                                'type': 'image',
+                                                'source': {
+                                                    'type': 'base64',
+                                                    'media_type': media_type,
+                                                    'data': data
+                                                }
+                                            })
+                                        except (ValueError, IndexError) as e:
+                                            logging.warning(f"AnthropicProviderHandler: Failed to parse data URL: {e}")
+                                    elif url.startswith(('http://', 'https://')):
+                                        content_blocks.append({
+                                            'type': 'image',
+                                            'source': {
+                                                'type': 'url',
+                                                'url': url
+                                            }
+                                        })
+                                else:
+                                    content_blocks.append(block)
+                            elif isinstance(block, str):
+                                content_blocks.append({'type': 'text', 'text': block})
+                        
+                        anthropic_messages.append({
+                            'role': 'user',
+                            'content': content_blocks if content_blocks else content or ""
+                        })
+                    else:
+                        anthropic_messages.append({
+                            'role': 'user',
+                            'content': content or ""
+                        })
+                
+                else:
+                    logging.warning(f"AnthropicProviderHandler: Unknown message role '{role}', treating as user")
+                    anthropic_messages.append({
+                        'role': 'user',
+                        'content': content or ""
+                    })
+            
+            logging.info(f"AnthropicProviderHandler: Converted {len(messages)} OpenAI messages to {len(anthropic_messages)} Anthropic messages")
+            if system_message:
+                logging.info(f"AnthropicProviderHandler: System message extracted ({len(system_message)} chars)")
+            
+            # Apply cache_control if native caching is enabled
            if enable_native_caching:
-                # Count cumulative tokens for cache decision
                cumulative_tokens = 0
-                for i, msg in enumerate(messages):
-                    # Count tokens in this message
-                    message_tokens = count_messages_tokens([msg], model)
+                for i, msg in enumerate(anthropic_messages):
+                    message_tokens = count_messages_tokens([{'role': msg['role'], 'content': msg['content'] if isinstance(msg['content'], str) else str(msg['content'])}], model)
                    cumulative_tokens += message_tokens
-
-                    # Convert to Anthropic message format
-                    anthropic_msg = {"role": msg["role"], "content": msg["content"]}
-
-                    # Apply cache_control based on position and token count
-                    # Cache system messages and long conversation prefixes
-                    if (msg["role"] == "system" or
-                        (i < len(messages) - 2 and cumulative_tokens >= min_cacheable_tokens)):
-                        anthropic_msg["cache_control"] = {"type": "ephemeral"}
+                    
+                    if i < len(anthropic_messages) - 2 and cumulative_tokens >= min_cacheable_tokens:
+                        content = msg.get('content')
+                        if isinstance(content, str) and content.strip():
+                            msg['content'] = [
+                                {
+                                    'type': 'text',
+                                    'text': content,
+                                    'cache_control': {'type': 'ephemeral'}
+                                }
+                            ]
+                        elif isinstance(content, list) and content:
+                            content[-1]['cache_control'] = {'type': 'ephemeral'}
                        logging.info(f"AnthropicProviderHandler: Applied cache_control to message {i} ({message_tokens} tokens, cumulative: {cumulative_tokens})")
-                    else:
-                        logging.info(f"AnthropicProviderHandler: Not caching message {i} ({message_tokens} tokens, cumulative: {cumulative_tokens})")
-
-                    anthropic_messages.append(anthropic_msg)
+                
+                # Also apply cache_control to system message if present
+                if system_message:
+                    system_message_param = [{
+                        'type': 'text',
+                        'text': system_message,
+                        'cache_control': {'type': 'ephemeral'}
+                    }]
+                else:
+                    system_message_param = None
            else:
-                # Standard message formatting without caching
-                anthropic_messages = [{"role": msg["role"], "content": msg["content"]} for msg in messages]
-
-            response = self.client.messages.create(
-                model=model,
-                messages=anthropic_messages,
-                max_tokens=max_tokens,
-                temperature=temperature
-            )
+                system_message_param = system_message
+            
+            # Convert OpenAI tools to Anthropic format
+            anthropic_tools = None
+            if tools:
+                anthropic_tools = []
+                for tool in tools:
+                    if tool.get("type") == "function":
+                        function = tool.get("function", {})
+                        anthropic_tools.append({
+                            "name": function.get("name", ""),
+                            "description": function.get("description", ""),
+                            "input_schema": function.get("parameters", {})
+                        })
+                        logging.info(f"AnthropicProviderHandler: Converted tool to Anthropic format: {function.get('name')}")
+                if not anthropic_tools:
+                    anthropic_tools = None
+            
+            # Convert OpenAI tool_choice to Anthropic format
+            anthropic_tool_choice = None
+            if tool_choice and anthropic_tools:
+                if isinstance(tool_choice, str):
+                    if tool_choice == "auto":
+                        anthropic_tool_choice = {"type": "auto"}
+                    elif tool_choice == "required":
+                        anthropic_tool_choice = {"type": "any"}
+                    elif tool_choice == "none":
+                        anthropic_tool_choice = None
+                elif isinstance(tool_choice, dict):
+                    if tool_choice.get("type") == "function":
+                        func_name = tool_choice.get("function", {}).get("name")
+                        if func_name:
+                            anthropic_tool_choice = {"type": "tool", "name": func_name}
+            
+            # Build API call parameters
+            api_params = {
+                'model': model,
+                'messages': anthropic_messages,
+                'max_tokens': max_tokens or 4096,
+                'temperature': temperature,
+            }
+            
+            if system_message_param:
+                api_params['system'] = system_message_param
+            
+            if anthropic_tools:
+                api_params['tools'] = anthropic_tools
+            
+            if anthropic_tool_choice:
+                api_params['tool_choice'] = anthropic_tool_choice
+            
+            if AISBF_DEBUG:
+                import json as _json
+                logging.info(f"=== ANTHROPIC API REQUEST PAYLOAD ===")
+                # Sanitize for logging (don't log full base64 images)
+                debug_params = dict(api_params)
+                logging.info(f"Request keys: {list(debug_params.keys())}")
+                logging.info(f"Model: {debug_params.get('model')}")
+                logging.info(f"Messages count: {len(debug_params.get('messages', []))}")
+                logging.info(f"Tools count: {len(debug_params.get('tools', []) or [])}")
+                logging.info(f"Tool choice: {debug_params.get('tool_choice')}")
+                logging.info(f"System: {'present' if debug_params.get('system') else 'none'}")
+                logging.info(f"Full payload: {_json.dumps(debug_params, indent=2, default=str)}")
+                logging.info(f"=== END ANTHROPIC API REQUEST PAYLOAD ===")
+            
+            response = self.client.messages.create(**api_params)
            logging.info(f"AnthropicProviderHandler: Response received: {response}")
            self.record_success()
            
@@ -2112,13 +2810,17 @@ class AnthropicProviderHandler(BaseProviderHandler):
                            logging.info(f"Tool use block: {block}")
                            
                            try:
+                                import json as _json_tc
                                # Convert Anthropic tool_use to OpenAI tool_calls format
+                                # OpenAI requires arguments to be a JSON string, not a dict
+                                raw_input = block.input if hasattr(block, 'input') else {}
+                                arguments_str = _json_tc.dumps(raw_input) if isinstance(raw_input, dict) else str(raw_input)
                                openai_tool_call = {
-                                    "id": f"call_{call_id}",
+                                    "id": block.id if hasattr(block, 'id') else f"call_{call_id}",
                                    "type": "function",
                                    "function": {
                                        "name": block.name if hasattr(block, 'name') else "",
-                                        "arguments": block.input if hasattr(block, 'input') else {}
+                                        "arguments": arguments_str
                                    }
                                }
                                openai_tool_calls.append(openai_tool_call)
@@ -2176,9 +2878,9 @@ class AnthropicProviderHandler(BaseProviderHandler):
                    "finish_reason": finish_reason
                }],
                "usage": {
-                    "prompt_tokens": getattr(response, "usage", {}).get("input_tokens", 0),
-                    "completion_tokens": getattr(response, "usage", {}).get("output_tokens", 0),
-                    "total_tokens": getattr(response, "usage", {}).get("input_tokens", 0) + getattr(response, "usage", {}).get("output_tokens", 0)
+                    "prompt_tokens": getattr(getattr(response, "usage", None), "input_tokens", 0) or 0,
+                    "completion_tokens": getattr(getattr(response, "usage", None), "output_tokens", 0) or 0,
+                    "total_tokens": (getattr(getattr(response, "usage", None), "input_tokens", 0) or 0) + (getattr(getattr(response, "usage", None), "output_tokens", 0) or 0)
                }
            }
            
@@ -2333,8 +3035,8 @@ class ClaudeProviderHandler(BaseProviderHandler):
        from .claude_auth import ClaudeAuth
        self.auth = ClaudeAuth(credentials_file=credentials_file)
        
-        # Anthropic SDK client - created lazily with OAuth2 token
-        self._sdk_client = None
+        # HTTP client for direct API requests (OAuth2 requires direct HTTP, not SDK)
+        self.client = httpx.AsyncClient(timeout=httpx.Timeout(300.0, connect=30.0))
        
        # Streaming idle watchdog configuration (Phase 1.3)
        self.stream_idle_timeout = 90.0  # seconds - matches vendors/claude
@@ -2347,6 +3049,184 @@ class ClaudeProviderHandler(BaseProviderHandler):
            'cache_tokens_created': 0,
            'total_requests': 0,
        }
+        
+        # Session management for quota tracking
+        self.session_state = {
+            'initialized': False,
+            'session_id': None,
+            'device_id': None,
+            'account_uuid': None,
+            'organization_id': None,
+            'last_initialized': None,
+            'quota_5h_reset': None,
+            'quota_5h_utilization': None,
+            'quota_7d_reset': None,
+            'quota_7d_utilization': None,
+            'representative_claim': None,
+            'status': None,
+            'session_timeout': 3600,  # 1 hour session timeout
+        }
+        
+        # Initialize persistent identifiers for metadata
+        self._init_session_identifiers()
+    
+    def _init_session_identifiers(self):
+        """Initialize persistent session identifiers (device_id, account_uuid, session_id)."""
+        import uuid
+        import hashlib
+        
+        # Generate device_id (consistent hash based on provider_id)
+        if not self.session_state.get('device_id'):
+            device_seed = f"{self.provider_id}-{time.time()}"
+            self.session_state['device_id'] = hashlib.sha256(device_seed.encode()).hexdigest()
+        
+        # Get account_uuid from OAuth2 credentials (persistent per user)
+        if not self.session_state.get('account_uuid'):
+            # Try to get from OAuth2 credentials first
+            account_id = self.auth.get_account_id()
+            if account_id:
+                self.session_state['account_uuid'] = account_id
+            else:
+                # Fall back to UUID if not available
+                self.session_state['account_uuid'] = str(uuid.uuid4())
+        
+        # Session ID will be generated on first use in _get_auth_headers
+    
+    async def _initialize_session(self):
+        """
+        Initialize session by sending a quota request to get rate limit information.
+        
+        This matches the claude-cli behavior of sending an initial "quota" request
+        to obtain subscriber quota information from the API headers.
+        """
+        import logging
+        import json
+        
+        logger = logging.getLogger(__name__)
+        logger.info("ClaudeProviderHandler: Initializing session for quota tracking")
+        
+        try:
+            # Get auth headers (this will initialize session_id if needed)
+            headers = self._get_auth_headers(stream=False)
+            
+            # Build minimal quota request (matching claude-cli)
+            # Use persistent identifiers from session_state
+            payload = {
+                'model': 'claude-haiku-4-5-20251001',  # Use cheapest model for quota check
+                'max_tokens': 1,
+                'messages': [
+                    {
+                        'role': 'user',
+                        'content': 'quota'
+                    }
+                ],
+                'metadata': {
+                    'user_id': json.dumps({
+                        'device_id': self.session_state['device_id'],
+                        'account_uuid': self.session_state['account_uuid'],
+                        'session_id': self.session_state['session_id']
+                    })
+                }
+            }
+            
+            # Send quota request
+            api_url = 'https://api.anthropic.com/v1/messages?beta=true'
+            response = await self.client.post(api_url, headers=headers, json=payload)
+            
+            if response.status_code == 200:
+                # Parse rate limit headers
+                headers_dict = dict(response.headers)
+                
+                self.session_state.update({
+                    'initialized': True,
+                    'last_initialized': time.time(),
+                    'organization_id': headers_dict.get('anthropic-organization-id'),
+                    'quota_5h_reset': headers_dict.get('anthropic-ratelimit-unified-5h-reset'),
+                    'quota_5h_utilization': headers_dict.get('anthropic-ratelimit-unified-5h-utilization'),
+                    'quota_7d_reset': headers_dict.get('anthropic-ratelimit-unified-7d-reset'),
+                    'quota_7d_utilization': headers_dict.get('anthropic-ratelimit-unified-7d-utilization'),
+                    'representative_claim': headers_dict.get('anthropic-ratelimit-unified-representative-claim'),
+                    'status': headers_dict.get('anthropic-ratelimit-unified-status'),
+                })
+                
+                logger.info(f"ClaudeProviderHandler: Session initialized successfully")
+                logger.info(f"  Organization ID: {self.session_state['organization_id']}")
+                logger.info(f"  5h utilization: {self.session_state['quota_5h_utilization']}")
+                logger.info(f"  7d utilization: {self.session_state['quota_7d_utilization']}")
+                logger.info(f"  Representative claim: {self.session_state['representative_claim']}")
+                logger.info(f"  Status: {self.session_state['status']}")
+                
+                return True
+            else:
+                logger.warning(f"ClaudeProviderHandler: Session initialization failed: {response.status_code}")
+                return False
+                
+        except Exception as e:
+            logger.error(f"ClaudeProviderHandler: Session initialization error: {e}", exc_info=True)
+            return False
+    
+    def _should_refresh_session(self) -> bool:
+        """
+        Check if session should be refreshed based on timeout or rate limit status.
+        
+        Returns:
+            True if session needs refresh, False otherwise
+        """
+        if not self.session_state['initialized']:
+            return True
+        
+        # Check session timeout
+        if self.session_state['last_initialized']:
+            age = time.time() - self.session_state['last_initialized']
+            if age > self.session_state['session_timeout']:
+                return True
+        
+        # Check if rate limited
+        if self.session_state['status'] != 'allowed':
+            return True
+        
+        return False
+    
+    async def _ensure_session(self):
+        """
+        Ensure session is initialized and valid before making requests.
+        
+        This is called before each request to maintain quota tracking.
+        """
+        if self._should_refresh_session():
+            import logging
+            logger = logging.getLogger(__name__)
+            logger.info("ClaudeProviderHandler: Session needs refresh, initializing...")
+            await self._initialize_session()
+    
+    def _update_session_from_headers(self, headers: Dict):
+        """
+        Update session state from response headers.
+        
+        This is called after each request to keep quota information current.
+        
+        Args:
+            headers: Response headers dict
+        """
+        import logging
+        logger = logging.getLogger(__name__)
+        
+        # Update quota information from headers
+        if 'anthropic-ratelimit-unified-5h-utilization' in headers:
+            old_util = self.session_state.get('quota_5h_utilization')
+            new_util = headers.get('anthropic-ratelimit-unified-5h-utilization')
+            
+            self.session_state.update({
+                'quota_5h_reset': headers.get('anthropic-ratelimit-unified-5h-reset'),
+                'quota_5h_utilization': new_util,
+                'quota_7d_reset': headers.get('anthropic-ratelimit-unified-7d-reset'),
+                'quota_7d_utilization': headers.get('anthropic-ratelimit-unified-7d-utilization'),
+                'representative_claim': headers.get('anthropic-ratelimit-unified-representative-claim'),
+                'status': headers.get('anthropic-ratelimit-unified-status'),
+            })
+            
+            if old_util != new_util:
+                logger.debug(f"ClaudeProviderHandler: Quota utilization updated: {old_util} -> {new_util}")
    
    def _get_sdk_client(self):
        """
@@ -2388,36 +3268,61 @@ class ClaudeProviderHandler(BaseProviderHandler):
        """
        Get HTTP headers with OAuth2 Bearer token.
        Used for direct HTTP calls (not SDK).
+        
+        Headers match the original claude-cli client exactly.
        """
        import logging
+        import uuid
+        import platform
        logger = logging.getLogger(__name__)
        
        # Get valid OAuth2 access token
        access_token = self.auth.get_valid_token()
        
-        # Build headers matching Claude Code implementation
+        # Use stored session ID (consistent across requests in the same session)
+        # Generate new one only if not initialized
+        if not self.session_state.get('session_id'):
+            self.session_state['session_id'] = str(uuid.uuid4())
+        
+        session_id = self.session_state['session_id']
+        request_id = str(uuid.uuid4())  # Request ID is unique per request
+        
+        # Build headers matching claude-cli implementation exactly
+        # Reference: original claude code client request headers
        headers = {
-            'Authorization': f'Bearer {access_token}',
-            'Content-Type': 'application/json',
-            'Anthropic-Version': '2023-06-01',
-            'Anthropic-Beta': 'claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05',
-            'X-App': 'cli',
-            'X-Stainless-Retry-Count': '0',
-            'X-Stainless-Runtime': 'node',
-            'X-Stainless-Lang': 'js',
-            'X-Stainless-Timeout': '600',
-            'Connection': 'keep-alive',
+            'accept': 'application/json',
+            'anthropic-beta': 'oauth-2025-04-20,interleaved-thinking-2025-05-14,redact-thinking-2026-02-12,context-management-2025-06-27,prompt-caching-scope-2026-01-05,structured-outputs-2025-12-15',
+            'anthropic-dangerous-direct-browser-access': 'true',
+            'anthropic-version': '2023-06-01',
+            'authorization': f'Bearer {access_token}',
+            'content-type': 'application/json',
+            'user-agent': 'claude-cli/99.0.0 (undefined, cli)',
+            'x-app': 'cli',
+            'x-claude-code-session-id': session_id,
+            'x-client-request-id': request_id,
+            'x-stainless-arch': platform.machine() or 'x64',
+            'x-stainless-lang': 'js',
+            'x-stainless-os': platform.system() or 'Linux',
+            'x-stainless-package-version': '0.81.0',
+            'x-stainless-retry-count': '0',
+            'x-stainless-runtime': 'node',
+            'x-stainless-runtime-version': 'v22.22.0',
+            'x-stainless-timeout': '600',
        }
        
-        # Set Accept and Accept-Encoding based on streaming mode
+        # Override Accept and Accept-Encoding for streaming mode
        if stream:
-            headers['Accept'] = 'text/event-stream'
-            headers['Accept-Encoding'] = 'identity'
+            headers['accept'] = 'text/event-stream'
+            headers['accept-encoding'] = 'identity'
        else:
-            headers['Accept'] = 'application/json'
-            headers['Accept-Encoding'] = 'gzip, deflate, br, zstd'
+            headers['accept-encoding'] = 'gzip, deflate, br, zstd'
        
-        logger.info("ClaudeProviderHandler: Created auth headers with OAuth2 Bearer token")
+        logger.info("ClaudeProviderHandler: Created auth headers matching claude-cli client")
+        logger.debug(f"ClaudeProviderHandler: Session ID: {session_id}, Request ID: {request_id}")
+        
+        # Log full headers for debugging
+        import json
+        logger.debug(f"ClaudeProviderHandler: Full headers: {json.dumps(headers, indent=2)}")
        return headers
    
    def _sanitize_tool_call_id(self, tool_call_id: str) -> str:
@@ -2946,208 +3851,9 @@ class ClaudeProviderHandler(BaseProviderHandler):
    def _convert_messages_to_anthropic(self, messages: List[Dict]) -> tuple[Optional[str], List[Dict]]:
        """
        Convert OpenAI messages format to Anthropic format.
-        
-        Key differences:
-        1. System messages are extracted to a separate 'system' parameter
-        2. Tool role messages must be converted to user messages with tool_result content blocks
-        3. Assistant messages with tool_calls must have tool_use content blocks
-        4. Messages must alternate between user and assistant roles
-        5. Image content blocks are converted to Anthropic image source format (Phase 4.1)
-        
-        Args:
-            messages: OpenAI format messages
-            
-        Returns:
-            Tuple of (system_message, anthropic_messages)
+        Delegates to shared AnthropicFormatConverter.convert_messages_to_anthropic().
        """
-        import logging
-        import json
-        
-        system_message = None
-        anthropic_messages = []
-        
-        for msg in messages:
-            role = msg.get('role')
-            content = msg.get('content')
-            
-            if role == 'system':
-                # Extract system message
-                system_message = content
-                logging.info(f"Extracted system message: {len(content) if content else 0} chars")
-            
-            elif role == 'tool':
-                # Convert tool message to user message with tool_result content block
-                tool_call_id = msg.get('tool_call_id', msg.get('name', 'unknown'))
-                
-                # Build tool_result content block
-                tool_result_block = {
-                    'type': 'tool_result',
-                    'tool_use_id': tool_call_id,
-                    'content': content or ""
-                }
-                
-                # Check if last message is a user message - if so, append to it
-                if anthropic_messages and anthropic_messages[-1]['role'] == 'user':
-                    # Append to existing user message
-                    last_content = anthropic_messages[-1]['content']
-                    if isinstance(last_content, str):
-                        # Convert string content to list
-                        anthropic_messages[-1]['content'] = [
-                            {'type': 'text', 'text': last_content},
-                            tool_result_block
-                        ]
-                    elif isinstance(last_content, list):
-                        # Append to existing list
-                        anthropic_messages[-1]['content'].append(tool_result_block)
-                    logging.info(f"Appended tool_result to existing user message")
-                else:
-                    # Create new user message with tool_result
-                    anthropic_messages.append({
-                        'role': 'user',
-                        'content': [tool_result_block]
-                    })
-                    logging.info(f"Created new user message with tool_result")
-            
-            elif role == 'assistant':
-                # Check if message has tool_calls
-                tool_calls = msg.get('tool_calls')
-                
-                if tool_calls:
-                    # Convert to Anthropic format with tool_use content blocks
-                    content_blocks = []
-                    
-                    # Add text content if present (filter empty content)
-                    filtered_content = self._filter_empty_content(content)
-                    if filtered_content:
-                        if isinstance(filtered_content, str):
-                            content_blocks.append({
-                                'type': 'text',
-                                'text': filtered_content
-                            })
-                        elif isinstance(filtered_content, list):
-                            content_blocks.extend(filtered_content)
-                    
-                    # Add tool_use blocks
-                    for tc in tool_calls:
-                        # Sanitize tool call ID for Claude API compatibility
-                        raw_tool_id = tc.get('id', f"toolu_{len(content_blocks)}")
-                        tool_id = self._sanitize_tool_call_id(raw_tool_id)
-                        if tool_id != raw_tool_id:
-                            logging.info(f"ClaudeProviderHandler: Sanitized tool call ID: {raw_tool_id} -> {tool_id}")
-                        
-                        function = tc.get('function', {})
-                        tool_name = function.get('name', '')
-                        
-                        # Parse arguments (may be string or dict)
-                        arguments = function.get('arguments', {})
-                        if isinstance(arguments, str):
-                            try:
-                                arguments = json.loads(arguments)
-                            except json.JSONDecodeError:
-                                logging.warning(f"Failed to parse tool arguments as JSON: {arguments}")
-                                arguments = {}
-                        
-                        tool_use_block = {
-                            'type': 'tool_use',
-                            'id': tool_id,
-                            'name': tool_name,
-                            'input': arguments
-                        }
-                        content_blocks.append(tool_use_block)
-                        logging.info(f"Converted tool_call to tool_use block: {tool_name}")
-                    
-                    # Only add message if we have content blocks
-                    if content_blocks:
-                        anthropic_messages.append({
-                            'role': 'assistant',
-                            'content': content_blocks
-                        })
-                else:
-                    # Regular assistant message
-                    # Filter empty content before processing
-                    filtered_content = self._filter_empty_content(content)
-                    if filtered_content is None:
-                        # Skip empty assistant messages
-                        logging.info(f"ClaudeProviderHandler: Skipping empty assistant message")
-                        continue
-                    
-                    # Handle case where content might already be an array (from previous API responses)
-                    if isinstance(filtered_content, list):
-                        # Extract text from content blocks
-                        text_parts = []
-                        for block in filtered_content:
-                            if isinstance(block, dict):
-                                if block.get('type') == 'text':
-                                    text_parts.append(block.get('text', ''))
-                                elif 'text' in block:
-                                    text_parts.append(block['text'])
-                            elif isinstance(block, str):
-                                text_parts.append(block)
-                        content_str = '\n'.join(text_parts) if text_parts else ""
-                        logging.info(f"Normalized assistant message content from array to string ({len(text_parts)} blocks)")
-                    else:
-                        content_str = filtered_content or ""
-                    
-                    # Only add non-empty assistant messages
-                    if content_str:
-                        anthropic_messages.append({
-                            'role': 'assistant',
-                            'content': content_str
-                        })
-            
-            elif role == 'user':
-                # Regular user message - handle images (Phase 4.1)
-                content_blocks = []
-                
-                if isinstance(content, list):
-                    # Extract images from content
-                    images = self._extract_images_from_content(content)
-                    
-                    # Extract text content
-                    for block in content:
-                        if isinstance(block, dict):
-                            block_type = block.get('type', '')
-                            if block_type == 'text':
-                                content_blocks.append(block)
-                            elif block_type == 'image_url':
-                                # Images are handled separately via _extract_images_from_content
-                                pass
-                            else:
-                                # Pass through other block types
-                                content_blocks.append(block)
-                        elif isinstance(block, str):
-                            content_blocks.append({'type': 'text', 'text': block})
-                    
-                    # Add image blocks
-                    content_blocks.extend(images)
-                    
-                    # If we have content blocks, use them; otherwise use original content
-                    if content_blocks:
-                        anthropic_messages.append({
-                            'role': 'user',
-                            'content': content_blocks
-                        })
-                    else:
-                        anthropic_messages.append({
-                            'role': 'user',
-                            'content': content or ""
-                        })
-                else:
-                    # String content - no images
-                    anthropic_messages.append({
-                        'role': 'user',
-                        'content': content or ""
-                    })
-            
-            else:
-                logging.warning(f"Unknown message role: {role}, treating as user")
-                anthropic_messages.append({
-                    'role': 'user',
-                    'content': content or ""
-                })
-        
-        logging.info(f"Converted {len(messages)} OpenAI messages to {len(anthropic_messages)} Anthropic messages")
-        return system_message, anthropic_messages
+        return AnthropicFormatConverter.convert_messages_to_anthropic(messages, sanitize_ids=True)
    
    async def handle_request(self, model: str, messages: List[Dict], max_tokens: Optional[int] = None,
                           temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
@@ -3268,27 +3974,25 @@ class ClaudeProviderHandler(BaseProviderHandler):
                                        temperature: Optional[float] = 1.0, stream: Optional[bool] = False,
                                        tools: Optional[List[Dict]] = None, tool_choice: Optional[Union[str, Dict]] = None) -> Union[Dict, object]:
        """
-        Handle request with a specific model using the Anthropic SDK.
+        Handle request with a specific model using direct HTTP requests.
        
-        The SDK handles:
-        - Proper message format conversion
-        - Automatic retries with exponential backoff
-        - Correct headers and beta features
-        - Better error handling and rate limit management
-        
-        This fixes the rate limiting issues we were seeing with direct HTTP calls.
+        OAuth2 authentication requires direct HTTP requests with Bearer token,
+        as the Anthropic SDK doesn't support OAuth2 via auth_token parameter.
        """
        import logging
        import json
        logger = logging.getLogger(__name__)
        
-        logger.info(f"ClaudeProviderHandler: Handling request for model {model} (SDK mode)")
+        logger.info(f"ClaudeProviderHandler: Handling request for model {model} (Direct HTTP mode)")
        
        if AISBF_DEBUG:
            logger.info(f"ClaudeProviderHandler: Messages: {messages}")
        else:
            logger.info(f"ClaudeProviderHandler: Messages count: {len(messages)}")

+        # Ensure session is initialized for quota tracking
+        await self._ensure_session()
+
        # Apply rate limiting
        await self.apply_rate_limit()
        
@@ -3298,61 +4002,107 @@ class ClaudeProviderHandler(BaseProviderHandler):
        # Convert messages to Anthropic format (handles tool messages properly)
        system_message, anthropic_messages = self._convert_messages_to_anthropic(validated_messages)
        
-        # Get SDK client with OAuth2 token
-        client = self._get_sdk_client()
-        
-        # Build request parameters for SDK
-        # The SDK handles proper message formatting and headers
-        request_kwargs = {
+        # Build request payload
+        payload = {
            'model': model,
            'messages': anthropic_messages,
            'max_tokens': max_tokens or 4096,
        }
        
        # Only add temperature if not None and not 0.0
-        # Claude API requires temperature: 1.0 when thinking is enabled
        if temperature is not None and temperature > 0:
-            request_kwargs['temperature'] = temperature
+            payload['temperature'] = temperature
        
        if system_message:
-            request_kwargs['system'] = system_message
+            # Format system message as Anthropic blocks with billing header
+            # Matches claude-cli format for billing/tracking
+            billing_header = {
+                'type': 'text',
+                'text': 'x-anthropic-billing-header: cc_version=99.0.0.e8c; cc_entrypoint=cli;'
+            }
+            claude_intro = {
+                'type': 'text',
+                'text': 'You are Claude Code, Anthropic\'s official CLI for Claude.'
+            }
+            user_system = {
+                'type': 'text',
+                'text': system_message
+            }
+            payload['system'] = [billing_header, claude_intro, user_system]
+        
+        # Add metadata with user_id (matching claude-cli format)
+        payload['metadata'] = {
+            'user_id': json.dumps({
+                'device_id': self.session_state['device_id'],
+                'account_uuid': self.session_state['account_uuid'],
+                'session_id': self.session_state['session_id']
+            })
+        }
        
        # Convert OpenAI tools to Anthropic format
        if tools:
            anthropic_tools = self._convert_tools_to_anthropic(tools)
            if anthropic_tools:
-                request_kwargs['tools'] = anthropic_tools
+                payload['tools'] = anthropic_tools
        
        # Convert OpenAI tool_choice format to Anthropic format
        if tool_choice and tools:
            anthropic_tool_choice = self._convert_tool_choice_to_anthropic(tool_choice)
            if anthropic_tool_choice:
-                request_kwargs['tool_choice'] = anthropic_tool_choice
+                payload['tool_choice'] = anthropic_tool_choice
+        
+        # Get auth headers with OAuth2 Bearer token
+        headers = self._get_auth_headers(stream=stream)
+        
+        # API endpoint
+        api_url = 'https://api.anthropic.com/v1/messages?beta=true'
        
        # Log request for debugging
-        logger.info(f"ClaudeProviderHandler: SDK request kwargs: {json.dumps({k: str(v)[:200] for k, v in request_kwargs.items()}, indent=2)}")
+        logger.info(f"ClaudeProviderHandler: Request payload keys: {list(payload.keys())}")
+        if AISBF_DEBUG:
+            logger.info(f"ClaudeProviderHandler: Full payload: {json.dumps(payload, indent=2)}")
        
        try:
            if stream:
-                # Streaming request using SDK
-                logger.info(f"ClaudeProviderHandler: Using SDK streaming mode")
-                return self._handle_streaming_request_sdk(client, request_kwargs, model)
+                # Add stream: true to payload for Anthropic API
+                payload['stream'] = True
+                # Streaming request using direct HTTP
+                logger.info(f"ClaudeProviderHandler: Using direct HTTP streaming mode")
+                return self._handle_streaming_request_with_retry(api_url, payload, headers, model)
            else:
-                # Non-streaming request using SDK
-                # The SDK handles automatic retries (max_retries=3)
-                logger.info(f"ClaudeProviderHandler: Using SDK non-streaming mode")
-                response = await client.messages.create(**request_kwargs)
+                # Non-streaming request using direct HTTP
+                logger.info(f"ClaudeProviderHandler: Using direct HTTP non-streaming mode")
+                response = await self._request_with_retry(api_url, headers, payload, max_retries=3)
+                
+                logger.info(f"ClaudeProviderHandler: HTTP response received successfully")
+                
+                # Update session state from response headers
+                self._update_session_from_headers(dict(response.headers))
                
-                logger.info(f"ClaudeProviderHandler: SDK response received successfully")
                self.record_success()
                
-                # Convert SDK response to OpenAI format
-                openai_response = self._convert_sdk_response_to_openai(response, model)
+                # Parse response
+                response_data = response.json()
+                
+                # Dump raw response if AISBF_DEBUG is enabled
+                if AISBF_DEBUG:
+                    logger.info(f"=== RAW CLAUDE RESPONSE ===")
+                    logger.info(f"Raw response data: {json.dumps(response_data, indent=2, default=str)}")
+                    logger.info(f"=== END RAW CLAUDE RESPONSE ===")
+                
+                # Convert to OpenAI format
+                openai_response = self._convert_to_openai_format(response_data, model)
+                
+                # Dump final response dict if AISBF_DEBUG is enabled
+                if AISBF_DEBUG:
+                    logger.info(f"=== FINAL CLAUDE RESPONSE DICT ===")
+                    logger.info(f"Final response: {json.dumps(openai_response, indent=2, default=str)}")
+                    logger.info(f"=== END FINAL CLAUDE RESPONSE DICT ===")
                
                return openai_response
                
        except Exception as e:
-            logger.error(f"ClaudeProviderHandler: SDK request failed: {e}", exc_info=True)
+            logger.error(f"ClaudeProviderHandler: HTTP request failed: {e}", exc_info=True)
            raise
    
    async def _request_with_retry(self, api_url: str, headers: Dict, payload: Dict, max_retries: int = 3):
@@ -3495,6 +4245,9 @@ class ClaudeProviderHandler(BaseProviderHandler):
            ) as response:
                logger.info(f"ClaudeProviderHandler: Streaming response status: {response.status_code}")
                
+                # Update session state from response headers (available at stream start)
+                self._update_session_from_headers(dict(response.headers))
+                
                if response.status_code >= 400:
                    error_text = await response.aread()
                    logger.error(f"ClaudeProviderHandler: Streaming error response: {error_text}")
@@ -3544,6 +4297,9 @@ class ClaudeProviderHandler(BaseProviderHandler):
                last_event_time = time.time()
                idle_timeout = self.stream_idle_timeout
                
+                # Track stop_reason from message_delta events
+                stream_stop_reason = None
+                
                async for line in response.aiter_lines():
                    # Check for idle timeout (Phase 1.3)
                    if time.time() - last_event_time > idle_timeout:
@@ -3696,8 +4452,15 @@ class ClaudeProviderHandler(BaseProviderHandler):
                                thinking_signature = ""
                        
                        elif event_type == 'message_delta':
-                            # Handle usage metadata in streaming (Phase 2.3)
+                            # Handle usage metadata and stop_reason in streaming (Phase 2.3)
+                            delta_data = chunk_data.get('delta', {})
                            usage = chunk_data.get('usage', {})
+                            
+                            # Extract stop_reason from message_delta (Anthropic sends it here)
+                            stream_stop_reason = delta_data.get('stop_reason')
+                            if stream_stop_reason:
+                                logger.debug(f"ClaudeProviderHandler: Stream stop_reason: {stream_stop_reason}")
+                            
                            if usage:
                                logger.debug(f"ClaudeProviderHandler: Streaming usage update: {usage}")
                                
@@ -3712,8 +4475,21 @@ class ClaudeProviderHandler(BaseProviderHandler):
                                    self.cache_stats['cache_tokens_created'] += cache_creation
                        
                        elif event_type == 'message_stop':
-                            # Final chunk
-                            finish_reason = 'stop'
+                            # Final chunk - map Anthropic stop_reason to OpenAI finish_reason
+                            stop_reason_map = {
+                                'end_turn': 'stop',
+                                'max_tokens': 'length',
+                                'stop_sequence': 'stop',
+                                'tool_use': 'tool_calls'
+                            }
+                            # Use stop_reason from message_delta if available, otherwise check tool_calls
+                            if stream_stop_reason:
+                                finish_reason = stop_reason_map.get(stream_stop_reason, 'stop')
+                            elif current_tool_calls:
+                                finish_reason = 'tool_calls'
+                            else:
+                                finish_reason = 'stop'
+                            logger.debug(f"ClaudeProviderHandler: Final finish_reason: {finish_reason}")
                            
                            final_chunk = {
                                'id': completion_id,
@@ -3735,10 +4511,25 @@ class ClaudeProviderHandler(BaseProviderHandler):
                        continue
    
    def _convert_to_openai_format(self, claude_response: Dict, model: str) -> Dict:
-        """Convert Claude API response to OpenAI format."""
+        """Convert Claude API response to OpenAI format.
+        
+        This converts the raw Claude API response (in Anthropic format) to OpenAI chat
+        completion format so it can be used seamlessly with other providers.
+        
+        Handles:
+        - Text content blocks
+        - Tool use blocks (function calls)
+        - Thinking blocks (reasoning)
+        - Redacted thinking blocks
+        - Usage metadata including cache tokens
+        - Stop reason mapping
+        """
        import logging
+        import json
        logger = logging.getLogger(__name__)
        
+        logger.info(f"ClaudeProviderHandler: Converting response to OpenAI format")
+        
        # Extract content
        content_text = ""
        tool_calls = []
@@ -3798,7 +4589,7 @@ class ClaudeProviderHandler(BaseProviderHandler):
                'index': 0,
                'message': {
                    'role': 'assistant',
-                    'content': content_text if not tool_calls else None
+                    'content': content_text if content_text else None
                },
                'finish_reason': finish_reason
            }],
@@ -3900,7 +4691,7 @@ class ClaudeProviderHandler(BaseProviderHandler):
                'index': 0,
                'message': {
                    'role': 'assistant',
-                    'content': message_content if not tool_calls else None,
+                    'content': message_content if message_content else None,
                },
                'finish_reason': finish_reason
            }],
@@ -4686,6 +5477,12 @@ class KiroProviderHandler(BaseProviderHandler):
            # Build OpenAI-format response
            openai_response = self._build_openai_response(model, content, tool_calls)
            
+            # Dump final response dict if AISBF_DEBUG is enabled
+            if AISBF_DEBUG:
+                logging.info(f"=== FINAL KIRO RESPONSE DICT ===")
+                logging.info(f"Final response: {json.dumps(openai_response, indent=2, default=str)}")
+                logging.info(f"=== END FINAL KIRO RESPONSE DICT ===")
+            
            self.record_success()
            return openai_response
            

--- a/docs/claude_provider_comparison.md
+++ b/docs/claude_provider_comparison.md
-# Claude Provider Comparison: AISBF vs vendors/kilocode vs vendors/claude
-
-**Date:** 2026-03-31  
-**Reviewed by:** AI Assistant  
-**Updated:** 2026-04-01 - Deep dive into vendors/claude/src/services/api/claude.ts (3419 lines)
-
-**Sources compared:**
- **AISBF:** [`aisbf/providers.py`](aisbf/providers.py:2300) - `ClaudeProviderHandler` class
- **vendors/kilocode:** `vendors/kilocode/packages/opencode/src/provider/` - Provider transform + SDK integration
- **vendors/claude:** `vendors/claude/src/` - Original Claude Code TypeScript source (3419-line `claude.ts` + `messages.ts`)
-
---
-
-## Overview
-
-This document compares three Claude provider implementations found in the codebase:
-
-1. **AISBF** (`aisbf/providers.py`) - Direct HTTP implementation using OAuth2 tokens via `httpx.AsyncClient`
-2. **vendors/kilocode** (`vendors/kilocode/packages/opencode/src/provider/`) - TypeScript implementation using AI SDK (`@ai-sdk/anthropic`)
-3. **vendors/claude** (`vendors/claude/src/`) - Original Claude Code TypeScript/React implementation from Anthropic
-
---
-
-## 1. Architecture & Approach
-
-| Aspect | AISBF | vendors/kilocode | vendors/claude |
-|--------|-------|------------------|----------------|
-| **Language** | Python | TypeScript | TypeScript/React |
-| **API Method** | Direct HTTP via `httpx.AsyncClient` | AI SDK (`@ai-sdk/anthropic`) | Anthropic SDK + internal `callModel` |
-| **Authentication** | OAuth2 via `ClaudeAuth` class | API key / OAuth via Auth system | Internal OAuth2 + session management |
-| **Endpoint** | `https://api.anthropic.com/v1/messages` | Configurable (baseURL) | Internal SDK routing |
-| **Response Format** | Standard JSON / SSE | AI SDK streaming | SDK streaming |
-| **Protocol** | Anthropic Messages API | AI SDK (unified) | Anthropic Messages API |
-| **Beta Headers** | `claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,...` | `claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14` | Internal |
-
-**Assessment:** 
- AISBF uses direct HTTP approach appropriate for OAuth2 tokens
- vendors/kilocode uses AI SDK (`@ai-sdk/anthropic`) for unified provider interface
- vendors/kilocode's custom loader ([`provider.ts:125`](vendors/kilocode/packages/opencode/src/provider/provider.ts:125)) sets beta headers similar to AISBF
-
---
-
-## 2. Message Format Conversion
-
-### AISBF: [`_convert_messages_to_anthropic()`](aisbf/providers.py:2890)
-
-**What it does well:**
- Correctly extracts system messages to separate `system` parameter
- Handles tool messages by converting to `tool_result` content blocks
- Converts assistant `tool_calls` to Anthropic `tool_use` blocks
- Handles message role alternation requirements
- Extracts images from OpenAI format content blocks
-
-### vendors/kilocode: [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:49) + [`applyCaching()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:177)
-
-**What it does well:**
- **Empty content filtering**: Removes empty string messages and empty text/reasoning parts from array content
- **Tool call ID sanitization**: Sanitizes tool call IDs for Claude models (replaces non-alphanumeric chars with `_`)
- **Prompt caching**: Applies `cacheControl: { type: "ephemeral" }` to system messages and last 2 messages
- **Provider option remapping**: Remaps providerOptions keys from stored providerID to expected SDK key
- **Duplicate reasoning fix**: Removes duplicate reasoning_details from OpenRouter responses
-
-### vendors/claude: [`normalizeMessagesForAPI()`](vendors/claude/src/utils/messages.ts:1989)
-
-**What it does well (3419-line implementation):**
- **Thinking block preservation**: Walking backward to merge thinking blocks with their parent assistant messages
- **Protected thinking block signatures**: Preserves `signature` field on thinking blocks
- **Tool result pairing**: `ensureToolResultPairing()` inserts synthetic errors for orphaned tool_uses
- **Message UUID tracking**: Uses `message.id` for merging fragmented assistant messages
- **Tool input normalization**: `normalizeToolInputForAPI()` validates tool arguments
- **Caller field stripping**: Removes `caller` field from tool_use blocks for non-tool-search models
- **Advisor block stripping**: Removes advisor blocks when beta header not present
- **Media limit enforcement**: `stripExcessMediaItems()` caps at 100 media items per request
- **Empty content handling**: Inserts placeholder content for empty assistant messages
- **Tool use deduplication**: Prevents duplicate tool_use IDs across merged assistant messages
- **Orphan tool result handling**: Converts orphaned tool_results to user messages with error text
-
-### Key Architectural Difference:
-| Feature | AISBF | vendors/kilocode | vendors/claude |
-|---------|-------|------------------|----------------|
-| **Conversion Strategy** | Direct OpenAI → Anthropic | AI SDK message normalization | Internal SDK normalization |
-| **Image Support** | Yes (Phase 4.1) | Via AI SDK | Yes (native, with 100-item cap) |
-| **Message Validation** | Basic role normalization | Empty content filtering, ID sanitization | Thinking preservation, UUID tracking, media limits |
-| **Tool Result Pairing** | No | No | Yes (ensureToolResultPairing) |
-| **Synthetic Messages** | No | No | Yes (orphan tool_result → user error) |
-| **Caching** | Yes (ephemeral cache on last 2 msgs) | Yes (ephemeral cache on system/last 2 msgs) | Yes (sophisticated cache_control with 1h TTL) |
-| **Media Stripping** | No | No | Yes (stripExcessMediaItems at 100) |
-
---
-
-## 3. Tool Conversion
-
-### AISBF: [`_convert_tools_to_anthropic()`](aisbf/providers.py:2419)
-
-**What it does well:**
- Correctly converts OpenAI `parameters` → Anthropic `input_schema`
- Normalizes JSON Schema types (e.g., `["string", "null"]` → `"string"`)
- Removes `additionalProperties: false` (Anthropic doesn't need it)
- Recursively normalizes nested schemas
-
-### vendors/kilocode: AI SDK handles conversion internally
-
-**What it does:**
- Uses `@ai-sdk/anthropic` which handles OpenAI → Anthropic tool conversion internally
- Tool schemas pass through [`schema()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:954) for Gemini/Google models (integer enum → string enum conversion)
- No explicit Anthropic-specific tool conversion needed (SDK handles it)
-
-### vendors/claude: Internal SDK handling
-
-**What it does:**
- Tool validation including name length limits
- Parameter size limits
- Schema validation against Anthropic's stricter requirements
- Tool result size budgeting ([`applyToolResultBudget`](vendors/claude/src/query.ts:379))
-
---
-
-## 4. Tool Choice Conversion
-
-### AISBF: [`_convert_tool_choice_to_anthropic()`](aisbf/providers.py:2367)
-
-**Correctly handles:**
- `"auto"` → `{"type": "auto"}`
- `"required"` → `{"type": "any"}`
- Specific function → `{"type": "tool", "name": "..."}`
-
-### vendors/kilocode: AI SDK handles internally
- Uses AI SDK's unified tool choice handling
- No explicit tool_choice conversion needed
-
-### vendors/claude: Internal SDK handling
- More nuanced tool choice handling including `disable_parallel_tool_use` support
-
---
-
-## 5. Streaming Implementation
-
-### AISBF: [`_handle_streaming_request()`](aisbf/providers.py:3369)
-
-**What it does:**
- Uses SSE format parsing (`data:` prefixed lines)
- Handles `content_block_delta` events with `text_delta`
- Handles `input_json_delta` for tool call argument streaming (Phase 2.2)
- Handles `content_block_stop` to emit tool calls
- Handles `message_stop` for final chunk
- Yields OpenAI-compatible chunks
- Streaming retry with fallback models via `_wrap_streaming_with_retry()`
-
-### vendors/kilocode: AI SDK streaming
-
-**What it does:**
- Uses AI SDK's built-in streaming via `sdk.languageModel(modelId).doStream()` or `generateText()`
- Handles thinking blocks, tool calls, and text content through unified SDK interface
- Provider-specific streaming handled by `@ai-sdk/anthropic` package
- Supports `fine-grained-tool-streaming-2025-05-14` beta header for streaming tool calls
- Custom fetch wrapper with timeout handling ([`provider.ts:1138`](vendors/kilocode/packages/opencode/src/provider/provider.ts:1138))
-
-### vendors/claude: Raw stream via Anthropic SDK ([`queryModel()`](vendors/claude/src/services/api/claude.ts:1017))
-
-**What it does (3419-line implementation):**
- **Raw stream access**: Uses `anthropic.beta.messages.create({ stream: true }).withResponse()` instead of BetaMessageStream to avoid O(n²) partial JSON parsing
- **Streaming idle watchdog**: `STREAM_IDLE_TIMEOUT_MS` (default 90s) aborts hung streams via setTimeout
- **Stall detection**: Tracks gaps between events, logs stalls >30s with analytics
- **Content block accumulation**: Manually accumulates `input_json_delta` into tool_use blocks
- **Thinking block streaming**: Handles `thinking_delta` and `signature_delta` events
- **Connector text support**: Custom `connector_text_delta` event type for internal use
- **Advisor tool tracking**: Tracks `advisor` server_tool_use blocks with analytics
- **Research field capture**: Internal-only `research` field from message_start/content_block_delta
- **Non-streaming fallback**: Automatic fallback on stream errors with `executeNonStreamingRequest()`
- **Fallback timeout**: `getNonstreamingFallbackTimeoutMs()` (300s default, 120s for remote)
- **Stream resource cleanup**: `releaseStreamResources()` cancels Response body to prevent native memory leaks
- **Request ID tracking**: Generates `clientRequestId` for correlating timeout errors with server logs
- **Cache break detection**: `checkResponseForCacheBreak()` compares cache tokens across requests
- **Quota status extraction**: `extractQuotaStatusFromHeaders()` parses rate limit headers
- **Cost tracking**: `calculateUSDCost()` + `addToTotalSessionCost()` for session billing
- **Fast mode support**: Dynamic `speed='fast'` parameter with latched beta header
- **Task budget support**: `output_config.task_budget` for API-side token budgeting
- **Context management**: `getAPIContextManagement()` for API-side context compression
- **LSP tool deferral**: `shouldDeferLspTool()` defers tools until LSP init completes
- **Dynamic tool loading**: Only includes discovered deferred tools, not all upfront
- **Tool search beta**: Provider-specific beta headers (1P vs Bedrock vs Vertex)
- **Cache editing beta**: Latched `cache-editing` beta header for cached microcompact
- **AFK mode beta**: Latched `afk-mode` beta header for auto mode sessions
- **Thinking clear latch**: Latched `thinking-clear` beta after 1h idle to bust cache
- **Effort params**: `configureEffortParams()` for adaptive/budget thinking modes
- **Structured outputs**: `output_config.format` with `structured-outputs-2025-05-22` beta
- **Media stripping**: `stripExcessMediaItems()` caps at 100 media items before API call
- **Fingerprint computation**: `computeFingerprintFromMessages()` for attribution headers
- **System prompt building**: `buildSystemPromptBlocks()` with cache_control per block
- **Cache breakpoints**: `addCacheBreakpoints()` with cache_edits and pinned edits support
- **Global cache strategy**: `shouldUseGlobalCacheScope()` for prompt_caching_scope beta
- **MCP tool cache gating**: Disables global cache when MCP tools present (dynamic schemas)
- **1h TTL caching**: `should1hCacheTTL()` for eligible users with GrowthBook allowlist
- **Bedrock 1h TTL**: `ENABLE_PROMPT_CACHING_1H_BEDROCK` for 3P Bedrock users
- **Prompt cache break detection**: `recordPromptState()` hashes everything affecting cache key
- **LLM span tracing**: `startLLMRequestSpan()` for beta tracing integration
- **Session activity**: `startSessionActivity('api_call')` for OS-level activity indicators
- **VCR recording**: `withStreamingVCR()` for recording/replaying API responses
- **Anti-distillation**: `fake_tools` opt-in for 1P CLI only
-
-### Key Differences:
-| Feature | AISBF | vendors/kilocode | vendors/claude |
-|---------|-------|------------------|----------------|
-| **Protocol** | SSE (text) | AI SDK (abstracted) | Raw SDK stream |
-| **Tool Streaming** | Yes (Phase 2.2) | Yes (via fine-grained-tool-streaming beta) | Yes (manual accumulation) |
-| **Thinking Blocks** | Yes (Phase 2.1) | Yes (via SDK) | Yes (native, with signature) |
-| **Usage Tracking** | Yes (Phase 2.3) | Yes (via SDK) | Yes (cumulative, with cache_creation) |
-| **Error Recovery** | Fallback models | SDK-level | Non-streaming fallback + model fallback |
-| **Content Dedup** | No | No | Yes (text block dedup) |
-| **Idle Watchdog** | No | No | Yes (90s timeout) |
-| **Stall Detection** | No | No | Yes (30s threshold) |
-| **Memory Cleanup** | Basic | SDK-level | Explicit Response body cancel |
-| **Request ID** | No | No | Yes (client-generated UUID) |
-| **Cache Tracking** | No | No | Yes (cache_break detection) |
-| **Cost Tracking** | No | Yes (via SDK) | Yes (session-level USD) |
-| **VCR Support** | No | No | Yes (record/replay) |
-
---
-
-## 6. Response Conversion
-
-### AISBF: [`_convert_to_openai_format()`](aisbf/providers.py:2916)
-
-**Correctly handles:**
- Text content extraction
- `tool_use` → OpenAI `tool_calls` format
- Stop reason mapping (`end_turn` → `stop`, etc.)
- Usage metadata extraction
-
-### vendors/kilocode: AI SDK handles internally
- AI SDK provides unified response format across all providers
- No manual response conversion needed
- Usage metadata includes cache tokens via SDK
- Cost calculation handles provider-specific differences ([`session/index.ts:860`](vendors/kilocode/packages/opencode/src/session/index.ts:860))
-
-### vendors/claude: Internal SDK handling
- Thinking block preservation
- Protected thinking signatures
- More detailed usage tracking (cache tokens, etc.)
-
---
-
-## 7. Headers & Authentication
-
-### AISBF: [`_get_auth_headers()`](aisbf/providers.py:2331)
-
-**Includes:**
- OAuth2 Bearer token
- `Anthropic-Version: 2023-06-01`
- `Anthropic-Beta` with multiple beta features
- `X-App: cli` and other stainless headers
-
-### vendors/kilocode: [`CUSTOM_LOADERS.anthropic()`](vendors/kilocode/packages/opencode/src/provider/provider.ts:125)
-
-**Includes:**
- API key via `options.apiKey` or auth system
- `anthropic-beta: claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14`
- Custom fetch wrapper with timeout handling
- TODO comment for adaptive thinking headers: `adaptive-thinking-2026-01-28,effort-2025-11-24,max-effort-2026-01-24`
-
-### vendors/claude: Internal OAuth2
-
-**Includes:**
- Internal OAuth2 session management
- Internal SDK routing
-
---
-
-## 8. Model Name Resolution
-
-### AISBF: Direct API call
- Queries `https://api.anthropic.com/v1/models`
- Uses model names as returned by API
-
-### vendors/kilocode: Model ID passthrough
- Uses model IDs from models.dev database
- Model variants generated via [`variants()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:381) for reasoning efforts
- Model sorting via [`sort()`](vendors/kilocode/packages/opencode/src/provider/provider.ts:1348) with priority: `gpt-5`, `claude-sonnet-4`, `big-pickle`, `gemini-3-pro`
- Small model selection via [`getSmallModel()`](vendors/kilocode/packages/opencode/src/provider/provider.ts:1277) with priority list including `claude-haiku-4-5`
-
-### vendors/claude: Internal SDK handling
- Uses internal model registry
- No public models endpoint
-
---
-
-## 9. Reasoning/Thinking Support
-
-### AISBF: No explicit support
- No thinking block handling in current implementation
-
-### vendors/kilocode: Full thinking support via AI SDK
-
-**Features ([`variants()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:381)):**
- **Adaptive thinking** for Opus 4.6 / Sonnet 4.6: `thinking: { type: "adaptive" }` with effort levels
- **Budget-based thinking** for other Claude models: `thinking: { type: "enabled", budgetTokens: N }`
- Effort levels: `low`, `medium`, `high`, `max`
- Temperature returns `undefined` for Claude models (let SDK decide)
-
-### vendors/claude: Full thinking support
-
-**Features:**
- Thinking block preservation during streaming
- Protected thinking block signatures
- Interleaved thinking support (`interleaved-thinking-2025-05-14` beta)
-
---
-
-## 10. Prompt Caching
-
-### AISBF: No explicit support
- No cache_control headers applied
-
-### vendors/kilocode: Automatic ephemeral caching
-
-**Implementation ([`applyCaching()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:177)):**
- Applies `cacheControl: { type: "ephemeral" }` to:
-  - First 2 system messages
-  - Last 2 non-system messages
- Provider-specific cache options:
-  - Anthropic: `cacheControl: { type: "ephemeral" }`
-  - OpenRouter: `cacheControl: { type: "ephemeral" }`
-  - Bedrock: `cachePoint: { type: "default" }`
-  - OpenAI Compatible: `cache_control: { type: "ephemeral" }`
-  - GitHub Copilot: `copilot_cache_control: { type: "ephemeral" }`
-
-### vendors/claude: Internal caching
- Internal prompt caching via Anthropic API
- Message UUID tracking for cache hits
-
---
-
-## 11. Advanced Features
-
-### vendors/kilocode Exclusive Features:
-
-| Feature | Description | Location |
-|---------|-------------|----------|
-| **Reasoning Variants** | Auto-generates low/medium/high/max variants for reasoning models | [`variants()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:381) |
-| **Small Model Selection** | Automatic fallback to haiku/flash/nano models | [`getSmallModel()`](vendors/kilocode/packages/opencode/src/provider/provider.ts:1277) |
-| **Empty Content Filtering** | Removes empty messages and text/reasoning parts | [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:49) |
-| **Tool Call ID Sanitization** | Replaces non-alphanumeric chars in tool call IDs for Claude | [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:76) |
-| **Duplicate Reasoning Fix** | Removes duplicate reasoning_details from OpenRouter | [`fixDuplicateReasoning()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:256) |
-| **Provider Option Remapping** | Remaps providerOptions keys to match SDK expectations | [`message()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:318) |
-| **Gemini Schema Sanitization** | Converts integer enums to string enums for Google models | [`schema()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:954) |
-| **Unsupported Part Handling** | Converts unsupported media types to error text | [`unsupportedParts()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:217) |
-
-### vendors/claude Exclusive Features:
-
-| Feature | Description | Location |
-|---------|-------------|----------|
-| **Query loop** | Multi-turn tool execution with auto-compact, reactive compact | [`query.ts:219`](vendors/claude/src/query.ts:219) |
-| **Token budgeting** | Per-turn output token limits with continuation nudges | [`query.ts:1308`](vendors/claude/src/query.ts:1308) |
-| **Auto-compaction** | Automatic conversation summarization when context gets large | [`query.ts:454`](vendors/claude/src/query.ts:454) |
-| **Context collapse** | Granular context compression | [`query.ts:440`](vendors/claude/src/query.ts:440) |
-| **Stop hooks** | Pre/post-turn hook execution | [`query.ts:1267`](vendors/claude/src/query.ts:1267) |
-| **Memory prefetch** | Relevant memory file preloading | [`query.ts:301`](vendors/claude/src/query.ts:301) |
-| **Skill discovery** | Dynamic skill file detection | [`query.ts:331`](vendors/claude/src/query.ts:331) |
-| **Streaming tool execution** | Parallel tool execution during streaming | [`query.ts:1380`](vendors/claude/src/query.ts:1380) |
-| **Model fallback** | Automatic fallback to alternative models | [`query.ts:894`](vendors/claude/src/query.ts:894) |
-| **Task budget** | Agentic turn budget management | [`query.ts:291`](vendors/claude/src/query.ts:291) |
-
---
-
-## Summary
-
-### Strengths of AISBF implementation:
-1. Clean OAuth2 integration matching vendors/kilocode patterns
-2. Comprehensive message format conversion
-3. Good tool schema normalization
-4. Proper streaming SSE handling
-5. Robust fallback strategy for model discovery
-6. Adaptive rate limiting with learning
-
-### Strengths of vendors/kilocode implementation:
-1. **AI SDK abstraction**: Unified interface across all providers
-2. **Automatic prompt caching**: Ephemeral caching on system/last 2 messages
-3. **Full thinking support**: Adaptive thinking + budget-based thinking for Claude models
-4. **Message validation**: Empty content filtering, tool call ID sanitization
-5. **Reasoning variants**: Auto-generates effort level variants for reasoning models
-6. **Provider option remapping**: Handles provider-specific SDK key differences
-7. **Robust error handling**: Duplicate reasoning fix, unsupported part handling
-8. **Model management**: Small model selection, priority sorting
-
-### Strengths of vendors/claude implementation:
-1. **Full conversation management**: Query loop with auto-compact, reactive compact
-2. **Token budgeting**: Per-turn output limits with continuation nudges
-3. **Streaming tool execution**: Parallel tool execution during streaming
-4. **Model fallback**: Automatic fallback to alternative models
-5. **Memory prefetch**: Relevant memory file preloading
-6. **Skill discovery**: Dynamic skill file detection
-7. **Stop hooks**: Pre/post-turn hook execution
-
-### Areas for improvement (AISBF):
-1. Add thinking block support for models that use it
-2. Add tool call streaming (fine-grained-tool-streaming beta)
-3. Add more detailed usage metadata (cache tokens)
-4. Consider adding model fallback support
-5. Add tool result size validation
-6. Add message role normalization and validation
-7. Add image/multimodal support
-8. Add prompt caching (ephemeral cache on system/last 2 messages)
-9. Add tool call ID sanitization for Claude compatibility
-
-### Overall assessment:
-The AISBF Claude provider is a solid implementation that correctly handles the core API communication, message conversion, and tool handling. It appropriately focuses on the provider-level concerns (API translation) while leaving higher-level concerns (conversation management, compaction) to the rest of the framework.
-
-The vendors/kilocode implementation demonstrates the power of using the AI SDK (`@ai-sdk/anthropic`) for provider abstraction, with automatic handling of thinking, caching, and tool conversion. Its message validation pipeline (empty content filtering, ID sanitization, duplicate reasoning fix) provides robustness that AISBF could benefit from.
-
-The vendors/claude implementation is the most comprehensive, with full conversation management including auto-compaction, token budgeting, streaming tool execution, and model fallback. Many of these features are out of scope for a provider handler but represent the full Claude Code experience.
--- a/docs/claude_provider_improvement_plan.md
+++ b/docs/claude_provider_improvement_plan.md
-# AISBF Claude Provider Improvement Plan
-
-**Date:** 2026-03-31  
-**Based on:** Claude Provider Comparison ([`docs/claude_provider_comparison.md`](docs/claude_provider_comparison.md))  
-**Target:** [`aisbf/providers.py`](aisbf/providers.py:2300) - `ClaudeProviderHandler` class
-
---
-
-## Overview
-
-This plan outlines the implementation of improvements identified in the Claude provider comparison between AISBF, vendors/kilocode, and vendors/claude. The improvements are prioritized by impact and complexity.
-
---
-
-## Phase 1: Quick Wins (Low Complexity, High Impact)
-
-### 1.1 Tool Call ID Sanitization
-
-**Problem:** Claude API requires tool call IDs to contain only alphanumeric characters, underscores, and hyphens. OpenAI-style IDs may contain invalid characters.
-
-**Reference:** vendors/kilocode [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:76)
-
-**Implementation:**
- Add `_sanitize_tool_call_id()` method to `ClaudeProviderHandler`
- Replace non-alphanumeric chars (except `_` and `-`) with `_`
- Apply to all tool_call IDs in messages before sending to API
- Apply to tool_use IDs in response conversion
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
-
-**Estimated effort:** 1-2 hours
-
---
-
-### 1.2 Empty Content Filtering
-
-**Problem:** Claude API rejects messages with empty content strings or empty text/reasoning parts in array content.
-
-**Reference:** vendors/kilocode [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:49)
-
-**Implementation:**
- Add `_filter_empty_content()` method to `ClaudeProviderHandler`
- Filter out empty string messages
- Remove empty text parts from array content
- Apply during message conversion in `_convert_messages_to_anthropic()`
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
-
-**Estimated effort:** 1-2 hours
-
---
-
-### 1.3 Prompt Caching (Ephemeral)
-
-**Problem:** No cache_control headers applied, missing opportunity for cost savings via prompt caching.
-
-**Reference:** vendors/kilocode [`applyCaching()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:177)
-
-**Implementation:**
- Add `enable_prompt_caching` config option to provider config
- Add `_apply_cache_control()` method to `ClaudeProviderHandler`
- Apply `cache_control: {"type": "ephemeral"}` to:
-  - System message (if present)
-  - Last 2 non-system messages before the final user message
- Only apply when message count > 4 (avoid overhead for short conversations)
- Add cache_control to the message content block format
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
- `aisbf/models.py` - Add cache_control field if needed
-
-**Estimated effort:** 2-3 hours
-
---
-
-## Phase 2: Core Improvements (Medium Complexity, High Impact)
-
-### 2.1 Thinking Block Support
-
-**Problem:** No thinking block handling in current implementation. Claude 3.7+ Sonnet and Opus support extended thinking.
-
-**Reference:** vendors/kilocode [`variants()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:381)
-
-**Implementation:**
- Add `enable_thinking` config option to provider config
- Add `thinking_budget_tokens` config option (default: 16000)
- Add `thinking` parameter to API request payload when enabled
- Parse thinking blocks from response content
- Add thinking content to response metadata (optional, for logging)
- Handle thinking blocks in streaming response
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
- `aisbf/models.py` - Add thinking config fields
- `config/providers.json` - Add thinking config schema
-
-**Estimated effort:** 4-6 hours
-
---
-
-### 2.2 Tool Call Streaming
-
-**Problem:** No tool call streaming. Missing `fine-grained-tool-streaming-2025-05-14` beta feature.
-
-**Reference:** vendors/kilocode beta headers + vendors/claude `StreamingToolExecutor`
-
-**Implementation:**
- Add `fine-grained-tool-streaming-2025-05-14` to Anthropic-Beta header (already partially there)
- Update streaming parser to handle `content_block_start` events with `tool_use` type
- Parse `content_block_delta` events with `input_json_delta` type
- Accumulate partial JSON and emit tool call chunks in OpenAI format
- Handle `content_block_stop` events for tool_use blocks
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler._handle_streaming_request()`
-
-**Estimated effort:** 4-6 hours
-
---
-
-### 2.3 Detailed Usage Metadata
-
-**Problem:** No cache token tracking in usage metadata. Missing cache_read_input_tokens and cache_creation_input_tokens.
-
-**Reference:** vendors/kilocode [`session/index.ts:860`](vendors/kilocode/packages/opencode/src/session/index.ts:860)
-
-**Implementation:**
- Extract `cache_read_input_tokens` from Claude API response usage
- Extract `cache_creation_input_tokens` from Claude API response usage
- Add to OpenAI-format response usage metadata:
-  - `cache_read_tokens`
-  - `cache_creation_tokens`
- Log cache hit/miss for analytics
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler._convert_to_openai_format()`
-
-**Estimated effort:** 1-2 hours
-
---
-
-## Phase 3: Robustness Improvements (Medium Complexity, Medium Impact)
-
-### 3.1 Message Role Normalization and Validation
-
-**Problem:** No validation of message roles or content structure before sending to API.
-
-**Reference:** vendors/kilocode [`normalizeMessages()`](vendors/kilocode/packages/opencode/src/provider/transform.ts:49)
-
-**Implementation:**
- Add `_validate_messages()` method to `ClaudeProviderHandler`
- Validate message roles are one of: user, assistant, system
- Validate system messages only appear at start
- Validate alternating user/assistant roles (after system)
- Log warnings for invalid messages instead of failing
- Add option to auto-fix common issues (e.g., consecutive user messages)
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
-
-**Estimated effort:** 2-3 hours
-
---
-
-### 3.2 Tool Result Size Validation
-
-**Problem:** No validation of tool result sizes before sending to API.
-
-**Reference:** vendors/claude [`applyToolResultBudget`](vendors/claude/src/query.ts:379)
-
-**Implementation:**
- Add `max_tool_result_chars` config option (default: 100000)
- Add `_truncate_tool_result()` method
- Truncate tool results that exceed limit with truncation notice
- Log warnings when truncation occurs
- Track cumulative tool result size per turn
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
- `aisbf/models.py` - Add max_tool_result_chars config
-
-**Estimated effort:** 2-3 hours
-
---
-
-### 3.3 Model Fallback Support
-
-**Problem:** No automatic fallback to alternative models when primary model fails.
-
-**Reference:** vendors/claude [`query.ts:894`](vendors/claude/src/query.ts:894)
-
-**Implementation:**
- Add `fallback_models` config option (list of model IDs)
- Add fallback logic to `handle_request()` method
- On specific error types (rate limit, overloaded), retry with next fallback model
- Track fallback usage for analytics
- Limit fallback attempts to prevent infinite loops
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
- `aisbf/models.py` - Add fallback_models config
-
-**Estimated effort:** 3-4 hours
-
---
-
-## Phase 4: Advanced Features (High Complexity, Medium Impact)
-
-### 4.1 Image/Multimodal Support
-
-**Problem:** No image/multimodal support in current implementation.
-
-**Reference:** vendors/kilocode AI SDK image handling
-
-**Implementation:**
- Add image content block support in `_convert_messages_to_anthropic()`
- Handle OpenAI image_url format → Anthropic image source format
- Support base64-encoded images
- Support image URLs (Claude API supports URL-based images)
- Add image validation (size, format, encoding)
- Add max image size config option
-
-**Files to modify:**
- `aisbf/providers.py` - `ClaudeProviderHandler` class
- `aisbf/models.py` - Add image config options
-
-**Estimated effort:** 4-6 hours
-
---
-
-## Implementation Order
-
-| Phase | Item | Priority | Effort | Dependencies |
-|-------|------|----------|--------|--------------|
-| 1 | 1.1 Tool Call ID Sanitization | High | 1-2h | None |
-| 1 | 1.2 Empty Content Filtering | High | 1-2h | None |
-| 1 | 1.3 Prompt Caching | High | 2-3h | None |
-| 2 | 2.3 Detailed Usage Metadata | High | 1-2h | None |
-| 2 | 2.1 Thinking Block Support | High | 4-6h | None |
-| 2 | 2.2 Tool Call Streaming | Medium | 4-6h | 2.1 |
-| 3 | 3.1 Message Role Validation | Medium | 2-3h | None |
-| 3 | 3.2 Tool Result Size Validation | Medium | 2-3h | None |
-| 3 | 3.3 Model Fallback Support | Low | 3-4h | None |
-| 4 | 4.1 Image/Multimodal Support | Low | 4-6h | None |
-
-**Total estimated effort:** 24-37 hours
-
---
-
-## Testing Strategy
-
-For each improvement:
-
-1. **Unit tests:** Test the new method/function in isolation
-2. **Integration tests:** Test with mock Claude API responses
-3. **End-to-end tests:** Test with actual Claude API (using test credentials)
-4. **Regression tests:** Ensure existing functionality still works
-
-### Test Files to Create/Update:
- `tests/test_claude_provider.py` - Main test file for ClaudeProviderHandler
- `tests/test_claude_streaming.py` - Streaming-specific tests
- `tests/test_claude_tools.py` - Tool handling tests
- `tests/test_claude_messages.py` - Message conversion tests
-
---
-
-## Configuration Changes
-
-### New config options to add to `config/providers.json`:
-
-```json
-{
-  "claude_config": {
-    "enable_thinking": true,
-    "thinking_budget_tokens": 16000,
-    "enable_prompt_caching": true,
-    "max_tool_result_chars": 100000,
-    "fallback_models": [],
-    "max_image_size_bytes": 5242880
-  }
-}
-```
-
---
-
-## Risk Assessment
-
-| Item | Risk | Mitigation |
-|------|------|------------|
-| Tool Call ID Sanitization | Low - purely additive change | Unit tests for edge cases |
-| Empty Content Filtering | Low - filters invalid data | Unit tests for edge cases |
-| Prompt Caching | Medium - may affect response quality | Config option, default off |
-| Thinking Block Support | Medium - new API feature | Config option, beta header required |
-| Tool Call Streaming | Medium - complex parsing | Extensive streaming tests |
-| Usage Metadata | Low - additive change | Unit tests |
-| Message Validation | Low - validation only | Unit tests for invalid inputs |
-| Tool Result Size | Low - truncation only | Config option, clear truncation notice |
-| Model Fallback | Medium - may increase costs | Config option, limited attempts |
-| Image Support | Medium - new feature | Config option, size limits |
-
---
-
-## Rollout Plan
-
-1. **Week 1:** Phase 1 items (Tool Call ID, Empty Content, Prompt Caching)
-2. **Week 2:** Phase 2 items (Thinking, Streaming, Usage Metadata)
-3. **Week 3:** Phase 3 items (Validation, Size Limits, Fallback)
-4. **Week 4:** Phase 4 items (Image Support) + Testing + Documentation
-
-Each phase should be:
- Implemented
- Tested
- Reviewed
- Merged to main
- Deployed to staging
- Validated before proceeding to next phase
--- a/docs/claude_provider_improvements.md
+++ b/docs/claude_provider_improvements.md
-# Claude Provider: Improvements & SDK Migration Analysis
-
-**Date:** 2026-04-01  
-**Author:** AI Assistant
-
---
-
-## Executive Summary
-
-This document analyzes potential improvements for the AISBF Claude provider and evaluates the trade-offs of migrating from direct HTTP (`httpx`) to the official Anthropic Python SDK.
-
---
-
-## 1. Current Architecture Assessment
-
-### What We Do Well:
- **Direct HTTP control**: Full control over request/response lifecycle
- **OAuth2 integration**: Custom auth flow matching Claude Code's OAuth2
- **Streaming SSE parsing**: Manual SSE parsing gives fine-grained control
- **OpenAI format conversion**: Complete OpenAI ↔ Anthropic translation
- **Fallback retry logic**: Model fallback with exponential backoff
-
-### Current Limitations:
- Manual message format conversion (error-prone)
- No automatic retry on transient errors
- Missing advanced SDK features (automatic token counting, etc.)
- Temperature/thinking conflict handling (just fixed)
-
---
-
-## 2. Recommended Improvements (Without SDK Migration)
-
-### 2.1 Message Validation Pipeline
-**Priority:** HIGH  
-**Effort:** MEDIUM
-
-Implement a comprehensive message validation pipeline similar to vendors/kilocode:
-
-```python
-def validate_and_normalize_messages(self, messages: List[Dict]) -> List[Dict]:
-    """Complete message validation pipeline."""
-    # 1. Empty content filtering
-    messages = self._filter_empty_content_blocks(messages)
-    
-    # 2. Tool call ID sanitization
-    messages = self._sanitize_tool_call_ids(messages)
-    
-    # 3. Role alternation enforcement
-    messages = self._ensure_alternating_roles(messages)
-    
-    # 4. Tool result pairing
-    messages = self._ensure_tool_result_pairing(messages)
-    
-    # 5. Thinking block preservation
-    messages = self._preserve_thinking_blocks(messages)
-    
-    # 6. Media limit enforcement (100 items max)
-    messages = self._enforce_media_limits(messages)
-    
-    return messages
-```
-
-**Benefits:**
- Prevents 400 errors from malformed messages
- Matches vendors/kilocode robustness
- Reduces API rejection rate
-
-### 2.2 Automatic Retry with Exponential Backoff
-**Priority:** HIGH  
-**Effort:** LOW
-
-Add automatic retry for transient errors (529, 503, rate limits):
-
-```python
-async def _request_with_retry(self, api_url, payload, headers, max_retries=3):
-    """Request with automatic retry and exponential backoff."""
-    for attempt in range(max_retries):
-        try:
-            response = await self.client.post(api_url, headers=headers, json=payload)
-            
-            if response.status_code == 429:
-                wait_time = self._parse_retry_after(response.headers)
-                await asyncio.sleep(wait_time)
-                continue
-            
-            if response.status_code in (529, 503):
-                wait_time = min(2 ** attempt + random.uniform(0, 1), 30)
-                await asyncio.sleep(wait_time)
-                continue
-            
-            return response
-            
-        except httpx.TimeoutException:
-            if attempt < max_retries - 1:
-                await asyncio.sleep(2 ** attempt)
-                continue
-            raise
-```
-
-**Benefits:**
- Handles transient overload errors automatically
- Respects `x-should-retry: true` header
- Reduces user-facing errors
-
-### 2.3 Temperature/Thinking Conflict Resolution
-**Priority:** HIGH (ALREADY FIXED)  
-**Effort:** DONE
-
-Fixed in commit 2559e2f - skip temperature 0.0 when thinking beta is active.
-
-### 2.4 Streaming Idle Watchdog
-**Priority:** MEDIUM  
-**Effort:** LOW
-
-Add timeout detection for hung streams (matching vendors/claude):
-
-```python
-STREAM_IDLE_TIMEOUT = 90.0  # seconds
-
-async def _stream_with_watchdog(self, response):
-    """Stream with idle timeout detection."""
-    last_event_time = time.time()
-    
-    async for line in response.aiter_lines():
-        if time.time() - last_event_time > STREAM_IDLE_TIMEOUT:
-            raise TimeoutError(f"Stream idle for {STREAM_IDLE_TIMEOUT}s")
-        last_event_time = time.time()
-        yield line
-```
-
-**Benefits:**
- Detects hung connections quickly
- Prevents indefinite hangs
- Matches vendors/claude behavior
-
-### 2.5 Token Counting and Context Management
-**Priority:** MEDIUM  
-**Effort:** MEDIUM
-
-Add automatic token counting for context window management:
-
-```python
-def _count_tokens(self, messages: List[Dict], model: str) -> int:
-    """Count tokens in messages for context window management."""
-    # Use tiktoken or anthropic's token counting
-    # Track cumulative token usage
-    # Warn when approaching context limits
-    pass
-```
-
-**Benefits:**
- Prevents context window exceeded errors
- Enables automatic compaction decisions
- Better resource management
-
-### 2.6 Cache Token Tracking
-**Priority:** LOW  
-**Effort:** LOW
-
-Track cache hit/miss rates for analytics:
-
-```python
-def _track_cache_usage(self, usage: Dict):
-    """Track prompt cache usage for analytics."""
-    cache_read = usage.get('cache_read_input_tokens', 0)
-    cache_creation = usage.get('cache_creation_input_tokens', 0)
-    
-    if cache_read > 0:
-        self.cache_hits += 1
-        self.cache_tokens_read += cache_read
-    if cache_creation > 0:
-        self.cache_misses += 1
-        self.cache_tokens_created += cache_creation
-```
-
---
-
-## 3. SDK Migration Analysis
-
-### 3.1 Official Anthropic Python SDK
-
-**Package:** `anthropic` (already in requirements.txt)  
-**Current Usage:** Only for `AnthropicProviderHandler`, not for `ClaudeProviderHandler`
-
-#### Pros of SDK Migration:
-
-1. **Automatic Message Validation**
-   - SDK validates messages before sending
-   - Catches format errors early
-   - Reduces 400 errors
-
-2. **Built-in Retry Logic**
-   - SDK has automatic retry for transient errors
-   - Configurable retry strategies
-   - Handles rate limits gracefully
-
-3. **Token Counting**
-   - SDK can count tokens automatically
-   - No need for external token counting
-   - Accurate token usage tracking
-
-4. **Streaming Abstraction**
-   - SDK handles SSE parsing internally
-   - Cleaner streaming code
-   - Automatic event type handling
-
-5. **Type Safety**
-   - Pydantic models for all request/response types
-   - Better IDE support
-   - Compile-time error detection
-
-6. **Future-Proof**
-   - SDK updates with new API features
-   - Less maintenance burden
-   - Official support from Anthropic
-
-#### Cons of SDK Migration:
-
-1. **OAuth2 Token Handling**
-   - SDK expects API keys, not OAuth2 tokens
-   - May need custom auth implementation
-   - Current direct HTTP works well with OAuth2
-
-2. **Loss of Fine-Grained Control**
-   - SDK abstracts away some control
-   - Custom headers may be harder to set
-   - Beta header management through SDK
-
-3. **Dependency on SDK Version**
-   - SDK updates may break compatibility
-   - Need to track SDK releases
-   - Potential breaking changes
-
-4. **Streaming Differences**
-   - SDK streaming uses different abstraction
-   - May need to rewrite streaming logic
-   - Current SSE parsing works well
-
-### 3.2 Hybrid Approach (Recommended)
-
-Use SDK for non-streaming requests, keep direct HTTP for streaming:
-
-```python
-class ClaudeProviderHandler(BaseProviderHandler):
-    def __init__(self, ...):
-        # SDK client for non-streaming
-        self.sdk_client = Anthropic(
-            api_key=self._get_oauth_token(),
-            base_url="https://api.anthropic.com"
-        )
-        # HTTP client for streaming
-        self.http_client = httpx.AsyncClient(...)
-    
-    async def handle_request(self, ..., stream=False):
-        if stream:
-            return await self._handle_streaming_http(...)
-        else:
-            return await self._handle_non_streaming_sdk(...)
-```
-
-**Benefits:**
- Best of both worlds
- SDK validation for non-streaming
- Full control for streaming
- Gradual migration path
-
---
-
-## 4. Implementation Priority
-
-### Phase 1: Quick Wins (1-2 days)
-1. ✅ Temperature/thinking conflict fix (DONE)
-2. Automatic retry with exponential backoff
-3. Streaming idle watchdog
-
-### Phase 2: Robustness (3-5 days)
-4. Message validation pipeline
-5. Token counting and context management
-6. Cache token tracking
-
-### Phase 3: SDK Evaluation (1-2 weeks)
-7. Prototype SDK integration for non-streaming
-8. Compare error rates and performance
-9. Decide on full migration or hybrid approach
-
---
-
-## 5. Recommendation
-
-**Do NOT migrate to SDK immediately.** Instead:
-
-1. **Implement the quick wins first** - These provide immediate value with minimal effort
-2. **Build the message validation pipeline** - This addresses the most common error source
-3. **Evaluate SDK after Phase 2** - Once our implementation is robust, evaluate if SDK adds value
-
-**Rationale:**
- Our direct HTTP approach gives us full control over OAuth2
- We've already implemented most SDK features manually
- SDK migration would be a significant rewrite with uncertain benefits
- The hybrid approach adds complexity without clear advantages
-
-**When to reconsider SDK:**
- If Anthropic adds features we can't easily implement manually
- If SDK becomes the only way to access new API features
- If maintenance burden of manual implementation becomes too high
-
---
-
-## 6. Comparison: Our Implementation vs SDK
-
-| Feature | Our Implementation | SDK | Gap |
-|---------|-------------------|-----|-----|
-| Message Validation | Manual (Phase 2) | Automatic | Medium |
-| Retry Logic | Manual fallback | Built-in | Low |
-| Token Counting | External | Built-in | Medium |
-| Streaming | Manual SSE | SDK abstraction | Low |
-| OAuth2 Support | Custom | Requires workaround | High |
-| Type Safety | Dict-based | Pydantic models | Medium |
-| Beta Headers | Manual | SDK config | Low |
-| Error Handling | Custom | SDK exceptions | Low |
-
-**Overall Assessment:** Our implementation is 80% as robust as SDK, with better OAuth2 support. The remaining 20% can be achieved with the recommended improvements without SDK migration.
--- a/main.py
+++ b/main.py
@@ -5699,12 +5699,13 @@ async def dashboard_claude_auth_start(request: Request):
        
        # Build OAuth2 URL (Claude requires full scope set)
        auth_params = {
+            "code": "true",
            "client_id": auth.CLIENT_ID,
            "response_type": "code",
            "code_challenge": challenge,
            "code_challenge_method": "S256",
            "redirect_uri": auth.REDIRECT_URI,
-            "scope": "user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
+            "scope": "org:create_api_key user:profile user:inference user:sessions:claude_code user:mcp_servers user:file_upload",
            "state": state
        }
        auth_url = f"{auth.AUTH_URL}?{'&'.join(f'{k}={v}' for k, v in auth_params.items())}"

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "aisbf"
-version = "0.9.1"
+version = "0.9.2"
 description = "AISBF - AI Service Broker Framework || AI Should Be Free - A modular proxy server for managing multiple AI provider integrations"
 readme = "README.md"
 license = "GPL-3.0-or-later"