- 08 Feb, 2026 16 commits
-
-
Stefy Lanza (nextime / spora ) authored
- Detect 'tool: {...}' pattern in Google model text responses
- Parse and convert to OpenAI-compatible tool_calls format
- Extract assistant text from 'assistant: [...]' format if present
- Handle both 'action' and 'name' fields for tool identification
- Convert arguments to JSON string for OpenAI compatibility

This fixes issues where models return tool calls as text instead of using proper function_call attributes.
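A minimal sketch of this kind of detection, assuming the `tool: {...}` text shape described above; the `extract_tool_calls` helper name and regex are illustrative, not the project's actual implementation:

```python
import json
import re
import uuid

def extract_tool_calls(text: str):
    """Detect a 'tool: {...}' pattern in model text and convert it to an
    OpenAI-compatible tool_calls list. Returns (tool_calls, remaining_text)."""
    match = re.search(r"tool:\s*(\{.*\})", text, re.DOTALL)
    if not match:
        return [], text
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return [], text
    # Some models use 'action' for the tool name, others use 'name'.
    name = payload.get("action") or payload.get("name")
    if not name:
        return [], text
    tool_calls = [{
        "id": f"call_{uuid.uuid4().hex[:12]}",
        "type": "function",
        "function": {
            "name": name,
            # OpenAI expects arguments as a JSON string, not a dict.
            "arguments": json.dumps(payload.get("arguments", {})),
        },
    }]
    # Keep only the assistant text that preceded the tool call.
    return tool_calls, text[:match.start()].strip()

calls, rest = extract_tool_calls(
    'Sure. tool: {"action": "search", "arguments": {"q": "cats"}}'
)
```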
Stefy Lanza (nextime / spora ) authored
- Google provider now yields raw chunk objects instead of pre-formatted SSE bytes
- handlers.py now handles the conversion to OpenAI-compatible format
- This fixes the issue where clients weren't receiving streaming responses

Note: the server must be restarted to pick up this change
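The handler-side conversion might look roughly like this: raw chunk dicts from a provider get wrapped into server-sent-event frames. A dependency-free sketch (function and chunk names are assumptions):

```python
import asyncio
import json

async def sse_from_chunks(chunks):
    """Convert raw chunk dicts (as a provider handler might yield them)
    into OpenAI-style SSE frames: 'data: {json}\n\n', ending with [DONE]."""
    async for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n".encode()
    yield b"data: [DONE]\n\n"

async def demo():
    async def raw_chunks():
        yield {"choices": [{"delta": {"content": "hi"}}]}
    return [frame async for frame in sse_from_chunks(raw_chunks())]

frames = asyncio.run(demo())
```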
-
Stefy Lanza (nextime / spora ) authored
- Import count_messages_tokens from utils module
- Fixes 'name count_messages_tokens is not defined' error in Google streaming handler
-
Stefy Lanza (nextime / spora ) authored
- Add 'condensation' section to providers.json for specifying a dedicated provider/model
- Add CondensationConfig model to config.py
- Add _load_condensation() and get_condensation() methods
- Update ContextManager to use the dedicated condensation handler when configured
- Update handlers to pass the condensation config to ContextManager
- Allows using a smaller/faster model for context condensation operations

This addresses the issue where the conversational and semantic condensation methods were using the same model as the main request, which was inefficient. Users can now configure a dedicated provider and model for condensation operations, typically a smaller/faster model, to reduce costs and improve performance.
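The shape of such a config might look like the sketch below; a stdlib dataclass stands in for the project's Pydantic `CondensationConfig`, and the field names are assumptions rather than the project's exact schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CondensationConfig:
    """Stand-in (stdlib dataclass) for a Pydantic CondensationConfig;
    field names are illustrative assumptions."""
    provider: str
    model: str
    max_tokens: Optional[int] = None

# A matching 'condensation' section in providers.json could look like:
raw = {"condensation": {"provider": "openai", "model": "gpt-4o-mini"}}
cfg = CondensationConfig(**raw["condensation"])
```

A small, fast model configured here handles only the condensation calls, while the main request keeps its originally selected model.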
-
Stefy Lanza (nextime / spora ) authored
- Accumulate all streaming chunks before parsing
- Parse the complete response at the end of the stream
- Detect and convert tool calls from accumulated text content
- Fixes issue where tool calls were returned as text instead of a tool_calls structure
-
Stefy Lanza (nextime / spora ) authored
- Add logic to detect and parse tool calls from text content
- Some models return tool calls as JSON in text instead of using the function_call attribute
- Handles both Google-style ('action') and OpenAI-style ('function'/'name') tool calls
- Clears response_text when tool_calls are detected
-
Stefy Lanza (nextime / spora ) authored
- Changed all references from 'notify_errors' to 'notifyerrors' to match the RotationConfig model
- Fixes issue where the notifyerrors setting was not being properly detected
-
Stefy Lanza (nextime / spora ) authored
- Add notifyerrors field with default value False
- Fixes issue where notifyerrors was always detected as False
- Allows rotation to return the error as a normal message instead of HTTP 503
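Taken together with the getattr fix below, the config side of this might be sketched as follows (stdlib dataclass standing in for the Pydantic model, only the discussed field shown):

```python
from dataclasses import dataclass

@dataclass
class RotationConfig:
    """Stdlib stand-in for the Pydantic RotationConfig; only the field
    discussed in this commit is shown."""
    notifyerrors: bool = False  # return errors as messages instead of HTTP 503

cfg = RotationConfig()
# Because this is a model object rather than a dict, attribute access
# (or getattr with a default) is the safe way to read the setting.
enabled = getattr(cfg, "notifyerrors", False)
```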
-
Stefy Lanza (nextime / spora ) authored
- Get the stream parameter from request_data to determine the response type
- Return a StreamingResponse if the original request was streaming
- Return a dict if the original request was non-streaming
- Fixes notifyerrors not working for streaming requests in the retry-exhausted case
-
Stefy Lanza (nextime / spora ) authored
- Add stream parameter to handle_rotation_request()
- When notifyerrors is enabled, return a StreamingResponse if the original request was streaming
- Return a dict if the original request was non-streaming
- Fixes issue where the autoselect handler expects a StreamingResponse but was getting a dict
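The branch logic might be sketched like this; the function name is hypothetical, and a plain generator stands in for the `StreamingResponse` body to keep the sketch dependency-free:

```python
import json

def build_error_response(message: str, stream: bool):
    """When notifyerrors is enabled, shape the error like a normal
    completion. For streaming requests the caller would wrap chunks()
    in a StreamingResponse; shown here as a plain generator."""
    if not stream:
        # Non-streaming callers get a normal completion-shaped dict.
        return {"choices": [{"message": {"role": "assistant", "content": message}}]}

    def chunks():
        delta = {"choices": [{"delta": {"content": message}}]}
        yield f"data: {json.dumps(delta)}\n\n"
        yield "data: [DONE]\n\n"

    return chunks()

resp = build_error_response("all providers exhausted", stream=False)
```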
-
Stefy Lanza (nextime / spora ) authored
- RotationConfig is a Pydantic model, not a dictionary
- Use getattr() to safely access the notifyerrors attribute
- Fixes AttributeError when accessing rotation_config attributes
-
Stefy Lanza (nextime / spora ) authored
- Add 'notifyerrors' field to rotation configuration (default: false)
- When enabled, return errors as normal messages instead of HTTP 503
- Allows clients to consume error messages normally without HTTP errors
- Update handlers.py to check the notifyerrors setting and return the appropriate response
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add detailed error information including provider status
- Show remaining cooldown time for rate-limited providers
- Display failure counts for each provider
- Provide a structured error response with rotation_id, attempted models, and details
-
Stefy Lanza (nextime / spora ) authored
- Move the RotationHandler import inside the method to avoid a circular dependency
- Import RotationHandler only when needed in ContextManager.__init__
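The deferred-import pattern looks like this; the module path is hypothetical, but the technique is standard:

```python
class ContextManager:
    """Sketch of breaking a circular import by deferring it: if the
    rotation module imports ContextManager at module level, ContextManager
    must import RotationHandler lazily instead of at module top."""

    def __init__(self, use_rotation: bool = False):
        self.rotation_handler = None
        if use_rotation:
            # Imported here, not at the top of the module, so the two
            # modules can reference each other without a cycle at load time.
            from aisbf.rotation import RotationHandler  # hypothetical path
            self.rotation_handler = RotationHandler()

cm = ContextManager()
```

The import only runs when a rotation-based condensation handler is actually requested, by which point both modules are fully initialized.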
-
Stefy Lanza (nextime / spora ) authored
- Add 'condensation' section to providers.json for a dedicated provider/model
- Support rotation-based condensation by specifying a rotation ID in the model field
- Update ContextManager to use the dedicated condensation handler
- Update handlers to pass the condensation configuration
- Bump version to 0.3.2
-
- 07 Feb, 2026 13 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Pass effective_context as a parameter to the stream_generator functions
- Update the _create_streaming_response signature to accept effective_context
- Update all calls to _create_streaming_response to pass effective_context
- Track accumulated response text for token counting in streaming
- Calculate completion tokens for Google responses (since Google doesn't provide them)
- Calculate completion tokens for non-Google providers when they don't provide token counts
- Include prompt_tokens, completion_tokens, total_tokens, and effective_context in the final chunk
- Fixes 'name effective_context is not defined' error in streaming responses
- Fixes issue where streaming responses had null token counts
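Computing usage for the final chunk when the provider reports no completion tokens might be sketched like this; `final_usage` and the `count_tokens` callback are hypothetical names:

```python
def final_usage(prompt_tokens: int, accumulated_text: str, count_tokens):
    """Build the usage payload for the final streaming chunk when the
    provider doesn't report completion tokens. count_tokens is a
    hypothetical tokenizer callback applied to the accumulated text."""
    completion = count_tokens(accumulated_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion,
        "total_tokens": prompt_tokens + completion,
    }

# Whitespace-split stands in for a real tokenizer in this sketch.
usage = final_usage(12, "hello world", count_tokens=lambda t: len(t.split()))
```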
-
Stefy Lanza (nextime / spora ) authored
- Enable PRAGMA journal_mode=WAL for better concurrent access
- Set PRAGMA busy_timeout=5000 (5 seconds) for concurrent access
- WAL mode allows multiple readers and one writer simultaneously
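These two pragmas are issued per connection; a minimal sketch (the `connect` helper name is an assumption):

```python
import os
import sqlite3
import tempfile

def connect(path: str) -> sqlite3.Connection:
    """Open a SQLite connection tuned for concurrent access: WAL allows
    multiple readers alongside a single writer, and busy_timeout makes a
    blocked writer wait up to 5 seconds instead of failing immediately."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    return conn

# WAL requires a file-backed database (an in-memory DB reports 'memory').
db_path = os.path.join(tempfile.mkdtemp(), "aisbf.db")
conn = connect(db_path)
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Note that `journal_mode=WAL` is persistent in the database file, while `busy_timeout` must be set again on every new connection.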
-
Stefy Lanza (nextime / spora ) authored
- Import initialize_database from aisbf.database
- Call initialize_database() in main() to create/recreate the database
- Clean up old token usage records to prevent database bloat
-
Stefy Lanza (nextime / spora ) authored
- Create aisbf/database.py with a DatabaseManager class
- Track context dimensions (context_size, condense_context, condense_method, effective_context)
- Track token usage for rate limiting (TPM, TPH, TPD)
- Auto-create the database at ~/.aisbf/aisbf.db if it doesn't exist
- Clean up old token usage records to prevent database bloat
- Export the database module in __init__.py
- Update setup.py to include database.py in package data
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add context_size, condense_context, and condense_method fields to the Model class
- Create a new context.py module with ContextManager and condensation methods
- Implement hierarchical, conversational, semantic, and algorithmic condensation
- Calculate and report effective_context for all requests
- Update handlers.py to apply context condensation when configured
- Update providers.json and rotations.json with example context configurations
- Update README.md and DOCUMENTATION.md with context management documentation
- Export the context module and utilities in __init__.py
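As a rough illustration of what a condensation method does to a message list, here is a truncation-style sketch; the function name and the "keep system messages plus the most recent turns" policy are assumptions, not the project's actual algorithms:

```python
def condense_truncate(messages, max_messages: int):
    """Illustrative condensation sketch (an assumption about the actual
    algorithms): preserve system messages and the most recent turns,
    dropping the middle of the conversation to fit the context budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

history = (
    [{"role": "system", "content": "be terse"}]
    + [{"role": "user", "content": f"msg {i}"} for i in range(10)]
)
condensed = condense_truncate(history, max_messages=3)
```

The conversational and semantic variants would instead summarize the dropped middle via a model call, which is why a dedicated condensation provider/model is useful.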
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 06 Feb, 2026 11 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Return Google's synchronous iterator directly from the provider handler
- Detect Google streaming responses by checking for __iter__ but not __aiter__
- Convert Google chunks to OpenAI format in stream_generator
- Handle both sync (Google) and async (OpenAI/Anthropic) streaming responses
- Fix 'async_generator object is not iterable' error

This fixes streaming requests through the autoselect and rotation handlers, which were failing with an "'async_generator' object is not iterable" error.
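The `__iter__`/`__aiter__` check and the unified consumption loop might be sketched like this (helper names are assumptions):

```python
import asyncio

def is_sync_stream(obj) -> bool:
    """Google's SDK streams are plain synchronous iterators, while the
    OpenAI/Anthropic clients return async iterators; the presence of
    __iter__ without __aiter__ identifies the sync case."""
    return hasattr(obj, "__iter__") and not hasattr(obj, "__aiter__")

async def iterate_any(stream):
    """Yield chunks from either a sync or an async stream uniformly."""
    if is_sync_stream(stream):
        for chunk in stream:
            yield chunk
    else:
        async for chunk in stream:
            yield chunk

async def demo():
    async def async_stream():
        yield "a"
    sync_chunks = [c async for c in iterate_any(iter(["x", "y"]))]
    async_chunks = [c async for c in iterate_any(async_stream())]
    return sync_chunks + async_chunks

chunks = asyncio.run(demo())
```

One caveat of this approach: the synchronous loop blocks the event loop while Google produces each chunk, which is acceptable for a sketch but worth offloading in production.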
-
Stefy Lanza (nextime / spora ) authored
- Keep stream_generator as an async function (not sync)
- Wrap Google's synchronous iterator in an async generator
- Properly structure the if/else for streaming vs non-streaming paths
- Fix 'client has been closed' error in streaming responses

This fixes the issue where streaming requests through autoselect were failing with a 'Cannot send a request, as a client has been closed' error.
-
Stefy Lanza (nextime / spora ) authored
- Ensure the complete chunk object is yielded as a single unit
- Add logging to show the complete chunk structure
- Fix issue where the chunk was being serialized as separate fields
- Maintain the OpenAI-compatible chat.completion.chunk format

This should fix the streaming issue where chunks were being serialized as separate 'data:' lines instead of complete JSON objects.
-
Stefy Lanza (nextime / spora ) authored
- Use generate_content_stream() for streaming requests
- Create an async generator that yields OpenAI-compatible chunks
- Extract text from each stream chunk
- Generate unique chunk IDs
- Format chunks as chat.completion.chunk objects
- Include delta content in each chunk
- Maintain non-streaming functionality for regular requests

This fixes the streaming issue where Google GenAI was returning a dict instead of an iterable, causing 'JSONResponse object is not iterable' errors.
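An OpenAI-compatible streaming chunk built from one piece of extracted text might look like the sketch below; the helper name and exact field values are illustrative:

```python
import time
import uuid

def make_chunk(text: str, model: str):
    """Build an OpenAI-style chat.completion.chunk object from one piece
    of text extracted from a provider's stream chunk."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": text}, "finish_reason": None}
        ],
    }

chunk = make_chunk("Hello", "gemini-pro")
```

Each such dict is then serialized as a single `data:` line, which is exactly what the later "yield chunk as a single unit" fix above enforces.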
-
Stefy Lanza (nextime / spora ) authored
- Test non-streaming requests to the autoselect endpoint
- Test streaming requests to the autoselect endpoint
- Test listing available providers
- Test listing models for the autoselect endpoint
- Use model 'autoselect' for the autoselect endpoint
- Include jq installation instructions for formatted output

Run with: ./test_proxy.sh
-
Stefy Lanza (nextime / spora ) authored
- Remove ChatCompletionResponse validation from GoogleProviderHandler
- Remove ChatCompletionResponse validation from AnthropicProviderHandler
- Return the raw response dict directly
- Add logging to show the response dict keys
- This tests whether Pydantic validation was causing serialization issues

Testing whether removing validation fixes client-side 'Cannot read properties of undefined' errors.
-
Stefy Lanza (nextime / spora ) authored
GoogleProviderHandler:
- Wrap the validated response dict in a JSONResponse before returning
- Add logging to confirm a JSONResponse is being returned
- Ensures proper JSON serialization for Google GenAI responses

AnthropicProviderHandler:
- Wrap the validated response dict in a JSONResponse before returning
- Add logging to confirm a JSONResponse is being returned
- Ensures proper JSON serialization for Anthropic responses

RequestHandler:
- Remove JSONResponse wrapping (now handled by the providers)
- Update logging to detect JSONResponse vs dict responses
- OpenAI and Ollama providers return raw dicts (already compatible)

This fixes client-side 'Cannot read properties of undefined' errors by ensuring Google and Anthropic responses are properly serialized as JSONResponse, while leaving OpenAI and Ollama responses as-is since they're already OpenAI-compatible.
-