- 08 Feb, 2026 16 commits
-
-
Stefy Lanza (nextime / spora ) authored
- Detect 'tool: {...}' pattern in Google model text responses
- Parse and convert to OpenAI-compatible tool_calls format
- Extract assistant text from 'assistant: [...]' format if present
- Handle both 'action' and 'name' fields for tool identification
- Convert arguments to JSON string for OpenAI compatibility

This fixes issues where models return tool calls as text instead of using proper function_call attributes.
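A minimal sketch of this kind of detection, assuming the `tool: {...}` text shape described above; the `extract_tool_calls` helper name and regex are illustrative, not the project's actual implementation:

```python
import json
import re
import uuid

def extract_tool_calls(text: str):
    """Detect a 'tool: {...}' pattern in model text and convert it to an
    OpenAI-compatible tool_calls list. Returns (tool_calls, remaining_text)."""
    match = re.search(r"tool:\s*(\{.*\})", text, re.DOTALL)
    if not match:
        return [], text
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return [], text
    # Some models use 'action' for the tool name, others use 'name'.
    name = payload.get("action") or payload.get("name")
    if not name:
        return [], text
    tool_calls = [{
        "id": f"call_{uuid.uuid4().hex[:12]}",
        "type": "function",
        "function": {
            "name": name,
            # OpenAI expects arguments as a JSON string, not a dict.
            "arguments": json.dumps(payload.get("arguments", {})),
        },
    }]
    # Keep only the assistant text that preceded the tool call.
    return tool_calls, text[:match.start()].strip()

calls, rest = extract_tool_calls(
    'Sure. tool: {"action": "search", "arguments": {"q": "cats"}}'
)
```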
Stefy Lanza (nextime / spora ) authored
- Google provider now yields raw chunk objects instead of pre-formatted SSE bytes
- handlers.py now handles the conversion to OpenAI-compatible format
- This fixes the issue where clients weren't receiving streaming responses

Note: the server must be restarted to pick up this change
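The handler-side conversion might look roughly like this: raw chunk dicts from a provider get wrapped into server-sent-event frames. A dependency-free sketch (function and chunk names are assumptions):

```python
import asyncio
import json

async def sse_from_chunks(chunks):
    """Convert raw chunk dicts (as a provider handler might yield them)
    into OpenAI-style SSE frames: 'data: {json}\n\n', ending with [DONE]."""
    async for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n".encode()
    yield b"data: [DONE]\n\n"

async def demo():
    async def raw_chunks():
        yield {"choices": [{"delta": {"content": "hi"}}]}
    return [frame async for frame in sse_from_chunks(raw_chunks())]

frames = asyncio.run(demo())
```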
-
Stefy Lanza (nextime / spora ) authored
- Import count_messages_tokens from utils module
- Fixes 'name count_messages_tokens is not defined' error in Google streaming handler
-
Stefy Lanza (nextime / spora ) authored
- Add 'condensation' section to providers.json for specifying a dedicated provider/model
- Add CondensationConfig model to config.py
- Add _load_condensation() and get_condensation() methods
- Update ContextManager to use the dedicated condensation handler when configured
- Update handlers to pass the condensation config to ContextManager
- Allows using a smaller/faster model for context condensation operations

This addresses the issue where the conversational and semantic condensation methods were using the same model as the main request, which was inefficient. Users can now configure a dedicated provider and model for condensation operations, typically a smaller/faster model, to reduce costs and improve performance.
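The shape of such a config might look like the sketch below; a stdlib dataclass stands in for the project's Pydantic `CondensationConfig`, and the field names are assumptions rather than the project's exact schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CondensationConfig:
    """Stand-in (stdlib dataclass) for a Pydantic CondensationConfig;
    field names are illustrative assumptions."""
    provider: str
    model: str
    max_tokens: Optional[int] = None

# A matching 'condensation' section in providers.json could look like:
raw = {"condensation": {"provider": "openai", "model": "gpt-4o-mini"}}
cfg = CondensationConfig(**raw["condensation"])
```

A small, fast model configured here handles only the condensation calls, while the main request keeps its originally selected model.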
-
Stefy Lanza (nextime / spora ) authored
- Accumulate all streaming chunks before parsing
- Parse the complete response at the end of the stream
- Detect and convert tool calls from accumulated text content
- Fixes issue where tool calls were returned as text instead of a tool_calls structure
-
Stefy Lanza (nextime / spora ) authored
- Add logic to detect and parse tool calls from text content
- Some models return tool calls as JSON in text instead of using the function_call attribute
- Handles both Google-style ('action') and OpenAI-style ('function'/'name') tool calls
- Clears response_text when tool_calls are detected
-
Stefy Lanza (nextime / spora ) authored
- Changed all references from 'notify_errors' to 'notifyerrors' to match the RotationConfig model
- Fixes issue where the notifyerrors setting was not being properly detected
-
Stefy Lanza (nextime / spora ) authored
- Add notifyerrors field with default value False
- Fixes issue where notifyerrors was always detected as False
- Allows rotation to return the error as a normal message instead of HTTP 503
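Taken together with the getattr fix below, the config side of this might be sketched as follows (stdlib dataclass standing in for the Pydantic model, only the discussed field shown):

```python
from dataclasses import dataclass

@dataclass
class RotationConfig:
    """Stdlib stand-in for the Pydantic RotationConfig; only the field
    discussed in this commit is shown."""
    notifyerrors: bool = False  # return errors as messages instead of HTTP 503

cfg = RotationConfig()
# Because this is a model object rather than a dict, attribute access
# (or getattr with a default) is the safe way to read the setting.
enabled = getattr(cfg, "notifyerrors", False)
```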
-
Stefy Lanza (nextime / spora ) authored
- Get the stream parameter from request_data to determine the response type
- Return a StreamingResponse if the original request was streaming
- Return a dict if the original request was non-streaming
- Fixes notifyerrors not working for streaming requests in the retry-exhausted case
-
Stefy Lanza (nextime / spora ) authored
- Add stream parameter to handle_rotation_request()
- When notifyerrors is enabled, return a StreamingResponse if the original request was streaming
- Return a dict if the original request was non-streaming
- Fixes issue where the autoselect handler expects a StreamingResponse but was getting a dict
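The branch logic might be sketched like this; the function name is hypothetical, and a plain generator stands in for the `StreamingResponse` body to keep the sketch dependency-free:

```python
import json

def build_error_response(message: str, stream: bool):
    """When notifyerrors is enabled, shape the error like a normal
    completion. For streaming requests the caller would wrap chunks()
    in a StreamingResponse; shown here as a plain generator."""
    if not stream:
        # Non-streaming callers get a normal completion-shaped dict.
        return {"choices": [{"message": {"role": "assistant", "content": message}}]}

    def chunks():
        delta = {"choices": [{"delta": {"content": message}}]}
        yield f"data: {json.dumps(delta)}\n\n"
        yield "data: [DONE]\n\n"

    return chunks()

resp = build_error_response("all providers exhausted", stream=False)
```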
-
Stefy Lanza (nextime / spora ) authored
- RotationConfig is a Pydantic model, not a dictionary
- Use getattr() to safely access the notifyerrors attribute
- Fixes AttributeError when accessing rotation_config attributes
-
Stefy Lanza (nextime / spora ) authored
- Add 'notifyerrors' field to rotation configuration (default: false)
- When enabled, return errors as normal messages instead of HTTP 503
- Allows clients to consume error messages normally without HTTP errors
- Update handlers.py to check the notifyerrors setting and return the appropriate response
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add detailed error information including provider status
- Show remaining cooldown time for rate-limited providers
- Display failure counts for each provider
- Provide a structured error response with rotation_id, attempted models, and details
-
Stefy Lanza (nextime / spora ) authored
- Move the RotationHandler import inside the method to avoid a circular dependency
- Import RotationHandler only when needed in ContextManager.__init__
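The deferred-import pattern looks like this; the module path is hypothetical, but the technique is standard:

```python
class ContextManager:
    """Sketch of breaking a circular import by deferring it: if the
    rotation module imports ContextManager at module level, ContextManager
    must import RotationHandler lazily instead of at module top."""

    def __init__(self, use_rotation: bool = False):
        self.rotation_handler = None
        if use_rotation:
            # Imported here, not at the top of the module, so the two
            # modules can reference each other without a cycle at load time.
            from aisbf.rotation import RotationHandler  # hypothetical path
            self.rotation_handler = RotationHandler()

cm = ContextManager()
```

The import only runs when a rotation-based condensation handler is actually requested, by which point both modules are fully initialized.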
-
Stefy Lanza (nextime / spora ) authored
- Add 'condensation' section to providers.json for a dedicated provider/model
- Support rotation-based condensation by specifying a rotation ID in the model field
- Update ContextManager to use the dedicated condensation handler
- Update handlers to pass the condensation configuration
- Bump version to 0.3.2
-
- 07 Feb, 2026 13 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Pass effective_context as a parameter to the stream_generator functions
- Update the _create_streaming_response signature to accept effective_context
- Update all calls to _create_streaming_response to pass effective_context
- Track accumulated response text for token counting in streaming
- Calculate completion tokens for Google responses (since Google doesn't provide them)
- Calculate completion tokens for non-Google providers when they don't provide token counts
- Include prompt_tokens, completion_tokens, total_tokens, and effective_context in the final chunk
- Fixes 'name effective_context is not defined' error in streaming responses
- Fixes issue where streaming responses had null token counts
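Computing usage for the final chunk when the provider reports no completion tokens might be sketched like this; `final_usage` and the `count_tokens` callback are hypothetical names:

```python
def final_usage(prompt_tokens: int, accumulated_text: str, count_tokens):
    """Build the usage payload for the final streaming chunk when the
    provider doesn't report completion tokens. count_tokens is a
    hypothetical tokenizer callback applied to the accumulated text."""
    completion = count_tokens(accumulated_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion,
        "total_tokens": prompt_tokens + completion,
    }

# Whitespace-split stands in for a real tokenizer in this sketch.
usage = final_usage(12, "hello world", count_tokens=lambda t: len(t.split()))
```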
-
Stefy Lanza (nextime / spora ) authored
- Enable PRAGMA journal_mode=WAL for better concurrent access
- Set PRAGMA busy_timeout=5000 (5 seconds) for concurrent access
- WAL mode allows multiple readers and one writer simultaneously
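These two pragmas are issued per connection; a minimal sketch (the `connect` helper name is an assumption):

```python
import os
import sqlite3
import tempfile

def connect(path: str) -> sqlite3.Connection:
    """Open a SQLite connection tuned for concurrent access: WAL allows
    multiple readers alongside a single writer, and busy_timeout makes a
    blocked writer wait up to 5 seconds instead of failing immediately."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    return conn

# WAL requires a file-backed database (an in-memory DB reports 'memory').
db_path = os.path.join(tempfile.mkdtemp(), "aisbf.db")
conn = connect(db_path)
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Note that `journal_mode=WAL` is persistent in the database file, while `busy_timeout` must be set again on every new connection.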
-
Stefy Lanza (nextime / spora ) authored
- Import initialize_database from aisbf.database
- Call initialize_database() in main() to create/recreate the database
- Clean up old token usage records to prevent database bloat
-
Stefy Lanza (nextime / spora ) authored
- Create aisbf/database.py with a DatabaseManager class
- Track context dimensions (context_size, condense_context, condense_method, effective_context)
- Track token usage for rate limiting (TPM, TPH, TPD)
- Auto-create the database at ~/.aisbf/aisbf.db if it doesn't exist
- Clean up old token usage records to prevent database bloat
- Export the database module in __init__.py
- Update setup.py to include database.py in package data
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Add context_size, condense_context, and condense_method fields to the Model class
- Create a new context.py module with ContextManager and condensation methods
- Implement hierarchical, conversational, semantic, and algorithmic condensation
- Calculate and report effective_context for all requests
- Update handlers.py to apply context condensation when configured
- Update providers.json and rotations.json with example context configurations
- Update README.md and DOCUMENTATION.md with context management documentation
- Export the context module and utilities in __init__.py
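As a rough illustration of what a condensation method does to a message list, here is a truncation-style sketch; the function name and the "keep system messages plus the most recent turns" policy are assumptions, not the project's actual algorithms:

```python
def condense_truncate(messages, max_messages: int):
    """Illustrative condensation sketch (an assumption about the actual
    algorithms): preserve system messages and the most recent turns,
    dropping the middle of the conversation to fit the context budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

history = (
    [{"role": "system", "content": "be terse"}]
    + [{"role": "user", "content": f"msg {i}"} for i in range(10)]
)
condensed = condense_truncate(history, max_messages=3)
```

The conversational and semantic variants would instead summarize the dropped middle via a model call, which is why a dedicated condensation provider/model is useful.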
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 06 Feb, 2026 11 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Return Google's synchronous iterator directly from the provider handler
- Detect Google streaming responses by checking for __iter__ but not __aiter__
- Convert Google chunks to OpenAI format in stream_generator
- Handle both sync (Google) and async (OpenAI/Anthropic) streaming responses
- Fix 'async_generator object is not iterable' error

This fixes streaming requests through the autoselect and rotation handlers, which were failing with an "'async_generator' object is not iterable" error.
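The `__iter__`/`__aiter__` check and the unified consumption loop might be sketched like this (helper names are assumptions):

```python
import asyncio

def is_sync_stream(obj) -> bool:
    """Google's SDK streams are plain synchronous iterators, while the
    OpenAI/Anthropic clients return async iterators; the presence of
    __iter__ without __aiter__ identifies the sync case."""
    return hasattr(obj, "__iter__") and not hasattr(obj, "__aiter__")

async def iterate_any(stream):
    """Yield chunks from either a sync or an async stream uniformly."""
    if is_sync_stream(stream):
        for chunk in stream:
            yield chunk
    else:
        async for chunk in stream:
            yield chunk

async def demo():
    async def async_stream():
        yield "a"
    sync_chunks = [c async for c in iterate_any(iter(["x", "y"]))]
    async_chunks = [c async for c in iterate_any(async_stream())]
    return sync_chunks + async_chunks

chunks = asyncio.run(demo())
```

One caveat of this approach: the synchronous loop blocks the event loop while Google produces each chunk, which is acceptable for a sketch but worth offloading in production.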
-
Stefy Lanza (nextime / spora ) authored
- Keep stream_generator as an async function (not sync)
- Wrap Google's synchronous iterator in an async generator
- Properly structure the if/else for streaming vs non-streaming paths
- Fix 'client has been closed' error in streaming responses

This fixes the issue where streaming requests through autoselect were failing with a 'Cannot send a request, as a client has been closed' error.
-
Stefy Lanza (nextime / spora ) authored
- Ensure the complete chunk object is yielded as a single unit
- Add logging to show the complete chunk structure
- Fix issue where the chunk was being serialized as separate fields
- Maintain the OpenAI-compatible chat.completion.chunk format

This should fix the streaming issue where chunks were being serialized as separate 'data:' lines instead of complete JSON objects.
-
Stefy Lanza (nextime / spora ) authored
- Use generate_content_stream() for streaming requests
- Create an async generator that yields OpenAI-compatible chunks
- Extract text from each stream chunk
- Generate unique chunk IDs
- Format chunks as chat.completion.chunk objects
- Include delta content in each chunk
- Maintain non-streaming functionality for regular requests

This fixes the streaming issue where Google GenAI was returning a dict instead of an iterable, causing 'JSONResponse object is not iterable' errors.
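An OpenAI-compatible streaming chunk built from one piece of extracted text might look like the sketch below; the helper name and exact field values are illustrative:

```python
import time
import uuid

def make_chunk(text: str, model: str):
    """Build an OpenAI-style chat.completion.chunk object from one piece
    of text extracted from a provider's stream chunk."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": text}, "finish_reason": None}
        ],
    }

chunk = make_chunk("Hello", "gemini-pro")
```

Each such dict is then serialized as a single `data:` line, which is exactly what the later "yield chunk as a single unit" fix above enforces.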
-
Stefy Lanza (nextime / spora ) authored
- Test non-streaming requests to the autoselect endpoint
- Test streaming requests to the autoselect endpoint
- Test listing available providers
- Test listing models for the autoselect endpoint
- Use model 'autoselect' for the autoselect endpoint
- Include jq installation instructions for formatted output

Run with: ./test_proxy.sh
-
Stefy Lanza (nextime / spora ) authored
- Remove ChatCompletionResponse validation from GoogleProviderHandler
- Remove ChatCompletionResponse validation from AnthropicProviderHandler
- Return the raw response dict directly
- Add logging to show the response dict keys
- This tests whether Pydantic validation was causing serialization issues

Testing whether removing validation fixes client-side 'Cannot read properties of undefined' errors.
-
Stefy Lanza (nextime / spora ) authored
GoogleProviderHandler:
- Wrap the validated response dict in a JSONResponse before returning
- Add logging to confirm a JSONResponse is being returned
- Ensures proper JSON serialization for Google GenAI responses

AnthropicProviderHandler:
- Wrap the validated response dict in a JSONResponse before returning
- Add logging to confirm a JSONResponse is being returned
- Ensures proper JSON serialization for Anthropic responses

RequestHandler:
- Remove JSONResponse wrapping (now handled by the providers)
- Update logging to detect JSONResponse vs dict responses
- OpenAI and Ollama providers return raw dicts (already compatible)

This fixes client-side 'Cannot read properties of undefined' errors by ensuring Google and Anthropic responses are properly serialized as JSONResponse, while leaving OpenAI and Ollama responses as-is since they're already OpenAI-compatible.
-