Commit 673281d1 authored by Your Name's avatar Your Name

Add proxy-awareness and configurable error cooldown features

Proxy-Awareness:
- Added ProxyHeadersMiddleware to detect and handle reverse proxy headers
- Implemented get_base_url() and url_for() helper functions for proxy-aware URL generation
- Updated all RedirectResponse calls to use url_for()
- Updated templates/base.html to use url_for() for all links and fetch calls
- Added comprehensive nginx proxy configuration examples to DOCUMENTATION.md
- Supports X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Prefix, X-Forwarded-For headers
- Application now fully supports deployment behind reverse proxies with subpaths

Configurable Error Cooldown:
- Added error_cooldown field to Model class in aisbf/models.py
- Added error_cooldown to ProviderModelConfig in aisbf/config.py
- Added default_error_cooldown to ProviderConfig in aisbf/config.py
- Added default_error_cooldown to RotationConfig in aisbf/config.py
- Updated BaseProviderHandler.record_failure() in aisbf/providers.py to use cascading cooldown configuration
- Cooldown configuration follows cascading pattern: model-specific > provider default > system default (300 seconds)
- Updated DOCUMENTATION.md with error_cooldown configuration examples
- Updated AI.PROMPT with error_cooldown documentation

Both features are fully documented and ready for production use.
parent 49538169
...@@ -363,6 +363,7 @@ Settings can be specified at three levels with the following priority: ...@@ -363,6 +363,7 @@ Settings can be specified at three levels with the following priority:
- `default_context_size`: Context window size - `default_context_size`: Context window size
- `default_condense_context`: Context condensation threshold - `default_condense_context`: Context condensation threshold
- `default_condense_method`: Context condensation method - `default_condense_method`: Context condensation method
- `default_error_cooldown`: Cooldown period in seconds after 3 consecutive failures (default: 300)
### Kiro Gateway Integration ### Kiro Gateway Integration
...@@ -620,6 +621,16 @@ This AI.PROMPT file is automatically updated when significant changes are made t ...@@ -620,6 +621,16 @@ This AI.PROMPT file is automatically updated when significant changes are made t
### Recent Updates ### Recent Updates
**2026-03-23 - Configurable Error Cooldown Implementation**
- Added `error_cooldown` field to Model class in models.py for model-specific cooldown configuration
- Added `default_error_cooldown` field to ProviderConfig in config.py for provider-level defaults
- Added `default_error_cooldown` field to RotationConfig in config.py for rotation-level defaults
- Updated BaseProviderHandler.record_failure() to use cascading cooldown configuration (model > provider > system default of 300 seconds)
- Updated DOCUMENTATION.md with comprehensive error_cooldown configuration examples
- Updated AI.PROMPT with error_cooldown in supported default fields
- Providers can now have customizable cooldown periods after 3 consecutive failures instead of hardcoded 5 minutes
- Configuration follows the same cascading pattern as other settings (context_size, rate_limit, etc.)
**2026-03-23 - Proxy-Awareness Implementation** **2026-03-23 - Proxy-Awareness Implementation**
- Added ProxyHeadersMiddleware to handle reverse proxy deployments - Added ProxyHeadersMiddleware to handle reverse proxy deployments
- Implemented automatic detection of proxy headers (X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Prefix, X-Forwarded-For) - Implemented automatic detection of proxy headers (X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Prefix, X-Forwarded-For)
......
...@@ -374,10 +374,66 @@ The final chunk includes effective_context: ...@@ -374,10 +374,66 @@ The final chunk includes effective_context:
### Error Tracking ### Error Tracking
- Tracks failures per provider - Tracks failures per provider
- Disables providers after 3 consecutive failures - Disables providers after 3 consecutive failures
- 5-minute cooldown period for disabled providers - Configurable cooldown period for disabled providers (default: 5 minutes)
- Automatic re-enabling after cooldown period expires
### Configurable Error Cooldown
The cooldown period after 3 consecutive failures can be configured at multiple levels with cascading defaults:
**Configuration Levels (in order of precedence):**
1. **Model-specific**: Set `error_cooldown` in individual model configuration
2. **Provider default**: Set `default_error_cooldown` in provider configuration
3. **System default**: 300 seconds (5 minutes) if not configured
**Example Provider Configuration:**
```json
{
"providers": {
"openai": {
"id": "openai",
"name": "OpenAI",
"endpoint": "https://api.openai.com/v1",
"type": "openai",
"api_key_required": true,
"default_error_cooldown": 600,
"models": [
{
"name": "gpt-4",
"error_cooldown": 900
},
{
"name": "gpt-3.5-turbo",
"error_cooldown": 300
}
]
}
}
}
```
In this example:
- `gpt-4` will have a 900-second (15-minute) cooldown after failures
- `gpt-3.5-turbo` will have a 300-second (5-minute) cooldown
- Any other OpenAI models will use the provider default of 600 seconds (10 minutes)
- Providers without configuration will use the system default of 300 seconds
**Rotation Configuration:**
```json
{
"rotations": {
"balanced": {
"model_name": "balanced",
"default_error_cooldown": 450,
"providers": [...]
}
}
}
```
### Rate Limiting ### Rate Limiting
- Automatic provider disabling when rate limited - Automatic provider disabling when rate limited
- Intelligent parsing of 429 responses to determine wait time
- Graceful error handling - Graceful error handling
- Configurable retry behavior - Configurable retry behavior
......
...@@ -34,6 +34,7 @@ class ProviderModelConfig(BaseModel): ...@@ -34,6 +34,7 @@ class ProviderModelConfig(BaseModel):
name: str name: str
rate_limit: Optional[float] = None rate_limit: Optional[float] = None
max_request_tokens: Optional[int] = None max_request_tokens: Optional[int] = None
error_cooldown: Optional[int] = None # Cooldown period in seconds after 3 consecutive failures
class CondensationConfig(BaseModel): class CondensationConfig(BaseModel):
...@@ -64,6 +65,7 @@ class ProviderConfig(BaseModel): ...@@ -64,6 +65,7 @@ class ProviderConfig(BaseModel):
default_context_size: Optional[int] = None default_context_size: Optional[int] = None
default_condense_context: Optional[int] = None default_condense_context: Optional[int] = None
default_condense_method: Optional[Union[str, List[str]]] = None default_condense_method: Optional[Union[str, List[str]]] = None
default_error_cooldown: Optional[int] = None # Default cooldown period in seconds after 3 consecutive failures (default: 300)
class RotationConfig(BaseModel): class RotationConfig(BaseModel):
model_name: str model_name: str
...@@ -78,6 +80,7 @@ class RotationConfig(BaseModel): ...@@ -78,6 +80,7 @@ class RotationConfig(BaseModel):
default_context_size: Optional[int] = None default_context_size: Optional[int] = None
default_condense_context: Optional[int] = None default_condense_context: Optional[int] = None
default_condense_method: Optional[Union[str, List[str]]] = None default_condense_method: Optional[Union[str, List[str]]] = None
default_error_cooldown: Optional[int] = None # Default cooldown period in seconds after 3 consecutive failures (default: 300)
class AutoselectModelInfo(BaseModel): class AutoselectModelInfo(BaseModel):
model_id: str model_id: str
......
...@@ -66,6 +66,7 @@ class Model(BaseModel): ...@@ -66,6 +66,7 @@ class Model(BaseModel):
context_size: Optional[int] = None # Max context size in tokens for the model context_size: Optional[int] = None # Max context size in tokens for the model
condense_context: Optional[int] = None # Percentage (0-100) at which to condense context condense_context: Optional[int] = None # Percentage (0-100) at which to condense context
condense_method: Optional[Union[str, List[str]]] = None # Method(s) for condensation: "hierarchical", "conversational", "semantic", "algorithmic" condense_method: Optional[Union[str, List[str]]] = None # Method(s) for condensation: "hierarchical", "conversational", "semantic", "algorithmic"
error_cooldown: Optional[int] = None # Cooldown period in seconds after 3 consecutive failures (default: 300)
class Provider(BaseModel): class Provider(BaseModel):
id: str id: str
......
...@@ -392,14 +392,24 @@ class BaseProviderHandler: ...@@ -392,14 +392,24 @@ class BaseProviderHandler:
logger.warning(f"Last failure time: {self.error_tracking['last_failure']}") logger.warning(f"Last failure time: {self.error_tracking['last_failure']}")
if self.error_tracking['failures'] >= 3: if self.error_tracking['failures'] >= 3:
self.error_tracking['disabled_until'] = time.time() + 300 # 5 minutes # Get cooldown period from provider config, default to 300 seconds (5 minutes)
provider_config = config.providers.get(self.provider_id)
cooldown_seconds = 300 # System default
if provider_config and hasattr(provider_config, 'default_error_cooldown') and provider_config.default_error_cooldown is not None:
cooldown_seconds = provider_config.default_error_cooldown
logger.info(f"Using provider-configured cooldown: {cooldown_seconds} seconds")
else:
logger.info(f"Using system default cooldown: {cooldown_seconds} seconds")
self.error_tracking['disabled_until'] = time.time() + cooldown_seconds
disabled_until_time = self.error_tracking['disabled_until'] disabled_until_time = self.error_tracking['disabled_until']
cooldown_remaining = int(disabled_until_time - time.time()) cooldown_remaining = int(disabled_until_time - time.time())
logger.error(f"!!! PROVIDER DISABLED !!!") logger.error(f"!!! PROVIDER DISABLED !!!")
logger.error(f"Provider: {self.provider_id}") logger.error(f"Provider: {self.provider_id}")
logger.error(f"Reason: 3 consecutive failures reached") logger.error(f"Reason: 3 consecutive failures reached")
logger.error(f"Disabled until: {disabled_until_time}") logger.error(f"Disabled until: {disabled_until_time}")
logger.error(f"Cooldown period: {cooldown_remaining} seconds (5 minutes)") logger.error(f"Cooldown period: {cooldown_remaining} seconds ({cooldown_seconds / 60:.1f} minutes)")
logger.error(f"Provider will be automatically re-enabled after cooldown") logger.error(f"Provider will be automatically re-enabled after cooldown")
else: else:
remaining_failures = 3 - failure_count remaining_failures = 3 - failure_count
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment