feat: add RunPod provider runtime management

parent 3156c83c
RunPod implementation recovery plan for next session.
Goal
- Add a new provider type `runpod`
- Support multiple RunPod accounts by allowing multiple AISBF providers of type `runpod`
- Support two kinds of provider:
  - pod-backed/serverless-backed wrapper provider with one wrapper mode per provider: `openai`, `coderai`, or `ollama`
  - `runpod_public` provider represented as one AISBF provider with many discovered models/endpoints
- Auto-start stopped pods on request and wait until ready
- Cache pod/endpoint status in DB/cache so behavior is consistent across multiple AISBF instances
- Stop idle pods after configurable inactivity
- Allow serverless endpoint template usage as an alternative to pod-backed mode
Product decisions already made
- Scope: full lifecycle now
- Wrapper mode:
  - pod-backed `runpod` providers store one wrapper mode per provider
  - `runpod_public` auto-detects protocol per discovered model, with optional manual override per model
- Cold start behavior: auto start + wait
- `runpod_public` shape: one provider, many discovered models
- Management API preference: use the most recent/current supported RunPod management API surface between GraphQL and REST/OpenAPI
  - Do not hardcode GraphQL if REST/OpenAPI is newer
Critical first step next session
- Verify which RunPod management API is the current supported one:
  - inspect current REST/OpenAPI docs/spec
  - inspect current GraphQL docs/spec
  - use whichever is the newer/current supported API surface
- Then map exact operations for:
  - pod status/start/stop
  - template lookup/use
  - endpoint discovery
  - serverless endpoint creation/use
  - public endpoint metadata and request format
Docs already identified
- `https://docs.runpod.io/api-reference/overview`
- `https://docs.runpod.io/llms.txt`
- `https://docs.runpod.io/public-endpoints/requests`
- `https://rest.runpod.io/v1/openapi.json`
Implementation map in AISBF
- `aisbf/config.py`
  - extend `ProviderConfig` with `runpod_config: Optional[Dict] = None`
- `aisbf/providers/__init__.py`
  - register new provider type `runpod`
- new file `aisbf/providers/runpod.py`
  - main handler/orchestrator
- `templates/dashboard/providers.html`
  - add `runpod` provider type option and config UI
- `aisbf/routes/dashboard/providers.py`
  - add any RunPod-specific dashboard actions/status endpoints if needed
- `aisbf/app/model_cache.py`
  - integrate caching/refresh for `runpod_public` discovered models
- `aisbf/database.py`
  - add persistent lifecycle/runtime state for runpod providers
Planned `runpod_config` structure
Example target shape:
```json
{
  "mode": "pod",
  "wrapper_mode": "openai",
  "account_name": "personal-runpod",
  "management_api": "auto",
  "idle_shutdown_ms": 900000,
  "startup_poll_interval_ms": 3000,
  "startup_timeout_ms": 300000,
  "pod_id": "abc123",
  "template_id": "tmpl_xyz",
  "endpoint_id": "",
  "serverless_template_id": "",
  "public_endpoint_protocol_default": "auto",
  "public_models": {
    "model-slug": {
      "protocol": "openai",
      "capabilities": ["chat", "vision"]
    }
  }
}
```
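The target shape above could be read into a typed object along these lines. This is a minimal sketch, not the committed code: `RunpodConfig` and `parse_runpod_config` are hypothetical names, the defaults are copied from the example JSON, and unknown keys are simply ignored.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

VALID_MODES = {"pod", "serverless_template", "public"}
VALID_WRAPPER_MODES = {"openai", "ollama", "coderai"}


@dataclass
class RunpodConfig:
    """Hypothetical typed view of `runpod_config`; fields follow the example JSON."""
    mode: str = "pod"
    wrapper_mode: Optional[str] = "openai"
    account_name: str = ""
    management_api: str = "auto"
    idle_shutdown_ms: int = 900_000
    startup_poll_interval_ms: int = 3_000
    startup_timeout_ms: int = 300_000
    pod_id: str = ""
    template_id: str = ""
    endpoint_id: str = ""
    serverless_template_id: str = ""
    public_endpoint_protocol_default: str = "auto"
    public_models: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def parse_runpod_config(raw: Optional[Dict[str, Any]]) -> RunpodConfig:
    """Build a RunpodConfig from a plain dict, rejecting unknown modes."""
    raw = dict(raw or {})
    # Keep only keys the dataclass knows about (assumption: extras are ignored).
    known = {k: v for k, v in raw.items() if k in RunpodConfig.__dataclass_fields__}
    cfg = RunpodConfig(**known)
    cfg.mode = str(cfg.mode).strip().lower()
    if cfg.mode not in VALID_MODES:
        raise ValueError(f"unsupported mode '{cfg.mode}'")
    if cfg.mode == "public":
        # Public providers resolve protocol per model, so no provider-level wrapper.
        cfg.wrapper_mode = None
    else:
        cfg.wrapper_mode = str(cfg.wrapper_mode or "openai").strip().lower()
        if cfg.wrapper_mode not in VALID_WRAPPER_MODES:
            raise ValueError(f"unsupported wrapper_mode '{cfg.wrapper_mode}'")
    return cfg
```

For example, `parse_runpod_config({"mode": "pod", "pod_id": "abc123"})` would fill in the timing defaults, while a `public` config drops the provider-level wrapper mode.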
Modes
- `pod`
- `serverless_template`
- `public`
Wrapper modes for non-public
- `openai`
- `ollama`
- `coderai`
Representation rules
- Non-public runpod providers:
  - one wrapper mode per provider
  - lifecycle managed by AISBF
- `runpod_public`:
  - one provider with many discovered models/endpoints
  - protocol auto-detected per model
  - optional manual override per model in config
Architecture to implement
1. `RunpodProviderHandler` as orchestrator
   - It should handle lifecycle and dispatch, not just protocol forwarding
   - Responsibilities:
     - load `runpod_config`
     - ensure pod/endpoint is ready before forwarding requests
     - cache status/discovery
     - delegate to existing protocol behavior
2. Delegation model
   - For pod/serverless-backed providers:
     - once ready, speak protocol based on provider-level `wrapper_mode`
     - delegate internally to existing handlers:
       - `OpenAIProviderHandler`
       - `OllamaProviderHandler`
       - `CoderAIProviderHandler`
   - For `runpod_public`:
     - discover public models/endpoints
     - resolve protocol per model
     - dispatch request using model-specific protocol behavior
3. Readiness lifecycle
   - On request for pod-backed provider:
     - read cached status from DB/cache
     - if running and endpoint known, reuse
     - if stopped, start pod
     - poll until ready or timeout
     - persist status/ready endpoint back to DB/cache
   - On request for serverless-template mode:
     - resolve or create usable endpoint from template as configured
     - cache endpoint metadata
4. Idle shutdown
   - Store persistent last-used timestamps and runtime state in DB
   - Add background loop that:
     - scans runpod provider state
     - if `now - last_used_at > idle_shutdown_ms` and provider is pod-backed and running:
       - stop the pod
       - persist updated status
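The readiness lifecycle and the idle-shutdown condition can be sketched independently of any RunPod client. In the sketch below, `wait_until_ready` and `should_stop_idle` are hypothetical helpers, the status strings `"running"` and `"stopped"` are placeholders (the real values depend on whichever management API is verified), and the two callables stand in for the actual status and start calls.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    get_status: Callable[[], Awaitable[str]],
    start: Callable[[], Awaitable[None]],
    poll_interval_ms: int = 3000,
    timeout_ms: int = 300000,
) -> str:
    """Start the pod if it is stopped, then poll until it reports running."""
    status = await get_status()
    if status == "running":          # placeholder status value
        return status
    if status == "stopped":          # placeholder status value
        await start()
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        status = await get_status()
        if status == "running":
            return status
        await asyncio.sleep(poll_interval_ms / 1000)
    raise TimeoutError(f"pod not ready after {timeout_ms} ms (last status: {status})")


def should_stop_idle(now_ms: int, last_used_at_ms: int, idle_shutdown_ms: int,
                     mode: str, status: str) -> bool:
    """Idle-shutdown predicate: only pod-backed, running providers are stopped."""
    return (
        mode == "pod"
        and status == "running"
        and now_ms - last_used_at_ms > idle_shutdown_ms
    )
```

Passing the status and start operations as callables keeps the polling logic testable with fakes, which fits the lifecycle tests listed later in this plan.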
Database work needed
Add a new table in `aisbf/database.py`, e.g. `runpod_provider_state` with fields like:
- `provider_scope` (`global` / `user`)
- `owner_user_id`
- `provider_id`
- `mode`
- `wrapper_mode`
- `resource_id`
- `resource_kind` (`pod`, `endpoint`, `public`)
- `status`
- `endpoint_url`
- `public_catalog_json`
- `metadata`
- `last_used_at`
- `last_status_sync_at`
- `updated_at`
- unique on `(owner_user_id, provider_id)`
Add helpers:
- `get_runpod_provider_state(...)`
- `save_runpod_provider_state(...)`
- `touch_runpod_provider_state(...)`
- `list_runpod_provider_states(...)`
This DB-backed state is required for:
- round-robin multi-instance consistency
- idle shutdown scanning
- readiness caching
- public endpoint discovery caching
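As an illustration of the table and helpers, here is a minimal sketch using the standard-library `sqlite3` module. The column types, the use of integer epoch milliseconds for timestamps, and `owner_user_id = 0` for global providers (so the unique constraint is not defeated by NULLs) are all assumptions; the real `aisbf/database.py` layer may differ.

```python
import sqlite3
import time
from typing import Any, Dict, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS runpod_provider_state (
    provider_scope TEXT NOT NULL DEFAULT 'global',
    owner_user_id INTEGER NOT NULL DEFAULT 0,
    provider_id TEXT NOT NULL,
    mode TEXT, wrapper_mode TEXT,
    resource_id TEXT, resource_kind TEXT,
    status TEXT, endpoint_url TEXT,
    public_catalog_json TEXT, metadata TEXT,
    last_used_at INTEGER, last_status_sync_at INTEGER, updated_at INTEGER,
    UNIQUE (owner_user_id, provider_id)
)
"""

_WRITABLE = {
    "provider_scope", "mode", "wrapper_mode", "resource_id", "resource_kind",
    "status", "endpoint_url", "public_catalog_json", "metadata",
    "last_used_at", "last_status_sync_at",
}


def _now_ms() -> int:
    return int(time.time() * 1000)


def save_runpod_provider_state(conn: sqlite3.Connection, provider_id: str,
                               owner_user_id: int = 0, **fields: Any) -> None:
    """Upsert the given columns; columns not passed keep their stored values."""
    cols = {k: v for k, v in fields.items() if k in _WRITABLE}
    cols["updated_at"] = _now_ms()
    names = ["owner_user_id", "provider_id", *cols]
    placeholders = ", ".join("?" for _ in names)
    updates = ", ".join(f"{c} = excluded.{c}" for c in cols)
    # Column names come from the fixed whitelist above, never from user input.
    conn.execute(
        f"INSERT INTO runpod_provider_state ({', '.join(names)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT(owner_user_id, provider_id) DO UPDATE SET {updates}",
        [owner_user_id, provider_id, *cols.values()],
    )
    conn.commit()


def get_runpod_provider_state(conn: sqlite3.Connection, provider_id: str,
                              owner_user_id: int = 0) -> Optional[Dict[str, Any]]:
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT * FROM runpod_provider_state WHERE owner_user_id = ? AND provider_id = ?",
        (owner_user_id, provider_id),
    ).fetchone()
    return dict(row) if row else None


def touch_runpod_provider_state(conn: sqlite3.Connection, provider_id: str,
                                owner_user_id: int = 0) -> None:
    """Record that the provider was just used (feeds the idle-shutdown scan)."""
    save_runpod_provider_state(conn, provider_id, owner_user_id, last_used_at=_now_ms())
```

The upsert relies on SQLite's `ON CONFLICT ... DO UPDATE` clause, so concurrent AISBF instances writing the same `(owner_user_id, provider_id)` pair update one row rather than creating duplicates.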
Cache/model discovery work
For `runpod_public` in `aisbf/app/model_cache.py`:
- cache discovered public models
- refresh periodically or on-demand
- store enough metadata per model:
  - model id/slug
  - protocol
  - capabilities
  - route base
  - request mode (`runsync`, `run`, `status`)
  - parameter/schema hints if available
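The per-model metadata and the periodic/on-demand refresh could be modelled roughly as follows. This is only a sketch: `PublicModelEntry` and `PublicCatalogCache` are hypothetical names, the 300-second TTL is an arbitrary placeholder, and nothing here reflects the actual structure of `aisbf/app/model_cache.py`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class PublicModelEntry:
    """One discovered public model, carrying the metadata listed above."""
    slug: str
    protocol: str = "auto"
    capabilities: List[str] = field(default_factory=list)
    route_base: str = ""
    request_mode: str = "runsync"
    schema_hints: Dict[str, object] = field(default_factory=dict)


class PublicCatalogCache:
    """In-memory catalog with a time-to-live; the clock is injectable for tests."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, PublicModelEntry] = {}
        self._refreshed_at: Optional[float] = None

    def is_stale(self) -> bool:
        """True before the first refresh or once the TTL has elapsed."""
        if self._refreshed_at is None:
            return True
        return self._clock() - self._refreshed_at > self._ttl

    def replace(self, entries: Iterable[PublicModelEntry]) -> None:
        """Swap in a freshly discovered catalog and reset the refresh time."""
        self._entries = {entry.slug: entry for entry in entries}
        self._refreshed_at = self._clock()

    def get(self, slug: str) -> Optional[PublicModelEntry]:
        return self._entries.get(slug)
```

A caller would check `is_stale()` before serving a model list and run discovery only when it returns true, or call `replace()` directly for an on-demand refresh.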
Dashboard work
In `templates/dashboard/providers.html`:
- add provider type option: `runpod`
- add description text for `runpod`
- add UI section for `runpod_config`
- likely fields:
  - account label
  - mode (`pod`, `serverless_template`, `public`)
  - wrapper mode (`openai`, `ollama`, `coderai`) for non-public
  - API key field if not top-level
  - pod id
  - template id
  - endpoint id
  - serverless template id
  - idle shutdown ms
  - startup timeout ms
  - poll interval ms
  - auto-discovery toggle
  - per-model protocol override editor for public models
Potential server-side additions in `aisbf/routes/dashboard/providers.py`
- refresh RunPod public discovery
- show RunPod lifecycle status
- optional manual start/stop actions later if useful
Protocol behavior plan
1. Pod-backed `openai`
   - after pod ready, delegate to OpenAI-compatible request/model list flow
   - endpoint likely `/v1/...`
2. Pod-backed `ollama`
   - after pod ready, delegate to Ollama flow
   - endpoint likely `/api/...`
3. Pod-backed `coderai`
   - after pod ready, delegate to CoderAI flow
   - endpoint/path depends on service running in the pod
4. `runpod_public`
   - public endpoints are not one uniform protocol
   - implement model-level protocol metadata
   - auto-detect protocol from endpoint metadata/docs/naming where possible
   - allow manual override per model
   - request path likely uses `https://api.runpod.ai/v2/<endpoint>/...`
   - do not fake this part; implement from verified docs only
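The precedence between manual override, auto-detection, and the provider default can be written down without touching any RunPod API. The sketch below assumes the order override, then discovered protocol, then `public_endpoint_protocol_default`; `resolve_public_protocol` is a hypothetical helper, and it returns `None` rather than guessing when nothing concrete is known.

```python
from typing import Any, Dict, Optional


def _concrete(value: Any) -> Optional[str]:
    """Normalize a protocol name; empty and 'auto' count as unresolved."""
    text = str(value or "").strip().lower()
    return text if text and text != "auto" else None


def resolve_public_protocol(
    slug: str,
    public_models: Dict[str, Dict[str, Any]],
    discovered_protocol: Optional[str] = None,
    provider_default: str = "auto",
) -> Optional[str]:
    """Pick the protocol for one public model.

    Assumed precedence: per-model override from `public_models`, then the
    protocol found during discovery, then the provider-level default.
    """
    override = _concrete((public_models.get(slug) or {}).get("protocol"))
    return override or _concrete(discovered_protocol) or _concrete(provider_default)
```

Returning `None` lets the caller reject the request with a clear error instead of dispatching with an unverified protocol, in line with the "do not fake this part" rule.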
Suggested next-session execution order
1. Verify RunPod API contract and choose the current supported management API surface
2. Add `runpod_config` to `aisbf/config.py`
3. Add DB-backed `runpod_provider_state` table and helpers in `aisbf/database.py`
4. Create `aisbf/providers/runpod.py`
5. Register `runpod` in `aisbf/providers/__init__.py`
6. Add idle shutdown background task in startup/background task area
7. Add dashboard UI/config save support in `templates/dashboard/providers.html`
8. Hook `runpod_public` discovery into `aisbf/app/model_cache.py`
9. Validate with compile/tests
Recommended tests to add
- config validation for `runpod_config`
- DB CRUD for `runpod_provider_state`
- lifecycle tests:
  - stopped pod -> start called
  - running pod -> no start
  - idle timeout -> stop called
- public model discovery parsing
- protocol selection:
  - public model auto-detect
  - public model manual override
- delegation tests:
  - `wrapper_mode=openai`
  - `wrapper_mode=ollama`
  - `wrapper_mode=coderai`
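A delegation test might take roughly this shape. It is a sketch with stub classes standing in for `OpenAIProviderHandler`, `OllamaProviderHandler`, and `CoderAIProviderHandler`; `select_delegate` is a hypothetical function, not the dispatch code in `aisbf/providers/runpod.py`.

```python
from typing import Dict


class StubOpenAIHandler:
    """Stand-in for OpenAIProviderHandler."""


class StubOllamaHandler:
    """Stand-in for OllamaProviderHandler."""


class StubCoderAIHandler:
    """Stand-in for CoderAIProviderHandler."""


HANDLERS: Dict[str, type] = {
    "openai": StubOpenAIHandler,
    "ollama": StubOllamaHandler,
    "coderai": StubCoderAIHandler,
}


def select_delegate(wrapper_mode: str, handlers: Dict[str, type]) -> type:
    """Map a provider-level wrapper mode to the handler class to delegate to."""
    key = wrapper_mode.strip().lower()
    if key not in handlers:
        raise ValueError(f"unsupported wrapper_mode '{wrapper_mode}'")
    return handlers[key]


def test_wrapper_mode_selects_matching_handler():
    for mode, handler_cls in HANDLERS.items():
        assert select_delegate(mode, HANDLERS) is handler_cls


def test_unknown_wrapper_mode_is_rejected():
    try:
        select_delegate("grpc", HANDLERS)
    except ValueError:
        return
    raise AssertionError("expected ValueError for unknown wrapper_mode")
```

The test functions follow the `test_*` naming that pytest collects, but they use only plain assertions so they also run without it.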
Files already reviewed for this work
- `aisbf/config.py`
- `aisbf/providers/__init__.py`
- `aisbf/providers/openai.py`
- `aisbf/providers/ollama.py`
- `aisbf/providers/coderai.py`
- `aisbf/providers/base.py`
- `aisbf/app/model_cache.py`
- `aisbf/routes/dashboard/providers.py`
- `templates/dashboard/providers.html`
Suggested next-session prompt
"Implement full RunPod provider support for AISBF. First determine whether RunPod REST/OpenAPI or GraphQL is the newer/current supported management API, then use that API for pod lifecycle, endpoint discovery, and template/serverless management. Add a new `runpod` provider type with `runpod_config`, DB-backed lifecycle state, auto-start/wait, idle shutdown, wrapper-mode delegation (`openai`, `ollama`, `coderai`), and `runpod_public` as one provider with many discovered models and per-model protocol auto-detect/manual override. Preserve multi-instance consistency by storing lifecycle state in the database."
@@ -43,6 +43,7 @@ from .ollama import OllamaProviderHandler
 from .codex import CodexProviderHandler
 from .coderai import CoderAIProviderHandler
 from .qwen import QwenProviderHandler
+from .runpod import RunpodProviderHandler
 from ..config import config
@@ -57,7 +58,8 @@ PROVIDER_HANDLERS = {
     'kilocode': KiloProviderHandler,  # Kilocode provider with OAuth2 support
     'codex': CodexProviderHandler,  # Codex provider with OAuth2 support (OpenAI protocol)
     'coderai': CoderAIProviderHandler,  # CoderAI provider with HTTP/WebSocket bridge support
-    'qwen': QwenProviderHandler  # Qwen provider with OAuth2 support (OpenAI-compatible)
+    'qwen': QwenProviderHandler,  # Qwen provider with OAuth2 support (OpenAI-compatible)
+    'runpod': RunpodProviderHandler,
 }
@@ -29,15 +29,17 @@ from .base import BaseProviderHandler, AISBF_DEBUG
 class OllamaProviderHandler(BaseProviderHandler):
-    def __init__(self, provider_id: str, api_key: Optional[str] = None):
-        super().__init__(provider_id, api_key)
+    def __init__(self, provider_id: str, api_key: Optional[str] = None, user_id: Optional[int] = None, provider_config=None):
+        self.provider_config = provider_config if provider_config is not None else config.providers[provider_id]
+        super().__init__(provider_id, api_key, user_id=user_id)
         timeout = httpx.Timeout(
             connect=60.0,
             read=300.0,
             write=60.0,
             pool=60.0
         )
-        self.client = httpx.AsyncClient(base_url=config.providers[provider_id].endpoint, timeout=timeout)
+        endpoint = self.provider_config.get("endpoint") if isinstance(self.provider_config, dict) else self.provider_config.endpoint
+        self.client = httpx.AsyncClient(base_url=endpoint, timeout=timeout)
     def validate_credentials(self) -> bool:
         """
@@ -30,9 +30,11 @@ from .base import BaseProviderHandler, AISBF_DEBUG
 class OpenAIProviderHandler(BaseProviderHandler):
-    def __init__(self, provider_id: str, api_key: str):
-        super().__init__(provider_id, api_key)
-        self.client = OpenAI(base_url=config.providers[provider_id].endpoint, api_key=api_key)
+    def __init__(self, provider_id: str, api_key: str, user_id: Optional[int] = None, provider_config=None):
+        self.provider_config = provider_config if provider_config is not None else config.providers[provider_id]
+        super().__init__(provider_id, api_key, user_id=user_id)
+        endpoint = self.provider_config.get("endpoint") if isinstance(self.provider_config, dict) else self.provider_config.endpoint
+        self.client = OpenAI(base_url=endpoint, api_key=api_key)
     def validate_credentials(self) -> bool:
         """Validate OpenAI API key presence."""
@@ -16,6 +16,7 @@ from aisbf.app.startup import _reload_global_config, _apply_condense_defaults_pr
 from aisbf.app.middleware import _is_local_client
 from aisbf.app.model_cache import fetch_provider_models
 from aisbf.routes.auth import require_dashboard_auth, require_api_auth, require_api_admin, require_admin
+from aisbf.providers.runpod import RunpodProviderHandler
 import httpx
 router = APIRouter()
@@ -116,6 +117,55 @@ def _ensure_coderai_token(provider_config: dict) -> dict:
     return stamped
+def _normalize_runpod_provider_config(provider_id: str, provider_config: dict) -> dict:
+    stamped = dict(provider_config or {})
+    if stamped.get('type') != 'runpod':
+        return stamped
+    runpod_config = stamped.get('runpod_config')
+    if not isinstance(runpod_config, dict):
+        runpod_config = {}
+    mode = str(runpod_config.get('mode') or 'pod').strip().lower()
+    wrapper_mode = str(runpod_config.get('wrapper_mode') or 'openai').strip().lower()
+    runpod_config['mode'] = mode
+    runpod_config['management_api'] = str(runpod_config.get('management_api') or 'auto').strip().lower() or 'auto'
+    runpod_config['account_name'] = str(runpod_config.get('account_name') or provider_id).strip() or provider_id
+    runpod_config['startup_poll_interval_ms'] = int(runpod_config.get('startup_poll_interval_ms') or 3000)
+    runpod_config['startup_timeout_ms'] = int(runpod_config.get('startup_timeout_ms') or 300000)
+    runpod_config['idle_shutdown_ms'] = int(runpod_config.get('idle_shutdown_ms') or 900000)
+    runpod_config['public_endpoint_protocol_default'] = str(runpod_config.get('public_endpoint_protocol_default') or 'auto').strip().lower() or 'auto'
+    if mode == 'public':
+        public_models = runpod_config.get('public_models')
+        if not isinstance(public_models, dict):
+            runpod_config['public_models'] = {}
+    else:
+        runpod_config['wrapper_mode'] = wrapper_mode
+    stamped['runpod_config'] = runpod_config
+    if not stamped.get('endpoint'):
+        stamped['endpoint'] = 'https://rest.runpod.io/v1'
+    return stamped
+
+
+def _validate_runpod_provider_config(provider_id: str, provider_config: dict) -> None:
+    if not isinstance(provider_config, dict) or provider_config.get('type') != 'runpod':
+        return
+    runpod_config = provider_config.get('runpod_config') or {}
+    mode = str(runpod_config.get('mode') or 'pod').strip().lower()
+    if mode not in {'pod', 'serverless_template', 'public'}:
+        raise ValueError(f"RunPod provider '{provider_id}' has unsupported mode '{mode}'")
+    if mode != 'public':
+        wrapper_mode = str(runpod_config.get('wrapper_mode') or 'openai').strip().lower()
+        if wrapper_mode not in {'openai', 'ollama', 'coderai'}:
+            raise ValueError(f"RunPod provider '{provider_id}' has unsupported wrapper_mode '{wrapper_mode}'")
+    if mode == 'pod' and not str(runpod_config.get('pod_id') or '').strip():
+        raise ValueError(f"RunPod provider '{provider_id}' requires runpod_config.pod_id in pod mode")
+    if mode == 'serverless_template' and not (str(runpod_config.get('endpoint_id') or '').strip() or str(runpod_config.get('serverless_template_id') or '').strip() or str(runpod_config.get('template_id') or '').strip()):
+        raise ValueError(f"RunPod provider '{provider_id}' requires endpoint_id or template_id in serverless_template mode")
 def _validate_coderai_provider_config(provider_id: str, provider_config: dict) -> None:
     if not isinstance(provider_config, dict) or provider_config.get('type') != 'coderai':
         return
@@ -189,6 +239,34 @@ def _apply_usage_disable(db, user_id, provider_id: str, usage_data: dict):
         pass
+def _resolve_dashboard_provider_config(request: Request, provider_id: str) -> tuple[dict, Optional[int]]:
+    current_user_id = request.session.get('user_id')
+    db = DatabaseRegistry.get_config_database()
+    if current_user_id is None:
+        provider = _config.providers.get(provider_id) if _config else None
+        if provider is None:
+            raise HTTPException(status_code=404, detail="Provider not found")
+        if hasattr(provider, "model_dump"):
+            return provider.model_dump(), None
+        if hasattr(provider, "dict"):
+            return provider.dict(), None
+        return dict(provider), None
+    provider_row = db.get_user_provider(current_user_id, provider_id)
+    if not provider_row:
+        raise HTTPException(status_code=404, detail="Provider not found")
+    return dict(provider_row.get("config") or {}), current_user_id
+
+
+def _build_dashboard_runpod_handler(request: Request, provider_id: str) -> RunpodProviderHandler:
+    provider_config, owner_user_id = _resolve_dashboard_provider_config(request, provider_id)
+    if provider_config.get("type") != "runpod":
+        raise HTTPException(status_code=404, detail="RunPod provider not found")
+    api_key = provider_config.get("api_key")
+    return RunpodProviderHandler(provider_id, api_key=api_key, user_id=owner_user_id, provider_config=provider_config)
 @router.get("/dashboard", response_class=HTMLResponse)
 async def dashboard_index(request: Request):
     """Dashboard overview page"""
@@ -628,7 +706,9 @@ async def dashboard_providers_save(request: Request, config: str = Form(...)):
         # Apply defaults: if condense_method is set but condense_context is not, default to 80
         for provider_key, provider in providers_data.items():
             provider = _ensure_coderai_token(provider)
+            provider = _normalize_runpod_provider_config(provider_key, provider)
             _validate_coderai_provider_config(provider_key, provider)
+            _validate_runpod_provider_config(provider_key, provider)
             if 'models' in provider and isinstance(provider['models'], list):
                 for model in provider['models']:
                     if 'condense_method' in model and model.get('condense_method'):
@@ -961,6 +1041,41 @@ async def search_provider_models_api(request: Request, provider_id: str, query:
     return JSONResponse({"models": models[:200], "fetched_live": fetched_live})
+@router.get("/dashboard/providers/{provider_id}/runpod-status")
+async def api_runpod_provider_status(provider_id: str, request: Request):
+    auth_check = require_dashboard_auth(request)
+    if auth_check:
+        return JSONResponse({"success": False, "error": "Not authenticated"}, status_code=401)
+    try:
+        handler = _build_dashboard_runpod_handler(request, provider_id)
+        return JSONResponse({"success": True, "status": handler.build_runtime_status()})
+    except HTTPException as exc:
+        return JSONResponse({"success": False, "error": exc.detail}, status_code=exc.status_code)
+    except Exception as exc:
+        return JSONResponse({"success": False, "error": str(exc)}, status_code=500)
+
+
+@router.post("/dashboard/providers/{provider_id}/runpod-refresh")
+async def api_runpod_provider_refresh(provider_id: str, request: Request):
+    auth_check = require_dashboard_auth(request)
+    if auth_check:
+        return JSONResponse({"success": False, "error": "Not authenticated"}, status_code=401)
+    try:
+        handler = _build_dashboard_runpod_handler(request, provider_id)
+        catalog = await handler.refresh_public_catalog()
+        return JSONResponse({
+            "success": True,
+            "catalog_count": len(catalog),
+            "status": handler.build_runtime_status(),
+        })
+    except HTTPException as exc:
+        return JSONResponse({"success": False, "error": exc.detail}, status_code=exc.status_code)
+    except Exception as exc:
+        return JSONResponse({"success": False, "error": str(exc)}, status_code=500)
 @router.get("/dashboard/search-all-models")
 async def search_all_models_api(request: Request, query: str = "", refresh: bool = False):
     """Return all available models (rotations + provider models) for autoselect, with optional live refresh."""