feat: add RunPod provider runtime management

parent 3156c83c
RunPod implementation recovery plan for next session.
Goal
- Add a new provider type `runpod`
- Support multiple RunPod accounts by allowing multiple AISBF providers of type `runpod`
- Support two modes:
  - pod-backed/serverless-backed wrapper provider with one wrapper mode per provider: `openai`, `coderai`, or `ollama`
  - `runpod_public` provider represented as one AISBF provider with many discovered models/endpoints
- Auto-start stopped pods on request and wait until ready
- Cache pod/endpoint status in DB/cache so behavior is consistent across multiple AISBF instances
- Stop idle pods after configurable inactivity
- Allow serverless endpoint template usage as an alternative to pod-backed mode
Product decisions already made
- Scope: full lifecycle now
- Wrapper mode:
  - pod-backed `runpod` providers store one wrapper mode per provider
  - `runpod_public` auto-detects protocol per discovered model, with optional manual override per model
- Cold start behavior: auto start + wait
- `runpod_public` shape: one provider, many discovered models
- Management API preference: use the most recent/current supported RunPod management API surface between GraphQL and REST/OpenAPI
- Do not hardcode GraphQL if REST/OpenAPI is newer
Critical first step next session
- Verify which RunPod management API is the current supported one:
  - inspect current REST/OpenAPI docs/spec
  - inspect current GraphQL docs/spec
  - use whichever is the newer/current supported API surface
- Then map exact operations for:
  - pod status/start/stop
  - template lookup/use
  - endpoint discovery
  - serverless endpoint creation/use
  - public endpoint metadata and request format
Docs already identified
- `https://docs.runpod.io/api-reference/overview`
- `https://docs.runpod.io/llms.txt`
- `https://docs.runpod.io/public-endpoints/requests`
- `https://rest.runpod.io/v1/openapi.json`
Implementation map in AISBF
- `aisbf/config.py`
  - extend `ProviderConfig` with `runpod_config: Optional[Dict] = None`
- `aisbf/providers/__init__.py`
  - register new provider type `runpod`
- new file `aisbf/providers/runpod.py`
  - main handler/orchestrator
- `templates/dashboard/providers.html`
  - add `runpod` provider type option and config UI
- `aisbf/routes/dashboard/providers.py`
  - add any RunPod-specific dashboard actions/status endpoints if needed
- `aisbf/app/model_cache.py`
  - integrate caching/refresh for `runpod_public` discovered models
- `aisbf/database.py`
  - add persistent lifecycle/runtime state for runpod providers
Planned `runpod_config` structure
Example target shape:
```json
{
  "mode": "pod",
  "wrapper_mode": "openai",
  "account_name": "personal-runpod",
  "management_api": "auto",
  "idle_shutdown_ms": 900000,
  "startup_poll_interval_ms": 3000,
  "startup_timeout_ms": 300000,
  "pod_id": "abc123",
  "template_id": "tmpl_xyz",
  "endpoint_id": "",
  "serverless_template_id": "",
  "public_endpoint_protocol_default": "auto",
  "public_models": {
    "model-slug": {
      "protocol": "openai",
      "capabilities": ["chat", "vision"]
    }
  }
}
```
Modes
- `pod`
- `serverless_template`
- `public`
Wrapper modes for non-public
- `openai`
- `ollama`
- `coderai`
Representation rules
- Non-public runpod providers:
  - one wrapper mode per provider
  - lifecycle managed by AISBF
- `runpod_public`:
  - one provider with many discovered models/endpoints
  - protocol auto-detected per model
  - optional manual override per model in config
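The modes, wrapper modes, and representation rules above could be captured in a small typed view over `runpod_config`. The following is only a sketch: `RunpodConfig` and `parse_runpod_config` are hypothetical names that do not exist in AISBF, and the required-field checks simply mirror the rules stated in this plan.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict

VALID_MODES = {"pod", "serverless_template", "public"}
VALID_WRAPPER_MODES = {"openai", "ollama", "coderai"}


@dataclass
class RunpodConfig:
    """Hypothetical typed view over the planned `runpod_config` dict."""
    mode: str = "pod"
    wrapper_mode: str = "openai"
    pod_id: str = ""
    endpoint_id: str = ""
    template_id: str = ""
    serverless_template_id: str = ""
    idle_shutdown_ms: int = 900_000
    startup_poll_interval_ms: int = 3_000
    startup_timeout_ms: int = 300_000
    public_models: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def parse_runpod_config(raw: Dict[str, Any]) -> RunpodConfig:
    """Build a RunpodConfig from a raw dict, ignoring keys this sketch does not model."""
    known = {f.name for f in fields(RunpodConfig)}
    cfg = RunpodConfig(**{k: v for k, v in raw.items() if k in known})
    cfg.mode = str(cfg.mode).strip().lower()
    cfg.wrapper_mode = str(cfg.wrapper_mode).strip().lower()

    if cfg.mode not in VALID_MODES:
        raise ValueError(f"unsupported mode '{cfg.mode}'")
    # Only non-public providers carry a provider-level wrapper mode.
    if cfg.mode != "public" and cfg.wrapper_mode not in VALID_WRAPPER_MODES:
        raise ValueError(f"unsupported wrapper_mode '{cfg.wrapper_mode}'")
    if cfg.mode == "pod" and not cfg.pod_id:
        raise ValueError("pod mode requires pod_id")
    if cfg.mode == "serverless_template" and not (
        cfg.endpoint_id or cfg.serverless_template_id or cfg.template_id
    ):
        raise ValueError("serverless_template mode requires an endpoint or template id")
    return cfg
```

Validating at parse time keeps the handler free of repeated `dict.get(...)` fallbacks; whether AISBF prefers a dataclass, a Pydantic model, or a plain dict here is an open design choice.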
Architecture to implement
1. `RunpodProviderHandler` as orchestrator
   - It should handle lifecycle and dispatch, not just protocol forwarding
   - Responsibilities:
     - load `runpod_config`
     - ensure pod/endpoint is ready before forwarding requests
     - cache status/discovery
     - delegate to existing protocol behavior
2. Delegation model
   - For pod/serverless-backed providers:
     - once ready, speak protocol based on provider-level `wrapper_mode`
     - delegate internally to existing handlers:
       - `OpenAIProviderHandler`
       - `OllamaProviderHandler`
       - `CoderAIProviderHandler`
   - For `runpod_public`:
     - discover public models/endpoints
     - resolve protocol per model
     - dispatch request using model-specific protocol behavior
3. Readiness lifecycle
   - On request for pod-backed provider:
     - read cached status from DB/cache
     - if running and endpoint known, reuse
     - if stopped, start pod
     - poll until ready or timeout
     - persist status/ready endpoint back to DB/cache
   - On request for serverless-template mode:
     - resolve or create usable endpoint from template as configured
     - cache endpoint metadata
4. Idle shutdown
   - Store persistent last-used timestamps and runtime state in DB
   - Add background loop that:
     - scans runpod provider state
     - if `now - last_used_at > idle_shutdown_ms` and provider is pod-backed and running
       - stop the pod
       - persist updated status
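The readiness lifecycle and idle-shutdown rule above can be sketched as follows. This is an illustration under assumptions, not the handler's actual code: `ensure_pod_ready` and `should_stop_idle` are hypothetical names, the status strings `"running"` and `"stopped"` are placeholders rather than verified RunPod values, and the management API calls are passed in as callables so that no specific REST or GraphQL operation is implied.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def ensure_pod_ready(
    get_status: Callable[[], Awaitable[str]],
    start_pod: Callable[[], Awaitable[None]],
    poll_interval_ms: int = 3000,
    timeout_ms: int = 300000,
) -> None:
    """Start the pod if it is stopped, then poll until it reports running."""
    status = await get_status()
    if status == "running":
        return  # already usable, nothing to start
    if status == "stopped":
        await start_pod()

    # monotonic() is unaffected by wall-clock adjustments during the wait.
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        await asyncio.sleep(poll_interval_ms / 1000)
        if await get_status() == "running":
            return
    raise TimeoutError(f"pod not ready after {timeout_ms} ms")


def should_stop_idle(
    now_ms: int, last_used_at_ms: int, idle_shutdown_ms: int, mode: str, status: str
) -> bool:
    """Idle rule from the plan: only running, pod-backed providers are stopped."""
    return (
        mode == "pod"
        and status == "running"
        and now_ms - last_used_at_ms > idle_shutdown_ms
    )
```

In a multi-instance deployment two instances may both observe a stopped pod and both call `start_pod`; whether that is harmless depends on the management API, so some form of DB-level claim or lock may be needed.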
Database work needed
Add a new table in `aisbf/database.py`, e.g. `runpod_provider_state` with fields like:
- `provider_scope` (`global` / `user`)
- `owner_user_id`
- `provider_id`
- `mode`
- `wrapper_mode`
- `resource_id`
- `resource_kind` (`pod`, `endpoint`, `public`)
- `status`
- `endpoint_url`
- `public_catalog_json`
- `metadata`
- `last_used_at`
- `last_status_sync_at`
- `updated_at`
- unique on `(owner_user_id, provider_id)`
Add helpers:
- `get_runpod_provider_state(...)`
- `save_runpod_provider_state(...)`
- `touch_runpod_provider_state(...)`
- `list_runpod_provider_states(...)`
This DB-backed state is required for:
- round-robin multi-instance consistency
- idle shutdown scanning
- readiness caching
- public endpoint discovery caching
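A reduced version of the state table and helpers might look like the sketch below. It assumes a SQLite backend reached through the standard `sqlite3` module (AISBF's real database layer may differ), covers only a subset of the listed columns, and uses simplified helper names and signatures that are purely illustrative. One detail worth noting: SQLite treats `NULL` values as distinct in a `UNIQUE` constraint, so this sketch stores `0` as the owner for global providers to keep `(owner_user_id, provider_id)` unique.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS runpod_provider_state (
    owner_user_id INTEGER NOT NULL DEFAULT 0,  -- 0 = global (assumption of this sketch)
    provider_id   TEXT    NOT NULL,
    mode          TEXT,
    status        TEXT,
    endpoint_url  TEXT,
    last_used_at  INTEGER,
    updated_at    INTEGER,
    UNIQUE (owner_user_id, provider_id)
)
"""


def _now_ms(now_ms: Optional[int]) -> int:
    return int(time.time() * 1000) if now_ms is None else now_ms


def save_state(conn, owner_user_id: int, provider_id: str, mode: str, status: str,
               endpoint_url: Optional[str] = None, now_ms: Optional[int] = None) -> None:
    """Insert or update one provider's state; last_used_at is left alone on update."""
    ts = _now_ms(now_ms)
    conn.execute(
        """
        INSERT INTO runpod_provider_state
            (owner_user_id, provider_id, mode, status, endpoint_url, last_used_at, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(owner_user_id, provider_id) DO UPDATE SET
            mode = excluded.mode,
            status = excluded.status,
            endpoint_url = excluded.endpoint_url,
            updated_at = excluded.updated_at
        """,
        (owner_user_id, provider_id, mode, status, endpoint_url, ts, ts),
    )
    conn.commit()


def touch_state(conn, owner_user_id: int, provider_id: str,
                now_ms: Optional[int] = None) -> None:
    """Record that the provider just served a request."""
    conn.execute(
        "UPDATE runpod_provider_state SET last_used_at = ? "
        "WHERE owner_user_id = ? AND provider_id = ?",
        (_now_ms(now_ms), owner_user_id, provider_id),
    )
    conn.commit()


def list_idle_pods(conn, now_ms: int, idle_shutdown_ms: int) -> List[Tuple[int, str]]:
    """Return (owner_user_id, provider_id) for running pods idle past the threshold."""
    rows = conn.execute(
        "SELECT owner_user_id, provider_id FROM runpod_provider_state "
        "WHERE mode = 'pod' AND status = 'running' AND ? - last_used_at > ?",
        (now_ms, idle_shutdown_ms),
    )
    return list(rows)
```

The `ON CONFLICT ... DO UPDATE` upsert needs SQLite 3.24 or newer. This sketch also takes a single `idle_shutdown_ms` for the scan, whereas the plan makes it per-provider configuration, so a real scanner would compare each row against its own provider's threshold.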
Cache/model discovery work
For `runpod_public` in `aisbf/app/model_cache.py`:
- cache discovered public models
- refresh periodically or on-demand
- store enough metadata per model:
  - model id/slug
  - protocol
  - capabilities
  - route base
  - request mode (`runsync`, `run`, `status`)
  - parameter/schema hints if available
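One possible in-memory shape for the cached catalog is sketched below. `PublicModelEntry`, `PublicCatalogCache`, and the 600-second TTL are all hypothetical; the real cache in `model_cache.py` and the real discovery payload may look quite different.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PublicModelEntry:
    """Per-model metadata fields taken from the list above."""
    slug: str
    protocol: str = "auto"
    capabilities: List[str] = field(default_factory=list)
    route_base: str = ""
    request_mode: str = "runsync"  # one of: runsync, run, status
    schema_hints: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PublicCatalogCache:
    ttl_seconds: float = 600.0           # refresh interval, an assumed default
    fetched_at: Optional[float] = None   # None means never fetched
    entries: Dict[str, PublicModelEntry] = field(default_factory=dict)

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True when the catalog was never fetched or is older than the TTL."""
        if self.fetched_at is None:
            return True
        now = time.time() if now is None else now
        return now - self.fetched_at > self.ttl_seconds

    def replace(self, models: List[PublicModelEntry], now: Optional[float] = None) -> None:
        """Swap in a freshly discovered catalog, keyed by model slug."""
        self.entries = {m.slug: m for m in models}
        self.fetched_at = time.time() if now is None else now
```

An on-demand refresh then reduces to calling discovery whenever `is_stale()` is true, or unconditionally when the dashboard refresh action is used.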
Dashboard work
In `templates/dashboard/providers.html`:
- add provider type option: `runpod`
- add description text for `runpod`
- add UI section for `runpod_config`
- likely fields:
  - account label
  - mode (`pod`, `serverless_template`, `public`)
  - wrapper mode (`openai`, `ollama`, `coderai`) for non-public
  - API key field if not top-level
  - pod id
  - template id
  - endpoint id
  - serverless template id
  - idle shutdown ms
  - startup timeout ms
  - poll interval ms
  - auto-discovery toggle
  - per-model protocol override editor for public models
Potential server-side additions in `aisbf/routes/dashboard/providers.py`
- refresh RunPod public discovery
- show RunPod lifecycle status
- optional manual start/stop actions later if useful
Protocol behavior plan
1. Pod-backed `openai`
   - after pod ready, delegate to OpenAI-compatible request/model list flow
   - endpoint likely `/v1/...`
2. Pod-backed `ollama`
   - after pod ready, delegate to Ollama flow
   - endpoint likely `/api/...`
3. Pod-backed `coderai`
   - after pod ready, delegate to CoderAI flow
   - endpoint/path depends on service running in the pod
4. `runpod_public`
   - public endpoints are not one uniform protocol
   - implement model-level protocol metadata
   - auto-detect protocol from endpoint metadata/docs/naming where possible
   - allow manual override per model
   - request path likely uses `https://api.runpod.ai/v2/<endpoint>/...`
   - do not fake this part; implement from verified docs only
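Protocol selection across the four cases could be sketched as a single resolver. `resolve_protocol` is a hypothetical helper, and the precedence it uses for public models (per-model override, then a non-`auto` provider default, then the auto-detected value) is an assumption, since the plan does not fix the order. How detection itself works is deliberately left out, pending the verified docs.

```python
from typing import Any, Dict, Optional


def resolve_protocol(
    runpod_config: Dict[str, Any],
    model_slug: Optional[str] = None,
    detected: Optional[str] = None,
) -> str:
    """Pick the protocol to speak for a request.

    Non-public providers always use the provider-level wrapper_mode.
    Public providers resolve per model (precedence is an assumption).
    """
    if runpod_config.get("mode", "pod") != "public":
        return runpod_config.get("wrapper_mode") or "openai"

    # 1. Manual per-model override from runpod_config.public_models.
    per_model = (runpod_config.get("public_models") or {}).get(model_slug) or {}
    override = per_model.get("protocol")
    if override and override != "auto":
        return override

    # 2. Provider-wide default, unless it is left on "auto".
    default = runpod_config.get("public_endpoint_protocol_default") or "auto"
    if default != "auto":
        return default

    # 3. Whatever discovery detected for this model, if anything.
    if detected:
        return detected
    raise LookupError(f"no protocol known for public model '{model_slug}'")
```

Raising when nothing is known, rather than guessing a protocol, matches the plan's instruction not to fake the public-endpoint behavior.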
Suggested next-session execution order
1. Verify RunPod API contract and choose the current supported management API surface
2. Add `runpod_config` to `aisbf/config.py`
3. Add DB-backed `runpod_provider_state` table and helpers in `aisbf/database.py`
4. Create `aisbf/providers/runpod.py`
5. Register `runpod` in `aisbf/providers/__init__.py`
6. Add idle shutdown background task in startup/background task area
7. Add dashboard UI/config save support in `templates/dashboard/providers.html`
8. Hook `runpod_public` discovery into `aisbf/app/model_cache.py`
9. Validate with compile/tests
Recommended tests to add
- config validation for `runpod_config`
- DB CRUD for `runpod_provider_state`
- lifecycle tests:
  - stopped pod -> start called
  - running pod -> no start
  - idle timeout -> stop called
- public model discovery parsing
- protocol selection:
  - public model auto-detect
  - public model manual override
- delegation tests:
  - `wrapper_mode=openai`
  - `wrapper_mode=ollama`
  - `wrapper_mode=coderai`
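The three lifecycle cases could be tested against a fake management client, roughly as follows. Everything here is a stand-in: `FakePodClient`, `ensure_ready`, and `stop_if_idle` are hypothetical simplifications, not the real `RunpodProviderHandler` API, and the status strings are placeholders.

```python
import asyncio
from typing import List


class FakePodClient:
    """Records lifecycle calls instead of talking to RunPod."""

    def __init__(self, status: str) -> None:
        self.status = status
        self.calls: List[str] = []

    async def get_status(self) -> str:
        return self.status

    async def start(self) -> None:
        self.calls.append("start")
        self.status = "running"

    async def stop(self) -> None:
        self.calls.append("stop")
        self.status = "stopped"


async def ensure_ready(client: FakePodClient) -> None:
    # Simplified stand-in for the handler's readiness step (no polling).
    if await client.get_status() != "running":
        await client.start()


async def stop_if_idle(client: FakePodClient, now_ms: int,
                       last_used_at_ms: int, idle_shutdown_ms: int) -> None:
    # Simplified stand-in for one iteration of the idle-shutdown loop.
    idle = now_ms - last_used_at_ms > idle_shutdown_ms
    if idle and await client.get_status() == "running":
        await client.stop()


def test_stopped_pod_triggers_start() -> None:
    client = FakePodClient("stopped")
    asyncio.run(ensure_ready(client))
    assert client.calls == ["start"]


def test_running_pod_skips_start() -> None:
    client = FakePodClient("running")
    asyncio.run(ensure_ready(client))
    assert client.calls == []


def test_idle_timeout_triggers_stop() -> None:
    client = FakePodClient("running")
    asyncio.run(stop_if_idle(client, now_ms=1_000_000, last_used_at_ms=0,
                             idle_shutdown_ms=900_000))
    assert client.calls == ["stop"]
```

Passing `now_ms` in explicitly keeps the idle test deterministic, with no sleeping or clock patching.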
Files already reviewed for this work
- `aisbf/config.py`
- `aisbf/providers/__init__.py`
- `aisbf/providers/openai.py`
- `aisbf/providers/ollama.py`
- `aisbf/providers/coderai.py`
- `aisbf/providers/base.py`
- `aisbf/app/model_cache.py`
- `aisbf/routes/dashboard/providers.py`
- `templates/dashboard/providers.html`
Suggested next-session prompt
"Implement full RunPod provider support for AISBF. First determine whether RunPod REST/OpenAPI or GraphQL is the newer/current supported management API, then use that API for pod lifecycle, endpoint discovery, and template/serverless management. Add a new `runpod` provider type with `runpod_config`, DB-backed lifecycle state, auto-start/wait, idle shutdown, wrapper-mode delegation (`openai`, `ollama`, `coderai`), and `runpod_public` as one provider with many discovered models and per-model protocol auto-detect/manual override. Preserve multi-instance consistency by storing lifecycle state in the database."
@@ -43,6 +43,7 @@ from .ollama import OllamaProviderHandler
 from .codex import CodexProviderHandler
 from .coderai import CoderAIProviderHandler
 from .qwen import QwenProviderHandler
+from .runpod import RunpodProviderHandler
 from ..config import config
@@ -57,7 +58,8 @@ PROVIDER_HANDLERS = {
     'kilocode': KiloProviderHandler, # Kilocode provider with OAuth2 support
     'codex': CodexProviderHandler, # Codex provider with OAuth2 support (OpenAI protocol)
     'coderai': CoderAIProviderHandler, # CoderAI provider with HTTP/WebSocket bridge support
-    'qwen': QwenProviderHandler # Qwen provider with OAuth2 support (OpenAI-compatible)
+    'qwen': QwenProviderHandler, # Qwen provider with OAuth2 support (OpenAI-compatible)
+    'runpod': RunpodProviderHandler,
 }
@@ -29,15 +29,17 @@ from .base import BaseProviderHandler, AISBF_DEBUG
 class OllamaProviderHandler(BaseProviderHandler):
-    def __init__(self, provider_id: str, api_key: Optional[str] = None):
-        super().__init__(provider_id, api_key)
+    def __init__(self, provider_id: str, api_key: Optional[str] = None, user_id: Optional[int] = None, provider_config=None):
+        self.provider_config = provider_config if provider_config is not None else config.providers[provider_id]
+        super().__init__(provider_id, api_key, user_id=user_id)
         timeout = httpx.Timeout(
             connect=60.0,
             read=300.0,
             write=60.0,
             pool=60.0
         )
-        self.client = httpx.AsyncClient(base_url=config.providers[provider_id].endpoint, timeout=timeout)
+        endpoint = self.provider_config.get("endpoint") if isinstance(self.provider_config, dict) else self.provider_config.endpoint
+        self.client = httpx.AsyncClient(base_url=endpoint, timeout=timeout)

     def validate_credentials(self) -> bool:
         """
@@ -30,9 +30,11 @@ from .base import BaseProviderHandler, AISBF_DEBUG
 class OpenAIProviderHandler(BaseProviderHandler):
-    def __init__(self, provider_id: str, api_key: str):
-        super().__init__(provider_id, api_key)
-        self.client = OpenAI(base_url=config.providers[provider_id].endpoint, api_key=api_key)
+    def __init__(self, provider_id: str, api_key: str, user_id: Optional[int] = None, provider_config=None):
+        self.provider_config = provider_config if provider_config is not None else config.providers[provider_id]
+        super().__init__(provider_id, api_key, user_id=user_id)
+        endpoint = self.provider_config.get("endpoint") if isinstance(self.provider_config, dict) else self.provider_config.endpoint
+        self.client = OpenAI(base_url=endpoint, api_key=api_key)

     def validate_credentials(self) -> bool:
         """Validate OpenAI API key presence."""
@@ -16,6 +16,7 @@ from aisbf.app.startup import _reload_global_config, _apply_condense_defaults_pr
 from aisbf.app.middleware import _is_local_client
 from aisbf.app.model_cache import fetch_provider_models
 from aisbf.routes.auth import require_dashboard_auth, require_api_auth, require_api_admin, require_admin
+from aisbf.providers.runpod import RunpodProviderHandler
 import httpx

 router = APIRouter()
@@ -116,6 +117,55 @@ def _ensure_coderai_token(provider_config: dict) -> dict:
     return stamped


+def _normalize_runpod_provider_config(provider_id: str, provider_config: dict) -> dict:
+    stamped = dict(provider_config or {})
+    if stamped.get('type') != 'runpod':
+        return stamped
+    runpod_config = stamped.get('runpod_config')
+    if not isinstance(runpod_config, dict):
+        runpod_config = {}
+    mode = str(runpod_config.get('mode') or 'pod').strip().lower()
+    wrapper_mode = str(runpod_config.get('wrapper_mode') or 'openai').strip().lower()
+    runpod_config['mode'] = mode
+    runpod_config['management_api'] = str(runpod_config.get('management_api') or 'auto').strip().lower() or 'auto'
+    runpod_config['account_name'] = str(runpod_config.get('account_name') or provider_id).strip() or provider_id
+    runpod_config['startup_poll_interval_ms'] = int(runpod_config.get('startup_poll_interval_ms') or 3000)
+    runpod_config['startup_timeout_ms'] = int(runpod_config.get('startup_timeout_ms') or 300000)
+    runpod_config['idle_shutdown_ms'] = int(runpod_config.get('idle_shutdown_ms') or 900000)
+    runpod_config['public_endpoint_protocol_default'] = str(runpod_config.get('public_endpoint_protocol_default') or 'auto').strip().lower() or 'auto'
+    if mode == 'public':
+        public_models = runpod_config.get('public_models')
+        if not isinstance(public_models, dict):
+            runpod_config['public_models'] = {}
+    else:
+        runpod_config['wrapper_mode'] = wrapper_mode
+    stamped['runpod_config'] = runpod_config
+    if not stamped.get('endpoint'):
+        stamped['endpoint'] = 'https://rest.runpod.io/v1'
+    return stamped
+
+
+def _validate_runpod_provider_config(provider_id: str, provider_config: dict) -> None:
+    if not isinstance(provider_config, dict) or provider_config.get('type') != 'runpod':
+        return
+    runpod_config = provider_config.get('runpod_config') or {}
+    mode = str(runpod_config.get('mode') or 'pod').strip().lower()
+    if mode not in {'pod', 'serverless_template', 'public'}:
+        raise ValueError(f"RunPod provider '{provider_id}' has unsupported mode '{mode}'")
+    if mode != 'public':
+        wrapper_mode = str(runpod_config.get('wrapper_mode') or 'openai').strip().lower()
+        if wrapper_mode not in {'openai', 'ollama', 'coderai'}:
+            raise ValueError(f"RunPod provider '{provider_id}' has unsupported wrapper_mode '{wrapper_mode}'")
+    if mode == 'pod' and not str(runpod_config.get('pod_id') or '').strip():
+        raise ValueError(f"RunPod provider '{provider_id}' requires runpod_config.pod_id in pod mode")
+    if mode == 'serverless_template' and not (str(runpod_config.get('endpoint_id') or '').strip() or str(runpod_config.get('serverless_template_id') or '').strip() or str(runpod_config.get('template_id') or '').strip()):
+        raise ValueError(f"RunPod provider '{provider_id}' requires endpoint_id or template_id in serverless_template mode")
+
+
 def _validate_coderai_provider_config(provider_id: str, provider_config: dict) -> None:
     if not isinstance(provider_config, dict) or provider_config.get('type') != 'coderai':
         return
@@ -189,6 +239,34 @@ def _apply_usage_disable(db, user_id, provider_id: str, usage_data: dict):
         pass


+def _resolve_dashboard_provider_config(request: Request, provider_id: str) -> tuple[dict, Optional[int]]:
+    current_user_id = request.session.get('user_id')
+    db = DatabaseRegistry.get_config_database()
+    if current_user_id is None:
+        provider = _config.providers.get(provider_id) if _config else None
+        if provider is None:
+            raise HTTPException(status_code=404, detail="Provider not found")
+        if hasattr(provider, "model_dump"):
+            return provider.model_dump(), None
+        if hasattr(provider, "dict"):
+            return provider.dict(), None
+        return dict(provider), None
+    provider_row = db.get_user_provider(current_user_id, provider_id)
+    if not provider_row:
+        raise HTTPException(status_code=404, detail="Provider not found")
+    return dict(provider_row.get("config") or {}), current_user_id
+
+
+def _build_dashboard_runpod_handler(request: Request, provider_id: str) -> RunpodProviderHandler:
+    provider_config, owner_user_id = _resolve_dashboard_provider_config(request, provider_id)
+    if provider_config.get("type") != "runpod":
+        raise HTTPException(status_code=404, detail="RunPod provider not found")
+    api_key = provider_config.get("api_key")
+    return RunpodProviderHandler(provider_id, api_key=api_key, user_id=owner_user_id, provider_config=provider_config)
+
+
 @router.get("/dashboard", response_class=HTMLResponse)
 async def dashboard_index(request: Request):
     """Dashboard overview page"""
@@ -628,7 +706,9 @@ async def dashboard_providers_save(request: Request, config: str = Form(...)):
         # Apply defaults: if condense_method is set but condense_context is not, default to 80
         for provider_key, provider in providers_data.items():
             provider = _ensure_coderai_token(provider)
+            provider = _normalize_runpod_provider_config(provider_key, provider)
             _validate_coderai_provider_config(provider_key, provider)
+            _validate_runpod_provider_config(provider_key, provider)
             if 'models' in provider and isinstance(provider['models'], list):
                 for model in provider['models']:
                     if 'condense_method' in model and model.get('condense_method'):
@@ -961,6 +1041,41 @@ async def search_provider_models_api(request: Request, provider_id: str, query:
     return JSONResponse({"models": models[:200], "fetched_live": fetched_live})


+@router.get("/dashboard/providers/{provider_id}/runpod-status")
+async def api_runpod_provider_status(provider_id: str, request: Request):
+    auth_check = require_dashboard_auth(request)
+    if auth_check:
+        return JSONResponse({"success": False, "error": "Not authenticated"}, status_code=401)
+    try:
+        handler = _build_dashboard_runpod_handler(request, provider_id)
+        return JSONResponse({"success": True, "status": handler.build_runtime_status()})
+    except HTTPException as exc:
+        return JSONResponse({"success": False, "error": exc.detail}, status_code=exc.status_code)
+    except Exception as exc:
+        return JSONResponse({"success": False, "error": str(exc)}, status_code=500)
+
+
+@router.post("/dashboard/providers/{provider_id}/runpod-refresh")
+async def api_runpod_provider_refresh(provider_id: str, request: Request):
+    auth_check = require_dashboard_auth(request)
+    if auth_check:
+        return JSONResponse({"success": False, "error": "Not authenticated"}, status_code=401)
+    try:
+        handler = _build_dashboard_runpod_handler(request, provider_id)
+        catalog = await handler.refresh_public_catalog()
+        return JSONResponse({
+            "success": True,
+            "catalog_count": len(catalog),
+            "status": handler.build_runtime_status(),
+        })
+    except HTTPException as exc:
+        return JSONResponse({"success": False, "error": exc.detail}, status_code=exc.status_code)
+    except Exception as exc:
+        return JSONResponse({"success": False, "error": str(exc)}, status_code=500)
+
+
 @router.get("/dashboard/search-all-models")
 async def search_all_models_api(request: Request, query: str = "", refresh: bool = False):
     """Return all available models (rotations + provider models) for autoselect, with optional live refresh."""