- 16 Mar, 2026 40 commits
-
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Simplify OpenAIFormatter by using litellm's ModelResponse and ChatCompletionChunk directly
- Add fallback support for when litellm is not available or fails
- Maintain compatibility with existing API
- Remove redundant format_litellm_full and format_litellm_chunk methods
-
Your Name authored
The issue was caused by importing StreamingResponse and JSONResponse inside the chat_completions function. In Python, an import statement anywhere inside a function makes that name a local variable for the entire function scope. As a result, the original implementation path failed: it referenced StreamingResponse before the local import had run, so Python treated the name as an unassigned local variable.

Fix: Move the StreamingResponse and JSONResponse imports to module level and remove the redundant imports from inside the function.
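The scoping rule described above can be reproduced without any web framework. The following is a minimal sketch that uses the standard-library `json` module in place of StreamingResponse and JSONResponse; the function names `broken` and `fixed` are hypothetical and exist only for illustration.

```python
import json


def broken(use_fallback: bool) -> str:
    if use_fallback:
        # This import makes `json` a local name for the WHOLE function,
        # shadowing the module-level import above.
        import json
        return json.dumps({"path": "fallback"})
    # When use_fallback is False the local `json` was never assigned,
    # so this line raises UnboundLocalError.
    return json.dumps({"path": "original"})


def fixed(use_fallback: bool) -> str:
    # No import inside the function: both branches use the module-level name.
    if use_fallback:
        return json.dumps({"path": "fallback"})
    return json.dumps({"path": "original"})


if __name__ == "__main__":
    try:
        broken(False)
    except UnboundLocalError as exc:
        print("broken(False) failed:", exc)
    print("fixed(False) returned:", fixed(False))
```

Only the branch that skips the local import fails, which matches the symptom of one code path breaking while the other keeps working.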
-
Your Name authored
-
Your Name authored
The litellm library doesn't export Delta, Choices, etc. directly. Rewrote the formatter to build response dictionaries directly.
-
Your Name authored
-
Your Name authored
- Remove the --parser argument and litellm backend handling code
- Add OpenAIFormatter class in codai/models/parsers.py for final response sanitization
- Integrate formatter into both streaming and non-streaming response paths
- Use litellm's ModelResponse and ChatCompletionChunk for proper OpenAI format
-
Your Name authored
-
Your Name authored
- Changed model name format from openai/... to coderai/...
- This ensures the model is correctly identified as coderai/TeichAI/Qwen3-8B-...
-
Your Name authored
- Use litellm.openai instead of 'openai' string for custom_handler
- This ensures proper registration of the coderai provider with litellm
-
Your Name authored
- Changed model name format from openai/... to coderai/...
- Added litellm.custom_provider_map to map coderai to openai handler
- This allows litellm to use its internal HTTP handler for custom providers
- Example: TeichAI/Qwen3-8B-... now becomes coderai/TeichAI/Qwen3-8B-...
-
Your Name authored
- Instead of defaulting to 'huggingface' for org/model paths, now preserves the original org name as the provider
- Example: TeichAI/Qwen3-8B-... now becomes openai/TeichAI/Qwen3-8B-... instead of openai/huggingface/TeichAI/Qwen3-8B-...
-
Your Name authored
- Add logic to set api_base to server's own URL for non-Ollama models
- Extract host/port from request headers (X-Forwarded-For, Host header)
- Determine protocol (http/https) based on global_args
- Include debug output showing the determined api_base
- This ensures litellm can properly connect to local server when using litellm backend with local models
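As a rough illustration of deriving api_base from the incoming request, here is a minimal sketch. It reads only the Host header (the X-Forwarded-For handling mentioned above is not reproduced), assumes lowercase header keys, and does not handle bracketed IPv6 hosts. The function name `determine_api_base`, the `use_https` flag standing in for the global_args check, and the fallback host and port are all assumptions, not the project's actual code.

```python
from typing import Mapping
from urllib.parse import urlsplit


def determine_api_base(
    headers: Mapping[str, str],
    use_https: bool,
    fallback_port: int = 8000,  # hypothetical default port
) -> str:
    """Build a base URL pointing back at the server that received the request."""
    scheme = "https" if use_https else "http"
    # Assumption: header keys are already lowercased by the caller.
    host_header = headers.get("host", "127.0.0.1")
    # Prefixing "//" lets urlsplit treat the value as a network location.
    parts = urlsplit(f"//{host_header}")
    host = parts.hostname or "127.0.0.1"
    port = parts.port or fallback_port
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # Debug output showing the determined api_base.
    print(determine_api_base({"host": "localhost:8080"}, use_https=False))
```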
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
- When using HuggingFace inference endpoints, set api_key to None to avoid auth errors
-
Your Name authored
- When model starts with 'ollama:', construct api_base from request host and port
- api_base is now passed to LiteLLMBackend for local connections
-
Your Name authored
- Don't check environment for OPENAI_API_KEY
- Use fake key directly in LiteLLMBackend if no key passed
-
Your Name authored
- If no API key is provided in request, use a fake key to allow litellm to proceed
- Check both request body and Authorization header for API key
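The lookup order described above might look roughly like the sketch below. The body field name `api_key`, the lowercase `authorization` header key, the Bearer scheme, the function name `resolve_api_key`, and the placeholder value are assumptions made for illustration; the commit does not specify them.

```python
from typing import Mapping

# Hypothetical placeholder; the actual fake key used by the project is not shown.
PLACEHOLDER_KEY = "sk-placeholder"


def resolve_api_key(body: Mapping[str, object], headers: Mapping[str, str]) -> str:
    """Return the caller's API key, or a placeholder when none was supplied."""
    # 1. Prefer a key sent in the request body (field name is an assumption).
    body_key = body.get("api_key")
    if isinstance(body_key, str) and body_key.strip():
        return body_key.strip()

    # 2. Otherwise look for "Authorization: Bearer <token>".
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()

    # 3. Nothing supplied: fall back so the downstream call can proceed.
    return PLACEHOLDER_KEY


if __name__ == "__main__":
    print(resolve_api_key({}, {"authorization": "Bearer abc123"}))
```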
-
Your Name authored
- Add tool_parser parameter to litellm backend calls in coderai endpoint
- ModelParserAdapter now passed to both streaming and non-streaming calls
- Enables model-specific tool call parsing for external models via litellm
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
- Add model_manager parameter to LiteLLMBackend for alias resolution
- Add _resolve_model_alias() method to handle default, image, audio, tts aliases
- Update get_litellm_backend() to pass model_manager
- Update coderai call site to pass multi_model_manager

Now --parser litellm will resolve aliases like 'default', 'image' to actual model names before normalizing for litellm.
-
Your Name authored
- Add method to normalize model names for litellm
- Maps common model patterns to providers (gpt-* -> openai/, llama -> meta/, etc.)
- Falls back to openai/ for unknown models
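A minimal sketch of the mapping described above, covering only the two patterns the message names plus the fallback. The function name `normalize_model_name` and the exact matching rules (prefix match for gpt-, substring match for llama) are assumptions; the real method likely handles more patterns.

```python
def normalize_model_name(model: str) -> str:
    """Prefix a bare model name with a provider, as litellm expects."""
    lowered = model.lower()
    if lowered.startswith("gpt-"):
        return f"openai/{model}"
    # Assumption: any name containing "llama" maps to the meta/ provider.
    if "llama" in lowered:
        return f"meta/{model}"
    # Unknown models fall back to the openai/ provider.
    return f"openai/{model}"


if __name__ == "__main__":
    for name in ("gpt-4o", "llama-3-8b", "mystery-model"):
        print(name, "->", normalize_model_name(name))
```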
-
Your Name authored
-
Your Name authored
Created codai/models/cache/__init__.py with:
- get_model_cache_dir()
- get_all_cache_dirs()
- get_cached_model_path()
- is_huggingface_model_id()
- download_huggingface_model()
- download_model()
- list_cached_models()
- remove_cached_model()
- remove_all_cached_models()

This extracts the cache-related functionality into a separate module.
-
Your Name authored
- Rename codai/litellm_backend.py to codai/openai/litellm.py
- Create codai/openai/__init__.py
- Update imports in coderai and codai/__init__.py
-
Your Name authored
- Add litellm to requirements.txt
- Add --parser CLI arg (auto/litellm, default auto)
- Create codai/litellm_backend.py module with:
  - LiteLLMBackend class for standardized responses
  - Rate limit headers (x-ratelimit-remaining-tokens, x-ratelimit-limit-tokens)
  - Qwen tool-call resilience (parse <tool> and <tool_call> tags)
  - Error handling with litellm exception mapping
- Update chat completions endpoint to use litellm when --parser litellm
- Update codai/__init__.py to export litellm components
-
Your Name authored
- Added litellm>=1.40.0 to requirements.txt
- Added --parser argument (auto/litellm, default auto)

Note: Full litellm integration requires significant refactoring of the chat completion endpoints to use litellm.completion() for standardized responses, adding rate limit headers, and error handling.
-
Your Name authored
QwenParser:
- Add repetition guard to handle looping models
- Improve flexible tag matching for tool/tool_call/function_call
- Add JSON recovery for unclosed JSON
- Add circuit breaker after first valid call
- Support <call=name> in coder style fallback

API:
- Add repeat_penalty parameter to ChatCompletionRequest
- Add repeat_penalty parameter to CompletionRequest
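One way to recover unclosed JSON, as mentioned for QwenParser above, is to track open brackets outside of string literals and append the missing closers before parsing. This is a sketch of that general technique, not the parser's actual implementation; the function name `recover_unclosed_json` is hypothetical.

```python
import json
from typing import Any, Optional


def recover_unclosed_json(fragment: str) -> Optional[Any]:
    """Try to parse truncated JSON by closing open strings, objects and arrays."""
    closers = []          # stack of closing brackets still owed
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = fragment + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # truncation fell somewhere this simple repair cannot fix


if __name__ == "__main__":
    print(recover_unclosed_json('{"name": "f", "arguments": {"x": 1'))
```

The repair only succeeds when the output was cut off after a complete value; a fragment that ends right after a colon or comma still returns None.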
-
Your Name authored
- Added pre-cleaning for thinking/special tokens
- Unified tag matching for both <tool> and <tool_call>
- Added markdown code block stripping inside tags
- Added lazy JSON parsing fallback
- Added _parse_coder_style() and _relaxed_val() helper methods
-
Your Name authored
- Added _clean_json_string() method to BaseParser for cleaning JSON strings
- Updated QwenParser.parse() with 3-step parsing strategy:
  1. Qwen format: <tool=func_name>...</tool>
  2. JSON format with flexible tag matching
  3. Fallback coder style with parameter tags
- Fixed syntax issues in the module
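The 3-step strategy above can be sketched as an ordered fallback chain. Several details here are assumptions rather than the parser's real behavior: that the body of <tool=func_name> holds JSON arguments, that the JSON format uses "name" and "arguments" keys, and that the coder style looks like <function=name><parameter=key>value</parameter></function>. The function name `parse_tool_call` is hypothetical.

```python
import json
import re
from typing import Optional

QWEN_TAG = re.compile(r"<tool=([\w.-]+)>(.*?)</tool>", re.DOTALL)
JSON_TAG = re.compile(r"<(tool|tool_call|function_call)>(.*?)</\1>", re.DOTALL)
# Assumed coder-style layout; the real format may differ.
CODER_FUNC = re.compile(r"<function=([\w.-]+)>(.*?)</function>", re.DOTALL)
CODER_PARAM = re.compile(r"<parameter=([\w.-]+)>(.*?)</parameter>", re.DOTALL)


def _load_dict(raw: str) -> Optional[dict]:
    """Parse raw text as JSON and return it only if it is an object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_tool_call(text: str) -> Optional[dict]:
    # Step 1: Qwen format, function name carried in the tag itself.
    match = QWEN_TAG.search(text)
    if match:
        args = _load_dict(match.group(2))
        if args is not None:
            return {"name": match.group(1), "arguments": args}

    # Step 2: JSON payload inside any of the accepted tags.
    match = JSON_TAG.search(text)
    if match:
        payload = _load_dict(match.group(2))
        if payload is not None and "name" in payload:
            return {"name": payload["name"], "arguments": payload.get("arguments", {})}

    # Step 3: coder-style fallback; parameter values stay as strings.
    match = CODER_FUNC.search(text)
    if match:
        params = {key: value.strip() for key, value in CODER_PARAM.findall(match.group(2))}
        return {"name": match.group(1), "arguments": params}

    return None  # no recognizable tool call


if __name__ == "__main__":
    print(parse_tool_call('<tool=get_weather>{"city": "Paris"}</tool>'))
```

Each step runs only when the previous one finds nothing usable, so a well-formed call in the first format never reaches the looser fallbacks.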
-