Commits · 95e11455a8659b3eea2dd68b569dc261b3aaa174 · nexlab / coderai

16 Mar, 2026 · 40 commits

Use coderai provider with litellm custom_provider_map · 95e11455

Your Name authored Mar 16, 2026

- Changed model name format from openai/... to coderai/...
- Added litellm.custom_provider_map to map coderai to openai handler
- This allows litellm to use its internal HTTP handler for custom providers
- Example: TeichAI/Qwen3-8B-... now becomes coderai/TeichAI/Qwen3-8B-...
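
For illustration only (this is not code from the commit), the rename described above amounts to swapping the provider prefix on the model name. The helper name `to_coderai_model` and the placeholder model id are hypothetical, and registering the provider through litellm.custom_provider_map is not shown.

```python
def to_coderai_model(name: str,
                     old_prefix: str = "openai/",
                     new_prefix: str = "coderai/") -> str:
    """Rewrite a model name so that it carries the 'coderai/' provider prefix."""
    if name.startswith(new_prefix):
        return name  # already in the new format
    if name.startswith(old_prefix):
        name = name[len(old_prefix):]  # drop the old provider prefix
    return new_prefix + name


# Placeholder model id, not a real repository name.
print(to_coderai_model("openai/example-org/example-model"))
# prints: coderai/example-org/example-model
```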

Fix model name normalization to preserve org name · 87acdc45

Your Name authored Mar 16, 2026

- Instead of defaulting to 'huggingface' for org/model paths,
  now preserves the original org name as the provider
- Example: TeichAI/Qwen3-8B-... now becomes openai/TeichAI/Qwen3-8B-...
  instead of openai/huggingface/TeichAI/Qwen3-8B-...

Fix litellm api_base for non-Ollama models · df7875e3

Your Name authored Mar 16, 2026

- Add logic to set api_base to server's own URL for non-Ollama models
- Extract host/port from request headers (X-Forwarded-For, Host header)
- Determine protocol (http/https) based on global_args
- Include debug output showing the determined api_base
- This ensures litellm can connect to the local server when the litellm backend is used with local models
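
A minimal sketch of how such an api_base could be derived, assuming the host and port come from the Host header and the scheme from a boolean flag. The function name, the defaults, and the header handling are assumptions for illustration, not the code from this commit.

```python
from typing import Mapping


def build_api_base(headers: Mapping[str, str], use_https: bool,
                   default_host: str = "127.0.0.1",
                   default_port: int = 8000) -> str:
    """Build 'scheme://host:port' from request headers (hypothetical helper)."""
    scheme = "https" if use_https else "http"
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    host_header = lowered.get("host", "").strip()

    host, sep, port = host_header.rpartition(":")
    if not sep or not port.isdigit():
        # No usable ':port' suffix, so treat the whole value as the host.
        host, port = host_header, ""

    host = host or default_host
    port = port or str(default_port)
    return f"{scheme}://{host}:{port}"


print(build_api_base({"Host": "localhost:8080"}, use_https=False))
# prints: http://localhost:8080
```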

Add debug output for api_key and api_base in litellm · 6b297f18
Your Name authored Mar 16, 2026

Add debug output for api_key and api_base in litellm · be41efbd
Your Name authored Mar 16, 2026

Fix fake key · a52a56e7
Your Name authored Mar 16, 2026

Fix litellm debug mode check to use runtime global_debug · e654c9f4
Your Name authored Mar 16, 2026

Enable litellm debug mode when --debug flag is set · c52d2e2b
Your Name authored Mar 16, 2026

Fix: Move HuggingFace fake key check after use_model is defined · 04b971a4
Your Name authored Mar 16, 2026

Use sk-fakekey format for HuggingFace fake key · 711041e0
Your Name authored Mar 16, 2026

Use fake key for HuggingFace models instead of None · f098e307
Your Name authored Mar 16, 2026

Skip API key for HuggingFace models in litellm · 275ef90d
Your Name authored Mar 16, 2026

- When using HuggingFace inference endpoints, set api_key to None to avoid auth errors

Pass api_base to litellm for local model connections · a93b69cf

Your Name authored Mar 16, 2026

- When model starts with 'ollama:', construct api_base from request host and port
- api_base is now passed to LiteLLMBackend for local connections

Fix litellm to generate fake API key if none provided · c723cf43

Your Name authored Mar 16, 2026

- Don't check environment for OPENAI_API_KEY
- Use fake key directly in LiteLLMBackend if no key passed

Add fake API key fallback for litellm backend · eb9e8d6a

Your Name authored Mar 16, 2026

- If no API key is provided in request, use a fake key to allow litellm to proceed
- Check both request body and Authorization header for API key
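
The lookup order described above could look roughly like the following. The `api_key` body field, the Bearer parsing, and the literal placeholder value are assumptions made for the sketch; only the "body, then Authorization header, then fake key" order comes from the commit message.

```python
from typing import Any, Mapping

FAKE_API_KEY = "sk-fakekey"  # assumed placeholder value, never a real credential


def resolve_api_key(body: Mapping[str, Any], headers: Mapping[str, str]) -> str:
    """Pick the API key from the request body, then the header, else a fake key."""
    key = body.get("api_key")  # hypothetical body field
    if isinstance(key, str) and key.strip():
        return key.strip()

    # Header names are case-insensitive.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()

    return FAKE_API_KEY  # lets the downstream call proceed without a real key
```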

Integrate model_parser module with LiteLLM backend · c1e71237

Your Name authored Mar 16, 2026

- Add tool_parser parameter to litellm backend calls in coderai endpoint
- ModelParserAdapter now passed to both streaming and non-streaming calls
- Enables model-specific tool call parsing for external models via litellm

Always use openai/{provider}/{model} format with coderai default · 0ab10131
Your Name authored Mar 16, 2026

Fix HuggingFace org/model path detection in LiteLLM normalization · c8cc0048
Your Name authored Mar 16, 2026

Add debug logging to LiteLLM alias resolution · 26b84ba4
Your Name authored Mar 16, 2026

Resolve model aliases in LiteLLM backend · cd1040bb

Your Name authored Mar 16, 2026

- Add model_manager parameter to LiteLLMBackend for alias resolution
- Add _resolve_model_alias() method to handle default, image, audio, tts aliases
- Update get_litellm_backend() to pass model_manager
- Update coderai call site to pass multi_model_manager

With --parser litellm, aliases such as 'default' and 'image' are now resolved to actual model names before the name is normalized for litellm.
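
As a rough sketch of the alias step (not the actual _resolve_model_alias() implementation), resolution can be modelled as a lookup that leaves non-alias names untouched. The mapping argument stands in for whatever the model manager exposes, and the target names are placeholders.

```python
from typing import Mapping

KNOWN_ALIASES = ("default", "image", "audio", "tts")  # aliases named in the commit


def resolve_model_alias(name: str, alias_targets: Mapping[str, str]) -> str:
    """Return the concrete model name for an alias, or the name unchanged."""
    if name in KNOWN_ALIASES:
        # Fall back to the alias itself if nothing is configured for it.
        return alias_targets.get(name, name)
    return name


# Placeholder configuration for illustration.
targets = {"default": "example-org/example-chat-model"}
print(resolve_model_alias("default", targets))
# prints: example-org/example-chat-model
```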

Add normalize_model_name() to litellm backend · eed5a3ff

Your Name authored Mar 16, 2026

- Add method to normalize model names for litellm
- Maps common model patterns to providers (gpt-* -> openai/, llama -> meta/, etc.)
- Falls back to openai/ for unknown models
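
A hedged sketch of this kind of prefix mapping follows. Only the two patterns named above are included (the rest are elided in the commit message), and leaving names that already contain a slash unchanged is an assumption of the sketch.

```python
# (prefix, provider) pairs; only the two patterns named in the commit message.
PROVIDER_PREFIXES = (
    ("gpt-", "openai"),
    ("llama", "meta"),
)


def normalize_model_name(model: str) -> str:
    """Prepend a provider to a bare model name, defaulting to 'openai'."""
    if "/" in model:
        return model  # assumption: already provider-qualified
    lowered = model.lower()
    for prefix, provider in PROVIDER_PREFIXES:
        if lowered.startswith(prefix):
            return f"{provider}/{model}"
    return f"openai/{model}"  # fallback for unknown models
```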

Fix: Move ERROR_CODE_MAP inside try block to handle litellm import error gracefully · 71e521ff
Your Name authored Mar 16, 2026

Create codai.models.cache module for model caching · ae1820e5

Your Name authored Mar 16, 2026

Created codai/models/cache/__init__.py with:
- get_model_cache_dir()
- get_all_cache_dirs()
- get_cached_model_path()
- is_huggingface_model_id()
- download_huggingface_model()
- download_model()
- list_cached_models()
- remove_cached_model()
- remove_all_cached_models()

This extracts the cache-related functionality into a separate module.
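
To give a feel for two of the helpers listed above, here is a hypothetical sketch. The signatures, the org/name pattern, and the "--" directory naming are assumptions, not the module's actual behaviour.

```python
import re
from pathlib import Path

# Assumed shape of a HuggingFace model id: exactly one 'org/name' pair.
_HF_ID = re.compile(r"^[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*$")


def is_huggingface_model_id(value: str) -> bool:
    """True for 'org/name' style ids, False for bare names and file paths."""
    return bool(_HF_ID.match(value))


def get_cached_model_path(cache_dir: Path, model_id: str) -> Path:
    """Map a model id to one directory under cache_dir ('/' becomes '--')."""
    return cache_dir / model_id.replace("/", "--")
```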

Restructure: Move litellm to codai.openai.litellm · f0cf14d4

Your Name authored Mar 16, 2026

- Rename codai/litellm_backend.py to codai/openai/litellm.py
- Create codai/openai/__init__.py
- Update imports in coderai and codai/__init__.py

Implement LiteLLM integration for OpenAI-compatible /v1/chat/completions · 39f8696e

Your Name authored Mar 16, 2026

- Add litellm to requirements.txt
- Add --parser CLI arg (auto/litellm, default auto)
- Create codai/litellm_backend.py module with:
  - LiteLLMBackend class for standardized responses
  - Rate limit headers (x-ratelimit-remaining-tokens, x-ratelimit-limit-tokens)
  - Qwen tool-call resilience (parse <tool> and <tool_call> tags)
  - Error handling with litellm exception mapping
- Update chat completions endpoint to use litellm when --parser litellm
- Update codai/__init__.py to export litellm components
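
The two rate limit headers named above could be produced along these lines. How the token budget and usage are tracked is not described in the commit, so both are plain arguments in this sketch, and the function name is hypothetical.

```python
from typing import Dict


def rate_limit_headers(limit_tokens: int, used_tokens: int) -> Dict[str, str]:
    """Build the token rate-limit response headers (illustrative helper)."""
    remaining = max(limit_tokens - used_tokens, 0)  # never report a negative value
    return {
        "x-ratelimit-limit-tokens": str(limit_tokens),
        "x-ratelimit-remaining-tokens": str(remaining),
    }
```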

Add --parser CLI arg and litellm dependency for future integration · 7ec43f73

Your Name authored Mar 16, 2026

- Added litellm>=1.40.0 to requirements.txt
- Added --parser argument (auto/litellm, default auto)

Note: Full litellm integration requires significant refactoring of the
chat completion endpoints to use litellm.completion() for standardized
responses, adding rate limit headers, and error handling.

Improve QwenParser with repetition guard and add repeat_penalty to API · 9e9febbd

Your Name authored Mar 16, 2026

QwenParser:
- Add repetition guard to handle looping models
- Improve flexible tag matching for tool/tool_call/function_call
- Add JSON recovery for unclosed JSON
- Add circuit breaker after first valid call
- Support <call=name> in coder style fallback

API:
- Add repeat_penalty parameter to ChatCompletionRequest
- Add repeat_penalty parameter to CompletionRequest
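
One way to approach "JSON recovery for unclosed JSON" is to track open brackets and append the missing closers before parsing. This is a sketch of that general technique under that assumption, not QwenParser's actual logic.

```python
import json
from typing import Any, List, Optional


def recover_unclosed_json(text: str) -> Optional[Any]:
    """Close unterminated strings, objects and arrays, then try to parse."""
    closers: List[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # not recoverable with this simple strategy


print(recover_unclosed_json('{"name": "get_weather", "arguments": {"city": "Paris"'))
# prints: {'name': 'get_weather', 'arguments': {'city': 'Paris'}}
```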

Improve QwenParser with cleaner parsing logic and coder style fallback · 433eb3ee

Your Name authored Mar 16, 2026

- Added pre-cleaning for thinking/special tokens
- Unified tag matching for both <tool> and <tool_call>
- Added markdown code block stripping inside tags
- Added lazy JSON parsing fallback
- Added _parse_coder_style() and _relaxed_val() helper methods
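
The pre-cleaning and code-fence stripping steps might resemble the sketch below. The `<think>` tag is an assumed example of a "thinking" token, and the helper names are hypothetical.

```python
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example
_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Optional language tag after the opening fence, then the fenced payload.
_FENCED = re.compile(FENCE + r"(?:\w+)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def pre_clean(text: str) -> str:
    """Drop thinking blocks before looking for tool calls."""
    return _THINK.sub("", text).strip()


def strip_code_fence(payload: str) -> str:
    """Return the content of a markdown code fence, or the payload unchanged."""
    match = _FENCED.search(payload)
    return match.group(1) if match else payload.strip()
```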

Update QwenParser with improved parsing and add _clean_json_string helper · 73d1c77c

Your Name authored Mar 16, 2026

- Added _clean_json_string() method to BaseParser for cleaning JSON strings
- Updated QwenParser.parse() with 3-step parsing strategy:
  1. Qwen format: <tool=func_name>...</tool>
  2. JSON format with flexible tag matching
  3. Fallback coder style with parameter tags
- Fixed syntax issues in the module
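
The three-step strategy can be pictured as an ordered fallback chain. The sketch below is an assumption-laden illustration: it treats the body of `<tool=func_name>` as JSON arguments, uses `<function=...>`/`<parameter=...>` tags for the coder style, and returns a plain dict, none of which is confirmed by the commit.

```python
import json
import re
from typing import Any, Dict, Optional

_QWEN = re.compile(r"<tool=([\w.\-]+)>(.*?)</tool>", re.DOTALL)
_TAGGED_JSON = re.compile(
    r"<(?:tool|tool_call)>\s*(\{.*?\})\s*</(?:tool|tool_call)>", re.DOTALL)
_CODER_FN = re.compile(r"<function=([\w.\-]+)>(.*?)</function>", re.DOTALL)
_CODER_PARAM = re.compile(r"<parameter=([\w.\-]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    # Step 1: Qwen format, <tool=name>{...}</tool> (JSON body is an assumption).
    match = _QWEN.search(text)
    if match:
        try:
            return {"name": match.group(1), "arguments": json.loads(match.group(2))}
        except json.JSONDecodeError:
            pass  # fall through to the next strategy

    # Step 2: JSON object wrapped in <tool> or <tool_call> tags.
    match = _TAGGED_JSON.search(text)
    if match:
        try:
            data = json.loads(match.group(1))
            return {"name": data["name"], "arguments": data.get("arguments", {})}
        except (json.JSONDecodeError, KeyError):
            pass

    # Step 3: coder style with parameter tags; values stay as strings.
    match = _CODER_FN.search(text)
    if match:
        args = {k: v.strip() for k, v in _CODER_PARAM.findall(match.group(2))}
        return {"name": match.group(1), "arguments": args}

    return None
```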

Restructure: Move parser to codai.models, add templates, update imports · 0c504a0b
Your Name authored Mar 16, 2026

Add Qwen format stripping in strip_tool_calls_from_content · 3fd46920
Your Name authored Mar 16, 2026

Force manual formatting when tools are present to avoid Jinja errors · 544896de
Your Name authored Mar 16, 2026

Add debug output to QwenParser · 562d5df5
Your Name authored Mar 16, 2026

Fix regex to handle </tool_call> closing tag · 861f8741
Your Name authored Mar 16, 2026

Fix QwenParser to handle <tool=func_name> format · 36da321b
Your Name authored Mar 16, 2026

Fix _to_oa to return OpenAI format with 'function' key · 886b4c2d
Your Name authored Mar 16, 2026

Add debug output to model_parser showing model_name and selected parser · 55ebbc3b
Your Name authored Mar 16, 2026

Fix tool parsing error with improved error handling · 4ee69261

Your Name authored Mar 16, 2026

- Handle both dict and pydantic model formats for tools
- Add try/except around tool conversion and extraction
- More robust error handling to prevent 500 errors
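
Accepting both dicts and pydantic models usually comes down to a small conversion helper wrapped in a try/except. The sketch below assumes pydantic's `model_dump()` (v2) or `dict()` (v1) is available on model objects; the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def tool_to_dict(tool: Any) -> Dict[str, Any]:
    """Convert one tool definition to a plain dict."""
    if isinstance(tool, dict):
        return tool
    if hasattr(tool, "model_dump"):  # pydantic v2 models
        return tool.model_dump()
    if hasattr(tool, "dict"):        # pydantic v1 models
        return tool.dict()
    raise TypeError(f"unsupported tool type: {type(tool).__name__}")


def convert_tools(tools: Optional[Iterable[Any]]) -> List[Dict[str, Any]]:
    """Convert every tool, skipping entries that cannot be converted."""
    converted: List[Dict[str, Any]] = []
    for tool in tools or []:
        try:
            converted.append(tool_to_dict(tool))
        except Exception:
            continue  # a bad tool entry should not turn into a 500 error
    return converted
```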

Integrate model_parser module as codai package · 82ee7353

Your Name authored Mar 16, 2026

- Move model_parser.py into codai/ directory
- Add __init__.py to make it a proper Python module
- Create ModelParserAdapter class to wrap ModelParserDispatcher
- Replace ToolCallParser() with ModelParserAdapter() in 4 locations
- Update import to use 'from codai import ModelParserDispatcher'

This enables model-specific tool call parsing for Qwen, DeepSeek,
Llama, Mistral, Claude, Command R, Gemma, Grok, and Phi models.
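
Dispatching on the model name can be sketched as a substring lookup over the families listed above. The marker strings, the "generic" fallback, and the function name are assumptions; the real ModelParserDispatcher may select parsers differently.

```python
# (substring marker, family label) pairs, checked in order.
FAMILY_MARKERS = (
    ("qwen", "qwen"),
    ("deepseek", "deepseek"),
    ("llama", "llama"),
    ("mistral", "mistral"),
    ("claude", "claude"),
    ("command-r", "command_r"),
    ("gemma", "gemma"),
    ("grok", "grok"),
    ("phi", "phi"),
)


def select_parser_family(model_name: str) -> str:
    """Return the parser family for a model name, or 'generic' if none matches."""
    lowered = model_name.lower()
    for marker, family in FAMILY_MARKERS:
        if marker in lowered:  # naive substring match; may over-match short markers
            return family
    return "generic"
```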

Add Qwen model tool call parsing support · fbb6476e

Your Name authored Mar 16, 2026

- Add Qwen-specific tool call parsing in ToolCallParser
- Support for Instruct-style: <tool_call>{JSON}</tool_call>
- Support for Coder-style: <tool_call><function=name><parameter=k>v</parameter></function></tool_call>
- Add model_name attribute to ToolCallParser for model-specific parsing
- Update ModelManager.load_model to set model name on tool parser
- Fix duplicate method definitions in ToolCallParser class
