Commits · 989f1858f8530020f901371c6d03f5237d9c4c97 · nexlab / coderai

17 Mar, 2026 4 commits

Refactor: Move QueueManager to codai/queue/manager and restore FastAPI app · 989f1858
Your Name authored Mar 17, 2026

Remove --reply-filters option, always apply malformed and tool_calls filters · 001e1708
Your Name authored Mar 17, 2026

Fix model name in response: resolve aliases, extract filename from URLs, add coderai/ prefix · 8cc18c40
Your Name authored Mar 17, 2026

Add debug output for model input in both streaming and non-streaming modes · 0653c58a
Your Name authored Mar 17, 2026

16 Mar, 2026 36 commits

Fix UnboundLocalError for stop_sequences in reasoning logic · 3890a849
Your Name authored Mar 16, 2026

Enhance --force-reasoning with stop/inject options and add reasoning extraction · ef03dee8

Your Name authored Mar 16, 2026

- Added --force-reasoning with choices: 'stop', 'inject', 'both' (default)
- Add model-family detection for reasoning stop tokens
- Get appropriate stop tokens for Qwen, DeepSeek, Llama3, Mistral, Gemma, Hermes/Yi
- Add system prompt injection for forcing reasoning on non-native models
- Add extract_reasoning_content() function to parsers for extracting thinking tags
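
The flag described above could be declared roughly as follows. This is only a sketch: the `prog` name, the `build_parser` helper, and the choice to treat a bare `--force-reasoning` as `both` (with the flag disabled when absent) are assumptions, not taken from the actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing one way to declare --force-reasoning."""
    parser = argparse.ArgumentParser(prog="coderai")
    parser.add_argument(
        "--force-reasoning",
        nargs="?",          # the value is optional
        const="both",       # bare flag: assume 'both'
        default=None,       # flag absent: reasoning is not forced
        choices=["stop", "inject", "both"],
        help="Force reasoning via stop tokens, prompt injection, or both.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--force-reasoning", "stop"])
    print(args.force_reasoning)
```

With this layout, `None` means the flag was not given, so callers can tell "off" apart from any of the three modes.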

Add --force-reasoning CLI flag for reasoning/thinking mode · ed8397a0

Your Name authored Mar 16, 2026

- Added --force-reasoning argument to enable reasoning mode for models
  that support it (Qwen3, DeepSeek R1, etc.)
- Modified chat_completions endpoint to check both API parameter
  enable_thinking and CLI flag force_reasoning
- When either is true, injects agentic template to enable thinking
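
A minimal sketch of the "either source enables thinking" check. The dataclass below is a stand-in for the real `ChatCompletionRequest`, and `reasoning_requested` is a hypothetical helper name; only `enable_thinking` and `force_reasoning` come from the commit message.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class ChatCompletionRequest:
    """Stand-in for the real request model; only the relevant field is shown."""
    enable_thinking: bool = False


def reasoning_requested(request: ChatCompletionRequest, cli_args) -> bool:
    """Return True if the API parameter or the CLI flag asks for reasoning."""
    from_api = bool(request.enable_thinking)
    from_cli = bool(getattr(cli_args, "force_reasoning", False))
    return from_api or from_cli


if __name__ == "__main__":
    cli = SimpleNamespace(force_reasoning=True)
    print(reasoning_requested(ChatCompletionRequest(), cli))
```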

Add enable_thinking parameter to chat completion · 11526eee

Your Name authored Mar 16, 2026

- Add enable_thinking parameter to ChatCompletionRequest
- When enable_thinking=True, inject agentic system prompt to force thinking/reasoning
- Uses AgenticTemplateManager to inject thought tags for supported models

Revert reasoning changes - fixing indentation error · b3e5d314
Your Name authored Mar 16, 2026

Add reasoning/thinking extraction and forced reasoning support · 1a6467ca

Your Name authored Mar 16, 2026

- Add --force-reasoning CLI flag to force thinking mode for models like qwen3 coder
- Add check_force_reasoning() function to determine if reasoning should be forced
- Modify QwenParser to extract thinking/reasoning content instead of stripping it
- Add reasoning field to response message in non-streaming chat completions
- Prepend reasoning content to generated text in streaming responses
- Update OpenAIFormatter to include reasoning in response when available
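
Extracting reasoning instead of stripping it might look like the sketch below. It assumes the model wraps its reasoning in `<think>...</think>` tags (as Qwen3 and DeepSeek R1 do); the function signature and return shape are illustrative and may differ from the project's parser.

```python
import re
from typing import Optional, Tuple

# Non-greedy match so several separate <think> blocks are handled independently.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_reasoning_content(text: str) -> Tuple[Optional[str], str]:
    """Split model output into (reasoning, visible_content).

    reasoning is None when no <think> block is present or all are empty.
    """
    blocks = [block.strip() for block in _THINK_RE.findall(text)]
    reasoning = "\n".join(block for block in blocks if block) or None
    content = _THINK_RE.sub("", text).strip()
    return reasoning, content


if __name__ == "__main__":
    reasoning, content = extract_reasoning_content(
        "<think>Check the inputs first.</think>The answer is 4."
    )
    print({"reasoning": reasoning, "content": content})
```

The returned reasoning can then be attached as a separate `reasoning` field on the response message, leaving `content` free of thinking tags.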

Fix OpenAIFormatter import · b1e402b9
Your Name authored Mar 16, 2026

Combine parsers module into parser.py · fee96eb2
Your Name authored Mar 16, 2026

Add backward compatibility methods for format_litellm_full and format_litellm_chunk · 63c4c8a4
Your Name authored Mar 16, 2026

Refactor OpenAIFormatter to use litellm models directly · 203f97e0

Your Name authored Mar 16, 2026

- Simplify OpenAIFormatter by using litellm's ModelResponse and ChatCompletionChunk directly
- Add fallback support for when litellm is not available or fails
- Maintain compatibility with existing API
- Remove redundant format_litellm_full and format_litellm_chunk methods

Fix UnboundLocalError for StreamingResponse in chat_completions · 70a6cfe1

Your Name authored Mar 16, 2026

The issue was caused by importing StreamingResponse and JSONResponse inside
the chat_completions function. In Python, when you have an import statement
anywhere inside a function, it creates a local variable for that name
throughout the entire function scope. This caused the code in the original
implementation path to fail because Python saw StreamingResponse as an
unassigned local variable.

Fix: Move StreamingResponse and JSONResponse imports to module level and
remove redundant imports from inside the function.
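
The scoping behaviour described above can be reproduced with a small stand-alone example. `json.dumps` is used here as a stand-in for `StreamingResponse` so the snippet runs without FastAPI; the function names are illustrative only.

```python
from json import dumps  # module-level import, visible to every function below


def broken(use_new_path: bool) -> str:
    if use_new_path:
        # This import binds `dumps` as a local name for the whole function body,
        # shadowing the module-level name even on paths that never run this line.
        from json import dumps
        return dumps({"path": "new"})
    # `dumps` is local here but was never assigned on this path.
    return dumps({"path": "original"})


def fixed(use_new_path: bool) -> str:
    # No local import: `dumps` resolves to the module-level binding on both paths.
    if use_new_path:
        return dumps({"path": "new"})
    return dumps({"path": "original"})


if __name__ == "__main__":
    try:
        broken(False)
    except UnboundLocalError as exc:
        print("broken:", type(exc).__name__)
    print("fixed:", fixed(False))
```

Note that `broken(True)` still works, which is why this kind of bug tends to surface only on the code path that skips the local import.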

Fix LiteLLM · 6c2c0afc
Your Name authored Mar 16, 2026

Fix OpenAIFormatter to not rely on litellm imports · 8280060e

Your Name authored Mar 16, 2026

The litellm library doesn't export Delta, Choices, etc. directly.
Rewrote the formatter to build response dictionaries directly.

Remove litellm imports from codai module · 98a48640
Your Name authored Mar 16, 2026

Remove --parser litellm option and add OpenAIFormatter for response sanitization · 076a7724

Your Name authored Mar 16, 2026

- Remove the --parser argument and litellm backend handling code
- Add OpenAIFormatter class in codai/models/parsers.py for final response sanitization
- Integrate formatter into both streaming and non-streaming response paths
- Use litellm's ModelResponse and ChatCompletionChunk for proper OpenAI format

Fix provider handle · b505de59
Your Name authored Mar 16, 2026

Fix model name format to use coderai provider · ca91d563

Your Name authored Mar 16, 2026

- Changed model name format from openai/... to coderai/...
- This ensures the model is correctly identified as coderai/TeichAI/Qwen3-8B-...

Fix litellm custom provider registration · 7b194a45

Your Name authored Mar 16, 2026

- Use litellm.openai instead of 'openai' string for custom_handler
- This ensures proper registration of the coderai provider with litellm

Use coderai provider with litellm custom_provider_map · 95e11455

Your Name authored Mar 16, 2026

- Changed model name format from openai/... to coderai/...
- Added litellm.custom_provider_map to map coderai to openai handler
- This allows litellm to use its internal HTTP handler for custom providers
- Example: TeichAI/Qwen3-8B-... now becomes coderai/TeichAI/Qwen3-8B-...
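
The renaming step could be sketched as below. `to_coderai_model_name` is a hypothetical helper, and stripping an existing `openai/` or `coderai/` prefix before re-prefixing is an assumption about how the normalization behaves; the model names are placeholders.

```python
def to_coderai_model_name(name: str) -> str:
    """Return the model name in coderai/<org>/<model> form.

    Assumption: a leading 'openai/' or 'coderai/' provider prefix is dropped
    first, so the function is idempotent and the org name is preserved.
    """
    for prefix in ("coderai/", "openai/"):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return f"coderai/{name}"


if __name__ == "__main__":
    for raw in ("example-org/example-model", "openai/example-org/example-model"):
        print(raw, "->", to_coderai_model_name(raw))
```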

Fix model name normalization to preserve org name · 87acdc45

Your Name authored Mar 16, 2026

- Instead of defaulting to 'huggingface' for org/model paths,
  now preserves the original org name as the provider
- Example: TeichAI/Qwen3-8B-... now becomes openai/TeichAI/Qwen3-8B-...
  instead of openai/huggingface/TeichAI/Qwen3-8B-...

Fix litellm api_base for non-Ollama models · df7875e3

Your Name authored Mar 16, 2026

- Add logic to set api_base to server's own URL for non-Ollama models
- Extract host/port from request headers (X-Forwarded-For, Host header)
- Determine protocol (http/https) based on global_args
- Include debug output showing the determined api_base
- This ensures litellm can connect to the local server when using the litellm backend with local models
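
One way to derive such an `api_base` is sketched below. It reads only the `Host` header and takes the protocol as a boolean; `build_api_base`, `use_https`, and the fallback address are hypothetical names and values, not the project's actual ones.

```python
from typing import Mapping


def build_api_base(
    headers: Mapping[str, str],
    use_https: bool,
    fallback_host: str = "127.0.0.1:8000",  # assumed default, for illustration
) -> str:
    """Build the server's own base URL from the incoming request headers."""
    # Header names are case-insensitive, so normalize the keys first.
    normalized = {key.lower(): value for key, value in headers.items()}
    host = normalized.get("host", "").strip() or fallback_host
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}"


if __name__ == "__main__":
    print(build_api_base({"Host": "localhost:8000"}, use_https=False))
```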

Add debug output for api_key and api_base in litellm · 6b297f18
Your Name authored Mar 16, 2026

Add debug output for api_key and api_base in litellm · be41efbd
Your Name authored Mar 16, 2026

Fix fake key · a52a56e7
Your Name authored Mar 16, 2026

Fix litellm debug mode check to use runtime global_debug · e654c9f4
Your Name authored Mar 16, 2026

Enable litellm debug mode when --debug flag is set · c52d2e2b
Your Name authored Mar 16, 2026

Fix: Move HuggingFace fake key check after use_model is defined · 04b971a4
Your Name authored Mar 16, 2026

Use sk-fakekey format for HuggingFace fake key · 711041e0
Your Name authored Mar 16, 2026

Use fake key for HuggingFace models instead of None · f098e307
Your Name authored Mar 16, 2026

Skip API key for HuggingFace models in litellm · 275ef90d
Your Name authored Mar 16, 2026

- When using HuggingFace inference endpoints, set api_key to None to avoid auth errors

Pass api_base to litellm for local model connections · a93b69cf

Your Name authored Mar 16, 2026

- When model starts with 'ollama:', construct api_base from request host and port
- api_base is now passed to LiteLLMBackend for local connections

Fix litellm to generate fake API key if none provided · c723cf43

Your Name authored Mar 16, 2026

- Don't check environment for OPENAI_API_KEY
- Use fake key directly in LiteLLMBackend if no key passed

Add fake API key fallback for litellm backend · eb9e8d6a

Your Name authored Mar 16, 2026

- If no API key is provided in request, use a fake key to allow litellm to proceed
- Check both request body and Authorization header for API key
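
The lookup order described above might be sketched like this. `resolve_api_key`, the `api_key` body field, and the exact placeholder value are assumptions for illustration; a later commit in this log mentions an `sk-fakekey` format.

```python
from typing import Any, Mapping

FAKE_KEY = "sk-fakekey"  # placeholder so the client library does not reject the call


def resolve_api_key(body: Mapping[str, Any], headers: Mapping[str, str]) -> str:
    """Pick an API key from the request body, then the Authorization header."""
    key = body.get("api_key")
    if isinstance(key, str) and key.strip():
        return key.strip()

    # Header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    auth = normalized.get("authorization", "")
    if auth.lower().startswith("bearer "):
        token = auth[len("bearer "):].strip()
        if token:
            return token

    return FAKE_KEY


if __name__ == "__main__":
    print(resolve_api_key({}, {"Authorization": "Bearer abc123"}))
    print(resolve_api_key({}, {}))
```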

Integrate model_parser module with LiteLLM backend · c1e71237

Your Name authored Mar 16, 2026

- Add tool_parser parameter to litellm backend calls in coderai endpoint
- ModelParserAdapter now passed to both streaming and non-streaming calls
- Enables model-specific tool call parsing for external models via litellm

Always use openai/{provider}/{model} format with coderai default · 0ab10131
Your Name authored Mar 16, 2026

Fix HuggingFace org/model path detection in LiteLLM normalization · c8cc0048
Your Name authored Mar 16, 2026
