Commits · de9a6cdc51e96f69205f320e27236bc4971c7d35 · nexlab / coderai

17 Mar, 2026 26 commits
- Re-add image_model property to MultiModelManager · de9a6cdc
  Your Name authored Mar 17, 2026
  
  de9a6cdc
- Add config attribute to MultiModelManager · e06dba80
  Your Name authored Mar 17, 2026
  
  e06dba80
- Add image_model property to MultiModelManager · 31a50c6e
  Your Name authored Mar 17, 2026
  
  31a50c6e
- Fix generate() method signature to match base class · 2a5f2bf6
  Your Name authored Mar 17, 2026
```
Now accepts positional args: max_tokens, temperature, top_p, stop
```
  2a5f2bf6
- Fix ChatMessage Pydantic object handling in format_messages · 280b91c3
  Your Name authored Mar 17, 2026
```
Convert ChatMessage objects to dicts before applying chat template.
```
  280b91c3
- Add missing get_resolved_model_name import · cc5500a1
  Your Name authored Mar 17, 2026
  
  cc5500a1
- Add missing get_model_family import from codai.models.utils · 7019837a
  Your Name authored Mar 17, 2026
  
  7019837a
- Use GGUF model's built-in chat template first · 563f878e
  Your Name authored Mar 17, 2026
```
Now detects and uses the built-in chat template from GGUF files
loaded via llama-cpp-python before falling back to manual formatting.
```
  563f878e
- Reduce debug verbosity for tokenizer loading · e2deefb8
  Your Name authored Mar 17, 2026
  
  e2deefb8
- Fix GGUF model loading from HuggingFace repos · c4182620
  Your Name authored Mar 17, 2026
```
Now detects GGUF model repos (e.g., unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF)
and lists available GGUF files before downloading.

Prefers Q4_K_M or Q4_K quantizations when available.
```
  c4182620
- Fix VulkanBackend async/sync method signatures · 289a58f7
  Your Name authored Mar 17, 2026
```
Fixed load_model and generate to be non-async methods (matching base class):
- load_model: changed from async def returning bool to def returning None
- generate: changed from async def to def (removed streaming support in sync version)
- Removed 'stream' parameter from generate since it's now sync
- chat: changed from async def to def
- generate_stream remains async def (correct for streaming)
```
  289a58f7
- Fix VulkanBackend missing abstract methods · 660bce2d
  Your Name authored Mar 17, 2026
```
Added:
- get_model_name()
- format_messages()
- cleanup()

These were required by the ModelBackend abstract base class.
```
  660bce2d
- Remove all duplicate class definitions from coderai · a6907dd6
  Your Name authored Mar 17, 2026
```
Removed ~2050 lines of duplicate code:
- Pydantic models (ToolFunction, Tool, ChatMessage, etc.) - now from codai.pydantic
- ModelParserAdapter, ToolCallParser - now from codai.models
- NvidiaBackend, VulkanBackend - now from codai.backends
- All other duplicates removed

Now coderai properly imports all classes from codai modules.
```
  a6907dd6
- Remove duplicate class/function definitions from coderai · df366b63
  Your Name authored Mar 17, 2026
```
Removed ~1500 lines of duplicate code that now exist in codai modules:
- ModelCapabilities, detect_model_capabilities (now in codai.models.capabilities)
- Cache functions (now in codai.models.cache)
- detect_available_backends, check_flash_attn_availability (now in codai.backends)
- ModelBackend abstract class (now in codai.backends.base)
- ModelManager, WhisperServerManager, MultiModelManager (now in codai.models.manager)
- QueueManager (now in codai.queue.manager)
- Utility functions (now in codai.models.utils)

The code now properly imports from codai modules instead of having inline duplicates.
```
  df366b63
- Update codai/models/utils.py with full implementations · 9299c34f
  Your Name authored Mar 17, 2026
```
- Added complete check_hf_chat_template with global_args support
- Added complete get_resolved_model_name
- Added complete get_model_family with more model families
- Added complete get_reasoning_stop_tokens for more model families
- Added complete get_reasoning_system_prompt
- Added set_global_args and get_global_args for configuration
```
  9299c34f
- Add imports for ModelCapabilities and cache functions from codai modules · add2ecd1
  Your Name authored Mar 17, 2026
  
  add2ecd1
- Refactor: Move backend and manager classes to codai modules · 81c39eb8
  Your Name authored Mar 17, 2026
```
- Move NvidiaBackend to codai/backends/cuda.py
- Move VulkanBackend to codai/backends/vulkan.py
- Move ModelManager, WhisperServerManager, MultiModelManager to codai/models/manager.py
- Move QueueManager to codai/queue/manager.py
- Add proper exports in codai/backends/__init__.py
- Update imports in coderai to use new modules
- Fix import paths for base class and cache functions
```
  81c39eb8
- Revert to working version from commit 001e1708 · 7c6b60f0
  Your Name authored Mar 17, 2026
  
  7c6b60f0
- Fix get_reasoning_stop_tokens to return 3 values · e7f781f3
  Your Name authored Mar 17, 2026
  
  e7f781f3
- Fix VulkanBackend to accept original_backend parameter · 8e072ebb
  Your Name authored Mar 17, 2026
  
  8e072ebb
- Add full ModelManager and MultiModelManager implementations · 059999f7
  Your Name authored Mar 17, 2026
  
  059999f7
- Fix missing model_manager and queue_manager initialization · 020f4f6d
  Your Name authored Mar 17, 2026
  
  020f4f6d
- Refactor: Move QueueManager to codai/queue/manager and restore FastAPI app · 989f1858
  Your Name authored Mar 17, 2026
  
  989f1858
- Remove --reply-filters option, always apply malformed and tool_calls filters · 001e1708
  Your Name authored Mar 17, 2026
  
  001e1708
- Fix model name in response: resolve aliases, extract filename from URLs, add coderai/ prefix · 8cc18c40
  Your Name authored Mar 17, 2026
  
  8cc18c40
- Add debug output for model input in both streaming and non-streaming modes · 0653c58a
  Your Name authored Mar 17, 2026
  
  0653c58a
16 Mar, 2026 14 commits

Fix UnboundLocalError for stop_sequences in reasoning logic · 3890a849
Your Name authored Mar 16, 2026

3890a849

Enhance --force-reasoning with stop/inject options and add reasoning extraction · ef03dee8

Your Name authored Mar 16, 2026

- Added --force-reasoning with choices: 'stop', 'inject', 'both' (default)
- Add model-family detection for reasoning stop tokens
- Get appropriate stop tokens for Qwen, DeepSeek, Llama3, Mistral, Gemma, Hermes/Yi
- Add system prompt injection for forcing reasoning on non-native models
- Add extract_reasoning_content() function to parsers for extracting thinking tags

ef03dee8

Add --force-reasoning CLI flag for reasoning/thinking mode · ed8397a0

Your Name authored Mar 16, 2026

- Added --force-reasoning argument to enable reasoning mode for models
  that support it (Qwen3, DeepSeek R1, etc.)
- Modified chat_completions endpoint to check both API parameter
  enable_thinking and CLI flag force_reasoning
- When either is true, injects agentic template to enable thinking

ed8397a0

Add enable_thinking parameter to chat completion · 11526eee

Your Name authored Mar 16, 2026

- Add enable_thinking parameter to ChatCompletionRequest
- When enable_thinking=True, inject agentic system prompt to force thinking/reasoning
- Uses AgenticTemplateManager to inject thought tags for supported models

11526eee

Revert reasoning changes - fixing indentation error · b3e5d314
Your Name authored Mar 16, 2026

b3e5d314

Add reasoning/thinking extraction and forced reasoning support · 1a6467ca

Your Name authored Mar 16, 2026

- Add --force-reasoning CLI flag to force thinking mode for models like qwen3 coder
- Add check_force_reasoning() function to determine if reasoning should be forced
- Modify QwenParser to extract thinking/reasoning content instead of stripping it
- Add reasoning field to response message in non-streaming chat completions
- Prepend reasoning content to generated text in streaming responses
- Update OpenAIFormatter to include reasoning in response when available

1a6467ca

Fix openaiformatter import · b1e402b9
Your Name authored Mar 16, 2026

b1e402b9
Combine parsers module into parser.py · fee96eb2
Your Name authored Mar 16, 2026

fee96eb2
Add backward compatibility methods for format_litellm_full and format_litellm_chunk · 63c4c8a4
Your Name authored Mar 16, 2026

63c4c8a4

Refactor OpenAIFormatter to use litellm models directly · 203f97e0

Your Name authored Mar 16, 2026

- Simplify OpenAIFormatter by using litellm's ModelResponse and ChatCompletionChunk directly
- Add fallback support for when litellm is not available or fails
- Maintain compatibility with existing API
- Remove redundant format_litellm_full and format_litellm_chunk methods

203f97e0

Fix UnboundLocalError for StreamingResponse in chat_completions · 70a6cfe1

Your Name authored Mar 16, 2026

The issue was caused by importing StreamingResponse and JSONResponse inside
the chat_completions function. In Python, when you have an import statement
anywhere inside a function, it creates a local variable for that name
throughout the entire function scope. This caused the code in the original
implementation path to fail because Python saw StreamingResponse as an
unassigned local variable.

Fix: Move StreamingResponse and JSONResponse imports to module level and
remove redundant imports from inside the function.

70a6cfe1

Fix LiteLLM · 6c2c0afc
Your Name authored Mar 16, 2026

6c2c0afc

Fix OpenAIFormatter to not rely on litellm imports · 8280060e

Your Name authored Mar 16, 2026

The litellm library doesn't export Delta, Choices, etc. directly.
Rewrote the formatter to build response dictionaries directly.

8280060e

Remove litellm imports from codai module · 98a48640
Your Name authored Mar 16, 2026

98a48640