Commits · 76815ec9280e6cbb56e6a4fe723cdab596e2b14c · nexlab / coderai

17 Mar, 2026 31 commits

feat(templates): Add inject_system and force_reasoning parameters · 76815ec9

Your Name authored Mar 17, 2026

- Add selectable parameters to format_for_raw_completion()
- inject_system: toggle agentic system prompt injection
- force_reasoning: toggle prompt seeding (thought tag)
- Update create_reasoning_prompt() convenience function

76815ec9

feat(templates): Add Prompt Seeding technique for forced reasoning · 0ed2e601

Your Name authored Mar 17, 2026

- Add REASONING_PREFIXES for Big 10 model families (Qwen, Llama3, DeepSeek, etc.)
- Add REASONING_STOP_TOKENS for stopping reasoning generation
- Add force_reasoning_prompt() to construct prompts ending with thought tags
- Add extract_reasoning() to parse reasoning from responses
- Add format_for_raw_completion() and create_reasoning_prompt() convenience functions
- This enables 'token hijacking' to force models to start with reasoning

0ed2e601

Add debug output for flash-attention and force-reasoning mode · b7d84534

Your Name authored Mar 17, 2026

- Enhanced flash attention status output in NvidiaBackend to always show availability
- Added debug output in chat completions endpoint for force-reasoning mode
- Shows CLI flag value, API param, reasoning action, and whether injection was done
- Displays the actual injected system prompt content when debug mode is enabled

b7d84534

Fix deprecation: torch_dtype -> dtype · b49d3f59
Your Name authored Mar 17, 2026

b49d3f59
Add cleanup method to MultiModelManager · ba4ce29f
Your Name authored Mar 17, 2026

ba4ce29f
Re-add image_model property to MultiModelManager · de9a6cdc
Your Name authored Mar 17, 2026

de9a6cdc
Add config attribute to MultiModelManager · e06dba80
Your Name authored Mar 17, 2026

e06dba80
Add image_model property to MultiModelManager · 31a50c6e
Your Name authored Mar 17, 2026

31a50c6e
Fix generate() method signature to match base class · 2a5f2bf6
Your Name authored Mar 17, 2026
```
Now accepts positional args: max_tokens, temperature, top_p, stop
```
2a5f2bf6
Fix ChatMessage Pydantic object handling in format_messages · 280b91c3
Your Name authored Mar 17, 2026
```
Convert ChatMessage objects to dicts before applying chat template.
```
280b91c3
Add missing get_resolved_model_name import · cc5500a1
Your Name authored Mar 17, 2026

cc5500a1
Add missing get_model_family import from codai.models.utils · 7019837a
Your Name authored Mar 17, 2026

7019837a

Use GGUF model's built-in chat template first · 563f878e

Your Name authored Mar 17, 2026

Now detects and uses the built-in chat template from GGUF files
loaded via llama-cpp-python before falling back to manual formatting.

563f878e

Reduce debug verbosity for tokenizer loading · e2deefb8
Your Name authored Mar 17, 2026

e2deefb8

Fix GGUF model loading from HuggingFace repos · c4182620

Your Name authored Mar 17, 2026

Now detects GGUF model repos (e.g., unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF)
and lists available GGUF files before downloading.

Prefers Q4_K_M or Q4_K quantizations when available.

c4182620

Fix VulkanBackend async/sync method signatures · 289a58f7

Your Name authored Mar 17, 2026

Fixed load_model and generate to be non-async methods (matching base class):
- load_model: changed from async def returning bool to def returning None
- generate: changed from async def to def (removed streaming support in sync version)
- Removed 'stream' parameter from generate since it's now sync
- chat: changed from async def to def
- generate_stream remains async def (correct for streaming)

289a58f7

Fix VulkanBackend missing abstract methods · 660bce2d

Your Name authored Mar 17, 2026

Added:
- get_model_name()
- format_messages()
- cleanup()

These were required by the ModelBackend abstract base class.

660bce2d

Remove all duplicate class definitions from coderai · a6907dd6

Your Name authored Mar 17, 2026

Removed ~2050 lines of duplicate code:
- Pydantic models (ToolFunction, Tool, ChatMessage, etc.) - now from codai.pydantic
- ModelParserAdapter, ToolCallParser - now from codai.models
- NvidiaBackend, VulkanBackend - now from codai.backends
- All other duplicates removed

Now coderai properly imports all classes from codai modules.

a6907dd6

Remove duplicate class/function definitions from coderai · df366b63

Your Name authored Mar 17, 2026

Removed ~1500 lines of duplicate code that now exist in codai modules:
- ModelCapabilities, detect_model_capabilities (now in codai.models.capabilities)
- Cache functions (now in codai.models.cache)
- detect_available_backends, check_flash_attn_availability (now in codai.backends)
- ModelBackend abstract class (now in codai.backends.base)
- ModelManager, WhisperServerManager, MultiModelManager (now in codai.models.manager)
- QueueManager (now in codai.queue.manager)
- Utility functions (now in codai.models.utils)

The code now properly imports from codai modules instead of having inline duplicates.

df366b63

Update codai/models/utils.py with full implementations · 9299c34f

Your Name authored Mar 17, 2026

- Added complete check_hf_chat_template with global_args support
- Added complete get_resolved_model_name
- Added complete get_model_family with more model families
- Added complete get_reasoning_stop_tokens for more model families
- Added complete get_reasoning_system_prompt
- Added set_global_args and get_global_args for configuration

9299c34f

Add imports for ModelCapabilities and cache functions from codai modules · add2ecd1
Your Name authored Mar 17, 2026

add2ecd1

Refactor: Move backend and manager classes to codai modules · 81c39eb8

Your Name authored Mar 17, 2026

- Move NvidiaBackend to codai/backends/cuda.py
- Move VulkanBackend to codai/backends/vulkan.py
- Move ModelManager, WhisperServerManager, MultiModelManager to codai/models/manager.py
- Move QueueManager to codai/queue/manager.py
- Add proper exports in codai/backends/__init__.py
- Update imports in coderai to use new modules
- Fix import paths for base class and cache functions

81c39eb8

Revert to working version from commit 001e1708 · 7c6b60f0
Your Name authored Mar 17, 2026

7c6b60f0
Fix get_reasoning_stop_tokens to return 3 values · e7f781f3
Your Name authored Mar 17, 2026

e7f781f3
Fix VulkanBackend to accept original_backend parameter · 8e072ebb
Your Name authored Mar 17, 2026

8e072ebb
Add full ModelManager and MultiModelManager implementations · 059999f7
Your Name authored Mar 17, 2026

059999f7
Fix missing model_manager and queue_manager initialization · 020f4f6d
Your Name authored Mar 17, 2026

020f4f6d
Refactor: Move QueueManager to codai/queue/manager and restore FastAPI app · 989f1858
Your Name authored Mar 17, 2026

989f1858
Remove --reply-filters option, always apply malformed and tool_calls filters · 001e1708
Your Name authored Mar 17, 2026

001e1708
Fix model name in response: resolve aliases, extract filename from URLs, add coderai/ prefix · 8cc18c40
Your Name authored Mar 17, 2026

8cc18c40
Add debug output for model input in both streaming and non-streaming modes · 0653c58a
Your Name authored Mar 17, 2026

0653c58a

16 Mar, 2026 9 commits

Fix UnboundLocalError for stop_sequences in reasoning logic · 3890a849
Your Name authored Mar 16, 2026

3890a849

Enhance --force-reasoning with stop/inject options and add reasoning extraction · ef03dee8

Your Name authored Mar 16, 2026

- Added --force-reasoning with choices: 'stop', 'inject', 'both' (default)
- Add model-family detection for reasoning stop tokens
- Get appropriate stop tokens for Qwen, DeepSeek, Llama3, Mistral, Gemma, Hermes/Yi
- Add system prompt injection for forcing reasoning on non-native models
- Add extract_reasoning_content() function to parsers for extracting thinking tags

ef03dee8

Add --force-reasoning CLI flag for reasoning/thinking mode · ed8397a0

Your Name authored Mar 16, 2026

- Added --force-reasoning argument to enable reasoning mode for models
  that support it (Qwen3, DeepSeek R1, etc.)
- Modified chat_completions endpoint to check both API parameter
  enable_thinking and CLI flag force_reasoning
- When either is true, injects agentic template to enable thinking

ed8397a0

Add enable_thinking parameter to chat completion · 11526eee

Your Name authored Mar 16, 2026

- Add enable_thinking parameter to ChatCompletionRequest
- When enable_thinking=True, inject agentic system prompt to force thinking/reasoning
- Uses AgenticTemplateManager to inject thought tags for supported models

11526eee

Revert reasoning changes - fixing indentation error · b3e5d314
Your Name authored Mar 16, 2026

b3e5d314

Add reasoning/thinking extraction and forced reasoning support · 1a6467ca

Your Name authored Mar 16, 2026

- Add --force-reasoning CLI flag to force thinking mode for models like qwen3 coder
- Add check_force_reasoning() function to determine if reasoning should be forced
- Modify QwenParser to extract thinking/reasoning content instead of stripping it
- Add reasoning field to response message in non-streaming chat completions
- Prepend reasoning content to generated text in streaming responses
- Update OpenAIFormatter to include reasoning in response when available

1a6467ca

Fix openaiformatter import · b1e402b9
Your Name authored Mar 16, 2026

b1e402b9
Combine parsers module into parser.py · fee96eb2
Your Name authored Mar 16, 2026

fee96eb2
Add backward compatibility methods for format_litellm_full and format_litellm_chunk · 63c4c8a4
Your Name authored Mar 16, 2026

63c4c8a4