Commits · 916ced3bbcef11f7e6f4389afc55462d28842493 · nexlab / coderai

17 Mar, 2026 38 commits

feat(templates): Add 'The user requested' after thought tag in prompt seeding · 916ced3b
Your Name authored Mar 17, 2026

916ced3b

fix: Use ]]> in inject when prompt is also selected · 9c169fac

Your Name authored Mar 17, 2026

When both inject and prompt are selected, use the same reasoning tag
(]]) for consistency instead of <|thought|>

9c169fac

fix: Chain --system-prompt at start of existing system message · 1abaf9c5
Your Name authored Mar 17, 2026
```
When --system-prompt is specified, it now prepends to any existing
system message instead of replacing it.
```
1abaf9c5

feat(cli): Add ]]> stop token when prompt is selected, add mock reasoning stats · fab01e8a

Your Name authored Mar 17, 2026

- Add ]]> to stop sequences when using 'prompt' option
- Add 'mock' strategy to add fake reasoning stats for VSCode plugin
- Add 'twopass' option (not yet implemented)

fab01e8a

feat(cli): Add --dump option to show model output · bdecb8c9
Your Name authored Mar 17, 2026
```
Shows:
- Raw model output
- Parsed output (after formatter)
- Litellm debug info (via --debug)
```
bdecb8c9
feat(cli): Add 'all' option to --force-reasoning · 905b1814
Your Name authored Mar 17, 2026
```
Use --force-reasoning all to enable chat, stop, inject, and prompt
```
905b1814

feat(cli): Add comma-separated --force-reasoning options · 08f64c61

Your Name authored Mar 17, 2026

New options for --force-reasoning:
- chat: Enable thinking API parameter
- stop: Add reasoning stop tokens
- inject: System prompt injection (includes stop)
- prompt: Prompt seeding with thought tag (includes stop)

Can combine: --force-reasoning chat,inject,prompt

Also added force_reasoning_prompt() to templates.py for prompt seeding.

08f64c61

feat(templates): Add inject_system and force_reasoning parameters · 76815ec9

Your Name authored Mar 17, 2026

- Add selectable parameters to format_for_raw_completion()
- inject_system: toggle agentic system prompt injection
- force_reasoning: toggle prompt seeding (thought tag)
- Update create_reasoning_prompt() convenience function

76815ec9

feat(templates): Add Prompt Seeding technique for forced reasoning · 0ed2e601

Your Name authored Mar 17, 2026

- Add REASONING_PREFIXES for Big 10 model families (Qwen, Llama3, DeepSeek, etc.)
- Add REASONING_STOP_TOKENS for stopping reasoning generation
- Add force_reasoning_prompt() to construct prompts ending with thought tags
- Add extract_reasoning() to parse reasoning from responses
- Add format_for_raw_completion() and create_reasoning_prompt() convenience functions
- This enables 'token hijacking' to force models to start with reasoning

0ed2e601

Add debug output for flash-attention and force-reasoning mode · b7d84534

Your Name authored Mar 17, 2026

- Enhanced flash attention status output in NvidiaBackend to always show availability
- Added debug output in chat completions endpoint for force-reasoning mode
- Shows CLI flag value, API param, reasoning action, and whether injection was done
- Displays the actual injected system prompt content when debug mode is enabled

b7d84534

Fix deprecation: torch_dtype -> dtype · b49d3f59
Your Name authored Mar 17, 2026

b49d3f59
Add cleanup method to MultiModelManager · ba4ce29f
Your Name authored Mar 17, 2026

ba4ce29f
Re-add image_model property to MultiModelManager · de9a6cdc
Your Name authored Mar 17, 2026

de9a6cdc
Add config attribute to MultiModelManager · e06dba80
Your Name authored Mar 17, 2026

e06dba80
Add image_model property to MultiModelManager · 31a50c6e
Your Name authored Mar 17, 2026

31a50c6e
Fix generate() method signature to match base class · 2a5f2bf6
Your Name authored Mar 17, 2026
```
Now accepts positional args: max_tokens, temperature, top_p, stop
```
2a5f2bf6
Fix ChatMessage Pydantic object handling in format_messages · 280b91c3
Your Name authored Mar 17, 2026
```
Convert ChatMessage objects to dicts before applying chat template.
```
280b91c3
Add missing get_resolved_model_name import · cc5500a1
Your Name authored Mar 17, 2026

cc5500a1
Add missing get_model_family import from codai.models.utils · 7019837a
Your Name authored Mar 17, 2026

7019837a

Use GGUF model's built-in chat template first · 563f878e

Your Name authored Mar 17, 2026

Now detects and uses the built-in chat template from GGUF files
loaded via llama-cpp-python before falling back to manual formatting.

563f878e

Reduce debug verbosity for tokenizer loading · e2deefb8
Your Name authored Mar 17, 2026

e2deefb8

Fix GGUF model loading from HuggingFace repos · c4182620

Your Name authored Mar 17, 2026

Now detects GGUF model repos (e.g., unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF)
and lists available GGUF files before downloading.

Prefers Q4_K_M or Q4_K quantizations when available.

c4182620

Fix VulkanBackend async/sync method signatures · 289a58f7

Your Name authored Mar 17, 2026

Fixed load_model and generate to be non-async methods (matching base class):
- load_model: changed from async def returning bool to def returning None
- generate: changed from async def to def (removed streaming support in sync version)
- Removed 'stream' parameter from generate since it's now sync
- chat: changed from async def to def
- generate_stream remains async def (correct for streaming)

289a58f7

Fix VulkanBackend missing abstract methods · 660bce2d

Your Name authored Mar 17, 2026

Added:
- get_model_name()
- format_messages()
- cleanup()

These were required by the ModelBackend abstract base class.

660bce2d

Remove all duplicate class definitions from coderai · a6907dd6

Your Name authored Mar 17, 2026

Removed ~2050 lines of duplicate code:
- Pydantic models (ToolFunction, Tool, ChatMessage, etc.) - now from codai.pydantic
- ModelParserAdapter, ToolCallParser - now from codai.models
- NvidiaBackend, VulkanBackend - now from codai.backends
- All other duplicates removed

Now coderai properly imports all classes from codai modules.

a6907dd6

Remove duplicate class/function definitions from coderai · df366b63

Your Name authored Mar 17, 2026

Removed ~1500 lines of duplicate code that now exist in codai modules:
- ModelCapabilities, detect_model_capabilities (now in codai.models.capabilities)
- Cache functions (now in codai.models.cache)
- detect_available_backends, check_flash_attn_availability (now in codai.backends)
- ModelBackend abstract class (now in codai.backends.base)
- ModelManager, WhisperServerManager, MultiModelManager (now in codai.models.manager)
- QueueManager (now in codai.queue.manager)
- Utility functions (now in codai.models.utils)

The code now properly imports from codai modules instead of having inline duplicates.

df366b63

Update codai/models/utils.py with full implementations · 9299c34f

Your Name authored Mar 17, 2026

- Added complete check_hf_chat_template with global_args support
- Added complete get_resolved_model_name
- Added complete get_model_family with more model families
- Added complete get_reasoning_stop_tokens for more model families
- Added complete get_reasoning_system_prompt
- Added set_global_args and get_global_args for configuration

9299c34f

Add imports for ModelCapabilities and cache functions from codai modules · add2ecd1
Your Name authored Mar 17, 2026

add2ecd1

Refactor: Move backend and manager classes to codai modules · 81c39eb8

Your Name authored Mar 17, 2026

- Move NvidiaBackend to codai/backends/cuda.py
- Move VulkanBackend to codai/backends/vulkan.py
- Move ModelManager, WhisperServerManager, MultiModelManager to codai/models/manager.py
- Move QueueManager to codai/queue/manager.py
- Add proper exports in codai/backends/__init__.py
- Update imports in coderai to use new modules
- Fix import paths for base class and cache functions

81c39eb8

Revert to working version from commit 001e1708 · 7c6b60f0
Your Name authored Mar 17, 2026

7c6b60f0
Fix get_reasoning_stop_tokens to return 3 values · e7f781f3
Your Name authored Mar 17, 2026

e7f781f3
Fix VulkanBackend to accept original_backend parameter · 8e072ebb
Your Name authored Mar 17, 2026

8e072ebb
Add full ModelManager and MultiModelManager implementations · 059999f7
Your Name authored Mar 17, 2026

059999f7
Fix missing model_manager and queue_manager initialization · 020f4f6d
Your Name authored Mar 17, 2026

020f4f6d
Refactor: Move QueueManager to codai/queue/manager and restore FastAPI app · 989f1858
Your Name authored Mar 17, 2026

989f1858
Remove --reply-filters option, always apply malformed and tool_calls filters · 001e1708
Your Name authored Mar 17, 2026

001e1708
Fix model name in response: resolve aliases, extract filename from URLs, add coderai/ prefix · 8cc18c40
Your Name authored Mar 17, 2026

8cc18c40
Add debug output for model input in both streaming and non-streaming modes · 0653c58a
Your Name authored Mar 17, 2026

0653c58a

16 Mar, 2026 2 commits

Fix UnboundLocalError for stop_sequences in reasoning logic · 3890a849
Your Name authored Mar 16, 2026

3890a849

Enhance --force-reasoning with stop/inject options and add reasoning extraction · ef03dee8

Your Name authored Mar 16, 2026

- Added --force-reasoning with choices: 'stop', 'inject', 'both' (default)
- Add model-family detection for reasoning stop tokens
- Get appropriate stop tokens for Qwen, DeepSeek, Llama3, Mistral, Gemma, Hermes/Yi
- Add system prompt injection for forcing reasoning on non-native models
- Add extract_reasoning_content() function to parsers for extracting thinking tags

ef03dee8