Commits · 059db0801bec9c8dc27a5bc0fb2be05d9f6a3757 · nexlab / coderai

17 Mar, 2026 40 commits

Add critical instruction to system prompt and tool tag fallback extraction · 059db080

Your Name authored Mar 17, 2026

- System prompt now includes: 'CRITICAL: You must always close your reasoning with ]]> before opening any tool tags'
- Extraction logic now uses tool tags as fallback stop markers if close tag is missing
- Handles: <tool_call>, <tool>, <|tool_call|>, <|tool|>, <function=

059db080

Add --dump output for raw mode first pass and extraction · 301371bf

Your Name authored Mar 17, 2026

Shows:
- Full first pass result
- Extraction details (close tag used, reasoning text, final text)
- Cleanup details

301371bf

Pass raw mode output through formatter/parser · b7bfccda

Your Name authored Mar 17, 2026

Now raw mode passes the generated text through OpenAIFormatter which:
- Handles tool extraction
- Provides OpenAI compatibility
- Handles other post-processing

This ensures raw mode output is treated the same as regular mode.

b7bfccda

Add cleanup_control_tokens and fix raw mode issues · d11b24fc

Your Name authored Mar 17, 2026

- Add cleanup_control_tokens function to strip leading/trailing control tokens
- Apply cleanup to final_text and second_pass_result in raw mode
- Add mock strategy handling to raw mode (was missing)
- Add debug output for cleanup steps

d11b24fc

Fix UnboundLocalError: template_manager not defined · 0c1c2429

Your Name authored Mar 17, 2026

When 'raw' is used without 'prompt', template_manager wasn't defined.
Now creating it on-demand when needed.

0c1c2429

Make 'raw' mutually exclusive with 'prompt' and 'inject' · ca6f9841

Your Name authored Mar 17, 2026

When 'raw' is used, skip the 'prompt', 'inject', and 'stop' handlers
since raw mode handles everything separately. This was causing
double assistant headers and corrupted prompts.

ca6f9841

Remove tokenizer approach, use only template_manager · 750d433f

Your Name authored Mar 17, 2026

The tokenizer approach was causing double assistant headers.
Now using only template_manager.format_for_raw_completion which
handles everything correctly.

750d433f

Use template_manager.format_for_raw_completion instead of tokenizer · 7d391da6

Your Name authored Mar 17, 2026

The AgenticTemplateManager already has a format_for_raw_completion method
that handles prompt formatting with reasoning tags. No need to manually
find the tokenizer - just use the existing template logic.

7d391da6

Add more debug output for tokenizer detection in raw mode · 51cee9e7

Your Name authored Mar 17, 2026

Now shows:
- current_manager type and backend type
- Available attributes on the backend
- Which path was used to find (or not find) the tokenizer
- Also checks model_manager.tokenizer as fallback

51cee9e7

Fix raw mode variable initialization · 47abbabb

Your Name authored Mar 17, 2026

Fixed issue where raw mode variables were being re-initialized,
which was overwriting the values set in the prompt handling section.

47abbabb

Add 'raw' option to --force-reasoning for native tokenizer prompt seeding · ceb4ae88

Your Name authored Mar 17, 2026

- Added 'raw' to valid force-reasoning options (chat, stop, inject, prompt, twopass, mock, raw)
- Implemented raw mode handler that:
  - Uses tokenizer.apply_chat_template() with add_generation_prompt=True
  - Seeds reasoning tag + commitment sentence
  - Uses two-pass generation: first captures reasoning, then gets final answer
  - Supports both streaming and non-streaming responses
  - Falls back gracefully if tokenizer not available

This enables using the model's native tokenizer for prompt seeding, bypassing
double-templating issues with chat APIs.

ceb4ae88

feat: Add 'The user requested' after thought tags in prompt seeding · 9de7c79d
Your Name authored Mar 17, 2026

9de7c79d
fix: Remove trailing space from thought tags in prompt seeding · 1260b67b
Your Name authored Mar 17, 2026
```
All Big 10 families now end with '<minimax:tool_call> ' without trailing space
```
1260b67b
fix: Actually use seeded prompt when prompt is selected · c4d8a497
Your Name authored Mar 17, 2026
```
Replace messages with seeded prompt for raw completion
```
c4d8a497
fix: Add space after thought tags in prompt seeding · 888c77cb
Your Name authored Mar 17, 2026
```
Now ends with ']~b] ' instead of ']~b]'
```
888c77cb
feat(templates): Add 'The user requested' after thought tag in prompt seeding · 916ced3b
Your Name authored Mar 17, 2026

916ced3b

fix: Use ]]> in inject when prompt is also selected · 9c169fac

Your Name authored Mar 17, 2026

When both inject and prompt are selected, use the same reasoning tag
(]]) for consistency instead of <|thought|>

9c169fac

fix: Chain --system-prompt at start of existing system message · 1abaf9c5
Your Name authored Mar 17, 2026
```
When --system-prompt is specified, it now prepends to any existing
system message instead of replacing it.
```
1abaf9c5

feat(cli): Add ]]> stop token when prompt is selected, add mock reasoning stats · fab01e8a

Your Name authored Mar 17, 2026

- Add ]]> to stop sequences when using 'prompt' option
- Add 'mock' strategy to add fake reasoning stats for VSCode plugin
- Add 'twopass' option (not yet implemented)

fab01e8a

feat(cli): Add --dump option to show model output · bdecb8c9
Your Name authored Mar 17, 2026
```
Shows:
- Raw model output
- Parsed output (after formatter)
- Litellm debug info (via --debug)
```
bdecb8c9
feat(cli): Add 'all' option to --force-reasoning · 905b1814
Your Name authored Mar 17, 2026
```
Use --force-reasoning all to enable chat, stop, inject, and prompt
```
905b1814

feat(cli): Add comma-separated --force-reasoning options · 08f64c61

Your Name authored Mar 17, 2026

New options for --force-reasoning:
- chat: Enable thinking API parameter
- stop: Add reasoning stop tokens
- inject: System prompt injection (includes stop)
- prompt: Prompt seeding with thought tag (includes stop)

Can combine: --force-reasoning chat,inject,prompt

Also added force_reasoning_prompt() to templates.py for prompt seeding.

08f64c61

feat(templates): Add inject_system and force_reasoning parameters · 76815ec9

Your Name authored Mar 17, 2026

- Add selectable parameters to format_for_raw_completion()
- inject_system: toggle agentic system prompt injection
- force_reasoning: toggle prompt seeding (thought tag)
- Update create_reasoning_prompt() convenience function

76815ec9

feat(templates): Add Prompt Seeding technique for forced reasoning · 0ed2e601

Your Name authored Mar 17, 2026

- Add REASONING_PREFIXES for Big 10 model families (Qwen, Llama3, DeepSeek, etc.)
- Add REASONING_STOP_TOKENS for stopping reasoning generation
- Add force_reasoning_prompt() to construct prompts ending with thought tags
- Add extract_reasoning() to parse reasoning from responses
- Add format_for_raw_completion() and create_reasoning_prompt() convenience functions
- This enables 'token hijacking' to force models to start with reasoning

0ed2e601

Add debug output for flash-attention and force-reasoning mode · b7d84534

Your Name authored Mar 17, 2026

- Enhanced flash attention status output in NvidiaBackend to always show availability
- Added debug output in chat completions endpoint for force-reasoning mode
- Shows CLI flag value, API param, reasoning action, and whether injection was done
- Displays the actual injected system prompt content when debug mode is enabled

b7d84534

Fix deprecation: torch_dtype -> dtype · b49d3f59
Your Name authored Mar 17, 2026

b49d3f59
Add cleanup method to MultiModelManager · ba4ce29f
Your Name authored Mar 17, 2026

ba4ce29f
Re-add image_model property to MultiModelManager · de9a6cdc
Your Name authored Mar 17, 2026

de9a6cdc
Add config attribute to MultiModelManager · e06dba80
Your Name authored Mar 17, 2026

e06dba80
Add image_model property to MultiModelManager · 31a50c6e
Your Name authored Mar 17, 2026

31a50c6e
Fix generate() method signature to match base class · 2a5f2bf6
Your Name authored Mar 17, 2026
```
Now accepts positional args: max_tokens, temperature, top_p, stop
```
2a5f2bf6
Fix ChatMessage Pydantic object handling in format_messages · 280b91c3
Your Name authored Mar 17, 2026
```
Convert ChatMessage objects to dicts before applying chat template.
```
280b91c3
Add missing get_resolved_model_name import · cc5500a1
Your Name authored Mar 17, 2026

cc5500a1
Add missing get_model_family import from codai.models.utils · 7019837a
Your Name authored Mar 17, 2026

7019837a

Use GGUF model's built-in chat template first · 563f878e

Your Name authored Mar 17, 2026

Now detects and uses the built-in chat template from GGUF files
loaded via llama-cpp-python before falling back to manual formatting.

563f878e

Reduce debug verbosity for tokenizer loading · e2deefb8
Your Name authored Mar 17, 2026

e2deefb8

Fix GGUF model loading from HuggingFace repos · c4182620

Your Name authored Mar 17, 2026

Now detects GGUF model repos (e.g., unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF)
and lists available GGUF files before downloading.

Prefers Q4_K_M or Q4_K quantizations when available.

c4182620

Fix VulkanBackend async/sync method signatures · 289a58f7

Your Name authored Mar 17, 2026

Fixed load_model and generate to be non-async methods (matching base class):
- load_model: changed from async def returning bool to def returning None
- generate: changed from async def to def (removed streaming support in sync version)
- Removed 'stream' parameter from generate since it's now sync
- chat: changed from async def to def
- generate_stream remains async def (correct for streaming)

289a58f7

Fix VulkanBackend missing abstract methods · 660bce2d

Your Name authored Mar 17, 2026

Added:
- get_model_name()
- format_messages()
- cleanup()

These were required by the ModelBackend abstract base class.

660bce2d

Remove all duplicate class definitions from coderai · a6907dd6

Your Name authored Mar 17, 2026

Removed ~2050 lines of duplicate code:
- Pydantic models (ToolFunction, Tool, ChatMessage, etc.) - now from codai.pydantic
- ModelParserAdapter, ToolCallParser - now from codai.models
- NvidiaBackend, VulkanBackend - now from codai.backends
- All other duplicates removed

Now coderai properly imports all classes from codai modules.

a6907dd6