Commits · 169d3647e7bc4a2e954a6a3cab38d9068078ebb2 · nexlab / coderai

17 Mar, 2026 40 commits

Fix: Skip 'default' key when resolving model name · 169d3647

Your Name authored Mar 17, 2026

When resolving 'default' model, skip the 'default' key in models dict
and return the first actual loaded model instead.

169d3647

Remove --force-all, use --force-reasoning all instead · cff368c7

Your Name authored Mar 17, 2026

- Removed --force-all CLI flag
- Updated help text to document 'all' option
- Updated reasoning_choices to expand 'all' to all options
- Updated parsing logic to handle 'all' as string

cff368c7

Fix: Properly resolve 'default' model name to actual model · 2983ddf4

Your Name authored Mar 17, 2026

When requested_model is 'default', get_resolved_model_name now:
1. First tries current_manager.default_model
2. Falls back to the first model in manager.models

This ensures the model family is correctly detected for parser selection.

2983ddf4

Add ToolCallParser fallback in ModelParserAdapter · d16f206e

Your Name authored Mar 17, 2026

When the model-specific parser (like QwenParser) doesn't find any
tool calls, fall back to ToolCallParser as a catch-all.

d16f206e

Fix: Add <tool> to XML pattern in ApexBig50Parser · a7e992a0

Your Name authored Mar 17, 2026

The regex pattern was missing <tool> - it only matched <tool_call>,
funktion>, and <tool_use>. Now it also matches <tool>.

a7e992a0

Add custom XML tool format parser for... · caddb164

Your Name authored Mar 17, 2026

Add custom XML tool format parser for <tool><action>...</action><object>...</object><properties>...</properties></tool>

The model was generating tool calls in this format:
<tool>
<action>search</action>
<object>financial_data</object>
<properties>
  <query>...</query>
</properties>
</tool>

Added parser support in ApexBig50Parser and strip_tool_calls_from_content
to handle this custom format.

caddb164

Fix force-reasoning bugs: duplicate tools, reasoning duplication, tool extraction · bcd9bc55

Your Name authored Mar 17, 2026

- Bug 1: Skip format_tools_for_prompt in raw mode (already had condition)
- Bug 2: Use final_text (after reasoning) instead of generated_text for formatter
- Bug 3: Pass final_text to ModelParserAdapter instead of generated_text

This prevents reasoning from appearing in both content AND reasoning fields,
and allows the tool parser to properly extract tool calls without being
confused by reasoning tags.

bcd9bc55

feat: Add --force-all flag equivalent to chat,inject,prompt,mock,raw,twopass · 017c0399

Your Name authored Mar 17, 2026

- Add new --force-all CLI argument
- Update --force-reasoning help text to mention --force-all
- Handle --force-all in main function to expand to all reasoning options

017c0399

feat: Complete --force-reasoning implementation with raw mode tool extraction · 207ac71f

Your Name authored Mar 17, 2026

- Add force_reasoning_prompt function with Big 10 family prefixes
- Add inject_system and force_reasoning parameters
- Update --force-reasoning CLI with comma-separated options
- Add --dump option to show raw output, parsed output, and litellm debug
- Fix stop tokens to include ]]> when prompt is selected
- Add mock strategy for fake reasoning stats
- Chain --system-prompt at start of existing system message
- Add 'raw' option to --force-reasoning
- Fix format_tools_for_prompt to skip in raw mode
- Pass tools to format_for_raw_completion in raw mode
- Add parse_and_format method to OpenAIFormatter for tool extraction
- Use parse_and_format in raw mode for correct tool extraction pipeline

Pipeline: Model output -> Extract reasoning (raw mode) -> ModelParserAdapter (extract tools) -> OpenAIFormatter (final format)

207ac71f

Fix force_reasoning_args None check in mock reasoning logic · 329a0042

Your Name authored Mar 17, 2026

- Add truthy check before 'in' operator to prevent TypeError when
  force_reasoning_args is None (when --force-reasoning is not specified)
- Fixes: name 'force_reasoning_args' is not defined error

329a0042

Fix: Only apply mock reasoning when no real reasoning extracted · 1266f46a

Your Name authored Mar 17, 2026

- In raw mode, extracted reasoning is now preserved in the response
- Mock reasoning is only applied when there's no existing reasoning
- Added logic to set extracted reasoning in message after formatter
- Same fix applied to non-raw path in generate_chat_response

1266f46a

Add critical instruction to system prompt and tool tag fallback extraction · 059db080

Your Name authored Mar 17, 2026

- System prompt now includes: 'CRITICAL: You must always close your reasoning with ]]> before opening any tool tags'
- Extraction logic now uses tool tags as fallback stop markers if close tag is missing
- Handles: <tool_call>, <tool>, <|tool_call|>, <|tool|>, <function=

059db080

Add --dump output for raw mode first pass and extraction · 301371bf

Your Name authored Mar 17, 2026

Shows:
- Full first pass result
- Extraction details (close tag used, reasoning text, final text)
- Cleanup details

301371bf

Pass raw mode output through formatter/parser · b7bfccda

Your Name authored Mar 17, 2026

Now raw mode passes the generated text through OpenAIFormatter which:
- Handles tool extraction
- Provides OpenAI compatibility
- Handles other post-processing

This ensures raw mode output is treated the same as regular mode.

b7bfccda

Add cleanup_control_tokens and fix raw mode issues · d11b24fc

Your Name authored Mar 17, 2026

- Add cleanup_control_tokens function to strip leading/trailing control tokens
- Apply cleanup to final_text and second_pass_result in raw mode
- Add mock strategy handling to raw mode (was missing)
- Add debug output for cleanup steps

d11b24fc

Fix UnboundLocalError: template_manager not defined · 0c1c2429

Your Name authored Mar 17, 2026

When 'raw' is used without 'prompt', template_manager wasn't defined.
Now creating it on-demand when needed.

0c1c2429

Make 'raw' mutually exclusive with 'prompt' and 'inject' · ca6f9841

Your Name authored Mar 17, 2026

When 'raw' is used, skip the 'prompt', 'inject', and 'stop' handlers
since raw mode handles everything separately. This was causing
double assistant headers and corrupted prompts.

ca6f9841

Remove tokenizer approach, use only template_manager · 750d433f

Your Name authored Mar 17, 2026

The tokenizer approach was causing double assistant headers.
Now using only template_manager.format_for_raw_completion which
handles everything correctly.

750d433f

Use template_manager.format_for_raw_completion instead of tokenizer · 7d391da6

Your Name authored Mar 17, 2026

The AgenticTemplateManager already has a format_for_raw_completion method
that handles prompt formatting with reasoning tags. No need to manually
find the tokenizer - just use the existing template logic.

7d391da6

Add more debug output for tokenizer detection in raw mode · 51cee9e7

Your Name authored Mar 17, 2026

Now shows:
- current_manager type and backend type
- Available attributes on the backend
- Which path was used to find (or not find) the tokenizer
- Also checks model_manager.tokenizer as fallback

51cee9e7

Fix raw mode variable initialization · 47abbabb

Your Name authored Mar 17, 2026

Fixed issue where raw mode variables were being re-initialized,
which was overwriting the values set in the prompt handling section.

47abbabb

Add 'raw' option to --force-reasoning for native tokenizer prompt seeding · ceb4ae88

Your Name authored Mar 17, 2026

- Added 'raw' to valid force-reasoning options (chat, stop, inject, prompt, twopass, mock, raw)
- Implemented raw mode handler that:
  - Uses tokenizer.apply_chat_template() with add_generation_prompt=True
  - Seeds reasoning tag + commitment sentence
  - Uses two-pass generation: first captures reasoning, then gets final answer
  - Supports both streaming and non-streaming responses
  - Falls back gracefully if tokenizer not available

This enables using the model's native tokenizer for prompt seeding, bypassing
double-templating issues with chat APIs.

ceb4ae88

feat: Add 'The user requested' after thought tags in prompt seeding · 9de7c79d
Your Name authored Mar 17, 2026

9de7c79d
fix: Remove trailing space from thought tags in prompt seeding · 1260b67b
Your Name authored Mar 17, 2026
```
All Big 10 families now end with '<minimax:tool_call> ' without trailing space
```
1260b67b
fix: Actually use seeded prompt when prompt is selected · c4d8a497
Your Name authored Mar 17, 2026
```
Replace messages with seeded prompt for raw completion
```
c4d8a497
fix: Add space after thought tags in prompt seeding · 888c77cb
Your Name authored Mar 17, 2026
```
Now ends with ']~b] ' instead of ']~b]'
```
888c77cb
feat(templates): Add 'The user requested' after thought tag in prompt seeding · 916ced3b
Your Name authored Mar 17, 2026

916ced3b

fix: Use ]]> in inject when prompt is also selected · 9c169fac

Your Name authored Mar 17, 2026

When both inject and prompt are selected, use the same reasoning tag
(]]) for consistency instead of <|thought|>

9c169fac

fix: Chain --system-prompt at start of existing system message · 1abaf9c5
Your Name authored Mar 17, 2026
```
When --system-prompt is specified, it now prepends to any existing
system message instead of replacing it.
```
1abaf9c5

feat(cli): Add ]]> stop token when prompt is selected, add mock reasoning stats · fab01e8a

Your Name authored Mar 17, 2026

- Add ]]> to stop sequences when using 'prompt' option
- Add 'mock' strategy to add fake reasoning stats for VSCode plugin
- Add 'twopass' option (not yet implemented)

fab01e8a

feat(cli): Add --dump option to show model output · bdecb8c9
Your Name authored Mar 17, 2026
```
Shows:
- Raw model output
- Parsed output (after formatter)
- Litellm debug info (via --debug)
```
bdecb8c9
feat(cli): Add 'all' option to --force-reasoning · 905b1814
Your Name authored Mar 17, 2026
```
Use --force-reasoning all to enable chat, stop, inject, and prompt
```
905b1814

feat(cli): Add comma-separated --force-reasoning options · 08f64c61

Your Name authored Mar 17, 2026

New options for --force-reasoning:
- chat: Enable thinking API parameter
- stop: Add reasoning stop tokens
- inject: System prompt injection (includes stop)
- prompt: Prompt seeding with thought tag (includes stop)

Can combine: --force-reasoning chat,inject,prompt

Also added force_reasoning_prompt() to templates.py for prompt seeding.

08f64c61

feat(templates): Add inject_system and force_reasoning parameters · 76815ec9

Your Name authored Mar 17, 2026

- Add selectable parameters to format_for_raw_completion()
- inject_system: toggle agentic system prompt injection
- force_reasoning: toggle prompt seeding (thought tag)
- Update create_reasoning_prompt() convenience function

76815ec9

feat(templates): Add Prompt Seeding technique for forced reasoning · 0ed2e601

Your Name authored Mar 17, 2026

- Add REASONING_PREFIXES for Big 10 model families (Qwen, Llama3, DeepSeek, etc.)
- Add REASONING_STOP_TOKENS for stopping reasoning generation
- Add force_reasoning_prompt() to construct prompts ending with thought tags
- Add extract_reasoning() to parse reasoning from responses
- Add format_for_raw_completion() and create_reasoning_prompt() convenience functions
- This enables 'token hijacking' to force models to start with reasoning

0ed2e601

Add debug output for flash-attention and force-reasoning mode · b7d84534

Your Name authored Mar 17, 2026

- Enhanced flash attention status output in NvidiaBackend to always show availability
- Added debug output in chat completions endpoint for force-reasoning mode
- Shows CLI flag value, API param, reasoning action, and whether injection was done
- Displays the actual injected system prompt content when debug mode is enabled

b7d84534

Fix deprecation: torch_dtype -> dtype · b49d3f59
Your Name authored Mar 17, 2026

b49d3f59
Add cleanup method to MultiModelManager · ba4ce29f
Your Name authored Mar 17, 2026

ba4ce29f
Re-add image_model property to MultiModelManager · de9a6cdc
Your Name authored Mar 17, 2026

de9a6cdc
Add config attribute to MultiModelManager · e06dba80
Your Name authored Mar 17, 2026

e06dba80