- 17 Mar, 2026 40 commits
-
-
Your Name authored
When resolving 'default' model, skip the 'default' key in models dict and return the first actual loaded model instead.
-
Your Name authored
- Removed --force-all CLI flag - Updated help text to document 'all' option - Updated reasoning_choices to expand 'all' to all options - Updated parsing logic to handle 'all' as string
-
Your Name authored
When requested_model is 'default', get_resolved_model_name now: 1. First tries current_manager.default_model 2. Falls back to the first model in manager.models This ensures the model family is correctly detected for parser selection.
-
Your Name authored
When the model-specific parser (like QwenParser) doesn't find any tool calls, fall back to ToolCallParser as a catch-all.
-
Your Name authored
The regex pattern was missing <tool> - it only matched <tool_call>, funktion>, and <tool_use>. Now it also matches <tool>.
-
Your Name authored
Add custom XML tool format parser for <tool><action>...</action><object>...</object><properties>...</properties></tool> The model was generating tool calls in this format: <tool> <action>search</action> <object>financial_data</object> <properties> <query>...</query> </properties> </tool> Added parser support in ApexBig50Parser and strip_tool_calls_from_content to handle this custom format.
-
Your Name authored
- Bug 1: Skip format_tools_for_prompt in raw mode (already had condition) - Bug 2: Use final_text (after reasoning) instead of generated_text for formatter - Bug 3: Pass final_text to ModelParserAdapter instead of generated_text This prevents reasoning from appearing in both content AND reasoning fields, and allows the tool parser to properly extract tool calls without being confused by reasoning tags.
-
Your Name authored
- Add new --force-all CLI argument - Update --force-reasoning help text to mention --force-all - Handle --force-all in main function to expand to all reasoning options
-
Your Name authored
- Add force_reasoning_prompt function with Big 10 family prefixes - Add inject_system and force_reasoning parameters - Update --force-reasoning CLI with comma-separated options - Add --dump option to show raw output, parsed output, and litellm debug - Fix stop tokens to include ]]> when prompt is selected - Add mock strategy for fake reasoning stats - Chain --system-prompt at start of existing system message - Add 'raw' option to --force-reasoning - Fix format_tools_for_prompt to skip in raw mode - Pass tools to format_for_raw_completion in raw mode - Add parse_and_format method to OpenAIFormatter for tool extraction - Use parse_and_format in raw mode for correct tool extraction pipeline Pipeline: Model output -> Extract reasoning (raw mode) -> ModelParserAdapter (extract tools) -> OpenAIFormatter (final format)
-
Your Name authored
- Add truthy check before 'in' operator to prevent TypeError when force_reasoning_args is None (when --force-reasoning is not specified) - Fixes: name 'force_reasoning_args' is not defined error
-
Your Name authored
- In raw mode, extracted reasoning is now preserved in the response - Mock reasoning is only applied when there's no existing reasoning - Added logic to set extracted reasoning in message after formatter - Same fix applied to non-raw path in generate_chat_response
-
Your Name authored
- System prompt now includes: 'CRITICAL: You must always close your reasoning with ]]> before opening any tool tags' - Extraction logic now uses tool tags as fallback stop markers if close tag is missing - Handles: <tool_call>, <tool>, <|tool_call|>, <|tool|>, <function=
-
Your Name authored
Shows: - Full first pass result - Extraction details (close tag used, reasoning text, final text) - Cleanup details
-
Your Name authored
Now raw mode passes the generated text through OpenAIFormatter which: - Handles tool extraction - Provides OpenAI compatibility - Handles other post-processing This ensures raw mode output is treated the same as regular mode.
-
Your Name authored
- Add cleanup_control_tokens function to strip leading/trailing control tokens - Apply cleanup to final_text and second_pass_result in raw mode - Add mock strategy handling to raw mode (was missing) - Add debug output for cleanup steps
-
Your Name authored
When 'raw' is used without 'prompt', template_manager wasn't defined. Now creating it on-demand when needed.
-
Your Name authored
When 'raw' is used, skip the 'prompt', 'inject', and 'stop' handlers since raw mode handles everything separately. This was causing double assistant headers and corrupted prompts.
-
Your Name authored
The tokenizer approach was causing double assistant headers. Now using only template_manager.format_for_raw_completion which handles everything correctly.
-
Your Name authored
The AgenticTemplateManager already has a format_for_raw_completion method that handles prompt formatting with reasoning tags. No need to manually find the tokenizer - just use the existing template logic.
-
Your Name authored
Now shows: - current_manager type and backend type - Available attributes on the backend - Which path was used to find (or not find) the tokenizer - Also checks model_manager.tokenizer as fallback
-
Your Name authored
Fixed issue where raw mode variables were being re-initialized, which was overwriting the values set in the prompt handling section.
-
Your Name authored
- Added 'raw' to valid force-reasoning options (chat, stop, inject, prompt, twopass, mock, raw) - Implemented raw mode handler that: - Uses tokenizer.apply_chat_template() with add_generation_prompt=True - Seeds reasoning tag + commitment sentence - Uses two-pass generation: first captures reasoning, then gets final answer - Supports both streaming and non-streaming responses - Falls back gracefully if tokenizer not available This enables using the model's native tokenizer for prompt seeding, bypassing double-templating issues with chat APIs.
-
Your Name authored
-
Your Name authored
All Big 10 families now end with '<minimax:tool_call> ' without trailing space
-
Your Name authored
Replace messages with seeded prompt for raw completion
-
Your Name authored
Now ends with ']~b] ' instead of ']~b]'
-
Your Name authored
-
Your Name authored
When both inject and prompt are selected, use the same reasoning tag (]]) for consistency instead of <|thought|>
-
Your Name authored
When --system-prompt is specified, it now prepends to any existing system message instead of replacing it.
-
Your Name authored
- Add ]]> to stop sequences when using 'prompt' option - Add 'mock' strategy to add fake reasoning stats for VSCode plugin - Add 'twopass' option (not yet implemented)
-
Your Name authored
Shows: - Raw model output - Parsed output (after formatter) - Litellm debug info (via --debug)
-
Your Name authored
Use --force-reasoning all to enable chat, stop, inject, and prompt
-
Your Name authored
New options for --force-reasoning: - chat: Enable thinking API parameter - stop: Add reasoning stop tokens - inject: System prompt injection (includes stop) - prompt: Prompt seeding with thought tag (includes stop) Can combine: --force-reasoning chat,inject,prompt Also added force_reasoning_prompt() to templates.py for prompt seeding.
-
Your Name authored
- Add selectable parameters to format_for_raw_completion() - inject_system: toggle agentic system prompt injection - force_reasoning: toggle prompt seeding (thought tag) - Update create_reasoning_prompt() convenience function
-
Your Name authored
- Add REASONING_PREFIXES for Big 10 model families (Qwen, Llama3, DeepSeek, etc.) - Add REASONING_STOP_TOKENS for stopping reasoning generation - Add force_reasoning_prompt() to construct prompts ending with thought tags - Add extract_reasoning() to parse reasoning from responses - Add format_for_raw_completion() and create_reasoning_prompt() convenience functions - This enables 'token hijacking' to force models to start with reasoning
-
Your Name authored
- Enhanced flash attention status output in NvidiaBackend to always show availability - Added debug output in chat completions endpoint for force-reasoning mode - Shows CLI flag value, API param, reasoning action, and whether injection was done - Displays the actual injected system prompt content when debug mode is enabled
-
Your Name authored
-
Your Name authored
-
Your Name authored
-
Your Name authored
-