- 08 Mar, 2026 6 commits
-
-
Stefy Lanza (nextime / spora ) authored
- Fix: Handle None content in messages to prevent Jinja2 'dict object has no attribute content' error
  - Added safety check in chat_completions function
  - Fixed _manual_format_messages to explicitly check for None
  - Fixed format_messages in VulkanBackend to ensure content is never None
- Fix: Always filter tool call format from output
  - Changed filter to run unconditionally (not just when tools are present)
  - Added extra regex patterns for JSON format tool calls like <tool>{...}</tool>
- Also fixed: Minor typos in comments (cket ->cket)
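The two fixes in the commit above can be illustrated with a minimal sketch. The function names `normalize_messages` and `strip_tool_markup` and the single regex are hypothetical; the commit's actual checks and patterns are not shown in this log.

```python
import re

# Hypothetical pattern: matches <tool>...</tool> blocks, including multi-line JSON bodies.
_TOOL_BLOCK = re.compile(r"<tool>.*?</tool>", re.DOTALL)


def normalize_messages(messages):
    """Return copies of the messages with None or missing content replaced by ""."""
    cleaned = []
    for msg in messages:
        msg = dict(msg)  # copy so the caller's dicts are left untouched
        if msg.get("content") is None:
            msg["content"] = ""
        cleaned.append(msg)
    return cleaned


def strip_tool_markup(text):
    """Remove tool-call markup unconditionally, whether or not tools were requested."""
    return _TOOL_BLOCK.sub("", text).strip()
```

Normalizing before the template is rendered means a Jinja2 template never sees a message without a usable `content` value.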
Stefy Lanza (nextime / spora ) authored
- Add seen_signatures set to extract_tool_calls() to prevent duplicates
- Add strip_tool_calls_from_content() method to remove <tool>...</tool> tags
- Filter tool format from each chunk in real-time during streaming
- Simplify post-stream tool call handling since content is already cleaned
- Also handle non-streaming responses for tool call content cleanup
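As a rough illustration of the duplicate check described above, the sketch below keys each parsed call on a canonical JSON string. Only the names `extract_tool_calls` and `seen_signatures` come from the commit message; the signature format and the regex are assumptions.

```python
import json
import re

# Assumed tag format, based on the <tool>...</tool> example in the commit message.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def extract_tool_calls(text):
    """Parse JSON tool calls from text, skipping exact repeats."""
    seen_signatures = set()
    calls = []
    for raw in TOOL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed tool blocks
        if not isinstance(call, dict):
            continue
        # Sorting keys makes the signature independent of key order.
        signature = json.dumps(call, sort_keys=True)
        if signature in seen_signatures:
            continue
        seen_signatures.add(signature)
        calls.append(call)
    return calls
```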
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 07 Mar, 2026 3 commits
-
-
Stefy Lanza (nextime / spora ) authored
Detect the chat template from the model and use the appropriate formatting; avoid Jinja errors by falling back to manual formatting when template detection fails
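One way to picture this fallback is the sketch below. It assumes a Hugging Face style tokenizer exposing a `chat_template` attribute and an `apply_chat_template()` method; `manual_format` and `format_prompt` are hypothetical names, and the plain "role: content" layout is an illustration, not the project's actual manual format.

```python
def manual_format(messages):
    """Plain-text fallback: one 'role: content' line per message, then the assistant cue."""
    lines = [f"{m['role']}: {m.get('content') or ''}" for m in messages]
    lines.append("assistant:")
    return "\n".join(lines)


def format_prompt(tokenizer, messages):
    """Use the model's chat template when one is present and renders; otherwise fall back."""
    if getattr(tokenizer, "chat_template", None):
        try:
            return tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        except Exception:
            pass  # template rendering failed (e.g. a Jinja error); use the fallback
    return manual_format(messages)
```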
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
- 05 Mar, 2026 3 commits
-
-
Stefy Lanza (nextime / spora ) authored
Modify _try_load_model() to catch TypeError when quantization arguments are not supported by the model class. When this happens, the method now:
1. Warns the user about unsupported quantization
2. Retries loading the model without quantization arguments
3. Returns the model successfully if loading works

This fixes issues with models like Qwen3.5 that don't support bitsandbytes quantization.
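The retry logic described above might look roughly like this. `load_with_quantization_fallback` and its `loader` callable are hypothetical stand-ins; the real `_try_load_model()` signature is not shown in this log.

```python
import warnings


def load_with_quantization_fallback(loader, model_name, quant_kwargs):
    """Try loading with quantization arguments; on TypeError, warn and retry without them."""
    try:
        return loader(model_name, **quant_kwargs)
    except TypeError as exc:
        warnings.warn(
            f"Quantization arguments not supported for {model_name} ({exc}); "
            "retrying without quantization."
        )
        # A failure on this second attempt propagates to the caller.
        return loader(model_name)
```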
-
Stefy Lanza (nextime / spora ) authored
- Wrap generate() with try-except to catch CUDA OOM errors
- On OOM: clear CUDA cache, retry with half tokens, return graceful error if still failing
- Wrap generate_stream() thread with error handling using shared variable
- Yield error messages to client instead of crashing the process
- Allows server to continue running after generation OOM
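The non-streaming retry path can be sketched as follows. To stay runnable without a GPU, the exception type and the cache-clearing hook are parameters; with PyTorch one would presumably pass `torch.cuda.OutOfMemoryError` and `torch.cuda.empty_cache`. The name `generate_with_oom_retry` and the error text are assumptions.

```python
def generate_with_oom_retry(generate, prompt, max_tokens,
                            oom_error=MemoryError, clear_cache=lambda: None):
    """Run generate(); on OOM, clear the cache and retry once with half the tokens."""
    try:
        return generate(prompt, max_tokens)
    except oom_error:
        clear_cache()
    try:
        return generate(prompt, max(1, max_tokens // 2))
    except oom_error:
        clear_cache()
        # Return an error message instead of letting the server process crash.
        return "Error: out of GPU memory; try a shorter prompt or fewer tokens."
```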
-
Stefy Lanza (nextime / spora ) authored
The new --max-gpu-percent parameter lets users specify the exact percentage of GPU VRAM to use, overriding the offload-strategy. When specified, the model will:
1. Use up to max-gpu-percent of VRAM
2. Offload remaining weights to CPU RAM (--ram)
3. Overflow to disk (--offload-dir) if RAM is exhausted
4. Automatically fall back in 5% steps if OOM occurs

Example usage for RTX 3090 with Qwen3.5-35B-A3B:
    coderai --model Qwen/Qwen3.5-35B-A3B --max-gpu-percent 50 --ram 64

This ensures MoE models with high VRAM requirements during generation can run without OOM by using CPU RAM as the primary offload target.
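The 5% fallback and the VRAM/RAM split can be sketched as below. Both helper names are hypothetical. The returned mapping follows the `max_memory` shape accepted by Hugging Face device-map loading (GPU index and `"cpu"` keys with values like `"12GiB"`); whether coderai builds exactly this structure is an assumption.

```python
def gpu_percent_steps(start_percent, step=5, minimum=5):
    """Yield the GPU percentages to try, stepping down after each OOM."""
    percent = start_percent
    while percent >= minimum:
        yield percent
        percent -= step


def build_max_memory(total_vram_gib, gpu_percent, ram_gib):
    """Map a VRAM percentage and a RAM budget to a max_memory style dict for GPU 0."""
    gpu_gib = int(total_vram_gib * gpu_percent / 100)
    return {0: f"{gpu_gib}GiB", "cpu": f"{ram_gib}GiB"}
```

For the 24 GiB RTX 3090 in the example, `build_max_memory(24, 50, 64)` gives GPU 0 a 12 GiB budget and leaves the rest to CPU RAM.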
-
- 01 Mar, 2026 28 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
Add session management, readline history, context compression, --ctx, --micro flags, and context counter in prompt
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
Fix streaming display in coder CLI - use iter_lines for immediate output, remove threading timer, simplify tool parsing
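A minimal sketch of line-by-line stream handling, assuming the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`). The function name `iter_stream_text` and the chunk layout are assumptions; the CLI's actual parsing is not shown in this log.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines as soon as each one arrives."""
    for raw in lines:
        if not raw:
            continue  # skip keep-alive blank lines
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        text = chunk["choices"][0].get("delta", {}).get("content")
        if text:
            yield text
```

With the `requests` library, the input would typically be `response.iter_lines()` on a request made with `stream=True`, printing each fragment with `print(text, end="", flush=True)`.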
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-