- 01 Mar, 2026 (22 commits)
Stefy Lanza (nextime / spora ) authored
Add session management, readline history, context compression, --ctx and --micro flags, and a context counter in the prompt
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Fix streaming display in coder CLI: use iter_lines for immediate output, remove threading timer, simplify tool parsing
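A minimal sketch of the iter_lines approach, assuming the server emits OpenAI-style server-sent events (`data: {...}` lines terminated by `data: [DONE]`). `iter_content_deltas` is a hypothetical helper name, not the coder CLI's actual code.

```python
import json
from typing import Iterable, Iterator, Union


def iter_content_deltas(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style SSE stream as each line arrives."""
    for raw in lines:
        # requests' Response.iter_lines() yields bytes by default; accept str too.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # skip keep-alive blank lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content
```

With requests this would be driven by something like `resp = requests.post(url, json=body, stream=True)` and `for piece in iter_content_deltas(resp.iter_lines()): print(piece, end="", flush=True)`, so each token is printed as soon as its line arrives, without a timer thread.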
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
- 28 Feb, 2026 (18 commits)
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
- Add generate_chat() and generate_chat_stream() methods to VulkanBackend
- These use create_chat_completion(), which applies the model's own chat template
- Fall back to manual formatting if create_chat_completion() fails
- Update API endpoints to pass the messages directly instead of a pre-formatted prompt
- Fixes garbled output with Qwen3 and other models that use custom chat templates
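A minimal sketch of the template-first, fallback-second pattern, assuming VulkanBackend wraps a llama-cpp-python `Llama` object (`create_chat_completion()` and calling the instance for a raw completion are that library's API). `generate_chat` here and the `format_prompt` callable are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def generate_chat(
    llm: Any,
    messages: List[Message],
    format_prompt: Callable[[List[Message]], str],
    **kwargs: Any,
) -> str:
    """Prefer the model's own chat template; fall back to a manually formatted prompt."""
    try:
        # create_chat_completion applies the chat template from the model
        # metadata (or the configured chat_format) to the messages list.
        result = llm.create_chat_completion(messages=messages, **kwargs)
        return result["choices"][0]["message"]["content"]
    except Exception:
        # Fallback: build a plain prompt string and use the raw completion API
        # (calling a Llama instance is equivalent to create_completion).
        prompt = format_prompt(messages)
        result = llm(prompt, **kwargs)
        return result["choices"][0]["text"]
```

Passing the messages list through unchanged is what lets the first branch use the model's template; a pre-flattened prompt string would force the second branch every time.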
Stefy Lanza (nextime / spora ) authored
- Use apply_chat_template() to format messages correctly for each model
- This ensures Qwen3 and other models get their correct chat format
- Fall back to the <|im_start|>/<|im_end|> (ChatML) format if apply_chat_template() fails
- Fixes garbled output with <|system|> tags appearing in responses
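A minimal sketch of the apply_chat_template() call with a ChatML fallback, assuming a Hugging Face transformers tokenizer (whose `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the prompt string). `format_messages` is a hypothetical name.

```python
from typing import Any, Dict, List


def format_messages(tokenizer: Any, messages: List[Dict[str, str]]) -> str:
    """Render messages with the model's chat template, falling back to ChatML."""
    try:
        # tokenize=False returns a string; add_generation_prompt=True
        # appends the opening of the assistant turn.
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    except Exception:
        # ChatML fallback: <|im_start|>role\ncontent<|im_end|> for each message,
        # then an open assistant turn for the model to complete.
        parts = [
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
        ]
        parts.append("<|im_start|>assistant\n")
        return "".join(parts)
```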
Stefy Lanza (nextime / spora ) authored
- Add --tiny command-line flag for models under 3B parameters
- Add 'tiny' field to the config file (can be set via config or CLI)
- Add TINY_MODEL_SYSTEM_PROMPT with simplified instructions
- Emphasizes spacing rules for models that produce garbled output
- Shorter, more direct system prompt for limited context windows
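A minimal sketch of how the --tiny flag and the config's 'tiny' field might combine, using argparse. The prompt texts and the `select_system_prompt` helper are hypothetical placeholders; only the flag name, the config key and the TINY_MODEL_SYSTEM_PROMPT name come from the commit.

```python
import argparse
from typing import Any, Dict, List, Optional

# Placeholder texts: the real prompts are not reproduced in this log.
DEFAULT_SYSTEM_PROMPT = "You are a coding assistant. Use the tools described below."
TINY_MODEL_SYSTEM_PROMPT = "You are a coding assistant. Put a space between words."


def select_system_prompt(argv: Optional[List[str]], config: Dict[str, Any]) -> str:
    """Pick the tiny-model prompt if --tiny is passed or the config sets 'tiny'."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tiny",
        action="store_true",
        help="use the simplified system prompt for models under 3B parameters",
    )
    args = parser.parse_args(argv)
    # Either source can enable tiny mode; a store_true flag cannot disable it.
    tiny = args.tiny or bool(config.get("tiny", False))
    return TINY_MODEL_SYSTEM_PROMPT if tiny else DEFAULT_SYSTEM_PROMPT
```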
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
- Add --system-prompt flag to coderai for optional system prompt injection
- The system prompt is only sent when the flag is provided (nothing is injected by default)
- Supports --system-prompt (uses the built-in default text) or --system-prompt 'custom text'
- Add coder CLI tool for interactive chat with file editing
- Add requests dependency for the CLI tool
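The three behaviours described above (absent, bare flag, flag with text) map directly onto argparse's `nargs="?"` with `const` and `default`. This is a sketch under that assumption; the default text and the `build_parser`/`build_messages` helpers are hypothetical.

```python
import argparse
from typing import Dict, List, Optional

# Placeholder: the actual default text used by coderai is not shown in this log.
DEFAULT_SYSTEM_PROMPT = "You are a helpful coding assistant."


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="coderai")
    # nargs="?" gives three cases: flag absent -> default (None),
    # bare flag -> const (the default text), flag with a value -> that value.
    parser.add_argument(
        "--system-prompt",
        nargs="?",
        const=DEFAULT_SYSTEM_PROMPT,
        default=None,
        help="inject a system prompt, optionally with custom text",
    )
    return parser


def build_messages(system_prompt: Optional[str], user_text: str) -> List[Dict[str, str]]:
    messages: List[Dict[str, str]] = []
    if system_prompt is not None:  # only sent when the flag was given
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages
```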
Stefy Lanza (nextime / spora ) authored
Add _get_gpu_memory_map() to configure the memory placement strategy:
- GPU: 95% of available VRAM (leaves 5% for CUDA overhead)
- CPU: Up to the user-specified limit (--ram) or auto-detected
- Disk: Only as a last resort when GPU and CPU are full

Update the --ram help text to clarify that it is the CPU offloading limit. This improves performance by prioritizing GPU, then CPU, and only using slow disk offloading when absolutely necessary.
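A minimal sketch of the 95% rule as a `max_memory` map, assuming the model is loaded through Hugging Face transformers/accelerate, whose `max_memory` argument takes device keys (GPU indices, `"cpu"`) mapped to size strings. `build_max_memory` is a hypothetical stand-in for _get_gpu_memory_map(), with the CPU limit passed in rather than auto-detected.

```python
from typing import Dict, List, Union


def build_max_memory(
    gpu_free_bytes: List[int], cpu_limit_gib: int
) -> Dict[Union[int, str], str]:
    """Build a max_memory map: ~95% of each GPU, then CPU RAM up to a limit."""
    max_memory: Dict[Union[int, str], str] = {}
    for index, free in enumerate(gpu_free_bytes):
        # Integer arithmetic: 95% of the bytes, rounded down to whole MiB,
        # leaving ~5% headroom for the CUDA context and allocator overhead.
        max_memory[index] = f"{free * 95 // 100 // (1024 ** 2)}MiB"
    max_memory["cpu"] = f"{cpu_limit_gib}GiB"  # e.g. the value passed via --ram
    return max_memory
```

On CUDA the per-GPU figure could come from `torch.cuda.mem_get_info(i)[0]` (free bytes). Passing the result as `from_pretrained(..., device_map="auto", max_memory=..., offload_folder="offload")` fills the GPUs first, then CPU RAM, and only spills remaining weights to the offload folder on disk.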
Stefy Lanza (nextime / spora ) authored
- Add _parse_nested_xml_tool() to extract tool calls from nested XML
- Add _xml_to_dict() helper for recursive XML-to-dict conversion
- Update extract_tool_calls() to try both JSON and nested XML formats
- Improve system prompt with clearer tool format instructions and examples

This fixes the issue where models that output raw XML tool syntax (like Kimi K2.5) had their tool calls end up in the response text instead of being parsed.
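A minimal sketch of recursive XML-to-dict tool parsing with the standard library's ElementTree. The `<tool_call><name>…</name><arguments>…</arguments></tool_call>` layout is an assumption for illustration (the exact syntax Kimi K2.5 emits is not shown here), and `xml_to_dict`/`parse_nested_xml_tools` are hypothetical counterparts of _xml_to_dict() and _parse_nested_xml_tool().

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Assumed wrapper tag; real models may use a different one.
TOOL_CALL_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def xml_to_dict(element: ET.Element) -> Any:
    """Recursively convert an element: leaves become text, parents become dicts."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Repeated child tags overwrite each other in this simple version.
    return {child.tag: xml_to_dict(child) for child in children}


def parse_nested_xml_tools(text: str) -> List[Dict[str, Any]]:
    """Extract {"name", "arguments"} dicts from <tool_call> blocks in model output."""
    calls: List[Dict[str, Any]] = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            root = ET.fromstring(block)
        except ET.ParseError:
            continue  # malformed block: leave it for other parsers
        name = (root.findtext("name") or "").strip()
        args_el = root.find("arguments")
        arguments = xml_to_dict(args_el) if args_el is not None and len(args_el) else {}
        if name:
            calls.append({"name": name, "arguments": arguments})
    return calls
```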
Stefy Lanza (nextime / spora ) authored
- Updated ChatMessage.content to accept Union[str, List[Dict]]
- Added field_validator to convert multipart content arrays to strings
- Handles the modern OpenAI API format where content is an array of objects
- Fixes 422 validation errors with clients like KiloCode that send multipart messages
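A minimal sketch of the multipart-to-string conversion the validator performs, written as a plain function so it can be called from a pydantic v2 `@field_validator("content")`. The function name and the newline separator are assumptions; non-text parts (e.g. `image_url`) are simply dropped here.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]], None]


def flatten_content(content: Content) -> str:
    """Collapse OpenAI multipart content into plain text; non-text parts are dropped."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    texts = [
        str(part.get("text", ""))
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    # Joining with a newline is an assumption; the real validator may differ.
    return "\n".join(texts)
```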
Stefy Lanza (nextime / spora ) authored
- Updated main description to include Intel GPUs
- Expanded features section to list Intel as a supported backend
- Updated prerequisites to explain that Vulkan works with Intel iGPUs and Arc
- Clarified that build.sh vulkan works for both AMD and Intel
- Added Intel-specific notes and recommendations
- Updated GPU compatibility matrix with Intel hardware
- Added performance expectations for different GPU types
Stefy Lanza (nextime / spora ) authored
- Added detailed request body logging with truncation for large payloads
- Added JSON structure parsing to show message count and keys
- Added comprehensive error response capture for 422 errors
- Added validation error detail parsing (location, message, type)
- Added full traceback logging for exceptions during request processing
- This helps debug client compatibility issues with KiloCode
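A minimal sketch of the two logging helpers described above: a body summary with truncation, JSON keys and message count, and a one-line rendering of validation errors. It assumes pydantic/FastAPI-style error dicts with `loc`, `msg` and `type` keys; the function names and the 2000-character limit are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def summarize_body(raw: bytes, limit: int = 2000) -> str:
    """Describe a request body for logging: JSON keys, message count, truncated text."""
    text = raw.decode("utf-8", errors="replace")
    if len(text) <= limit:
        preview = text
    else:
        preview = f"{text[:limit]}... [truncated {len(text) - limit} chars]"
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return f"non-JSON body: {preview}"
    if not isinstance(data, dict):
        return f"body={preview}"
    messages = data.get("messages")
    count = len(messages) if isinstance(messages, list) else 0
    keys = ", ".join(sorted(data))
    return f"keys=[{keys}] messages={count} body={preview}"


def format_validation_errors(errors: Iterable[Dict[str, Any]]) -> List[str]:
    """Render error dicts ('loc', 'msg', 'type') as one readable line each."""
    lines = []
    for err in errors:
        location = ".".join(str(part) for part in err.get("loc", ()))
        lines.append(f"{location}: {err.get('msg', '')} ({err.get('type', '')})")
    return lines
```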
Stefy Lanza (nextime / spora ) authored
Stefy Lanza (nextime / spora ) authored
- Added extra="allow" to ChatCompletionRequest and CompletionRequest
- Added common OpenAI fields: seed, logprobs, top_logprobs, response_format, user, best_of, echo
- This prevents 422 errors when clients send additional fields we don't use

Fixes compatibility issues with KiloCode and other OpenAI-compatible clients.
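A minimal sketch of the request model change, assuming pydantic v2 (`model_config = ConfigDict(extra="allow")`). Only the chat-side fields are shown; best_of and echo belong to the legacy completions request. Unknown fields are kept in `model_extra` instead of triggering a 422.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ConfigDict


class ChatCompletionRequest(BaseModel):
    # Keep unknown fields instead of rejecting the request.
    model_config = ConfigDict(extra="allow")

    model: str
    messages: List[Dict[str, Any]]
    # Common OpenAI fields accepted even if the server ignores them.
    seed: Optional[int] = None
    logprobs: Optional[bool] = None
    top_logprobs: Optional[int] = None
    response_format: Optional[Dict[str, Any]] = None
    user: Optional[str] = None
```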
Stefy Lanza (nextime / spora ) authored