Commits · 1de9996c45e5b5b1b4f68491d243eeaba6fa5a7d · nexlab / coderai

01 Mar, 2026 22 commits
- Add session management, readline history, context compression, --ctx, --micro... · 1de9996c
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
```
Add session management, readline history, context compression, --ctx, --micro flags, and context counter in prompt
```
- Hide raw tool output unless --debug flag is specified · 55fbd847
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add --debug flag to coder CLI, hide raw tool calls unless debug mode is enabled · 59a9eb85
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix streaming display in coder CLI - use iter_lines for immediate output,... · f9efda2b
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
```
Fix streaming display in coder CLI - use iter_lines for immediate output, remove threading timer, simplify tool parsing
```
- Add --endpoint, --token, and --model CLI arguments for temporary overrides · 81da8cc5
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix tool_call regex to handle multiline JSON · ade8d849
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add XML tool format parser and filter tools from thinking display · 30f3e8a0
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix thinking display to update on every chunk and fix timer thread · e4cb426b
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix thinking display with timer thread and parse tool_call tags from content · bcd150e2
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix thinking display to use single line with proper timer updates · 0c31d3fd
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add .gitignore and remove cached files · 0d76e514
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add tool confirmation and fix thinking display in coder CLI · 09edf3bd
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add visual separator and multiline input support · 55810d7b
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Show thinking as single self-overwriting line with timer · fc4f93f7
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add colorful CLI with CoderCLI> prompt and /command shortcuts · 7e0e358b
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add --small and --tiny args, show thinking content with timer · 8eee7e27
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add --timeout arg (default 600s) and graceful thinking display · 19452eb8
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Fix CLI streaming to use iter_content with smaller chunks for real-time output · dc604d6c
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Collect all chunks in thread pool before yielding to avoid generator issues · 47738566
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Simplify async streaming using run_in_executor instead of manual thread · bf2b3b0a
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Run llama.cpp stream in background thread to prevent blocking · 8af5cf90
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
  
- Add explicit flush after yielding each stream chunk · a9d112b5
  Stefy Lanza (nextime / spora ) authored Mar 01, 2026
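
The multiline fix in ade8d849 comes down to letting the pattern cross line breaks. A minimal Python sketch, assuming tool calls arrive as JSON wrapped in `<tool_call>...</tool_call>` tags (as bcd150e2 suggests); the pattern and the `find_tool_calls` helper are hypothetical, since the commit does not show the real regex:

```python
import json
import re

# Hypothetical pattern. re.DOTALL lets "." match newlines, so a JSON payload
# spread over several lines is still captured; the non-greedy ".*?" stops at
# the first closing tag, so several calls in one reply stay separate.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def find_tool_calls(text: str) -> list[dict]:
    """Return every JSON object found between <tool_call> tags."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip malformed payloads instead of aborting the whole reply.
            continue
    return calls


reply = """Let me read it.
<tool_call>
{
  "name": "read_file",
  "arguments": {"path": "main.py"}
}
</tool_call>"""

print(find_tool_calls(reply))
# [{'name': 'read_file', 'arguments': {'path': 'main.py'}}]
```

Without `re.DOTALL`, `.` stops at the first newline, so any pretty-printed JSON payload would never match.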
  
28 Feb, 2026 18 commits

Fix indentation error in generate_chat_stream function · 8a4458a3
Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Fix broken generate_chat_stream function from incomplete edit · a63dee34
Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Add more debugging to track llama.cpp streaming response · 1b47f3ff
Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Fix content filter stripping single newlines · b341f96a
Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Add debug output to diagnose empty responses · 837c429c
Stefy Lanza (nextime / spora ) authored Feb 28, 2026


Use llama.cpp's create_chat_completion for proper chat template handling · 7947fb75

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add generate_chat() and generate_chat_stream() methods to VulkanBackend
- These use create_chat_completion() which properly applies model's chat template
- Fall back to manual formatting if create_chat_completion fails
- Update API endpoints to pass messages dict directly instead of formatted prompt
- Fixes garbled output with Qwen3 and other models that use custom chat templates
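
A minimal sketch of the try-then-fall-back flow, assuming a llama-cpp-python `Llama` object (`create_chat_completion` and `create_completion` are its chat and plain-completion methods); the `generate_chat` function shown here and its `role: content` fallback prompt are hypothetical, not the VulkanBackend code:

```python
from typing import Any


def generate_chat(llm: Any, messages: list[dict], max_tokens: int = 512) -> str:
    """Ask the model through its chat template, falling back to a plain prompt."""
    try:
        # create_chat_completion formats the messages with the model's
        # chat template before generating.
        result = llm.create_chat_completion(messages=messages, max_tokens=max_tokens)
        return result["choices"][0]["message"]["content"]
    except Exception:
        # Hypothetical manual format, used only when the chat API fails.
        prompt = "".join(f"{m['role']}: {m['content']}\n" for m in messages)
        prompt += "assistant: "
        result = llm.create_completion(prompt=prompt, max_tokens=max_tokens)
        return result["choices"][0]["text"]
```

Catching a broad `Exception` matches the "fallback if it fails" wording of the commit; a stricter version could catch only the errors raised for a missing or broken template.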


Fix Vulkan backend to use llama.cpp's built-in chat template · eea67af6

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Use apply_chat_template() to properly format messages for each model
- This ensures Qwen3 and other models get their correct chat format
- Fall back to the <|im_start|>/<|im_end|> format if apply_chat_template fails
- Fixes garbled output with <|system|> tags appearing in responses
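
For reference, the ChatML layout used as the fallback (and natively by Qwen models) wraps each turn in `<|im_start|>role` ... `<|im_end|>` and ends with an open assistant turn. A minimal sketch of such a formatter; `format_chatml` is a hypothetical name:

```python
def format_chatml(messages: list[dict]) -> str:
    """Render OpenAI-style messages in ChatML and open the assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Leave an open assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(format_chatml([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]))
```

Generation should typically stop on `<|im_end|>`, otherwise the model may run on into a new turn.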


Add --tiny flag for tiny model support in coder CLI · d78e5bd7

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add --tiny command line flag for models under 3B parameters
- Add 'tiny' field to config file (can be set via config or CLI)
- Add TINY_MODEL_SYSTEM_PROMPT with simplified instructions
- Emphasizes spacing rules for models that produce garbled output
- Shorter, more direct system prompt for limited context windows


Added cli · 74db6667
Stefy Lanza (nextime / spora ) authored Feb 28, 2026


Add --system-prompt flag to coderai and coder CLI tool · f5c0aa0b

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add --system-prompt flag to coderai for optional system prompt injection
- System prompt is only sent when the flag is provided (nothing is sent by default)
- Supports --system-prompt (default text) or --system-prompt 'custom text'
- Add coder CLI tool for interactive chat with file editing
- Add requests dependency for CLI tool
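
The "flag alone uses the default text, flag with a value uses that value, no flag sends nothing" behaviour maps directly onto argparse's optional-value options. A minimal sketch, assuming argparse is used; the default prompt text and the `build_messages` helper are placeholders, not the ones in coderai:

```python
import argparse

# Placeholder default; the commit does not quote the real text.
DEFAULT_SYSTEM_PROMPT = "You are a helpful coding assistant."

parser = argparse.ArgumentParser(prog="coderai")
parser.add_argument(
    "--system-prompt",
    nargs="?",                    # the value after the flag is optional
    const=DEFAULT_SYSTEM_PROMPT,  # flag given without a value
    default=None,                 # flag absent: no system prompt at all
)


def build_messages(system_prompt, user_text):
    """Prepend a system message only when one was requested."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


print(parser.parse_args([]).system_prompt)                     # None
print(parser.parse_args(["--system-prompt"]).system_prompt)    # default text
print(parser.parse_args(["--system-prompt", "Answer in Italian"]).system_prompt)
```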


Implement smart memory management with 3-tier offloading · be8bac00

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

Add _get_gpu_memory_map() to configure optimal memory strategy:
- GPU: 95% of available VRAM (leaves 5% for CUDA overhead)
- CPU: Up to user-specified limit (--ram) or auto-detected
- Disk: Only as last resort when GPU+CPU are full

Update --ram help text to clarify it's the CPU offloading limit.

This provides better performance by prioritizing GPU, then CPU,
and only using slow disk offloading when absolutely necessary.
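
The three tiers line up with the `max_memory` budget that Hugging Face Accelerate (and `from_pretrained(..., device_map="auto")` in Transformers) accepts: per-GPU limits, a `"cpu"` limit, and disk offloading into `offload_folder` for whatever still does not fit. A minimal sketch, assuming `_get_gpu_memory_map()` returns a dict of that shape; `get_memory_map` and its inputs are hypothetical, and free VRAM is passed in rather than queried from the driver:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def get_memory_map(free_vram: list[int], cpu_limit_gib: int) -> dict:
    """Build a max_memory-style budget: GPUs first, then CPU RAM.

    free_vram holds the free bytes per GPU, indexed by device id.
    Disk is not listed: it is only used for layers that fit nowhere else.
    """
    memory_map = {}
    for device_id, free_bytes in enumerate(free_vram):
        # Budget 95% of each GPU, keeping 5% free for CUDA overhead.
        memory_map[device_id] = f"{free_bytes * 95 // 100 // MIB}MiB"
    # CPU RAM is the second tier, capped by the --ram value.
    memory_map["cpu"] = f"{cpu_limit_gib}GiB"
    return memory_map


print(get_memory_map([16 * GIB], 32))  # {0: '15564MiB', 'cpu': '32GiB'}
```

In practice the free VRAM per device could come from `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes.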


Fix tool calling: handle nested XML format from Kimi K2.5 model · 4837efb0

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Add _parse_nested_xml_tool() to extract tool calls from nested XML
- Add _xml_to_dict() helper for recursive XML to dict conversion
- Update extract_tool_calls() to try both JSON and nested XML formats
- Improve system prompt with clearer tool format instructions and examples

This fixes the issue where models outputting raw XML tool syntax
(like Kimi K2.5) would have their tool calls end up in the response
text instead of being properly parsed.
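
A minimal sketch of the two helpers using the standard library's `xml.etree.ElementTree`, assuming a nested layout such as `<tool_call><name>...</name><arguments><path>...</path></arguments></tool_call>`; the exact tags Kimi K2.5 emits are not shown in the commit, so the tag names and the `xml_to_dict` / `parse_nested_xml_tool` bodies here are hypothetical:

```python
import xml.etree.ElementTree as ET


def xml_to_dict(element: ET.Element):
    """Recursively convert an element: leaves become text, others dicts."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: xml_to_dict(child) for child in children}


def parse_nested_xml_tool(text: str):
    """Return {"name": ..., "arguments": {...}} or None if not parseable."""
    start = text.find("<tool_call>")
    if start == -1:
        return None
    end = text.find("</tool_call>", start)
    if end == -1:
        return None
    snippet = text[start:end + len("</tool_call>")]
    try:
        root = ET.fromstring(snippet)
    except ET.ParseError:
        return None
    data = xml_to_dict(root)
    if not isinstance(data, dict) or "name" not in data:
        return None
    return {"name": data["name"], "arguments": data.get("arguments", {})}


reply = ("<tool_call><name>write_file</name><arguments>"
         "<path>a.py</path><content>print(1)</content>"
         "</arguments></tool_call>")
print(parse_nested_xml_tool(reply))
```

Two limits of this sketch: repeated child tags overwrite each other in the dict, and argument text containing a bare `<` or `&` (common in code) makes the snippet invalid XML, so `None` is returned and the call is not recovered.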


fix: support multipart content arrays in ChatMessage model for KiloCode compatibility · d9608d7c

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Updated ChatMessage.content to accept Union[str, List[Dict]]
- Added field_validator to convert multipart content arrays to strings
- Handles modern OpenAI API format where content is an array of objects
- Fixes 422 validation errors with clients like KiloCode that send multipart messages
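
A minimal sketch of the conversion step, assuming text parts are joined with newlines and non-text parts (such as `image_url`) are dropped; `flatten_content` is a hypothetical name. In Pydantic v2 it would typically be called from a `@field_validator("content", mode="before")` classmethod so the conversion runs before type checking:

```python
from typing import Union


def flatten_content(content: Union[str, list, None]) -> str:
    """Collapse OpenAI-style multipart content into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    texts = []
    for part in content:
        # Text parts look like {"type": "text", "text": "..."}; other part
        # types (e.g. "image_url") are dropped in this sketch.
        if isinstance(part, dict) and part.get("type") == "text":
            texts.append(part.get("text", ""))
    return "\n".join(texts)


print(flatten_content([
    {"type": "text", "text": "Explain this code:"},
    {"type": "image_url", "image_url": {"url": "https://example.com/x.png"}},
    {"type": "text", "text": "def f(): pass"},
]))
```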


docs: update README to document Intel GPU support via Vulkan backend · f985ab5c

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Updated main description to include Intel GPUs
- Expanded features section to list Intel as a supported backend
- Updated prerequisites to explain Vulkan works with Intel iGPUs and Arc
- Clarified that build.sh vulkan works for both AMD and Intel
- Added Intel-specific notes and recommendations
- Updated GPU compatibility matrix with Intel hardware
- Added performance expectations for different GPU types


feat: enhanced request logging middleware to capture detailed 422 validation errors · c8abdaef

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Added detailed request body logging with truncation for large payloads
- Added JSON structure parsing to show message count and keys
- Added comprehensive error response capture for 422 errors
- Added validation error detail parsing (location, message, type)
- Added full traceback logging for exceptions during request processing
- This helps debug client compatibility issues with KiloCode


Add request logging middleware to debug 422 errors · 46a2eabb
Stefy Lanza (nextime / spora ) authored Feb 28, 2026


Add extra field tolerance to API request models for better client compatibility · 56496d2f

Stefy Lanza (nextime / spora ) authored Feb 28, 2026

- Added extra="allow" to ChatCompletionRequest and CompletionRequest
- Added common OpenAI fields: seed, logprobs, top_logprobs, response_format, user, best_of, echo
- This prevents 422 errors when clients send additional fields we don't use

Fixes compatibility issues with KiloCode and other OpenAI-compatible clients


Fix count_vulkan_devices to correctly count GPU devices and exclude CPU devices · 044036e2
Stefy Lanza (nextime / spora ) authored Feb 28, 2026
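
One way to tell GPUs from CPU devices is the Vulkan physical device type: software rasterizers such as llvmpipe report `PHYSICAL_DEVICE_TYPE_CPU`. A minimal sketch, assuming the count comes from `vulkaninfo --summary` output (the commit does not say how devices are enumerated); `count_vulkan_gpus` is a hypothetical name:

```python
import re

# Matches lines such as "deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU".
DEVICE_TYPE_RE = re.compile(r"deviceType\s*=\s*PHYSICAL_DEVICE_TYPE_(\w+)")
# Every VkPhysicalDeviceType except OTHER and CPU.
GPU_TYPES = {"DISCRETE_GPU", "INTEGRATED_GPU", "VIRTUAL_GPU"}


def count_vulkan_gpus(summary: str) -> int:
    """Count GPU devices in vulkaninfo --summary text, skipping CPU ones."""
    return sum(1 for kind in DEVICE_TYPE_RE.findall(summary) if kind in GPU_TYPES)
```

The text could be obtained with `subprocess.run(["vulkaninfo", "--summary"], capture_output=True, text=True).stdout`.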
