Commits · fb8ec881459a6e325af07079d76dec684a1866df · nexlab / coderai

16 Mar, 2026 15 commits

Fix known template fallback and use_manual condition for GGUF models · fb8ec881

Your Name authored Mar 16, 2026

- Directly set chat_template to known template names (qwen3, qwen, llama3, etc.)
  instead of trying to load non-existent HuggingFace tokenizers
- Add use_manual condition to use manual formatting when chat_template is set
  but hf_tokenizer is None (applies to both generate_chat and generate_chat_stream)
- This ensures GGUF models loaded from URLs with known templates use proper
  <|im_start|> formatting instead of failing on create_chat_completion
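
The use_manual condition and the <|im_start|> formatting described above can be sketched as follows. This is a minimal illustration, not the project's code: the helper names should_use_manual and format_chatml are hypothetical, and the layout assumes the standard ChatML convention used by Qwen-style templates.

```python
from typing import Dict, List, Optional


def should_use_manual(chat_template: Optional[str], hf_tokenizer: Optional[object]) -> bool:
    """Hypothetical helper: manual formatting applies when a template name
    is set but no HuggingFace tokenizer could be loaded."""
    return bool(chat_template) and hf_tokenizer is None


def format_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages in ChatML and leave an open assistant turn."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    # The trailing open turn prompts the model to answer as the assistant.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Under this assumption both generate_chat and generate_chat_stream would evaluate the same condition before deciding whether to build the prompt by hand.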

Add fallback to try known chat template names when tokenizer loading fails · 8cc1af10

Your Name authored Mar 16, 2026

When HF tokenizer loading fails, try known template names based on model name:
- Qwen models: try qwen3, qwen templates
- Llama models: try llama3, llama templates
- Phi models: try phi template
- Mistral models: try mistral template

This helps when the tokenizer can't be loaded but we know the model family.
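
A rough sketch of the family lookup described above, assuming the match is a case-insensitive substring test on the model name. The table and the function name candidate_templates are illustrative and are not taken from the repository.

```python
from typing import List, Tuple

# Family keyword -> template names to try, in order (from the commit message).
KNOWN_TEMPLATES: List[Tuple[str, List[str]]] = [
    ("qwen", ["qwen3", "qwen"]),
    ("llama", ["llama3", "llama"]),
    ("phi", ["phi"]),
    ("mistral", ["mistral"]),
]


def candidate_templates(model_name: str) -> List[str]:
    """Return template names worth trying for this model, or [] if the
    model family is not recognised."""
    lowered = model_name.lower()
    for family, templates in KNOWN_TEMPLATES:
        if family in lowered:
            return list(templates)
    return []
```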

Fix: Initialize model_backend_types in MultiModelManager.__init__ · cd877dc3

Your Name authored Mar 16, 2026

The model_backend_types attribute was not being initialized properly due to
incorrect indentation, causing 'MultiModelManager' object has no attribute
'model_backend_types' error when trying to load models on-demand.
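
The commit does not show the faulty code, so the following only illustrates how a mis-indented assignment can produce this kind of AttributeError. The class and method names are hypothetical stand-ins for MultiModelManager.

```python
class BrokenManager:
    def __init__(self):
        self.models = {}

    def _unrelated_helper(self):
        # Assumed shape of the bug: the assignment is indented into a
        # different method, so __init__ never creates the attribute.
        self.model_backend_types = {}


class FixedManager:
    def __init__(self):
        self.models = {}
        # Initialised in __init__, so it exists on every instance.
        self.model_backend_types = {}
```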

Add fallback for HuggingFace tokenizer loading with progressively shorter model name variants · 3479b3f0

Your Name authored Mar 16, 2026

- Add uppercase quantization suffixes (_Q4_K_M, etc.) to handle cached GGUF filenames
- Add progressive fallback to try shorter model names when tokenizer loading fails
- Example: Qwen3.5-27B-Uncensored-HauhauCS-Aggressive -> Qwen3.5-27B-Uncensored -> Qwen3.5-27B -> Qwen3.5 -> Qwen
- Add warning when all tokenizer loading attempts fail (will use manual formatting instead)
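
One way to generate such a fallback sequence is sketched below. It assumes names are split on hyphens and that the quantization suffix looks like _Q4_K_M; the regex and the function name shorter_name_variants are illustrative. The sketch drops one segment at a time, so it yields slightly more candidates than the example in the commit message.

```python
import re
from typing import List

# Assumed suffix shape: a separator, then Q<digits> and optional _PARTS.
QUANT_SUFFIX = re.compile(r"[._-]Q\d+(?:_[A-Z0-9]+)*$", re.IGNORECASE)


def shorter_name_variants(model_name: str) -> List[str]:
    """List progressively shorter names to try when tokenizer loading fails."""
    base = QUANT_SUFFIX.sub("", model_name)
    parts = base.split("-")
    # Drop one trailing hyphen-separated segment per step.
    variants = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    # Finally strip a trailing version number, e.g. "Qwen3.5" -> "Qwen".
    family = re.sub(r"[\d.]+$", "", parts[0])
    if family and family != parts[0]:
        variants.append(family)
    return variants
```

A caller would try each variant in order and emit the warning only after the last one fails.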

Fix: Hash prefix is 64 chars (SHA-256), add fallback for model_backend_types · 43cb91d5

Your Name authored Mar 16, 2026

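
A SHA-256 digest is 64 hexadecimal characters, which is why the prefix length matters when recovering the model name from a cached filename. The sketch below assumes the hash is lowercase hex and is joined to the original filename by an underscore or hyphen; that separator and the name strip_hash_prefix are assumptions, not details from the commit.

```python
import re

# 64 lowercase hex characters followed by an assumed separator.
HASH_PREFIX = re.compile(r"^[0-9a-f]{64}[_-]")


def strip_hash_prefix(filename: str) -> str:
    """Remove a leading SHA-256 hex prefix from a cached filename, if present."""
    return HASH_PREFIX.sub("", filename, count=1)
```
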
Fix: Remove hash prefix from cached GGUF filenames properly · 2da217a1

Your Name authored Mar 16, 2026

Fix: Remove hash prefix from cached GGUF filenames when extracting model name · 34648b9b

Your Name authored Mar 16, 2026

Fix HF tokenizer loading to check for cached local file first when model is URL · 13e81d0d

Your Name authored Mar 16, 2026

Fix: Add _aggressive_vram_cleanup to MultiModelManager class · b4d3d43b

Your Name authored Mar 16, 2026

Reduce VRAM cleanup delay to 2 seconds · 6f42fbde

Your Name authored Mar 16, 2026

Add aggressive VRAM cleanup for model switching · 7c150a4d

Your Name authored Mar 16, 2026

- Added _aggressive_vram_cleanup method to properly clear VRAM
- Moves model to CPU before deletion
- Deletes pipeline, vae, text_encoder, tokenizer explicitly
- Runs multiple rounds of gc.collect()
- Uses torch.cuda.synchronize() before clearing cache
- Increased delay to 5 seconds after cleanup
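
The steps listed above might look roughly like the following. This is a sketch under stated assumptions: the holder object, its attribute names, the number of gc rounds, and the function signature are hypothetical, and torch is treated as optional so the snippet also runs on machines without it.

```python
import gc
import time

try:
    import torch
except ImportError:  # the sketch still runs without PyTorch installed
    torch = None


def aggressive_vram_cleanup(holder, delay_seconds: float = 5.0, gc_rounds: int = 3) -> None:
    """Release model components held by `holder` and clear cached VRAM."""
    for name in ("pipeline", "vae", "text_encoder", "tokenizer"):
        component = getattr(holder, name, None)
        if component is None:
            continue
        if hasattr(component, "to"):
            try:
                component.to("cpu")  # move weights off the GPU first
            except Exception:
                pass  # not every component supports device moves
        setattr(holder, name, None)
    component = None  # drop the last local reference before collecting

    for _ in range(gc_rounds):
        gc.collect()

    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending kernels before freeing
        torch.cuda.empty_cache()

    time.sleep(delay_seconds)  # give the driver time to release memory
```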

Improve --hf-chat-template help text · 804dac03

Your Name authored Mar 16, 2026

Add support for specifying chat template in --hf-chat-template · 6e794ae6

Your Name authored Mar 16, 2026

- A template can now be specified directly: --hf-chat-template "model:template"
- Updated check_hf_chat_template to return tuple (should_use, template_name)
- Updated _load_huggingface_tokenizer to accept template_name parameter
- Updated README with new syntax and template examples
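
Parsing the "model:template" form and returning a (should_use, template_name) tuple could be sketched as below. The function names and signatures are hypothetical and do not reproduce check_hf_chat_template itself. Because models may be given as URLs, the sketch splits on the last colon and rejects a "template" that contains a slash.

```python
from typing import List, Optional, Tuple


def parse_template_arg(value: str) -> Tuple[str, Optional[str]]:
    """Split 'model:template' into its parts; template is None if absent."""
    model, sep, template = value.rpartition(":")
    # A '/' after the last colon means the colon belongs to a URL scheme.
    if sep and model and template and "/" not in template:
        return model, template
    return value, None


def check_hf_chat_template_sketch(model_name: str, values: List[str]) -> Tuple[bool, Optional[str]]:
    """Return (should_use, template_name) for one model given the flag values."""
    for value in values:
        model, template = parse_template_arg(value)
        if model == model_name:
            return True, template
    return False, None
```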

Add auto-detect support for --hf-chat-template · e17bc553

Your Name authored Mar 16, 2026

- Added 'auto' as a valid value for --hf-chat-template
- When --hf-chat-template auto is used, the HF template is auto-detected and applied to all models
- Updated README with new syntax
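
For illustration, a repeatable flag that also accepts 'auto' could be declared with argparse as shown here. Whether the project uses argparse is an assumption, and build_parser and applies_to are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--hf-chat-template",
        action="append",  # the flag may be given once per model
        default=None,
        metavar="MODEL|auto",
        help="Model to apply the HF chat template to, or 'auto' for all models.",
    )
    return parser


def applies_to(model_name: str, values: Optional[List[str]]) -> bool:
    """True if the HF template should be used for this model."""
    values = values or []
    return "auto" in values or model_name in values
```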

Add debug output showing raw vs escaped content · 079bd8dc

Your Name authored Mar 16, 2026

15 Mar, 2026 25 commits

Make --hf-chat-template repeatable per model · 31b6480e