{% extends "base.html" %} {% block wrapper_class %}container container--full{% endblock %} {% block title %}Models — CoderAI{% endblock %} {% block head %} {% endblock %} {% block content %}
Configure and download AI models
VAEs, text encoders (T5-XXL, CLIP), and other standalone files used alongside main models.
Create local audio models backed by dedicated whisper-server subprocess configurations.
Note: KV-cache quantization applies only to GGUF models on the llama.cpp backend. HF-transformers models (including gemma and other sliding-window or hybrid linear-attention architectures) ignore this setting and keep an fp16 KV cache, because their quantized-cache path crashes during generation. For those models, lower n_ctx to shrink the KV VRAM reserve instead.
4-bit + text encoder 8-bit + VAE None = best quality-per-GB.