Files · 4754beff012b9f52ec3d4635e7e73d1af9c2105a · nexlab / coderai

backend: per-model kv_offload flag to keep the KV cache in host RAM · 4754beff

Stefy Lanza (nextime / spora ) authored Jun 20, 2026

Large contexts make the KV cache huge (a 256k q4_0 cache is several GB), which
won't fit in VRAM alongside the weights. llama.cpp can't page KV to disk, but it
can keep it in system RAM via --no-kv-offload. Expose that as a per-model
kv_offload flag (default unchanged = KV in VRAM): set kv_offload=false to pass
offload_kqv=False to llama.cpp, freeing VRAM for big contexts at the cost of
slower decode (KV ops cross PCIe). Also allow the key in the admin model-config
endpoint so it's persistable from the UI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
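
The sizing claim and the flag mapping described in the commit message can be sketched roughly as below. This is an illustration under assumptions, not the code from this commit: the function names `kv_cache_bytes` and `kv_offload_kwargs` and the example model dimensions (32 layers, 8 KV heads, head size 128) are hypothetical. Only the `kv_offload` config key and the `offload_kqv` parameter come from the message itself.

```python
from typing import Any, Dict

# q4_0 packs 32 values into an 18-byte block
# (16 bytes of 4-bit quants plus a 2-byte fp16 scale).
Q4_0_BYTES_PER_ELEMENT = 18 / 32


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_element: float = Q4_0_BYTES_PER_ELEMENT) -> int:
    """Rough KV cache size: K and V each store
    n_layers * n_kv_heads * head_dim values per token."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return int(elements * bytes_per_element)


def kv_offload_kwargs(model_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a per-model config dict into loader keyword arguments.

    Hypothetical helper: the override is emitted only when kv_offload is
    explicitly False, so a missing key leaves the default (KV in VRAM) alone.
    """
    if model_cfg.get("kv_offload", True) is False:
        return {"offload_kqv": False}
    return {}


if __name__ == "__main__":
    # Hypothetical model shape at a 256k context with a q4_0 cache.
    size = kv_cache_bytes(n_ctx=256 * 1024, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"{size / 2**30:.1f} GiB")
    print(kv_offload_kwargs({"kv_offload": False}))
```

With llama-cpp-python, the returned dict would presumably be splatted into the `Llama(...)` constructor alongside the other per-model options. Emitting the key only on an explicit `False` is one way to keep existing model configs behaving exactly as before.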


Name
codai
docs
packaging
samples
tests
tools
.dockerignore
.gitignore
AI.PROMPT
CODERAI_API_DOCUMENTATION.md
CoderAI.gif
DISTRIBUTION.md
LICENSE.md
MULTIMODAL_CAPABILITIES.md
MULTIMODAL_UI_EXAMPLES.md
README.md
build-oci.sh
build.ps1
build.sh
coderai
coderai-broker-implementation-reference.md
coderai-integration.md
commands
osxbuild.sh
package-oci.sh
package-tarball.sh
requirements-nvidia.txt
requirements-vulkan.txt
requirements.txt
run-oci.sh
smoke-test-oci.sh
todo.md
video_editor.config.json

README.md