    backend: per-model kv_offload flag to keep the KV cache in host RAM · 4754beff
    Stefy Lanza (nextime / spora ) authored
    Large contexts make the KV cache huge (a 256k q4_0 cache is several GB), which
    won't fit in VRAM alongside the weights. llama.cpp can't page KV to disk, but it
    can keep it in system RAM via --no-kv-offload. Expose that as a per-model
    kv_offload flag (default unchanged = KV in VRAM): set kv_offload=false to pass
    offload_kqv=False to llama.cpp, freeing VRAM for big contexts at the cost of
    slower decode (KV ops cross PCIe). Also allow the key in the admin model-config
    endpoint so it's persistable from the UI.
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
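To make the trade-off in the commit message concrete, here is a minimal sketch, not the project's actual code. It estimates the size of a q4_0 KV cache and shows how a per-model `kv_offload` flag could be translated into the `offload_kqv` keyword of llama-cpp-python's `Llama` constructor. The model shape (32 layers, 8 KV heads, head dimension 128), the config keys `path` and `n_ctx`, and the helper names `kv_cache_bytes` and `build_llama_kwargs` are assumptions made for illustration only.

```python
from typing import Any, Dict

# q4_0 packs 32 values into 18 bytes: 16 bytes of 4-bit quants plus a 2-byte fp16 scale.
Q4_0_BYTES_PER_ELEMENT = 18 / 32


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_element: float = Q4_0_BYTES_PER_ELEMENT) -> int:
    """Rough combined size of the K and V caches for a given context length."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K plus V
    return int(elements * bytes_per_element)


def build_llama_kwargs(model_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a (hypothetical) per-model config dict into Llama() keyword arguments."""
    return {
        "model_path": model_cfg["path"],
        "n_ctx": model_cfg.get("n_ctx", 4096),
        # A missing key keeps the default behaviour: the KV cache stays in VRAM.
        "offload_kqv": bool(model_cfg.get("kv_offload", True)),
    }


if __name__ == "__main__":
    # Hypothetical model shape at a 256k context.
    size = kv_cache_bytes(n_ctx=256 * 1024, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"estimated KV cache: {size / 2**30:.1f} GiB")

    cfg = {"path": "model.gguf", "n_ctx": 256 * 1024, "kv_offload": False}
    print(build_llama_kwargs(cfg))
```

With llama-cpp-python installed, the resulting dict could be passed as `llama_cpp.Llama(**build_llama_kwargs(cfg))`. For the assumed shape the estimate comes to 9 GiB, which is consistent with the "several GB" figure in the commit message.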
Name
codai
docs
packaging
samples
tests
tools
.dockerignore
.gitignore
AI.PROMPT
CODERAI_API_DOCUMENTATION.md
CoderAI.gif
DISTRIBUTION.md
LICENSE.md
MULTIMODAL_CAPABILITIES.md
MULTIMODAL_UI_EXAMPLES.md
README.md
build-oci.sh
build.ps1
build.sh
coderai
coderai-broker-implementation-reference.md
coderai-integration.md
commands
osxbuild.sh
package-oci.sh
package-tarball.sh
requirements-nvidia.txt
requirements-vulkan.txt
requirements.txt
run-oci.sh
smoke-test-oci.sh
todo.md
video_editor.config.json