packaging/linux/dist-bundle/install.sh · d669c7973b0d06cc128c838c5eba5cd93c61c64e · nexlab / coderai

docker/backend: graceful llama-cpp load + additive GPU modes + libcuda... · c077b7da

Stefy Lanza (nextime / spora) authored Jun 21, 2026

docker/backend: graceful llama-cpp load + additive GPU modes + libcuda mapping; admin GGUF batch/slots tuning

Backend robustness:
- vulkan.py catches Exception (not just ImportError) around the llama_cpp
  import: a CUDA-built llama-cpp missing libcuda.so.1 raised RuntimeError/OSError
  that crash-looped the whole server. Now it logs a warning and marks the
  Vulkan/GGUF backend unavailable; CUDA/CPU/ds4 keep working.
- detect_available_backends() reads LLAMA_CPP_AVAILABLE instead of re-importing
  (which re-raised the same error).
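
The guarded import described above can be sketched roughly as follows. This is a minimal illustration, not the actual vulkan.py code: the helper try_import, the logger name, and the backend names in the returned list are hypothetical; only LLAMA_CPP_AVAILABLE and detect_available_backends() are named in the commit.

```python
import importlib
import logging

logger = logging.getLogger("backend.vulkan")  # hypothetical logger name


def try_import(module_name):
    """Return (module, True) on success, (None, False) on any load failure."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Broad on purpose: a missing shared library can surface as
        # RuntimeError or OSError rather than ImportError.
        logger.warning("%s unavailable: %s", module_name, exc)
        return None, False
    return module, True


# Evaluated once at import time; later code reads the flag instead of re-importing.
llama_cpp, LLAMA_CPP_AVAILABLE = try_import("llama_cpp")


def detect_available_backends():
    """Hypothetical shape: report backends from the cached flag only."""
    backends = ["cpu"]  # assumed always present for this sketch
    if LLAMA_CPP_AVAILABLE:
        backends.append("vulkan")
    return backends
```

Reading the cached flag keeps backend detection free of side effects, so a failed load is logged once and never raised again.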

Docker launcher (run_oci.sh):
- GPU backends are now additive: --nvidia --vulkan enables both (maps libcuda via
  --gpus all AND /dev/dri). Added --all and --with-libcuda[=PATH].
- --vulkan auto bind-mounts the host's libcuda.so.1 (the bundled llama-cpp is a
  CUDA build), so Vulkan GGUF loads without full --gpus all. The banner shows the
  active mode set and libcuda status.
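
To illustrate what "additive" means here, the sketch below builds the extra container arguments from a set of modes. It is written in Python for illustration only; run_oci.sh is a shell script and its real logic is not shown in this commit message. The function name build_gpu_args, the mode strings, and the choice to mount libcuda at the same path inside the container are assumptions.

```python
def build_gpu_args(modes, libcuda_path=None):
    """Return extra container arguments for the requested GPU modes.

    modes: iterable of "nvidia" and/or "vulkan" (hypothetical names).
    libcuda_path: host path to libcuda.so.1, or None if it was not found.
    """
    modes = set(modes)
    args = []
    if "nvidia" in modes:
        # Full NVIDIA access; the runtime maps libcuda itself.
        args += ["--gpus", "all"]
    if "vulkan" in modes:
        args += ["--device", "/dev/dri"]
        # Assumption: the bind mount is only needed when --gpus all is absent.
        if "nvidia" not in modes and libcuda_path:
            args += ["-v", f"{libcuda_path}:{libcuda_path}:ro"]
    return args
```

Each mode only appends arguments, so combining modes never drops what another mode added.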

Dist bundle:
- New uninstall.sh (removes runner + optional image), wired into make_dist_bundle.
- install.sh + uninstall.sh print what they'll do and confirm before proceeding,
  bypassable with --yes/-y.

Admin GGUF tuning:
- Expose n_batch / n_ubatch / n_seq_max (llama.cpp -b/-ub/-np) in the model config
  UI and apply them in the Vulkan backend to reduce VRAM use when a model is at
  the VRAM ceiling; n_seq_max is applied only when the installed llama-cpp-python
  supports it.
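
One way to gate an option on library support is to inspect the constructor signature before passing it. The sketch below is an assumption about how such gating could look, not the commit's actual check; supported_kwargs is a hypothetical helper and works with any callable.

```python
import inspect


def supported_kwargs(factory, wanted):
    """Keep only the options that are set and that factory names explicitly."""
    params = inspect.signature(factory).parameters
    return {
        name: value
        for name, value in wanted.items()
        if value is not None and name in params
    }


# Stand-in for an older constructor that lacks n_seq_max (illustration only).
def fake_old_loader(model_path, n_batch=512, n_ubatch=512):
    return (model_path, n_batch, n_ubatch)


tuning = {"n_batch": 256, "n_ubatch": 128, "n_seq_max": 1}
kwargs = supported_kwargs(fake_old_loader, tuning)
```

With the stand-in above, n_seq_max is dropped silently while the two batch sizes pass through, so an unsupported option never reaches the loader.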

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


install.sh 3.3 KB
