Implement smart memory management with 3-tier offloading
Add _get_gpu_memory_map() to configure optimal memory strategy: - GPU: 95% of available VRAM (leaves 5% for CUDA overhead) - CPU: Up to user-specified limit (--ram) or auto-detected - Disk: Only as last resort when GPU+CPU are full Update --ram help text to clarify it's the CPU offloading limit. This provides better performance by prioritizing GPU, then CPU, and only using slow disk offloading when absolutely necessary.
Showing
Please
register
or
sign in
to comment