nexlab / coderai
Commit 320ca0e7, authored Mar 01, 2026 by Stefy Lanza (nextime / spora)
Change NVIDIA backend VRAM limit from 99.9% to 93% to leave more headroom for CUDA overhead
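To give the percentages in the commit message a concrete scale, here is a minimal arithmetic sketch. The 24 GB card and the helper name `headroom_bytes` are assumptions for illustration only; nothing in the commit states which GPUs are targeted.

```python
def headroom_bytes(total_vram: int, usable_fraction: float) -> int:
    """Bytes left unallocated when `usable_fraction` of total VRAM is handed out."""
    usable_vram = int(total_vram * usable_fraction)  # same rounding as the diff
    return total_vram - usable_vram


# Hypothetical card size, chosen only to make the numbers easy to read.
total_vram = 24 * 10**9

old_headroom = headroom_bytes(total_vram, 0.999)  # limit before this commit
new_headroom = headroom_bytes(total_vram, 0.93)   # limit after this commit

print(f"old headroom: {old_headroom / 1e6:.0f} MB")
print(f"new headroom: {new_headroom / 1e9:.2f} GB")
```

On such a card the reserve grows from roughly 24 MB to roughly 1.68 GB.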
Parent: 2ca7368f
Showing 1 changed file with 6 additions and 6 deletions.
coderai @ 320ca0e7
@@ -541,17 +541,17 @@ class NvidiaBackend(ModelBackend):
             return None
     def _get_gpu_memory_map(self) -> Dict:
-        """Get max_memory dict for Accelerate with 99.9% GPU limit, then CPU, then disk."""
+        """Get max_memory dict for Accelerate with 93% GPU limit, then CPU, then disk."""
         import torch
         max_memory = {}
-        # GPU memory: 99.9% of available VRAM per GPU
+        # GPU memory: 93% of available VRAM per GPU
         if torch.cuda.is_available():
             for i in range(torch.cuda.device_count()):
                 props = torch.cuda.get_device_properties(i)
                 total_vram = props.total_memory
-                # Leave 0.1% headroom for CUDA overhead
+                # Leave 7% headroom for CUDA overhead (changed from 0.1% to 7%)
-                usable_vram = int(total_vram * 0.999)
+                usable_vram = int(total_vram * 0.93)
                 max_memory[i] = usable_vram
                 print(f" GPU {i}: {total_vram / 1e9:.1f}GB total, {usable_vram / 1e9:.1f}GB usable")
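The limit is written out in the docstring, two comments, and the multiplier, so each change has to touch all of them. One possible way to keep them in sync is to pass the fraction as a parameter. This is a sketch, not the project's code: `build_gpu_memory_map` and `gpu_fraction` are hypothetical names, and the function takes plain byte counts so it can run without a GPU.

```python
from typing import Dict, Sequence


def build_gpu_memory_map(
    total_bytes_per_gpu: Sequence[int],
    gpu_fraction: float = 0.93,  # hypothetical parameter; 0.93 mirrors the new limit
) -> Dict[int, int]:
    """Map each GPU index to the number of bytes it may use."""
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in (0, 1]")
    return {
        index: int(total * gpu_fraction)
        for index, total in enumerate(total_bytes_per_gpu)
    }


# With PyTorch installed, the totals could be gathered the same way the diff does:
#   totals = [torch.cuda.get_device_properties(i).total_memory
#             for i in range(torch.cuda.device_count())]
totals = [16 * 10**9, 8 * 10**9]  # made-up sizes for two GPUs
memory_map = build_gpu_memory_map(totals)
for index, usable in memory_map.items():
    print(f" GPU {index}: {totals[index] / 1e9:.1f}GB total, {usable / 1e9:.1f}GB usable")
```

Note that the comment in the diff says "available VRAM" while the code reads `props.total_memory`, which is the card's total capacity, not the amount currently free.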
@@ -606,12 +606,12 @@ class NvidiaBackend(ModelBackend):
         # Prepare model loading arguments
         load_kwargs = {'trust_remote_code': True}
-        # Setup memory management: GPU (95%) → CPU (limit) → Disk
+        # Setup memory management: GPU (93%) → CPU (limit) → Disk
         if self.device == "cuda":
             max_memory = self._get_gpu_memory_map()
             load_kwargs['max_memory'] = max_memory
             load_kwargs['device_map'] = 'auto'
-            print(f" Memory strategy: GPU (99.9% VRAM) → CPU → Disk")
+            print(f" Memory strategy: GPU (93% VRAM) → CPU → Disk")
         else:
             # CPU-only mode
             load_kwargs['device_map'] = None
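The branch in this hunk can be read as a small pure function from device and memory map to loading arguments. The sketch below restates it that way so the two paths are easy to check in isolation. `build_load_kwargs` is a hypothetical helper, not a function in the repository, and the memory map passed in is made up.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(
    device: str,
    max_memory: Optional[Dict[int, int]] = None,
) -> Dict[str, Any]:
    """Assemble model-loading keyword arguments as the hunk above does."""
    load_kwargs: Dict[str, Any] = {"trust_remote_code": True}
    if device == "cuda":
        # Per-GPU byte limits plus automatic placement across devices.
        load_kwargs["max_memory"] = max_memory if max_memory is not None else {}
        load_kwargs["device_map"] = "auto"
    else:
        # CPU-only mode: no device map.
        load_kwargs["device_map"] = None
    return load_kwargs


gpu_kwargs = build_load_kwargs("cuda", {0: 14_880_000_000})
cpu_kwargs = build_load_kwargs("cpu")
print(gpu_kwargs)
print(cpu_kwargs)
```

Keeping the limit out of the log string, for example by formatting it from the same value used to build the map, would avoid the mismatch visible in the old code, where the comment said 95% while the print statement said 99.9%.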