    Improve GPU memory detection fallback chain · c2855737
    Stefy Lanza (nextime / spora ) authored
    - Add nvidia-smi as intermediate fallback before PyTorch in GPU stats collection
    - Fallback order: pynvml -> nvidia-smi -> PyTorch
    - Applied to api.py, backend.py, and cluster_client.py GPU stats functions
    - nvidia-smi provides accurate memory usage and utilization data
    - Fix SocketCommunicator.receive_message() timeout parameter error
    - Add optional timeout parameter to the receive_message method
    - Fixes 'unexpected keyword argument timeout' error in api_stats and backend functions
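The fallback order described in this message (pynvml -> nvidia-smi -> PyTorch) could look roughly like the sketch below. It is not the code from api.py, backend.py, or cluster_client.py: the function names, the dictionary keys, and the `sources` parameter are all hypothetical, chosen only to illustrate trying each source in turn and using the first one that returns data.

```python
import shutil
import subprocess


def _to_number(field):
    """Convert a CSV field to float; nvidia-smi may print e.g. '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def _stats_from_pynvml():
    """First choice: query NVML directly (needs the pynvml package)."""
    import pynvml  # ImportError here moves us on to the next source

    pynvml.nvmlInit()
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            gpus.append({
                "index": i,
                "memory_used_mb": mem.used / 1024**2,
                "memory_total_mb": mem.total / 1024**2,
                "utilization_pct": float(util.gpu),
            })
        return gpus
    finally:
        pynvml.nvmlShutdown()


def parse_nvidia_smi_csv(text):
    """Parse 'index, used, total, util' rows (noheader, nounits CSV)."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total, util = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "memory_used_mb": _to_number(used),
            "memory_total_mb": _to_number(total),
            "utilization_pct": _to_number(util),
        })
    return gpus


def _stats_from_nvidia_smi():
    """Second choice: shell out to the nvidia-smi CLI."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_nvidia_smi_csv(result.stdout)


def _stats_from_torch():
    """Last resort: PyTorch only sees memory allocated by this process."""
    import torch

    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available")
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append({
            "index": i,
            "memory_used_mb": torch.cuda.memory_allocated(i) / 1024**2,
            "memory_total_mb": props.total_memory / 1024**2,
            "utilization_pct": None,  # not reported by this source
        })
    return gpus


def collect_gpu_stats(sources=None):
    """Try each source in order; return the first non-empty result."""
    if sources is None:
        sources = (_stats_from_pynvml, _stats_from_nvidia_smi, _stats_from_torch)
    for source in sources:
        try:
            gpus = source()
        except Exception:
            continue  # this source is unavailable, fall through to the next
        if gpus:
            return gpus
    return []
```

Placing nvidia-smi ahead of PyTorch matters because `torch.cuda.memory_allocated` counts only tensors allocated by the calling process, while nvidia-smi reports device-wide usage and utilization.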
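For the receive_message() fix, a minimal sketch of a receive method that accepts an optional timeout follows. The real SocketCommunicator is not shown in this commit message, so the class name `SketchCommunicator`, the newline-delimited JSON framing, and the choice to return None on timeout are assumptions made purely for illustration.

```python
import json
import socket


class SketchCommunicator:
    """Hypothetical stand-in for SocketCommunicator (framing is assumed)."""

    def __init__(self, sock):
        self.sock = sock
        self._buffer = b""

    def send_message(self, message):
        # One JSON document per line.
        self.sock.sendall(json.dumps(message).encode("utf-8") + b"\n")

    def receive_message(self, timeout=None):
        """Return the next message, or None on timeout or closed peer.

        timeout=None leaves the socket's current blocking mode unchanged,
        so callers that never passed a timeout behave as before. The
        timeout applies to each recv() call, not to the call as a whole.
        """
        previous = self.sock.gettimeout()
        if timeout is not None:
            self.sock.settimeout(timeout)
        try:
            while b"\n" not in self._buffer:
                chunk = self.sock.recv(4096)
                if not chunk:
                    return None  # peer closed the connection
                self._buffer += chunk
        except socket.timeout:
            return None
        finally:
            self.sock.settimeout(previous)  # restore the caller's setting
        line, self._buffer = self._buffer.split(b"\n", 1)
        return json.loads(line.decode("utf-8"))
```

Because `timeout` defaults to None, existing call sites keep working, and callers such as the stats functions can pass `timeout=...` without triggering the 'unexpected keyword argument' error.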
cluster_client.py 38 KB