    Fix distributed worker selection and local job processing · 93a6daac
    Stefy Lanza (nextime / spora ) authored
    - Enhanced cluster_master.select_worker_for_job() with more robust GPU detection:
      - Added flexible GPU info parsing with fallbacks
      - Support for incomplete GPU info structures
      - Allow CPU workers as fallback when no GPU workers available
      - Added detailed debug logging for troubleshooting worker selection
    - Fixed queue._execute_local_job() to properly poll for backend results:
      - Changed from simulated processing to actual result polling
      - Added timeout handling (10 minutes max)
      - Proper error handling for failed jobs
    - Simplified backend.handle_web_message() to use local worker routing:
      - Removed async cluster master calls that were failing
      - Use direct worker socket communication for local processing
    - These changes should resolve the 'No suitable distributed worker' issue and make local processing work properly
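
The worker-selection behaviour described above (tolerant GPU parsing, CPU fallback, debug logging) might look roughly like the following. This is a minimal sketch, not the code from `cluster_master.select_worker_for_job()`: the worker dictionaries and the field names `gpu_info`, `cuda_available`, `rocm_available`, and `devices` are assumptions made for illustration.

```python
from __future__ import annotations

import logging
from typing import Any, Optional

log = logging.getLogger("worker_selection_sketch")


def has_gpu(worker: dict[str, Any]) -> bool:
    """Best-effort GPU detection that tolerates missing or partial info."""
    gpu_info = worker.get("gpu_info")  # hypothetical field name
    if not isinstance(gpu_info, dict):
        return False
    # Any one of these (assumed) indicators counts as "has a GPU".
    if gpu_info.get("cuda_available") or gpu_info.get("rocm_available"):
        return True
    devices = gpu_info.get("devices")
    return isinstance(devices, (list, tuple)) and len(devices) > 0


def select_worker(workers: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Prefer a GPU worker; fall back to a CPU worker; None if list is empty."""
    gpu_workers = [w for w in workers if has_gpu(w)]
    log.debug("%d of %d workers report a GPU", len(gpu_workers), len(workers))
    if gpu_workers:
        return gpu_workers[0]
    if workers:
        log.debug("no GPU workers available, falling back to a CPU worker")
        return workers[0]
    log.debug("no workers registered")
    return None
```

Taking the first match keeps the sketch short; a real selector would likely also weigh load or free memory, which the commit message does not describe.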
    
    The system now properly detects GPU workers, falls back to CPU workers if needed, and correctly processes jobs locally when distributed workers aren't available.
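
Polling for a backend result with a 10-minute cap and explicit failure handling could be sketched as below. This is an illustration under stated assumptions, not the body of `queue._execute_local_job()`: the `fetch_result` callable and the `status` / `error` keys in the result dictionary are hypothetical.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional


def poll_for_result(
    fetch_result: Callable[[], Optional[dict[str, Any]]],
    timeout_s: float = 600.0,  # 10 minutes, matching the commit message
    interval_s: float = 1.0,
) -> dict[str, Any]:
    """Call fetch_result() until it returns a result or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_result()
        if result is not None:
            # Assumed convention: a failed job carries status == "failed".
            if result.get("status") == "failed":
                raise RuntimeError(result.get("error", "job failed"))
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s:.0f} seconds")
        time.sleep(interval_s)
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot shorten or extend the timeout.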
Name
docs
templates
vidai
.gitignore
AI.PROMPT
CHANGELOG.md
Dockerfile.runpod
LICENSE
README.md
TODO.md
build.bat
build.sh
clean.bat
clean.sh
create_pod.sh
image.jpg
requirements-cuda.txt
requirements-rocm.txt
requirements.txt
setup.bat
setup.sh
start.bat
test_comm.py
test_runpod.py
vidai.conf.sample
vidai.py
vidai.sh