1. 07 Oct, 2025 40 commits
    • Fix cluster nodes sorting error · e5e3cc98
      Stefy Lanza (nextime / spora ) authored
      - Handle timestamp strings in sorting function
      - Parse ISO timestamp strings to numeric values for proper sorting
      - Prevent TypeError when sorting by last_seen timestamp
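The commit does not show the sorting function itself, and the real fix may live in the page's JavaScript. As a minimal sketch of the same idea in Python, the hypothetical helper `last_seen_key` below turns either an epoch number or an ISO timestamp string into a number, so mixed values can be compared without a TypeError. The sample rows and the choice to treat naive timestamps as UTC are assumptions.

```python
from datetime import datetime, timezone


def last_seen_key(value):
    """Return a numeric sort key for a last_seen value.

    Accepts epoch numbers or ISO 8601 strings; unparseable values sort first.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            parsed = datetime.fromisoformat(value)
        except ValueError:
            return 0.0
        if parsed.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.timestamp()
    return 0.0


# Hypothetical rows mixing ISO strings and epoch seconds.
nodes = [
    {"hostname": "node-a", "last_seen": "2025-10-07T12:00:00"},
    {"hostname": "node-b", "last_seen": 1759800000},
    {"hostname": "node-c", "last_seen": "2025-10-07T09:30:00"},
]
ordered = sorted(nodes, key=lambda n: last_seen_key(n["last_seen"]))
```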
    • Add database migration for connected_at column · 6f374101
      Stefy Lanza (nextime / spora ) authored
      - Add ALTER TABLE to create connected_at column in existing databases
      - Handle migration gracefully for databases created before schema update
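A migration of this kind could look roughly like the sketch below. It assumes the database is SQLite and uses a hypothetical table name `cluster_clients`; neither is confirmed by the commit message. The helper checks the existing columns first, so running it twice is harmless.

```python
import sqlite3


def ensure_connected_at_column(conn, table="cluster_clients"):
    """Add connected_at to an existing table if it is missing.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "connected_at" in columns:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN connected_at REAL")
    conn.commit()
    return True


# Simulate a database created before the schema update.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cluster_clients (hostname TEXT, last_seen REAL)")
first_run = ensure_connected_at_column(conn)
second_run = ensure_connected_at_column(conn)
```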
    • Fix subprocess import error in detect_gpu_backends · 44bb31c4
      Stefy Lanza (nextime / spora ) authored
      - Move subprocess import to function scope to avoid UnboundLocalError
      - Ensure subprocess is available for fallback GPU detection
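The error class behind this fix can be reproduced in a few lines. The sketch below is an illustration, not the project's code: `broken_detect` and `fixed_detect` are hypothetical names. An `import` statement anywhere inside a function makes that name local to the whole function, so using it before the import runs raises UnboundLocalError even when the module is also imported at file level.

```python
import subprocess  # module-level import, hidden inside broken_detect


def broken_detect(use_fallback=False):
    try:
        # The name is local here because of the import further down,
        # and nothing has been bound to it yet.
        cmd = subprocess.list2cmdline(["nvidia-smi", "-L"])
    except UnboundLocalError:
        return "UnboundLocalError"
    if use_fallback:
        import subprocess  # this later import is what makes the name local
    return cmd


def fixed_detect():
    import subprocess  # bound before any use in this function
    return subprocess.list2cmdline(["nvidia-smi", "-L"])
```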
    • Fix cluster client uptime calculation · 5794b09d
      Stefy Lanza (nextime / spora ) authored
      - Add connected_at timestamp to track when client first connected
      - Calculate uptime from connection time instead of last seen time
      - Update database schema and API to use proper uptime tracking
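The difference between the two calculations can be sketched as follows. The function name and the sample numbers are assumptions; the point is only that uptime measured from `connected_at` keeps growing, while a value derived from the last-seen time resets with every heartbeat.

```python
import time


def client_uptime(connected_at, now=None):
    """Seconds since the client first connected (never negative)."""
    now = time.time() if now is None else now
    return max(0.0, now - connected_at)


# Hypothetical client: connected at t=1000, last heartbeat at t=1355.
connected_at, last_seen, now = 1000.0, 1355.0, 1360.0
uptime = client_uptime(connected_at, now=now)      # time since connection
since_heartbeat = now - last_seen                  # what the old value tracked
```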
    • Fix GPU VRAM detection for cluster clients · 1da6025d
      Stefy Lanza (nextime / spora ) authored
      - Update detect_gpu_backends to collect actual VRAM for each GPU device
      - Store device info including VRAM in gpu_info sent to master
      - Use real VRAM data in cluster nodes API instead of hardcoded values
      - Ensure consistent VRAM reporting between master and clients
    • Remove Status column from cluster nodes table · 97a13987
      Stefy Lanza (nextime / spora ) authored
      - Use green background color to indicate connected nodes
      - Remove status text column for cleaner interface
      - Update colspan values for table messages
    • Update cluster nodes API to read from database · eb1870d9
      Stefy Lanza (nextime / spora ) authored
      - Store cluster client info in database for persistence
      - Update API to read connected clients from database
      - Maintain compatibility with existing web interface
    • Fix method call in cluster master register processes · 4b98cda4
      Stefy Lanza (nextime / spora ) authored
      - Use _get_client_by_websocket instead of non-existent _get_client_by_socket
      - Fixes client connection error during process registration
    • Remove /cluster path from websocket URI · 0fc8c705
      Stefy Lanza (nextime / spora ) authored
      - Client connects to wss://host:port instead of wss://host:port/cluster
      - Fixes connection loop issue
    • Add reconnection logic to cluster client · d87215b6
      Stefy Lanza (nextime / spora ) authored
      - Client now attempts to reconnect if connection is lost
      - Prevents processes from being restarted on reconnection
      - Maintains persistent cluster node operation
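One common shape for such a loop is retry with capped exponential backoff. The sketch below is an assumption about the approach, not the client's actual code: `run_with_reconnect` and `flaky_connect` are hypothetical, and the connection is injected as a coroutine so the example runs without a server. Worker processes would be started outside this loop, which is what keeps them from being restarted on reconnection.

```python
import asyncio


async def run_with_reconnect(connect_once, max_attempts=5,
                             base_delay=0.01, max_delay=0.1):
    """Call connect_once() until it returns cleanly or attempts run out.

    Returns the number of attempts used.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            await connect_once()
            return attempt
        except OSError:  # ConnectionError is a subclass of OSError
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # capped exponential backoff
    raise ConnectionError(f"gave up after {max_attempts} attempts")


calls = {"count": 0}


async def flaky_connect():
    """Stand-in for the real connection: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("master unreachable")


attempts_used = asyncio.run(run_with_reconnect(flaky_connect))
```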
    • Fix cluster client process registration dict comprehension · b0c7da40
      Stefy Lanza (nextime / spora ) authored
      - Correct the dict comprehension for registering processes with master
      - Fix duplicate entries and incorrect model assignment
      - Apply same fix to restart workers function
    • Start local backend in cluster client mode · d3ac0046
      Stefy Lanza (nextime / spora ) authored
      - Workers need a local backend to connect to even in client mode
      - Add backend startup and readiness check for cluster clients
      - Ensure proper cleanup on exit
    • Add cluster SSL certificates to .gitignore · 0bb73422
      Stefy Lanza (nextime / spora ) authored
      - Ignore cluster.crt and cluster.key generated certificates
      - Remove committed certificates from repository
    • Fix websockets handler signature for newer websockets version · 8d471b19
      Stefy Lanza (nextime / spora ) authored
      - Remove 'path' parameter from _handle_client method
      - Compatible with websockets 12+ which removed the path argument
    • Integrate secure websocket cluster master into main vidai.py · 81b440d2
      Stefy Lanza (nextime / spora ) authored
      - Modify ClusterMaster to accept host parameter
      - Start cluster master in vidai.py when running as master
      - Use --cluster-host and --cluster-port for websocket server binding
      - Default to 0.0.0.0:5003 for cluster master
    • Fix variable name conflict in admin config · 772f6213
      Stefy Lanza (nextime / spora ) authored
      - Rename 'flash' variable to 'flash_enabled' to avoid shadowing the flash() function
      - Resolve TypeError when saving admin configuration
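The failure mode is easy to reproduce without Flask. In the sketch below, `flash` is a plain stand-in function (not Flask's `flash()`), and both `save_config_*` functions are hypothetical. Assigning to `flash` inside the function replaces the callable with a bool, so the later call fails with a TypeError; renaming the variable to `flash_enabled` leaves the function reachable.

```python
def flash(message):
    """Stand-in for a message-flashing helper."""
    return f"flashed: {message}"


def save_config_broken(form):
    flash = form.get("flash") == "on"  # shadows the function with a bool
    try:
        return flash("Configuration saved")  # 'bool' object is not callable
    except TypeError:
        return "TypeError"


def save_config_fixed(form):
    flash_enabled = form.get("flash") == "on"  # no name clash
    return flash_enabled, flash("Configuration saved")
```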
    • Remove backend selection from admin config AI Settings · 2c7ae960
      Stefy Lanza (nextime / spora ) authored
      - Remove analysis_backend and training_backend fields from /admin/config
      - These are now configured per worker in the cluster nodes interface
      - Clean up unused imports and form processing
    • Fix missing imports in admin config page · 02b5e4e9
      Stefy Lanza (nextime / spora ) authored
      - Add missing set_* function imports to admin.py config route
      - Resolve NameError when saving admin configuration
    • Enlarge cluster nodes page container to 95% width · a55af666
      Stefy Lanza (nextime / spora ) authored
      - Change container max-width to 95% for better use of screen space
      - Maintain centered layout for the cluster nodes table
    • Fix cluster nodes modal to show per-worker driver selects · 044c2cf2
      Stefy Lanza (nextime / spora ) authored
      - Show individual select forms for every worker in the driver modal
      - Update API to handle per-worker driver selection for local nodes
      - Maintain compatibility with existing backend switching logic
    • Add --config CLI argument and fix cluster nodes driver selection · a7d2d90e
      Stefy Lanza (nextime / spora ) authored
      - Add --config <file> argument to load config from custom path
      - Modify config loader to use custom config file if specified
      - Fix cluster nodes interface to only show available GPU backends for workers
      - Differentiate between local and remote node driver selection
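A `--config <file>` option of this kind might be wired up with `argparse` as in the sketch below. Only the flag name comes from the commit; the helper names and the fallback path `config.ini` are assumptions made for illustration.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="vidai")
    parser.add_argument("--config", metavar="FILE", default=None,
                        help="load configuration from FILE")
    return parser


def resolve_config_path(argv, default_path="config.ini"):
    """Return the custom config path if given, else the default path.

    default_path is a hypothetical placeholder, not the project's real default.
    """
    args = build_parser().parse_args(argv)
    return os.path.expanduser(args.config or default_path)


custom = resolve_config_path(["--config", "/tmp/custom.ini"])
fallback = resolve_config_path([])
```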
    • Remove Settings page link from admin navbar · 63965769
      Stefy Lanza (nextime / spora ) authored
      - Removed the 'Settings' link from the admin navigation menu
      - Settings page route and template still exist but are no longer accessible from navbar
      - Admin navbar now shows: Cluster Tokens, Cluster Nodes (no Settings)
    • Fix Set Driver modal functionality · 1e53bfb9
      Stefy Lanza (nextime / spora ) authored
      - Added workers array to local node API response for modal population
      - Fixed table colspan values to match 13 columns
      - Removed debug console.log statements
      - Modal should now open and show worker driver selection options
    • Fix Set Driver button click handler · 314a1125
      Stefy Lanza (nextime / spora ) authored
      - Fixed JavaScript template literal issue preventing button clicks from working
      - Changed from inline onclick with template variables to data attributes + event delegation
      - Added event listener for .set-driver-btn class buttons
      - Buttons now properly read hostname and token from data attributes
      - Modal should now open when clicking Set Driver buttons
    • Remove NVIDIA-only GPU filtering, detect all working CUDA/ROCm GPUs · 13ffc88e
      Stefy Lanza (nextime / spora ) authored
      - Removed brand-specific filtering that only allowed NVIDIA GPUs
      - Now detects any GPU that can actually perform CUDA or ROCm operations
      - Functional test determines if GPU should be included, not brand
      - GPUs are shown with correct system indices (Device 0, 1, etc.)
      - AMD GPUs that support ROCm will be shown if functional
      - CUDA GPUs from any vendor will be shown if functional
    • Fix GPU VRAM detection to use correct method from /api/stats · efbb77ce
      Stefy Lanza (nextime / spora ) authored
      - Updated GPU VRAM detection to use torch.cuda.get_device_properties(i).total_memory / 1024**3
      - Same method as used in /api/stats endpoint for consistency
      - Still filters out non-NVIDIA and non-functional GPUs
      - Now shows correct VRAM amounts (e.g., 24GB for RTX 3090 instead of hardcoded 8GB)
      - Fixed both worker-level and node-level GPU detection
    • Add debug logging to GPU detection · f91fafcf
      Stefy Lanza (nextime / spora ) authored
      - Added debug output to see what CUDA device names are detected
      - Will help identify why AMD GPU is still being counted as CUDA device
      - Debug output shows device names and functional test results
      - User can now see what devices PyTorch is detecting
    • Fix GPU detection to only count working, functional GPUs · 056cbbf3
      Stefy Lanza (nextime / spora ) authored
      - Modified detect_gpu_backends() to perform functional tests on GPUs
      - CUDA detection now verifies devices can actually perform tensor operations
      - ROCm detection now tests device functionality before counting
      - Only NVIDIA GPUs are counted for CUDA, and only functional devices
      - Prevents counting of non-working GPUs like old AMD cards misreported as CUDA
      - Example: System with old AMD GPU (device 0) + working CUDA GPU (device 1) now correctly shows only the functional CUDA GPU
      - Total VRAM calculation now reflects only actually usable GPUs
      - Both PyTorch and nvidia-smi/rocm-smi detection paths updated
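A functional test along these lines could be sketched as below. This is an assumption about the approach rather than the body of `detect_gpu_backends()`: the hypothetical `functional_cuda_devices` runs a tiny tensor operation on each device PyTorch reports and keeps only the indices where it succeeds. It does not apply the NVIDIA-only name filter this commit also mentions, and it returns an empty list when PyTorch or CUDA is unavailable.

```python
def functional_cuda_devices():
    """Return indices of CUDA devices that can actually run a tensor op."""
    try:
        import torch
    except ImportError:
        return []  # no PyTorch, so nothing can be tested
    if not torch.cuda.is_available():
        return []
    working = []
    for index in range(torch.cuda.device_count()):
        try:
            probe = torch.ones(8, device=f"cuda:{index}")
            # 8 ones doubled and summed must give 16 on a working device.
            if float((probe + probe).sum().item()) == 16.0:
                working.append(index)
        except Exception:
            # Any failure means the device is listed but unusable; skip it.
            continue
    return working


usable = functional_cuda_devices()
```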
    • Fix GPU VRAM detection to count only available GPUs · ffe34516
      Stefy Lanza (nextime / spora ) authored
      - Modified local node GPU memory calculation to only count GPUs that are actually available for supported backends
      - Previously counted all GPUs in system, now only counts CUDA GPUs if CUDA is available and ROCm GPUs if ROCm is available
      - Fixes issue where unsupported GPUs (like old AMD GPUs without ROCm support) were incorrectly included in VRAM totals
      - Example: System with old AMD GPU (8GB, no ROCm) and CUDA GPU (24GB) now correctly shows 24GB total instead of 32GB
      - Ensures accurate GPU resource reporting in cluster nodes interface
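The arithmetic in the example above can be checked with a short sketch. The data layout and the helper name `usable_vram_gb` are assumptions; the numbers are the ones from the commit message (an 8GB AMD card without ROCm plus a 24GB CUDA card).

```python
def usable_vram_gb(gpus, available_backends):
    """Sum VRAM only for GPUs whose backend is actually available."""
    return sum(g["vram_gb"] for g in gpus if g["backend"] in available_backends)


gpus = [
    {"name": "old AMD card", "backend": "rocm", "vram_gb": 8},
    {"name": "CUDA card", "backend": "cuda", "vram_gb": 24},
]
# ROCm is not available on this system, so only the CUDA card counts.
total = usable_vram_gb(gpus, available_backends={"cuda"})
naive_total = sum(g["vram_gb"] for g in gpus)  # the previous behaviour
```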
    • Implement per-worker driver selection modal · 4ca34e75
      Stefy Lanza (nextime / spora ) authored
      - Modified modal to show individual GPU-requiring workers on each node
      - Allow granular driver selection (CUDA/ROCm/CPU) for each worker subprocess
      - Updated database schema to store driver preferences per worker (hostname + token + worker_name)
      - Enhanced API to handle per-worker driver setting with form field parsing
      - Added restart_client_worker method to cluster master for individual worker restarts
      - Frontend now displays worker-specific driver selection controls in modal
      - Maintains node-level table view while providing worker-level configuration
      - Supports CPU-only nodes and mixed GPU/CPU worker configurations
      - Backward compatible with existing single-driver preference system
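The commit names the key used for per-worker preferences (hostname + token + worker_name) but not the schema. One way to store it, assuming SQLite and a hypothetical table `worker_driver_preferences`, is a composite primary key with an upsert, as sketched here. The `cuda` default in `get_worker_driver` is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE worker_driver_preferences (
        hostname    TEXT NOT NULL,
        token       TEXT NOT NULL,
        worker_name TEXT NOT NULL,
        driver      TEXT NOT NULL CHECK (driver IN ('cuda', 'rocm', 'cpu')),
        PRIMARY KEY (hostname, token, worker_name)
    )
    """
)


def set_worker_driver(conn, hostname, token, worker_name, driver):
    """Insert or overwrite the preference for one worker."""
    conn.execute(
        "INSERT OR REPLACE INTO worker_driver_preferences VALUES (?, ?, ?, ?)",
        (hostname, token, worker_name, driver),
    )
    conn.commit()


def get_worker_driver(conn, hostname, token, worker_name, default="cuda"):
    row = conn.execute(
        "SELECT driver FROM worker_driver_preferences "
        "WHERE hostname = ? AND token = ? AND worker_name = ?",
        (hostname, token, worker_name),
    ).fetchone()
    return row[0] if row else default


set_worker_driver(conn, "node1", "tok", "analysis", "rocm")
set_worker_driver(conn, "node1", "tok", "analysis", "cpu")  # overwrites
```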
    • Fix NameError in cluster nodes API · 5cbdab26
      Stefy Lanza (nextime / spora ) authored
      - Fixed undefined variable 'local_gpu_backends' in api_cluster_nodes function
      - Properly defined local_available_backends, local_gpu_backends, and local_cpu_backends
      - Updated local node detection to show nodes with any available backends (GPU or CPU)
      - Ensured CPU-only nodes are correctly identified and displayed
      - Maintained backward compatibility with existing GPU-only node detection
    • Allow CPU-only cluster clients and flexible backend support · bd087af5
      Stefy Lanza (nextime / spora ) authored
      - Removed GPU-only requirement for cluster client connections
      - CPU-only clients can now join cluster and run CPU-based workers
      - Master accepts all clients regardless of GPU availability
      - Nodes are properly marked as CPU-only when no GPUs detected
      - Driver selection modal supports CUDA, ROCm, and CPU backends
      - Local and remote workers can use any available backend (GPU or CPU)
      - Enhanced cluster flexibility for mixed hardware environments
      - CPU nodes contribute to cluster for CPU-only processing tasks
      - Maintains backward compatibility with existing GPU-only workflows
      - Clear node type identification in cluster management interface
    • Enforce GPU-only cluster participation · f57a1468
      Stefy Lanza (nextime / spora ) authored
      - Cluster clients now refuse to connect without GPU capabilities (CUDA/ROCm)
      - Cluster master rejects authentication from clients without GPU backends
      - Local master node only appears in cluster nodes list if GPU backends are available
      - Master already prevented launching local worker processes without GPUs
      - Systems without GPUs cannot participate in distributed processing
      - Clear error messages when GPU requirements are not met
      - Maintains cluster integrity by ensuring all nodes contribute computational power
    • Restrict driver selection to available GPU backends only · abec9e31
      Stefy Lanza (nextime / spora ) authored
      - Removed CPU option from driver selection (only CUDA/ROCm GPU drivers)
      - Set CUDA as default driver selection when available
      - Added available_gpu_backends field to node API responses
      - Frontend dynamically populates driver options based on node's available GPUs
      - API validation rejects non-GPU driver requests
      - Cluster clients only accept CUDA/ROCm backend restart commands
      - Improved user experience by showing only relevant driver options per node
    • Enable dynamic backend switching for cluster clients with mixed GPU support · bedc1de9
      Stefy Lanza (nextime / spora ) authored
      - Added restart_workers command from master to clients for backend switching
      - Cluster clients can now restart their workers with different backends (CUDA/ROCm/CPU)
      - Added mixed GPU detection - nodes with both CUDA and ROCm show 'Mixed GPU Available' indicator
      - Clients with mixed GPUs can switch between CUDA and ROCm backends dynamically
      - Updated API endpoint to send restart commands to connected clients
      - Clients save driver preferences and restart workers immediately when changed
      - Graceful fallback to available backends if requested backend not available
      - Visual indicator for nodes capable of backend switching
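The graceful fallback could be as simple as the sketch below. The function name and the preference order (CUDA, then ROCm, then CPU) are assumptions; the commit only states that a client falls back to an available backend when the requested one is missing.

```python
def choose_backend(requested, available, order=("cuda", "rocm", "cpu")):
    """Return the requested backend if available, else the first available
    one in the (assumed) preference order."""
    if requested in available:
        return requested
    for candidate in order:
        if candidate in available:
            return candidate
    raise ValueError("no usable backend on this node")


mixed_node = choose_backend("rocm", {"cuda", "rocm"})   # request honoured
cuda_only = choose_backend("rocm", {"cuda", "cpu"})     # falls back to CUDA
cpu_only = choose_backend("cuda", {"cpu"})              # falls back to CPU
```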
    • Enable driver switching for local workers and show master weight · 6b838e4a
      Stefy Lanza (nextime / spora ) authored
      - Display actual cluster master weight instead of 'N/A' for local node
      - Implement driver switching for local workers via modal popup
      - Add switch_local_worker_backends() function to restart workers with new backends
      - Update API endpoint to handle local worker driver changes
      - Add CPU option to driver selection modal
      - Local workers can now switch between CUDA, ROCm, and CPU backends dynamically
      - Workers are terminated and restarted with new backend configuration
    • Add config file support for cluster master weight with 'auto' mode · fb7ad973
      Stefy Lanza (nextime / spora ) authored
      - Added cluster_master_weight config option (default: 'auto')
      - Implemented weight precedence: command line > config file > default 'auto'
      - 'auto' mode enables automatic weight adjustment (100->0 on first client, 0->100 when all disconnect)
      - Explicit numeric weights disable automatic adjustment
      - Updated sample config file with cluster_master_weight setting
      - Enhanced command line parsing to accept 'auto' or numeric values
      - Improved startup messages to indicate weight source and behavior
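The precedence rule (command line > config file > default 'auto') can be sketched as a small resolver. The function name and return shape are assumptions; the second element of the result records where the weight came from, in the spirit of the startup messages mentioned above.

```python
def resolve_master_weight(cli_value=None, config_value=None):
    """Return (weight, source), where weight is 'auto' or an int."""
    for source, raw in (("command line", cli_value),
                        ("config file", config_value)):
        if raw is None:
            continue  # not set at this level, try the next one
        text = str(raw).strip().lower()
        if text == "auto":
            return "auto", source
        return int(text), source
    return "auto", "default"


from_cli = resolve_master_weight(cli_value="50", config_value="auto")
from_config = resolve_master_weight(config_value="auto")
from_default = resolve_master_weight()
```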
    • Make cluster master weight auto-adjustment conditional on explicit setting · 8fedb8dc
      Stefy Lanza (nextime / spora ) authored
      - Added weight_explicit flag to track if --weight was specified on command line
      - Automatic weight changes (100->0 on first client, 0->100 on last disconnect) only apply when weight is not explicitly set
      - When --weight is specified, master maintains the explicit weight regardless of client connections
      - Updated command line help and startup messages to clarify the behavior
      - This allows administrators to override automatic weight management when needed
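The behaviour described above can be modelled with a small state holder. `MasterWeight` is a hypothetical class, not the project's implementation: without an explicit weight it drops from 100 to 0 when the first client connects and returns to 100 when the last one disconnects, and with an explicit weight it never changes.

```python
class MasterWeight:
    """Tracks the master's weight as clients connect and disconnect."""

    def __init__(self, explicit_weight=None):
        # weight_explicit mirrors the flag named in the commit message.
        self.weight_explicit = explicit_weight is not None
        self.value = explicit_weight if self.weight_explicit else 100
        self.clients = 0

    def client_connected(self):
        self.clients += 1
        if not self.weight_explicit and self.clients == 1:
            self.value = 0  # first client joined: 100 -> 0

    def client_disconnected(self):
        self.clients = max(0, self.clients - 1)
        if not self.weight_explicit and self.clients == 0:
            self.value = 100  # last client left: 0 -> 100


auto = MasterWeight()
auto.client_connected()
after_first_client = auto.value
auto.client_disconnected()
after_last_disconnect = auto.value

pinned = MasterWeight(explicit_weight=40)
pinned.client_connected()
```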
    • Refactor cluster nodes display to show nodes instead of individual workers · 711719c4
      Stefy Lanza (nextime / spora ) authored
      - Modified API to aggregate workers per node instead of showing each worker separately
      - Each cluster node now appears as a single row with summarized worker information
      - Workers column shows count and types: '2 workers - Analysis (CUDA), Training (ROCm)'
      - Local workers are grouped into a single 'Local Master Node' entry
      - Updated frontend to display worker summaries with detailed breakdown
      - Updated API documentation to reflect new response format with workers_summary field
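The per-node aggregation could be sketched as below. The `workers_summary` field name and the summary format come from the commit message; the input layout, the `hostname`/`type`/`backend` keys, and the helper name are assumptions.

```python
from collections import defaultdict


def summarize_nodes(workers):
    """Group worker rows by hostname and build one summary row per node."""
    by_node = defaultdict(list)
    for worker in workers:
        by_node[worker["hostname"]].append(worker)

    nodes = []
    for hostname, group in by_node.items():
        labels = ", ".join(f'{w["type"]} ({w["backend"]})' for w in group)
        noun = "worker" if len(group) == 1 else "workers"
        nodes.append({
            "hostname": hostname,
            "workers_summary": f"{len(group)} {noun} - {labels}",
        })
    return nodes


rows = summarize_nodes([
    {"hostname": "node1", "type": "Analysis", "backend": "CUDA"},
    {"hostname": "node1", "type": "Training", "backend": "ROCm"},
    {"hostname": "node2", "type": "Analysis", "backend": "CPU"},
])
```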
    • Add local worker processes to cluster nodes display · 27e73381
      Stefy Lanza (nextime / spora ) authored
      - Detect running local worker processes on cluster master using psutil
      - Include local workers in cluster nodes API response with distinct styling
      - Show local workers with blue background and 'Local' status indicator
      - Display backend information (CUDA/ROCm) in worker names
      - Indicate that local workers require manual restart for driver changes
      - Update API documentation with local worker response format
      - Local workers show N/A for weight since they don't participate in cluster load balancing