- 07 Oct, 2025 (40 commits)
-
Stefy Lanza (nextime / spora ) authored
- Store connected_at as a proper UTC timestamp in the database using FROM_UNIXTIME/datetime (see the sketch below)
- Update the web interface to handle datetime objects and timestamps correctly
- Ensure uptime starts from the actual connection time, not offset by the timezone
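A minimal sketch of the two storage paths named above, assuming a MySQL/MariaDB table called cluster_clients (the table name is an assumption; only the connected_at column appears in these commits):

```python
import time
from datetime import datetime, timezone

def record_connection(cursor, token: str) -> None:
    # MySQL/MariaDB path: let the server convert the Unix epoch to a DATETIME.
    # FROM_UNIXTIME uses the session time zone, so the session should run in
    # UTC for this to land as a true UTC timestamp.
    cursor.execute(
        "UPDATE cluster_clients SET connected_at = FROM_UNIXTIME(%s) WHERE token = %s",
        (int(time.time()), token),
    )

def utc_now() -> datetime:
    # datetime path: pass an explicit UTC value instead of relying on the
    # database's CURRENT_TIMESTAMP, which may be evaluated in local time.
    return datetime.now(timezone.utc)
```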
-
Stefy Lanza (nextime / spora ) authored
- Fix cluster master to properly detect GPU backends from client capabilities
- Extract available_backends from capabilities list instead of gpu_info
- Ensure clients with GPU workers are correctly identified as GPU-enabled
-
Stefy Lanza (nextime / spora ) authored
- Fix cluster client uptime calculation to start from 0 by using explicit UTC timestamps
- Fix cluster client workers not showing by populating cluster_master.clients dictionary
- Ensure connected_at uses current UTC time instead of database CURRENT_TIMESTAMP
-
Stefy Lanza (nextime / spora ) authored
- Get real worker process information from cluster master instead of placeholder data
- Display correct number of workers and their actual backends
- Improve accuracy of cluster node statistics
-
Stefy Lanza (nextime / spora ) authored
- Update connected_at timestamp on each successful connection
- Uptime now resets to 00:00:00 on reconnections as expected
-
Stefy Lanza (nextime / spora ) authored
- Extract real client IP address from websocket connection (see the sketch below)
- Preserve connected_at timestamp for accurate uptime calculation
- Send full GPU device info from client to master for proper VRAM reporting
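A small sketch of reading the peer address from the websockets connection object; remote_address is a (host, port) tuple on the connection:

```python
def client_ip(websocket) -> str:
    # websockets exposes the TCP peer as a (host, port) tuple on the connection,
    # so the real client address can be stored instead of a placeholder.
    host, _port = websocket.remote_address
    return host
```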
-
Stefy Lanza (nextime / spora ) authored
- Handle timestamp strings in sorting function (see the sketch below)
- Parse ISO timestamp strings to numeric values for proper sorting
- Prevent TypeError when sorting by last_seen timestamp
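A sketch of a tolerant sort key, assuming node records come back either as epoch numbers, datetime objects, or ISO-8601 strings:

```python
from datetime import datetime

def last_seen_key(node: dict) -> float:
    # Normalise last_seen to a float so mixed types never hit a TypeError
    # inside sorted()/list.sort().
    value = node.get("last_seen") or 0
    if isinstance(value, datetime):
        return value.timestamp()
    if isinstance(value, str):
        return datetime.fromisoformat(value).timestamp()
    return float(value)

# usage: nodes.sort(key=last_seen_key, reverse=True)
```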
-
Stefy Lanza (nextime / spora ) authored
- Add ALTER TABLE to create connected_at column in existing databases (see the sketch below)
- Handle migration gracefully for databases created before schema update
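A minimal migration sketch along these lines, assuming MySQL/MariaDB and a cluster_clients table (the table name is an assumption); a duplicate-column error is treated as "already migrated":

```python
def ensure_connected_at_column(cursor) -> None:
    try:
        cursor.execute(
            "ALTER TABLE cluster_clients ADD COLUMN connected_at DATETIME NULL"
        )
    except Exception as exc:  # exact exception class depends on the DB driver
        # Databases created after the schema update already have the column;
        # swallow only the duplicate-column error and re-raise anything else.
        if "duplicate column" not in str(exc).lower():
            raise
```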
-
Stefy Lanza (nextime / spora ) authored
- Move subprocess import to function scope to avoid UnboundLocalError (see the sketch below)
- Ensure subprocess is available for fallback GPU detection
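The underlying issue is a Python scoping rule: an import statement anywhere inside a function makes that name local for the entire function, so any path that reads the name before the import line raises UnboundLocalError even when a module-level import exists. A minimal illustration (the function names are made up):

```python
import subprocess  # module-level import alone does not help below

def detect_gpus_buggy(use_fallback: bool):
    if use_fallback:
        import subprocess  # binds `subprocess` as a local for the WHOLE function
        return subprocess.run(["nvidia-smi", "-L"], capture_output=True)
    # Non-fallback path: the local name was never bound -> UnboundLocalError
    return subprocess.run(["rocm-smi"], capture_output=True)

def detect_gpus_fixed(use_fallback: bool):
    import subprocess  # bound unconditionally at function scope
    cmd = ["nvidia-smi", "-L"] if use_fallback else ["rocm-smi"]
    return subprocess.run(cmd, capture_output=True)
```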
-
Stefy Lanza (nextime / spora ) authored
- Add connected_at timestamp to track when client first connected
- Calculate uptime from connection time instead of last seen time (see the sketch below)
- Update database schema and API to use proper uptime tracking
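A small sketch of the uptime calculation, assuming connected_at is read back as a UTC datetime:

```python
from datetime import datetime, timezone

def uptime_seconds(connected_at: datetime) -> float:
    # Compare a UTC connected_at against a UTC "now" so uptime starts at zero
    # at connection time rather than being offset by the local time zone or
    # tied to the last-seen heartbeat.
    if connected_at.tzinfo is None:
        connected_at = connected_at.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - connected_at).total_seconds()
```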
-
Stefy Lanza (nextime / spora ) authored
- Update detect_gpu_backends to collect actual VRAM for each GPU device
- Store device info including VRAM in gpu_info sent to master
- Use real VRAM data in cluster nodes API instead of hardcoded values
- Ensure consistent VRAM reporting between master and clients
-
Stefy Lanza (nextime / spora ) authored
- Use green background color to indicate connected nodes
- Remove status text column for cleaner interface
- Update colspan values for table messages
-
Stefy Lanza (nextime / spora ) authored
- Store cluster client info in database for persistence
- Update API to read connected clients from database
- Maintain compatibility with existing web interface
-
Stefy Lanza (nextime / spora ) authored
- Use _get_client_by_websocket instead of non-existent _get_client_by_socket
- Fixes client connection error during process registration
-
Stefy Lanza (nextime / spora ) authored
- Client connects to wss://host:port instead of wss://host:port/cluster
- Fixes connection loop issue
-
Stefy Lanza (nextime / spora ) authored
- Client now attempts to reconnect if connection is lost (see the sketch below)
- Prevents processes from being restarted on reconnection
- Maintains persistent cluster node operation
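A minimal reconnect-loop sketch; authenticate() and handle_messages() are hypothetical helpers, and worker processes are assumed to be started once before the loop so a dropped connection never restarts them:

```python
import asyncio
import websockets

async def run_client(uri: str) -> None:
    # Workers were already launched; only the websocket link is re-established.
    while True:
        try:
            async with websockets.connect(uri) as ws:
                await authenticate(ws)       # hypothetical helper
                await handle_messages(ws)    # hypothetical helper
        except (websockets.ConnectionClosed, OSError):
            await asyncio.sleep(5)           # back off, then try again
```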
-
Stefy Lanza (nextime / spora ) authored
- Correct the dict comprehension for registering processes with master
- Fix duplicate entries and incorrect model assignment
- Apply same fix to restart workers function
-
Stefy Lanza (nextime / spora ) authored
- Workers need a local backend to connect to even in client mode
- Add backend startup and readiness check for cluster clients
- Ensure proper cleanup on exit
-
Stefy Lanza (nextime / spora ) authored
- Ignore the generated cluster.crt and cluster.key certificates
- Remove committed certificates from the repository
-
Stefy Lanza (nextime / spora ) authored
- Remove 'path' parameter from _handle_client method (see the sketch below)
- Compatible with websockets 12+, which removed the path argument
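A sketch of the single-argument handler shape; the dispatcher is a hypothetical helper, and how (or whether) the request path is still exposed depends on the websockets version in use:

```python
class ClusterMaster:
    async def _handle_client(self, websocket):
        # Newer websockets releases invoke the handler with only the connection
        # object; the old second `path` parameter is no longer passed.
        async for message in websocket:
            await self._handle_message(websocket, message)  # hypothetical dispatcher
```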
-
Stefy Lanza (nextime / spora ) authored
- Modify ClusterMaster to accept host parameter
- Start cluster master in vidai.py when running as master
- Use --cluster-host and --cluster-port for websocket server binding (see the sketch below)
- Default to 0.0.0.0:5003 for cluster master
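A sketch of the binding and CLI flags described above, assuming an asyncio-based ClusterMaster whose constructor takes host and port (the constructor shape and method names are assumptions):

```python
import argparse
import asyncio
import websockets

class ClusterMaster:
    def __init__(self, host: str = "0.0.0.0", port: int = 5003):
        self.host, self.port = host, port

    async def _handle_client(self, websocket):
        await websocket.wait_closed()  # cluster protocol handling elided here

    async def serve_forever(self) -> None:
        # Bind the cluster websocket server to the configured address.
        async with websockets.serve(self._handle_client, self.host, self.port):
            await asyncio.Future()  # run until cancelled

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster-host", default="0.0.0.0")
    parser.add_argument("--cluster-port", type=int, default=5003)
    args = parser.parse_args()
    asyncio.run(ClusterMaster(args.cluster_host, args.cluster_port).serve_forever())

if __name__ == "__main__":
    main()
```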
-
Stefy Lanza (nextime / spora ) authored
- Rename 'flash' variable to 'flash_enabled' to avoid shadowing the flash() function (see the sketch below)
- Resolve TypeError when saving admin configuration
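A minimal illustration of the shadowing bug in a Flask view; the route and form field names are assumptions:

```python
from flask import Flask, flash, redirect, request

app = Flask(__name__)
app.secret_key = "change-me"  # required for flash()

@app.route("/admin/config", methods=["POST"])  # route path is an assumption
def admin_config():
    # Bug: writing `flash = request.form.get("flash") == "on"` rebinds the name,
    # so the flash(...) call below fails with a TypeError (a bool is not callable).
    flash_enabled = request.form.get("flash") == "on"  # fix: use a distinct name
    # ... persist flash_enabled with the rest of the configuration ...
    flash("Configuration saved", "success")
    return redirect("/admin")  # hypothetical admin index
```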
-
Stefy Lanza (nextime / spora ) authored
- Remove analysis_backend and training_backend fields from /admin/config
- These are now configured per worker in the cluster nodes interface
- Clean up unused imports and form processing
-
Stefy Lanza (nextime / spora ) authored
- Add missing set_* function imports to admin.py config route
- Resolve NameError when saving admin configuration
-
Stefy Lanza (nextime / spora ) authored
- Change container max-width to 95% for better use of screen space
- Maintain centered layout for the cluster nodes table
-
Stefy Lanza (nextime / spora ) authored
- Show individual select forms for every worker in the driver modal
- Update API to handle per-worker driver selection for local nodes
- Maintain compatibility with existing backend switching logic
-
Stefy Lanza (nextime / spora ) authored
- Add --config <file> argument to load config from custom path
- Modify config loader to use custom config file if specified
- Fix cluster nodes interface to only show available GPU backends for workers
- Differentiate between local and remote node driver selection
-
Stefy Lanza (nextime / spora ) authored
- Removed the 'Settings' link from the admin navigation menu
- Settings page route and template still exist but are no longer accessible from navbar
- Admin navbar now shows: Cluster Tokens, Cluster Nodes (no Settings)
-
Stefy Lanza (nextime / spora ) authored
- Added workers array to local node API response for modal population
- Fixed table colspan values to match 13 columns
- Removed debug console.log statements
- Modal should now open and show worker driver selection options
-
Stefy Lanza (nextime / spora ) authored
- Fixed JavaScript template literal issue preventing button clicks from working
- Changed from inline onclick with template variables to data attributes + event delegation
- Added event listener for .set-driver-btn class buttons
- Buttons now properly read hostname and token from data attributes
- Modal should now open when clicking Set Driver buttons
-
Stefy Lanza (nextime / spora ) authored
- Removed brand-specific filtering that only allowed NVIDIA GPUs
- Now detects any GPU that can actually perform CUDA or ROCm operations
- Functional test determines if a GPU should be included, not its brand
- GPUs are shown with correct system indices (Device 0, 1, etc.)
- AMD GPUs that support ROCm will be shown if functional
- CUDA GPUs from any vendor will be shown if functional
-
Stefy Lanza (nextime / spora ) authored
- Updated GPU VRAM detection to use torch.cuda.get_device_properties(i).total_memory / 1024**3 (see the sketch below)
- Same method as used in /api/stats endpoint for consistency
- Still filters out non-NVIDIA and non-functional GPUs
- Now shows correct VRAM amounts (e.g., 24GB for RTX 3090 instead of hardcoded 8GB)
- Fixed both worker-level and node-level GPU detection
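A sketch of the VRAM calculation named in the commit; total_memory is reported in bytes, so dividing by 1024**3 yields GiB:

```python
import torch

def total_vram_gb() -> float:
    # Same per-device property the commit cites for the /api/stats endpoint.
    if not torch.cuda.is_available():
        return 0.0
    return sum(
        torch.cuda.get_device_properties(i).total_memory / 1024**3
        for i in range(torch.cuda.device_count())
    )
```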
-
Stefy Lanza (nextime / spora ) authored
- Added debug output to see what CUDA device names are detected
- Will help identify why the AMD GPU is still being counted as a CUDA device
- Debug output shows device names and functional test results
- User can now see which devices PyTorch is detecting
-
Stefy Lanza (nextime / spora ) authored
- Modified detect_gpu_backends() to perform functional tests on GPUs (see the sketch below)
- CUDA detection now verifies devices can actually perform tensor operations
- ROCm detection now tests device functionality before counting
- Only NVIDIA GPUs are counted for CUDA, and only functional devices
- Prevents counting of non-working GPUs like old AMD cards misreported as CUDA
- Example: System with old AMD GPU (device 0) + working CUDA GPU (device 1) now correctly shows only the functional CUDA GPU
- Total VRAM calculation now reflects only actually usable GPUs
- Both PyTorch and nvidia-smi/rocm-smi detection paths updated
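A sketch of a functional-test pass over the visible CUDA devices, assuming a small tensor operation is enough to separate working GPUs from misreported ones:

```python
import torch

def functional_cuda_devices() -> list[int]:
    """Indices of CUDA devices that can actually execute a tensor operation."""
    working: list[int] = []
    if not torch.cuda.is_available():
        return working
    for i in range(torch.cuda.device_count()):
        try:
            t = torch.ones(8, device=f"cuda:{i}")
            torch.cuda.synchronize(i)
            if float((t * 2).sum()) == 16.0:
                working.append(i)
        except RuntimeError:
            # The device enumerates but cannot run kernels (e.g. an old card
            # misreported as CUDA-capable); leave it out of the count.
            continue
    return working
```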
-
Stefy Lanza (nextime / spora ) authored
- Modified local node GPU memory calculation to only count GPUs that are actually available for supported backends
- Previously counted all GPUs in system, now only counts CUDA GPUs if CUDA is available and ROCm GPUs if ROCm is available
- Fixes issue where unsupported GPUs (like old AMD GPUs without ROCm support) were incorrectly included in VRAM totals
- Example: System with old AMD GPU (8GB, no ROCm) and CUDA GPU (24GB) now correctly shows 24GB total instead of 32GB
- Ensures accurate GPU resource reporting in cluster nodes interface
-
Stefy Lanza (nextime / spora ) authored
- Modified modal to show individual GPU-requiring workers on each node
- Allow granular driver selection (CUDA/ROCm/CPU) for each worker subprocess
- Updated database schema to store driver preferences per worker (hostname + token + worker_name) (see the sketch below)
- Enhanced API to handle per-worker driver setting with form field parsing
- Added restart_client_worker method to cluster master for individual worker restarts
- Frontend now displays worker-specific driver selection controls in modal
- Maintains node-level table view while providing worker-level configuration
- Supports CPU-only nodes and mixed GPU/CPU worker configurations
- Backward compatible with existing single-driver preference system
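A minimal sketch of a per-worker preference table keyed by hostname + token + worker_name; the table name, column names, and MySQL/MariaDB dialect are assumptions:

```python
def ensure_worker_driver_table(cursor) -> None:
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS worker_driver_prefs (
            hostname    VARCHAR(255) NOT NULL,
            token       VARCHAR(255) NOT NULL,
            worker_name VARCHAR(255) NOT NULL,
            driver      VARCHAR(16)  NOT NULL,  -- 'cuda', 'rocm' or 'cpu'
            PRIMARY KEY (hostname, token, worker_name)
        )
        """
    )
```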
-
Stefy Lanza (nextime / spora ) authored
- Fixed undefined variable 'local_gpu_backends' in api_cluster_nodes function
- Properly defined local_available_backends, local_gpu_backends, and local_cpu_backends
- Updated local node detection to show nodes with any available backends (GPU or CPU)
- Ensured CPU-only nodes are correctly identified and displayed
- Maintained backward compatibility with existing GPU-only node detection
-
Stefy Lanza (nextime / spora ) authored
- Removed GPU-only requirement for cluster client connections
- CPU-only clients can now join the cluster and run CPU-based workers
- Master accepts all clients regardless of GPU availability
- Nodes are properly marked as CPU-only when no GPUs are detected
- Driver selection modal supports CUDA, ROCm, and CPU backends
- Local and remote workers can use any available backend (GPU or CPU)
- Enhanced cluster flexibility for mixed hardware environments
- CPU nodes contribute to the cluster for CPU-only processing tasks
- Maintains backward compatibility with existing GPU-only workflows
- Clear node type identification in the cluster management interface
-
Stefy Lanza (nextime / spora ) authored
- Cluster clients now refuse to connect without GPU capabilities (CUDA/ROCm)
- Cluster master rejects authentication from clients without GPU backends
- Local master node only appears in the cluster nodes list if GPU backends are available
- Master already prevented launching local worker processes without GPUs
- Systems without GPUs cannot participate in distributed processing
- Clear error messages when GPU requirements are not met
- Maintains cluster integrity by ensuring all nodes contribute computational power
-
Stefy Lanza (nextime / spora ) authored
- Removed CPU option from driver selection (only CUDA/ROCm GPU drivers)
- Set CUDA as default driver selection when available
- Added available_gpu_backends field to node API responses
- Frontend dynamically populates driver options based on node's available GPUs
- API validation rejects non-GPU driver requests (see the sketch below)
- Cluster clients only accept CUDA/ROCm backend restart commands
- Improved user experience by showing only relevant driver options per node
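A minimal sketch of the validation as it stood in this commit (later commits above relax it to allow CPU); the route path, helpers, and field names are assumptions:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/cluster/set_driver", methods=["POST"])  # route path is an assumption
def api_set_driver():
    hostname = request.form.get("hostname", "")
    token = request.form.get("token", "")
    driver = request.form.get("driver", "").lower()
    node = lookup_node(hostname, token)  # hypothetical helper returning the node record
    # Only GPU drivers are accepted, and only those the node actually reports.
    if driver not in ("cuda", "rocm"):
        return jsonify({"error": "only CUDA or ROCm drivers are allowed"}), 400
    if driver not in node.get("available_gpu_backends", []):
        return jsonify({"error": f"{driver} is not available on {hostname}"}), 400
    request_worker_restart(hostname, token, driver)  # hypothetical helper
    return jsonify({"status": "ok"})
```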
-