- 07 Oct, 2025 33 commits
-
-
Stefy Lanza (nextime / spora ) authored
- Remove analysis_backend and training_backend fields from /admin/config
- These are now configured per worker in the cluster nodes interface
- Clean up unused imports and form processing
-
Stefy Lanza (nextime / spora ) authored
- Add missing set_* function imports to admin.py config route
- Resolve NameError when saving admin configuration
-
Stefy Lanza (nextime / spora ) authored
- Change container max-width to 95% for better use of screen space
- Maintain centered layout for the cluster nodes table
-
Stefy Lanza (nextime / spora ) authored
- Show individual select forms for every worker in the driver modal
- Update API to handle per-worker driver selection for local nodes
- Maintain compatibility with existing backend switching logic
-
Stefy Lanza (nextime / spora ) authored
- Add --config <file> argument to load config from custom path
- Modify config loader to use custom config file if specified
- Fix cluster nodes interface to only show available GPU backends for workers
- Differentiate between local and remote node driver selection
-
Stefy Lanza (nextime / spora ) authored
- Removed the 'Settings' link from the admin navigation menu
- Settings page route and template still exist but are no longer accessible from the navbar
- Admin navbar now shows: Cluster Tokens, Cluster Nodes (no Settings)
-
Stefy Lanza (nextime / spora ) authored
- Added workers array to local node API response for modal population
- Fixed table colspan values to match 13 columns
- Removed debug console.log statements
- Modal should now open and show worker driver selection options
-
Stefy Lanza (nextime / spora ) authored
- Fixed JavaScript template literal issue preventing button clicks from working
- Changed from inline onclick with template variables to data attributes + event delegation
- Added event listener for .set-driver-btn class buttons
- Buttons now properly read hostname and token from data attributes
- Modal should now open when clicking Set Driver buttons
-
Stefy Lanza (nextime / spora ) authored
- Removed brand-specific filtering that only allowed NVIDIA GPUs
- Now detects any GPU that can actually perform CUDA or ROCm operations
- Functional test determines if GPU should be included, not brand
- GPUs are shown with correct system indices (Device 0, 1, etc.)
- AMD GPUs that support ROCm will be shown if functional
- CUDA GPUs from any vendor will be shown if functional
-
Stefy Lanza (nextime / spora ) authored
- Updated GPU VRAM detection to use torch.cuda.get_device_properties(i).total_memory / 1024**3
- Same method as used in /api/stats endpoint for consistency
- Still filters out non-NVIDIA and non-functional GPUs
- Now shows correct VRAM amounts (e.g., 24GB for RTX 3090 instead of hardcoded 8GB)
- Fixed both worker-level and node-level GPU detection
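
The conversion this commit standardizes on can be sketched as follows. The helper below isolates the bytes-to-GiB arithmetic; the commented usage with `torch.cuda.get_device_properties` is illustrative and assumes PyTorch is installed.

```python
def bytes_to_gib(total_memory: int) -> float:
    """Convert a raw byte count (as reported by
    torch.cuda.get_device_properties(i).total_memory) to GiB."""
    return total_memory / 1024**3

# Hypothetical usage with PyTorch (torch not imported here):
#   vram = [bytes_to_gib(torch.cuda.get_device_properties(i).total_memory)
#           for i in range(torch.cuda.device_count())]
```

For an RTX 3090 reporting 24 GiB, `total_memory` is 24 * 1024**3 bytes, so the helper returns 24.0 rather than a hardcoded value.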
-
Stefy Lanza (nextime / spora ) authored
- Added debug output to see what CUDA device names are detected
- Will help identify why AMD GPU is still being counted as CUDA device
- Debug output shows device names and functional test results
- User can now see what devices PyTorch is detecting
-
Stefy Lanza (nextime / spora ) authored
- Modified detect_gpu_backends() to perform functional tests on GPUs
- CUDA detection now verifies devices can actually perform tensor operations
- ROCm detection now tests device functionality before counting
- Only NVIDIA GPUs are counted for CUDA, and only functional devices
- Prevents counting of non-working GPUs like old AMD cards misreported as CUDA
- Example: System with old AMD GPU (device 0) + working CUDA GPU (device 1) now correctly shows only the functional CUDA GPU
- Total VRAM calculation now reflects only actually usable GPUs
- Both PyTorch and nvidia-smi/rocm-smi detection paths updated
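
The functional-test idea can be sketched generically: enumerate reported devices, run a small real operation on each, and keep only the indices that survive. The `probe` callable and function name are illustrative, not the actual `detect_gpu_backends()` implementation.

```python
from typing import Callable, List

def functional_devices(count: int, probe: Callable[[int], None]) -> List[int]:
    """Return indices of devices that survive a real operation.

    `probe` should run a small tensor op on the given device index and
    raise on failure, e.g. (with PyTorch, not imported here):
        lambda i: torch.ones(1, device=f"cuda:{i}").sum().item()
    """
    working = []
    for i in range(count):
        try:
            probe(i)
            working.append(i)
        except Exception:
            # Device is reported by the driver but cannot execute work
            # (e.g. an old AMD card misreported as a CUDA device).
            continue
    return working
```

With the example from the commit, device 0 (broken AMD) raises during the probe and device 1 (working CUDA GPU) passes, so only index 1 is counted.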
-
Stefy Lanza (nextime / spora ) authored
- Modified local node GPU memory calculation to only count GPUs that are actually available for supported backends
- Previously counted all GPUs in system, now only counts CUDA GPUs if CUDA is available and ROCm GPUs if ROCm is available
- Fixes issue where unsupported GPUs (like old AMD GPUs without ROCm support) were incorrectly included in VRAM totals
- Example: System with old AMD GPU (8GB, no ROCm) and CUDA GPU (24GB) now correctly shows 24GB total instead of 32GB
- Ensures accurate GPU resource reporting in cluster nodes interface
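
The aggregation rule can be sketched in a few lines. The `(backend, vram_gb)` pair shape and the function name are assumptions for illustration, not the project's actual data model.

```python
def usable_vram_gb(gpus, available_backends):
    """Sum VRAM only for GPUs whose backend is actually available.

    `gpus` is a list of (backend, vram_gb) pairs; `available_backends`
    is the set of backends detected on the node, e.g. {'cuda'}.
    """
    return sum(vram for backend, vram in gpus if backend in available_backends)
```

For the commit's example (8GB AMD GPU without ROCm support plus a 24GB CUDA GPU, with only CUDA available), the total is 24 rather than 32.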
-
Stefy Lanza (nextime / spora ) authored
- Modified modal to show individual GPU-requiring workers on each node
- Allow granular driver selection (CUDA/ROCm/CPU) for each worker subprocess
- Updated database schema to store driver preferences per worker (hostname + token + worker_name)
- Enhanced API to handle per-worker driver setting with form field parsing
- Added restart_client_worker method to cluster master for individual worker restarts
- Frontend now displays worker-specific driver selection controls in modal
- Maintains node-level table view while providing worker-level configuration
- Supports CPU-only nodes and mixed GPU/CPU worker configurations
- Backward compatible with existing single-driver preference system
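
A per-worker preference table keyed by (hostname, token, worker_name) could look like the sketch below. The table and column names are assumptions for illustration, not the project's actual migration; an upsert lets a re-selected driver overwrite the previous choice.

```python
import sqlite3

# In-memory sketch of a per-worker driver-preference table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE worker_driver_prefs (
        hostname    TEXT NOT NULL,
        token       TEXT NOT NULL,
        worker_name TEXT NOT NULL,
        driver      TEXT NOT NULL CHECK (driver IN ('cuda', 'rocm', 'cpu')),
        PRIMARY KEY (hostname, token, worker_name)
    )
""")

def set_driver(hostname, token, worker_name, driver):
    # Upsert keyed on the composite primary key.
    conn.execute(
        "INSERT INTO worker_driver_prefs VALUES (?, ?, ?, ?) "
        "ON CONFLICT(hostname, token, worker_name) "
        "DO UPDATE SET driver = excluded.driver",
        (hostname, token, worker_name, driver),
    )
```

The composite key is what makes the scheme backward compatible: a node-level preference is just the degenerate case where every worker on a (hostname, token) pair shares one driver.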
-
Stefy Lanza (nextime / spora ) authored
- Fixed undefined variable 'local_gpu_backends' in api_cluster_nodes function
- Properly defined local_available_backends, local_gpu_backends, and local_cpu_backends
- Updated local node detection to show nodes with any available backends (GPU or CPU)
- Ensured CPU-only nodes are correctly identified and displayed
- Maintained backward compatibility with existing GPU-only node detection
-
Stefy Lanza (nextime / spora ) authored
- Removed GPU-only requirement for cluster client connections
- CPU-only clients can now join cluster and run CPU-based workers
- Master accepts all clients regardless of GPU availability
- Nodes are properly marked as CPU-only when no GPUs detected
- Driver selection modal supports CUDA, ROCm, and CPU backends
- Local and remote workers can use any available backend (GPU or CPU)
- Enhanced cluster flexibility for mixed hardware environments
- CPU nodes contribute to cluster for CPU-only processing tasks
- Maintains backward compatibility with existing GPU-only workflows
- Clear node type identification in cluster management interface
-
Stefy Lanza (nextime / spora ) authored
- Cluster clients now refuse to connect without GPU capabilities (CUDA/ROCm)
- Cluster master rejects authentication from clients without GPU backends
- Local master node only appears in cluster nodes list if GPU backends are available
- Master already prevented launching local worker processes without GPUs
- Systems without GPUs cannot participate in distributed processing
- Clear error messages when GPU requirements are not met
- Maintains cluster integrity by ensuring all nodes contribute computational power
-
Stefy Lanza (nextime / spora ) authored
- Removed CPU option from driver selection (only CUDA/ROCm GPU drivers)
- Set CUDA as default driver selection when available
- Added available_gpu_backends field to node API responses
- Frontend dynamically populates driver options based on node's available GPUs
- API validation rejects non-GPU driver requests
- Cluster clients only accept CUDA/ROCm backend restart commands
- Improved user experience by showing only relevant driver options per node
-
Stefy Lanza (nextime / spora ) authored
- Added restart_workers command from master to clients for backend switching
- Cluster clients can now restart their workers with different backends (CUDA/ROCm/CPU)
- Added mixed GPU detection - nodes with both CUDA and ROCm show 'Mixed GPU Available' indicator
- Clients with mixed GPUs can switch between CUDA and ROCm backends dynamically
- Updated API endpoint to send restart commands to connected clients
- Clients save driver preferences and restart workers immediately when changed
- Graceful fallback to available backends if requested backend not available
- Visual indicator for nodes capable of backend switching
-
Stefy Lanza (nextime / spora ) authored
- Display actual cluster master weight instead of 'N/A' for local node
- Implement driver switching for local workers via modal popup
- Add switch_local_worker_backends() function to restart workers with new backends
- Update API endpoint to handle local worker driver changes
- Add CPU option to driver selection modal
- Local workers can now switch between CUDA, ROCm, and CPU backends dynamically
- Workers are terminated and restarted with new backend configuration
-
Stefy Lanza (nextime / spora ) authored
- Added cluster_master_weight config option (default: 'auto')
- Implemented weight precedence: command line > config file > default 'auto'
- 'auto' mode enables automatic weight adjustment (100->0 on first client, 0->100 when all disconnect)
- Explicit numeric weights disable automatic adjustment
- Updated sample config file with cluster_master_weight setting
- Enhanced command line parsing to accept 'auto' or numeric values
- Improved startup messages to indicate weight source and behavior
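
The precedence chain (command line > config file > default 'auto') can be sketched as a small resolver. Function and parameter names are illustrative, not the project's actual API.

```python
def resolve_master_weight(cli_value=None, config_value=None, default="auto"):
    """Resolve cluster_master_weight with precedence:
    command line > config file > default 'auto'.

    Returns (weight, auto_mode): a numeric weight with auto_mode=False,
    or (None, True) when automatic weight management is in effect.
    """
    raw = cli_value if cli_value is not None else (
        config_value if config_value is not None else default)
    if raw == "auto":
        return None, True
    return int(raw), False
```

A config file setting of "50" yields an explicit weight of 50 and disables automatic adjustment; passing a command-line value overrides the config file regardless.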
-
Stefy Lanza (nextime / spora ) authored
- Added weight_explicit flag to track if --weight was specified on command line
- Automatic weight changes (100->0 on first client, 0->100 on last disconnect) only apply when weight is not explicitly set
- When --weight is specified, master maintains the explicit weight regardless of client connections
- Updated command line help and startup messages to clarify the behavior
- This allows administrators to override automatic weight management when needed
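
The connect/disconnect behaviour described above can be sketched as a small state holder. The class and method names are assumptions for illustration, not the master's real interface.

```python
class MasterWeight:
    """Sketch of automatic master weight: 100 -> 0 when the first client
    connects, 0 -> 100 when the last one disconnects, unless --weight
    set the value explicitly."""

    def __init__(self, weight=100, explicit=False):
        self.weight = weight
        self.explicit = explicit
        self.clients = 0

    def client_connected(self):
        self.clients += 1
        if not self.explicit and self.clients == 1:
            self.weight = 0  # hand work off to the newly joined clients

    def client_disconnected(self):
        self.clients = max(0, self.clients - 1)
        if not self.explicit and self.clients == 0:
            self.weight = 100  # no clients left, resume local processing
```

With `explicit=True` the weight is simply never touched, matching the "--weight overrides automatic management" rule.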
-
Stefy Lanza (nextime / spora ) authored
- Modified API to aggregate workers per node instead of showing each worker separately
- Each cluster node now appears as a single row with summarized worker information
- Workers column shows count and types: '2 workers - Analysis (CUDA), Training (ROCm)'
- Local workers are grouped into a single 'Local Master Node' entry
- Updated frontend to display worker summaries with detailed breakdown
- Updated API documentation to reflect new response format with workers_summary field
-
Stefy Lanza (nextime / spora ) authored
- Detect running local worker processes on cluster master using psutil
- Include local workers in cluster nodes API response with distinct styling
- Show local workers with blue background and 'Local' status indicator
- Display backend information (CUDA/ROCm) in worker names
- Indicate that local workers require manual restart for driver changes
- Update API documentation with local worker response format
- Local workers show N/A for weight since they don't participate in cluster load balancing
-
Stefy Lanza (nextime / spora ) authored
- Add weight column to cluster nodes table showing load balancing weight
- Set default weights: master=0, clients=100
- Update API response to include client weight
- Update frontend to display weight information
- Update API documentation with weight field
-
Stefy Lanza (nextime / spora ) authored
- Add --shared-dir argument to cluster_master.py and cluster_client.py
- Implement shared directory file transfer for model files
- Falls back to websocket transfer if shared directory unavailable
- Update cluster client to handle model_shared_file messages
- Add documentation for shared directory feature in architecture.md
- Maintain backward compatibility with existing websocket transfers
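
The shared-directory-with-fallback logic can be sketched as a single chooser. Function and parameter names here are assumptions for illustration; the real master/client code handles this over its message protocol.

```python
import os
import shutil

def transfer_model(model_path, shared_dir=None, send_via_websocket=None):
    """Prefer copying into the shared directory; fall back to streaming
    over the websocket when the directory is missing or not configured.

    Returns a (method, result) pair so the caller can tell the client
    whether to expect a model_shared_file message or a raw transfer.
    """
    if shared_dir and os.path.isdir(shared_dir):
        dest = os.path.join(shared_dir, os.path.basename(model_path))
        shutil.copy2(model_path, dest)
        return ("shared_file", dest)
    # Shared directory unavailable: stream the file over the websocket.
    return ("websocket", send_via_websocket(model_path))
```

The fallback keeps older deployments working unchanged: if --shared-dir is never passed, every transfer takes the websocket path exactly as before.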
-
Stefy Lanza (nextime / spora ) authored
- Add uptime calculation for cluster nodes and master
- Include active/completed job counts per node and totals for master
- Display cluster master statistics before the nodes list
- Update API response format with master_stats and node-level metrics
- Add uptime formatting and job statistics to frontend
- Update API documentation with new response structure
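
The uptime formatting could look like the sketch below; the exact display format used by the frontend is an assumption.

```python
def format_uptime(seconds: int) -> str:
    """Render an uptime in seconds as 'Xd Xh Xm', dropping leading
    zero units (sketch; the real frontend format may differ)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    if days:
        return f"{days}d {hours}h {minutes}m"
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"
```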
-
Stefy Lanza (nextime / spora ) authored
- Add hostname passing from cluster client to master
- Create client_driver_preferences database table for storing driver preferences
- Add /admin/cluster_nodes page with auto-updating node list
- Add API endpoints for fetching nodes and setting driver preferences
- Update admin navbar and API documentation
- Apply database migrations
-
Stefy Lanza (nextime / spora ) authored
Implement secure websockets for cluster master and client with auto-generated self-signed certificates
-
Stefy Lanza (nextime / spora ) authored
Show all defaults in /admin/config if not set in the database, hide configs set by config file/CLI/env, add Redis config
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
Modify /admin/config to show only database-set configurations, excluding database and network configs
-
Stefy Lanza (nextime / spora ) authored
-
- 06 Oct, 2025 7 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Modified analysis page to only show stats sidebar for admin users
- Enhanced /api/stats endpoint to include cluster information for admins
- Added GPU backend detection summary to stats
- Updated JavaScript to display comprehensive system and cluster stats
- Stats now show local resource usage and cluster status for administrators

Note: Full job-specific worker stats (showing resources from the machine executing each specific job) would require additional development to track job-to-worker mappings and implement worker resource reporting.
-
Stefy Lanza (nextime / spora ) authored
- Added --weight parameter to client connections (default: 100)
- Modified cluster master to prioritize GPU-enabled clients for job distribution
- GPU clients always get precedence over CPU-only clients
- When no GPU workers have required model, GPU clients still preferred for model distribution
- Client weights are combined with process weights for load balancing
- Higher weight = more jobs assigned to that client

Job distribution priority:
1. GPU clients with required model already loaded
2. CPU clients with required model already loaded
3. GPU clients (model will be sent)
4. CPU clients (model will be sent)

Within each category, clients are selected based on combined weight.
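
The four-tier priority plus weight tie-breaking can be expressed as a sort key. The `client` dict shape (`has_gpu`, `models`, `weight`) is an assumed simplification of the master's real client records.

```python
def client_sort_key(client, required_model):
    """Order candidate clients for a job, best first, when used with
    sorted(): category ascending, combined weight descending.

    Priority: (1) GPU with model loaded, (2) CPU with model loaded,
    (3) GPU without model, (4) CPU without model.
    """
    has_model = required_model in client["models"]
    if client["has_gpu"] and has_model:
        category = 0
    elif has_model:
        category = 1
    elif client["has_gpu"]:
        category = 2
    else:
        category = 3
    # Negate weight so higher combined weight sorts first within a tier.
    return (category, -client["weight"])
```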
-
Stefy Lanza (nextime / spora ) authored
- Added --no-gpu command line flag
- When --no-gpu is specified or no GPUs are detected, local worker processes are not started
- This allows running vidai as a cluster master without local GPU processing
- Useful for dedicated cluster master nodes that only manage remote clients
-
Stefy Lanza (nextime / spora ) authored
- Add GPU detection utility functions in compat.py
- Modify vidai.py to detect GPUs at startup and configure backends
- Update cluster_client.py to detect GPUs and send capabilities to master
- Modify cluster_master.py to handle client capabilities and model distribution
- Update config.html template to dynamically show/hide backend options
- Update web.py config route to handle dynamic backend availability
- Add model file transfer functionality between master and clients
- Update worker processes to handle model downloads from master
- Test GPU detection and configuration
- Update API documentation for new capabilities

Features implemented:
- Automatic detection of NVIDIA CUDA and AMD ROCm GPUs
- Dynamic configuration of analysis/training backends based on available hardware
- Cluster clients report GPU capabilities to master
- Model distribution from master to clients when needed
- Admin config page hides unavailable backend options
- Updated API documentation reflecting new GPU detection capabilities
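
One cheap detection path is probing for the vendor CLIs on PATH; a rough sketch follows. This is only the tooling-based half; the actual detect_gpu_backends() in compat.py also performs functional device tests described in later commits.

```python
import shutil

def detect_gpu_backends():
    """Detect candidate GPU backends by checking for the vendor
    management CLIs on PATH (sketch, not the full compat.py logic)."""
    return {
        "cuda": shutil.which("nvidia-smi") is not None,
        "rocm": shutil.which("rocm-smi") is not None,
    }
```

A master or client can then advertise only the backends whose value is True, which is what drives the dynamic show/hide of backend options in the admin config page.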
-