1. 07 Oct, 2025 33 commits
    • Remove backend selection from admin config AI Settings · 2c7ae960
      Stefy Lanza (nextime / spora ) authored
      - Remove analysis_backend and training_backend fields from /admin/config
      - These are now configured per worker in the cluster nodes interface
      - Clean up unused imports and form processing
    • Fix missing imports in admin config page · 02b5e4e9
      Stefy Lanza (nextime / spora ) authored
      - Add missing set_* function imports to admin.py config route
      - Resolve NameError when saving admin configuration
    • Enlarge cluster nodes page container to 95% width · a55af666
      Stefy Lanza (nextime / spora ) authored
      - Change container max-width to 95% for better use of screen space
      - Maintain centered layout for the cluster nodes table
    • Fix cluster nodes modal to show per-worker driver selects · 044c2cf2
      Stefy Lanza (nextime / spora ) authored
      - Show individual select forms for every worker in the driver modal
      - Update API to handle per-worker driver selection for local nodes
      - Maintain compatibility with existing backend switching logic
    • Add --config CLI argument and fix cluster nodes driver selection · a7d2d90e
      Stefy Lanza (nextime / spora ) authored
      - Add --config <file> argument to load config from custom path
      - Modify config loader to use custom config file if specified
      - Fix cluster nodes interface to only show available GPU backends for workers
      - Differentiate between local and remote node driver selection
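      A minimal sketch of how a --config <file> override could be resolved before the loader runs. The default path and the function name resolve_config_path are assumptions for illustration; the log does not state the real default location or loader API.

```python
import argparse
from pathlib import Path

# Hypothetical default location; the real default path is not given in the log.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "vidai" / "vidai.conf"


def resolve_config_path(argv=None):
    """Return the path given with --config, or the default path if it is absent."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config", metavar="FILE", default=None)
    # parse_known_args leaves the application's other options untouched.
    args, _unknown = parser.parse_known_args(argv)
    return Path(args.config) if args.config else DEFAULT_CONFIG_PATH
```

      The loader would then read whichever path this returns, so a custom file fully replaces the default one rather than being merged with it (again an assumption about the intended behavior).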
    • Remove Settings page link from admin navbar · 63965769
      Stefy Lanza (nextime / spora ) authored
      - Removed the 'Settings' link from the admin navigation menu
      - Settings page route and template still exist but are no longer accessible from navbar
      - Admin navbar now shows: Cluster Tokens, Cluster Nodes (no Settings)
    • Fix Set Driver modal functionality · 1e53bfb9
      Stefy Lanza (nextime / spora ) authored
      - Added workers array to local node API response for modal population
      - Fixed table colspan values to match 13 columns
      - Removed debug console.log statements
      - Modal should now open and show worker driver selection options
    • Fix Set Driver button click handler · 314a1125
      Stefy Lanza (nextime / spora ) authored
      - Fixed JavaScript template literal issue preventing button clicks from working
      - Changed from inline onclick with template variables to data attributes + event delegation
      - Added event listener for .set-driver-btn class buttons
      - Buttons now properly read hostname and token from data attributes
      - Modal should now open when clicking Set Driver buttons
    • Remove NVIDIA-only GPU filtering, detect all working CUDA/ROCm GPUs · 13ffc88e
      Stefy Lanza (nextime / spora ) authored
      - Removed brand-specific filtering that only allowed NVIDIA GPUs
      - Now detects any GPU that can actually perform CUDA or ROCm operations
      - Functional test determines if GPU should be included, not brand
      - GPUs are shown with correct system indices (Device 0, 1, etc.)
      - AMD GPUs that support ROCm will be shown if functional
      - CUDA GPUs from any vendor will be shown if functional
    • Fix GPU VRAM detection to use correct method from /api/stats · efbb77ce
      Stefy Lanza (nextime / spora ) authored
      - Updated GPU VRAM detection to use torch.cuda.get_device_properties(i).total_memory / 1024**3
      - Same method as used in /api/stats endpoint for consistency
      - Still filters out non-NVIDIA and non-functional GPUs
      - Now shows correct VRAM amounts (e.g., 24GB for RTX 3090 instead of hardcoded 8GB)
      - Fixed both worker-level and node-level GPU detection
    • Add debug logging to GPU detection · f91fafcf
      Stefy Lanza (nextime / spora ) authored
      - Added debug output to see what CUDA device names are detected
      - Will help identify why AMD GPU is still being counted as CUDA device
      - Debug output shows device names and functional test results
      - User can now see what devices PyTorch is detecting
    • Fix GPU detection to only count working, functional GPUs · 056cbbf3
      Stefy Lanza (nextime / spora ) authored
      - Modified detect_gpu_backends() to perform functional tests on GPUs
      - CUDA detection now verifies devices can actually perform tensor operations
      - ROCm detection now tests device functionality before counting
      - Only NVIDIA GPUs are counted for CUDA, and only functional devices
      - Prevents counting of non-working GPUs like old AMD cards misreported as CUDA
      - Example: System with old AMD GPU (device 0) + working CUDA GPU (device 1) now correctly shows only the functional CUDA GPU
      - Total VRAM calculation now reflects only actually usable GPUs
      - Both PyTorch and nvidia-smi/rocm-smi detection paths updated
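      The functional test described above might look roughly like the following. This is a sketch under assumptions, not the project's detect_gpu_backends(): the function name functional_cuda_devices and the exact tensor operation are illustrative, and only the PyTorch path is shown (not nvidia-smi/rocm-smi).

```python
def functional_cuda_devices():
    """Return [(index, name, vram_gib)] for devices that pass a small tensor test."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch not installed: nothing can be probed
    if not torch.cuda.is_available():
        return []
    working = []
    for i in range(torch.cuda.device_count()):
        try:
            # Functional test: allocate on the device and run one operation.
            x = torch.ones(8, device=f"cuda:{i}")
            ok = float((x + x).sum().item()) == 16.0
        except Exception:
            ok = False  # device is listed but cannot actually compute
        if ok:
            props = torch.cuda.get_device_properties(i)
            # System index is preserved, so a skipped device 0 does not renumber device 1.
            working.append((i, props.name, props.total_memory / 1024**3))
    return working
```

      Devices that raise during the test are skipped, so a system with a non-working device 0 and a working device 1 would report only device 1.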
    • Fix GPU VRAM detection to count only available GPUs · ffe34516
      Stefy Lanza (nextime / spora ) authored
      - Modified local node GPU memory calculation to only count GPUs that are actually available for supported backends
      - Previously counted all GPUs in system, now only counts CUDA GPUs if CUDA is available and ROCm GPUs if ROCm is available
      - Fixes issue where unsupported GPUs (like old AMD GPUs without ROCm support) were incorrectly included in VRAM totals
      - Example: System with old AMD GPU (8GB, no ROCm) and CUDA GPU (24GB) now correctly shows 24GB total instead of 32GB
      - Ensures accurate GPU resource reporting in cluster nodes interface
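      The 24GB-versus-32GB example above can be worked through with a small sketch. The dictionary layout and the name usable_vram_gb are assumptions for illustration only.

```python
def usable_vram_gb(gpus, available_backends):
    """Sum VRAM only over GPUs whose backend is actually available."""
    return sum(g["vram_gb"] for g in gpus if g["backend"] in available_backends)


gpus = [
    {"name": "old AMD card", "backend": "rocm", "vram_gb": 8},   # no ROCm support
    {"name": "CUDA card", "backend": "cuda", "vram_gb": 24},
]

# Only CUDA is available on this system, so the AMD card is excluded.
total = usable_vram_gb(gpus, available_backends={"cuda"})
```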
    • Implement per-worker driver selection modal · 4ca34e75
      Stefy Lanza (nextime / spora ) authored
      - Modified modal to show individual GPU-requiring workers on each node
      - Allow granular driver selection (CUDA/ROCm/CPU) for each worker subprocess
      - Updated database schema to store driver preferences per worker (hostname + token + worker_name)
      - Enhanced API to handle per-worker driver setting with form field parsing
      - Added restart_client_worker method to cluster master for individual worker restarts
      - Frontend now displays worker-specific driver selection controls in modal
      - Maintains node-level table view while providing worker-level configuration
      - Supports CPU-only nodes and mixed GPU/CPU worker configurations
      - Backward compatible with existing single-driver preference system
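      One way to key preferences by hostname + token + worker_name is a composite primary key. The sketch below assumes SQLite and invents the column names and helper functions; only the table name client_driver_preferences and the three key parts come from this log.

```python
import sqlite3

VALID_DRIVERS = ("cuda", "rocm", "cpu")


def open_prefs(path=":memory:"):
    """Open the database and create the per-worker preference table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS client_driver_preferences (
               hostname    TEXT NOT NULL,
               token       TEXT NOT NULL,
               worker_name TEXT NOT NULL,
               driver      TEXT NOT NULL,
               PRIMARY KEY (hostname, token, worker_name)
           )"""
    )
    return conn


def set_driver(conn, hostname, token, worker_name, driver):
    """Store (or overwrite) the driver preference for one worker on one node."""
    if driver not in VALID_DRIVERS:
        raise ValueError(f"unsupported driver: {driver!r}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO client_driver_preferences VALUES (?, ?, ?, ?)",
            (hostname, token, worker_name, driver),
        )


def get_driver(conn, hostname, token, worker_name, default=None):
    """Return the stored driver for a worker, or `default` if none is stored."""
    row = conn.execute(
        "SELECT driver FROM client_driver_preferences "
        "WHERE hostname = ? AND token = ? AND worker_name = ?",
        (hostname, token, worker_name),
    ).fetchone()
    return row[0] if row else default
```

      With this layout, two workers on the same node can hold different drivers, and saving a new choice for a worker replaces its previous row.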
    • Fix NameError in cluster nodes API · 5cbdab26
      Stefy Lanza (nextime / spora ) authored
      - Fixed undefined variable 'local_gpu_backends' in api_cluster_nodes function
      - Properly defined local_available_backends, local_gpu_backends, and local_cpu_backends
      - Updated local node detection to show nodes with any available backends (GPU or CPU)
      - Ensured CPU-only nodes are correctly identified and displayed
      - Maintained backward compatibility with existing GPU-only node detection
    • Allow CPU-only cluster clients and flexible backend support · bd087af5
      Stefy Lanza (nextime / spora ) authored
      - Removed GPU-only requirement for cluster client connections
      - CPU-only clients can now join cluster and run CPU-based workers
      - Master accepts all clients regardless of GPU availability
      - Nodes are properly marked as CPU-only when no GPUs detected
      - Driver selection modal supports CUDA, ROCm, and CPU backends
      - Local and remote workers can use any available backend (GPU or CPU)
      - Enhanced cluster flexibility for mixed hardware environments
      - CPU nodes contribute to cluster for CPU-only processing tasks
      - Maintains backward compatibility with existing GPU-only workflows
      - Clear node type identification in cluster management interface
    • Enforce GPU-only cluster participation · f57a1468
      Stefy Lanza (nextime / spora ) authored
      - Cluster clients now refuse to connect without GPU capabilities (CUDA/ROCm)
      - Cluster master rejects authentication from clients without GPU backends
      - Local master node only appears in cluster nodes list if GPU backends are available
      - Master already prevented launching local worker processes without GPUs
      - Systems without GPUs cannot participate in distributed processing
      - Clear error messages when GPU requirements are not met
      - Maintains cluster integrity by ensuring all nodes contribute computational power
    • Restrict driver selection to available GPU backends only · abec9e31
      Stefy Lanza (nextime / spora ) authored
      - Removed CPU option from driver selection (only CUDA/ROCm GPU drivers)
      - Set CUDA as default driver selection when available
      - Added available_gpu_backends field to node API responses
      - Frontend dynamically populates driver options based on node's available GPUs
      - API validation rejects non-GPU driver requests
      - Cluster clients only accept CUDA/ROCm backend restart commands
      - Improved user experience by showing only relevant driver options per node
    • Enable dynamic backend switching for cluster clients with mixed GPU support · bedc1de9
      Stefy Lanza (nextime / spora ) authored
      - Added restart_workers command from master to clients for backend switching
      - Cluster clients can now restart their workers with different backends (CUDA/ROCm/CPU)
      - Added mixed GPU detection - nodes with both CUDA and ROCm show 'Mixed GPU Available' indicator
      - Clients with mixed GPUs can switch between CUDA and ROCm backends dynamically
      - Updated API endpoint to send restart commands to connected clients
      - Clients save driver preferences and restart workers immediately when changed
      - Graceful fallback to available backends if requested backend not available
      - Visual indicator for nodes capable of backend switching
    • Enable driver switching for local workers and show master weight · 6b838e4a
      Stefy Lanza (nextime / spora ) authored
      - Display actual cluster master weight instead of 'N/A' for local node
      - Implement driver switching for local workers via modal popup
      - Add switch_local_worker_backends() function to restart workers with new backends
      - Update API endpoint to handle local worker driver changes
      - Add CPU option to driver selection modal
      - Local workers can now switch between CUDA, ROCm, and CPU backends dynamically
      - Workers are terminated and restarted with new backend configuration
    • Add config file support for cluster master weight with 'auto' mode · fb7ad973
      Stefy Lanza (nextime / spora ) authored
      - Added cluster_master_weight config option (default: 'auto')
      - Implemented weight precedence: command line > config file > default 'auto'
      - 'auto' mode enables automatic weight adjustment (100->0 on first client, 0->100 when all disconnect)
      - Explicit numeric weights disable automatic adjustment
      - Updated sample config file with cluster_master_weight setting
      - Enhanced command line parsing to accept 'auto' or numeric values
      - Improved startup messages to indicate weight source and behavior
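      The precedence rule and the 'auto' behavior described above can be sketched as follows. The names resolve_weight and MasterWeight are hypothetical and do not reflect the project's actual code.

```python
def _parse(value):
    """Accept 'auto' or a numeric string/int, as the command line and config do."""
    text = str(value).strip().lower()
    return "auto" if text == "auto" else int(text)


def resolve_weight(cli_value=None, config_value=None):
    """Precedence: command line > config file > default 'auto'.

    Returns (weight, source) so a startup message can report where it came from.
    """
    for value, source in ((cli_value, "command line"), (config_value, "config file")):
        if value is not None:
            return _parse(value), source
    return "auto", "default"


class MasterWeight:
    """Tracks the master's weight; only 'auto' mode reacts to client connections."""

    def __init__(self, setting):
        self.auto = setting == "auto"
        self.value = 100 if self.auto else setting
        self.clients = 0

    def client_connected(self):
        self.clients += 1
        if self.auto and self.clients == 1:
            self.value = 0  # first client joined: 100 -> 0

    def client_disconnected(self):
        self.clients = max(0, self.clients - 1)
        if self.auto and self.clients == 0:
            self.value = 100  # last client left: 0 -> 100
```

      An explicit numeric setting, such as MasterWeight(30), keeps its value regardless of how many clients connect or disconnect.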
    • Make cluster master weight auto-adjustment conditional on explicit setting · 8fedb8dc
      Stefy Lanza (nextime / spora ) authored
      - Added weight_explicit flag to track if --weight was specified on command line
      - Automatic weight changes (100->0 on first client, 0->100 on last disconnect) only apply when weight is not explicitly set
      - When --weight is specified, master maintains the explicit weight regardless of client connections
      - Updated command line help and startup messages to clarify the behavior
      - This allows administrators to override automatic weight management when needed
    • Refactor cluster nodes display to show nodes instead of individual workers · 711719c4
      Stefy Lanza (nextime / spora ) authored
      - Modified API to aggregate workers per node instead of showing each worker separately
      - Each cluster node now appears as a single row with summarized worker information
      - Workers column shows count and types: '2 workers - Analysis (CUDA), Training (ROCm)'
      - Local workers are grouped into a single 'Local Master Node' entry
      - Updated frontend to display worker summaries with detailed breakdown
      - Updated API documentation to reflect new response format with workers_summary field
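      A small sketch of how the per-node summary string could be built from a node's workers. The input layout, the singular form, and the empty-node text are assumptions; only the example string '2 workers - Analysis (CUDA), Training (ROCm)' comes from this log.

```python
def workers_summary(workers):
    """Collapse a node's workers into one line for the Workers column."""
    if not workers:
        return "No workers"  # hypothetical wording for an empty node
    details = ", ".join(f"{w['type']} ({w['backend']})" for w in workers)
    noun = "worker" if len(workers) == 1 else "workers"
    return f"{len(workers)} {noun} - {details}"


summary = workers_summary([
    {"type": "Analysis", "backend": "CUDA"},
    {"type": "Training", "backend": "ROCm"},
])
```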
    • Add local worker processes to cluster nodes display · 27e73381
      Stefy Lanza (nextime / spora ) authored
      - Detect running local worker processes on cluster master using psutil
      - Include local workers in cluster nodes API response with distinct styling
      - Show local workers with blue background and 'Local' status indicator
      - Display backend information (CUDA/ROCm) in worker names
      - Indicate that local workers require manual restart for driver changes
      - Update API documentation with local worker response format
      - Local workers show N/A for weight since they don't participate in cluster load balancing
    • Add client weight display to cluster nodes page · 1c9ae89a
      Stefy Lanza (nextime / spora ) authored
      - Add weight column to cluster nodes table showing load balancing weight
      - Set default weights: master=0, clients=100
      - Update API response to include client weight
      - Update frontend to display weight information
      - Update API documentation with weight field
    • Add --cluster-shared-dir option for optimized file transfers · b48679df
      Stefy Lanza (nextime / spora ) authored
      - Add --shared-dir argument to cluster_master.py and cluster_client.py
      - Implement shared directory file transfer for model files
      - Falls back to websocket transfer if shared directory unavailable
      - Update cluster client to handle model_shared_file messages
      - Add documentation for shared directory feature in architecture.md
      - Maintain backward compatibility with existing websocket transfers
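      The shared-directory path with websocket fallback might be decided along these lines. This is an illustrative sketch: the function name, the message fields, and the fallback message type "model_websocket" are assumptions; only the model_shared_file message name comes from this log.

```python
import os
import shutil
import tempfile
from pathlib import Path


def plan_model_transfer(model_path, shared_dir=None):
    """Copy the model into the shared directory if usable, else fall back."""
    model_path = Path(model_path)
    if shared_dir:
        target_dir = Path(shared_dir)
        if target_dir.is_dir() and os.access(target_dir, os.W_OK):
            target = target_dir / model_path.name
            shutil.copy2(model_path, target)
            # The client would read the file from its own mount of the shared dir.
            return {"type": "model_shared_file", "filename": target.name}
    # Shared directory missing or not writable: send the bytes over the websocket.
    return {"type": "model_websocket", "filename": model_path.name}


# Usage: one transfer through a shared directory, one falling back.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.bin"
    model.write_bytes(b"weights")
    shared = Path(tmp) / "shared"
    shared.mkdir()
    via_shared = plan_model_transfer(model, shared)
    copied = (shared / "model.bin").read_bytes()
    via_socket = plan_model_transfer(model, Path(tmp) / "missing")
```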
    • Enhance cluster nodes page with uptime, job stats, and master statistics · 3c309139
      Stefy Lanza (nextime / spora ) authored
      - Add uptime calculation for cluster nodes and master
      - Include active/completed job counts per node and totals for master
      - Display cluster master statistics before the nodes list
      - Update API response format with master_stats and node-level metrics
      - Add uptime formatting and job statistics to frontend
      - Update API documentation with new response structure
    • Add admin cluster nodes page with real-time monitoring and driver preferences · 3f496bf6
      Stefy Lanza (nextime / spora ) authored
      - Add hostname passing from cluster client to master
      - Create client_driver_preferences database table for storing driver preferences
      - Add /admin/cluster_nodes page with auto-updating node list
      - Add API endpoints for fetching nodes and setting driver preferences
      - Update admin navbar and API documentation
      - Apply database migrations
    • Implement secure websockets for cluster master and client with auto-generated self-signed certificates · c01dda41
      Stefy Lanza (nextime / spora ) authored
    • Show all defaults in /admin/config if not set in database, hide configs set by config file/CLI/env, add redis config · bb0f720a
      Stefy Lanza (nextime / spora ) authored
    • Stefy Lanza (nextime / spora )'s avatar
    • Modify /admin/config to show only database-set configurations, excluding database and network configs · eb0e13cb
      Stefy Lanza (nextime / spora ) authored
    • Stefy Lanza (nextime / spora )'s avatar
2. 06 Oct, 2025 7 commits
    • Stefy Lanza (nextime / spora )'s avatar
    • Stefy Lanza (nextime / spora )'s avatar
      e16ed66a
    • Update AI.PROMPT · efb7f521
      Stefy Lanza (nextime / spora ) authored
    • Restrict stats display to admin users and enhance stats information · 8b202f53
      Stefy Lanza (nextime / spora ) authored
      - Modified analysis page to only show stats sidebar for admin users
      - Enhanced /api/stats endpoint to include cluster information for admins
      - Added GPU backend detection summary to stats
      - Updated JavaScript to display comprehensive system and cluster stats
      - Stats now show local resource usage and cluster status for administrators
      
      Note: Full job-specific worker stats (showing resources from the machine executing each specific job) would require additional development to track job-to-worker mappings and implement worker resource reporting.
    • Implement GPU prioritization and weight-based job distribution · 4f6f914d
      Stefy Lanza (nextime / spora ) authored
      - Added --weight parameter to client connections (default: 100)
      - Modified cluster master to prioritize GPU-enabled clients for job distribution
      - GPU clients always get precedence over CPU-only clients
      - When no GPU workers have required model, GPU clients still preferred for model distribution
      - Client weights are combined with process weights for load balancing
      - Higher weight = more jobs assigned to that client
      
      Job distribution priority:
      1. GPU clients with required model already loaded
      2. CPU clients with required model already loaded
      3. GPU clients (model will be sent)
      4. CPU clients (model will be sent)
      
      Within each category, clients are selected based on combined weight.
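      The four-tier priority above can be sketched as a tier function plus a weighted pick inside the best non-empty tier. The client dictionary layout is hypothetical, "weight" stands for the already-combined weight, and weighted random selection is an assumption about how "higher weight = more jobs" is realized.

```python
import random


def tier(client, model):
    """Lower is better: 0 GPU+model, 1 CPU+model, 2 GPU only, 3 CPU only."""
    if model in client["models"]:
        return 0 if client["gpu"] else 1
    return 2 if client["gpu"] else 3


def pick_client(clients, model, rng=random):
    """Pick a client from the best available tier, weighted by client weight."""
    if not clients:
        return None
    best = min(tier(c, model) for c in clients)
    candidates = [c for c in clients if tier(c, model) == best]
    weights = [c["weight"] for c in candidates]
    if sum(weights) <= 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

      Note that a low-weight GPU client holding the model still wins over a high-weight CPU client holding it, because weights only matter within a tier.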
    • Add --no-gpu flag to disable local worker processes · 6f92e72a
      Stefy Lanza (nextime / spora ) authored
      - Added --no-gpu command line flag
      - When --no-gpu is specified or no GPUs are detected, local worker processes are not started
      - This allows running vidai as a cluster master without local GPU processing
      - Useful for dedicated cluster master nodes that only manage remote clients
    • Implement GPU detection and dynamic configuration · c98a5bf2
      Stefy Lanza (nextime / spora ) authored
      - Add GPU detection utility functions in compat.py
      - Modify vidai.py to detect GPUs at startup and configure backends
      - Update cluster_client.py to detect GPUs and send capabilities to master
      - Modify cluster_master.py to handle client capabilities and model distribution
      - Update config.html template to dynamically show/hide backend options
      - Update web.py config route to handle dynamic backend availability
      - Add model file transfer functionality between master and clients
      - Update worker processes to handle model downloads from master
      - Test GPU detection and configuration
      - Update API documentation for new capabilities
      
      Features implemented:
      - Automatic detection of NVIDIA CUDA and AMD ROCm GPUs
      - Dynamic configuration of analysis/training backends based on available hardware
      - Cluster clients report GPU capabilities to master
      - Model distribution from master to clients when needed
      - Admin config page hides unavailable backend options
      - Updated API documentation reflecting new GPU detection capabilities