- 07 Oct, 2025 33 commits
-
-
Stefy Lanza (nextime / spora ) authored
- Remove analysis_backend and training_backend fields from /admin/config
- These are now configured per worker in the cluster nodes interface
- Clean up unused imports and form processing
-
Stefy Lanza (nextime / spora ) authored
- Add missing set_* function imports to admin.py config route
- Resolve NameError when saving admin configuration
-
Stefy Lanza (nextime / spora ) authored
- Change container max-width to 95% for better use of screen space
- Maintain centered layout for the cluster nodes table
-
Stefy Lanza (nextime / spora ) authored
- Show individual select forms for every worker in the driver modal
- Update API to handle per-worker driver selection for local nodes
- Maintain compatibility with existing backend switching logic
-
Stefy Lanza (nextime / spora ) authored
- Add --config <file> argument to load config from custom path
- Modify config loader to use custom config file if specified
- Fix cluster nodes interface to only show available GPU backends for workers
- Differentiate between local and remote node driver selection
-
Stefy Lanza (nextime / spora ) authored
- Removed the 'Settings' link from the admin navigation menu
- Settings page route and template still exist but are no longer accessible from the navbar
- Admin navbar now shows: Cluster Tokens, Cluster Nodes (no Settings)
-
Stefy Lanza (nextime / spora ) authored
- Added workers array to local node API response for modal population
- Fixed table colspan values to match 13 columns
- Removed debug console.log statements
- Modal should now open and show worker driver selection options
-
Stefy Lanza (nextime / spora ) authored
- Fixed JavaScript template literal issue preventing button clicks from working
- Changed from inline onclick with template variables to data attributes + event delegation
- Added event listener for .set-driver-btn class buttons
- Buttons now properly read hostname and token from data attributes
- Modal should now open when clicking Set Driver buttons
-
Stefy Lanza (nextime / spora ) authored
- Removed brand-specific filtering that only allowed NVIDIA GPUs
- Now detects any GPU that can actually perform CUDA or ROCm operations
- Functional test determines if GPU should be included, not brand
- GPUs are shown with correct system indices (Device 0, 1, etc.)
- AMD GPUs that support ROCm will be shown if functional
- CUDA GPUs from any vendor will be shown if functional
-
Stefy Lanza (nextime / spora ) authored
- Updated GPU VRAM detection to use torch.cuda.get_device_properties(i).total_memory / 1024**3
- Same method as used in /api/stats endpoint for consistency
- Still filters out non-NVIDIA and non-functional GPUs
- Now shows correct VRAM amounts (e.g., 24GB for RTX 3090 instead of hardcoded 8GB)
- Fixed both worker-level and node-level GPU detection
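
The conversion this commit standardizes on can be sketched as follows. The helper below isolates the bytes-to-GiB arithmetic; the commented usage with `torch.cuda.get_device_properties` is illustrative and assumes PyTorch is installed.

```python
def bytes_to_gib(total_memory: int) -> float:
    """Convert a raw byte count (as reported by
    torch.cuda.get_device_properties(i).total_memory) to GiB."""
    return total_memory / 1024**3

# Hypothetical usage with PyTorch (torch not imported here):
#   vram = [bytes_to_gib(torch.cuda.get_device_properties(i).total_memory)
#           for i in range(torch.cuda.device_count())]
```

For an RTX 3090 reporting 24 GiB, `total_memory` is 24 * 1024**3 bytes, so the helper returns 24.0 rather than a hardcoded value.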
-
Stefy Lanza (nextime / spora ) authored
- Added debug output to see what CUDA device names are detected
- Will help identify why AMD GPU is still being counted as CUDA device
- Debug output shows device names and functional test results
- User can now see what devices PyTorch is detecting
-
Stefy Lanza (nextime / spora ) authored
- Modified detect_gpu_backends() to perform functional tests on GPUs
- CUDA detection now verifies devices can actually perform tensor operations
- ROCm detection now tests device functionality before counting
- Only NVIDIA GPUs are counted for CUDA, and only functional devices
- Prevents counting of non-working GPUs like old AMD cards misreported as CUDA
- Example: System with old AMD GPU (device 0) + working CUDA GPU (device 1) now correctly shows only the functional CUDA GPU
- Total VRAM calculation now reflects only actually usable GPUs
- Both PyTorch and nvidia-smi/rocm-smi detection paths updated
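
The functional-test idea can be sketched generically: enumerate reported devices, run a small real operation on each, and keep only the indices that survive. The `probe` callable and function name are illustrative, not the actual `detect_gpu_backends()` implementation.

```python
from typing import Callable, List

def functional_devices(count: int, probe: Callable[[int], None]) -> List[int]:
    """Return indices of devices that survive a real operation.

    `probe` should run a small tensor op on the given device index and
    raise on failure, e.g. (with PyTorch, not imported here):
        lambda i: torch.ones(1, device=f"cuda:{i}").sum().item()
    """
    working = []
    for i in range(count):
        try:
            probe(i)
            working.append(i)
        except Exception:
            # Device is reported by the driver but cannot execute work
            # (e.g. an old AMD card misreported as a CUDA device).
            continue
    return working
```

With the example from the commit, device 0 (broken AMD) raises during the probe and device 1 (working CUDA GPU) passes, so only index 1 is counted.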
-
Stefy Lanza (nextime / spora ) authored
- Modified local node GPU memory calculation to only count GPUs that are actually available for supported backends
- Previously counted all GPUs in system, now only counts CUDA GPUs if CUDA is available and ROCm GPUs if ROCm is available
- Fixes issue where unsupported GPUs (like old AMD GPUs without ROCm support) were incorrectly included in VRAM totals
- Example: System with old AMD GPU (8GB, no ROCm) and CUDA GPU (24GB) now correctly shows 24GB total instead of 32GB
- Ensures accurate GPU resource reporting in cluster nodes interface
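
The aggregation rule can be sketched in a few lines. The `(backend, vram_gb)` pair shape and the function name are assumptions for illustration, not the project's actual data model.

```python
def usable_vram_gb(gpus, available_backends):
    """Sum VRAM only for GPUs whose backend is actually available.

    `gpus` is a list of (backend, vram_gb) pairs; `available_backends`
    is the set of backends detected on the node, e.g. {'cuda'}.
    """
    return sum(vram for backend, vram in gpus if backend in available_backends)
```

For the commit's example (8GB AMD GPU without ROCm support plus a 24GB CUDA GPU, with only CUDA available), the total is 24 rather than 32.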
-
Stefy Lanza (nextime / spora ) authored
- Modified modal to show individual GPU-requiring workers on each node
- Allow granular driver selection (CUDA/ROCm/CPU) for each worker subprocess
- Updated database schema to store driver preferences per worker (hostname + token + worker_name)
- Enhanced API to handle per-worker driver setting with form field parsing
- Added restart_client_worker method to cluster master for individual worker restarts
- Frontend now displays worker-specific driver selection controls in modal
- Maintains node-level table view while providing worker-level configuration
- Supports CPU-only nodes and mixed GPU/CPU worker configurations
- Backward compatible with existing single-driver preference system
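
A per-worker preference table keyed by (hostname, token, worker_name) could look like the sketch below. The table and column names are assumptions for illustration, not the project's actual migration; an upsert lets a re-selected driver overwrite the previous choice.

```python
import sqlite3

# In-memory sketch of a per-worker driver-preference table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE worker_driver_prefs (
        hostname    TEXT NOT NULL,
        token       TEXT NOT NULL,
        worker_name TEXT NOT NULL,
        driver      TEXT NOT NULL CHECK (driver IN ('cuda', 'rocm', 'cpu')),
        PRIMARY KEY (hostname, token, worker_name)
    )
""")

def set_driver(hostname, token, worker_name, driver):
    # Upsert keyed on the composite primary key.
    conn.execute(
        "INSERT INTO worker_driver_prefs VALUES (?, ?, ?, ?) "
        "ON CONFLICT(hostname, token, worker_name) "
        "DO UPDATE SET driver = excluded.driver",
        (hostname, token, worker_name, driver),
    )
```

The composite key is what makes the scheme backward compatible: a node-level preference is just the degenerate case where every worker on a (hostname, token) pair shares one driver.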
-
Stefy Lanza (nextime / spora ) authored
- Fixed undefined variable 'local_gpu_backends' in api_cluster_nodes function
- Properly defined local_available_backends, local_gpu_backends, and local_cpu_backends
- Updated local node detection to show nodes with any available backends (GPU or CPU)
- Ensured CPU-only nodes are correctly identified and displayed
- Maintained backward compatibility with existing GPU-only node detection
-
Stefy Lanza (nextime / spora ) authored
- Removed GPU-only requirement for cluster client connections
- CPU-only clients can now join cluster and run CPU-based workers
- Master accepts all clients regardless of GPU availability
- Nodes are properly marked as CPU-only when no GPUs detected
- Driver selection modal supports CUDA, ROCm, and CPU backends
- Local and remote workers can use any available backend (GPU or CPU)
- Enhanced cluster flexibility for mixed hardware environments
- CPU nodes contribute to cluster for CPU-only processing tasks
- Maintains backward compatibility with existing GPU-only workflows
- Clear node type identification in cluster management interface
-
Stefy Lanza (nextime / spora ) authored
- Cluster clients now refuse to connect without GPU capabilities (CUDA/ROCm)
- Cluster master rejects authentication from clients without GPU backends
- Local master node only appears in cluster nodes list if GPU backends are available
- Master already prevented launching local worker processes without GPUs
- Systems without GPUs cannot participate in distributed processing
- Clear error messages when GPU requirements are not met
- Maintains cluster integrity by ensuring all nodes contribute computational power
-
Stefy Lanza (nextime / spora ) authored
- Removed CPU option from driver selection (only CUDA/ROCm GPU drivers)
- Set CUDA as default driver selection when available
- Added available_gpu_backends field to node API responses
- Frontend dynamically populates driver options based on node's available GPUs
- API validation rejects non-GPU driver requests
- Cluster clients only accept CUDA/ROCm backend restart commands
- Improved user experience by showing only relevant driver options per node
-
Stefy Lanza (nextime / spora ) authored
- Added restart_workers command from master to clients for backend switching
- Cluster clients can now restart their workers with different backends (CUDA/ROCm/CPU)
- Added mixed GPU detection - nodes with both CUDA and ROCm show 'Mixed GPU Available' indicator
- Clients with mixed GPUs can switch between CUDA and ROCm backends dynamically
- Updated API endpoint to send restart commands to connected clients
- Clients save driver preferences and restart workers immediately when changed
- Graceful fallback to available backends if requested backend not available
- Visual indicator for nodes capable of backend switching
-
Stefy Lanza (nextime / spora ) authored
- Display actual cluster master weight instead of 'N/A' for local node
- Implement driver switching for local workers via modal popup
- Add switch_local_worker_backends() function to restart workers with new backends
- Update API endpoint to handle local worker driver changes
- Add CPU option to driver selection modal
- Local workers can now switch between CUDA, ROCm, and CPU backends dynamically
- Workers are terminated and restarted with new backend configuration
-
Stefy Lanza (nextime / spora ) authored
- Added cluster_master_weight config option (default: 'auto')
- Implemented weight precedence: command line > config file > default 'auto'
- 'auto' mode enables automatic weight adjustment (100->0 on first client, 0->100 when all disconnect)
- Explicit numeric weights disable automatic adjustment
- Updated sample config file with cluster_master_weight setting
- Enhanced command line parsing to accept 'auto' or numeric values
- Improved startup messages to indicate weight source and behavior
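
The precedence chain (command line > config file > default 'auto') can be sketched as a small resolver. Function and parameter names are illustrative, not the project's actual API.

```python
def resolve_master_weight(cli_value=None, config_value=None, default="auto"):
    """Resolve cluster_master_weight with precedence:
    command line > config file > default 'auto'.

    Returns (weight, auto_mode): a numeric weight with auto_mode=False,
    or (None, True) when automatic weight management is in effect.
    """
    raw = cli_value if cli_value is not None else (
        config_value if config_value is not None else default)
    if raw == "auto":
        return None, True
    return int(raw), False
```

A config file setting of "50" yields an explicit weight of 50 and disables automatic adjustment; passing a command-line value overrides the config file regardless.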
-
Stefy Lanza (nextime / spora ) authored
- Added weight_explicit flag to track if --weight was specified on command line
- Automatic weight changes (100->0 on first client, 0->100 on last disconnect) only apply when weight is not explicitly set
- When --weight is specified, master maintains the explicit weight regardless of client connections
- Updated command line help and startup messages to clarify the behavior
- This allows administrators to override automatic weight management when needed
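
The connect/disconnect behaviour described above can be sketched as a small state holder. The class and method names are assumptions for illustration, not the master's real interface.

```python
class MasterWeight:
    """Sketch of automatic master weight: 100 -> 0 when the first client
    connects, 0 -> 100 when the last one disconnects, unless --weight
    set the value explicitly."""

    def __init__(self, weight=100, explicit=False):
        self.weight = weight
        self.explicit = explicit
        self.clients = 0

    def client_connected(self):
        self.clients += 1
        if not self.explicit and self.clients == 1:
            self.weight = 0  # hand work off to the newly joined clients

    def client_disconnected(self):
        self.clients = max(0, self.clients - 1)
        if not self.explicit and self.clients == 0:
            self.weight = 100  # no clients left, resume local processing
```

With `explicit=True` the weight is simply never touched, matching the "--weight overrides automatic management" rule.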
-
Stefy Lanza (nextime / spora ) authored
- Modified API to aggregate workers per node instead of showing each worker separately
- Each cluster node now appears as a single row with summarized worker information
- Workers column shows count and types: '2 workers - Analysis (CUDA), Training (ROCm)'
- Local workers are grouped into a single 'Local Master Node' entry
- Updated frontend to display worker summaries with detailed breakdown
- Updated API documentation to reflect new response format with workers_summary field
-
Stefy Lanza (nextime / spora ) authored
- Detect running local worker processes on cluster master using psutil
- Include local workers in cluster nodes API response with distinct styling
- Show local workers with blue background and 'Local' status indicator
- Display backend information (CUDA/ROCm) in worker names
- Indicate that local workers require manual restart for driver changes
- Update API documentation with local worker response format
- Local workers show N/A for weight since they don't participate in cluster load balancing
-
Stefy Lanza (nextime / spora ) authored
- Add weight column to cluster nodes table showing load balancing weight
- Set default weights: master=0, clients=100
- Update API response to include client weight
- Update frontend to display weight information
- Update API documentation with weight field
-
Stefy Lanza (nextime / spora ) authored
- Add --shared-dir argument to cluster_master.py and cluster_client.py
- Implement shared directory file transfer for model files
- Falls back to websocket transfer if shared directory unavailable
- Update cluster client to handle model_shared_file messages
- Add documentation for shared directory feature in architecture.md
- Maintain backward compatibility with existing websocket transfers
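
The shared-directory-with-fallback logic can be sketched as a single chooser. Function and parameter names here are assumptions for illustration; the real master/client code handles this over its message protocol.

```python
import os
import shutil

def transfer_model(model_path, shared_dir=None, send_via_websocket=None):
    """Prefer copying into the shared directory; fall back to streaming
    over the websocket when the directory is missing or not configured.

    Returns a (method, result) pair so the caller can tell the client
    whether to expect a model_shared_file message or a raw transfer.
    """
    if shared_dir and os.path.isdir(shared_dir):
        dest = os.path.join(shared_dir, os.path.basename(model_path))
        shutil.copy2(model_path, dest)
        return ("shared_file", dest)
    # Shared directory unavailable: stream the file over the websocket.
    return ("websocket", send_via_websocket(model_path))
```

The fallback keeps older deployments working unchanged: if --shared-dir is never passed, every transfer takes the websocket path exactly as before.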
-
Stefy Lanza (nextime / spora ) authored
- Add uptime calculation for cluster nodes and master
- Include active/completed job counts per node and totals for master
- Display cluster master statistics before the nodes list
- Update API response format with master_stats and node-level metrics
- Add uptime formatting and job statistics to frontend
- Update API documentation with new response structure
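
The uptime formatting could look like the sketch below; the exact display format used by the frontend is an assumption.

```python
def format_uptime(seconds: int) -> str:
    """Render an uptime in seconds as 'Xd Xh Xm', dropping leading
    zero units (sketch; the real frontend format may differ)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    if days:
        return f"{days}d {hours}h {minutes}m"
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"
```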
-
Stefy Lanza (nextime / spora ) authored
- Add hostname passing from cluster client to master
- Create client_driver_preferences database table for storing driver preferences
- Add /admin/cluster_nodes page with auto-updating node list
- Add API endpoints for fetching nodes and setting driver preferences
- Update admin navbar and API documentation
- Apply database migrations
-
Stefy Lanza (nextime / spora ) authored
Implement secure websockets for cluster master and client with auto-generated self-signed certificates
-
Stefy Lanza (nextime / spora ) authored
Show all defaults in /admin/config if not set in the database, hide configs set by config file/CLI/env, add Redis config
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
Modify /admin/config to show only database-set configurations, excluding database and network configs
-
Stefy Lanza (nextime / spora ) authored
-
- 06 Oct, 2025 7 commits
-
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Modified analysis page to only show stats sidebar for admin users
- Enhanced /api/stats endpoint to include cluster information for admins
- Added GPU backend detection summary to stats
- Updated JavaScript to display comprehensive system and cluster stats
- Stats now show local resource usage and cluster status for administrators

Note: Full job-specific worker stats (showing resources from the machine executing each specific job) would require additional development to track job-to-worker mappings and implement worker resource reporting.
-
Stefy Lanza (nextime / spora ) authored
- Added --weight parameter to client connections (default: 100)
- Modified cluster master to prioritize GPU-enabled clients for job distribution
- GPU clients always get precedence over CPU-only clients
- When no GPU workers have required model, GPU clients still preferred for model distribution
- Client weights are combined with process weights for load balancing
- Higher weight = more jobs assigned to that client

Job distribution priority:
1. GPU clients with required model already loaded
2. CPU clients with required model already loaded
3. GPU clients (model will be sent)
4. CPU clients (model will be sent)

Within each category, clients are selected based on combined weight.
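
The four-tier priority plus weight tie-breaking can be expressed as a sort key. The `client` dict shape (`has_gpu`, `models`, `weight`) is an assumed simplification of the master's real client records.

```python
def client_sort_key(client, required_model):
    """Order candidate clients for a job, best first, when used with
    sorted(): category ascending, combined weight descending.

    Priority: (1) GPU with model loaded, (2) CPU with model loaded,
    (3) GPU without model, (4) CPU without model.
    """
    has_model = required_model in client["models"]
    if client["has_gpu"] and has_model:
        category = 0
    elif has_model:
        category = 1
    elif client["has_gpu"]:
        category = 2
    else:
        category = 3
    # Negate weight so higher combined weight sorts first within a tier.
    return (category, -client["weight"])
```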
-
Stefy Lanza (nextime / spora ) authored
- Added --no-gpu command line flag
- When --no-gpu is specified or no GPUs are detected, local worker processes are not started
- This allows running vidai as a cluster master without local GPU processing
- Useful for dedicated cluster master nodes that only manage remote clients
-
Stefy Lanza (nextime / spora ) authored
- Add GPU detection utility functions in compat.py
- Modify vidai.py to detect GPUs at startup and configure backends
- Update cluster_client.py to detect GPUs and send capabilities to master
- Modify cluster_master.py to handle client capabilities and model distribution
- Update config.html template to dynamically show/hide backend options
- Update web.py config route to handle dynamic backend availability
- Add model file transfer functionality between master and clients
- Update worker processes to handle model downloads from master
- Test GPU detection and configuration
- Update API documentation for new capabilities

Features implemented:
- Automatic detection of NVIDIA CUDA and AMD ROCm GPUs
- Dynamic configuration of analysis/training backends based on available hardware
- Cluster clients report GPU capabilities to master
- Model distribution from master to clients when needed
- Admin config page hides unavailable backend options
- Updated API documentation reflecting new GPU detection capabilities
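
One cheap detection path is probing for the vendor CLIs on PATH; a rough sketch follows. This is only the tooling-based half; the actual detect_gpu_backends() in compat.py also performs functional device tests described in later commits.

```python
import shutil

def detect_gpu_backends():
    """Detect candidate GPU backends by checking for the vendor
    management CLIs on PATH (sketch, not the full compat.py logic)."""
    return {
        "cuda": shutil.which("nvidia-smi") is not None,
        "rocm": shutil.which("rocm-smi") is not None,
    }
```

A master or client can then advertise only the backends whose value is True, which is what drives the dynamic show/hide of backend options in the admin config page.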
-