- 08 Oct, 2025 40 commits
-
Stefy Lanza (nextime / spora ) authored
- Add restart button in job history for cancelled jobs
- Add /job/<id>/restart route in web interface
- Add restart_job method in QueueManager to reset cancelled jobs to queued
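The reset described in this commit could look roughly like the sketch below. It assumes a hypothetical in-memory job store; the project's QueueManager presumably works against its database, and only the restart_job name and the cancelled-to-queued transition come from the commit message.

```python
class QueueManager:
    """Hypothetical in-memory stand-in for the project's QueueManager."""

    def __init__(self):
        # job_id -> job record (assumption: the real code uses a database)
        self.jobs = {}

    def add_job(self, job_id, status="queued"):
        self.jobs[job_id] = {"id": job_id, "status": status}

    def restart_job(self, job_id):
        """Reset a cancelled job to 'queued'; return True if it was reset."""
        job = self.jobs.get(job_id)
        if job is None or job["status"] != "cancelled":
            # Only cancelled jobs may be restarted
            return False
        job["status"] = "queued"
        return True
```

Refusing to restart jobs in any other state keeps a completed or running job from being queued a second time.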
-
Stefy Lanza (nextime / spora ) authored
- Worker now prints when receiving jobs and sending results
- Cluster master uses TCP polling for consistency with clients
-
Stefy Lanza (nextime / spora ) authored
- Use get_queue_by_job_id to check job status
- More reliable than TCP polling for local jobs
-
Stefy Lanza (nextime / spora ) authored
- Initialize self.pending_jobs dict for job monitoring tasks
- Fixes AttributeError when assigning local jobs
-
Stefy Lanza (nextime / spora ) authored
- Use 'analyze_request' instead of 'analysis_request'
- Match the expected message type in worker processes
-
Stefy Lanza (nextime / spora ) authored
- Change response.get('msg_type') to response.msg_type
- Change response.get('data') to response.data
- Message objects don't have a get method; use attributes instead
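A small illustration of why the change was needed, assuming Message is a plain object with msg_type and data attributes. The dataclass below is a hypothetical stand-in, not the project's actual class, and the 'result' message type is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for the project's Message class."""
    msg_type: str
    data: dict = field(default_factory=dict)


def read_response(response):
    # Attribute access works on the object; response.get('msg_type')
    # would raise AttributeError because Message is not a dict.
    return response.msg_type, response.data
```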
-
Stefy Lanza (nextime / spora ) authored
- If --weight is not specified, master weight changes to 0 when clients connect
- If --weight is specified, master participates in job selection with that weight
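The two rules above can be sketched as a single function. The function name, the default_weight parameter, and the use of None for "flag not given" are assumptions for illustration; only the --weight behaviour itself comes from the commit message.

```python
def effective_master_weight(cli_weight, default_weight, clients_connected):
    """Return the weight the master uses for job selection.

    cli_weight is None when --weight was not given (assumption).
    """
    if cli_weight is not None:
        # An explicit --weight always wins, even with clients connected
        return cli_weight
    # Without --weight the master steps aside once any client connects
    return 0 if clients_connected > 0 else default_weight
```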
-
Stefy Lanza (nextime / spora ) authored
- Cluster master now participates in job selection even when clients are connected
- Local workers compete with external workers based on weight and VRAM
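One way such a competition might work is sketched below. The commit does not say how weight and VRAM are combined, so the rule used here (filter by sufficient VRAM, then prefer the highest weight, with VRAM as a tie-breaker) is an assumption, as are the dictionary keys.

```python
def select_worker(workers, required_vram_gb):
    """Pick a worker for a job, or None if nobody qualifies.

    workers: list of dicts with hypothetical keys 'id', 'weight', 'vram_gb'.
    """
    eligible = [
        w for w in workers
        if w["weight"] > 0 and w["vram_gb"] >= required_vram_gb
    ]
    if not eligible:
        return None
    # Highest weight wins; more VRAM breaks ties (assumed policy)
    return max(eligible, key=lambda w: (w["weight"], w["vram_gb"]))
```

Under this rule a local worker with weight 0 never receives jobs, which matches the behaviour described for a master started without --weight while clients are connected.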
-
Stefy Lanza (nextime / spora ) authored
- Jobs are inserted as 'queued', not 'processing'
- Cluster master now finds and assigns queued jobs
-
Stefy Lanza (nextime / spora ) authored
- Shows total VRAM detected on local GPUs when registering processes
-
Stefy Lanza (nextime / spora ) authored
- Local workers now require sufficient VRAM like other workers
- Since the server has 24 GB VRAM and jobs need 16 GB, the check passes normally
-
Stefy Lanza (nextime / spora ) authored
- Local jobs now monitor for completion and handle results
- Prevents jobs from hanging without result retrieval
-
Stefy Lanza (nextime / spora ) authored
- Changed local client ID to 'local' and marked as local to prevent cleanup
- Local clients are not cleaned up after 60 seconds
- Prevents 'Client local disconnected' messages
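The cleanup exemption might be expressed as below. The 60-second timeout and the 'local' marker come from the commit message; the function name, the record layout, and the last_seen field are hypothetical.

```python
CLIENT_TIMEOUT_SECONDS = 60


def stale_clients(clients, now):
    """Return IDs of clients that should be cleaned up.

    clients: dict of client_id -> {'last_seen': float, 'local': bool}
    (hypothetical layout). Clients marked local are never considered stale.
    """
    return [
        cid for cid, info in clients.items()
        if not info.get("local", False)
        and now - info["last_seen"] > CLIENT_TIMEOUT_SECONDS
    ]
```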
-
Stefy Lanza (nextime / spora ) authored
- Register local processes in cluster master when weight > 0
- Handle local job assignment by forwarding to backend via TCP
- Allows jobs to run locally when no cluster clients are connected
-
Stefy Lanza (nextime / spora ) authored
- Modified cluster client to connect to backend's TCP web port instead of worker Unix socket
- Backend acts as proper bridge: web interface (TCP) ↔ workers (Unix socket)
- Cluster client now communicates with backend the same way as web interface
- This fixes the timeout issue and ensures proper job flow through the backend
-
Stefy Lanza (nextime / spora ) authored
- Added clean_queue API endpoint in web.py for admin users
- Added clean_queue database function to delete all queued/processing jobs
- Added Clean Queue button to admin dashboard template
- Button is only visible to admin users and allows clearing stuck jobs
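A minimal sketch of what the database function might do, assuming an SQLite database with a hypothetical jobs table that has a status column. The commit names only the clean_queue function and the two statuses it removes; the storage engine and schema are not stated.

```python
import sqlite3


def clean_queue(conn):
    """Delete all queued and processing jobs; return how many were removed.

    Assumes a hypothetical 'jobs' table with a 'status' column.
    """
    cur = conn.execute(
        "DELETE FROM jobs WHERE status IN ('queued', 'processing')"
    )
    conn.commit()
    return cur.rowcount
```

Completed, failed, and cancelled jobs are left in place, so the job history survives a queue cleanup.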
-
Stefy Lanza (nextime / spora ) authored
- Removed backend process startup from cluster_client.py since vidai.py already starts it for client mode
- This prevents 'Address already in use' error when running as cluster client
- Cluster client now only manages worker processes, not the backend
-
Stefy Lanza (nextime / spora ) authored
- Modified cluster client to start a local backend process alongside workers
- Backend process handles communication between cluster client and local workers
- Fixed process cleanup to properly terminate backend and worker processes
- This resolves the timeout issue when cluster client forwards jobs to local backend
-
Stefy Lanza (nextime / spora ) authored
- Modified queue.py to allow retried jobs to use distributed processing when available
- Fixed async coroutine warning by adding await to _transfer_job_files call
- Jobs that fail on clients will now be properly re-queued for distributed processing instead of falling back to local workers that may not exist
-
Stefy Lanza (nextime / spora ) authored
- Made assign_job_to_worker, _transfer_job_files, _transfer_file_via_websocket, enable_process, disable_process, update_process_weight, restart_client_workers, and restart_client_worker async methods
- Added proper exception handling for websocket send operations
- When websocket send fails due to broken connection, clients are now properly removed from available workers selection
- This ensures that disconnected clients are immediately removed from the worker pool and jobs are re-assigned to available workers
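The send-and-evict pattern described above can be sketched as follows. FakeSocket is a hypothetical stand-in for a websocket connection, and the sketch catches ConnectionError; a real websocket library typically raises its own connection-closed exception, which would need to be caught instead. Only the assign_job_to_worker name comes from the commit.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a websocket connection."""

    def __init__(self, broken=False):
        self.broken = broken
        self.sent = []

    async def send(self, payload):
        if self.broken:
            raise ConnectionError("connection closed")
        self.sent.append(payload)


async def assign_job_to_worker(clients, client_id, payload):
    """Send a job to a client; drop the client from the pool if the send fails."""
    ws = clients.get(client_id)
    if ws is None:
        return False
    try:
        await ws.send(payload)
        return True
    except ConnectionError:
        # Broken connection: remove the client so it is no longer selected
        clients.pop(client_id, None)
        return False
```

A caller that gets False back can pick another worker right away, for example with asyncio.run(assign_job_to_worker(clients, "b", job)) in a script or an await inside the management loop.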
-
Stefy Lanza (nextime / spora ) authored
- Fixed cluster_client.py to send proper Message objects instead of dicts to backend_comm.send_message()
- Modified queue.py to prevent failed jobs from being immediately re-assigned to distributed processing
- Jobs with retry_count > 0 now use local processing to avoid loops with failing distributed workers
-
Stefy Lanza (nextime / spora ) authored
- Added last_status_print timestamp to QueueManager class
- Modified _process_queue to only print job status messages once every 10 seconds
- This prevents console spam from the queue manager when jobs are waiting for workers
-
Stefy Lanza (nextime / spora ) authored
- Calculate should_print_status once per loop iteration instead of updating timestamp inside the loop
- This ensures consistent rate limiting where all job status messages are either printed together or not at all
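The rate limiting described here can be sketched as below. The last_status_print and should_print_status names and the 10-second interval come from the commit messages; the StatusPrinter class, the injectable clock, and the message text are assumptions made so the example is self-contained.

```python
import time


class StatusPrinter:
    """Hypothetical helper showing the once-per-interval status print."""

    def __init__(self, interval=10.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_status_print = None

    def report(self, waiting_jobs):
        now = self.clock()
        # Decide once per iteration, before looping over the jobs, so that
        # all status lines are printed together or not at all.
        should_print_status = (
            self.last_status_print is None
            or now - self.last_status_print >= self.interval
        )
        printed = []
        if should_print_status:
            for job_id in waiting_jobs:
                line = f"Job {job_id} waiting for available workers"
                print(line)
                printed.append(line)
            self.last_status_print = now
        return printed
```

Updating the timestamp inside the per-job loop would let the first job print and silence the rest, which is the inconsistency this commit removes.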
-
Stefy Lanza (nextime / spora ) authored
- Added last_job_status_print timestamp to ClusterMaster class
- Modified _management_loop to only print job status messages once every 10 seconds
- This prevents console spam when jobs are waiting for workers
-
Stefy Lanza (nextime / spora ) authored
- Added consecutive_failures and failing flags to client tracking
- Increment failure counter on job failures, reset on success
- Mark clients as failing after 3 consecutive failures
- Exclude failing clients from worker selection in all methods
- Reset failure tracking when clients reconnect
- This prevents problematic clients from receiving jobs until they reconnect
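The failure tracking listed above can be sketched as a small class. The consecutive_failures and failing field names and the threshold of 3 come from the commit message; the ClientTracker class and its method names are hypothetical.

```python
MAX_CONSECUTIVE_FAILURES = 3


class ClientTracker:
    """Hypothetical sketch of per-client failure tracking."""

    def __init__(self):
        self.clients = {}

    def connect(self, client_id):
        # Connecting or reconnecting resets failure tracking
        self.clients[client_id] = {"consecutive_failures": 0, "failing": False}

    def record_result(self, client_id, success):
        client = self.clients[client_id]
        if success:
            client["consecutive_failures"] = 0
            return
        client["consecutive_failures"] += 1
        if client["consecutive_failures"] >= MAX_CONSECUTIVE_FAILURES:
            client["failing"] = True

    def available(self):
        # Failing clients are excluded from worker selection
        return [cid for cid, c in self.clients.items() if not c["failing"]]
```

In this sketch a failing client stays excluded until connect() is called again, mirroring the "until they reconnect" behaviour in the commit.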
-
Stefy Lanza (nextime / spora ) authored
- Added missing import os at the top of vidai/cluster_client.py
- Removed redundant local import os in _handle_model_transfer_complete function
- This fixes the error when handling master commands on client receiving a job
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
-
Stefy Lanza (nextime / spora ) authored
- Added delete button next to 'View Result' for completed jobs in history.html
- Button appears only for completed jobs and includes confirmation dialog
- Uses existing /job/{job_id}/delete route which already handles ownership checks
- Maintains consistent styling with other action buttons

Users can now clean up their completed job history by deleting individual jobs they no longer need.
-