- 08 Oct, 2025 (40 commits)
-
Stefy Lanza (nextime / spora ) authored
- Convert queue_id to int in assign_job_to_worker and cancel_job - Fixes type mismatch issues between str and int comparisons - Ensures active_jobs lookup works correctly
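The mismatch this commit describes can be illustrated with a minimal sketch. The layout of `active_jobs` (cluster job ID mapped to a dict holding `queue_id`) and the helper name `find_job_by_queue_id` are assumptions for illustration, not the project's actual code.

```python
# Hypothetical layout: cluster job_id -> tracking info with an integer queue_id.
active_jobs = {"job_abc": {"queue_id": 1, "worker": "local"}}


def find_job_by_queue_id(jobs, queue_id):
    """Return the cluster job_id tracked for queue_id, or None if absent."""
    # A queue_id arriving from a URL or a message is often a str;
    # "1" == 1 is False in Python, so normalize before comparing.
    queue_id = int(queue_id)
    for job_id, info in jobs.items():
        if info.get("queue_id") == queue_id:
            return job_id
    return None


# Without the int() conversion the first lookup would silently miss.
found_from_str = find_job_by_queue_id(active_jobs, "1")
found_from_int = find_job_by_queue_id(active_jobs, 1)
```

Normalizing once at the entry point of each function keeps every later comparison consistent, which appears to be the intent of the fix.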
-
Stefy Lanza (nextime / spora ) authored
- Update global cluster_master instance for cross-module access - Allows queue manager to access active jobs for cancellation - Enables proper job_id logging in cancellation messages
-
Stefy Lanza (nextime / spora ) authored
- Assignment: 'Assigned job 1 (job_xxx) to worker' - Cancellation: 'Job 1 (job_xxx) cancelled' - Waiting: 'Job 1 waiting for available workers' (no cluster id yet) - Clear correlation between queue and cluster job IDs
-
Stefy Lanza (nextime / spora ) authored
- Show actual cluster job_id in cancellation log - More informative logging with job_xxx format - Consistent with assignment logging
-
Stefy Lanza (nextime / spora ) authored
- cancel_job logs job cancellation to console - Provides feedback when jobs are cancelled - Clean, single log message per cancellation
-
Stefy Lanza (nextime / spora ) authored
- Silent job cancellation operations - No logging for cancel commands sent to workers - Clean operation without console noise
-
Stefy Lanza (nextime / spora ) authored
- Management loop silently completes cancelled jobs - Eliminates duplicate completion messages - Cleaner logging with only cancel_job messages
-
Stefy Lanza (nextime / spora ) authored
- Prevents active_jobs from persisting across server restarts - Eliminates leftover job tracking from previous runs - Ensures clean state for each server start
-
Stefy Lanza (nextime / spora ) authored
- Only complete and log cancelled jobs that are still active - Prevents duplicate completion messages for jobs already handled - Cleaner logging for job cancellation
-
Stefy Lanza (nextime / spora ) authored
- Don't remove jobs based on age since jobs can run for hours - Leftover jobs from previous runs will be handled by restart cleanup - Prevents interrupting long-running legitimate jobs
-
Stefy Lanza (nextime / spora ) authored
- Remove active jobs older than 10 minutes to prevent accumulation - Cleans up leftover jobs from previous runs or crashes - Prevents duplicate job tracking issues
-
Stefy Lanza (nextime / spora ) authored
- Management loop only completes cancelled jobs, doesn't send duplicate cancel messages - cancel_job sends the cancel command, management loop cleans up - Prevents duplicate cancellation logs
-
Stefy Lanza (nextime / spora ) authored
- Check if job is already assigned before assigning again - Prevents multiple active jobs for the same queue entry - Fixes duplicate cancellation attempts
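A guard of the kind described here might look like the following sketch. The function name `assign_once` and the shape of `active_jobs` are hypothetical, chosen only to show the check-before-assign idea.

```python
def assign_once(active_jobs, queue_id, job_id, worker):
    """Track a new assignment unless the queue entry already has one.

    Returns True when the job was recorded, False when it was skipped.
    """
    queue_id = int(queue_id)
    already_assigned = any(
        info["queue_id"] == queue_id for info in active_jobs.values()
    )
    if already_assigned:
        return False  # avoid a second active job for the same queue entry
    active_jobs[job_id] = {"queue_id": queue_id, "worker": worker}
    return True


tracked = {}
first = assign_once(tracked, 1, "job_aaa", "local")
second = assign_once(tracked, 1, "job_bbb", "local")  # same queue entry again
```

With a single tracked job per queue entry, a later cancellation has exactly one target.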
-
Stefy Lanza (nextime / spora ) authored
- Pass queue_id to workers for proper cancellation detection - Workers now check correct job id for cancellation status - Workers receive effective stop commands via database polling
-
Stefy Lanza (nextime / spora ) authored
- Workers receive stop processing commands when jobs are cancelled - Ensures workers halt processing immediately on cancel - Maintains proper cleanup of resources
-
Stefy Lanza (nextime / spora ) authored
- Workers are freed immediately when jobs are cancelled - Clean up active jobs in cluster master when cancelling processing jobs - Remove unnecessary cleanup from restart (handled by cancel)
-
Stefy Lanza (nextime / spora ) authored
- Implements job cancellation by notifying workers - Sends cancel messages to local backend or remote clients - Cleans up cancelled job resources
-
Stefy Lanza (nextime / spora ) authored
- Pass queue_id to _assign_local_job method - Fix NameError when assigning local jobs
-
Stefy Lanza (nextime / spora ) authored
- Store queue_id in active_jobs tracking - Properly detect cancelled jobs by checking queue status - Clean up worker resources when jobs are cancelled - Workers become available for new jobs after cancellation
-
Stefy Lanza (nextime / spora ) authored
- Cluster master sends cancel_job messages to backend/client when jobs are cancelled - Add _handle_cancel_job to process cancel confirmations from clients - Workers can be notified faster to stop processing and free resources
-
Stefy Lanza (nextime / spora ) authored
- Restarted jobs set to 'queued' status - Cluster master looks for 'queued' jobs, sets to 'processing', then assigns - Proper job lifecycle: queued -> processing -> assigned -> completed/failed
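The lifecycle named in this commit can be written down as a small transition table. This is a sketch only: the `ALLOWED` mapping and `advance` helper are illustrative names, and the real code may store and validate status differently.

```python
# Transitions taken from the commit text:
# queued -> processing -> assigned -> completed/failed
ALLOWED = {
    "queued": {"processing"},
    "processing": {"assigned"},
    "assigned": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(current, new):
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


status = "queued"
for step in ("processing", "assigned", "completed"):
    status = advance(status, step)
```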
-
Stefy Lanza (nextime / spora ) authored
- Restarted jobs now set to 'processing' with empty job_id - Cluster master will pick up restarted jobs for assignment
-
Stefy Lanza (nextime / spora ) authored
- QueueManager now only handles job submission and management - Job processing is handled exclusively by cluster master - Eliminates duplicate queue processing between web and cluster processes
-
Stefy Lanza (nextime / spora ) authored
- Show when worker connects to backend - Show when worker registers - Help debug why jobs aren't being received
-
Stefy Lanza (nextime / spora ) authored
- Queue manager marks jobs as processing, cluster master assigns them - Changed query back to look for processing jobs without job_id
-
Stefy Lanza (nextime / spora ) authored
- Always use cluster master for job assignment, even for local jobs - Remove separate local processing path - Local processes treated same as remote, except for auto weight adjustment
-
Stefy Lanza (nextime / spora ) authored
- Allow local jobs to start even if worker socket check fails - Backend handles worker availability, queue manager should always allow local processing
-
Stefy Lanza (nextime / spora ) authored
- Add restart button in job history for cancelled jobs - Add /job/<id>/restart route in web interface - Add restart_job method in QueueManager to reset cancelled jobs to queued
-
Stefy Lanza (nextime / spora ) authored
- Worker now prints when receiving jobs and sending results - Cluster master uses TCP polling for consistency with clients
-
Stefy Lanza (nextime / spora ) authored
- Use get_queue_by_job_id to check job status - More reliable than TCP polling for local jobs
-
Stefy Lanza (nextime / spora ) authored
- Initialize self.pending_jobs dict for job monitoring tasks - Fixes AttributeError when assigning local jobs
-
Stefy Lanza (nextime / spora ) authored
- Use 'analyze_request' instead of 'analysis_request' - Match the expected message type in worker processes
-
Stefy Lanza (nextime / spora ) authored
- Change response.get('msg_type') to response.msg_type - Change response.get('data') to response.data - Message objects don't have get method, use attributes instead
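The difference between dict-style and attribute-style access can be sketched as follows. The `Message` dataclass here is a stand-in with the two fields the commit mentions (`msg_type`, `data`); the project's real class and the `"result"` type string are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Message:
    """Stand-in message object: plain attributes, no dict interface."""
    msg_type: str
    data: Dict[str, Any] = field(default_factory=dict)


def unpack(response: Message) -> Tuple[str, Dict[str, Any]]:
    # response.get('msg_type') would raise AttributeError on this object,
    # so read the attributes directly.
    return response.msg_type, response.data


response = Message(msg_type="result", data={"ok": True})
kind, payload = unpack(response)
```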
-
Stefy Lanza (nextime / spora ) authored
- If --weight is not specified, master weight changes to 0 when clients connect - If --weight is specified, master participates in job selection with that weight
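One way to express this rule is to let `--weight` default to `None`, so that "not specified" stays distinguishable from an explicit value. In the sketch below the helper name and the fallback `default_weight=100` are assumptions, not values taken from the project.

```python
import argparse


def effective_master_weight(cli_weight, connected_clients, default_weight=100):
    """Weight the master uses when competing for jobs.

    default_weight is a placeholder for the weight used when no clients
    are connected and --weight was not given.
    """
    if cli_weight is not None:
        return cli_weight  # explicit --weight: master always participates
    return 0 if connected_clients > 0 else default_weight


parser = argparse.ArgumentParser()
parser.add_argument("--weight", type=int, default=None)

unset = parser.parse_args([])                  # --weight omitted
explicit = parser.parse_args(["--weight", "50"])

w_unset_with_clients = effective_master_weight(unset.weight, connected_clients=2)
w_explicit_with_clients = effective_master_weight(explicit.weight, connected_clients=2)
```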
-
Stefy Lanza (nextime / spora ) authored
- Cluster master now participates in job selection even when clients are connected - Local workers compete with external workers based on weight and VRAM
-
Stefy Lanza (nextime / spora ) authored
- Jobs are inserted as 'queued', not 'processing' - Cluster master now finds and assigns queued jobs
-
Stefy Lanza (nextime / spora ) authored
- Shows total VRAM detected on local GPUs when registering processes
-
Stefy Lanza (nextime / spora ) authored
- Local workers now require sufficient VRAM like other workers - Since the server has 24 GB VRAM and jobs need 16 GB, the check passes normally
-
Stefy Lanza (nextime / spora ) authored
- Local jobs now monitor for completion and handle results - Prevents jobs from hanging without result retrieval
-
Stefy Lanza (nextime / spora ) authored
- Changed local client ID to 'local' and marked as local to prevent cleanup - Local clients are not cleaned up after 60 seconds - Prevents 'Client local disconnected' messages