Commits · 5014c7f854c9493edfb1d391c1b23066e4a8a779 · SexHackMe / vidai

08 Oct, 2025 40 commits

Ensure queue_id is int for proper comparison · 5014c7f8

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Convert queue_id to int in assign_job_to_worker and cancel_job
- Fixes type mismatch issues between str and int comparisons
- Ensures active_jobs lookup works correctly
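
The mismatch this commit describes can be sketched as follows. This is a minimal illustration, assuming `active_jobs` is a dict whose entries carry an integer `queue_id`; the helper names `normalize_queue_id` and `find_active_job` are hypothetical and not taken from the repository.

```python
# Hypothetical sketch: only queue_id and active_jobs are names from the commit log.

def normalize_queue_id(queue_id):
    """Coerce a queue id that may arrive as a string to int."""
    return int(queue_id)


def find_active_job(active_jobs, queue_id):
    """Return the cluster job id tracking the given queue entry, or None."""
    queue_id = normalize_queue_id(queue_id)
    for job_id, info in active_jobs.items():
        if info["queue_id"] == queue_id:
            return job_id
    return None


active_jobs = {"job_abc": {"queue_id": 1}}

# In Python, "1" == 1 is False, so an unconverted string id would never match.
print("1" == 1)
print(find_active_job(active_jobs, "1"))
```

Converting once at the entry point keeps every later comparison between ints.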


Fix cluster master access · acfaf858

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Update global cluster_master instance for cross-module access
- Allows queue manager to access active jobs for cancellation
- Enables proper job_id logging in cancellation messages


Consistent job logging format · 44af5090

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Assignment: 'Assigned job 1 (job_xxx) to worker'
- Cancellation: 'Job 1 (job_xxx) cancelled'
- Waiting: 'Job 1 waiting for available workers' (no cluster id yet)
- Clear correlation between queue and cluster job IDs
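
The three message shapes listed above could be produced by small formatting helpers such as the ones below. The function names are hypothetical; only the message text follows the commit description.

```python
# Hypothetical helpers: the function names are assumptions, the wording follows the log.

def assignment_message(queue_id, job_id):
    """Queue id first, cluster job id in parentheses."""
    return f"Assigned job {queue_id} ({job_id}) to worker"


def cancellation_message(queue_id, job_id):
    return f"Job {queue_id} ({job_id}) cancelled"


def waiting_message(queue_id):
    # No cluster job id exists yet while the job waits for a worker.
    return f"Job {queue_id} waiting for available workers"


print(assignment_message(1, "job_xxx"))
print(cancellation_message(1, "job_xxx"))
print(waiting_message(1))
```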


Log cluster job_id on cancellation · 2f1e4f41

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Show actual cluster job_id in cancellation log
- More informative logging with job_xxx format
- Consistent with assignment logging


Add cancellation logging · 4e46e897

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- cancel_job logs job cancellation to console
- Provides feedback when jobs are cancelled
- Clean, single log message per cancellation


Remove cancel job logging · bfac94aa

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Silent job cancellation operations
- No logging for cancel commands sent to workers
- Clean operation without console noise


Remove duplicate completion logs · 7585f948

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Management loop silently completes cancelled jobs
- Eliminates duplicate completion messages
- Cleaner logging with only cancel_job messages


Create new cluster master instance per start · bf7bbb5e

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Prevents active_jobs from persisting across server restarts
- Eliminates leftover job tracking from previous runs
- Ensures clean state for each server start
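
One way to get the clean state described above is to rebuild the instance on every start instead of reusing a module-level object. The sketch below assumes a `ClusterMaster` class with an `active_jobs` dict; the `start_cluster_master` function is hypothetical.

```python
# Hypothetical sketch: start_cluster_master is an assumed name, not the project's API.

class ClusterMaster:
    def __init__(self):
        # Fresh tracking state for every instance.
        self.active_jobs = {}


cluster_master = None


def start_cluster_master():
    """Replace the global instance so no job tracking survives a restart."""
    global cluster_master
    cluster_master = ClusterMaster()
    return cluster_master


first = start_cluster_master()
first.active_jobs["job_old"] = {"queue_id": 1}

second = start_cluster_master()
print(second.active_jobs)
```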


Prevent duplicate completion logs · 61fe95c5

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Only complete and log cancelled jobs that are still active
- Prevents duplicate completion messages for jobs already handled
- Cleaner logging for job cancellation


Remove old job cleanup · 75fcfabe

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Don't remove jobs based on age since jobs can run for hours
- Leftover jobs from previous runs will be handled by restart cleanup
- Prevents interrupting long-running legitimate jobs


Add cleanup of old active jobs · ebcbfda5

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Remove active jobs older than 10 minutes to prevent accumulation
- Cleans up leftover jobs from previous runs or crashes
- Prevents duplicate job tracking issues


Fix duplicate cancellation · 7534e5d0

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Management loop only completes cancelled jobs, doesn't send duplicate cancel messages
- cancel_job sends the cancel command, management loop cleans up
- Prevents duplicate cancellation logs


Prevent double job assignment · 76ed5870

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Check if job is already assigned before assigning again
- Prevents multiple active jobs for the same queue entry
- Fixes duplicate cancellation attempts
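
The guard described above can be sketched as a check against the existing tracking table before a new entry is created. This assumes `active_jobs` maps cluster job ids to dicts holding a `queue_id`; the `assign_job` function and its return value are hypothetical.

```python
# Hypothetical sketch: assign_job and the dict layout are assumptions.

def assign_job(active_jobs, queue_id, job_id, worker):
    """Track a new assignment unless the queue entry is already assigned."""
    queue_id = int(queue_id)
    already_assigned = any(
        info["queue_id"] == queue_id for info in active_jobs.values()
    )
    if already_assigned:
        return False
    active_jobs[job_id] = {"queue_id": queue_id, "worker": worker}
    return True


active_jobs = {}
first_attempt = assign_job(active_jobs, 1, "job_a", "worker-1")
second_attempt = assign_job(active_jobs, 1, "job_b", "worker-2")
print(first_attempt, second_attempt, len(active_jobs))
```

With a single active job per queue entry, a later cancellation has exactly one entry to clean up.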


Fix worker cancellation checking · ca9a9669

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Pass queue_id to workers for proper cancellation detection
- Workers now check correct job id for cancellation status
- Workers receive effective stop commands via database polling


Send cancel commands to workers · b2f2786e

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Workers receive stop processing commands when jobs are cancelled
- Ensures workers halt processing immediately on cancel
- Maintains proper cleanup of resources


Fix job cancellation cleanup · 855d427a

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Workers are freed immediately when jobs are cancelled
- Clean up active jobs in cluster master when cancelling processing jobs
- Remove unnecessary cleanup from restart (handled by cancel)


Add missing _cancel_job_processing method · c633ee4d

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Implements job cancellation by notifying workers
- Sends cancel messages to local backend or remote clients
- Cleans up cancelled job resources


Fix queue_id parameter passing · a5920057

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Pass queue_id to _assign_local_job method
- Fix NameError when assigning local jobs

Fix job cancellation cleanup · d28295fc

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Store queue_id in active_jobs tracking
- Properly detect cancelled jobs by checking queue status
- Clean up worker resources when jobs are cancelled
- Workers become available for new jobs after cancellation
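
The cleanup flow above can be sketched as a pass over `active_jobs` that consults the queue status for each stored `queue_id`. The function name, the `queue_status` mapping, and the `busy_workers` set are hypothetical stand-ins for whatever the project uses.

```python
# Hypothetical sketch: release_cancelled_jobs, queue_status and busy_workers are assumptions.

def release_cancelled_jobs(active_jobs, queue_status, busy_workers):
    """Drop active jobs whose queue entry is cancelled and free their workers."""
    released = []
    # Iterate over a copy so entries can be deleted during the loop.
    for job_id, info in list(active_jobs.items()):
        if queue_status.get(info["queue_id"]) == "cancelled":
            busy_workers.discard(info["worker"])
            del active_jobs[job_id]
            released.append(job_id)
    return released


active_jobs = {
    "job_a": {"queue_id": 1, "worker": "w1"},
    "job_b": {"queue_id": 2, "worker": "w2"},
}
queue_status = {1: "cancelled", 2: "processing"}
busy_workers = {"w1", "w2"}

released = release_cancelled_jobs(active_jobs, queue_status, busy_workers)
print(released, sorted(busy_workers))
```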


Add job cancellation communication · 83825820

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Cluster master sends cancel_job messages to backend/client when jobs are cancelled
- Add _handle_cancel_job to process cancel confirmations from clients
- Workers can be notified faster to stop processing and free resources


Fix job restart flow · 15dd9ddc

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Restarted jobs set to 'queued' status
- Cluster master looks for 'queued' jobs, sets to 'processing', then assigns
- Proper job lifecycle: queued -> processing -> assigned -> completed/failed
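
The lifecycle named above can be written down as a small transition table. This is only a sketch of the stated chain; the `ALLOWED_TRANSITIONS` table and the `advance` function are hypothetical, and the project may enforce the order differently or not at all.

```python
# Hypothetical sketch of: queued -> processing -> assigned -> completed/failed

ALLOWED_TRANSITIONS = {
    "queued": {"processing"},
    "processing": {"assigned"},
    "assigned": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(current, new):
    """Return the new status, or raise if the step skips the lifecycle order."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {new}")
    return new


status = "queued"
for step in ("processing", "assigned", "completed"):
    status = advance(status, step)
print(status)
```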


Fix job restart to set status to processing · 21d81a1c

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Restarted jobs now set to 'processing' with empty job_id
- Cluster master will pick up restarted jobs for assignment


Remove queue processing from QueueManager · 2071e268

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- QueueManager now only handles job submission and management
- Job processing is handled exclusively by cluster master
- Eliminates duplicate queue processing between web and cluster processes


Add debug prints to worker connection and registration · c243e798

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Show when worker connects to backend
- Show when worker registers
- Help debug why jobs aren't being received

Fix cluster master to look for processing jobs · a2e19c42

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Queue manager marks jobs as processing, cluster master assigns them
- Changed query back to look for processing jobs without job_id


Unify job processing through cluster master · 668c482a

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Always use cluster master for job assignment, even for local jobs
- Remove separate local processing path
- Local processes treated same as remote, except for auto weight adjustment


Fix job assignment for restarted jobs · 32ef8692

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Allow local jobs to start even if worker socket check fails
- Backend handles worker availability, queue manager should always allow local processing


Add restart functionality for cancelled jobs · 599fa8f3

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Add restart button in job history for cancelled jobs
- Add /job/<id>/restart route in web interface
- Add restart_job method in QueueManager to reset cancelled jobs to queued


Add debug prints to worker and revert monitoring to TCP · 82f5cbfe

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Worker now prints when receiving jobs and sending results
- Cluster master uses TCP polling for consistency with clients

Change local job monitoring to poll database instead of TCP · 09a3589a

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Use get_queue_by_job_id to check job status
- More reliable than TCP polling for local jobs

Add missing pending_jobs attribute to ClusterMaster · 08a9a99e

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Initialize self.pending_jobs dict for job monitoring tasks
- Fixes AttributeError when assigning local jobs

Fix message type for local job assignment · 42837adf

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Use 'analyze_request' instead of 'analysis_request'
- Match the expected message type in worker processes

Fix Message object attribute access in job monitoring · fb38da16

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Change response.get('msg_type') to response.msg_type
- Change response.get('data') to response.data
- Message objects don't have get method, use attributes instead
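
The difference between the two access styles can be reproduced with a stand-in class. The `Message` dataclass below is a hypothetical model with only the two attributes named in the commit; the project's real class may carry more.

```python
# Hypothetical stand-in for the project's Message class.
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_type: str
    data: dict = field(default_factory=dict)


response = Message(msg_type="analyze_request", data={"queue_id": 1})

# Dict-style access fails because a plain object has no get method.
try:
    response.get("msg_type")
    get_failed = False
except AttributeError:
    get_failed = True

# Attribute access is the working form.
print(get_failed, response.msg_type, response.data)
```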


Restore automatic weight adjustment only when weight is not explicitly set · 384c96f5

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- If --weight is not specified, master weight changes to 0 when clients connect
- If --weight is specified, master participates in job selection with that weight
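
The two rules above hinge on telling "not specified" apart from any explicit value, which `argparse` supports through a `None` default. The sketch below is an assumption about how this could be wired; `effective_master_weight` and the `default_weight` of 100 are hypothetical, since the log does not state the default.

```python
# Hypothetical sketch: effective_master_weight and default_weight=100 are assumptions.
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # default=None lets the caller detect that --weight was not given.
    parser.add_argument("--weight", type=int, default=None)
    return parser.parse_args(argv)


def effective_master_weight(cli_weight, clients_connected, default_weight=100):
    """Explicit weight always wins; otherwise drop to 0 once clients connect."""
    if cli_weight is not None:
        return cli_weight
    return 0 if clients_connected else default_weight


implicit = parse_args([]).weight
explicit = parse_args(["--weight", "50"]).weight
print(effective_master_weight(implicit, clients_connected=True))
print(effective_master_weight(explicit, clients_connected=True))
```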


Remove automatic weight adjustment when clients connect · de670441

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Cluster master now participates in job selection even when clients are connected
- Local workers compete with external workers based on weight and VRAM


Fix cluster master to look for queued jobs instead of processing · 89c63cf8

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Jobs are inserted as 'queued', not 'processing'
- Cluster master now finds and assigns queued jobs

Add debug output for detected local GPU VRAM · acc3a58a

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Shows total VRAM detected on local GPUs when registering processes

Remove special VRAM allowance for local workers · 826da5da

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Local workers now require sufficient VRAM like other workers
- Since the server has 24GB VRAM and jobs need 16GB, the check passes normally
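
With the special case removed, the eligibility check reduces to one comparison applied to every worker. The sketch below uses the 24GB and 16GB figures from the commit message; the function name and the worker dict layout are hypothetical.

```python
# Hypothetical sketch: has_sufficient_vram and the worker dicts are assumptions.

def has_sufficient_vram(available_gb, required_gb):
    """Same rule for local and remote workers: no special allowance."""
    return available_gb >= required_gb


workers = [
    {"name": "local", "vram_gb": 24},
    {"name": "remote-small", "vram_gb": 8},
]
required_gb = 16

eligible = [w["name"] for w in workers if has_sufficient_vram(w["vram_gb"], required_gb)]
print(eligible)
```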


Add job result monitoring for local job assignment · 837264f3

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Local jobs now monitor for completion and handle results
- Prevents jobs from hanging without result retrieval

Fix local client registration to prevent disconnection · 9b864708

Stefy Lanza (nextime / spora ) authored Oct 08, 2025

- Changed local client ID to 'local' and marked as local to prevent cleanup
- Local clients are not cleaned up after 60 seconds
- Prevents 'Client local disconnected' messages
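
The exemption above can be sketched as a stale-client filter that skips entries flagged as local. The 60-second timeout comes from the commit message; the function name, the `local` flag, and the `last_seen` field are hypothetical.

```python
# Hypothetical sketch: stale_clients and the client dict fields are assumptions.

CLIENT_TIMEOUT_SECONDS = 60


def stale_clients(clients, now):
    """Return ids of non-local clients not seen within the timeout."""
    return [
        client_id
        for client_id, info in clients.items()
        if not info.get("local", False)
        and now - info["last_seen"] > CLIENT_TIMEOUT_SECONDS
    ]


now = 1000.0
clients = {
    "local": {"local": True, "last_seen": now - 600},   # old, but exempt
    "remote-1": {"last_seen": now - 120},               # timed out
    "remote-2": {"last_seen": now - 5},                 # still fresh
}
print(stale_clients(clients, now))
```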
