Fix colorspace detection to handle single-channel outputs (grayscale)

The detection was failing with 'index out of bounds' when models output
single-channel or channels-first tensor formats. Now handles:
- Grayscale (H, W) -> expands to 3 channels
- Single channel (H, W, 1) -> replicates to 3 channels
- Channels-first (C, H, W) -> transposes to channels-last
- 2-channel images -> adds third channel
- >3 channels (e.g., RGBA) -> takes first 3

Also removed incorrect 8GB RAM check that was blocking detection.
parent f80e682b
...@@ -1292,7 +1292,7 @@ def detect_model_colorspace(pipe, model_name, m_info, args): ...@@ -1292,7 +1292,7 @@ def detect_model_colorspace(pipe, model_name, m_info, args):
test_frames_data = np.transpose(test_frames_data, (1, 2, 3, 0)) test_frames_data = np.transpose(test_frames_data, (1, 2, 3, 0))
# Take first frame # Take first frame
test_frame = test_frames_data[0] if test_frames_data.ndim == 4 else test_frames_data test_frame = test_frames_data[0] if test_frames_data.ndim >= 4 else test_frames_data
# Normalize to 0-255 if needed # Normalize to 0-255 if needed
if test_frame.dtype == np.float32 or test_frame.dtype == np.float64: if test_frame.dtype == np.float32 or test_frame.dtype == np.float64:
...@@ -1300,11 +1300,31 @@ def detect_model_colorspace(pipe, model_name, m_info, args): ...@@ -1300,11 +1300,31 @@ def detect_model_colorspace(pipe, model_name, m_info, args):
test_frame = test_frame * 255 test_frame = test_frame * 255
test_frame = test_frame.astype(np.uint8) test_frame = test_frame.astype(np.uint8)
# Ensure we have 3 channels # Handle different channel configurations
if test_frame.ndim == 2: if test_frame.ndim == 2:
# Grayscale - expand to 3 channels
test_frame = np.stack([test_frame] * 3, axis=-1) test_frame = np.stack([test_frame] * 3, axis=-1)
elif test_frame.shape[-1] > 3: elif test_frame.ndim == 3:
test_frame = test_frame[..., :3] # Check if channels first (C, H, W) or last (H, W, C)
if test_frame.shape[0] in [1, 3, 4] and test_frame.shape[0] < test_frame.shape[1]:
# Channels first - convert to channels last
test_frame = np.transpose(test_frame, (1, 2, 0))
# Now should be (H, W, C)
if test_frame.shape[-1] == 1:
# Single channel - replicate to 3
test_frame = np.repeat(test_frame, 3, axis=-1)
elif test_frame.shape[-1] > 3:
# Take first 3 channels (e.g., RGBA -> RGB)
test_frame = test_frame[..., :3]
elif test_frame.shape[-1] == 2:
# 2 channels - add a third
test_frame = np.concatenate([test_frame, test_frame[..., :1]], axis=-1)
# Ensure we have 3 channels at this point
if test_frame.shape[-1] != 3:
print(f" ⚠️ Unexpected channel count: {test_frame.shape[-1]}, defaulting to RGB")
return "RGB"
# Analyze the colors - check center region to avoid borders # Analyze the colors - check center region to avoid borders
h, w = test_frame.shape[:2] h, w = test_frame.shape[:2]
...@@ -1312,8 +1332,8 @@ def detect_model_colorspace(pipe, model_name, m_info, args): ...@@ -1312,8 +1332,8 @@ def detect_model_colorspace(pipe, model_name, m_info, args):
# Calculate average of each channel # Calculate average of each channel
r_avg = np.mean(center_region[..., 0]) r_avg = np.mean(center_region[..., 0])
g_avg = np.mean(center_region[..., 1]) g_avg = np.mean(center_region[..., 1]) if center_region.shape[-1] > 1 else 0
b_avg = np.mean(center_region[..., 2]) b_avg = np.mean(center_region[..., 2]) if center_region.shape[-1] > 2 else 0
print(f" Channel averages - R: {r_avg:.1f}, G: {g_avg:.1f}, B: {b_avg:.1f}") print(f" Channel averages - R: {r_avg:.1f}, G: {g_avg:.1f}, B: {b_avg:.1f}")
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment