Add --audio-chunk option for audio/video chunking strategies

Added --audio-chunk argument with 3 modes:
- overlap (default): consecutive chunks share a short overlap, e.g. [0-60], [58-118]
- word-boundary: uses Whisper timestamps to split at word boundaries
- vad: uses Voice Activity Detection to skip silence

Also added --audio-chunk-overlap to control overlap duration.
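
A minimal sketch of how the overlap mode's windows could be computed. This illustrates the idea only and is not the code in this commit: the helper name overlap_chunks, the 60-unit chunk length, and the 2-unit overlap are assumptions inferred from the [0-60], [58-118] example above.

```python
def overlap_chunks(total, chunk_len=60.0, overlap=2.0):
    """Return (start, end) windows covering [0, total].

    Hypothetical helper: consecutive windows share `overlap` units,
    so each new window starts `chunk_len - overlap` after the previous one.
    """
    if overlap < 0 or chunk_len <= overlap:
        raise ValueError("chunk_len must be greater than a non-negative overlap")

    step = chunk_len - overlap
    chunks = []
    start = 0.0
    while start < total:
        # Clamp the last window to the end of the audio.
        end = min(start + chunk_len, total)
        chunks.append((start, end))
        if end >= total:
            break
        start += step
    return chunks


if __name__ == "__main__":
    print(overlap_chunks(118.0))
```

With a total length of 118 this yields the two windows from the example, (0, 60) and (58, 118). A larger overlap lowers the chance of cutting a word at a chunk edge, at the cost of transcribing more audio twice.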

New functions added:
- process_video_with_vad(): VAD-based chunking
- process_video_word_boundary(): Word-boundary chunking using Whisper

Modified:
- transcribe_video_audio(): accepts audio_chunk_type and audio_chunk_overlap params
- _transcribe_chunked(): accepts chunk_type and overlap params
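
For reference, the two options could be declared with argparse roughly as follows. This is a hedged sketch, not the parser from this commit: the flag names and mode names come from the message above, while build_parser, the float type, the 2.0 default for the overlap, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the two options described above."""
    parser = argparse.ArgumentParser(description="Audio chunking options (sketch)")
    parser.add_argument(
        "--audio-chunk",
        choices=["overlap", "word-boundary", "vad"],
        default="overlap",  # overlap is the documented default mode
        help="chunking strategy for audio/video transcription",
    )
    parser.add_argument(
        "--audio-chunk-overlap",
        type=float,
        default=2.0,  # assumed default, not stated in the commit message
        help="overlap duration between consecutive chunks",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--audio-chunk", "vad"])
    # argparse maps dashes to underscores in attribute names.
    print(args.audio_chunk, args.audio_chunk_overlap)
```

The parsed values would then be passed on as the audio_chunk_type and audio_chunk_overlap parameters of transcribe_video_audio().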
parent caf3c707