# Audio Transcription App

This Python application transcribes audio files with speaker diarization and timestamps. By default it uses the Qwen2.5-Omni-7B and Resemblyzer models; pass `--whisper` to use Whisper instead.

## Features
- Automatic speech recognition with Qwen2.5-Omni-7B (4-bit quantized)
- Speaker diarization using pyannote.audio
- Timestamps for each utterance
- Plain-text (`.txt`) output with the same base name as the input audio file

## Requirements
- Python 3.8+
- GPU with 24 GB of VRAM (for the 4-bit quantized Qwen2.5-Omni-7B)
- Hugging Face account with access to pyannote models
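
The Python and GPU requirements can be checked before installing anything heavy. The following is a minimal sketch, not part of `transcript.py`: the helper names `python_ok` and `gpu_vram_gb` are hypothetical, and it assumes PyTorch is the GPU backend (it returns `None` if PyTorch or CUDA is not available). Note that a card sold as 24 GB typically reports slightly less than 24 GiB, so the script prints the value rather than enforcing a threshold.

```python
import sys

MIN_PYTHON = (3, 8)  # from the Requirements list above


def python_ok(version=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is at least 3.8."""
    return tuple(version[:2]) >= MIN_PYTHON


def gpu_vram_gb():
    """Return total memory of GPU 0 in GiB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # assumption: PyTorch is installed via requirements.txt
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    vram = gpu_vram_gb()
    print("GPU VRAM (GiB):", "not detected" if vram is None else f"{vram:.1f}")
```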

## Setup
1. Clone or download the repository
2. Create a virtual environment:
   ```bash
   python3 -m venv venv
   ```
3. Activate the virtual environment:
   ```bash
   source venv/bin/activate
   ```
4. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
5. No additional setup is required; models download automatically on first run.

## Usage
```bash
python transcript.py path/to/audio.wav
```

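The transcript is written next to the input file, with the same base name and a `.txt` extension. For the command above, the output is `path/to/audio.txt`.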
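
The mapping from input path to output path amounts to swapping the file extension. A minimal sketch of that rule, assuming behavior equivalent to `pathlib`'s `with_suffix` (the helper name `output_path_for` is hypothetical and not taken from `transcript.py`):

```python
from pathlib import Path


def output_path_for(audio_path: str) -> Path:
    """Return the transcript path: same directory and base name, .txt extension."""
    return Path(audio_path).with_suffix(".txt")


print(output_path_for("path/to/audio.wav").as_posix())  # path/to/audio.txt
```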

## Output Format
```
[00:00:00.00 - 00:00:05.00] SPEAKER_00: Transcribed text here.
[00:00:05.00 - 00:00:10.00] SPEAKER_01: More transcribed text.
```
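
For downstream processing, each line can be produced from or parsed back into a `(start, end, speaker, text)` tuple. The sketch below is inferred only from the two sample lines above; the function names and the regular expression are illustrative assumptions, not the code used by `transcript.py`.

```python
import re

# Matches lines like: [00:00:05.00 - 00:00:10.00] SPEAKER_01: text
LINE_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2}\.\d{2}) - (\d{2}):(\d{2}):(\d{2}\.\d{2})\] (\w+): (.*)"
)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.ss."""
    centis = round(seconds * 100)  # work in hundredths to avoid float drift
    hours, rem = divmod(centis, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{cs:02d}"


def format_line(start: float, end: float, speaker: str, text: str) -> str:
    """Build one transcript line in the format shown above."""
    return f"[{format_timestamp(start)} - {format_timestamp(end)}] {speaker}: {text}"


def parse_line(line: str):
    """Parse one transcript line; return None if it does not match the format."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        return None
    h1, m1, s1, h2, m2, s2, speaker, text = match.groups()
    start = int(h1) * 3600 + int(m1) * 60 + float(s1)
    end = int(h2) * 3600 + int(m2) * 60 + float(s2)
    return start, end, speaker, text


print(format_line(5, 10, "SPEAKER_01", "More transcribed text."))
# [00:00:05.00 - 00:00:10.00] SPEAKER_01: More transcribed text.
```

Parsing the output back into tuples makes it straightforward to filter by speaker or convert the transcript to another format.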

## Notes
- Supports common audio formats such as WAV and MP3
- Requires an internet connection to download the models on first run
- Processing time depends on audio length and hardware