Add README with setup and usage instructions

Commit 45156b52 authored Dec 11, 2025 by Stefy Lanza (nextime / spora )

Showing with 51 additions and 0 deletions

README.md +51 -0

--- a/README.md
+++ b/README.md
+# Audio Transcription App
+
+This Python application transcribes audio files with speaker diarization and timestamps using the Qwen-Omni-7B model.
+
+## Features
+- Automatic speech recognition with Qwen-Omni-7B (4-bit quantized)
+- Speaker diarization using pyannote.audio
+- Timestamps for each utterance
+- Output as a TXT file with the same base name as the input audio
+
+## Requirements
+- Python 3.8+
+- GPU with 24 GB of VRAM (for the 4-bit quantized Qwen-Omni-7B)
+- Hugging Face account with access to pyannote models
+
+## Setup
+1. Clone or download the repository
+2. Create a virtual environment:
+   ```bash
+   python3 -m venv venv
+   ```
+3. Activate the virtual environment:
+   ```bash
+   source venv/bin/activate
+   ```
+4. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+5. Set your Hugging Face token:
+   ```bash
+   export HF_TOKEN=your_huggingface_token
+   ```
+
+## Usage
+```bash
+python transcript.py path/to/audio.wav
+```
+
+Output: `path/to/audio.txt`
+
+## Output Format
+```
+[00:00:00.00 - 00:00:05.00] SPEAKER_00: Transcribed text here.
+[00:00:05.00 - 00:00:10.00] SPEAKER_01: More transcribed text.
+```
+
+## Notes
+- Supports common audio formats (wav, mp3, etc.)
+- Requires an internet connection to download the models on first run
+- Processing time depends on audio length and hardware
\ No newline at end of file
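
The README documents the transcript line format and the output naming rule but not how a downstream script might consume them. The following is a minimal sketch, assuming the output matches the `[HH:MM:SS.ss - HH:MM:SS.ss] SPEAKER_NN: text` layout shown under "Output Format" exactly. The names `Utterance`, `parse_line`, and `transcript_path` are hypothetical helpers for illustration and are not part of `transcript.py`.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Pattern for one transcript line, based only on the README's example format.
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2}\.\d{2}) - (\d{2}):(\d{2}):(\d{2}\.\d{2})\] "
    r"([^:]+): (.*)$"
)


class Utterance(NamedTuple):
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label, e.g. "SPEAKER_00"
    text: str      # transcribed text


def _to_seconds(hours: str, minutes: str, seconds: str) -> float:
    """Convert HH, MM, SS.ss string fields to a number of seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_line(line: str) -> Optional[Utterance]:
    """Parse one transcript line; return None if it does not match the format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    h1, m1, s1, h2, m2, s2, speaker, text = match.groups()
    return Utterance(_to_seconds(h1, m1, s1), _to_seconds(h2, m2, s2), speaker, text)


def transcript_path(audio_path: str) -> Path:
    """Derive the TXT path from the audio path (same name, .txt suffix)."""
    return Path(audio_path).with_suffix(".txt")
```

Returning `None` for non-matching lines lets a caller skip blank or unexpected lines instead of failing, which seems a reasonable default given that the README does not specify whether the file can contain anything else.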