Add README with setup and usage instructions

parent 633bba14
# Audio Transcription App
This Python application transcribes audio files with speaker diarization and timestamps, using the Qwen-Omni-7B model.
## Features
- Automatic speech recognition with Qwen-Omni-7B (4-bit quantized)
- Speaker diarization using pyannote.audio
- Timestamps for each utterance
- Output as a TXT file with the same base name as the input audio
## Requirements
- Python 3.8+
- GPU with 24 GB of VRAM (for the 4-bit quantized Qwen-Omni-7B)
- Hugging Face account with access to pyannote models
## Setup
1. Clone or download the repository
2. Create a virtual environment:
```bash
python3 -m venv venv
```
3. Activate the virtual environment:
```bash
source venv/bin/activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Set your Hugging Face token as an environment variable:
```bash
export HF_TOKEN=your_huggingface_token
```
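A script that relies on `HF_TOKEN` can fail early with a readable message when the variable is missing. The helper below is a minimal sketch of such a check; `get_hf_token` is a hypothetical name used for illustration and is not necessarily how `transcript.py` reads the token.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Run `export HF_TOKEN=...` before starting the app."
        )
    return token
```

Calling the helper once at startup surfaces a missing token before any model download begins.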
## Usage
```bash
python transcript.py path/to/audio.wav
```
Output: `path/to/audio.txt`
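
The output path follows from the input path by swapping the file extension. As a sketch of that rule (the function name `output_path_for` is an assumption, not taken from `transcript.py`):

```python
from pathlib import Path


def output_path_for(audio_path):
    """Return the transcript path: same directory and base name, with a .txt extension."""
    return Path(audio_path).with_suffix(".txt")


# as_posix() keeps forward slashes regardless of operating system
print(output_path_for("path/to/audio.wav").as_posix())  # path/to/audio.txt
```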
## Output Format
```
[00:00:00.00 - 00:00:05.00] SPEAKER_00: Transcribed text here.
[00:00:05.00 - 00:00:10.00] SPEAKER_01: More transcribed text.
```
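
A line in this format could be produced as sketched below. The sketch assumes the fractional part of each timestamp is hundredths of a second, which is inferred from the example above rather than stated; `format_timestamp` and `format_line` are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format a non-negative duration in seconds as HH:MM:SS.cc (cc = hundredths)."""
    centis = round(seconds * 100)  # work in whole hundredths to avoid float drift
    hours, rem = divmod(centis, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, centis = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{centis:02d}"


def format_line(start, end, speaker, text):
    """Build one transcript line: [start - end] SPEAKER: text."""
    return f"[{format_timestamp(start)} - {format_timestamp(end)}] {speaker}: {text}"


print(format_line(0.0, 5.0, "SPEAKER_00", "Transcribed text here."))
# [00:00:00.00 - 00:00:05.00] SPEAKER_00: Transcribed text here.
```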
## Notes
- Supports common audio formats (wav, mp3, etc.)
- Requires an internet connection to download the models on first run
- Processing time depends on audio length and hardware