SexHackMe / airtanscropt · Commits · 994dbc48

Commit 994dbc48, authored Dec 11, 2025 by Stefy Lanza (nextime / spora)

Implement Qwen2.5-Omni-7B using AutoModel as per Hugging Face documentation

parent 85466098
Pipeline #203 canceled with stages

Showing 2 changed files with 19 additions and 9 deletions (+19 −9):
- README.md (+1 −1)
- transcript.py (+18 −8)
README.md @ 994dbc48

 # Audio Transcription App
-This Python application transcribes audio files with speaker diarization and timestamps using Whisper and Resemblyzer models.
+This Python application transcribes audio files with speaker diarization and timestamps using Qwen2.5-Omni-7B and Resemblyzer models.
 ## Features
 - Automatic speech recognition with Qwen-Omni-7B (4-bit quantized)
...
transcript.py @ 994dbc48

 import argparse
 import torch
 from transformers import pipeline
 from transformers import AutoProcessor, AutoModel
 from resemblyzer import VoiceEncoder
 from sklearn.cluster import AgglomerativeClustering
 import webrtcvad
...
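The imports above pair Resemblyzer voice embeddings with scikit-learn's `AgglomerativeClustering` for speaker diarization. As a minimal sketch of the clustering step only, using small synthetic vectors in place of real Resemblyzer embeddings (which are 256-dimensional per utterance; the dimensions and separation below are illustrative assumptions, not the repository's code):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic per-segment "voice embeddings": two well-separated blobs
# standing in for two distinct speakers.
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, scale=0.05, size=(5, 16))
speaker_b = rng.normal(loc=1.0, scale=0.05, size=(5, 16))
embeddings = np.vstack([speaker_a, speaker_b])

# Cluster segments into speakers; each label becomes a speaker id.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
print(len(set(labels)))  # two speaker clusters
```

In the real pipeline the number of speakers is often unknown, in which case `AgglomerativeClustering(n_clusters=None, distance_threshold=...)` is the usual alternative.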
@@ -62,7 +62,7 @@ def get_diarization(audio, sr):
     return merged

 def main():
-    parser = argparse.ArgumentParser(description='Transcribe audio with speakers and timestamps')
+    parser = argparse.ArgumentParser(description='Transcribe audio with speakers and timestamps using Qwen2.5-Omni-7B')
     parser.add_argument('audio_file', help='Path to the audio file')
     args = parser.parse_args()
...
@@ -73,9 +73,9 @@ def main():
         print(f"Error: Audio file '{audio_file}' not found.")
         return

-    # Load Whisper for transcription
-    device = 0 if torch.cuda.is_available() else -1
-    transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device=device)
+    # Load Qwen2.5-Omni-7B model
+    processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")
+    model = AutoModel.from_pretrained("Qwen/Qwen2.5-Omni-7B")

     # Load audio
     audio, sr = librosa.load(audio_file, sr=16000)
...
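The diff loads audio at a fixed 16 kHz and later cuts per-speaker chunks out of it. The conversion from diarized segment boundaries (in seconds) to waveform sample indices can be sketched as follows; the helper name and the 10-second stand-in buffer are illustrative assumptions, not code from this repository:

```python
import numpy as np

SR = 16000  # matches librosa.load(audio_file, sr=16000) in the diff
audio = np.zeros(SR * 10, dtype=np.float32)  # stand-in for 10 s of audio

def slice_segment(audio, sr, start, end):
    """Cut a diarized segment, given in seconds, out of the waveform."""
    return audio[int(start * sr):int(end * sr)]

chunk = slice_segment(audio, SR, 1.5, 4.0)
print(len(chunk))  # 2.5 s at 16 kHz -> 40000 samples
```

Empty chunks (e.g. from zero-length segments) are exactly what the `if len(audio_chunk) == 0: continue` guard in the next hunk skips.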
@@ -94,9 +94,19 @@ def main():
         if len(audio_chunk) == 0:
             continue

-        # Transcribe with Whisper
-        result = transcriber(audio_chunk, return_timestamps=False)
-        text = result['text'].strip()
+        # Prepare inputs for Qwen-Omni
+        conversation = [
+            {"role": "user", "content": [
+                {"type": "audio", "audio": {"waveform": audio_chunk, "sample_rate": sr}},
+                {"type": "text", "text": "Transcribe this audio segment exactly as spoken."}
+            ]}
+        ]
+        inputs = processor(conversation=conversation, return_tensors="pt")
+
+        # Generate transcription
+        with torch.no_grad():
+            generated_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
+        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()

         # Format timestamps
         start_min, start_sec = divmod(start, 60)
...
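The hunk ends with `divmod(start, 60)` splitting a segment start time into minutes and seconds for display. A self-contained sketch of that formatting step (the `fmt_ts` helper and the `[MM:SS.s]` layout are illustrative assumptions; the repository's exact output format is not shown in this diff):

```python
def fmt_ts(seconds):
    """Format a time offset in seconds as [MM:SS.s] using divmod."""
    minutes, secs = divmod(seconds, 60)
    return f"[{int(minutes):02d}:{secs:04.1f}]"

print(fmt_ts(125.3))  # [02:05.3]
print(fmt_ts(0))      # [00:00.0]
```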