{% extends "base.html" %} {% block title %}Studio - AISBF Dashboard{% endblock %} {% block extra_css %} {% endblock %} {% block content %}

CoderAI Studio

Select a functionality and bind the required models in the sidebar

Enter to send · Shift+Enter for newline
Image will appear here
Edited image will appear here
Result will appear here
Upscaled image will appear here
Depth map will appear here
Segmented image will appear here
Result will appear here
Deblurred image will appear here
Restored image will appear here
Result will appear here
3D output will appear here
Rendered image will appear here
{# vid_ctrl and vid_postproc are injected by JS below #}
Video will appear here
Characters & Environment
Animated video will appear here
Characters & Environment
Transformed video will appear here
Characters & Environment
Add audio
Subtitles
Post-processing
Video will appear here
Interpolated video will appear here
Result will appear here
Voice identity, timing, emotion, and singing preservation depend on model support and may fall back to a simpler dubbed track.
Dubbed video will appear here
Upscaled video will appear here
3D video will appear here
Turntable video will appear here
Generated audio will appear here
Audio will appear here
Save voice profile
Cloned voice audio will appear here

Converts only the timbre of the source audio to match the target voice, preserving pitch, melody, rhythm and expression. Use Singing mode for music and songs.

Converted audio will appear here
Transcript will appear here
Current studio behavior: this panel first transcribes the audio with the existing speech-to-text models, then returns a partial workflow plan for lyric localization that states plainly which steps are not yet supported.
Fallback path: transcribe the lyrics, translate or adapt them, then either resynthesize guide vocals or run Voice Convert on isolated vocals, and recombine the result outside this panel. Rhythm-safe lyric adaptation and automatic mix replacement are not yet available here.
Transcript, limitations, and next-step guidance will appear here.
Current studio behavior: Studio transcribes the audio first and builds the understanding request from that transcript. If a text/chat model is also selected, Studio can ask it to summarize or reason over the transcript.
Fallback path: if no text/chat model is selected, this panel still returns the transcript so you can continue reasoning manually in Chat.
Transcript-backed understanding will appear here.
Native backend: Studio now sends a real /v1/audio/stems request. The current backend uses ffmpeg-only heuristics, so outputs are best-effort estimates rather than production-grade ML demixing.
Quality note: vocals/instrumental works best when vocals are center-panned. The 4-stem and drums/bass/other modes are broad frequency-group splits, not clean source-isolated stems.
Separated stem artifacts will appear here.
Native backend: Studio now sends a real /v1/audio/cleanup request. The current backend chains standard ffmpeg filters, so it can improve many recordings but is not a substitute for deep restoration models.
Fallback path: if the result is still not intelligible, use Transcribe to assess what survived and switch to dedicated external restoration before dubbing or conversion.
Cleaned audio and backend limitations will appear here.
Diagnostics
Evaluated from the current model type, declared capabilities, and existing Studio fallback rules.
No diagnostics yet.
Session artifact history
Image → Video Pipeline
Turn a text prompt into a generated still, animate it into motion, and optionally add soundtrack cues.
Generate image → Animate → Audio (opt)
image · video · soundtrack
Add audio to video
Video → Dub + Subtitle
Transcribe spoken dialogue, translate it, synthesize replacement speech, and optionally burn subtitles into the localized video.
Transcribe → Translate → TTS dub → Burn subs
video · localization · subtitles
Full Story
Draft a narrated multi-scene story, generate matching visuals, animate them, and add spoken narration in one pass.
LLM script → Images → Video → TTS
storytelling · image · video · tts
Audio/Video Dub with Voice Clone
Create a dubbed replacement track while trying to preserve the original speaker identity through reference-based voice cloning.
Transcribe → Clone voice → Replace audio
voice clone · dub · video
Face Swap
Transfer a source face into a target image or video result while keeping the rest of the scene intact.
Source face → Target → Result
image · video · identity
Voice Clone (TTS)
Synthesize fresh speech from text using a saved or uploaded voice reference as the target speaker style.
Reference audio → Text → Cloned speech
tts · voice clone · speech

Synthesize new speech in a cloned voice. Good for dubbing scripts.

Voice Convert (SVC)
Convert the source timbre to a new speaker while preserving pitch, rhythm, and musical phrasing for speech or singing.
Source audio → Target timbre → Converted audio
audio · conversion · singing

Convert timbre while preserving pitch, melody and expression. Use Singing mode for music.

+ Build New Pipeline
Add steps → Configure → Save & Run
Embedding vectors will appear here
Switch to this tab to load generated files.
New Character
Character Profiles
Loading…
New Environment
Environment Profiles
Loading…
Extract Voice
Voice Profiles
Loading…
Requires TripoSR (image) or Shap-E (text/image). Falls back to a depth-based mesh when neither is installed.
GLB model will be ready for download
3D output will appear here
3D video will appear here
Render will appear here
{% endblock %} {% block extra_js %} {% endblock %}