Video to SRT: extract subtitle files from any video, and attach them back to any editor
The complete workflow — video → .srt → video. AI extraction in minutes, honest timing caveats, and the exact FFmpeg command (or NLE step) for attaching the .srt back.
Turn any video into a .srt subtitle file with timestamps in 1–3 minutes. Then attach it back to your video as a soft-sub track (viewer can toggle) or burn it into the pixels (permanently visible for TikTok / Reels / Shorts). DeluxeScribe extracts .srt with 99-language auto-detection and exports .srt, .vtt, .docx, PDF, or JSON with word-level timestamps. 60 minutes free, no credit card. Below: the extraction workflow, the AI timing accuracy caveat every vendor hides, and the exact FFmpeg command (or Premiere / DaVinci / Final Cut / CapCut / iMovie step) for attaching the .srt back.
- 60 minutes free
- No credit card
- 99 languages
- Speaker labels
Last verified July 4, 2026
TL;DR — pick your path
| Your situation | Best path |
|---|---|
| I just want .srt from a video | Upload workflow (DeluxeScribe) |
| I already have .srt, need to attach | FFmpeg mux command |
| I want subtitles permanently visible | FFmpeg burn-in command |
| Toggle-able captions on YouTube/Vimeo | Soft-sub + platform upload |
| I’m on Premiere / DaVinci / FCP / CapCut | Editor-specific steps |
| Broadcast-standard captions (Netflix, BBC) | Human captioner |
| I own the YouTube video | YouTube Studio auto-caption download |
| Sensitive content (can’t upload) | Self-hosted Whisper |
Part 1 — Video to .srt (extraction)
The workflow, in 5 steps. Same as any cloud transcription service, with two subtitle-specific notes at the end.
- Sign up (60 minutes free, no credit card).
- Upload the video file. MP4, MOV, WebM, MKV, AVI, FLV, WMV, M4V all accepted. Up to 5 GB per file: roughly a 4K 30 fps video at 20–30 minutes, or a 1080p video at 4–6 hours. No need to extract audio first (see FFmpeg tip below if the file is too big).
- Language auto-detects. Leave it unless you want to force a specific dialect. Turn speaker labels off: for subtitle work they usually get in the way (they appear as Speaker 1: prefixes in each cue). Turn them on only if the video is an interview or discussion where attribution matters.
- Wait 1–3 minutes for typical files (30–60 min video). Longer files scale roughly linearly.
- Export as .srt. In the editor, click Export → SRT. Also available: .vtt for web video, .docx for editing, .json for word-level timestamps (useful if you’re doing custom segmentation).
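As a rough illustration of the custom segmentation mentioned in the export step, the sketch below packs word-level timestamps into cues. The JSON schema is not documented here, so the input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption, as are the helper names and the 0.6-second pause threshold.

```python
from typing import Dict, List

MAX_CUE_CHARS = 84  # assumed budget: 2 lines x 42 characters per cue
MAX_GAP_S = 0.6     # assumed pause length (seconds) that forces a new cue


def _close(words: List[Dict]) -> Dict:
    """Collapse a run of word dicts into one cue dict."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["word"] for w in words),
    }


def group_words(words: List[Dict], max_chars: int = MAX_CUE_CHARS,
                max_gap: float = MAX_GAP_S) -> List[Dict]:
    """Greedily pack words into cues, splitting on length or on a long pause."""
    cues: List[Dict] = []
    current: List[Dict] = []
    for w in words:
        if current:
            # Length of the cue text if this word were appended.
            text_len = len(" ".join(x["word"] for x in current)) + 1 + len(w["word"])
            gap = w["start"] - current[-1]["end"]
            if text_len > max_chars or gap > max_gap:
                cues.append(_close(current))
                current = []
        current.append(w)
    if current:
        cues.append(_close(current))
    return cues
```

A greedy pass like this keeps cues within a character budget but does not choose line breaks at syntactic boundaries; that still needs a subtitle editor or a human pass.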
Generate .srt from your video in minutes
60 minutes free, no credit card. Accepts MP4, MOV, WebM, MKV and all common formats up to 5 GB. Exports .srt, .vtt, .docx, PDF, and JSON with word-level timestamps.
If your video is larger than 5 GB
Two options. Split with FFmpeg:

    ffmpeg -i input.mp4 -f segment -segment_time 3600 -c copy part_%03d.mp4

Produces roughly one file per hour (with -c copy, FFmpeg can only cut at keyframes, so segment lengths vary slightly). Upload each separately, then concatenate the resulting .srt files (with timestamp offsets; see the segment-splitting section of our SRT generator page for the offset math).
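The offset step can be sketched in a few lines of Python. This is a minimal illustration, not the method from the SRT generator page: `shift_srt` and `concat_srt` are hypothetical helper names, and the sketch assumes each per-segment .srt starts at 00:00:00 and that you supply each segment's measured duration in milliseconds.

```python
import re
from typing import List

TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(match: re.Match) -> int:
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def _fmt(total_ms: int) -> str:
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Add offset_ms to every timestamp on the '-->' timing lines."""
    out = []
    for line in srt_text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _fmt(_to_ms(m) + offset_ms), line)
        out.append(line)
    return "\n".join(out)


def concat_srt(parts: List[str], durations_ms: List[int]) -> str:
    """Join per-segment SRT texts; durations_ms[i] is the length of segment i."""
    blocks, offset, index = [], 0, 1
    for text, duration in zip(parts, durations_ms):
        shifted = shift_srt(text.strip(), offset)
        for block in re.split(r"\n\s*\n", shifted):
            lines = block.splitlines()
            if len(lines) < 2:
                continue  # skip empty or malformed blocks
            lines[0] = str(index)  # renumber cues sequentially
            blocks.append("\n".join(lines))
            index += 1
        offset += duration
    return "\n\n".join(blocks) + "\n"
```

Because stream-copied segments are cut at keyframes, passing the real duration of each part (rather than a flat 3,600,000 ms) keeps later cues from drifting.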
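Or extract just the audio and upload that instead (transcription only needs the audio track):

    ffmpeg -i input.mp4 -vn -acodec copy audio.m4a

Audio is typically 5–20× smaller than the source video, so a 10 GB video becomes a 500 MB–2 GB audio file that uploads easily. The stream copy assumes the source audio is AAC, the usual case for MP4; if FFmpeg rejects the codec for .m4a, re-encode with -c:a aac instead of copying.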
Free extraction paths
Three legitimate free paths, each fits a different case:
1. YouTube Studio auto-caption download (for videos you own)
If you own the video and can wait 5–30 minutes for YouTube to process:
- Upload to YouTube (can be unlisted or private — no need to publish).
- Wait 5–30 minutes for auto-caption processing. Longer videos take longer.
- YouTube Studio → your video → Subtitles → hover over the English (auto-generated) track → three-dot menu → Download.
- Choose .vtt or .sbv. YouTube doesn’t export .srt natively; convert .vtt → .srt with any free tool (Subtitle Edit does it in one click).
Quality: good for clear English speech, mediocre on heavily accented or non-English content. Free for any video you can upload to YouTube.
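If you would rather script the .vtt → .srt step than open a tool, a minimal sketch could look like the following. It assumes simple WebVTT input and is deliberately lossy: cue settings, NOTE/STYLE blocks, and inline tags are dropped. `vtt_to_srt` is a hypothetical helper name, not part of any library.

```python
import re

VTT_TIME = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def _srt_time(match: re.Match) -> str:
    hours = match.group(1) or "00"  # WebVTT may omit the hours field
    return f"{hours}:{match.group(2)}:{match.group(3)},{match.group(4)}"


def vtt_to_srt(vtt_text: str) -> str:
    """Convert simple WebVTT cues to SRT, renumbering cues from 1."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        timing_idx = next((i for i, l in enumerate(lines) if "-->" in l), None)
        if timing_idx is None:
            continue  # header, NOTE, STYLE, or REGION block
        start, _, rest = lines[timing_idx].partition("-->")
        end = rest.split()[0]  # discard cue settings such as "align:start"
        timing = (f"{VTT_TIME.sub(_srt_time, start.strip())} --> "
                  f"{VTT_TIME.sub(_srt_time, end)}")
        # Strip inline tags like <v Name>, <c>, and <00:00:01.000>.
        body = [re.sub(r"<[^>]+>", "", l) for l in lines[timing_idx + 1:]]
        cues.append(f"{len(cues) + 1}\n{timing}\n" + "\n".join(body))
    return "\n\n".join(cues) + "\n"
```

The two formats differ mainly in the header line, the decimal separator (`.` versus `,`), and the mandatory hours field in SRT, which is why the conversion is mostly a timestamp rewrite.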
2. Self-hosted Whisper (private, no upload)
Install once, then process any video locally:

    pip install openai-whisper

Then:

    whisper input.mp4 --model large-v3 --output_format srt

Free forever, no upload, no size limit, works on any video with audio. Speed:
- CPU: 10–30× slower than real time (a 1-hour video takes 10–30 hours on a typical laptop CPU)
- Apple Silicon (M1/M2/M3/M4): near real-time with MLX Whisper
- NVIDIA GPU (RTX 3060+): real-time to 2× faster than real time
Fits: sensitive content that can’t leave your machine (court video, PHI, unreleased marketing), one-time transcription of a large batch, or if you want to learn the underlying model. Downside: no speaker labels out of the box, no browser editor, technical setup.
3. DeluxeScribe free tier
60 minutes of full-app credit, one-time, no credit card. Fits: creators trying the workflow before committing, or occasional users with less than an hour of video to transcribe. Same output quality as paid.
AI timing accuracy — the caveat every vendor hides
This is the section every SERP page skips because it complicates their marketing. The distinction that matters:
- Word-level timestamps. Whisper is accurate to ~200ms per word on clean speech. That’s professional-grade timing.
- Segment (caption cue) boundaries. Whisper groups words into “sentences” using pause detection and heuristics. These boundaries don’t respect professional captioning rules.
Professional captioning standards (BBC, Netflix, streaming platforms with QC) require:
- Max 42 characters per line (Netflix standard; BBC allows 37–39 for broadcast)
- Max 2 lines per cue
- Min 1 second display time per cue
- Max 17 characters per second reading speed (adult content; 20 CPS for shorter durations)
- Line breaks at natural syntactic boundaries (not mid-clause)
AI-generated .srt cues routinely violate all of these. Practical implications:
| Use case | AI .srt as-is? | Cleanup time (30-min video) |
|---|---|---|
| YouTube upload | Yes | 0-5 min |
| Vimeo / personal use | Yes | 0-5 min |
| TikTok / Reels / Shorts burn-in | Yes | 0-10 min (adjust for on-screen fit) |
| Corporate training / marketing | Sometimes | 10-20 min |
| Streaming platform QC (Netflix, HBO, Apple TV+) | No | 30-60 min or send to captioner |
| Broadcast (BBC iPlayer) | No | Send to captioner (see “When a human captioner fits better”) |
For serious cleanup work, use Subtitle Edit (free, Windows/Mac/Linux) or Aegisub (free, cross-platform). Both have built-in CPS validators and line-break tooling. See our SRT generator page for the full segmentation rules.
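For a quick pre-check before opening one of those tools, the limits listed earlier in this section can be tested with a short script. This is a sketch under stated assumptions: `check_srt` is a hypothetical helper, the defaults mirror the numbers quoted in this guide (42 characters, 2 lines, 1 second, 17 CPS), and characters are counted with spaces included, which not every style guide does.

```python
import re
from typing import List, Tuple

CUE_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(srt_text: str, max_line: int = 42, max_lines: int = 2,
              min_ms: int = 1000, max_cps: float = 17.0) -> List[Tuple[int, str]]:
    """Return (cue_position, problem) pairs for cues that break the limits."""
    problems: List[Tuple[int, str]] = []
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for position, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        timing_idx = next(
            (i for i, l in enumerate(lines) if CUE_TIMING.search(l)), None
        )
        if timing_idx is None:
            continue  # not a cue block
        g = CUE_TIMING.search(lines[timing_idx]).groups()
        duration_ms = _ms(*g[4:]) - _ms(*g[:4])
        text = [l for l in lines[timing_idx + 1:] if l.strip()]
        chars = sum(len(l) for l in text)
        if len(text) > max_lines:
            problems.append((position, f"{len(text)} lines"))
        for l in text:
            if len(l) > max_line:
                problems.append((position, f"line has {len(l)} characters"))
        if duration_ms < min_ms:
            problems.append((position, f"shown for {duration_ms} ms"))
        if duration_ms > 0 and chars * 1000 / duration_ms > max_cps:
            problems.append((position, f"{chars * 1000 / duration_ms:.1f} CPS"))
    return problems
```

The script only flags violations; it does not fix them or judge where line breaks should fall.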
Part 2 — .srt to video (attach)
You have an .srt. Now you need to attach it back to the video. Two fundamentally different ways:
- Soft-sub (muxed track). The .srt lives alongside the video in an MP4 or MKV container. Viewer can toggle captions on/off. YouTube, Vimeo, VLC, QuickTime, and most modern players honor this. Doesn’t touch video pixels.
- Burn-in (hardsub). The .srt is baked permanently into the video pixels. Always visible, no toggle. Required for platforms that strip caption tracks (TikTok, Reels, LinkedIn video). Reversible only by re-editing from the source video.
The chooser is in the next section; the FFmpeg soft-sub and burn-in commands follow it; NLE-specific steps are under “Attach in your video editor”.
Soft-sub vs burn-in — which one?
Choose by target platform, not by preference:
| Platform / use | Soft-sub | Burn-in | Note |
|---|---|---|---|
| YouTube upload | ✓ | Skip | Upload .srt alongside video |
| Vimeo | ✓ | Skip | Same as YouTube |
| TikTok | ✗ stripped | ✓ | Strips caption tracks on upload |
| Instagram Reels | ✗ stripped | ✓ | Same as TikTok |
| YouTube Shorts | Auto-caption available | Recommended | Burn-in for style control |
| LinkedIn video | ✗ auto-mangled | ✓ | Native captions unreliable |
| Twitter/X video | ✗ | ✓ | No caption track support |
| Facebook video | Mostly ✗ | ✓ | Native captions inconsistent |
| Course platform (Kajabi, Teachable) | Check platform | Check platform | Varies; test one video first |
| Broadcast / OTT delivery | Spec-dependent | Spec-dependent | Follow delivery specification |
| Local playback (VLC, QuickTime) | ✓ | Skip | Both players support .srt sidecar |
| Archive for future editing | ✓ | Skip | Keep source clean |
Rule of thumb: soft-sub for platforms that respect caption tracks (YouTube, Vimeo, streaming, local playback). Burn-in for social platforms that strip them (TikTok, Reels, LinkedIn, X, Facebook).
FFmpeg — soft-sub (mux) command
Add an .srt track to an MP4 without re-encoding the video or audio:

    ffmpeg -i input.mp4 -i subs.srt -c copy -c:s mov_text output.mp4

Flags explained:
- -i input.mp4: the source video
- -i subs.srt: the subtitle file
- -c copy: copy video and audio streams without re-encoding (fast, lossless)
- -c:s mov_text: encode the subtitle stream as MP4-compatible text codec
- output.mp4: muxed output
Use MKV instead of MP4 for better subtitle support
MP4 subtitle support is limited (only mov_text codec). MKV supports SRT natively without conversion:

    ffmpeg -i input.mp4 -i subs.srt -c copy output.mkv

MKV is the safer container if you’re publishing to desktop players (VLC handles both perfectly). MP4 is required if you’re uploading to platforms that reject MKV (YouTube converts fine either way; some older systems don’t).
Multi-language soft-sub
Add multiple .srt tracks with language metadata:

    ffmpeg -i input.mp4 -i subs.en.srt -i subs.es.srt -i subs.ja.srt \
      -map 0 -map 1 -map 2 -map 3 \
      -c copy -c:s mov_text \
      -metadata:s:s:0 language=eng \
      -metadata:s:s:1 language=spa \
      -metadata:s:s:2 language=jpn \
      output.mp4

Viewer can select which language to display. Useful for international content, course platforms, or any video with a translation workflow.
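When the number of languages varies per video, it can be easier to build the argument list in code than to edit the command by hand. The sketch below is one way to do that; `build_mux_command` is a hypothetical helper, and it assumes the source video has no subtitle streams of its own (otherwise the `s:s:N` indices would shift) and that the output is MP4.

```python
from typing import Dict, List


def build_mux_command(video: str, subtitles: Dict[str, str], output: str) -> List[str]:
    """Build an ffmpeg argv list; subtitles maps language codes to .srt paths."""
    cmd = ["ffmpeg", "-i", video]
    for path in subtitles.values():
        cmd += ["-i", path]
    # Input 0 is the video; inputs 1..N are the subtitle files, in dict order.
    for input_index in range(len(subtitles) + 1):
        cmd += ["-map", str(input_index)]
    cmd += ["-c", "copy", "-c:s", "mov_text"]
    # Tag each subtitle stream with its language code (eng, spa, jpn, ...).
    for track, lang in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{track}", f"language={lang}"]
    cmd.append(output)
    return cmd
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with file names that contain spaces.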
FFmpeg — burn-in command
Render subtitles permanently into the video pixels:

    ffmpeg -i input.mp4 -vf "subtitles=subs.srt" -c:a copy output.mp4

Flags explained:
- -vf "subtitles=subs.srt": video filter that renders .srt into pixels
- -c:a copy: don’t re-encode audio (only video is re-encoded, since burn-in modifies pixels)
The output video quality is set by FFmpeg’s default H.264 encoder settings. For higher quality, add -crf 18 (visually lossless) or -crf 23 (default, good balance).
Custom font, size, and background (recommended)
Default FFmpeg subtitle rendering uses a small font with a thin outline, which is hard to read on complex video. Fix with force_style:

    ffmpeg -i input.mp4 \
      -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,BorderStyle=3'" \
      -c:a copy output.mp4

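BorderStyle=3 draws a background box behind the text, which gives the best legibility on complex video. PrimaryColour and OutlineColour use BGR hex format (not RGB) prefixed with &H. As written the box is solid; the colour values also accept a leading alpha byte (&HAABBGGRR, where 00 is opaque and FF is fully transparent) if you want it semi-transparent.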
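Because the BGR byte order is easy to get wrong, a small converter from the familiar RGB hex form can help. This is an illustrative sketch: `ass_colour` is a hypothetical helper, and it emits the eight-digit `&HAABBGGRR` form used by ASS styles, where an alpha of 0 means fully opaque.

```python
def ass_colour(rgb_hex: str, alpha: int = 0) -> str:
    """Convert 'RRGGBB' or '#RRGGBB' to the ASS form '&HAABBGGRR'.

    alpha follows the ASS convention: 0 is opaque, 255 is fully transparent.
    """
    value = rgb_hex.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected six hex digits, e.g. 'FFCC00'")
    int(value, 16)  # raises ValueError on non-hex input
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be between 0 and 255")
    rr, gg, bb = value[0:2], value[2:4], value[4:6]
    # ASS stores the channels in reverse order: alpha, blue, green, red.
    return f"&H{alpha:02X}{bb}{gg}{rr}".upper()
```

For example, a yellow written as `#FFCC00` in CSS becomes `&H0000CCFF`, which can be pasted into `PrimaryColour=` inside `force_style`.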
Non-Latin scripts (Japanese, Arabic, Hindi, Thai)
Default fonts don’t cover non-Latin character sets. You must specify a font that does:

    # Japanese (Mac)
    ffmpeg -i input.mp4 \
      -vf "subtitles=subs.srt:fontsdir=/System/Library/Fonts:force_style='FontName=Hiragino Sans'" \
      -c:a copy output.mp4

    # Arabic (Linux with Noto)
    ffmpeg -i input.mp4 \
      -vf "subtitles=subs.srt:fontsdir=/usr/share/fonts:force_style='FontName=Noto Sans Arabic'" \
      -c:a copy output.mp4

If the burn-in shows empty boxes instead of characters, the font doesn’t support that script. Install a Noto or Google Fonts variant that does.
UTF-8 BOM warning
.srt files created on Windows Notepad often have a UTF-8 byte-order mark (0xEF 0xBB 0xBF) that breaks FFmpeg’s subtitle parser. Symptom: the first subtitle doesn’t appear, or all subtitles are offset. Strip the BOM:

    sed -i.bak '1s/^\xEF\xBB\xBF//' subs.srt

This form relies on GNU sed’s \x escapes; the sed that ships with macOS may not interpret them. Or save the .srt as “UTF-8 without BOM” in your text editor (VS Code, Sublime, Notepad++ all support this).
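A portable alternative that behaves the same on Windows, macOS, and Linux is to strip the three BOM bytes in Python. The sketch below rewrites the file in place; `strip_bom` is a hypothetical helper name, and it does not keep a backup copy the way `sed -i.bak` does.

```python
import codecs
from pathlib import Path


def strip_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    file = Path(path)
    data = file.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, file is left untouched
    file.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Working on raw bytes avoids re-encoding the rest of the file, so non-Latin subtitle text passes through unchanged.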
Attach in your video editor
Short steps per NLE. Menu names can shift between releases, so check your version if a path doesn’t match.
Adobe Premiere Pro
- Window → Text → Captions → Import Captions
- Choose your .srt file. Premiere imports it as a caption track on the timeline.
- Drag the caption clip to align with the audio. Use the Essential Graphics panel to change font, size, position, color.
- File → Export → Format → check “Include Captions” to embed as soft-sub, or “Burn Captions Into Video” for burn-in.
DaVinci Resolve (19+)
- File → Import → Subtitle → select .srt
- Drag from Media Pool to Video 2 track (subtitle track appears automatically on the timeline).
- Inspector → Subtitle tab → adjust font, size, position, stroke. Resolve has excellent caption controls.
- Deliver page → Video → Subtitle Settings → toggle “Burn into video” for hardsub, or leave off for soft-sub (delivered as sidecar or muxed depending on format).
Final Cut Pro (10.7+)
- File → Import → Captions
- Choose your .srt file. Captions import as a new caption role attached to your project.
- Timeline shows captions as a track. Click any caption to edit text; use the inspector to change font, position, formatting.
- File → Share → Master File → Settings → check “Embed Captions” for soft-sub. For burn-in, use the “Burn In” option in the caption role settings.
CapCut (desktop and mobile)
- Two paths: (a) Text → Auto Captions to use CapCut’s own AI transcription, or (b) Text → Import → SRT to load your .srt file.
- Style panel → font, size, color, animation. CapCut has social-media-friendly caption templates (bounce, glow, typewriter).
- Export → Advanced → toggle burn-in or embed depending on destination platform.
iMovie (Mac)
iMovie doesn’t support .srt import directly. Workaround: use the FFmpeg burn-in command from the burn-in section above to bake captions into the video first, then edit the burned video in iMovie. Or use another NLE for the caption workflow.
Kapwing / Clideo (browser-based)
- Upload video → Add Subtitles → Upload SRT
- Position and style in the browser editor
- Export (both default to burn-in; check settings for soft-sub option)
Common gotchas
- UTF-8 BOM breaks FFmpeg subtitle parser. Strip with sed or save as UTF-8 without BOM.
- Non-Latin scripts need explicit font. Default FFmpeg fonts don’t cover Japanese, Arabic, Hindi, Thai. Specify a font that does (Noto family works broadly).
- TikTok / Reels strip caption tracks. Always burn-in for these platforms.
- YouTube upload: don’t pre-embed captions if you also plan to use YouTube’s own caption system. Both systems will run and duplicate.
- MP4 doesn’t natively support .srt. FFmpeg converts to mov_text under the hood, but some older players (pre-Big Sur QuickTime) still don’t display them. MKV is safer for local playback.
- Speaker labels look ugly in subtitles. For .srt output, generate without speaker labels unless you specifically need them for accessibility.
- AI segment boundaries don’t respect CPS or line length. For professional work, run through Subtitle Edit or Aegisub, or send to a human captioner.
- “99%-accurate” AI claims are marketing. Real segment-boundary accuracy on broadcast-standard timing is much rougher than word-level timestamps.
- Timecode drift on long files. Some AI services accumulate small timing errors over multi-hour videos. Spot-check the last cue against the video before publishing.
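A coarse sanity check for the drift gotcha above is to compare the end of the last cue with the video length. The sketch below only catches cues that run past the end of the file, not gradual drift in the middle, so it complements the manual spot-check rather than replacing it. `last_cue_end` and `overruns_video` are hypothetical helpers, and the 0.5-second tolerance is an assumption.

```python
import re

CUE_END = re.compile(r"-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})")


def last_cue_end(srt_text: str) -> float:
    """Return the latest cue end time in seconds (0.0 if no cues are found)."""
    ends = [
        int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        for h, m, s, ms in CUE_END.findall(srt_text)
    ]
    return max(ends, default=0.0)


def overruns_video(srt_text: str, video_seconds: float,
                   tolerance: float = 0.5) -> bool:
    """True when the last cue ends after the video does, beyond the tolerance."""
    return last_cue_end(srt_text) > video_seconds + tolerance
```

The video length can be read with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4` and passed in as `video_seconds`.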
When a human captioner fits better
Cases where AI-generated .srt (from DeluxeScribe or anyone else) is not the right choice:
- Broadcast / OTT delivery with QC. BBC iPlayer, Netflix, HBO Max, Apple TV+, Disney+ all require specific timing standards (line length, CPS, cue gaps) that AI doesn’t deliver reliably. Spec compliance is mandatory; a captioner is the correct choice.
- WCAG AAA compliance for legal/regulatory work (courtroom video evidence, medical patient education, government training materials).
- High-visibility marketing content where mis-captions damage brand — think Super Bowl ad, investor day keynote, CEO announcement video.
- Non-English content where the client is a native speaker of the target language. AI accuracy on non-English speech is worse than English; a native captioner catches errors AI misses.
- Low-quality audio (poor mic, heavy background noise, distant speakers). AI struggles here; humans do better with context.
Named alternatives — real vendors, no rankings, just honest options:
- Rev human tier — $1.50/audio minute plus captioner review. Fast turnaround, US-based captioners.
- 3Play Media — broadcast-standard captions; dominant vendor in US streaming and broadcast.
- GoTranscript human tier — $1.10–$2.50/audio minute; wider language coverage than most competitors.
- Verbit — specialized in education, legal, government. Higher-end pricing, AAA compliance standard.
Related guides
- SRT Generator (format explainer): The SRT format itself: anatomy, BBC timing rules, .srt vs .vtt, the 4 ways to generate ranked by use case.
- Video to Text: The broader source-agnostic pillar, with four paths for turning any video into text, including YouTube URL and FFmpeg extraction.
- Audio to SRT (audio source): Sibling companion with the same workflow for an audio source. Adds Podcasting 2.0 publishing, translation pivots, and course platform captions.
- How to Transcribe Audio (pillar): The broader pillar covering every path across sources and formats and how to pick.