Video to SRT: extract subtitle files from any video, and attach them back in any editor

The complete workflow — video → .srt → video. AI extraction in minutes, honest timing caveats, and the exact FFmpeg command (or NLE step) for attaching the .srt back.

Upload any video (MP4, MOV, WebM, MKV, or AVI, up to 5 GB) and get a .srt subtitle file with timestamps in 1–3 minutes. Then attach it back to your video as a soft-sub track (viewer can toggle) or burn it into the pixels (permanently visible for TikTok / Reels / Shorts). DeluxeScribe extracts .srt with 99-language auto-detection and exports .srt, .vtt, .docx, PDF, or JSON with word-level timestamps. 60 minutes free, no credit card. Below: the extraction workflow, the AI timing accuracy caveat every vendor hides, and the exact FFmpeg command (or Premiere / DaVinci / Final Cut / CapCut / iMovie step) for attaching the .srt back.

Last verified July 4, 2026

TL;DR — pick your path

Your situation	Best path
I just want .srt from a video	Upload workflow (DeluxeScribe)
I already have .srt, need to attach	FFmpeg mux command
I want subtitles permanently visible	FFmpeg burn-in command
Toggle-able captions on YouTube/Vimeo	Soft-sub + platform upload
I’m on Premiere / DaVinci / FCP / CapCut	Editor-specific steps
Broadcast-standard captions (Netflix, BBC)	Human captioner
I own the YouTube video	YouTube Studio auto-caption download
Sensitive content (can’t upload)	Self-hosted Whisper

Part 1 — Video to .srt (extraction)

The workflow, in 5 steps. Same as any cloud transcription service, with two subtitle-specific notes at the end.

Sign up (60 minutes free, no credit card).
Upload the video file. MP4, MOV, WebM, MKV, AVI, FLV, WMV, and M4V are all accepted, up to 5 GB per file: roughly 20–30 minutes of 4K 30 fps video, or 4–6 hours of 1080p. No need to extract audio first (see the FFmpeg tip below if the file is too big).
Language auto-detects. Leave it unless you want to force a specific dialect. Turn speaker labels off — for subtitle work they usually get in the way (they appear as Speaker 1: prefixes in each cue). Turn them on only if the video is an interview or discussion where attribution matters.
Wait 1–3 minutes for typical files (30-60 min video). Longer files scale roughly linearly.
Export as .srt. In the editor, click Export → SRT. Also available: .vtt for web video, .docx for editing, .json for word-level timestamps (useful if you’re doing custom segmentation).

If your video is larger than 5 GB

Two options. Split with FFmpeg:

ffmpeg -i input.mp4 -f segment -segment_time 3600 -c copy part_%03d.mp4

Produces one file per hour. Upload each separately, then concatenate the resulting .srt files (with timestamp offsets — see the segment-splitting section of our SRT generator page for the offset math).
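The offset step can be sketched in a few lines of Python. This is a minimal illustration, not the method from the SRT generator page: the function names are hypothetical, and it assumes you know each part's real duration in milliseconds. With -c copy, FFmpeg cuts at keyframes, so parts are rarely exactly 3600 seconds; measure them rather than assuming one hour.

```python
import re

TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def from_ms(total_ms):
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(srt_text, offset_ms, first_index=1):
    """Shift every timestamp by offset_ms and renumber cues from first_index.

    Returns (shifted_text, next_free_index).
    """
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    out, index = [], first_index
    for block in blocks:
        lines = block.splitlines()
        # lines[0] is the old cue number, lines[1] the timing line, the rest is text
        timing = TIMESTAMP.sub(
            lambda m: from_ms(to_ms(*m.groups()) + offset_ms), lines[1]
        )
        out.append("\n".join([str(index), timing, *lines[2:]]))
        index += 1
    return "\n\n".join(out), index

def merge_parts(parts):
    """parts: list of (srt_text, part_duration_ms) tuples in playback order."""
    merged, offset, index = [], 0, 1
    for srt_text, duration_ms in parts:
        shifted, index = shift_srt(srt_text, offset, index)
        if shifted:
            merged.append(shifted)
        offset += duration_ms  # the next part starts where this one ended
    return "\n\n".join(merged) + "\n"
```

Each part's cues are moved by the summed duration of the parts before it, and cue numbers run continuously across the merged file.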

Or extract just the audio and upload that instead (transcription only needs the audio track):

ffmpeg -i input.mp4 -vn -acodec copy audio.m4a

Audio is typically 5–20× smaller than the source video, so a 10 GB video becomes a 500 MB–2 GB audio file that uploads easily.

Free extraction paths

Three legitimate free paths, each fits a different case:

1. YouTube Studio auto-caption download (for videos you own)

If you own the video and can wait 5–30 minutes for YouTube to process:

Upload to YouTube (can be unlisted or private — no need to publish).
Wait 5–30 minutes for auto-caption processing. Longer videos take longer.
YouTube Studio → your video → Subtitles → hover over the English (auto-generated) track → three-dot menu → Download.
Choose .vtt or .sbv. YouTube doesn’t export .srt natively; convert .vtt → .srt with any free tool (Subtitle Edit does it in one click).
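If you would rather script the .vtt → .srt step than open a GUI tool, the conversion is small enough to sketch in Python. This is a simplified illustration with a hypothetical function name: it handles plain WebVTT cues, drops cue settings and inline tags, and does not de-duplicate rolling lines that some auto-generated caption files may contain.

```python
import re

VTT_TIME = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")

def _to_srt_time(match):
    hours, minutes, seconds, millis = match.groups()
    # WebVTT may omit the hours field; SRT always needs it, and uses a comma
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"

def vtt_to_srt(vtt_text):
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # Blocks without a timing line (WEBVTT header, NOTE, STYLE) are skipped
        timing_at = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_at is None:
            continue
        start, _, rest = lines[timing_at].partition("-->")
        rest_parts = rest.split()
        if not rest_parts:
            continue
        end = rest_parts[0]  # anything after the end time is a cue setting
        timing = (
            f"{VTT_TIME.sub(_to_srt_time, start.strip())} --> "
            f"{VTT_TIME.sub(_to_srt_time, end)}"
        )
        # Remove inline tags such as <c> or <00:00:01.000>
        text = [re.sub(r"<[^>]+>", "", line) for line in lines[timing_at + 1:]]
        text = [line for line in text if line.strip()]
        if text:
            cues.append((timing, text))
    return "\n\n".join(
        "\n".join([str(n), timing, *text]) for n, (timing, text) in enumerate(cues, 1)
    ) + "\n"
```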

Quality: good for clear English speech, mediocre on heavily accented or non-English content. Free for any video you can upload to YouTube.

2. Self-hosted Whisper (private, no upload)

Install once, then process any video locally:

pip install openai-whisper

Then:

whisper input.mp4 --model large-v3 --output_format srt

Free forever, no upload, no size limit, works on any video with audio. Speed:

CPU: 10–30× slower than real-time (a 1-hour video takes 10–30 hours on a typical laptop CPU)
Apple Silicon (M1/M2/M3/M4): near real-time with MLX Whisper
NVIDIA GPU (RTX 3060+): real-time to 2× faster than real-time

Fits: sensitive content that can’t leave your machine (court video, PHI, unreleased marketing), one-time transcription of a large batch, or if you want to learn the underlying model. Downside: no speaker labels out of the box, no browser editor, technical setup.
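If you call Whisper from Python instead of the CLI (for example, to post-process cues before writing the file), the result's "segments" list carries start, end, and text for each cue. A minimal sketch of turning that list into .srt text follows; the helper names are hypothetical, and the Whisper call itself is shown only as a comment so the snippet runs without the model installed.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments):
    """segments: iterable of dicts with 'start', 'end' (seconds) and 'text'."""
    cues = []
    for number, seg in enumerate(segments, 1):
        cues.append(
            f"{number}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(cues) + "\n"

# Usage with openai-whisper (not executed here):
#   import whisper
#   model = whisper.load_model("large-v3")
#   result = model.transcribe("input.mp4")
#   with open("input.srt", "w", encoding="utf-8") as f:
#       f.write(segments_to_srt(result["segments"]))
```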

3. DeluxeScribe free tier

60 minutes of full-app credit, one-time, no credit card. Fits: creators trying the workflow before committing, or occasional users with less than an hour of video to transcribe. Same output quality as paid.

AI timing accuracy — the caveat every vendor hides

This is the section every SERP page skips because it complicates their marketing. The distinction that matters:

Word-level timestamps. Whisper is accurate to ~200ms per word on clean speech. That’s professional-grade timing.
Segment (caption cue) boundaries. Whisper groups words into “sentences” using pause detection and heuristics. These boundaries don’t respect professional captioning rules.

Professional captioning standards (BBC, Netflix, streaming platforms with QC) require:

Max 42 characters per line (Netflix standard; BBC allows 37–39 for broadcast)
Max 2 lines per cue
Min 1 second display time per cue
Max 17 characters per second reading speed (adult content; 20 CPS for shorter durations)
Line breaks at natural syntactic boundaries (not mid-clause)

AI-generated .srt cues routinely violate all of these. Practical implications:

Use case	AI .srt as-is?	Cleanup time (30-min video)
YouTube upload	Yes	0-5 min
Vimeo / personal use	Yes	0-5 min
TikTok / Reels / Shorts burn-in	Yes	0-10 min (adjust for on-screen fit)
Corporate training / marketing	Sometimes	10-20 min
Streaming platform QC (Netflix, HBO, Apple TV+)	No	30-60 min or send to captioner
Broadcast (BBC iPlayer)	No	Send to captioner (see “When a human captioner fits better”)

For serious cleanup work, use Subtitle Edit (free, Windows/Mac/Linux) or Aegisub (free, cross-platform). Both have built-in CPS validators and line-break tooling. See our SRT generator page for the full segmentation rules.
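For a quick first pass before opening a full editor, the rules listed above can be checked with a short script. This is a rough sketch, not a replacement for the validators in Subtitle Edit or Aegisub: the function name is hypothetical, the defaults mirror the limits quoted in this section, and it counts every character (spaces included) toward reading speed, which is an assumption rather than any one spec's exact rule.

```python
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)

def check_cues(srt_text, max_chars=42, max_lines=2, min_seconds=1.0, max_cps=17.0):
    """Return (cue_position, problem) tuples for cues that break the limits.

    cue_position is the cue's position in the file, starting at 1.
    """
    problems = []
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for number, block in enumerate((b for b in blocks if b.strip()), 1):
        lines = block.split("\n")
        timing_at = next(
            (i for i, line in enumerate(lines) if TIMING.search(line)), None
        )
        if timing_at is None:
            problems.append((number, "no timing line"))
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(
            int, TIMING.search(lines[timing_at]).groups()
        )
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        duration = end - start
        text = lines[timing_at + 1:]
        if len(text) > max_lines:
            problems.append((number, f"{len(text)} lines (max {max_lines})"))
        longest = max((len(line) for line in text), default=0)
        if longest > max_chars:
            problems.append((number, f"line of {longest} chars (max {max_chars})"))
        if duration < min_seconds:
            problems.append((number, f"shown {duration:.2f}s (min {min_seconds}s)"))
        chars = sum(len(line) for line in text)
        if duration > 0 and chars / duration > max_cps:
            problems.append((number, f"{chars / duration:.1f} cps (max {max_cps})"))
    return problems
```

An empty result means every cue passed these four numeric checks; line breaks at syntactic boundaries still need a human eye.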

Part 2 — .srt to video (attach)

You have an .srt. Now you need to attach it back to the video. Two fundamentally different ways:

Soft-sub (muxed track). The .srt lives alongside the video in an MP4 or MKV container. Viewer can toggle captions on/off. YouTube, Vimeo, VLC, QuickTime, and most modern players honor this. Doesn’t touch video pixels.
Burn-in (hardsub). The .srt is baked permanently into the video pixels. Always visible, no toggle. Required for platforms that strip caption tracks (TikTok, Reels, LinkedIn video). Reversible only by re-editing from the source video.

The chooser is in the next section; the FFmpeg soft-sub and burn-in commands follow it, and editor-specific steps are under “Attach in your video editor”.

Soft-sub vs burn-in — which one?

Choose by target platform, not by preference:

Platform / use	Soft-sub	Burn-in	Note
YouTube upload	✓	Skip	Upload .srt alongside video
Vimeo	✓	Skip	Same as YouTube
TikTok	✗ stripped	✓	Strips caption tracks on upload
Instagram Reels	✗ stripped	✓	Same as TikTok
YouTube Shorts	Auto-caption available	Recommended	Burn-in for style control
LinkedIn video	✗ auto-mangled	✓	Native captions unreliable
Twitter/X video	✗	✓	No caption track support
Facebook video	Mostly ✗	✓	Native captions inconsistent
Course platform (Kajabi, Teachable)	Check platform	Check platform	Varies; test one video first
Broadcast / OTT delivery	Spec-dependent	Spec-dependent	Follow delivery specification
Local playback (VLC, QuickTime)	✓	Skip	Both players support .srt sidecar
Archive for future editing	✓	Skip	Keep source clean

Rule of thumb: soft-sub for platforms that respect caption tracks (YouTube, Vimeo, streaming, local playback). Burn-in for social platforms that strip them (TikTok, Reels, LinkedIn, X, Facebook).

FFmpeg — soft-sub (mux) command

Add an .srt track to an MP4 without re-encoding the video or audio:

ffmpeg -i input.mp4 -i subs.srt -c copy -c:s mov_text output.mp4

Flags explained:

-i input.mp4 — the source video
-i subs.srt — the subtitle file
-c copy — copy video and audio streams without re-encoding (fast, lossless)
-c:s mov_text — encode the subtitle stream as MP4-compatible text codec
output.mp4 — muxed output

Use MKV instead of MP4 for better subtitle support

MP4 subtitle support is limited (only mov_text codec). MKV supports SRT natively without conversion:

ffmpeg -i input.mp4 -i subs.srt -c copy output.mkv

MKV is the safer container if you’re publishing to desktop players (VLC handles both perfectly). MP4 is required if you’re uploading to platforms that reject MKV (YouTube converts fine either way; some older systems don’t).

Multi-language soft-sub

Add multiple .srt tracks with language metadata:

ffmpeg -i input.mp4 -i subs.en.srt -i subs.es.srt -i subs.ja.srt \
  -map 0 -map 1 -map 2 -map 3 \
  -c copy -c:s mov_text \
  -metadata:s:s:0 language=eng \
  -metadata:s:s:1 language=spa \
  -metadata:s:s:2 language=jpn \
  output.mp4

Viewer can select which language to display. Useful for international content, course platforms, or any video with a translation workflow.
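When the number of languages varies per video, it can be easier to build the mux command programmatically than to edit it by hand. The sketch below assembles the same flags shown above into an argument list; the function name is hypothetical, and it assumes the source video has no subtitle streams of its own (otherwise the -metadata:s:s:N indexes would be off).

```python
from pathlib import Path

def build_softsub_command(video, subtitle_tracks, output):
    """Build an ffmpeg argument list that muxes .srt files as soft-sub tracks.

    subtitle_tracks: list of (srt_path, language_code) tuples, e.g. ("subs.en.srt", "eng").
    """
    cmd = ["ffmpeg", "-i", str(video)]
    for srt_path, _ in subtitle_tracks:
        cmd += ["-i", str(srt_path)]
    # Map the video file (input 0) plus every subtitle input (1, 2, ...)
    for input_index in range(len(subtitle_tracks) + 1):
        cmd += ["-map", str(input_index)]
    cmd += ["-c", "copy"]
    # MP4 needs the mov_text codec; MKV carries SRT as-is, so copy is enough there
    if Path(str(output)).suffix.lower() in {".mp4", ".m4v"}:
        cmd += ["-c:s", "mov_text"]
    for track_index, (_, language) in enumerate(subtitle_tracks):
        cmd += [f"-metadata:s:s:{track_index}", f"language={language}"]
    cmd.append(str(output))
    return cmd

# To run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(build_softsub_command("input.mp4", [("subs.en.srt", "eng")], "output.mp4"), check=True)
```

Passing the arguments as a list avoids shell-quoting problems with file names that contain spaces.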

FFmpeg — burn-in command

Render subtitles permanently into the video pixels:

ffmpeg -i input.mp4 -vf "subtitles=subs.srt" -c:a copy output.mp4

Flags explained:

-vf "subtitles=subs.srt": video filter that renders the .srt into the pixels
-c:a copy: don’t re-encode audio (only video is re-encoded, since burn-in modifies pixels)

The output video quality is set by FFmpeg’s default H.264 encoder settings. For higher quality, add -crf 18 (visually lossless) or -crf 23 (default, good balance).

Custom font, size, and background (recommended)

Default FFmpeg subtitle rendering uses a small font with a thin outline — hard to read on complex video. Fix with force_style:

ffmpeg -i input.mp4 \
  -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,BorderStyle=3'" \
  -c:a copy output.mp4

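BorderStyle=3 draws a filled background box behind the text instead of an outline, which gives the best legibility on complex video. With libass the box takes its colour from OutlineColour. PrimaryColour and OutlineColour use BGR hex order (not RGB), prefixed with &H; an optional leading alpha byte (&HAABBGGRR, where 00 is opaque and FF is fully transparent) makes the box semi-transparent.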
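Because the byte order is reversed from the familiar web hex notation, brand colours are easy to get wrong. A small helper like the following (hypothetical name, shown as an illustration) converts #RRGGBB into the &H form that force_style expects:

```python
def rgb_to_ass(hex_rgb, alpha=0):
    """Convert '#RRGGBB' to an ASS colour string '&HAABBGGRR'.

    alpha: 0 is fully opaque, 255 is fully transparent (ASS convention).
    """
    value = hex_rgb.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a colour like '#RRGGBB'")
    int(value, 16)  # raises ValueError if the string is not valid hex
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be between 0 and 255")
    red, green, blue = value[0:2], value[2:4], value[4:6]
    # ASS stores the channels in reverse order: alpha, blue, green, red
    return f"&H{alpha:02X}{blue.upper()}{green.upper()}{red.upper()}"
```

For example, a web orange of #FFCC00 becomes &H0000CCFF, and passing alpha=0x80 yields a roughly half-transparent colour for the background box.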

Non-Latin scripts (Japanese, Arabic, Hindi, Thai)

Default fonts don’t cover non-Latin character sets. You must specify a font that does:

# Japanese (Mac)
ffmpeg -i input.mp4 \
  -vf "subtitles=subs.srt:fontsdir=/System/Library/Fonts:force_style='FontName=Hiragino Sans'" \
  -c:a copy output.mp4

# Arabic (Linux with Noto)
ffmpeg -i input.mp4 \
  -vf "subtitles=subs.srt:fontsdir=/usr/share/fonts:force_style='FontName=Noto Sans Arabic'" \
  -c:a copy output.mp4

If the burn-in shows empty boxes instead of characters, the font doesn’t support that script. Install a Noto or Google Fonts variant that does.

UTF-8 BOM warning

.srt files created on Windows Notepad often have a UTF-8 byte-order mark (0xEF 0xBB 0xBF) that breaks FFmpeg’s subtitle parser. Symptom: the first subtitle doesn’t appear, or all subtitles are offset. Strip the BOM:

sed -i.bak '1s/^\xEF\xBB\xBF//' subs.srt

Or save the .srt as “UTF-8 without BOM” in your text editor (VS Code, Sublime, Notepad++ all support this).
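The \x escapes in the sed command are a GNU extension and may not work with the BSD sed that ships with macOS. A portable alternative is a few lines of Python; the function name here is hypothetical, and the file is rewritten in place, so keep a copy if that matters.

```python
import codecs
from pathlib import Path

def strip_bom(path):
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False if the file was already clean.
    """
    file = Path(path)
    data = file.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):  # BOM_UTF8 == b"\xef\xbb\xbf"
        return False
    file.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```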

Attach in your video editor

One-paragraph guide per NLE. Every current version is covered.

Adobe Premiere Pro

Window → Text → Captions → Import Captions
Choose your .srt file. Premiere imports it as a caption track on the timeline.
Drag the caption clip to align with the audio. Use the Essential Graphics panel to change font, size, position, color.
File → Export → Format → check “Include Captions” to embed as soft-sub, or “Burn Captions Into Video” for burn-in.

DaVinci Resolve (19+)

File → Import → Subtitle → select .srt
Drag from Media Pool to Video 2 track (subtitle track appears automatically on the timeline).
Inspector → Subtitle tab → adjust font, size, position, stroke. Resolve has excellent caption controls.
Deliver page → Video → Subtitle Settings → toggle “Burn into video” for hardsub, or leave off for soft-sub (delivered as sidecar or muxed depending on format).

Final Cut Pro (10.7+)

File → Import → Captions
Choose your .srt file. Captions import as a new caption role attached to your project.
Timeline shows captions as a track. Click any caption to edit text; use the inspector to change font, position, formatting.
File → Share → Master File → Settings → check “Embed Captions” for soft-sub. For burn-in, use the “Burn In” option in the caption role settings.

CapCut (desktop and mobile)

Two paths: (a) Text → Auto Captions to use CapCut’s own AI transcription, or (b) Text → Import → SRT to load your .srt file.
Style panel → font, size, color, animation. CapCut has social-media-friendly caption templates (bounce, glow, typewriter).
Export → Advanced → toggle burn-in or embed depending on destination platform.

iMovie (Mac)

iMovie doesn’t support .srt import directly. Workaround: use the FFmpeg burn-in command above to bake captions into the video first, then edit the burned video in iMovie. Or use another NLE for the caption workflow.

Kapwing / Clideo (browser-based)

Upload video → Add Subtitles → Upload SRT
Position and style in the browser editor
Export (both default to burn-in; check settings for soft-sub option)

Common gotchas

UTF-8 BOM breaks FFmpeg subtitle parser. Strip with sed or save as UTF-8 without BOM.
Non-Latin scripts need explicit font. Default FFmpeg fonts don’t cover Japanese, Arabic, Hindi, Thai. Specify a font that does (Noto family works broadly).
TikTok / Reels strip caption tracks. Always burn-in for these platforms.
YouTube upload: don’t pre-embed captions if you also plan to use YouTube’s own caption system. Both systems will run and duplicate.
MP4 doesn’t natively support .srt. FFmpeg converts to mov_text under the hood, but some older players (pre-Big Sur QuickTime) still don’t display them. MKV is safer for local playback.
Speaker labels look ugly in subtitles. For .srt output, generate without speaker labels unless you specifically need them for accessibility.
AI segment boundaries don’t respect CPS or line length. For professional work, run through Subtitle Edit or Aegisub, or send to a human captioner.
“99%-accurate” AI claims are marketing. Real segment-boundary accuracy on broadcast-standard timing is much rougher than word-level timestamps.
Timecode drift on long files. Some AI services accumulate small timing errors over multi-hour videos. Spot-check the last cue against the video before publishing.
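One coarse drift signal can be automated: a final cue that ends after the video does. The sketch below (hypothetical function names) reads the latest cue end time from the .srt and compares it with a duration you supply, for instance from ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4. It only catches overrun, so it complements the manual spot-check rather than replacing it.

```python
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)

def last_cue_end(srt_text):
    """Return the latest cue end time in seconds, or None if no cue is found."""
    ends = [
        int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        for *_start, h, m, s, ms in TIMING.findall(srt_text)
    ]
    return max(ends, default=None)

def overruns_video(srt_text, video_seconds, tolerance=1.0):
    """True if the last cue ends more than `tolerance` seconds after the video."""
    end = last_cue_end(srt_text)
    return end is not None and end > video_seconds + tolerance
```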

When a human captioner fits better

Cases where AI-generated .srt (from DeluxeScribe or anyone else) is not the right choice:

Broadcast / OTT delivery with QC. BBC iPlayer, Netflix, HBO Max, Apple TV+, Disney+ all require specific timing standards (line length, CPS, cue gaps) that AI doesn’t deliver reliably. Spec compliance is mandatory; a captioner is the correct choice.
WCAG AAA compliance for legal/regulatory work (courtroom video evidence, medical patient education, government training materials).
High-visibility marketing content where mis-captions damage brand — think Super Bowl ad, investor day keynote, CEO announcement video.
Non-English content where the client is a native speaker of the target language. AI accuracy on non-English speech is worse than English; a native captioner catches errors AI misses.
Low-quality audio (poor mic, heavy background noise, distant speakers). AI struggles here; humans do better with context.

Named alternatives — real vendors, no rankings, just honest options:

Rev human tier: $1.50/audio minute plus captioner review. Fast turnaround, US-based captioners.
3Play Media: broadcast-standard captions; dominant vendor in US streaming and broadcast.
GoTranscript human tier: $1.10–$2.50/audio minute; wider language coverage than most competitors.
Verbit: specialized in education, legal, government. Higher-end pricing, AAA compliance standard.

How this page was verified

FFmpeg subtitle command syntax verified against the official FFmpeg subtitles filter documentation and the mov_text codec reference. BBC/Netflix captioning timing rules (42-char line max, 17 CPS reading speed) come from the BBC Subtitle Guidelines v1.2.0 and Netflix’s published Timed Text Style Guide. NLE import steps verified against Adobe Premiere Pro, DaVinci Resolve 19, Final Cut Pro 10.7, and CapCut desktop documentation. YouTube caption download behavior verified against YouTube Help’s subtitle download documentation. AI timing accuracy ranges come from our own testing of 15 sample videos against reference human-captioned .srt files — see How Accurate Is Whisper for the broader WER-by-condition breakdown. We don’t cite the “99% accuracy” and “elite AI broadcast-ready” claims common on this SERP because we can’t source them to a published benchmark on realistic video captioning.

Frequently Asked Questions

Can I generate .srt from any video?

Any video with an audio track that contains speech. Every common format works: MP4, MOV, WebM, MKV, AVI, FLV, WMV, M4V. If the video has no speech (silent film, gameplay without commentary, music-only) there's nothing to transcribe. If the audio is heavily distorted or in a language your service doesn't cover, quality drops. For DeluxeScribe: 99 languages, up to 5 GB per file, 60 minutes free on signup.

What's the difference between .srt, .vtt, and .sbv?

All three are timed-text subtitle formats. .srt (SubRip) is the oldest and most widely supported — plain text with numbered cues and HH:MM:SS,MMM timestamps. .vtt (WebVTT) is the W3C web standard used by HTML5 video and modern platforms; supports styling, positioning, and speaker labels natively. .sbv (SubViewer) is a legacy format YouTube still exposes as a download option. For most use cases: .srt for video editors and desktop players; .vtt for web video and Podcasting 2.0 apps; .sbv only if you're specifically pulling from YouTube Studio.

How do I attach an .srt to my video?

Two ways depending on what you need. Soft-sub (viewer can toggle on/off): ffmpeg -i input.mp4 -i subs.srt -c copy -c:s mov_text output.mp4. Burn-in (permanently visible pixels): ffmpeg -i input.mp4 -vf 'subtitles=subs.srt' -c:a copy output.mp4. Or use a video editor: Premiere (Text panel → Import Captions), DaVinci Resolve (File → Import → Subtitle), Final Cut (File → Import → Captions), CapCut (Text → Import → SRT). Full commands and editor steps are in Part 2 above.

What's the difference between burn-in and soft-sub?

Soft-sub is a separate subtitle track alongside the video in an MP4 or MKV container. Viewer toggles it on/off. YouTube, Vimeo, VLC, QuickTime all support this. Doesn't touch video pixels. Burn-in permanently renders the subtitle text into the video pixels — always visible, no toggle, only reversible by re-editing. Required for TikTok/Reels/Shorts (which strip caption tracks). Rule of thumb: soft-sub for platforms that respect caption tracks (YouTube, Vimeo, streaming); burn-in for social platforms that don't (TikTok, Instagram, LinkedIn).

Can I edit an .srt after generating it?

Yes. .srt is plain text — open in any text editor (VS Code, Sublime, Notepad++). Structure is dead simple: numbered cue, HH:MM:SS,MMM --> HH:MM:SS,MMM timestamp, text lines, blank line separator. For serious caption editing (adjusting CPS, line lengths, timing to broadcast standards): use Subtitle Edit (free, Windows/Mac/Linux) or Aegisub (free, cross-platform). Or export .docx from DeluxeScribe, edit text there, re-export as .srt.

What's the free way to get .srt from a video?

Three paths. (1) YouTube Studio if you own the video: upload → wait 5-30 min for auto-caption → Subtitles → Download as .sbv (convert to .srt with any free tool) or .vtt. Google's auto-caption model is respectable for English. (2) Self-hosted Whisper: pip install openai-whisper, then whisper input.mp4 --model large-v3 --output_format srt. Free forever, private, works on any video. Slow on CPU (10-30x slower than real-time), fast on Apple Silicon or NVIDIA GPU. (3) DeluxeScribe free tier: 60 minutes free one-time, no credit card.

Does YouTube provide .srt files?

For your own videos, yes: YouTube Studio → your video → Subtitles → Download. YouTube exports as .sbv (SubViewer) or .vtt (WebVTT); .srt is not a native export, but free converter tools handle .sbv → .srt trivially. For videos you don't own, YouTube shows a Show Transcript option (three-dot menu below the video) that lets you copy the text, but the copied output is not a properly timed .srt file. To get an .srt from someone else's YouTube video, you'd need to run AI transcription on the audio (see our YouTube Transcript tool).

Why do my burn-in subtitles look bad?

FFmpeg's default subtitle rendering uses a small font, thin outline, and no background, which is hard to read on complex video. Fix with force_style options in the subtitles filter. Example: ffmpeg -i input.mp4 -vf "subtitles=subs.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,BorderStyle=3'" -c:a copy output.mp4. BorderStyle=3 draws a background box behind the text (best legibility). For non-Latin scripts (Japanese, Arabic, Hindi) you must specify a font that supports the character set; see the non-Latin scripts notes under the burn-in command above.

How do I add captions to a TikTok?

TikTok strips caption tracks from uploaded videos, so soft-sub doesn't work. You need burn-in. Two paths: (1) Use TikTok's built-in auto-captions feature (Edit → Captions during upload) — free but limited styling. (2) Generate an .srt, burn it into the video with FFmpeg or CapCut, then upload the burned video. Path 2 gives you full control over font, size, position, and timing — essential for creators with brand-consistent caption styling. Same workflow applies to Instagram Reels, YouTube Shorts, and LinkedIn video.

How accurate is AI-generated .srt?

Word-level timestamps: accurate to ~200ms on clean speech. That's professional-grade timing. Segment boundaries (where each caption starts and ends): heuristic, based on pause detection. AI-generated .srt cues often violate professional captioning standards: max 42 characters per line, max 2 lines per cue, max 17 characters per second reading speed. For YouTube, Vimeo, TikTok, or personal use, AI output is fine. For corporate training or marketing, expect 10-20 min of manual cleanup per 30 min of video. For streaming platforms with QC (Netflix, HBO), expect 30-60 min, or send it to a human captioner; for broadcast (BBC iPlayer), use a captioner.