Podcast Transcription: A Complete Guide for Listeners and Podcasters
Two very different jobs share one search. Pick the path that matches yours.
<podcast:transcript> tag, and a workflow to turn the transcript into show notes. DeluxeScribe transcribes podcast audio in 99 languages with speaker diarization and exports to TXT, DOCX, SRT, VTT, and JSON. 60 minutes free. Below: both paths, an honest tool ranking, the Podcasting 2.0 spec, and the copyright reality nobody else mentions.
- 60 minutes free
- No credit card
- 99 languages
- Speaker labels
Last verified June 24, 2026
Pick your path
“Podcast transcription” covers two very different jobs. Use the table to find the one that matches yours.
| I want to… | Go to |
|---|---|
| Read an episode I listened to as a fan | Listener path |
| Transcribe my own podcast episodes | Podcaster path |
| Compare transcription tools honestly | Tools, ranked by criteria |
| Add transcripts to my RSS feed | Podcasting 2.0 spec |
| Turn a transcript into useful show notes | Show-notes workflow |
Listener path — how to read an episode
Apple Podcasts auto-transcripts
Since iOS 17.4 (March 2024), Apple Podcasts auto-generates transcripts for episodes in English, Spanish, French, and German, with additional languages added over time. Transcripts appear under the episode automatically — tap the “quote” icon in the player to view. Limits per Apple’s docs: 10-hour episode cap, no transcripts for music-only segments or songs.
Podcasters can also upload a custom VTT or SRT transcript via Apple Podcasts Connect, which overrides the auto-generated one. Apple ignores Podcasting 2.0 <podcast:transcript> tags in the RSS feed — Apple Podcasts uses Apple’s system only.
Spotify episode transcripts
Spotify rolled out episode transcripts to most shows in the mobile app, with availability varying by region and show language. They’re visible under the episode in the Spotify mobile app but not always in the web player. Spotify also doesn’t export transcripts as files — they’re read-only in-app.
When neither platform has a transcript
For shows on independent hosts (Buzzsprout, Transistor, Captivate, Acast), check the show’s website first — many podcasters publish transcripts on their show notes pages. If there’s no transcript anywhere, you have two options:
- Download the episode audio from any podcast app that allows local downloads, then upload the MP3 to a transcription service.
- Use a service that pulls from podcast URLs. DeluxeScribe accepts uploaded audio files; some services let you paste a podcast RSS or episode URL directly.
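If the show's RSS feed is public, the audio file for each episode is listed in its <enclosure> element, so finding the MP3 to download can be scripted. The following is a minimal sketch using only the Python standard library; the sample feed, its URLs, and the function name `list_episode_audio` are illustrative assumptions, not part of any particular service.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 42</title>
      <enclosure url="https://example.com/audio/ep42.mp3" length="42000000" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 41</title>
      <enclosure url="https://example.com/audio/ep41.mp3" length="39000000" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def list_episode_audio(feed_xml: str) -> list[tuple[str, str]]:
    """Return (episode title, audio URL) pairs from an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")
        # Skip items that carry no audio enclosure.
        if enclosure is not None and enclosure.get("url"):
            episodes.append((title, enclosure.get("url")))
    return episodes


for title, url in list_episode_audio(SAMPLE_FEED):
    print(title, url)
```

From there, `urllib.request.urlretrieve(url, "episode.mp3")` would save the file locally for upload to a transcription service.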
Fair use — the part that matters
Not legal advice; consult a lawyer for specific cases.
Transcribing a podcast episode for your own use — to study, to quote in writing, to translate for personal reading — is generally consistent with the four-factor fair-use test in 17 U.S.C. §107. The use is non-commercial, transformative (text from audio), limited to your personal copy, and doesn’t harm the market for the original podcast.
Republishing the full transcript on your website, in a newsletter, or as training data for a model is a different question. You’re reproducing a substantial portion of a copyrighted work for distribution; the transformativeness argument weakens (the first-factor analysis in Andy Warhol Foundation v. Goldsmith asks whether the new use serves the same purpose as the original), and the market-effect factor cuts against you. For redistribution, ask the podcaster for permission; most are happy to grant it for fair purposes.
Podcaster path — transcribe your own episodes
Why publish transcripts
- Accessibility. WCAG 2.1 SC 1.2.1 treats a transcript as the baseline accessibility requirement for audio-only content. Listeners who are deaf or hard of hearing need text, full stop.
- SEO. Search engines can’t index audio. A transcript on your show notes page turns each episode into a discoverable document. The lift is largest for shows that cover specific topics, names, or technical terms.
- Repurposing. Pull quotes for social, generate chapter markers, draft a newsletter from the transcript, translate to other languages. The transcript is the input to every downstream content artifact.
- Ad-read auditing. If you sell ads, a transcript lets you confirm reads happened and check copy was delivered correctly.
Speaker labels — the actual hard problem
On a solo show, transcription is easy. On a 2-host show, modern AI services nail speaker labels 95%+ of the time. On 3+ speakers — particularly hybrid setups (host in studio, two remote guests on consumer mics), or guests with similar voices — speaker diarization becomes the limiting factor.
Speaker Error Rate (SER) measures how often a word is attributed to the wrong speaker. On clean 3+ speaker audio, modern services achieve roughly 5-15% SER. On hybrid remote/in-studio recordings, expect 15-25%. The fix is production-side, not tool-side:
- Record each speaker to a separate track when possible (Riverside, SquadCast, and Zencastr all do this). Speaker diarization on isolated tracks is essentially perfect because there’s only one voice per file.
- Use the same microphone class for all speakers if you can’t isolate tracks. The model learns characteristics of each voice including the mic colour; mixing a Shure SM7B and a laptop mic confuses it.
- Encode at 16 kHz mono or higher. Most transcription models downsample anyway, but starting from a compressed phone call is a losing battle.
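To make the Speaker Error Rate definition above concrete, here is a minimal sketch that computes it from word-level speaker labels. It assumes you already have a hand-corrected reference and the tool's output aligned word-for-word, and that the tool's anonymous labels have been mapped to real names; both are simplifications, and the sample labels are hypothetical.

```python
def speaker_error_rate(reference: list[str], predicted: list[str]) -> float:
    """Fraction of words whose predicted speaker differs from the reference."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must be aligned word-for-word")
    if not reference:
        return 0.0
    wrong = sum(1 for ref, hyp in zip(reference, predicted) if ref != hyp)
    return wrong / len(reference)


# Hypothetical 10-word exchange: two of Guest B's words are given to Guest A.
reference = ["Host"] * 4 + ["GuestA"] * 3 + ["GuestB"] * 3
predicted = ["Host"] * 4 + ["GuestA"] * 5 + ["GuestB"] * 1

print(f"SER: {speaker_error_rate(reference, predicted):.0%}")  # SER: 20%
```

Spot-checking a few minutes of one episode this way is usually enough to tell whether a recording setup lands in the 5-15% range or the 15-25% range.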
Tools — ranked by defensible criteria
No tool is universally best. Ranked by what each one actually wins at:
| Tool | Free tier | Paid from | Best for | Skip if |
|---|---|---|---|---|
| Apple Podcasts auto-transcripts | Built-in | Free | Listeners on iOS who want to read along | You need the file outside Apple Podcasts |
| DeluxeScribe | 60 min one-time | $10/mo · 1,200 min | Solo creators, multi-language shows, lowest per-minute price | You need text-based audio editing |
| Descript | 1 hour/mo | $24/mo · 30 hours | Text-based editing — delete words in the transcript to cut the audio | You only need a transcript file |
| Otter | 300 min/mo | $17/mo | Meeting-style podcasts, calendar integration | Multi-language or long-form episodes |
| Adobe Podcast | Free transcribe + Enhance | Free for now | One-off transcribes plus audio cleanup | Volume — quotas may tighten |
| AssemblyAI / Deepgram (API) | Free credits | ~$0.12-0.40/hr | Builders integrating transcription into their own app | You’re not a developer |
| Whisper (self-hosted) | Free | Free | Full privacy, no upload, sensitive shows | You don’t want to run Python |
| Rev (human-reviewed tier) | None | $1.50/min | Legal/medical podcasts needing 99%+ accuracy | Cost-sensitive; 24-hour turnaround unacceptable |
Pricing captured June 2026. Verify on each vendor’s pricing page before committing.
Publishing transcripts via Podcasting 2.0
The Podcasting 2.0 namespace defines an open <podcast:transcript> element that lets you advertise a transcript file alongside each episode in your RSS feed. Modern independent podcast apps such as Podverse, Fountain, Podcast Guru, and CurioCaster read the tag and render the transcript natively. Apple Podcasts and Spotify use their own systems and don’t currently honor this tag, but the open standard works across the rest of the ecosystem and is the only cross-app option for self-hosted shows.
Example RSS snippet

    <item>
      <title>Episode 42 — On Transcripts</title>
      <enclosure url="https://example.com/audio/ep42.mp3" length="42000000" type="audio/mpeg"/>
      <podcast:transcript
        url="https://example.com/transcripts/ep42.vtt"
        type="text/vtt"
        language="en"
        rel="captions"/>
      <podcast:transcript
        url="https://example.com/transcripts/ep42.srt"
        type="application/x-subrip"
        language="en"/>
    </item>

You can include multiple <podcast:transcript> elements per episode: different formats, different languages, captions vs full transcript. The rel="captions" attribute signals that a transcript is timed for caption-style display. Hosting can be your own CDN or a podcast host that supports the tag.
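If you control your feed and generate it with a script, the same elements can be produced programmatically. This is a minimal sketch using Python's standard library; the helper name `add_transcript` is hypothetical. In a real feed, the `xmlns:podcast` declaration normally sits on the root <rss> element rather than on each <item>.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI published by the Podcasting 2.0 (Podcast Index) project.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def add_transcript(item: ET.Element, url: str, mime_type: str,
                   language: Optional[str] = None,
                   captions: bool = False) -> ET.Element:
    """Append a <podcast:transcript> element to an RSS <item>."""
    attrs = {"url": url, "type": mime_type}
    if language:
        attrs["language"] = language
    if captions:
        attrs["rel"] = "captions"
    return ET.SubElement(item, f"{{{PODCAST_NS}}}transcript", attrs)


item = ET.Element("item")
ET.SubElement(item, "title").text = "Episode 42 - On Transcripts"
add_transcript(item, "https://example.com/transcripts/ep42.vtt",
               "text/vtt", "en", captions=True)
add_transcript(item, "https://example.com/transcripts/ep42.srt",
               "application/x-subrip", "en")

print(ET.tostring(item, encoding="unicode"))
```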
Format choice: VTT vs SRT vs JSON
- WebVTT (.vtt) — the W3C standard for web video captions. Best for HTML5 video players and Podcasting 2.0 apps. Recommended default.
- SRT (.srt) — older format, universally supported. Use if you also publish a YouTube version where SRT upload is the path of least resistance.
- JSON — structured data with word-level timestamps. Useful for building search or chapter UI on your own site. Not all podcast apps render JSON transcripts.
Which podcast hosts support the tag
Buzzsprout, Transistor, Captivate, Podbean, RSS.com, Blubrry, and Fireside all support uploading transcripts and emit the <podcast:transcript> tag in their generated RSS feeds. Libsyn and Anchor / Spotify for Podcasters are more limited; check current docs for transcript support. If your host doesn’t emit the tag, you can self-host the transcript files and edit the RSS feed manually, provided you control the feed.
Transcript → show notes workflow
Getting a transcript is the easy part. The actual work is turning a 12,000-word transcript into something a listener scrolls past in 30 seconds and decides to play. Here’s a repeatable 20-minute workflow for a 1-hour episode.
1. Two-sentence summary
Paste the transcript into your LLM of choice with this prompt:
Write a 2-sentence summary of this podcast episode. First sentence: what the episode is about + who's on it. Second sentence: the most surprising or non-obvious claim made in the episode. Don't editorialize. Don't add adjectives like "fascinating".
The second-sentence requirement is the trick — it forces the model to find actual content instead of producing generic puffery.
2. Chapter markers from topic transitions
Look for transcript moments where the conversation pivots — usually marked by “So tell us about…”, “Moving on to…”, or a long pause. Label each segment with a 3-5 word chapter title at the timestamp the pivot happens. Modern podcast apps render chapters as a navigable list. If your host supports the Podcasting 2.0 chapters JSON spec, you can publish them in a separate file alongside the transcript.
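A first pass at pivot detection can be automated and then cleaned up by hand. The sketch below scans timestamped segments for the pivot phrases mentioned above and writes a structure in the shape of the Podcasting 2.0 JSON chapters format (`version`, `chapters`, `startTime`, `title`). The `(start_seconds, text)` segment shape, the `min_gap` threshold, and the auto-generated titles are assumptions; long pauses are not detected here.

```python
import json
import re

# Pivot phrases from the text above; extend the list for your own show.
PIVOT = re.compile(r"^(so tell us about|moving on to)\b", re.IGNORECASE)


def draft_chapters(segments, min_gap=60.0):
    """Return a chapters dict from (start_seconds, text) transcript segments."""
    chapters = [{"startTime": 0, "title": "Intro"}]
    for start, text in segments:
        text = text.strip()
        far_enough = start - chapters[-1]["startTime"] >= min_gap
        if far_enough and PIVOT.match(text):
            # Placeholder title: first five words after the pivot phrase.
            words = PIVOT.sub("", text).strip(" .,?!").split()
            title = " ".join(words[:5])
            chapters.append({"startTime": int(start),
                             "title": title[:1].upper() + title[1:]})
    return {"version": "1.2.0", "chapters": chapters}


# Hypothetical segments for illustration.
segments = [
    (0.0, "Welcome to the show."),
    (45.0, "So tell us about your first job."),
    (310.0, "Moving on to pricing strategy for indie shows."),
    (900.0, "So tell us about the launch."),
]
print(json.dumps(draft_chapters(segments), indent=2))
```

The 45-second pivot is skipped because it falls within `min_gap` of the intro, which keeps the chapter list from filling up with very short segments.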
3. Five to seven quotable lines for social
Search the transcript for: first-person claims (“I think…”, “What we found was…”), specific numbers, contrarian takes that contradict conventional wisdom in the field. Save each quote with its timestamp so you can produce a 20-second audio clip to pair with the quote tile.
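The search for candidates can be roughed out with two regular expressions, one for first-person claims and one for numbers. This is only a sketch: the phrase list, the scoring, and the `(start_seconds, text)` segment shape are assumptions, and contrarian takes still need a human read because no pattern reliably finds them.

```python
import re

FIRST_PERSON = re.compile(r"\b(I think|What we found was)\b", re.IGNORECASE)
HAS_NUMBER = re.compile(r"\d")


def quote_candidates(segments, limit=7):
    """Rank (start_seconds, text) segments by simple quotability signals."""
    scored = []
    for start, text in segments:
        score = int(bool(FIRST_PERSON.search(text))) + int(bool(HAS_NUMBER.search(text)))
        if score:
            scored.append((score, start, text))
    # Highest score first; the earlier timestamp breaks ties.
    scored.sort(key=lambda row: (-row[0], row[1]))
    return [(start, text) for _, start, text in scored[:limit]]


# Hypothetical segments for illustration.
segments = [
    (120.0, "I think most shows skip transcripts."),
    (480.0, "What we found was that 3 of our 10 guests re-shared the clip."),
    (900.0, "Thanks for listening."),
]
for start, text in quote_candidates(segments):
    print(f"{start:>7.1f}s  {text}")
```

Keeping the start time with each candidate is what makes the later audio clip easy to cut.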
4. Three episode title candidates
- Literal title for podcast directory SEO: who and what. Example: “Marc Andreessen on AI Regulation”.
- Curiosity title for app browsing: a question or unfinished thought. Example: “Why VCs actually fund regulated industries”.
- Provocative title for social sharing: a counterintuitive claim from the episode. Example: “The regulation everyone misreads”.
A/B test in your social posts; the “winner” usually isn’t the one you’d pick.
5. Publish
Push the transcript to your CDN, add the <podcast:transcript> tag to your episode in the RSS feed (or use your host’s transcript upload), add the show notes + transcript to your website, and you’re done.
Common production gotchas
- Music intros eat the first 30 seconds. Speech-to-text models often hallucinate lyrics or produce nothing during music. The fix: when you submit for transcription, trim the music intro from the file or instruct the model to skip it.
- Dynamic ad insertion isn’t transcribed. If your host inserts ads at playback time, the transcript you generated from the source file won’t include those ads. Apple Podcasts auto-transcripts also skip dynamic ads. This is usually a feature (you don’t want stale ad transcripts), but worth knowing if you sell host-read, dynamically-inserted ads and want to track delivery.
- Remote guests on bad mics tank diarization accuracy. A guest on AirPods in a cafe will get misattributed words, missed words, and wrong speaker labels. Mitigations: record each speaker locally (double-ender style), run noise reduction (Adobe Podcast Enhance, Krisp) before transcription.
- Cross-talk and laughter destroy diarization. When two people talk simultaneously, the transcript will pick one and drop the other. There’s no current fix at the AI level; recording on isolated tracks is the production-side solution.
- Live episodes drift. If you publish a live episode and then a slightly-edited version, the transcript from the live recording won’t match the published audio. Always generate the transcript from the same file you ship.
Related guides
- MP3 to Text: Most podcasts export as MP3. Covers the per-format workflow and accuracy by recording type.
- M4A to Text: If your podcast is recorded into iPhone Voice Memos or any Apple-native tool, it lands as M4A.
- SRT Generator: Once you have a transcript, generate subtitle files for YouTube versions or video podcast clips.
- Voicemail to Text: Different audio context, same toolkit. Useful if you're researching listener messages on a podcast hotline.