Podcast Transcription: A Complete Guide for Listeners and Podcasters

Two very different jobs share one search. Pick the path that matches yours.

If you want to read an episode you listened to, check Apple Podcasts and Spotify first — both ship auto-generated transcripts on supported shows, for free. If you’re a podcaster transcribing your own episodes, you need three things the listener-side guides skip: speaker labels that actually work on 3+ speakers, a way to publish transcripts in your RSS feed via the open <podcast:transcript> tag, and a workflow to turn the transcript into show notes. DeluxeScribe transcribes podcast audio in 99 languages with speaker diarization and exports to TXT, DOCX, SRT, VTT, and JSON. 60 minutes free. Below: both paths, an honest tool ranking, the Podcasting 2.0 spec, and the copyright reality nobody else mentions.
  • 60 minutes free
  • No credit card
  • 99 languages
  • Speaker labels

Last verified June 24, 2026

Pick your path

“Podcast transcription” covers two very different jobs. Use the table to find the one that matches yours.

I want to… | Go to
Read an episode I listened to as a fan | Listener path
Transcribe my own podcast episodes | Podcaster path
Compare transcription tools honestly | Tools, ranked by criteria
Add transcripts to my RSS feed | Podcasting 2.0 spec
Turn a transcript into useful show notes | Show-notes workflow

Listener path — how to read an episode

Apple Podcasts auto-transcripts

Since iOS 17.4 (March 2024), Apple Podcasts auto-generates transcripts for episodes in English, Spanish, French, and German, with additional languages added over time. Transcripts appear under the episode automatically — tap the “quote” icon in the player to view. Limits per Apple’s docs: 10-hour episode cap, no transcripts for music-only segments or songs.

Podcasters can also upload a custom VTT or SRT transcript via Apple Podcasts Connect, which overrides the auto-generated one. Apple ignores Podcasting 2.0 <podcast:transcript> tags in the RSS feed — Apple Podcasts uses Apple’s system only.

Spotify episode transcripts

Spotify rolled out episode transcripts to most shows in the mobile app, with availability varying by region and show language. They’re visible under the episode in the Spotify mobile app but not always in the web player. Spotify also doesn’t export transcripts as files — they’re read-only in-app.

When neither platform has a transcript

For shows on independent hosts (Buzzsprout, Transistor, Captivate, Acast), check the show’s website first — many podcasters publish transcripts on their show notes pages. If there’s no transcript anywhere, you have two options:

  • Download the episode audio from any podcast app that allows local downloads, then upload the MP3 to a transcription service.
  • Use a service that pulls from podcast URLs. DeluxeScribe accepts uploaded audio files; some services let you paste a podcast RSS or episode URL directly.

Fair use — the part that matters

Not legal advice; consult a lawyer for specific cases.

Transcribing a podcast episode for your own use — to study, to quote in writing, to translate for personal reading — is generally consistent with the four-factor fair-use test in 17 U.S.C. §107. The use is non-commercial, transformative (text from audio), limited to your personal copy, and doesn’t harm the market for the original podcast.

Republishing the full transcript on your website, in a newsletter, or as training data for a model is a different question. You’re reproducing a substantial portion of a copyrighted work for distribution; the transformativeness argument weakens (the first factor, as narrowed in Andy Warhol Foundation v. Goldsmith), and the market-effect factor cuts against you. For redistribution, ask the podcaster for permission; most are happy to grant it for fair purposes.

Podcaster path — transcribe your own episodes

Why publish transcripts

  • Accessibility. WCAG 2.1 SC 1.2.1 treats a transcript as the baseline accessibility requirement for audio-only content. Listeners who are deaf or hard of hearing need text, full stop.
  • SEO. Search engines can’t index spoken audio directly. A transcript on your show notes page turns each episode into a discoverable document. The lift is largest on shows that cover specific topics, names, or technical terms.
  • Repurposing. Pull quotes for social, generate chapter markers, draft a newsletter from the transcript, translate to other languages. The transcript is the input to every downstream content artifact.
  • Ad-read auditing. If you sell ads, a transcript lets you confirm reads happened and check copy was delivered correctly.

Speaker labels — the actual hard problem

On a solo show, transcription is easy. On a 2-host show, modern AI services nail speaker labels 95%+ of the time. On 3+ speakers — particularly hybrid setups (host in studio, two remote guests on consumer mics), or guests with similar voices — speaker diarization becomes the limiting factor.

Speaker Error Rate (SER) measures how often a word is attributed to the wrong speaker. On clean 3+ speaker audio, modern services achieve roughly 5-15% SER. On hybrid remote/in-studio recordings, expect 15-25%. The fix is production-side, not tool-side:

  • Record each speaker to a separate track when possible (Riverside, SquadCast, Zencastr all do this). Speaker diarization on isolated tracks is essentially perfect, because there’s only one voice per file.
  • Use the same microphone class for all speakers if you can’t isolate tracks. The model learns characteristics of each voice including the mic colour; mixing a Shure SM7B and a laptop mic confuses it.
  • Encode at 16 kHz mono or higher. Most transcription models downsample anyway, but starting from a compressed phone call is a losing battle.
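
To check where your own show falls in the SER ranges above, you can hand-correct the speaker labels on a short excerpt and compare them with the tool's output. The sketch below is a minimal illustration, assuming one speaker label per word, both lists already aligned word-for-word, and both using the same label names (real tools often emit arbitrary labels such as SPEAKER_00 that you would map first). The function name and sample labels are hypothetical.

```python
from typing import Sequence


def speaker_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Fraction of words attributed to the wrong speaker.

    Assumes one label per word, aligned word-for-word, with matching label names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must be the same length")
    if not reference:
        return 0.0
    wrong = sum(1 for ref, hyp in zip(reference, hypothesis) if ref != hyp)
    return wrong / len(reference)


# Hypothetical 10-word excerpt: hand-corrected labels vs. the tool's labels.
truth = ["HOST", "HOST", "HOST", "GUEST_A", "GUEST_A",
         "GUEST_B", "GUEST_B", "GUEST_B", "HOST", "HOST"]
auto = ["HOST", "HOST", "HOST", "GUEST_A", "GUEST_B",
        "GUEST_B", "GUEST_B", "GUEST_B", "HOST", "HOST"]

print(f"SER: {speaker_error_rate(truth, auto):.0%}")  # one wrong word out of ten -> SER: 10%
```

A few minutes of hand-labelled audio from a typical episode is usually enough to tell whether you are nearer the clean-audio or the hybrid-recording range.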

Try multi-speaker podcast transcription free

60 minutes free, no credit card. Automatic speaker labels, timestamps to the millisecond, and exports to VTT/SRT for your show notes page.

Tools — ranked by defensible criteria

No tool is universally best. Ranked by what each one actually wins at:

Tool | Free tier | Paid from | Best for | Skip if
Apple Podcasts auto-transcripts | Built-in | Free | Listeners on iOS who want to read along | You need the file outside Apple Podcasts
DeluxeScribe | 60 min one-time | $10/mo · 1,200 min | Solo creators, multi-language shows, lowest per-minute price | You need text-based audio editing
Descript | 1 hour/mo | $24/mo · 30 hours | Text-based editing: delete words in the transcript to cut the audio | You only need a transcript file
Otter | 300 min/mo | $17/mo | Meeting-style podcasts, calendar integration | Multi-language or long-form episodes
Adobe Podcast | Free transcribe + Enhance | Free for now | One-off transcribes plus audio cleanup | Volume (quotas may tighten)
AssemblyAI / Deepgram (API) | Free credits | ~$0.12-0.40/hr | Builders integrating transcription into their own app | You’re not a developer
Whisper (self-hosted) | Free | Free | Full privacy, no upload, sensitive shows | You don’t want to run Python
Rev (human-reviewed tier) | None | $1.50/min | Legal/medical podcasts needing 99%+ accuracy | Cost-sensitive; 24-hour turnaround unacceptable

Pricing captured June 2026. Verify on each vendor’s pricing page before committing.

Publishing transcripts via Podcasting 2.0

The Podcasting 2.0 namespace defines an open <podcast:transcript> element that lets you advertise a transcript file alongside each episode in your RSS feed. Modern independent podcast apps (Podverse, Fountain, Podcast Guru, CurioCaster) read the tag and render the transcript natively. Apple Podcasts and Spotify use their own systems and don’t currently honor this tag, but the open standard works across the rest of the ecosystem and is the only cross-app option for self-hosted shows.

Example RSS snippet

<item>
  <title>Episode 42 — On Transcripts</title>
  <enclosure url="https://example.com/audio/ep42.mp3" length="42000000" type="audio/mpeg"/>
  <podcast:transcript
    url="https://example.com/transcripts/ep42.vtt"
    type="text/vtt"
    language="en"
    rel="captions"/>
  <podcast:transcript
    url="https://example.com/transcripts/ep42.srt"
    type="application/x-subrip"
    language="en"/>
</item>

You can include multiple <podcast:transcript> elements per episode — different formats, different languages, captions vs full transcript. The rel="captions" attribute signals that a transcript is timed for caption-style display. Hosting can be your own CDN or a podcast host that supports the tag.
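
If you build or patch your feed with a script, the same elements can be generated rather than typed by hand. The sketch below uses Python's standard xml.etree.ElementTree and assumes the Podcasting 2.0 namespace URI https://podcastindex.org/namespace/1.0. The helper name add_transcript and the example.com URLs are illustrative, not part of any host's API.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI assumed from the Podcasting 2.0 namespace documentation.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def add_transcript(item: ET.Element, url: str, mime_type: str,
                   language: str = "en", rel: Optional[str] = None) -> ET.Element:
    """Append one <podcast:transcript> element to an RSS <item> (hypothetical helper)."""
    attrs = {"url": url, "type": mime_type, "language": language}
    if rel:
        attrs["rel"] = rel  # e.g. "captions" for caption-timed files
    return ET.SubElement(item, f"{{{PODCAST_NS}}}transcript", attrs)


item = ET.Element("item")
ET.SubElement(item, "title").text = "Episode 42 - On Transcripts"
add_transcript(item, "https://example.com/transcripts/ep42.vtt", "text/vtt", rel="captions")
add_transcript(item, "https://example.com/transcripts/ep42.srt", "application/x-subrip")

ET.indent(item)  # pretty-printing; requires Python 3.9+
print(ET.tostring(item, encoding="unicode"))
```

In a real feed the xmlns:podcast declaration normally sits on the root <rss> element. Serializing a lone <item>, as here, places it on the item instead.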

Format choice: VTT vs SRT vs JSON

  • WebVTT (.vtt) — the W3C standard for web video captions. Best for HTML5 video players and Podcasting 2.0 apps. Recommended default.
  • SRT (.srt) — older format, universally supported. Use if you also publish a YouTube version where SRT upload is the path of least resistance.
  • JSON — structured data with word-level timestamps. Useful for building search or chapter UI on your own site. Not all podcast apps render JSON transcripts.
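
If your tool exports only SRT and you want the recommended VTT, the two formats are close enough that a basic conversion is a few lines. This is a minimal sketch that assumes plain SRT input with no styling tags or positioning data. It adds the WEBVTT header and switches the millisecond separator on timing lines from a comma to a period. The function name srt_to_vtt is hypothetical.

```python
import re

# SRT timestamps look like 00:01:02,500; WebVTT writes 00:01:02.500.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (no styling or positioning handled)."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    lines = []
    for line in body.split("\n"):
        if "-->" in line:  # only touch timing lines, never the spoken text
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


sample = """1
00:00:00,000 --> 00:00:04,200
HOST: Welcome back to the show.

2
00:00:04,200 --> 00:00:09,750
GUEST: Thanks for having me.
"""

print(srt_to_vtt(sample))
```

The SRT cue numbers are left in place, since WebVTT allows an identifier line before each cue's timing line.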

Which podcast hosts support the tag

Buzzsprout, Transistor, Captivate, Podbean, RSS.com, Blubrry, and Fireside all support uploading transcripts and emit the <podcast:transcript> tag in their generated RSS feeds. Libsyn and Anchor / Spotify for Podcasters are more limited; check current docs for transcript support. If your host doesn’t emit the tag, you can self-host the transcript files and modify the RSS feed manually if you control it.

Transcript → show notes workflow

Getting a transcript is the easy part. The actual work is turning a 12,000-word transcript into something a listener can skim in 30 seconds and decide to play. Here’s a repeatable 20-minute workflow for a 1-hour episode.

1. Two-sentence summary

Paste the transcript into your LLM of choice with this prompt:

Write a 2-sentence summary of this podcast episode.
First sentence: what the episode is about + who's on it.
Second sentence: the most surprising or non-obvious claim made in the episode.
Don't editorialize. Don't add adjectives like "fascinating".

The second-sentence requirement is the trick — it forces the model to find actual content instead of producing generic puffery.

2. Chapter markers from topic transitions

Look for transcript moments where the conversation pivots — usually marked by “So tell us about…”, “Moving on to…”, or a long pause. Label each segment with a 3-5 word chapter title at the timestamp the pivot happens. Modern podcast apps render chapters as a navigable list. If your host supports the Podcasting 2.0 chapters JSON spec, you can publish them in a separate file alongside the transcript.
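
Once you have the pivot timestamps and titles, the chapters file is small enough to generate with a script. The sketch below assumes the Podcasting 2.0 JSON chapters layout: a top-level "version" and a "chapters" list whose entries carry "startTime" in seconds and a "title". The function names and sample chapter titles are hypothetical, so check the current spec and your host's docs before publishing.

```python
import json


def to_seconds(timestamp: str) -> int:
    """Convert 'HH:MM:SS' or 'MM:SS' to whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_chapters(pivots: list[tuple[str, str]]) -> str:
    """Build a chapters JSON document from (timestamp, title) pairs."""
    chapters = [{"startTime": to_seconds(ts), "title": title} for ts, title in pivots]
    chapters.sort(key=lambda chapter: chapter["startTime"])
    return json.dumps({"version": "1.2.0", "chapters": chapters}, indent=2)


# Hypothetical pivots noted while reading the transcript.
pivots = [
    ("00:00", "Intro and guest background"),
    ("12:30", "Why transcripts matter"),
    ("01:02:03", "Listener questions"),
]
print(build_chapters(pivots))
```

The resulting file is typically hosted next to the transcript and referenced from the feed with a <podcast:chapters> element.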

3. Five to seven quotable lines for social

Search the transcript for: first-person claims (“I think…”, “What we found was…”), specific numbers, contrarian takes that contradict conventional wisdom in the field. Save each quote with its timestamp so you can produce a 20-second audio clip to pair with the quote tile.
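
The first two search patterns are mechanical enough to pre-filter with a script before you read through the results. The sketch below assumes the transcript has already been parsed into timestamped segments. The Segment type, the phrase list, and the word-count limits are illustrative choices, not a standard. Contrarian takes still need a human read, because a regex can't tell what conventional wisdom is.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    start: str    # timestamp as it appears in the transcript, e.g. "00:14:32"
    speaker: str
    text: str


# Hypothetical phrase list; extend it with how your guests actually talk.
FIRST_PERSON = re.compile(r"\b(I think|I believe|what we found|in my experience)\b", re.IGNORECASE)
HAS_NUMBER = re.compile(r"\d")


def quote_candidates(segments, min_words=8, max_words=40):
    """Return (start, speaker, text, reasons) for segments worth a second look."""
    hits = []
    for seg in segments:
        word_count = len(seg.text.split())
        if not (min_words <= word_count <= max_words):
            continue  # too short to stand alone, or too long for a quote tile
        reasons = []
        if FIRST_PERSON.search(seg.text):
            reasons.append("first-person claim")
        if HAS_NUMBER.search(seg.text):
            reasons.append("specific number")
        if reasons:
            hits.append((seg.start, seg.speaker, seg.text, reasons))
    return hits


segments = [
    Segment("00:03:10", "HOST", "So tell us about how you got started in radio."),
    Segment("00:14:32", "GUEST", "What we found was that 70 percent of listeners drop off before the first ad break."),
    Segment("00:21:05", "GUEST", "I think so."),
]

for start, speaker, text, reasons in quote_candidates(segments):
    print(f"[{start}] {speaker}: {text} ({', '.join(reasons)})")
```

Keeping the timestamp with each hit is what lets you cut the matching audio clip afterwards.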

4. Three episode title candidates

  • Literal title for podcast directory SEO: who and what. Example: “Marc Andreessen on AI Regulation”.
  • Curiosity title for app browsing: a question or unfinished thought. Example: “Why VCs actually fund regulated industries”.
  • Provocative title for social sharing: a counterintuitive claim from the episode. Example: “The regulation everyone misreads”.

A/B test in your social posts; the “winner” usually isn’t the one you’d pick.

5. Publish

Push the transcript to your CDN, add the <podcast:transcript> tag to your episode in the RSS feed (or use your host’s transcript upload), add the show notes + transcript to your website, and you’re done.

Common production gotchas

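  • Music intros eat the first 30 seconds. Speech-to-text models often hallucinate lyrics or produce nothing during music. The fix: trim the music intro from the file before you submit it, or set a start offset if your service supports one. If you trim, add the trimmed duration back to every timestamp so the transcript still lines up with the published audio.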
  • Dynamic ad insertion isn’t transcribed. If your host inserts ads at playback time, the transcript you generated from the source file won’t include those ads. Apple Podcasts auto-transcripts also skip dynamic ads. This is usually a feature (you don’t want stale ad transcripts), but worth knowing if you sell host-read, dynamically-inserted ads and want to track delivery.
  • Remote guests on bad mics tank diarization accuracy. A guest on AirPods in a cafe will get misattributed words, missed words, and wrong speaker labels. Mitigations: record each speaker locally (double-ender style), run noise reduction (Adobe Podcast Enhance, Krisp) before transcription.
  • Cross-talk and laughter destroy diarization. When two people talk simultaneously, the transcript will pick one and drop the other. There’s no current fix at the AI level; recording on isolated tracks is the production-side solution.
  • Live episodes drift. If you publish a live episode and then a slightly-edited version, the transcript from the live recording won’t match the published audio. Always generate the transcript from the same file you ship.
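
One cheap guard against the drift problem is to confirm, before publishing, that the transcript doesn't run past the end of the audio you are shipping. The sketch below assumes a WebVTT transcript and that you already know the published file's duration in seconds (from your editor or host). The function names and the 5-second tolerance are hypothetical. It only catches transcripts that are longer than the audio, so it is a sanity check, not proof of alignment.

```python
import re

# Matches the end timestamp of a WebVTT cue; the hours field is optional.
_CUE_END = re.compile(r"-->\s*(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def last_cue_end(vtt_text: str) -> float:
    """Return the end time, in seconds, of the latest-ending cue."""
    ends = [
        int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000
        for hours, minutes, seconds, millis in _CUE_END.findall(vtt_text)
    ]
    if not ends:
        raise ValueError("no cues found")
    return max(ends)


def transcript_fits_audio(vtt_text: str, audio_seconds: float, tolerance: float = 5.0) -> bool:
    """True if no cue ends more than `tolerance` seconds after the audio does."""
    return last_cue_end(vtt_text) <= audio_seconds + tolerance


vtt = """WEBVTT

00:00:00.000 --> 00:00:04.200
Welcome back to the show.

00:59:58.000 --> 01:00:02.500
Thanks for listening.
"""

print(transcript_fits_audio(vtt, audio_seconds=3605))  # audio long enough for the last cue
print(transcript_fits_audio(vtt, audio_seconds=3400))  # edited audio shorter than the transcript
```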

How this page was verified

Platform support claims (Apple Podcasts iOS 17.4+ transcripts, 10-hour cap, custom VTT/SRT upload) come from Apple Podcasters Support. Podcasting 2.0 transcript-tag spec is from the Podcasting 2.0 namespace. Accessibility framing is from WCAG 2.1 SC 1.2.1. Fair-use bracketing references 17 U.S.C. §107, Authors Guild v. Google (2nd Cir. 2015), and Andy Warhol Foundation v. Goldsmith (SCOTUS 2023). Tool pricing in the comparison table was captured June 2026 from each vendor’s public pricing page. Accuracy claims are framed as “typically 95%+ on clean audio” rather than a fixed percentage because we don’t have a single published benchmark covering all the services we list.

Frequently Asked Questions

How do I get a transcript of a podcast I listened to?

On Apple Podcasts (iOS 17.4+), auto-generated transcripts show under the episode if the show is in a supported language. On Spotify, episode transcripts appear for many shows in the mobile app. If neither has one, the only other reliable path is to upload the audio to a transcription service. Most podcast apps allow you to download the episode audio first.

Is it legal to transcribe someone else's podcast?

For personal study, notes, or quoting in writing under fair use (17 U.S.C. §107), generally yes. Republishing the full transcript on your own site or in a newsletter is a separate question: that's reproducing a substantial portion of a copyrighted work and is unlikely to qualify as fair use. The short rule: transcribing for yourself is usually fine; redistributing the transcript is risky without permission.

How accurate is AI podcast transcription?

On clean studio audio with a solo host, modern AI services hit 95%+ word accuracy. With three or more speakers, accuracy on individual words usually stays high, but speaker diarization (knowing who said what) is the actual hard problem — speaker error rates of 5-15% are typical even on good services. Music intros, dynamically-inserted ads, and remote guests on low bandwidth degrade results further.

Should I use Apple Podcasts transcripts or transcribe my own episodes?

Apple Podcasts auto-transcripts are free and good enough for listeners who want to read along. As a podcaster, you should still produce your own transcripts: Apple's are Apple-only (not in your RSS feed, not on other apps, not editable), don't include dynamically-inserted ads, and don't show up on your website. Use Apple's as a fallback for listeners while publishing your own via the Podcasting 2.0 <podcast:transcript> tag.

What is the Podcasting 2.0 transcript tag?

It's an open RSS feed extension (defined at podcastindex.org) that lets you publish a transcript file alongside your podcast episode. Apps like Podverse, Fountain, and Podcast Guru read the tag and display transcripts natively. Apple Podcasts uses its own auto-generated transcripts and doesn't currently honor third-party <podcast:transcript> tags, but the open standard works across the independent podcast ecosystem.

Can I edit a podcast by editing the transcript?

Yes, but mainly with one tool: Descript pioneered this with its text-based editing model. Other tools (DeluxeScribe, Otter, Adobe Podcast) produce a transcript but don't let you delete a sentence in text and have it propagate to the audio. If text-based editing is the core workflow, pick Descript. If you want the transcript as a deliverable but edit audio in Logic, Audition, or Reaper, the others are cheaper.

How long does it take to transcribe a 1-hour podcast?

AI services typically finish in 3-10 minutes for a 1-hour episode. Human-reviewed services (Rev's human tier, Trint's human-reviewed tier) take 24-48 hours. Self-hosted Whisper on a CPU can take 30 minutes to several hours per hour of audio; on a recent GPU it's near real-time.

Do transcripts help podcast SEO?

Yes, when you publish them as text on your show notes page. Search engines can't index audio directly, so a transcript is what makes your episode content discoverable in Google. The SEO lift is largest for shows that cover specific topics, names, or terms a listener might search for. Generic 'two friends chatting' shows benefit less because the transcript doesn't contain searchable intent.