Interview Transcription: A Complete Guide for Journalists, Researchers, and HR
Three different jobs share one search. The right tool, format, and verbatim style depend on which one is yours.
- 60 minutes free
- No credit card
- 99 languages
- Speaker labels
Last verified June 26, 2026
Not job-interview prep
This page is about converting recorded interview audio to text. If you’re looking for job-interview prep (questions, behavioral frameworks, how to handle salary negotiation), this isn’t it — sites like LeetCode, Glassdoor, or Levels.fyi cover that ground.
Pick your path
Different jobs need different workflows. Pick the row that matches yours.
| I’m a… | Go to |
|---|---|
| Journalist transcribing source interviews | Journalist workflow |
| Qualitative researcher (PhD, UX, social science) | Qualitative researcher workflow |
| Recruiter or HR person transcribing candidate interviews | Recruiter / HR workflow |
| Police investigator or legal team | Police / legal interview note |
| Student transcribing for a class project | Read the researcher workflow |
Verbatim vs intelligent verbatim — which to pick
The verbatim style decision is downstream of your method, not a personal preference. Pick the style your method requires; using the wrong style produces transcripts that won’t support your analysis.
Intelligent verbatim
Removes filler words (um, uh, like), false starts, repetitions, and verbal tics. Reads cleanly; preserves meaning, not exact speech. Right for: thematic analysis (Braun & Clarke 2006), journalism quotes, most market research, HR candidate review.
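A minimal sketch of what a first mechanical pass toward intelligent verbatim might look like. The filler list and function name are illustrative assumptions, not how any particular transcription service works. "Like" is deliberately left alone because it is often a real verb or preposition, and the repetition rule will also collapse legitimate doubles such as "had had", so treat the output as a draft to read through, not a finished transcript.

```python
import re

# Fillers this sketch treats as always removable (assumed list; extend with care).
FILLERS = re.compile(r"\b(?:um+|uh+|erm?)\b,?\s*", re.IGNORECASE)
# Immediate single-word repetitions such as "I I" or "the the".
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def to_intelligent_verbatim(text: str) -> str:
    """Strip simple fillers and stuttered repeats, then tidy whitespace."""
    cleaned = FILLERS.sub("", text)
    cleaned = REPEATS.sub(r"\1", cleaned)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(to_intelligent_verbatim("Um, I I think the the budget was uh too small."))
```

False starts and verbal tics need human judgment; a regex cannot tell a restart from a deliberate rephrase.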
Full / strict verbatim
Every “um,” every pause, every false start, every “like.” Right for: conversation analysis (CA), phenomenological research, legal evidence, and any analysis where speech patterns are part of the data.
Jefferson notation
Full verbatim plus systematic notation for prosody, overlap, latching, intonation, and emphasis (e.g., [overlap], = for latching, (.) for micro-pause, (2.5) for timed pause, :: for elongation). Right for: conversation analysis (CA) specifically, where Jefferson’s system is the canonical convention. Time-expensive; budget 6-10 hours per audio hour for full Jefferson transcripts.
Near-verbatim with pauses
Verbatim text plus systematic pause notation (e.g., (pause) or (2.5s)); some prosody optional. Right for: interpretative phenomenological analysis (IPA), where pauses and hesitations carry experiential meaning but full Jefferson would be overkill.
Decision table by method
| Method | Verbatim style | Estimated cleanup time |
|---|---|---|
| Thematic analysis (Braun & Clarke) | Intelligent verbatim | 1-2 hours per audio hour |
| Conversation analysis (CA) | Jefferson notation | 6-10 hours per audio hour |
| Interpretative phenomenological analysis (IPA) | Near-verbatim with pauses | 2-3 hours per audio hour |
| Discourse analysis | Full verbatim with prosody | 4-6 hours per audio hour |
| Journalism quotes | Intelligent verbatim | 0.5-1 hour per audio hour |
| HR candidate review | Intelligent verbatim | 0.5-1 hour per audio hour |
| Police / legal interview | Full verbatim | 3-5 hours per audio hour (often outsourced) |
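To turn the table into a project budget, multiply the per-hour range by your total audio. The sketch below copies the ranges from the table; the dictionary keys and function name are made up for illustration.

```python
# Cleanup ranges in hours of work per hour of audio, copied from the table above.
CLEANUP_HOURS = {
    "thematic": (1.0, 2.0),
    "conversation_analysis": (6.0, 10.0),
    "ipa": (2.0, 3.0),
    "discourse": (4.0, 6.0),
    "journalism": (0.5, 1.0),
    "hr": (0.5, 1.0),
    "legal": (3.0, 5.0),
}


def cleanup_budget(method: str, audio_hours: float) -> tuple[float, float]:
    """Return the (low, high) cleanup estimate in hours for a body of audio."""
    low, high = CLEANUP_HOURS[method]
    return (low * audio_hours, high * audio_hours)


# Twelve one-hour IPA interviews.
print(cleanup_budget("ipa", 12))
```

For twelve one-hour IPA interviews this gives 24 to 36 hours of cleanup, which is worth knowing before you commit to a sample size.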
For journalists
The workflow
- Record on phone (Voice Memos, Otter, Riverside) or a dedicated recorder (Zoom H1n, Tascam DR-05)
- Upload to a transcription service that produces speaker labels and timecodes
- Review the transcript looking for pull-quotes — the moments where the source said something quotable. Bookmark with timestamps.
- Drop quotes into your draft with the speaker name and a short context tag
- Always verify the quote against the audio before publication — AI transcripts mis-hear names, numbers, and technical terms most often
What matters for journalism
- Speed. A 1-hour interview should be transcribed in under an hour, not the next day. AI services do this; human services don’t.
- Speaker labels. Multi-source interviews and roundtables fall apart without them.
- Clickable timestamps. When you’re fact-checking a quote against the audio at 3am before press, click-to-seek matters.
- Export to DOCX or plain text. Most CMSes and word processors handle these; SRT and VTT are video formats and not what you need.
Tool fit
DeluxeScribe is a good default — fast, cheap per minute, multi-language. Trint is purpose-built for newsroom workflows with stronger collaboration features. Otter has a good editor. Rev’s human-reviewed tier is overkill for most journalism but right for evidentiary quotes (court reporting, congressional testimony).
Consent reality
The standard taught in US journalism schools is to disclose recording at the start of the interview and obtain verbal consent on the recording itself. State laws vary (one-party vs two-party consent; see the Reporters Committee state table). Off-the-record is something you negotiate with the source, not a legal status, so agree the ground rules before recording.
For qualitative researchers
Verbatim choice
See the verbatim section above. Pick the style your method requires before you upload, not after — re-cleaning a transcript from intelligent to full verbatim doubles the work.
CAQDAS handoff matrix
Each CAQDAS tool has different import tolerances. Plan your export format around the tool you’ll code in.
| Tool | Cleanest import format | Timestamp handling | Common breakage |
|---|---|---|---|
| NVivo | DOCX with embedded timecodes (e.g. [00:01:23]) | Imports inline timecodes; can sync to media | Raw auto-caption fragments confuse it; speaker labels need consistent prefix format |
| ATLAS.ti | VTT or SRT directly | Native timestamp support, sync to media | Very long cues may need splitting; speaker labels parsed from line prefixes |
| MAXQDA | SRT native, VTT supported | Native; auto-syncs to imported media | Long SRT lines may truncate at display layer — split before import for readability |
| Dedoose | DOCX or RTF | Timestamps treated as text; manual sync | No native media sync — keep timestamps in transcript for manual reference |
| Quirkos | Plain text or DOCX | Manual timestamp handling | Speaker labels work if consistent; no media sync |
Verify import before you start coding. A failed import three weeks into a project is the qualitative-research equivalent of losing your data.
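If your transcription export gives you segments with start times and speakers, a short script can produce the inline-timecode, consistent-prefix layout the table describes for DOCX-style imports. This is a sketch under assumptions: the segment dictionaries (`start`, `speaker`, `text`) are a hypothetical shape, not any vendor's export format, and the output is plain text you would paste or convert into DOCX yourself.

```python
def timecode(seconds: float) -> str:
    """Format a start time in seconds as an inline [HH:MM:SS] timecode."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def to_timecoded_text(segments: list[dict]) -> str:
    """Emit one paragraph per segment as '[HH:MM:SS] Speaker: text'."""
    paragraphs = []
    for segment in segments:
        speaker = segment["speaker"].strip()
        text = segment["text"].strip()
        paragraphs.append(f"{timecode(segment['start'])} {speaker}: {text}")
    return "\n\n".join(paragraphs)


example = [
    {"start": 83, "speaker": "Interviewer", "text": " How did that feel? "},
    {"start": 91.4, "speaker": "P01", "text": "Honestly, a relief."},
]
print(to_timecoded_text(example))
```

Generating every line from one format string is what keeps the speaker prefix consistent across the whole transcript set. Test the result on one transcript in your CAQDAS tool before converting the rest.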
Ethics, IRB, and consent
Not legal advice. Consult your IRB for your specific protocol.
- Sub-processor disclosure. Most IRBs now require naming the transcription tool in the consent form. Per Columbia TC IRB guidance and UMassD’s IRB protocol, protocols increasingly require listing which AI service processes the audio, where servers are located, and how long data is retained.
- US recording consent. Most US states are one-party consent; California, Florida, Pennsylvania, Washington, Illinois, Maryland, Massachusetts, Montana, New Hampshire, and a few others require all-party consent. For US-resident participants, default to all-party consent regardless of state; it’s standard for research ethics.
- EU residents and GDPR. Interview audio is personal data. You need a lawful basis (usually consent for research). Participants retain rights of access and erasure even mid-study. Document your processing, including sub-processors.
- Clinical research / HIPAA. If your interview captures Protected Health Information and you work under a HIPAA-covered IRB protocol, you need a Business Associate Agreement with your transcription vendor. DeluxeScribe does not offer a BAA. Use Rev (for some plans), a clinical-grade specialist, or self-hosted Whisper running on institutional hardware.
- Where DeluxeScribe processes. US-based servers; encryption in transit and at rest. We don’t train models on your audio. Transcripts retained per your account settings; deletable on demand.
Pseudonymization workflow
Before exporting to CAQDAS, replace identifying information with stable pseudonyms — names, places, employers, and anything that could re-identify the participant.
- List every identifier across the transcript set
- Assign a stable pseudonym for each (P01, P02, etc., or themed pseudonyms if your method calls for it)
- Use find-and-replace in a text editor (or a small Python script if you have many transcripts)
- Keep an audit log of the mapping in a separate secure file — you may need to re-identify for member-checking or follow-up
- Verify pseudonymization is complete before sharing the transcript with co-coders or exporting outside your secure workspace
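For a larger transcript set, the steps above can be sketched in Python. Everything here is illustrative: the function names, the example mapping, and the CSV layout of the audit log are assumptions, and the matching is literal, so it will miss misspellings, nicknames, and indirect identifiers. A human read-through is still the final check.

```python
import csv
import re
from pathlib import Path


def _pattern(identifiers) -> re.Pattern:
    # Longest first so "Maria Lopez" is matched before "Maria".
    ordered = sorted(identifiers, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE)


def pseudonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace every listed identifier with its stable pseudonym in one pass."""
    if not mapping:
        return text
    lookup = {key.lower(): value for key, value in mapping.items()}
    return _pattern(mapping).sub(lambda match: lookup[match.group(0).lower()], text)


def leftover_identifiers(text: str, mapping: dict[str, str]) -> list[str]:
    """List identifiers still present; should be empty before sharing."""
    return [key for key in mapping if _pattern([key]).search(text)]


def write_audit_log(mapping: dict[str, str], path: Path) -> None:
    """Save the re-identification key; store it apart from the transcripts."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["identifier", "pseudonym"])
        writer.writerows(sorted(mapping.items()))


mapping = {"Maria Lopez": "P01", "Maria": "P01", "Acme Corp": "Employer A"}
clean = pseudonymize("Maria Lopez said Acme Corp laid off Maria's team.", mapping)
print(clean)
print(leftover_identifiers(clean, mapping))
```

Replacing in a single pass, rather than looping one find-and-replace per name, avoids a pseudonym being rewritten by a later rule.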
Try multi-language interview transcription free
60 minutes free, no credit card. Speaker labels, word-level timestamps, and exports to VTT/SRT for ATLAS.ti and MAXQDA — or DOCX for NVivo.
Verbatim quality on hard audio
AI transcription struggles with the audio conditions qualitative interviews often produce:
- Cross-talk. When researcher and participant overlap, the transcript drops one or mis-attributes both.
- Accented speech. Modern models reach 95%+ accuracy on clear native English but degrade on heavy accents, particularly in supported-but-not-flagship languages.
- Jargon. Domain-specific terminology (medical, legal, technical) often gets mis-heard the same way humans do — by guessing a more common word.
- Background noise. Cafes, transit, and kitchens tank accuracy. Plan to clean.
Plan your time accordingly: figure 1.5-3 hours of cleanup per audio hour to reach publication-grade quality regardless of starting accuracy. This is the work; the AI just gives you a head start.
For recruiters and HR
The workflow
- Record candidate interview (with consent; check state law)
- Upload to a transcription service with speaker labels
- Review for hiring-team comparison — pull moments that show fit / concerns / standout responses
- Redact before sharing internally (more on this below)
- Attach to candidate record or paste excerpts into ATS notes
What matters for HR
- Speaker labels. Interviewer vs candidate must be clear, especially when sharing with hiring team members who didn’t attend.
- Redaction for protected-class slips. Candidates sometimes mention age, marital status, pregnancy, religion, national origin, disability, or other protected categories in small talk. Hiring decisions must not be based on these; sharing transcripts that expose them to decision-makers creates documentary evidence of awareness. Redact before sharing.
- Consent and notification. US recording-consent law varies by state (one-party vs two-party). Best practice is to notify every candidate that the interview is being recorded, regardless of state.
- Retention. Many HR teams retain transcripts for 1-2 years post-decision for defensibility. Document the policy and follow it consistently.
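A keyword pass can help a reviewer find lines that may need redaction before a transcript circulates. The sketch below only flags lines for a human to read. The categories and patterns are a small, assumed sample and will both miss real mentions and flag harmless ones, so it is a triage aid and not a compliance control.

```python
import re

# Illustrative patterns only; a real list needs review by HR or counsel.
PROTECTED_TERMS = {
    "age": r"\b(?:\d{1,2}\s+years\s+old|retire(?:d|ment)?)\b",
    "family": r"\b(?:pregnan(?:t|cy)|married|husband|wife|kids|children)\b",
    "religion": r"\b(?:church|mosque|synagogue|temple)\b",
    "disability": r"\b(?:disabilit(?:y|ies)|wheelchair|diagnos(?:is|ed))\b",
}


def flag_lines(transcript: str) -> list[tuple[int, str, str]]:
    """Return (line_number, category, line) for lines a human should review."""
    flagged = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        for category, pattern in PROTECTED_TERMS.items():
            if re.search(pattern, line, re.IGNORECASE):
                flagged.append((number, category, line))
    return flagged


sample = (
    "Interviewer: How was the commute?\n"
    "Candidate: Fine, my wife dropped me off after church."
)
for number, category, line in flag_lines(sample):
    print(number, category, line)
```

Flagging rather than auto-redacting keeps the decision with the person responsible for the record.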
ATS integration reality
Most applicant tracking systems (Greenhouse, Lever, Workday) don’t have a built-in transcript field. Options: attach the DOCX to the candidate record, paste key excerpts into the notes section, or use a dedicated interview intelligence tool (BrightHire, Metaview, Pillar) that integrates with the ATS and records/transcribes in one flow.
For police and legal interviews
Not legal advice. Court-certified human transcription is the standard for evidentiary use.
A brief note, since people do search for this: police interview transcription is its own discipline with specific requirements.
- Verbatim style: full verbatim, not intelligent — every word, pause, and verbal tic matters for evidentiary use
- Output format: DOCX with embedded timestamps; often page-numbered transcripts; sometimes line-numbered for court reference
- Human review strongly recommended. AI-only transcription carries risk for evidentiary use — a single mis-heard word can change the legal meaning of testimony. Rev offers a human-reviewed tier at $1.50/min; specialist court-reporting services charge more but provide certified transcripts.
- Chain of custody. Document who handled the audio, when, and where it was stored. This matters for evidentiary admissibility.
- DeluxeScribe is not a court-certified service. We can produce a first-pass AI transcript for review purposes, but for transcripts entering evidence, use a certified human service.
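One way to support a chain-of-custody record is to log a cryptographic hash of the audio file each time it changes hands, so later copies can be shown to be identical. Hashing is an assumption added here, not a requirement stated above, and the field names are hypothetical; follow whatever procedure your agency or counsel prescribes.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handler: str, action: str) -> dict[str, str]:
    """Build one log record: who handled which file, what they did, and when."""
    return {
        "file": path.name,
        "sha256": sha256_of(path),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Calling `custody_entry(Path("interview.wav"), "J. Doe", "received")` at each handoff (the file name and handler are placeholders) yields records you can append to a log; a matching hash across entries indicates the audio was not altered in between.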
Honest tool comparison
Ranked by what each one actually does best. No tool is universally best.
| Tool | Free tier | Paid from | Best for interviews when… |
|---|---|---|---|
| DeluxeScribe | 60 min one-time | $10/mo · 1,200 min | Multi-language, lowest per-minute, batch jobs, qualitative research |
| Trint | None | $48/mo · 7 hours | Newsroom workflows, collaborative editing, journalist teams |
| Otter | 300 min/mo | $17/mo | Meeting-style interviews, calendar integration, lightweight notes |
| Sonix | 30 min trial | $22/mo · 7 hours | Multi-language with longer CAQDAS track record |
| Descript | 1 hour/mo | $24/mo · 30 hours | Text-based editing — useful for journalism where you cut and rearrange quotes |
| Rev (human-reviewed) | None | $1.50/min | Evidentiary-grade transcripts, legal use, accuracy-critical research |
| Whisper (self-hosted) | Free | Free | Strict IRB / privacy requirements, HIPAA via institutional hardware |
Pricing captured June 2026. Verify before committing.
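To compare plans on a per-minute basis, divide the monthly price by the included minutes. The sketch below uses the figures from the table and assumes the hour counts are monthly allowances that you use in full; Otter is left out because the table lists no minute allowance for its paid plan.

```python
# (monthly price in USD, included minutes per month), read off the table above.
# Assumption: "7 hours" and "30 hours" are monthly included allowances.
PLANS = {
    "DeluxeScribe": (10, 1200),
    "Trint": (48, 7 * 60),
    "Sonix": (22, 7 * 60),
    "Descript": (24, 30 * 60),
}


def rank_by_cost(plans: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Return (plan, USD per minute) pairs, cheapest first."""
    costs = {name: price / minutes for name, (price, minutes) in plans.items()}
    return sorted(costs.items(), key=lambda item: item[1])


for name, cost in rank_by_cost(PLANS):
    print(f"{name}: ${cost:.3f} per minute")
```

At full use this works out to roughly $0.008 per minute for DeluxeScribe, $0.013 for Descript, $0.052 for Sonix, and $0.114 for Trint, against Rev's flat $1.50 per minute for human review. If you transcribe far less than the allowance, your effective per-minute cost is higher.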
Turnaround time reality
| Method | Time per 1-hour interview |
|---|---|
| AI services (DeluxeScribe, Otter, Sonix, Trint AI tier) | 3-10 minutes |
| Human-reviewed (Rev human, Trint enhanced) | 24-48 hours |
| Self-hosted Whisper on CPU | 10-30 hours |
| Self-hosted Whisper on GPU | ~1 hour or less |
| Manual transcription, intelligent verbatim | 3-4 hours of work |
| Manual transcription, full verbatim | 4-6 hours of work |
| Manual transcription, Jefferson notation | 6-10 hours of work |
Add cleanup time to the AI estimates if your output standard is research-grade verbatim: 1.5-3 hours per audio hour to polish a first-pass AI transcript to publication quality.
Related guides
- Zoom Transcription: For interviews recorded as a Zoom call. Covers native vs third-party paths and the non-host workflow.
- Podcast Transcription: Long-form interview recordings published as podcasts. Covers speaker labels, show notes, fair use.
- MP3 to Text: Most interview recordings land as MP3. Covers the per-format workflow and accuracy notes.
- How to Transcribe Audio: The pillar guide. Covers every path (SaaS, free tools, self-hosted Whisper, native OS) and how to pick.