Interview Transcription: A Complete Guide for Journalists, Researchers, and HR

Three different jobs share one search. The right tool, format, and verbatim style depend on which one is yours.

Interview transcription is three different jobs. Journalists need fast turnaround with timecoded pull-quotes. Qualitative researchers need a verbatim style that fits their method and a transcript that imports cleanly into NVivo, ATLAS.ti, or MAXQDA. Recruiters need clean candidate-comparison output with sensitive content handled. DeluxeScribe transcribes interview audio in 99 languages with speaker labels and exports to TXT, DOCX, PDF, SRT, VTT, and JSON — 60 minutes free. Below: the verbatim-style decision tree, the CAQDAS handoff matrix nobody else publishes, IRB-grade ethics, and an honest tool ranking by use case.
  • 60 minutes free
  • No credit card
  • 99 languages
  • Speaker labels

Last verified June 26, 2026

Not job-interview prep

This page is about converting recorded interview audio to text. If you’re looking for job-interview prep (questions, behavioral frameworks, how to handle salary negotiation), this isn’t it — sites like LeetCode, Glassdoor, or Levels.fyi cover that ground.

Pick your path

Different jobs need different workflows. Pick the row that matches yours.

I’m a… | Go to
--- | ---
Journalist transcribing source interviews | Journalist workflow
Qualitative researcher (PhD, UX, social science) | Qualitative researcher workflow
Recruiter or HR person transcribing candidate interviews | Recruiter / HR workflow
Police investigator or legal team | Police / legal interview note
Student transcribing for a class project | Read the researcher workflow

Verbatim vs intelligent verbatim — which to pick

The verbatim style decision is downstream of your method, not a personal preference. Pick the style your method requires; using the wrong style produces transcripts that won’t support your analysis.

Intelligent verbatim

Removes filler words (um, uh, like), false starts, repetitions, and verbal tics. Reads cleanly; preserves meaning, not exact speech. Right for: thematic analysis (Braun & Clarke 2006), journalism quotes, most market research, HR candidate review.
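Going from full verbatim to intelligent verbatim is mostly mechanical deletion, so it can be partly automated. The Python sketch below is a naive first pass. It assumes a hypothetical filler list (um, uh, erm, er, hmm) and treats an immediately repeated word as a false start. It deliberately leaves "like" alone, because "like" is often a real verb or preposition. It will also wrongly collapse legitimate doubles such as "had had", so a person still needs to read the result.

```python
import re

# Hypothetical filler list; extend it for your speakers. "like" is left out on
# purpose because it is often a real verb or preposition.
FILLERS = ["um", "uh", "erm", "er", "hmm"]
FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*", re.IGNORECASE)
# An immediately repeated word ("I I think") is treated as a false start.
REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def intelligent_verbatim(line: str) -> str:
    """Naive first pass from full verbatim toward intelligent verbatim."""
    text = FILLER_RE.sub("", line)
    text = REPEAT_RE.sub(r"\1", text)
    # Collapse any double spaces left behind by the deletions.
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, intelligent_verbatim("Um, I I think, uh, the the grant ran out.") returns "I think, the grant ran out." Run it on a copy and keep the full-verbatim original.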

Full / strict verbatim

Every “um,” every pause, every false start, every “like.” Right for: conversation analysis (CA), phenomenological research, legal evidence, and any analysis where speech patterns are part of the data.

Jefferson notation

Full verbatim plus systematic notation for prosody, overlap, latching, intonation, and emphasis: square brackets [ ] for the onset and end of overlapping talk, = for latching, (.) for a micro-pause, (2.5) for a timed pause in seconds, and colons (::) for elongation. Right for: conversation analysis (CA) specifically; Jefferson’s system is the canonical CA convention. Time-expensive: budget 6-10 hours per audio hour for a full Jefferson transcript.

Near-verbatim with pauses

Verbatim text plus systematic pause notation (e.g., (pause) or (2.5s)); some prosody optional. Right for: interpretative phenomenological analysis (IPA), where pauses and hesitations carry experiential meaning but full Jefferson would be overkill.
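Because timed and untimed pause markers follow a fixed pattern, they can be counted mechanically, for example to compare how often a participant hesitates across interviews. This minimal Python sketch assumes timed pauses are written as (2.5) or (2.5s), untimed pauses as (pause), and micro-pauses as Jefferson’s (.). The function name is hypothetical. A bare number in parentheses that is not a pause would be miscounted, and the counts are descriptive, not an analysis.

```python
import re

# Timed pauses written as (2.5) or (2.5s); Jefferson micro-pauses as (.)
TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)s?\)")
MICRO_PAUSE = re.compile(r"\(\.\)")


def pause_summary(transcript: str) -> dict:
    """Count pause markers in a near-verbatim or Jefferson-style transcript."""
    timed = [float(x) for x in TIMED_PAUSE.findall(transcript)]
    return {
        "timed_pauses": len(timed),
        "total_timed_seconds": round(sum(timed), 2),
        "micro_pauses": len(MICRO_PAUSE.findall(transcript)),
        "untimed_pauses": transcript.lower().count("(pause)"),
    }
```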

Decision table by method

Method | Verbatim style | Estimated cleanup time (hours per audio hour)
--- | --- | ---
Thematic analysis (Braun & Clarke) | Intelligent verbatim | 1-2
Conversation analysis (CA) | Jefferson notation | 6-10
Interpretative phenomenological analysis (IPA) | Near-verbatim with pauses | 2-3
Discourse analysis | Full verbatim with prosody | 4-6
Journalism quotes | Intelligent verbatim | 0.5-1
HR candidate review | Intelligent verbatim | 0.5-1
Police / legal interview | Full verbatim | 3-5 (often outsourced)
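To turn the decision table into a project budget, multiply the per-audio-hour range by your total recorded hours. A small Python sketch: the method keys are hypothetical labels and the ranges are copied from the table, so adjust them to your own pace.

```python
# Cleanup hours per audio hour (low, high), copied from the decision table.
CLEANUP_HOURS = {
    "thematic": (1, 2),
    "conversation_analysis": (6, 10),
    "ipa": (2, 3),
    "discourse": (4, 6),
    "journalism": (0.5, 1),
    "hr_review": (0.5, 1),
    "legal": (3, 5),
}


def cleanup_budget(method: str, audio_hours: float) -> tuple[float, float]:
    """Return the (low, high) hours of cleanup work for a batch of recordings."""
    low, high = CLEANUP_HOURS[method]
    return low * audio_hours, high * audio_hours
```

Twelve 50-minute CA interviews are 10 audio hours, so cleanup_budget("conversation_analysis", 12 * 50 / 60) returns (60.0, 100.0): 60 to 100 hours of transcript work before coding starts.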

For journalists

The workflow

  1. Record on a phone or in a recording app (Voice Memos, Otter, Riverside), or on a dedicated recorder (Zoom H1n, Tascam DR-05)
  2. Upload to a transcription service that produces speaker labels and timecodes
  3. Review the transcript looking for pull-quotes — the moments where the source said something quotable. Bookmark with timestamps.
  4. Drop quotes into your draft with the speaker name and a short context tag
  5. Always verify the quote against the audio before publication — AI transcripts mis-hear names, numbers, and technical terms most often
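If your service exports SRT (numbered cues, each with a start --> end timing line followed by the text), a short script can locate a quote’s timecode so you can jump straight to it in the audio for the step 5 check. This minimal Python sketch assumes speaker names, if any, appear as plain text inside the cue; the function names are hypothetical. It matches within single cues only, so a quote split across two cues needs a shorter search phrase.

```python
import re


def parse_srt(srt_text: str) -> list[tuple[str, str]]:
    """Return (start_timecode, cue_text) pairs from SRT content."""
    cues = []
    # Cues are separated by blank lines: an index line, a timing line, then text.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start = lines[1].split("-->")[0].strip()
        cues.append((start, " ".join(line.strip() for line in lines[2:])))
    return cues


def find_quote(srt_text: str, phrase: str) -> list[tuple[str, str]]:
    """Case-insensitive search; returns every cue containing the phrase."""
    needle = phrase.lower()
    return [(tc, text) for tc, text in parse_srt(srt_text) if needle in text.lower()]
```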

What matters for journalism

  • Speed. A 1-hour interview should be transcribed in under an hour, not the next day. AI services do this; human services typically don’t.
  • Speaker labels. Multi-source interviews and roundtables fall apart without them.
  • Clickable timestamps. When you’re fact-checking a quote against the audio at 3am before press, click-to-seek matters.
  • Export to DOCX or plain text. Most CMSes and word processors handle these; SRT and VTT are caption formats for video players, not what you need for drafting.

Tool fit

DeluxeScribe is a good default — fast, cheap per minute, multi-language. Trint is purpose-built for newsroom workflows with stronger collaboration features. Otter has a good editor. Rev’s human-reviewed tier is overkill for most journalism but right for evidentiary quotes (court reporting, congressional testimony).

Consent reality

The standard taught in US journalism schools is to disclose the recording at the start of the interview and obtain verbal consent on the recording itself. State laws vary (one-party vs two-party consent; see the Reporters Committee state table). Off-the-record is an agreement with your source, not a legal status, so agree on the ground rules before recording.

For qualitative researchers

Verbatim choice

See the verbatim section above. Pick the style your method requires before you upload, not after. You can strip a full-verbatim transcript down to intelligent verbatim, but going the other way means re-listening to the entire recording, which roughly doubles the work.

CAQDAS handoff matrix

Each CAQDAS tool has different import tolerances. Plan your export format around the tool you’ll code in.

Tool | Cleanest import format | Timestamp handling | Common breakage
--- | --- | --- | ---
NVivo | DOCX with embedded timecodes (e.g. [00:01:23]) | Imports inline timecodes; can sync to media | Raw auto-caption fragments confuse it; speaker labels need a consistent prefix format
ATLAS.ti | VTT or SRT directly | Native timestamp support; syncs to media | Very long cues may need splitting; speaker labels are parsed from line prefixes
MAXQDA | SRT native, VTT supported | Native; auto-syncs to imported media | Long SRT lines may truncate at the display layer; split them before import for readability
Dedoose | DOCX or RTF | Timestamps treated as text; manual sync | No native media sync; keep timestamps in the transcript for manual reference
Quirkos | Plain text or DOCX | Manual timestamp handling | Speaker labels work if consistent; no media sync
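For the long-cue problem in ATLAS.ti and MAXQDA, one workaround is to split each long SRT cue into shorter ones before import, sharing the original cue’s time span out in proportion to text length. This Python sketch rests on stated assumptions: the 84-character limit (roughly two 42-character caption lines) is a hypothetical default, split timings are estimates rather than real word timings, and the "Name: " speaker-prefix detection is a heuristic that can misfire on lines such as "Note: ...". Function names are illustrative.

```python
import re
import textwrap

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")
# Heuristic: a short "Name: " prefix at the start of a cue is treated as a speaker label.
SPEAKER_RE = re.compile(r"^([^:\n]{1,40}):\s")


def to_ms(timecode: str) -> int:
    """Convert an SRT timecode like 00:01:23,456 to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.match(timecode.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_tc(ms: int) -> str:
    """Convert milliseconds back to an SRT timecode."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def split_long_cues(srt_text: str, max_chars: int = 84) -> str:
    """Split cues longer than max_chars into several shorter, renumbered cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not index + timing + text
        start_tc, end_tc = lines[1].split("-->")
        start, end = to_ms(start_tc), to_ms(end_tc)
        text = " ".join(line.strip() for line in lines[2:])
        match = SPEAKER_RE.match(text)
        prefix = match.group(0) if match else ""
        chunks = textwrap.wrap(text, max_chars) or [text]
        total_chars = sum(len(c) for c in chunks)
        t = start
        for i, chunk in enumerate(chunks):
            # Estimate each chunk's end time from its share of the cue's text.
            if i == len(chunks) - 1:
                t_end = end
            else:
                t_end = t + (end - start) * len(chunk) // total_chars
            # Repeat the speaker prefix so continuation cues stay attributed.
            label = prefix + chunk if i > 0 and prefix else chunk
            cues.append((t, t_end, label))
            t = t_end
    return "\n\n".join(
        f"{n}\n{to_tc(a)} --> {to_tc(b)}\n{body}"
        for n, (a, b, body) in enumerate(cues, 1)
    ) + "\n"
```

Open the output in a text editor and spot-check a few split points against the audio before importing.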

Verify import before you start coding. A failed import three weeks into a project is the qualitative-research equivalent of losing your data.
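One cheap pre-import check is to confirm that every turn carries one of the speaker labels you expect, since inconsistent prefixes ("P01:" vs "p01:" vs "Participant 1:") are a common reason labels fail to parse. This minimal Python sketch assumes one turn per line in the form "Label: text", optionally preceded by an NVivo-style [hh:mm:ss] timecode. The function name and line format are assumptions, not any CAQDAS tool’s documented requirement.

```python
import re
from collections import Counter

# One turn per line: optional [hh:mm:ss] timecode, then "Label: text".
TURN_RE = re.compile(r"^(?:\[\d{2}:\d{2}:\d{2}\]\s*)?([^:\[\]]{1,40}):\s")


def check_speaker_labels(lines: list[str], expected: set[str]) -> dict:
    """Count labels and list lines with a missing or unexpected speaker prefix."""
    labels = Counter()
    problems = []
    for n, line in enumerate(lines, 1):
        if not line.strip():
            continue  # blank lines are fine
        m = TURN_RE.match(line)
        if not m:
            problems.append((n, "no speaker prefix"))
            continue
        label = m.group(1).strip()
        labels[label] += 1
        if label not in expected:
            problems.append((n, f"unexpected label {label!r}"))
    return {"labels": dict(labels), "problems": problems}
```

An empty problems list does not guarantee a clean import, but a non-empty one will almost certainly cause trouble.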

Ethics, IRB, and consent

Not legal advice. Consult your IRB for your specific protocol.

  • Sub-processor disclosure. Many IRBs now require naming the transcription tool in the consent form. Per Columbia TC IRB guidance and UMassD’s IRB protocol, protocols increasingly require listing which AI service processes the audio, where its servers are located, and how long it retains data.
  • US recording consent. Most US states are one-party consent; California, Florida, Pennsylvania, Washington, Illinois, Maryland, Massachusetts, Montana, New Hampshire, and a few others require all-party consent. For US-resident participants, default to all-party consent regardless of state; it’s standard research-ethics practice.
  • EU residents and GDPR. Interview audio is personal data. You need a lawful basis (usually consent for research). Participants retain rights of access and erasure even mid-study. Document your processing, including sub-processors.
  • Clinical research / HIPAA. If your interview captures Protected Health Information and you work under a HIPAA-covered IRB protocol, you need a Business Associate Agreement with your transcription vendor. DeluxeScribe does not offer a BAA. Use Rev (which offers a BAA on some plans), a clinical-grade specialist, or self-hosted Whisper running on institutional hardware.
  • Where DeluxeScribe processes. US-based servers; encryption in transit and at rest. We don’t train models on your audio. Transcripts are retained per your account settings and can be deleted on demand.

Pseudonymization workflow

Before exporting to CAQDAS, replace identifying information with stable pseudonyms — names, places, employers, and anything that could re-identify the participant.

  1. List every identifier across the transcript set
  2. Assign a stable pseudonym for each (P01, P02, etc., or themed pseudonyms if your method calls for it)
  3. Use find-and-replace in a text editor (or a small Python script if you have many transcripts)
  4. Keep an audit log of the mapping in a separate secure file — you may need to re-identify for member-checking or follow-up
  5. Verify pseudonymization is complete before sharing the transcript with co-coders or exporting outside your secure workspace
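Step 3’s small Python script might look like the sketch below. It assumes you have already built the identifier-to-pseudonym mapping from steps 1 and 2 as a Python dict; the function names and CSV layout are hypothetical. All replacements happen in one regex pass (longest identifiers first, whole words only, case-insensitive), so "Mary Ellen" is not half-replaced by a shorter "Mary" rule and "Maryland" is left alone. Identifiers that begin or end with punctuation defeat the whole-word boundaries, and misspellings in the transcript will not match, which is why step 5 still applies.

```python
import csv
import re
from pathlib import Path


def pseudonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace every identifier in mapping with its pseudonym in one pass."""
    if not mapping:
        return text
    lookup = {real.lower(): alias for real, alias in mapping.items()}
    # Longest first, so "Mary Ellen" wins over a shorter "Mary" rule.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


def write_audit_log(mapping: dict[str, str], path: Path) -> None:
    """Save the identifier -> pseudonym key; store it apart from the transcripts."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["identifier", "pseudonym"])
        writer.writerows(sorted(mapping.items()))
```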

Try multi-language interview transcription free

60 minutes free, no credit card. Speaker labels, word-level timestamps, and exports to VTT/SRT for ATLAS.ti and MAXQDA — or DOCX for NVivo.

Verbatim quality on hard audio

AI transcription struggles with the audio conditions qualitative interviews often produce:

  • Cross-talk. When researcher and participant overlap, the transcript drops one or mis-attributes both.
  • Accented speech. Modern models can exceed 95% word accuracy on clear native-speaker English but degrade on heavy accents, particularly in languages that are supported but are not the vendor’s flagship focus.
  • Jargon. Domain-specific terminology (medical, legal, technical) often gets mis-heard the same way humans do — by guessing a more common word.
  • Background noise. Cafes, transit, and kitchens tank accuracy. Plan to clean.

Plan your time accordingly: figure 1.5-3 hours of cleanup per audio hour to reach publication-grade quality regardless of starting accuracy. This is the work; the AI just gives you a head start.

For recruiters and HR

The workflow

  1. Record candidate interview (with consent; check state law)
  2. Upload to a transcription service with speaker labels
  3. Review for hiring-team comparison — pull moments that show fit / concerns / standout responses
  4. Redact before sharing internally (more on this below)
  5. Attach to candidate record or paste excerpts into ATS notes

What matters for HR

  • Speaker labels. Interviewer vs candidate must be clear, especially when sharing with hiring-team members who didn’t attend.
  • Redaction for protected-class slips. Candidates sometimes mention age, marital status, pregnancy, religion, national origin, disability, or other protected categories in small talk. Hiring decisions must not be based on these; sharing transcripts that expose them to decision-makers creates documentary evidence of awareness. Redact before sharing.
  • Consent and notification. US one-party vs two-party consent applies. Best practice is to notify all candidates the interview is being recorded, regardless of state.
  • Retention. Many HR teams retain transcripts for 1-2 years post-decision for defensibility. Document the policy and follow it consistently.
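Finding protected-class slips is ultimately a human read, but a keyword pass can point the reviewer at likely turns first. In this minimal Python sketch the term list is a hypothetical, deliberately incomplete illustration (build a real one with your HR and legal teams), matching is plain substring search, and the function only flags turns for review. It does not redact anything, and it will miss any mention it has no keyword for.

```python
# Hypothetical, deliberately incomplete term list; build a real one with HR and legal.
FLAG_TERMS = {
    "age": ["years old", "retire", "my age"],
    "family status": ["pregnan", "maternity", "my husband", "my wife", "my kids"],
    "religion": ["church", "mosque", "synagogue", "temple"],
    "disability": ["disability", "diagnos"],
}


def flag_for_review(turns: list[str]) -> list[tuple[int, str, str]]:
    """Return (turn_number, category, turn_text) for every turn that hits a term.

    Substring matching on purpose ("pregnan" catches pregnant/pregnancy),
    which also means false positives; a person decides what to redact.
    """
    hits = []
    for n, turn in enumerate(turns, 1):
        lowered = turn.lower()
        for category, terms in FLAG_TERMS.items():
            if any(term in lowered for term in terms):
                hits.append((n, category, turn))
    return hits
```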

ATS integration reality

Most applicant tracking systems (Greenhouse, Lever, Workday) don’t have a built-in transcript field. Options: attach the DOCX to the candidate record, paste key excerpts into the notes section, or use a dedicated interview intelligence tool (BrightHire, Metaview, Pillar) that integrates with the ATS and records/transcribes in one flow.

For police and legal interviews

Not legal advice. Court-certified human transcription is the standard for evidentiary use.

Police interview transcription is its own discipline, with specific requirements:

  • Verbatim style: full verbatim, not intelligent — every word, pause, and verbal tic matters for evidentiary use
  • Output format: DOCX with embedded timestamps; often page-numbered transcripts; sometimes line-numbered for court reference
  • Human review strongly recommended. AI-only transcription carries risk for evidentiary use — a single mis-heard word can change the legal meaning of testimony. Rev offers a human-reviewed tier at $1.50/min; specialist court-reporting services charge more but provide certified transcripts.
  • Chain of custody. Document who handled the audio, when, and where it was stored. This matters for evidentiary admissibility.
  • DeluxeScribe is not a court-certified service. We can produce a first-pass AI transcript for review purposes, but for transcripts entering evidence, use a certified human service.
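For chain of custody, a cryptographic hash is a standard way to show a file has not changed since it was logged: re-hash it later and compare. This minimal Python sketch uses SHA-256 from the standard library and an append-only JSON Lines log. The field names and file layout are assumptions, and the log supplements rather than replaces your agency’s or firm’s evidence-handling procedures.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so long recordings don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handler: str, action: str, location: str) -> dict:
    """One handling event: who did what to which file, where, and when (UTC)."""
    return {
        "file": path.name,
        "sha256": sha256_of(path),
        "handler": handler,
        "action": action,
        "location": location,
        "utc_time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def append_entry(log_path: Path, entry: dict) -> None:
    """Append one JSON object per line; earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```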

Honest tool comparison

Ranked by what each one actually does best. No tool is universally best.

Tool | Free tier | Paid from | Best for interviews when…
--- | --- | --- | ---
DeluxeScribe | 60 min one-time | $10/mo · 1,200 min | Multi-language, lowest per-minute, batch jobs, qualitative research
Trint | None | $48/mo · 7 hours | Newsroom workflows, collaborative editing, journalist teams
Otter | 300 min/mo | $17/mo | Meeting-style interviews, calendar integration, lightweight notes
Sonix | 30 min trial | $22/mo · 7 hours | Multi-language with a longer CAQDAS track record
Descript | 1 hour/mo | $24/mo · 30 hours | Text-based editing; useful for journalism where you cut and rearrange quotes
Rev (human-reviewed) | None | $1.50/min | Evidentiary-grade transcripts, legal use, accuracy-critical research
Whisper (self-hosted) | Free | Free | Strict IRB / privacy requirements, HIPAA via institutional hardware

Pricing captured June 2026. Verify before committing.
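The per-minute comparison is simple division of the entry-tier prices above: monthly price over minutes used. In this Python sketch the figures are copied from the table (June 2026). Otter is left out because the table lists no minute allowance for it, and Rev is already priced per minute ($1.50). The ranking assumes you use the whole allowance; use fewer minutes and the effective rate rises.

```python
# Entry-tier prices from the comparison table (captured June 2026); re-check before relying on them.
PLANS = {
    "DeluxeScribe": (10.00, 1_200),  # (USD per month, included minutes)
    "Trint": (48.00, 7 * 60),
    "Sonix": (22.00, 7 * 60),
    "Descript": (24.00, 30 * 60),
}


def cost_per_minute(monthly_price: float, minutes_used: float) -> float:
    """Effective price per transcribed minute for one month."""
    return monthly_price / minutes_used


# Rank plans by cost when the full allowance is used.
for name, (price, minutes) in sorted(PLANS.items(), key=lambda kv: cost_per_minute(*kv[1])):
    print(f"{name:<12} ${cost_per_minute(price, minutes):.3f}/min")
```

At the listed tiers this prints DeluxeScribe ($0.008/min), Descript ($0.013), Sonix ($0.052), and Trint ($0.114), in that order. If you only transcribe 300 minutes a month, cost_per_minute(10.00, 300) is about $0.033/min instead.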

Turnaround time reality

Method | Time per 1-hour interview
--- | ---
AI services (DeluxeScribe, Otter, Sonix, Trint AI tier) | 3-10 minutes
Human-reviewed (Rev human, Trint enhanced) | 24-48 hours
Self-hosted Whisper on CPU | 10-30 hours
Self-hosted Whisper on GPU | ~1 hour or less
Manual transcription, intelligent verbatim | 3-4 hours of work
Manual transcription, full verbatim | 4-6 hours of work
Manual transcription, Jefferson notation | 6-10 hours of work

Add cleanup time to the AI estimates if your output standard is research-grade verbatim: 1.5-3 hours per audio hour to polish a first-pass AI transcript to publication quality.
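For the self-hosted Whisper rows, the open-source openai-whisper Python package is one common route. It needs ffmpeg installed and downloads model weights on first use; the audio itself is processed on your machine. This minimal sketch writes NVivo-style [hh:mm:ss] timecoded lines. The helper names and the default "small" model are assumptions. Whisper on its own does not label speakers, so you would add interviewer and participant prefixes during cleanup or with a separate diarization tool.

```python
def format_timecode(seconds: float) -> str:
    """Whole seconds as an NVivo-style [hh:mm:ss] timecode."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def segments_to_text(segments: list[dict]) -> str:
    """One line per Whisper segment; Whisper itself adds no speaker labels."""
    return "\n".join(
        f"{format_timecode(seg['start'])} {seg['text'].strip()}" for seg in segments
    )


def transcribe_locally(audio_path: str, model_name: str = "small") -> str:
    """Run Whisper on this machine and return timecoded plain text."""
    import whisper  # pip install openai-whisper (imported here so the helpers work without it)

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_text(result["segments"])
```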

How this page was verified

Verbatim-style → method mapping references Braun & Clarke (2006) on thematic analysis, Jefferson (2004) transcription conventions for conversation analysis, and Smith, Flowers & Larkin (2009) for IPA. CAQDAS handoff specifics come from ATLAS.ti’s import documentation, MAXQDA’s transcription guide, and observed-in-practice NVivo behavior documented in the CAQDAS Networking Project blog. US recording-consent state list is from the Reporters Committee for Freedom of the Press recording guide. IRB sub-processor disclosure expectations are documented in Columbia TC IRB guidance and UMassD IRB protocols. Tool pricing was captured June 2026 from each vendor’s public pricing page. Accuracy ranges are framed conservatively (92-96% on clean interview audio) rather than citing the blanket “99%” that’s common in vendor copy.

Frequently Asked Questions

What is interview transcription?

Converting a recorded interview (audio or video) into written text. The output varies by purpose: journalists want pull-quotes with timestamps for articles, qualitative researchers want a verbatim transcript codable in NVivo / ATLAS.ti / MAXQDA, recruiters want clean candidate-comparison output, and legal use requires full verbatim with embedded timestamps for evidentiary review.

Verbatim vs intelligent verbatim — which should I use?

Intelligent verbatim removes filler words (um, uh, like), false starts, and repetitions; the transcript reads cleanly while preserving meaning. Use it for thematic analysis (Braun & Clarke 2006), most journalism, market research, and HR review. Full verbatim captures every word, pause, and false start; use it for conversation analysis (Jefferson notation), phenomenological research, and legal evidence. The right choice depends on your method.

How long does it take to transcribe a 1-hour interview?

AI services typically finish in 3-10 minutes. Human-reviewed transcription (Rev human tier, Trint enhanced) takes 24-48 hours. Manual transcription at full verbatim takes 4-6 hours of work per hour of audio. Self-hosted Whisper on a CPU is much slower (roughly 10-30 hours per audio hour); on a GPU it's near real-time. For qualitative research where you'll clean the AI output to publication standard, budget 1.5-3 hours per audio hour for clean-up regardless of starting accuracy.

What's the best transcript format for NVivo, ATLAS.ti, or MAXQDA?

Each CAQDAS tool has different preferences. NVivo imports DOCX with embedded timecodes most cleanly; raw auto-caption fragments can break it. ATLAS.ti accepts VTT or SRT directly. MAXQDA imports SRT natively and supports VTT — long SRT lines may need manual splitting. Plan to export from your transcription service in the format your CAQDAS tool prefers, then verify import before doing your coding work.

Do I need a HIPAA BAA for clinical research interviews?

If your interview contains Protected Health Information (PHI) and you're a covered entity, business associate, or working under an IRB protocol that requires it, then yes. Most consumer transcription tools (DeluxeScribe included) are not HIPAA-compliant and don't offer a BAA. Options: a vendor with a signed BAA (Rev offers this for some plans, others by enterprise contract), self-hosted Whisper running on your institution's hardware, or a clinical-grade specialist service. Always check what your IRB requires before you upload.

What about consent — can I legally record an interview?

In one-party-consent US states (most states), you only need your own consent to record a conversation you're part of. In two-party / all-party-consent states (California, Florida, Pennsylvania, Washington, Illinois, Maryland, Massachusetts, Montana, New Hampshire, and a few others), all participants must consent. Outside the US, rules vary widely — GDPR treats recordings as personal data with extra obligations. Best practice for journalism and research is always to disclose recording and obtain verbal consent on the recording itself. Reporters Committee for Freedom of the Press maintains a state-by-state table.

Should I use AI transcription or a human transcriptionist?

AI is fast (minutes vs days) and cheap ($0-0.25/min vs $1.50+/min for human review), and on clean interview audio it often hits 92-96% word accuracy. Where humans still win: heavily-accented speech, multi-speaker overlap, technical jargon dense enough to fool AI, and any use where evidentiary or publication-grade verbatim matters. Hybrid workflows — AI for the first pass, human review for accuracy-critical sections — are common in journalism and legal use.

How do I pseudonymize transcripts for research?

Replace identifying information (names, places, employers, identifying details) with stable pseudonyms before exporting to CAQDAS. Keep an audit log of the mapping in a separate secure file so you can re-link if needed for member-checking or follow-up. Most transcription tools don't do this automatically — it's a find-and-replace pass in a text editor, or a small Python/regex script if you have many transcripts. Some research-focused services offer this as a feature; check before assuming.