Captions and Transcripts

Who needs captions, and when transcripts are required

Audio and video content must be accessible to all users - including people who are deaf or hard of hearing, users in sound-sensitive environments, and anyone who prefers to read rather than listen.

Captions vs transcripts

Captions are synchronized text overlaid on video, timed with the audio
Transcripts are a complete text version of the audio content, provided separately

The two serve different needs. Captions benefit anyone watching the video who cannot hear or use the audio. Transcripts benefit users who cannot access the video at all, or who prefer to skim.

What captions must include

All spoken dialogue
Speaker identification when not visually obvious
Meaningful non-speech audio: [applause], [phone ringing]
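One common way to deliver these three elements is a WebVTT caption file, where speaker identification can be carried by a voice tag (`<v Name>`) and non-speech audio is written in square brackets. The following is a minimal sketch, not a complete implementation: the `Cue` class and the `format_timestamp` and `to_webvtt` helpers are hypothetical names introduced for illustration, and the cue text is assumed to contain no `&` or `<` characters that would need escaping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One caption cue (hypothetical structure, for illustration only)."""
    start: float                   # seconds from the start of the video
    end: float                     # seconds from the start of the video
    text: str                      # dialogue, or a bracketed sound such as "[applause]"
    speaker: Optional[str] = None  # set when the speaker is not visually obvious


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues to WebVTT text; assumes cue text needs no escaping."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        if cue.speaker:
            # A voice tag identifies the speaker for this cue.
            lines.append(f"<v {cue.speaker}>{cue.text}")
        else:
            lines.append(cue.text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.0, 2.0, "Welcome back.", speaker="Host"),
        Cue(2.0, 4.0, "[applause]"),
    ]))
```

Keeping the speaker in a separate field makes it easy to omit identification when the speaker is clearly visible on screen.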

Auto-generated captions are not sufficient

Auto-generated captions (for example from YouTube, Zoom, or Teams) are a starting point, not a finished product. They frequently mishear proper nouns, technical terms, and accented speech. Always review and correct them before publishing.

Never publish unreviewed auto-captions

Publishing auto-generated captions without review can introduce errors that are worse than no captions - a deaf or hard-of-hearing user who cannot hear the audio may rely entirely on incorrect text.
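A small script can help a reviewer find likely problem spots, although it does not replace reading the captions against the audio. The sketch below assumes a team keeps its own list of terms that auto-captioning tends to get wrong. The `flag_suspect_lines` function and the sample glossary entries are hypothetical and are not part of any captioning tool.

```python
from typing import Dict, List, Tuple


def flag_suspect_lines(
    caption_lines: List[str],
    known_mishearings: Dict[str, List[str]],
) -> List[Tuple[int, str, str]]:
    """Return (line index, text found, expected term) for each suspect match.

    known_mishearings maps the correct term to the wrong forms that
    auto-captioning has produced for it before (a team-maintained list).
    """
    flagged = []
    for index, line in enumerate(caption_lines):
        lowered = line.lower()
        for expected, wrong_forms in known_mishearings.items():
            for wrong in wrong_forms:
                # Case-insensitive substring match against each known wrong form.
                if wrong.lower() in lowered:
                    flagged.append((index, wrong, expected))
    return flagged


if __name__ == "__main__":
    # Hypothetical glossary and caption text, for illustration only.
    glossary = {"WCAG": ["wick ag", "w cag"]}
    captions = ["Welcome to the wick ag overview", "Thanks for joining"]
    for index, found, expected in flag_suspect_lines(captions, glossary):
        print(f"line {index}: found '{found}', expected '{expected}'")
```

A check like this only catches errors that have been seen before, so a human still needs to review the full caption track.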

When a transcript is required

Audio-only content (podcasts): a transcript is required
Video with audio: captions are required; a transcript is also strongly recommended
Live content: real-time captions are required to meet WCAG 1.2.4 (Level AA)
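The rules above can be encoded as a small lookup, for example in a publishing checklist. This is a sketch only: the `MediaKind` enum and the `text_alternatives` function are hypothetical names, and the mapping simply restates the three cases listed here.

```python
from enum import Enum
from typing import Dict, List


class MediaKind(Enum):
    AUDIO_ONLY = "audio-only"              # e.g. a podcast episode
    VIDEO_WITH_AUDIO = "video with audio"  # prerecorded video with a soundtrack
    LIVE = "live"                          # e.g. a live stream or webinar


def text_alternatives(kind: MediaKind) -> Dict[str, List[str]]:
    """Return the required and recommended text alternatives for a media kind."""
    if kind is MediaKind.AUDIO_ONLY:
        return {"required": ["transcript"], "recommended": []}
    if kind is MediaKind.VIDEO_WITH_AUDIO:
        return {"required": ["captions"], "recommended": ["transcript"]}
    if kind is MediaKind.LIVE:
        return {"required": ["real-time captions"], "recommended": []}
    raise ValueError(f"Unknown media kind: {kind!r}")


if __name__ == "__main__":
    for kind in MediaKind:
        print(kind.value, text_alternatives(kind))
```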

WCAG criteria

1.2.1 Audio-only and Video-only (Prerecorded) - Prerecorded audio-only and video-only content must have alternatives. (Level A)

1.2.2 Captions (Prerecorded) - Captions are provided for all prerecorded audio content in synchronized media. (Level A)

1.2.4 Captions (Live) - Captions are provided for all live audio content in synchronized media. (Level AA)