Captions and Transcripts
Who needs captions, when transcripts are required
Audio and video content must be accessible to all users - including people who are deaf or hard of hearing, users in sound-sensitive environments, and anyone who prefers to read rather than listen.
Captions vs transcripts
- Captions are synchronized text overlaid on video, timed with the audio
- Transcripts are a complete text version of the audio content, provided separately
The two serve different needs. Captions benefit users who watch the video but cannot hear the audio or have the sound off. Transcripts benefit users who cannot access the video at all, or who prefer to skim.
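To make the distinction concrete, here is a minimal sketch that renders the same spoken content two ways: as WebVTT captions (text with timing) and as a plain transcript (text without timing). The `Cue` class, the function names, and the sample cues are illustrative assumptions, not part of any particular captioning tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One timed piece of text (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the media
    end: float
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Captions: every piece of text keeps its start and end time."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def to_transcript(cues: List[Cue]) -> str:
    """Transcript: the same text as continuous prose, with timing dropped."""
    return " ".join(cue.text for cue in cues)


# Sample data, invented for illustration
cues = [
    Cue(0.0, 2.5, "Welcome to the accessibility review."),
    Cue(2.5, 5.0, "[applause]"),
]
captions = to_webvtt(cues)
transcript = to_transcript(cues)
```

The captions string can be saved as a `.vtt` file and attached to a video player, while the transcript string can be published as ordinary page text next to the media.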
What captions must include
- All spoken dialogue
- Speaker identification when not visually obvious
- Meaningful non-speech audio, such as [applause] or [phone ringing]
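The items above can be sketched as a small helper that assembles one caption line. The function name `format_caption` and the uppercase "NAME:" speaker convention are assumptions made for illustration; follow your own style guide where it differs.

```python
from typing import Optional


def format_caption(text: str = "",
                   speaker: Optional[str] = None,
                   sound: Optional[str] = None) -> str:
    """Build one caption line (hypothetical helper).

    speaker: pass a name only when the speaker is not visually obvious.
    sound:   a meaningful non-speech sound, rendered in square brackets.
    """
    parts = []
    if sound:
        # Accept the sound with or without brackets and normalize it
        parts.append(f"[{sound.strip('[] ')}]")
    if text:
        parts.append(text.strip())
    line = " ".join(parts)
    if speaker:
        line = f"{speaker.upper()}: {line}"
    return line
```

For example, `format_caption("Can you hear me?", speaker="Maria")` gives a labeled line of dialogue, and `format_caption(sound="phone ringing")` gives a sound-only caption.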
Auto-generated captions are not sufficient
Auto-generated captions (YouTube, Zoom, Teams) are a starting point, not a finished product. They frequently mishear proper nouns, technical terms, and accented speech. Always review and correct before publishing.
Publishing auto-generated captions without review can introduce errors that are worse than no captions: a deaf or hard-of-hearing viewer who cannot hear the audio may rely entirely on the incorrect text.
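One way to support the review step is to scan auto-generated caption text for phrases your team has seen misheard before. The sketch below is an aid to human review, not a replacement for it. The `KNOWN_MISHEARINGS` entries and the function name are invented examples; a real list would come from your own corrections.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical examples of mis-hearings; build your own list over time
KNOWN_MISHEARINGS: Dict[str, str] = {
    "wick ag": "WCAG",
    "aria label": "aria-label",
}


def flag_for_review(cue_texts: List[str],
                    mishearings: Dict[str, str] = KNOWN_MISHEARINGS
                    ) -> List[Tuple[int, str, str]]:
    """Return (cue index, suspect phrase, suggested term) for each match.

    Nothing is corrected automatically; a person decides what to change.
    """
    findings = []
    for index, text in enumerate(cue_texts):
        for wrong, suggested in mishearings.items():
            pattern = rf"\b{re.escape(wrong)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                findings.append((index, wrong, suggested))
    return findings


findings = flag_for_review([
    "Today we cover wick ag basics.",
    "Nothing to fix here.",
])
```

A check like this only catches errors that are already known. Proper nouns, new technical terms, and accented speech still need a person to read the captions against the audio.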
When a transcript is required
- Audio-only content (podcasts): a transcript is required
- Video with audio: captions are required; a transcript is also strongly recommended
- Live content: real-time captions are required to meet WCAG 1.2.4 Captions (Live), Level AA
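The rules above could be encoded as a pre-publish check. This is a minimal sketch: the content-type keys, the item names, and the `missing_items` function are assumptions chosen for illustration, and the mapping reflects only the three cases listed here.

```python
from typing import Dict, Iterable, List, Set, Tuple

# What each content type must ship with, per the list above
REQUIRED: Dict[str, Set[str]] = {
    "audio-only": {"transcript"},
    "video": {"captions"},
    "live": {"real-time captions"},
}

# Strongly recommended, but not required
RECOMMENDED: Dict[str, Set[str]] = {
    "video": {"transcript"},
}


def missing_items(kind: str, provided: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing required items, missing recommended items)."""
    have = set(provided)
    required = REQUIRED[kind] - have
    recommended = RECOMMENDED.get(kind, set()) - have
    return sorted(required), sorted(recommended)
```

For instance, a video published with captions but no transcript passes the required check and is reported as missing one recommended item.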