Captions, Transcripts, and Audio Descriptions

First published: 26th March 2021

Last updated: 26th April 2021

Who is this page for?

This page is primarily for content creators and designers.

Summary

Accessible multimedia (synchronized visual and auditory content) must include captions, which are text versions of speech and other important audio content, so that it is accessible to people who cannot hear all of the audio. © WebAIM

Captions

According to US government figures, one person in eight has some functional hearing limitation, and this number is expected to grow as the average age of the population increases. Beyond people with disabilities, captioning helps people who only partially understand the language being presented. Captions are also useful in noisy environments like airports, in quiet environments like libraries, and for multimodal learning.

All multimedia content with speech should have accessible captions that are:

Synchronized to appear at approximately the same time as the corresponding audio.
Equivalent to the spoken words and other audio information.
Accessible, or readily available, to those who need it.
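On the web, captions are often stored in a separate timed-text file that the video player reads. The sketch below assumes the WebVTT format, which this article does not name, and uses hypothetical names (Cue, format_timestamp, to_webvtt) chosen only for illustration. Each cue pairs caption text with a start and end time, which is how a player keeps a caption synchronized with the corresponding audio. Non-speech sounds are written as bracketed descriptions so the captions stay equivalent to the audio.

```python
# A minimal sketch, assuming the WebVTT caption format.
# Cue, format_timestamp, and to_webvtt are hypothetical names for illustration.
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the media when the caption appears
    end: float    # seconds when the caption disappears
    text: str     # spoken words, or a description of other important audio


def format_timestamp(seconds: float) -> str:
    """Return a WebVTT timestamp of the form HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues into WebVTT text: a header, then one timed block per cue."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        # Simple guard: a cue must end after it starts.
        if cue.end <= cue.start:
            raise ValueError(f"cue ends before it starts: {cue.text!r}")
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    captions = [
        Cue(0.0, 2.5, "Welcome to the show."),
        Cue(2.5, 4.0, "[audience laughter]"),
    ]
    print(to_webvtt(captions))
```

Running the script prints a WEBVTT header followed by two timed cues. In practice the timings would come from the actual recording; the end-after-start check is a basic guard, not a full validator.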

Captions as typically seen on television (screenshot of a broadcast of The Tonight Show Starring Jimmy Fallon)

The most common type of captions is closed captions, which viewers can turn on or off. Closed captioning of most pre-recorded television programs is now a legal requirement in most countries, and most live broadcasts (such as news and sports events) and most pre-recorded programs now include closed captions that can be easily enabled and viewed on screen.

Captions as seen on DVD or Blu-ray

On broadcast television, the style and location of the captions depend on the caption decoder built into the viewer's television receiver or streaming device. In online or streaming video, the browser or video player determines how captions will be displayed. Many decoders and video players allow the user to customize caption size, color, font, and location on the screen.

Captions as seen in a web media player

Note: Also see our article on real-time captions for information on captioning live web multimedia and broadcasts.

Transcripts

Transcripts allow anyone who cannot access web audio or video content (or both) to read a text version instead. For multimedia, a transcript can also help users who can neither hear the audio nor see the video. Beyond the spoken words, a transcript should include descriptions of important audio information (such as laughter) and visual information (such as someone entering the room). Transcripts help deafblind users interact with content using refreshable Braille devices. For most web video, both captions and a text transcript should be provided. For audio-only content such as a podcast, a transcript will usually suffice; captions are not necessary.

Transcripts make multimedia content searchable by search engines and users. Screen reader users may also prefer a transcript over real-time audio, since most proficient screen reader users set their assistive technology to read at a rate much faster than natural human speech.

Important: In order to be optimally accessible to users with auditory disabilities, web multimedia should include both synchronized captions and a transcript.

Audio Descriptions

Important: Visual content within multimedia must be described via audio in order for the multimedia to be optimally accessible to users with visual disabilities.

Audio descriptions help users with visual disabilities perceive content that is presented only visually, and they are required for WCAG 2 Level AA conformance. On television, audio description is often called Descriptive Video Service (DVS). Typically, a narrator describes the visual-only content in the multimedia. Audio descriptions can be included in the primary audio of the video, provided in a separate audio track, or delivered via an alternate version of the video that includes them.

Producing audio descriptions can be expensive and time-consuming. However, they are unnecessary if the audio already presents the necessary visual content. If a video displays a list of five important items, the presenter should read the items aloud rather than saying, "As you can see, there are five important points." Instead of saying, "Click here and then here," the presenter should describe what is being clicked. This way, no separate audio description track is necessary.

Source: WebAIM