Video Captions and Audio Transcripts

This information is also available in PDF format.

The University of Toronto is committed to the principles of the Accessibility for Ontarians with Disabilities Act (AODA). According to Ontario Regulation 191/11, section 14:

  • By January 1, 2014, new internet websites and web content on those sites must conform with WCAG 2.0 Level A.
  • By January 1, 2021, all internet websites and web content must conform with WCAG 2.0 Level AA, other than success criterion 1.2.4 Captions (Live) and success criterion 1.2.5 Audio Description (Prerecorded).

Universal Design for Learning (UDL)

Video captions and audio transcripts align with the UDL guideline of providing multiple means of representation. Captions and transcripts benefit users across a range of auditory and learning abilities. Non-native speakers, viewers watching videos on low bandwidth or in noisy or quiet environments, and students learning new terminology can use captions and transcripts to improve comprehension, information processing, and retention.

What is the difference between captions and transcripts?

Although the terms “captions” and “transcripts” are often used interchangeably, there are subtle differences between the two terms.

Captions are text versions of speech and other important audio content synchronized to the visual and auditory content. The most common type of captions is “Closed Captions,” which can be turned on or off via the “CC” button on video players.

Transcripts are text versions of speech and descriptions of important audio and visual information with no time information attached. Transcripts allow anyone who cannot access the web audio or video content to read a text transcript instead.

Subtitles, on the other hand, are text translations of speech and audio content.

Another term often used in connection with captions and transcripts is “audio descriptions.” Audio descriptions are narrations that describe visual information needed to understand the content. These narrations inform those who cannot see the video.

For more information about captions, transcripts, subtitles, and audio descriptions, refer to the World Wide Web Consortium (W3C)’s pages on Captions/Subtitles, Transcripts, and Description.

Generating Automatic Captions on Microsoft Stream

What is Microsoft Stream?
Microsoft Stream, a part of Office 365, is a secure video service for uploading, viewing, and sharing videos. Microsoft Stream can generate captions in .vtt format. VTT stands for “Web Video Text Tracks” (WebVTT), a W3C format for displaying timed text.

Where do I access Microsoft Stream?
Log in to your online Outlook/UTmail+ account and click on the waffle (app launcher) icon in the top left corner. Click “All apps.” Select the Stream tool.

How do I use Microsoft Stream to generate captions?

  1. Click on “+ Create” to upload a video. Stream supports many file formats, including .mp4, .avi, .flv, .mkv, .mov, .wav, and .wmv.
  2. Set the “Video Language” to automatically generate a caption file.
  3. Set “Permissions.”
  4. In “Options,” check the box for “Autogenerate captions.”
  5. To check and edit the captions, click on “My content,” and click on the video to watch.
  6. The transcript box will appear to the right of the video. Click on the pen icon to edit the transcript. Tip: Insert punctuation to facilitate clarity and ease of reading.
  7. To download the captions, click on the three dots (or ellipsis) menu and select “Update video details.”
  8. Under the “Options” column, located on the far right, click on “Download file” next to the word “Captions.” The file will save in .vtt format.

Can I generate my own .vtt file?
Yes; however, .vtt formatting can be tricky. Visit W3C’s page on WebVTT: The Web Video Text Tracks Format for guidance.
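As a rough illustration of the format, the Python sketch below builds a minimal WebVTT file: the required “WEBVTT” header line, a blank line, and then cues, each made of a “start --> end” timing line followed by the caption text. The function names and the sample cues are hypothetical and chosen only for illustration; the W3C specification remains the reference for the full syntax.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # the header must be the first line of the file
    for start, end, text in cues:
        if end <= start:
            raise ValueError("cue end time must be after its start time")
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated from the header and from each other by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample cues, for illustration only.
sample_cues = [
    (0.0, 2.5, "Welcome to the course."),
    (2.5, 6.0, "Today we cover captions and transcripts."),
]
vtt_text = build_vtt(sample_cues)
print(vtt_text)
```

Saving vtt_text to a UTF-8 encoded file with a .vtt extension should give a file that can be uploaded as a caption track. Punctuation and line breaks inside each cue still need a human check.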

Why can’t I use Microsoft Stream to share my videos?
Videos in Stream are searchable by the U of T community. MyMedia allows sharing with privacy. Where you host your videos depends on who should be able to view them.

Generating Automatic Captions on YouTube

How do I use YouTube to generate captions?

  1. Upload the video, set the language, and set the visibility of the video.
  2. Once the video has uploaded, click on the pen icon to review the details of the video. From the left menu, select “Subtitles.” YouTube automatic captioning can take a while. Budget time for this task.
  3. To check and edit the captions, click on the three dots (or ellipsis) menu next to the word “Published” and select “Edit on Classic Studio.” Once completed, click on the “Return to YouTube Studio” button in the top right corner.
  4. To download the captions, find the video in YouTube Studio, and click on the pen icon to review the details of the video. From the left menu, select “Subtitles.” Click on the three dots (or ellipsis) menu next to the word “Published” and select “Download.” You can download the captions in .vtt, .srt, or .sbv formats. Sometimes YouTube will not automatically generate captions due to audio complexity and video length.
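If only an .srt file is on hand and a .vtt file is needed, the two formats are close enough that a simple conversion often works. The sketch below is a minimal example under stated assumptions: it assumes a well-formed .srt file, adds the “WEBVTT” header, and changes the comma before the milliseconds in each timing line to a period. The function name srt_to_vtt and the sample text are hypothetical. The numeric counters from the .srt file are kept, since WebVTT allows a cue identifier line above each timing line.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert well-formed SRT caption text to WebVTT text."""
    # Drop a leading byte-order mark and normalize Windows line endings.
    lines = srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    # Only timing lines are changed; commas inside caption text are left alone.
    converted = [SRT_TIMING.sub(r"\1.\2 --> \3.\4", line) for line in lines]
    return "WEBVTT\n\n" + "\n".join(converted).strip() + "\n"


# Hypothetical sample input, for illustration only.
sample_srt = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "Hello, world.\n"
    "\n"
    "2\n"
    "00:00:04,500 --> 00:00:06,000\n"
    "Second line.\n"
)
print(srt_to_vtt(sample_srt))
```

When the .vtt format is available for download, downloading it directly is simpler than converting.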

Why can’t I use YouTube to share my videos?
YouTube is blocked in various countries. MyMedia provides a secured and accessible space for all University of Toronto learners.

Adding Captions on MyMedia

What is MyMedia?
MyMedia is an archival storage and streaming solution for University academic media content. MyMedia does not create auto-captions but does allow for uploading captions and sharing with privacy.

Where do I access MyMedia?
To access MyMedia, you will be asked to log in with your UTORid.

How do I use MyMedia?

  1. Click on “New Upload” to upload a video. MyMedia supports most libavcodec video and audio formats.
  2. To add captions to your video, click on the pen icon and select the “Tracks” tab.
  3. Click on “Upload New Track,” then “Choose File,” and select type and language for the .vtt file generated from Microsoft Stream. Tip: Select “Captions” as MyMedia can use the same .vtt file for closed captioning and transcripts.
  4. Click on the video to check if a “CC” button is available in the bottom right corner.
  5. Click on “Transcript” to view the text next to the video.
  6. To edit the captions, return to your dashboard by clicking on “MyMedia” in the upper left corner. Select the pen icon and select the “Tracks” tab. Download the .vtt file, edit it, and re-upload it when editing is complete.
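Because a transcript is essentially the caption text without time information, a plain-text transcript can also be derived from a downloaded .vtt file outside of MyMedia. The following Python sketch shows one way to do this, assuming a simple, well-formed file: it skips the header and any blocks without a timing line, drops cue identifiers and timings, and removes inline tags such as voice tags. The function name vtt_to_transcript and the sample text are hypothetical.

```python
import re

# Matches inline WebVTT tags such as <v Speaker>, </v>, <i>, or <00:00:01.000>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_transcript(vtt_text: str) -> str:
    """Return the spoken text of a WebVTT file as one plain-text string."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n")
    spoken = []
    # Blocks (header, notes, cues) are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is caption text.
                spoken.extend(TAG.sub("", l).strip() for l in lines[i + 1:])
                break
    return " ".join(s for s in spoken if s)


# Hypothetical sample input, for illustration only.
sample_vtt = (
    "WEBVTT\n\n"
    "NOTE sample file\n\n"
    "1\n"
    "00:00:00.000 --> 00:00:02.500\n"
    "<v Speaker>Welcome to the course.</v>\n\n"
    "00:00:02.500 --> 00:00:06.000\n"
    "Today we cover captions\n"
    "and transcripts.\n"
)
print(vtt_to_transcript(sample_vtt))
```

The result contains only the speech. A complete transcript would still need descriptions of important non-speech audio and visual information added by hand.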

Generating Transcripts on Otter

What is Otter?
Otter is an application that generates speech-to-text transcriptions using artificial intelligence and machine learning. Otter is useful for generating transcripts for audio clips or for videos that have no captions.

Where do I access Otter?
Otter is accessible on the web and via a mobile app. The free plan allows 600 minutes of transcription per month, with a maximum of 40 minutes per sitting. Please note: Otter is owned by AISense. Before using the service, read AISense’s Privacy Policy to confirm that you are comfortable with their data collection.

How do I use Otter?

  1. Create an Otter account and click the blue microphone button to record.
  2. As you speak, Otter picks up the audio. When you are done, click on the stop button.
  3. Otter will process the recording. Once completed, click on the note to view the transcription.
  4. To edit the transcript, click on the pen icon.
  5. To save the transcript, click on the three dots (or ellipsis) menu and select “Export,” then “Export text.” The transcript can be exported as plain text or a .txt file under the free plan.

What are some alternatives to Otter?
Windows and macOS have built-in speech recognition software. Use a microphone to improve the accuracy of the transcriptions.

Using Real-Time Automatic Subtitles in PowerPoint

Which version of Microsoft PowerPoint has real-time subtitles?
PowerPoint for Microsoft 365 can generate real-time automatic subtitles on Windows 10, on Mac, and in the Microsoft Edge, Google Chrome 34+, and Mozilla Firefox 25+ web browsers.

How do I use real-time subtitles?

  1. On the “Slide Show” (sometimes “View”) ribbon tab, check “Use Subtitles,” and select “Subtitle Settings” to set the spoken language, subtitle language, and subtitle placement.
  2. If the real-time automatic subtitle feature does not turn on while you are presenting, click on the “Toggle Subtitles” button (looks like a rectangle with dashes near the bottom) on the toolbar below the slide. Real-time automatic subtitles depend on a cloud-based service, which requires a fast and reliable internet connection.

Can I ask a student to provide real-time captioning?
Real-time captioning requires a skilled transcriber. No student should bear the responsibility of providing accurate captioning in real time.

Using Real-Time Captioning in a Teams Meeting

When can I use real-time captioning in Microsoft Teams?
Teams can detect what is said in a meeting and present live captioning (only in English (US) for now).

How do I use live captioning?
On the “Meeting Controls bar” (located on the bottom center), click on the three dots (or ellipsis) menu and select “Turn on live captions (preview).” The live captions will appear in the lower left of the meeting screen.