
TTML File Format Reference

The subset of the TTML specification you need in practice. Tags, namespaces, attributes, and examples for Apple Music.

This guide covers the TTML profile used by Apple Music, Spotify, and most modern synced-lyrics platforms. It is not the full W3C spec. It is the subset you actually need to ship.

Root element and namespaces

Every TTML file starts with a <tt> root element. It declares the TTML namespace as the default namespace, plus any other namespaces the file uses:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:timeBase="media"
    xml:lang="en">
  <!-- <head> and <body> go here -->
</tt>

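The default xmlns declaration is required. The ttm (metadata) and ttp (parameter) namespaces are only needed if you use features that live in them: ttm for titles, agents, and background-vocal roles, and ttp for timing configuration such as ttp:timeBase. xml:lang tells the player which language the lyrics are in; the xml: prefix is built into XML and never needs declaring.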
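If you generate TTML from a script, a minimal sketch of building the root element with Python's standard xml.etree.ElementTree might look like this. make_root is a hypothetical helper for illustration, not part of CallEditor; the namespace URIs are the ones from the root element above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the <tt> root element example.
TT = "http://www.w3.org/ns/ttml"
TTM = "http://www.w3.org/ns/ttml#metadata"
TTP = "http://www.w3.org/ns/ttml#parameter"
XML = "http://www.w3.org/XML/1998/namespace"  # the built-in xml: prefix

# Serialize TTML as the default namespace and keep the usual ttm:/ttp: prefixes.
ET.register_namespace("", TT)
ET.register_namespace("ttm", TTM)
ET.register_namespace("ttp", TTP)


def make_root(lang: str) -> ET.Element:
    """Hypothetical helper: an empty <tt> with ttp:timeBase and xml:lang set."""
    root = ET.Element(f"{{{TT}}}tt")
    root.set(f"{{{TTP}}}timeBase", "media")
    root.set(f"{{{XML}}}lang", lang)
    return root


print(ET.tostring(make_root("en"), encoding="unicode"))
```

ElementTree only writes a namespace declaration when something in the tree uses it, so this output declares the default and ttp namespaces but not ttm. Adding an agent or a ttm:role later brings the ttm declaration in automatically.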

Head and metadata

The <head> section carries song metadata and agent declarations:

<head>
  <metadata>
    <ttm:title>Example song</ttm:title>
    <ttm:agent xml:id="v1" type="person">
      <ttm:name>Lead vocalist</ttm:name>
    </ttm:agent>
    <ttm:agent xml:id="v2" type="person">
      <ttm:name>Featured artist</ttm:name>
    </ttm:agent>
  </metadata>
</head>

Every agent needs an xml:id, which lyric lines reference through their ttm:agent attribute. The type attribute is one of person, character, group, organization, or other.
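To check the agent declarations in an existing file, a minimal sketch with Python's xml.etree.ElementTree can collect them into a dictionary. read_agents and AGENT_TYPES are hypothetical names for illustration; the allowed type values are the five listed above.

```python
import xml.etree.ElementTree as ET

NS = {
    "tt": "http://www.w3.org/ns/ttml",
    "ttm": "http://www.w3.org/ns/ttml#metadata",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
AGENT_TYPES = {"person", "character", "group", "organization", "other"}


def read_agents(ttml: str) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: map each agent's xml:id to (type, display name)."""
    root = ET.fromstring(ttml)
    agents = {}
    for agent in root.findall("tt:head/tt:metadata/ttm:agent", NS):
        agent_id = agent.get(XML_ID)
        kind = agent.get("type", "")
        if agent_id is None:
            raise ValueError("ttm:agent without xml:id")
        if kind not in AGENT_TYPES:
            raise ValueError(f"agent {agent_id!r} has unknown type {kind!r}")
        name = agent.findtext("ttm:name", default="", namespaces=NS).strip()
        agents[agent_id] = (kind, name)
    return agents
```

Run against the head shown above (wrapped in a <tt> root), this returns {"v1": ("person", "Lead vocalist"), "v2": ("person", "Featured artist")}.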

Body and paragraphs

The <body> holds a <div> (a single one is enough, though some files use one per song section) with one <p> per lyric line:

<body>
  <div>
    <p begin="00:00:12.000" end="00:00:15.200" ttm:agent="v1">
      First lyric line
    </p>
  </div>
</body>

The begin and end attributes are required for synced lyrics. Times use the HH:MM:SS.mmm format. The ttm:agent attribute points at the xml:id of an agent declared in the head.
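A minimal validation sketch in Python, assuming you only want line-level data: it reads every <p>, rejects lines without begin or end, and flags ttm:agent values that were never declared in the head. read_lines and Line are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

TT_P = "{http://www.w3.org/ns/ttml}p"
# The ttm:agent element (in <head>) and the ttm:agent attribute (on <p>)
# share the same expanded name.
TTM_AGENT = "{http://www.w3.org/ns/ttml#metadata}agent"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


class Line(NamedTuple):
    begin: str
    end: str
    agent: Optional[str]
    text: str


def read_lines(ttml: str) -> list[Line]:
    """Hypothetical helper: collect every <p>, checking timing and agents."""
    root = ET.fromstring(ttml)
    declared = {a.get(XML_ID) for a in root.iter(TTM_AGENT)}
    lines = []
    for p in root.iter(TT_P):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            raise ValueError("every <p> needs begin and end")
        agent = p.get(TTM_AGENT)
        if agent is not None and agent not in declared:
            raise ValueError(f"ttm:agent={agent!r} is not declared in <head>")
        # itertext() also yields text from nested <span>s; split/join
        # collapses whitespace runs to single spaces.
        text = " ".join("".join(p.itertext()).split())
        lines.append(Line(begin, end, agent, text))
    return lines
```

For the body above, this yields one Line with begin "00:00:12.000", end "00:00:15.200", agent "v1", and text "First lyric line".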

Word-level spans

Replace the plain text inside a paragraph with timed <span> elements for per-word animation:

<p begin="00:00:12.000" end="00:00:15.200" ttm:agent="v1">
  <span begin="00:00:12.000" end="00:00:12.400">First</span>
  <span begin="00:00:12.400" end="00:00:13.000">lyric</span>
  <span begin="00:00:13.000" end="00:00:15.200">line</span>
</p>

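Every word has its own begin and end. Whitespace between spans becomes the space between words in the output; runs of whitespace, including the newlines and indentation in the example above, collapse to a single space. If two spans should join without a space (for example, syllables of one word), put them in the same span or leave no whitespace between the closing </span> and the next <span>.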
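The spacing rule can be approximated in a few lines of Python. This is a simplified sketch, not a full implementation of TTML's whitespace handling: display_text just collapses every whitespace run to one space, and word_timings lists the direct child spans. Both names are hypothetical.

```python
import xml.etree.ElementTree as ET

TT_SPAN = "{http://www.w3.org/ns/ttml}span"


def word_timings(p_xml: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: (begin, end, text) for each direct child <span>."""
    p = ET.fromstring(p_xml)
    return [(s.get("begin"), s.get("end"), s.text or "") for s in p.findall(TT_SPAN)]


def display_text(p_xml: str) -> str:
    """Hypothetical helper: approximate rendered text; whitespace runs -> one space."""
    p = ET.fromstring(p_xml)
    return " ".join("".join(p.itertext()).split())
```

With the paragraph above, display_text gives "First lyric line". Three spans written back to back with no whitespace, such as beau, ti, and ful, come out as the single word "beautiful" while keeping three separate timings.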

Background vocals

Background vocals and ad libs use a wrapper span with ttm:role="x-bg":

<p begin="00:00:12.000" end="00:00:15.200" ttm:agent="v1">
  <span begin="00:00:12.000" end="00:00:13.500">Main lyric</span>
  <span ttm:role="x-bg">
    <span begin="00:00:13.000" end="00:00:14.000">oh yeah</span>
  </span>
</p>

The outer x-bg span carries no timing. Inner spans carry the timing. The platform renders the x-bg content as a smaller secondary lyric under the main line.
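A minimal Python sketch of how a tool might separate the two layers, assuming the structure shown above (untimed x-bg wrapper, timed inner spans). split_vocals is a hypothetical helper for illustration.

```python
import xml.etree.ElementTree as ET

TT_SPAN = "{http://www.w3.org/ns/ttml}span"
TTM_ROLE = "{http://www.w3.org/ns/ttml#metadata}role"


def split_vocals(p_xml: str):
    """Hypothetical helper: (main, background) lists of (begin, end, text)."""
    p = ET.fromstring(p_xml)
    main, background = [], []
    for span in p.findall(TT_SPAN):
        if span.get(TTM_ROLE) == "x-bg":
            # The wrapper has no timing of its own; read the timed spans inside it.
            for inner in span.findall(TT_SPAN):
                background.append((inner.get("begin"), inner.get("end"), inner.text or ""))
        else:
            main.append((span.get("begin"), span.get("end"), span.text or ""))
    return main, background
```

On the example above, the main list holds "Main lyric" (12.000 to 13.500) and the background list holds "oh yeah" (13.000 to 14.000). The overlap is expected: background vocals often start before the main phrase ends.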

Time format

Use HH:MM:SS.mmm for every timestamp, zero-padded: two digits each for hours, minutes, and seconds, and three for milliseconds. For example, 00:00:12.000 is twelve seconds and 00:03:45.500 is three minutes and 45.5 seconds. Some parsers also accept MM:SS.mmm, but TTML clock times require the hours field, so the three-part form is safer.
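Converting between this format and milliseconds is plain arithmetic. Here is a minimal Python sketch that accepts only the strict HH:MM:SS.mmm form recommended here; that is deliberately narrower than the TTML spec, which also allows other fraction lengths and offset times. parse_clock and format_clock are hypothetical names.

```python
import re

# Hypothetical strict pattern: 2+ hour digits, minutes and seconds 00-59,
# exactly three millisecond digits.
CLOCK = re.compile(r"(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_clock(ts: str) -> int:
    """Convert an HH:MM:SS.mmm timestamp to whole milliseconds."""
    m = CLOCK.fullmatch(ts)
    if m is None:
        raise ValueError(f"expected HH:MM:SS.mmm, got {ts!r}")
    hours, minutes, seconds, millis = (int(g) for g in m.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def format_clock(ms: int) -> str:
    """Convert milliseconds back to a zero-padded HH:MM:SS.mmm string."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# parse_clock("00:03:45.500") == 225500, and format_clock(225500) == "00:03:45.500"
```

Rejecting the two-part form outright, instead of guessing that the first field is minutes, keeps a malformed timestamp from silently shifting a line by a factor of 60.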

Character escaping

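TTML is XML, so the standard XML escapes apply: replace & with &amp; and < with &lt; in lyric text, and inside an attribute value also escape the quote character that delimits it (&quot; or &apos;). CallEditor handles this automatically on export.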
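If you write TTML by hand-assembling strings, Python's standard xml.sax.saxutils module covers both cases. A minimal sketch; timed_span is a hypothetical helper for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def timed_span(begin: str, end: str, word: str) -> str:
    """Hypothetical helper: serialize one timed word with XML escaping."""
    # escape() rewrites &, < and > in text content; quoteattr() adds the
    # surrounding quotes and escapes whatever the attribute value needs.
    return f"<span begin={quoteattr(begin)} end={quoteattr(end)}>{escape(word)}</span>"
```

For example, the word R&B becomes R&amp;B inside the span. If you build the tree with xml.etree.ElementTree instead of string formatting, its serializer applies these escapes for you.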

Ready to try it?

Better Lyrics

A browser extension that adds time-synced, animated lyrics to YouTube Music. Free, open source, and the reason CallEditor exists.

Visit better-lyrics.boidu.dev