Karaoke-Style Lyrics Guide
How to produce the bouncing-word karaoke effect: syllable splitting, pacing to the vocal onset, and held-note handling.
Karaoke-style lyrics highlight each word or syllable as the singer sings it. The bouncing-ball effect, the word-by-word color change, the syllable-level fill: they all come from fine-grained timing inside a TTML file. This guide covers the techniques that make karaoke lyrics feel right.
Word timing vs syllable timing
Word timing means one span per word. Syllable timing means one span per syllable. Platforms that render "bouncing" or "filling" lyrics usually support both. With word timing, though, each word still animates as a single unit, so the highlight cannot follow the individual syllables inside it.
Use word timing for most content. Reserve syllable timing for slow ballads, stretched vowels, or sections where the artist clearly pronounces each syllable beat by beat.
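One rough way to find places where syllable timing might pay off is to look for words that are held for a long time. The sketch below is only a heuristic: the function name, the (text, begin_ms, end_ms) tuple layout, and the 1200 ms threshold are all assumptions made for illustration, not CallEditor features.

```python
def syllable_candidates(spans, min_duration_ms=1200):
    """Return the words held long enough that syllable timing may be worth it.

    spans: list of (text, begin_ms, end_ms) tuples, times in milliseconds.
    min_duration_ms: hypothetical threshold; tune it by ear per song.
    """
    return [text for text, begin, end in spans if end - begin >= min_duration_ms]


if __name__ == "__main__":
    line = [("so", 14000, 14300), ("beautiful", 15000, 17000), ("day", 17000, 17400)]
    print(syllable_candidates(line))
```

Treat the result as a list of places to listen to, not as a decision. A long word sung as one smooth note may still read best as a single span.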
Splitting a word into syllables
In CallEditor, use the split action on any word to break it into smaller timed units. The split distributes the word's duration across the syllables in proportion to their length. Afterward, fine-tune the boundaries in the timeline.
<!-- Before: one word span -->
<span begin="00:00:15.000" end="00:00:17.000">beautiful</span>

<!-- After: three syllable spans -->
<span begin="00:00:15.000" end="00:00:15.500">beau</span>
<span begin="00:00:15.500" end="00:00:16.200">ti</span>
<span begin="00:00:16.200" end="00:00:17.000">ful</span>
Keep the original word's overall span intact: the first syllable still begins at 15.0 and the last still ends at 17.0. You are only subdividing that interval, so each syllable begins exactly where the previous one ends.
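To make the proportional split concrete, here is a minimal sketch. It assumes "syllable length" means letter count, which may not match how CallEditor measures it. The function names and the use of integer milliseconds are illustrative choices, not part of the tool.

```python
from xml.sax.saxutils import escape


def split_word(begin_ms, end_ms, syllables):
    """Split [begin_ms, end_ms] across syllables in proportion to letter count.

    Returns (text, begin_ms, end_ms) tuples that tile the original span
    exactly: no gaps, no overlaps, same overall begin and end.
    """
    if end_ms <= begin_ms:
        raise ValueError("end_ms must be after begin_ms")
    if not syllables or any(not s for s in syllables):
        raise ValueError("syllables must be non-empty strings")

    total_chars = sum(len(s) for s in syllables)
    duration = end_ms - begin_ms
    spans = []
    cursor = begin_ms
    consumed = 0
    for index, syllable in enumerate(syllables):
        consumed += len(syllable)
        if index == len(syllables) - 1:
            boundary = end_ms  # pin the last syllable to the original end
        else:
            # Measure every boundary from the word start so rounding never drifts.
            boundary = begin_ms + (duration * consumed) // total_chars
        spans.append((syllable, cursor, boundary))
        cursor = boundary
    return spans


def format_ts(ms):
    """Format milliseconds as an HH:MM:SS.mmm clock time."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def to_span(text, begin_ms, end_ms):
    """Render one timed span in the same shape as the markup in this guide."""
    return f'<span begin="{format_ts(begin_ms)}" end="{format_ts(end_ms)}">{escape(text)}</span>'


if __name__ == "__main__":
    for part in split_word(15000, 17000, ["beau", "ti", "ful"]):
        print(to_span(*part))
```

For "beautiful", this letter-count split puts the boundaries at 15.888 and 16.333, not at the hand-tuned 15.500 and 16.200 in the markup above. That difference is the reason for the fine-tuning pass: a proportional split is a starting point, and the ear decides the final boundaries.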
Pacing and the vocal onset
Each word's begin time should match the vocal onset, not the beat. Singers often lag slightly behind the beat or push slightly ahead. Match the voice, not the drum.
A good rule: if you close your eyes and listen, the word that is currently highlighted should be the word you hear being sung right now. If the highlight runs ahead of the voice, the begin time is too early; if it trails the voice, the begin time is too late.
Held notes and sustains
When a singer sustains a word, extend its end time rather than creating a gap. The word stays highlighted until the next word begins. Gaps between words feel wrong during slow sections.
For a word that bleeds into the next line, end it at the next line's begin time. Never overlap timings.
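The held-note rule can be sketched as a small pass over a list of timed words. This is an illustration under assumptions: the function name, the (text, begin_ms, end_ms) tuple layout, and the 400 ms threshold are hypothetical, and the input is assumed to be sorted by begin time with no two words sharing a begin time.

```python
def sustain_into_next(spans, max_gap_ms=400):
    """Close short gaps and trim overlaps by ending each word at the next begin.

    spans: list of (text, begin_ms, end_ms) tuples sorted by begin time.
    max_gap_ms: hypothetical cutoff; gaps longer than this are left alone
    because they are more likely an instrumental break than a held note.
    """
    fixed = []
    for (text, begin, end), (_, next_begin, _) in zip(spans, spans[1:]):
        # A negative gap is an overlap; it also passes this test and gets trimmed.
        if next_begin - end <= max_gap_ms:
            end = next_begin
        fixed.append((text, begin, end))
    fixed.extend(spans[-1:])  # the last word has no successor; keep it unchanged
    return fixed


if __name__ == "__main__":
    phrase = [
        ("hold", 1000, 1500),  # 200 ms gap before "me": extended
        ("me", 1700, 2000),    # overlaps "now" by 100 ms: trimmed
        ("now", 1900, 2500),   # 3500 ms gap: left alone
        ("then", 6000, 6500),
    ]
    for span in sustain_into_next(phrase):
        print(span)
```

A pass like this cannot hear the audio, so it only guesses that a short gap is a sustain. Check the result in the preview before trusting it.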
Breaths and non-lyrical sounds
Do not create spans for breaths, "ooh", or "ah" unless they are in the lyric sheet. Keep the lyrics to what is actually written. The animation flows better with accurate lyric lines than with captured ad libs.
If an ad lib is a meaningful part of the song, add it as a background vocal with an x-bg span so it renders as a secondary line.
Testing your karaoke timing
Use the preview view in CallEditor. Watch the animation as the audio plays. The highlighted word should match the word being sung at all times.
Common issues to catch:
- Words that highlight late: begin time is too late
- Words that snap off before the singer finishes: end time is too early
- Long gaps between words during sustained notes: end the sustained word at the next word's begin time
- Multiple words highlighted at once: overlapping spans; re-check boundaries
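Some of these issues can be flagged before you open the preview. The sketch below checks only what the numbers can show: overlapping spans, long gaps, and spans that end before they begin. Late or early highlights still have to be caught by ear. The function name, the tuple layout, and the 1500 ms gap threshold are assumptions for illustration.

```python
def find_timing_issues(spans, max_gap_ms=1500):
    """Return human-readable warnings for a list of timed words.

    spans: list of (text, begin_ms, end_ms) tuples sorted by begin time.
    max_gap_ms: hypothetical threshold above which a gap is worth a look.
    """
    issues = []

    # A span must end after it begins, or it never highlights at all.
    for text, begin, end in spans:
        if end <= begin:
            issues.append(f"{text!r}: end is not after begin")

    # Compare each word with the one that follows it.
    for (text, _, end), (next_text, next_begin, _) in zip(spans, spans[1:]):
        if next_begin < end:
            issues.append(f"{text!r} overlaps {next_text!r} by {end - next_begin} ms")
        elif next_begin - end > max_gap_ms:
            issues.append(f"gap of {next_begin - end} ms after {text!r}")

    return issues


if __name__ == "__main__":
    line = [("I", 0, 500), ("will", 400, 900), ("wait", 900, 1200), ("here", 4000, 3900)]
    for issue in find_timing_issues(line):
        print(issue)
```

A flagged gap is not always a mistake, since instrumental breaks are legitimate. An overlap or a span that ends before it begins is always worth fixing.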
Keyboard flow for fast syncing
CallEditor is built for keyboard-first syncing. Bind the "next word" shortcut to a comfortable key and leave the mouse alone while the track plays. Once the keybinding feels natural, you can sync an entire song in a single playthrough.
The Apple Music workflow guide walks through the full sync process. The background vocals guide covers the x-bg usage in detail.