Introduction

Effortlessly sync text with audio in Unity — ideal for dialogues, subtitles, karaoke, interactive books, and more. Create dynamic, eye-catching text effects with ease!

Requirements

To work correctly, «Audio-Text Synchronizer» requires:

- An audio file;
- The text of the audio file (optional if you use whisper.unity).

How do I upgrade?

First, back up your project.
Next, update the asset to the latest version from the Asset Store using the Package Manager (Window -> Package Manager).
The asset includes a list comparing the API changes. To check it, open the v.2.0_API_Changes.pdf file.
Also, check out this section: Upgrading from version 1.x to 2.x.
If you have any issues or errors, contact me. I'll be happy to help!

Quick Start

«Audio-Text Synchronizer» contains two parts:

Editor Window for creating timings. A timing is a time segment on the audio timeline that contains text. Timings make it possible to synchronize audio and text with high accuracy. Timings can be generated in 3 ways:
- Automatically from an AudioClip using the whisper.unity package;
- From a subtitles file in .srt format;
- Manually in the Timings Editor.
Runtime part: to synchronize text with audio, you need to add one component and set up a few parameters.
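
To make the .srt option above more concrete, here is a minimal sketch of how subtitle cues map to timings (a start position, an end position, and text). It is not the asset's importer: SrtTiming and SrtParser are hypothetical names used only for illustration, and the sketch assumes well-formed cues with "HH:MM:SS,mmm" timestamps.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for a timing; not the asset's Timing class.
public sealed class SrtTiming
{
    public float StartPosition; // seconds
    public float EndPosition;   // seconds
    public string Text = "";
}

public static class SrtParser
{
    // Parses an "HH:MM:SS,mmm" timestamp into seconds.
    private static float ParseTime(string value)
    {
        var time = TimeSpan.ParseExact(value.Trim(), @"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);
        return (float)time.TotalSeconds;
    }

    public static List<SrtTiming> Parse(string srt)
    {
        var result = new List<SrtTiming>();
        // Cue blocks are separated by blank lines.
        var blocks = srt.Replace("\r\n", "\n")
            .Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var block in blocks)
        {
            var lines = block.Trim().Split('\n');
            if (lines.Length < 2) continue;
            // lines[0] is the cue number, lines[1] is "start --> end".
            var times = lines[1].Split(new[] { "-->" }, StringSplitOptions.None);
            if (times.Length != 2) continue;
            result.Add(new SrtTiming
            {
                StartPosition = ParseTime(times[0]),
                EndPosition = ParseTime(times[1]),
                // The remaining lines are the cue text; join them into one line.
                Text = string.Join(" ", lines, 2, lines.Length - 2).Trim()
            });
        }
        return result;
    }
}
```

Each cue becomes one timing, so the synchronization accuracy you get from an .srt file depends on how finely the subtitles were cut.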

Timings Editor

To synchronize text and audio, you need to create timings for an AudioClip, add the TextSynchronizer component, and set up its parameters.
To create timings, open the Timings Editor from «Window -> Audio Text Synchronizer -> Timings Editor», or create a PhraseAsset using the menu Assets/Audio Text Synchronizer/Audio Timings and open it with a double-click.

You can choose the AudioClip in the window using the AudioClip field.
After that, the timeline of the current AudioClip is displayed.
The red vertical line on the left of the timeline is the cursor, which shows the current playback position. Double-click on the timeline to move the cursor. The current audio time is shown at the bottom of the cursor. Let's look at the elements of the window, marked with pointers 1 to 5 in the «Timings Editor items» screenshot:

Pointer 1 contains the New, Load, and Save buttons, the Language popup, the Max Length field, and the Generate timings and Validate buttons:
- The New button creates new timings;
- The Load button loads existing timings or creates timings from a subtitles (.srt) file;
- The Save button saves the current timings to a file;
- The Language popup selects the AudioClip language, which makes timings generation more accurate;
- Max Length sets the maximum timing length; if Max Length is set to 0, the limit is disabled;
- The Generate timings button generates timings using the whisper.unity package;
- The Validate button validates the timings and prints the result of the validation to the Console tab.
Pointer 2 contains information about the timeline zoom and the auto-save feature. You can zoom the timeline using the mouse scroll wheel or a touchpad: scroll up to zoom in, scroll down to zoom out. When the «Auto Save» checkbox is on, changes to timings are saved automatically, provided the timings have already been saved to a file;
Pointer 3 contains the Add Timing, Remove Timing, Play/Pause, and Play Selected Timing buttons:

The Add Timing button adds a new timing at the current cursor position on the timeline;
The Remove Timing button removes the selected timings;
The Play/Pause button plays the AudioClip from the current cursor position on the timeline;
The Play Selected Timing button plays the segment of audio that contains the current timing.

Pointer 4 contains a text field, which should hold the full text of the AudioClip;
Pointer 5 contains the AudioClip timeline.

Creating timings manually

Let’s add timings to the timeline. To add a timing, click the Add Timing button, or right-click and choose «Insert new timing». The timing is added at the current cursor position on the timeline.

The new timing is added and selected. At the bottom of the window, you can find information about it: «Timing name», the «Start» and «End» positions, the «Duration», the «Color», and «Timing Text», the text that the timing contains.
Now let’s add text to the timings and move them to the correct positions. To change a timing's duration, drag its left and right borders.

Use the mouse scroll wheel to zoom the timeline in and out.

Hold the «Shift» key while dragging a timing's borders to move them faster.

Hold the «Alt» key while dragging a timing's borders to move them slower and with better precision.

For example, let's resize this timing so that it starts at 0.0052 sec and ends at 0.0842 sec, and add the text «A squirrel».

To make sure that it’s correct, press the «Play Selected Timing» button to play the AudioClip segment.

In the same way, add timings for the whole AudioClip timeline. For convenience, you can also choose a Color and Name for each timing. Setting the Color and Name is optional; they are displayed only in the Timings Editor. To select several timings and move them, drag a selection rectangle with the mouse.
Timings should also contain the main text, the whole text of the AudioClip, at the bottom of the window.
When you finish working with timings, save them to a file using the Save button. The Timings Editor will ask you to choose a filename and the path where the timings asset will be saved.
To load timings from a file, double-click the timings asset (PhraseAsset), or select the timings asset and press the Open in Timings Editor button.
To resize the timeline height, drag the slider.
Saved timings can be displayed in the Inspector window, where you can edit them instead of using the Timings Editor.
Timings can be split by words or characters; you can divide the audio into timings as necessary. For better synchronization accuracy, add an appropriate number of timings. It is recommended to add timings per phrase or per word, depending on the AudioClip.

Generating timings automatically

Timings can be generated automatically using the whisper.unity package. It provides high-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model, running on your local machine (no internet required).
It works with 90+ languages and requires Whisper model weights. The smallest model gives lower recognition quality but runs faster than the other models. Bigger models give better quality but run slower and use more memory. With «whisper.unity», you can also generate timings at runtime and recognize text from a microphone recording.
To install «whisper.unity», open «Window -> Audio Text Synchronizer -> Whisper Installer».
Press the Install button to install the «whisper.unity» package. It will download and install the latest compatible version from GitHub. The installation progress is displayed in the Console tab.

If you see this message in the Console: «No 'git' executable was found. Please install Git on your system and restart Unity.», install Git and restart your system.

You can also install «whisper.unity» manually from GitHub using the Package Manager. I recommend using the Whisper Installer, as it installs a version of «whisper.unity» that is compatible with «Audio-Text Synchronizer».
To uninstall «whisper.unity», open the Package Manager, select the «whisper.unity» package, and click the Remove button.
After installation, you need to download at least one Whisper model. Choose the model you want to download and press the Download button.

Downloaded models are located in the Assets/StreamingAssets/Whisper/ folder. You can choose and download the available models in WhisperSettings, which is located at Assets/AudioTextSynchronizer/Resources/WhisperSettings.asset.

After that, you can generate timings in the Timings Editor. Open the Timings Editor from «Window -> Audio Text Synchronizer -> Timings Editor», choose the AudioClip and, optionally, the Language, and click the Generate timings button.

Please note that the AudioClip Load Type should be set to "Decompress On Load".

That's all! Now you can save the timings using the Save button. Please note that speech recognition accuracy may vary with different model weights. To get the best speech recognition result, use the biggest model.
The asset also supports batch processing of AudioClips. Select the AudioClips in the Project tab and choose «Assets -> Create -> Audio Text Synchronizer -> Generate Timings» in the menu.

Please note that I added the ability to use «whisper.unity» in «Audio-Text Synchronizer», but I cannot guarantee flawless performance of the «whisper.unity» package. Using the «whisper.unity» package is optional.

Timings Validation

Timings Validation is a feature that checks whether the timings text corresponds to the main text of the audio clip. After validation, it reports any problems with the timings to the «Console» window. To use it, fill the timings with the corresponding text and press the Validate button in the top-left corner of the «Timings Editor» window.
In the example validation result, the message shows that the concatenation of all timings text doesn't match the main text, and that the second timing is shorter than expected. Now we can fix the text of the second timing and validate again to be sure that it is correct.
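
The idea behind this kind of check can be sketched in a few lines. The asset's actual validation rules are not documented here, so the following is only an illustration under stated assumptions: TimingsValidationSketch and FindFirstMismatch are hypothetical names, timing texts are assumed to be joined with single spaces, and whitespace differences are ignored.

```csharp
using System;
using System.Collections.Generic;

public static class TimingsValidationSketch
{
    // Collapses runs of whitespace so line breaks and double spaces do not cause false mismatches.
    private static string Normalize(string text) =>
        string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // Returns -1 if the timing texts, read in order, reproduce the main text.
    // Otherwise returns the index of the first timing whose text does not continue
    // the main text, or timingTexts.Count if main text is left over after the last timing.
    public static int FindFirstMismatch(string mainText, IReadOnlyList<string> timingTexts)
    {
        var main = Normalize(mainText);
        var position = 0;
        for (var i = 0; i < timingTexts.Count; i++)
        {
            var part = Normalize(timingTexts[i]);
            // Skip the single space that separates this part from the previous one.
            if (position < main.Length && main[position] == ' ') position++;
            if (string.CompareOrdinal(main, position, part, 0, part.Length) != 0) return i;
            position += part.Length;
        }
        return position == main.Length ? -1 : timingTexts.Count;
    }
}
```

Note that a timing with missing words is detected at the following timing, because that is the first one whose text no longer lines up with the main text.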

TextSynchronizer

TextSynchronizer is the main runtime script of the asset. The TextSynchronizer component gets the current playback position of the AudioSource, finds the appropriate timing, and sends the data (the current timing and its completion percentage) to the TextEffectBase class. It also requires a GameObject with a text component and works using System.Reflection, which allows the use of custom text components. So, you need to choose a GameObject, Component, and Property name from the dropdown lists, and the Text Effect. To get started, add the TextSynchronizer component to a GameObject.
The next step is to choose the GameObject, Component, and Property of the text component. If your scene doesn't have a text component, create a new GameObject and add a UI.Text or TextMesh Pro component.

Note that if you use RTL (right-to-left) text, you need to check "Enable RTL Editor" in the TextMeshPro component parameters.
Then set the AudioSource with the AudioClip, the Phrase Asset (timings), and add the Text Effect.

That's all, your TextSynchronizer component is ready. Note that the AudioSource must be playing for synchronization to work.
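
The lookup described at the start of this section (playback position, then matching timing, then progress through it) can be illustrated with a small standalone sketch. This is not the component's source code: TimingSpan and TimingLookup are hypothetical names, and the sketch assumes timings are sorted by start time and do not overlap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a timing: a time segment in seconds plus its text.
public readonly struct TimingSpan
{
    public readonly float Start;
    public readonly float End;
    public readonly string Text;

    public TimingSpan(float start, float end, string text)
    {
        Start = start;
        End = end;
        Text = text;
    }
}

public static class TimingLookup
{
    // Binary search for the timing that contains `time`; returns -1 if there is none
    // (for example, during a pause between two phrases).
    public static int FindIndex(IReadOnlyList<TimingSpan> timings, float time)
    {
        int low = 0, high = timings.Count - 1;
        while (low <= high)
        {
            var mid = low + (high - low) / 2;
            if (time < timings[mid].Start) high = mid - 1;
            else if (time >= timings[mid].End) low = mid + 1;
            else return mid;
        }
        return -1;
    }

    // Progress through a timing as a value from 0 to 1.
    public static float Progress(TimingSpan timing, float time)
    {
        var duration = timing.End - timing.Start;
        if (duration <= 0f) return 1f;
        return Math.Max(0f, Math.Min(1f, (time - timing.Start) / duration));
    }
}
```

In a Unity script, the time value would typically come from the AudioSource playback position, which is why synchronization only advances while the AudioSource is playing.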

Text Effects

A Text Effect is a ScriptableObject that contains the logic for modifying text components during synchronization.
The asset has two types of text effects: Rich and Mesh.
1. Rich Effect (Assets/AudioTextSynchronizer/TextEffects/Rich/RichTextEffect.asset) uses rich text markup for text highlighting. Rich text provides features such as changing the text color, making text bold, resizing, etc. Rich Effect parameters:

Text Split Config - reference to TextSplitter ScriptableObject;
Text Highlight Config - reference to TextHighlighter ScriptableObject;
Start Tag - start tag, which is inserted into the text component;
End Tag - tag, which closes the StartTag;
Highlight Color - the color of highlighting the rich tag;
Char Index Offset - highlight position offset in characters count;
Effect Finished Action - the action performed after the effect finishes synchronization: None, Clear Text, or Set Current Part Text.
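
To show how a start tag and an end tag can highlight text, here is a minimal sketch that wraps the already-spoken characters in a tag pair. It is an illustration only, not the RichTextEffect implementation: RichHighlightSketch and Highlight are hypothetical names, and the sketch assumes the highlight always begins at the first character of the text.

```csharp
using System;

public static class RichHighlightSketch
{
    // Wraps the first `highlightedChars` characters of `text` in startTag/endTag.
    public static string Highlight(string text, int highlightedChars, string startTag, string endTag)
    {
        // Clamp the count so out-of-range values never throw.
        var count = Math.Max(0, Math.Min(text.Length, highlightedChars));
        if (count == 0) return text;
        return startTag + text.Substring(0, count) + endTag + text.Substring(count);
    }
}
```

For example, Highlight("A squirrel ran", 10, "<color=#FF0000>", "</color>") produces "<color=#FF0000>A squirrel</color> ran", which a text component with rich text enabled renders with the first two words in red.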

2. Mesh Effect (Assets/AudioTextSynchronizer/TextEffects/Mesh/MeshTextEffect.asset) - the effect that uses text component mesh data - position, UV, color, etc. Mesh Effect parameters:

Text Split Config - reference to TextSplitter ScriptableObject;
Text Highlight Config - reference to TextHighlighter ScriptableObject;
Mesh Animation - a reference to the Mesh Animation config. A Mesh Animation config is a ScriptableObject that contains the mesh animation logic of the text component. The asset has the following Mesh Animation configs:
AlphaColorMeshAnimation - changes vertex colors and alpha at the specified speed. Parameters:
- Speed - speed of animation;
- Use Alpha - whether to use color alpha;
- Alpha Curve - AnimationCurve of changing alpha ratio from 0 to 1;
- Text Color - the color of vertices.
AlphaCenterMeshAnimation - changes vertex positions and alpha at the specified speed. Parameters:
- Speed - speed of animation;
- Scale Ratio - the ratio of vertices positions while scaling;
- Use Alpha - whether to use color alpha;
- Alpha Curve - AnimationCurve of changing alpha ratio from 0 to 1.
AlphaCharScaleMeshAnimation - scales vertex positions from the mesh center and changes alpha at the specified speed. Parameters:
- Speed - speed of animation;
- Use Alpha - whether to use color alpha;
- Alpha Curve - AnimationCurve of changing alpha ratio from 0 to 1;
- Vertices Curve - AnimationCurve of changing vertices positions ratio from 0 to 1.
Char Index Offset - current highlight position offset in characters count;
Animate Text Instantly - whether to animate the text part instantly. If unchecked, the text will be animated according to TextHighlighter config.
Text Part Finished Action - the action performed after a text part finishes: Clear Text or Set Current Part Text.

To create a custom effect, create a class that inherits from the «TextEffectBase» class and create a ScriptableObject asset from it. See the «RichTextEffect» and «MeshTextEffect» classes to better understand how it works.

Text Splitters

Text Split Config (Text Splitter) is a ScriptableObject that contains the logic for splitting text into parts. Using a Text Splitter, you can choose how to display text: all timings text, per timing, or split by strings.
The asset has three types of text splitters:

DefaultTextSplitter (Assets/AudioTextSynchronizer/TextSplitters/DefaultTextSplitter.asset) - shows all timings text (doesn't split the text).
StringTextSplitter (Assets/AudioTextSynchronizer/TextSplitters/StringTextSplitter.asset) - divides text using a list of strings. Parameters:
- Split Text By Strings - strings for text splitting. For example, if it contains an element with a dot (".") symbol, the text will be divided into sentences;
- Trim Characters - the characters that will be used for trimming text parts.
For example, using StringTextSplitter, you can display the text sentence by sentence.
TimingsTextSplitter (Assets/AudioTextSynchronizer/TextSplitters/TimingsTextSplitter.asset) - shows the current timing text.
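
As an illustration of what splitting by strings and trimming means, here is a minimal standalone sketch. It is not the StringTextSplitter source: StringSplitSketch is a hypothetical name, and keeping the separator at the end of each part is an assumption made so that sentences keep their final dot.

```csharp
using System;
using System.Collections.Generic;

public static class StringSplitSketch
{
    public static List<string> Split(string text, string[] separators, char[] trimCharacters)
    {
        var parts = new List<string>();
        var start = 0;
        while (start < text.Length)
        {
            // Find the earliest separator at or after `start`.
            var end = -1;
            var separatorLength = 0;
            foreach (var separator in separators)
            {
                if (string.IsNullOrEmpty(separator)) continue;
                var index = text.IndexOf(separator, start, StringComparison.Ordinal);
                if (index >= 0 && (end < 0 || index < end))
                {
                    end = index;
                    separatorLength = separator.Length;
                }
            }
            // Keep the separator with its part; the last part runs to the end of the text.
            var next = end < 0 ? text.Length : end + separatorLength;
            var part = text.Substring(start, next - start).Trim(trimCharacters);
            if (part.Length > 0) parts.Add(part);
            start = next;
        }
        return parts;
    }
}
```

With the separator "." and the trim character ' ', the text "A squirrel ran. It climbed a tree." is split into the two parts "A squirrel ran." and "It climbed a tree.".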

Text Highlighters

Text Highlight Config (Text Highlighter) is a ScriptableObject that contains the text highlighting logic. With a Text Highlighter, you can choose how to highlight text: per character, per timing, or split by strings.

The asset has the following highlight configs:
- DefaultHighlightConfig - highlights text per character;
- StringHighlightConfig - highlights text in parts separated by the configured characters;
- TimingHighlightConfig - highlights text per timing.

API Help

Timing

The Timing class contains timing data:
public float StartPosition - start position on the timeline, in seconds;
public float EndPosition - end position on the timeline, in seconds;
public string Name - the name of the timing, used by the Timings Editor;
public string Text - the text that the timing contains;
public Rect Rectangle - rectangle, used by the Timings Editor;
public Color Color - the color of the current rectangle, used by the Timings Editor;
public float Size - duration of the timing, in seconds.

PhraseAsset

The PhraseAsset class is a ScriptableObject that contains timings data:
public AudioClip Clip - AudioClip, which will be used to synchronize timings;
public string Text - Text of AudioClip;
public List<Timing> Timings - a list of Timings.

TextSynchronizer

TextSynchronizer component synchronizes text effect with AudioSource. Main public fields, properties, and methods:
public GameObject GameObjectWithTextComponent { ... } - GameObject, for which effect will be applied, should contain text component;
public Component TextComponent { ... } - text component, for which effect will be applied;
public AudioSource Source { ... } - AudioSource to synchronize text effect with audio playing;
public PhraseAsset Timings { ... } - reference to the asset with timings;
public TextEffectBase TextEffect { ... } - reference to the TextEffect asset, which will be applied to the text component;
public bool IsRunning { ... } - whether synchronization is active;
public event Action OnSyncFinished - event invoked after the synchronization is finished;
public event Action<string, int> OnWordReached - event invoked when a word is reached during synchronization; the second argument is the index (position) of the word in the text;
public void Play(bool initializeEffect = false) - runs synchronization; the optional argument sets whether to initialize the effect;
public void Pause() - pauses synchronization;
public void Stop(bool resetCurrentTiming = true, bool initializeEffect = false) - stops synchronization; the first argument sets whether to reset the current timing, the second sets whether to initialize the effect;
public void SkipPhrase() - skips the current timing/text part;
public void SplitWords() - gets all words from the phrase asset's main text;
public int GetCharProgress() - returns the index in the text before the current timing text;
public int GetCharProgress(char[] trim) - returns the index in the text before the trimmed current timing text.

TextEffectBase

TextEffectBase is the base class of text effects; it interacts with the text component while the AudioSource is playing. Using this base class, you can create new text effects.
Main public fields, properties, and methods:
public event Action<Timing> OnTimingEnter - invoked when audio is playing and TextSynchronizer sets a new timing (the previous timing has finished);
public event Action<Timing> OnTimingStart - invoked when audio is playing and the new timing starts playing;
public event Action<Timing, float> OnTimingProgress - invoked when audio is playing and the timing is in sync; the second argument is the progress of the current timing from 0 to 1;
public event Action<Timing> OnTimingEnd - invoked when audio playback of the current timing is finished;
protected TextSynchronizer TextSynchronizer - reference to the TextSynchronizer component;
public virtual void Init(TextSynchronizer textSynchronizer) - initializes the effect; the argument is the TextSynchronizer instance;
public virtual void OnTimingEntered(Timing timing) - invoked when audio is playing and TextSynchronizer sets a new timing (the previous timing has finished);
public virtual void OnTimingStarted(Timing timing) - invoked when audio is playing and the new timing starts playing;
public virtual void OnTimingMoving(Timing timing, float progress) - invoked when audio is playing and the timing is in sync; the second argument is the progress of the current timing from 0 to 1;
public virtual void OnTimingFinished(Timing timing) - invoked when audio playback of the current timing is finished;
public virtual void OnEffectFinished() - invoked when the effect is finished;
protected virtual void SetTextToComponent(string text) - sets text to the text component;
protected virtual void SkipPart() - skips the current timing/text part.

MeshTextEffect

MeshTextEffect - the effect that uses text component mesh data - position, UV, color, etc. It is used to change the mesh of UI.Text and TextMeshPro components;
Main public fields, properties, and methods:
public MeshAnimationBase MeshAnimation - reference to the MeshAnimation;
public int CharIndexOffset - highlight position offset in characters count;
public bool AnimateTextInstantly - whether to animate the text part instantly. If unchecked, the text will be animated according to the TextHighlighter config;
public OnTextPartAction TextPartFinishedAction - an action that will be performed after text part finish: Clear Text or Set Current Part Text.

RichTextEffect

TextRichEffectBase class sends data for effects while AudioSource is playing. It is used for rich text effects, which are supported by many assets with custom text components;
Main public fields, properties, and methods:
public string StartTag - the tag that is inserted into the text component;
public string EndTag - tag, which closes the StartTag;
public Color32 HighlightColor - highlighting color;
public int CharIndexOffset - highlight position offset in characters count;
public OnTextPartAction TextPartFinishedAction - an action that will be performed after text part finish: Clear Text or Set Current Part Text;
public OnTextAction EffectFinishedAction - an action that will be performed after effect finish sync: None, Clear Text, or Set Current Part Text.

Frequently used methods

Pause synchronization:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.Pause();

Play/Resume synchronization:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.Play();

Stop synchronization:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.Stop();

Change timings in runtime:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    [SerializeField] private PhraseAsset timings; // assign the PhraseAsset asset in the Inspector
    ...
    textSynchronizer.Timings = timings;

Subscribe to the synchronization complete event:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.OnSyncFinished += OnSyncFinished;
    ...
    private void OnSyncFinished()
    {
        Debug.Log("Synchronization finished!");
        textSynchronizer.OnSyncFinished -= OnSyncFinished;
    }

Subscribe to the synchronization word reached event:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.OnWordReached += OnWordReached;
    ...
    private void OnWordReached(string word, int position)
    {
        Debug.Log($"Reached word: {word} at position {position}");
        if (word == "SomeWord")
        {
            Debug.Log("SomeWord reached!");
            textSynchronizer.OnWordReached -= OnWordReached;
        }
    }

Subscribe to the text effect progress event:

    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    ...
    textSynchronizer.TextEffect.OnTimingProgress += OnTimingProgress;
    ...
    private void OnTimingProgress(Timing timing, float progress)
    {
        Debug.Log($"Timing text: {timing.Text}, progress: {progress}");
    }

Generating timings using «whisper.unity» in runtime:

    #if WHISPER_UNITY
    using AudioTextSynchronizer.Whisper;
    using Whisper.Utils;
    #endif
    ...
    [SerializeField] private TextSynchronizer textSynchronizer; // assign the TextSynchronizer component in the Inspector
    [SerializeField] private AudioClip audioClip; // assign the AudioClip asset in the Inspector
    #if WHISPER_UNITY
    private WhisperHelper whisper;
    #endif
    ...
    // Loads the Whisper model once before any timings are generated
    private async void Init()
    {
    #if WHISPER_UNITY
        whisper = new WhisperHelper();
        await whisper.LoadModel();
    #endif
    }
    ...
    GenerateFromClip(audioClip);
    ...
    // Generates timings for the clip and hands them to the synchronizer
    private async void GenerateFromClip(AudioClip clip)
    {
    #if WHISPER_UNITY
        var phraseAsset = await whisper.GenerateTimings(clip);
        textSynchronizer.Stop();
        textSynchronizer.Timings = phraseAsset;
        textSynchronizer.Source.clip = phraseAsset.Clip;
    #endif
    }

Upgrading from version 1.x to 2.x

Starting from version 2.x, Text Effects were rewritten from scratch and aren't compatible with effects from version 1.x. If you update the asset in a project that already has a 1.x version, please follow these steps to make the asset work properly:

Make sure that you back up your project;
Remove the Assets/Audio-Text Synchronizer/ folder from your project;
Re-import the latest version from the Asset Store using the Package Manager (Window -> Package Manager);
Set up the TextSynchronizer component. Refer to the TextSynchronizer section for a more detailed explanation of how to do that.

Contacts

Please let me know if you have any questions, ideas, or suggestions.
If you're asking for support, please send your invoice number. The more details you give, the better I'll be able to help.
E-mail: unitymedved@gmail.com