We examine the problem of automatically aligning acoustic musical audio with textual lyrics in popular songs. Existing work has tackled the problem using computationally expensive audio processing techniques, resulting in solutions unsuitable for real-time applications. In contrast, our work uses only lightweight signal processing and is capable of real-time alignment.
We investigate repetition-based techniques and alignment algorithms to obtain a baseline alignment. A key extension of our work is to derive and use additional segmentation knowledge from both modalities, which significantly improves alignment performance by 34.85% and 8.18% in start-time and duration errors, respectively. We conclude by proposing a new repetition-based framework for lyric alignment together with a modular system design, in which each module is independent and can be extended to improve overall performance.