Dynamic Time Warping’s New Youth
recorded by: IEEE ICME
published: Sept. 18, 2012, recorded: July 2012, views: 5444
Description
Before Hidden Markov Models (HMMs) became ubiquitous in speech-related applications, pattern-matching algorithms such as the well-known Dynamic Time Warping (DTW) algorithm [1] were used extensively for tasks such as spoken keyword recognition [2]. The main drawbacks of this technology were its computational cost, given the machinery available at the time, and its poor generalization when matching acoustic sequences from different speakers or different acoustic contexts. As labeled training datasets became available, pattern-matching techniques were pushed aside in favor of HMMs. Still, HMMs have several well-known weaknesses that limit their suitability for some speech applications: overgeneralization given the training data, lack of robustness to changing noise conditions, and the need for large corpora of well-labeled training data. For this reason, some research groups have recently started to look again at DTW as a plausible alternative and have worked on mitigating the issues that made it unsuitable in the past. On the one hand, new acoustic features are being researched [3] to make the matching as independent of the speaker as possible while preserving the spoken content. On the other hand, although computing power has improved greatly since the 1970s, several enhancements to DTW have been proposed [4,5] to allow for more challenging tasks than in the past. Pattern-matching approaches, and DTW in particular, are currently applied to tasks such as automatic discovery of repeated patterns in speech, query-by-example voice search, pattern-based speech recognition, and analysis of low-resource languages.
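For readers unfamiliar with the algorithm, the following is a minimal sketch of textbook DTW, not code from the talk. It assumes the simplest step pattern (insertion, deletion, match, all unweighted) rather than the slope constraints of [1] or the enhanced variants of [4,5]. The function name `dtw_distance` and the default scalar distance are illustrative choices. The nested loop makes the quadratic cost mentioned above visible: every frame of one sequence is compared with every frame of the other.

```python
import math


def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Accumulated DTW cost between sequences x and y (illustrative sketch)."""
    n, m = len(x), len(y)
    # cost[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(x[i - 1], y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats a frame
                cost[i][j - 1],      # y advances, x repeats a frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# A time-stretched copy of the same pattern aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Scalars are used here only to keep the sketch short. For speech, each element would typically be a feature vector for one frame, and `dist` would be a vector distance chosen to suit those features.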
References:
[1] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 43–49, 1978.
[2] A. L. Higgins and R. E. Wohlford, “Keyword recognition using template concatenation,” in Proc. ICASSP, 1985.
[3] G. Aradilla, “Using Posterior-Based Features in Template Matching for Speech Recognition,” in Proc. ICSLP, 2006.
[4] X. Anguera, R. Macrae, and N. Oliver, “Partial Sequence Matching using an Unbounded Dynamic Time Warping Algorithm,” in Proc. ICASSP, 2010.
[5] A. Jansen and B. Van Durme, “Efficient Spoken Term Discovery Using Randomized Algorithms,” in Proc. ASRU, 2011.