Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks

Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1306 Scopus citations

Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
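The loss described in this abstract is now widely available in deep learning libraries. Below is a minimal sketch of CTC training using PyTorch's nn.CTCLoss as a modern implementation of that loss; all shapes, layer sizes, and hyperparameters are illustrative assumptions, not the authors' original setup.

```python
# Hypothetical CTC training sketch using PyTorch's nn.CTCLoss.
# Feature dimensions, label counts, and network sizes are assumptions
# for illustration, not the paper's original configuration.
import torch
import torch.nn as nn

num_classes = 40   # e.g. phoneme labels; index 0 is reserved for CTC's "blank"
feat_dim = 26      # acoustic feature dimension per frame (assumed)

# A bidirectional RNN maps each input frame to scores over labels + blank.
rnn = nn.LSTM(feat_dim, 128, bidirectional=True, batch_first=False)
proj = nn.Linear(2 * 128, num_classes + 1)
ctc_loss = nn.CTCLoss(blank=0)

# Fake batch: T = 100 frames, N = 4 utterances, unsegmented label sequences.
T, N = 100, 4
x = torch.randn(T, N, feat_dim)
input_lengths = torch.full((N,), T, dtype=torch.long)
targets = torch.randint(1, num_classes + 1, (N, 20), dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

h, _ = rnn(x)                            # (T, N, 2 * 128)
log_probs = proj(h).log_softmax(dim=-1)  # (T, N, num_classes + 1)

# CTC marginalizes over all alignments of the target labels to the input
# frames, so no pre-segmentation of the training data is needed.
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```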
Original language: English (US)
Title of host publication: ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
Pages: 369-376
Number of pages: 8
State: Published - Oct 6 2006
Externally published: Yes
