Separate LMs were built for different sources and interpolated to form a single model. For a full description and results see the Rich Transcription workshop presentation. For the Hub 1 evaluation a number of other features were used, including maximum likelihood linear regression (MLLR) adaptation and the use of quinphone models in a lattice-rescoring pass using a 65k 4-gram language model.
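Interpolating source-specific LMs into a single model is, at its core, a weighted mixture of probability estimates. The sketch below is a minimal illustration of that idea on toy unigram tables, not the actual CU-HTK language-modelling tools; the model contents and weights are invented for the example.

```python
def interpolate_lms(lms, weights):
    """Linearly interpolate several language models.

    lms: list of dicts mapping word -> probability (a unigram sketch;
         real systems interpolate n-gram conditional probabilities).
    weights: one mixture weight per source LM, summing to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = set().union(*lms)
    return {w: sum(lam * lm.get(w, 0.0) for lam, lm in zip(weights, lms))
            for w in vocab}

# Hypothetical source-specific models (e.g. broadcast-news text vs.
# conversational transcripts) with made-up probabilities.
bn = {"the": 0.6, "news": 0.4}
cts = {"the": 0.5, "yeah": 0.5}
mixed = interpolate_lms([bn, cts], [0.7, 0.3])
```

In practice the weights would be tuned to minimise perplexity on held-out data from the target domain.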
The conversational speech evaluation (Hub 5) required the transcription of telephone conversations.
Groups of clustered segments were then used for MLLR adaptation, and word lattices were generated with a 4-gram language model interpolated with a class-based trigram, with triphone HMMs trained on 70 hours of broadcast news data. The broadcast news evaluation (Hub 4) transcribed pre-segmented and labelled portions of broadcast news audio. A less than 10xRT CTS system was developed which employed 2-way system combination and lattice-based adaptation. However, from the sections below it can be seen that there are many other features that have been incorporated into the CUED HTK systems.
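MLLR adaptation estimates an affine transform of the Gaussian means, shared across all Gaussians in a regression class, so that a small amount of adaptation data can shift a large model towards a speaker or segment cluster. The fragment below shows only the application of such a transform (mu' = A mu + b) in pure Python; the transform values are invented, and estimating A and b from adaptation statistics is the part HTK's adaptation tools actually do.

```python
def mllr_adapt_means(means, A, b):
    """Apply a shared MLLR affine transform mu' = A @ mu + b to every
    Gaussian mean vector in a regression class."""
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]
    return [[m + bi for m, bi in zip(matvec(A, mu), b)] for mu in means]

# Toy 2-D example: an identity rotation plus a bias shifts all means equally.
means = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, -0.5]
adapted = mllr_adapt_means(means, A, b)
```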
The core training used an extended version of 1.
Speech was first segmented using Gaussian mixture models and a phone recogniser. In future we hope to make many of these available in released versions of HTK. The unlimited-compute conversational telephone speech (CTS) evaluation, previously known as Switchboard or Hub 5, was similar in structure to the earlier system, but utilised improved acoustic and language models and performed automatic segmentation of the audio data.
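GMM-based segmentation works by comparing the likelihood of each frame under competing models (e.g. speech vs. non-speech) and labelling accordingly. The following is a deliberately reduced sketch: single 1-D Gaussians stand in for the full mixture models, and the frame values and model parameters are invented for illustration.

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def segment(frames, speech=(0.8, 0.1), nonspeech=(0.1, 0.1)):
    """Label each frame 'S' (speech) or 'N' (non-speech) by whichever of
    two single-Gaussian models scores it higher -- a 1-D stand-in for the
    GMM classifier used in real segmenters."""
    labels = []
    for x in frames:
        ls = gauss_loglik(x, *speech)
        ln = gauss_loglik(x, *nonspeech)
        labels.append('S' if ls > ln else 'N')
    return labels

# Hypothetical per-frame energy-like features.
energies = [0.05, 0.9, 0.85, 0.0, 0.7]
labels = segment(energies)
```

A real segmenter would additionally smooth these frame decisions (e.g. with minimum-duration constraints) before cutting the audio into segments.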
A faster version of the full system that ran in less than 10 times realtime was developed.
The system developed for the Switchboard part of the April Rich Transcription evaluation used acoustic models trained using Minimum Phone Error (MPE) training.
Again combined triphone and quinphone rescoring passes were used.
The front-end used was PLP with a mel-spectra based filterbank. The broadcast news evaluation (Hub 4) was an evolution of the earlier system.
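The mel-spaced filterbank underlying a PLP (or MFCC) front-end places triangular filters at centre frequencies that are equally spaced on the perceptual mel scale rather than in Hz. This sketch computes only those centre frequencies using the standard mel formula; it is an illustration of the spacing, not HTK's actual front-end code, and the filter count and frequency range are arbitrary example values.

```python
import math

def hz_to_mel(f):
    """Standard mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centres(n_filters, low_hz, high_hz):
    """Centre frequencies of a mel-spaced triangular filterbank:
    equally spaced on the mel scale, then mapped back to Hz."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Example: 24 filters over 0-8 kHz (typical for 16 kHz audio).
centres = mel_centres(24, 0.0, 8000.0)
```

The effect is that filters are packed densely at low frequencies and spread out at high frequencies, mimicking the ear's frequency resolution.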
The Hub 3 evaluation focussed on large vocabulary transcription of clean and noisy speech. Models for noisy environments were trained using single-pass retraining (present in V2).
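The idea behind single-pass retraining is to keep the state-level frame alignment obtained with existing models on the original data, but accumulate the Gaussian statistics from time-aligned parallel data (here, a noise-corrupted version of the same utterances). The sketch below shows that accumulation for means only, with scalar observations and invented toy numbers; HTK's implementation works on full feature vectors and variances as well.

```python
def single_pass_retrain(gammas, parallel_frames):
    """Single-pass retraining sketch: state occupation probabilities
    (gammas) come from the old models on the original data, while the
    mean statistics are accumulated from time-aligned parallel data.

    gammas: list over frames of {state: occupancy} dicts.
    parallel_frames: list of scalar observations, same length as gammas.
    """
    num, den = {}, {}
    for g, x in zip(gammas, parallel_frames):
        for state, occ in g.items():
            num[state] = num.get(state, 0.0) + occ * x
            den[state] = den.get(state, 0.0) + occ
    return {s: num[s] / den[s] for s in num}

# Toy alignment over 3 frames and the corresponding noisy observations.
gammas = [{'a': 1.0}, {'a': 0.5, 'b': 0.5}, {'b': 1.0}]
noisy = [2.0, 4.0, 6.0]
means = single_pass_retrain(gammas, noisy)
```

This avoids re-estimating the alignment on noisy data, where it would be far less reliable.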
It used MLLR before lattice generation and then rescored the lattices with adapted quinphone models. The major tool currently lacking from the distributed HTK releases to reproduce these systems is a capable large vocabulary decoder supporting trigrams and 4-grams, and cross-word triphones and quinphones. This section gives a brief overview of the features of these systems and how they relate to the features present in released versions of HTK.
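Lattice rescoring exploits the fact that a lattice is a compact set of candidate hypotheses: a more expensive model (here, adapted quinphones) only has to re-score those candidates rather than search the full space. The sketch below rescores an explicit list of lattice paths, which is a simplification -- real rescoring works edge-by-edge on the lattice graph -- and all paths and scores are invented for the example.

```python
def rescore_lattice(paths, new_acoustic, lm_scale=1.0):
    """Rescore candidate paths with a new (e.g. adapted) acoustic score
    while keeping the language-model score fixed, then pick the best.

    paths: list of (word_tuple, lm_log_score) pairs from first-pass lattices.
    new_acoustic: dict mapping word_tuple -> new acoustic log score.
    """
    best = max(paths, key=lambda p: new_acoustic[p[0]] + lm_scale * p[1])
    return best[0]

# Two hypothetical competing paths with made-up log scores.
paths = [(("hello", "world"), -2.0), (("hello", "word"), -1.5)]
acoustic = {("hello", "world"): -3.0, ("hello", "word"): -4.5}
best = rescore_lattice(paths, acoustic)
```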
The Hub 5 March evaluation data included data recorded over conventional telephone lines as well as data from calls over cellular channels. Both triphone and quinphone HMMs were trained on hours of data and used in a multi-stage recognition process, first generating lattices with MLLR-adapted triphones and then rescoring these with adapted quinphones.
Each of these systems described below has represented the state-of-the-art when it was produced (either the lowest error rate in the evaluation, or not a statistically significant difference from the lowest-error-rate system).
The full system included cluster-based variance normalisation, vocal-tract length normalisation (VTLN) and full-variance transforms; none of these are included in released versions of HTK up to 3.
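Variance normalisation, in its simplest per-cluster form, shifts each cepstral coefficient stream to zero mean and scales it to unit variance over the frames of a cluster, reducing channel and speaker variability. This 1-D sketch with invented frame values illustrates that operation; the cluster-based variant mentioned above applies it per segment cluster rather than globally.

```python
import math

def variance_normalise(frames):
    """Mean/variance normalisation of one coefficient stream: zero mean,
    unit variance over the given frames (a 1-D sketch of cluster-based
    cepstral mean/variance normalisation)."""
    n = len(frames)
    mean = sum(frames) / n
    var = sum((x - mean) ** 2 for x in frames) / n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in frames]

# Hypothetical coefficient values for one cluster.
norm = variance_normalise([1.0, 2.0, 3.0, 4.0])
```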