voxforge.org
VoxForge Dev

Ticket #2 (new defect)

Opened 1 year ago

Dutch model for sphinx3 from IFA corpus

Reported by: kmaclean Assigned to: somebody
Priority: major Milestone:
Component: component1 Version:
Keywords: Cc:

Description

See this post by nsh (posted a copy here to ensure that the information doesn't get lost, Ken):

Hi, I've just made a Dutch model for Sphinx3 from the IFA corpus. A Sphinx2 or PocketSphinx model can be made too, but I haven't had time yet. The helper files and the model itself can be downloaded from: 
 
http://www.mediafire.com/download.php?b2juwvounye 
 
A few issues still exist: 
 
1. We need test data, in particular a language model. To create one I need a lot of Dutch text. 
 
2. I stripped around 80% of the database because of 5000 OOV (out-of-vocabulary) words; CELEX seems to be missing a lot of important data. This has to be fixed. 
 
3. There are still some bad transcriptions; Sphinx reports them as ERRORs. 
 
4. It would be nice to use the hand-made segmentation as well; that would greatly improve WER (word error rate).

It would also be nice to commit the model to the Dutch part of VoxForge.
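Item 2 of the post describes losing most of the training data to OOV words. The minimal Python sketch below assumes that an utterance is dropped whenever any word in its transcript has no dictionary entry, which is presumably how a few thousand OOV words could remove around 80% of the data. It shows that filtering step and ranks the missing words by frequency. The function name `filter_oov`, the toy lexicon and the sample sentences are hypothetical. They are not taken from nsh's helper files, CELEX or the IFA corpus.

```python
from collections import Counter


def filter_oov(transcripts, lexicon):
    """Split utterances by whether every transcript word is in the lexicon.

    transcripts: dict mapping a (hypothetical) utterance id to its transcript text.
    lexicon: set of lowercase words that have a pronunciation entry
             (assumed to be loaded from a CELEX-derived dictionary).
    Returns (kept ids, dropped ids, Counter of OOV words by frequency).
    """
    kept, dropped = [], []
    oov_counts = Counter()
    for utt_id, text in transcripts.items():
        words = text.lower().split()
        missing = [w for w in words if w not in lexicon]
        if missing:
            # A single OOV word is enough to discard the whole utterance.
            dropped.append(utt_id)
            oov_counts.update(missing)
        else:
            kept.append(utt_id)
    return kept, dropped, oov_counts


# Toy example with made-up Dutch sentences and a tiny hypothetical lexicon.
lexicon = {"de", "kat", "zit", "op", "mat", "hond"}
transcripts = {
    "utt1": "de kat zit op de mat",
    "utt2": "de hond blaft",
    "utt3": "de kat blaft",
}
kept, dropped, oov = filter_oov(transcripts, lexicon)
print(kept)               # ['utt1']
print(dropped)            # ['utt2', 'utt3']
print(oov.most_common())  # [('blaft', 2)]
```

Ranking the OOV words by frequency suggests which dictionary entries to add first. Adding the most frequent missing words is likely to recover the most utterances for the least effort.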