Ticket #52 (new defect)

Opened 13 years ago

Last modified 12 years ago

Update How-to and Tutorial dictionary to use CMU dictionary

Reported by: kmaclean Owned by: kmaclean
Priority: major Milestone: WebSite 0.3
Component: Acoustic Model Version: 0.1-alpha
Keywords: Cc:

Description

The QuickStart and nightly AM builds use the CMU dictionary, whereas the How-to and Tutorial use the smaller Switchboard dictionary, which has slightly different pronunciations from the CMU dictionary.

Change History

comment:1 Changed 13 years ago by kmaclean

Link to where having different dictionaries is causing problems: Adapting Acoustic Models to your voice

comment:2 Changed 13 years ago by kmaclean

Will need to update Julius grammars to use CMU spelling

comment:3 Changed 13 years ago by kmaclean

  • Priority changed from minor to major

comment:4 Changed 12 years ago by kmaclean

  • Milestone changed from Unassigned to WebSite 0.3

comment:5 Changed 12 years ago by kmaclean

The pronunciation dictionary used in the Tutorial and How-to is based on the ISIP Switchboard corpus (around 27,500 words), whereas the one used by the QuickStart and nightly AM builds is based on version 0.6 of the CMU Pronunciation Dictionary (around 130,000 words). Unfortunately, the Switchboard and CMU pronunciation dictionaries use slightly different phoneme syntax, which is enough to make them incompatible from a Grammar and Acoustic Model testing perspective.

When testing an AM using the VoxForge Testing Tutorial (Step 2 - Create Test Prompts), this difference in pronunciation dictionaries may cause the following error if the user does not select the right voca file, as set out in the instructions:

--------------------------------

###### check configurations
###### initialize input device
###### build up system
Reading in HMM definition...(ascii)...limit check passed
   defined HMMs:    50
  logical names:   506 in HMMList
    base phones:    44 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...369 added as logical...done
Reading in dictionary...
line 18: triphone "*-z+ih" or biphone "z+ih" not found
line 18: triphone "z-ih+r" not found
line 18: triphone "ih-r+ow" not found
> 6     [ZERO]  z ih r ow
error in reading sample.dict: 1 words failed out of 18 words
ERROR: failed to read dictionary, terminated
Terminated

This happens because ZERO is defined differently in the two files:

z iy r ow  (in voxforge_lexicon)

z ih r ow  (in my original sample.voca) 
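
A mismatch like this one can be caught before running Julius by comparing the pronunciations in the voca file against the lexicon. The following is only a rough sketch: the function names parse_pronunciations and find_mismatches are hypothetical, and it assumes a simplified line format of a word followed by its phonemes, with any bracketed output token such as [ZERO] and any line starting with % or # ignored. Real lexicon and voca files may need extra handling.

```python
def parse_pronunciations(text):
    """Parse 'WORD [OUTPUT] ph1 ph2 ...' lines into {word: set of phone tuples}.

    Assumed, simplified format: the first token is the word, bracketed
    tokens are skipped, and lines starting with '%' or '#' are ignored.
    """
    entries = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith(("%", "#")):
            continue
        word = tokens[0].upper()
        phones = tuple(t for t in tokens[1:] if not t.startswith("["))
        if phones:
            entries.setdefault(word, set()).add(phones)
    return entries


def find_mismatches(voca, lexicon):
    """Return (word, voca_pron, lexicon_prons) for each voca pronunciation
    of a shared word that does not appear in the lexicon."""
    mismatches = []
    for word in sorted(voca):
        if word not in lexicon:
            continue  # words missing from the lexicon are a separate problem
        for pron in sorted(voca[word]):
            if pron not in lexicon[word]:
                mismatches.append((word, pron, sorted(lexicon[word])))
    return mismatches


if __name__ == "__main__":
    # Sample data taken from the ZERO example in this ticket.
    lexicon = parse_pronunciations("ZERO [ZERO] z iy r ow")
    voca = parse_pronunciations("ZERO z ih r ow")
    for word, pron, expected in find_mismatches(voca, lexicon):
        options = ", ".join(" ".join(p) for p in expected)
        print(f"{word}: voca has '{' '.join(pron)}', lexicon has '{options}'")
```

Any word reported here uses a phoneme sequence that the lexicon behind the acoustic model does not list for that word. That makes it a likely cause of "triphone not found" errors like the ones above.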