Ticket #57 (new defect)

Opened 14 years ago

Last modified 13 years ago

Numbered words with different spellings causing interpretation problems

Reported by: kmaclean Owned by: kmaclean
Priority: minor Milestone: Acoustic Model 1.0
Component: Speech Rec Engine Version: 0.1-alpha
Keywords: Cc:

Description

The CMU dictionary has words with more than one pronunciation; for example:

ABABA           [ABABA]         ax b aa b ax
ABABA(2)        [ABABA(2)]      aa b ax b ax
ZERO            [ZERO]          z ih r ow
ZERO'S          [ZERO'S]        z ih r ow z
ZERO'S(2)       [ZERO'S(2)]     z iy r ow z
ZERO(2)         [ZERO(2)]       z iy r ow

The word in the square brackets is what the recognizer returns, and this causes problems with test scores, since recognition may return the alternate pronunciation.

For example, if the speech rec engine recognizes zero(2), it returns 'zero(2)', but the MLF has the word 'zero'. HTK's HResults does not recognize that the two should be treated as the same word, so it marks the result as a misrecognized word and lowers the recognition score accordingly.

The dictionary needs to be updated so that words with more than one pronunciation return the correct word, i.e. the dictionary should look like this:

ABABA           [ABABA]         ax b aa b ax
ABABA(2)        [ABABA]         aa b ax b ax
ZERO            [ZERO]          z ih r ow
ZERO'S          [ZERO'S]        z ih r ow z
ZERO'S(2)       [ZERO'S]        z iy r ow z
ZERO(2)         [ZERO]          z iy r ow
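
One possible way to make this change in bulk is sketched below. It assumes every entry follows the three-column layout shown in this ticket (word, bracketed output symbol, phones) and that alternate pronunciations are always marked with a trailing "(N)". The function names are hypothetical and not part of HTK or of any existing project script.

```python
import re

# Assumed entry layout, as in the ticket: WORD  [OUTPUT]  phone phone ...
ENTRY = re.compile(r"^(\S+)\s+\[(.*?)\]\s+(.*)$")
# Trailing variant marker such as "(2)" on an alternate pronunciation.
VARIANT = re.compile(r"\(\d+\)$")


def normalize_output_symbol(line):
    """Strip a trailing "(N)" from the bracketed output symbol only.

    The headword (first column) keeps its "(N)" marker, so the alternate
    pronunciation is still a separate entry in the dictionary.
    """
    match = ENTRY.match(line)
    if match is None:
        return line  # not an entry in the assumed layout; leave untouched
    word, output, phones = match.groups()
    output = VARIANT.sub("", output)
    bracketed = "[" + output + "]"
    # Pad to 15 characters plus one space so the columns stay separated
    # even when a word is longer than the column width.
    return f"{word:<15} {bracketed:<15} {phones}"


def normalize_dictionary(lines):
    """Apply normalize_output_symbol to every line of a dictionary."""
    return [normalize_output_symbol(line.rstrip("\n")) for line in lines]


if __name__ == "__main__":
    sample = [
        "ZERO            [ZERO]          z ih r ow",
        "ZERO(2)         [ZERO(2)]       z iy r ow",
    ]
    for entry in normalize_dictionary(sample):
        print(entry)
```

With this change both ZERO and ZERO(2) return 'ZERO', so the recognizer output would match the word in the MLF regardless of which pronunciation was recognized.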

Change History

comment:1 Changed 13 years ago by kmaclean

  • Milestone changed from Unassigned to Acoustic Model 1.0