The CMU Dictionary has words with more than one pronunciation; for example:
ABABA [ABABA] ax b aa b ax
ABABA(2) [ABABA(2)] aa b ax b ax
ZERO [ZERO] z ih r ow
ZERO'S [ZERO'S] z ih r ow z
ZERO'S(2) [ZERO'S(2)] z iy r ow z
ZERO(2) [ZERO(2)] z iy r ow
The word in square brackets is what the recognizer returns, and this causes problems with test scores because recognition may return the alternate pronunciation.
For example, if the speech recognition engine recognizes zero(2), it returns 'zero(2)', but the MLF has the word 'zero'. HTK's HResults does not recognize that they should be the same, so it marks the word as misrecognized and lowers the recognition score accordingly.
The dictionary needs to be updated so that words with more than one pronunciation return the correct word, i.e. the dictionary should look like this:
ABABA [ABABA] ax b aa b ax
ABABA(2) [ABABA] aa b ax b ax
ZERO [ZERO] z ih r ow
ZERO'S [ZERO'S] z ih r ow z
ZERO'S(2) [ZERO'S] z iy r ow z
ZERO(2) [ZERO] z iy r ow
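
The update described above could be scripted rather than done by hand. The following is a minimal sketch, assuming each entry has the layout shown here (head word, bracketed output symbol, then phones) and that alternate pronunciations are always marked with a numeric suffix such as (2). The function names fix_output_symbol and fix_dictionary are hypothetical and not part of HTK.

```python
import re

# Matches a bracketed output symbol that ends in a variant marker,
# e.g. [ZERO'S(2)]. Assumption: variants are always written as (N), N numeric.
_VARIANT_OUTSYM = re.compile(r"\[([^\[\]]+?)\(\d+\)\]")


def fix_output_symbol(line: str) -> str:
    """Strip the (N) suffix from the bracketed output symbol only.

    The head word (first column) keeps its suffix, so each entry
    still has a distinct head word.
    """
    return _VARIANT_OUTSYM.sub(r"[\1]", line, count=1)


def fix_dictionary(lines):
    """Apply fix_output_symbol to every dictionary line."""
    return [fix_output_symbol(line) for line in lines]


if __name__ == "__main__":
    sample = [
        "ZERO [ZERO] z ih r ow",
        "ZERO'S(2) [ZERO'S(2)] z iy r ow z",
        "ZERO(2) [ZERO(2)] z iy r ow",
    ]
    for entry in fix_dictionary(sample):
        print(entry)
```

Only the text inside the square brackets is rewritten; entries without a variant suffix pass through unchanged.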