I am running the same acoustic model under Julian 3.5 and 3.5.1. Everything seems to work OK with 3.5, but I get no recognition at all with 3.5.1. Looking at the console output, Julian 3.5.1 doesn't seem to be picking up the end silence tag </s>, and I am not sure why.
Here is part of the console output for recognition of the phrase "call Steve" under 3.5.1:
$ /usr/local/julius/julius-3.5.1-linuxbin/bin/julian-3.5.1-std -input mic -C julian.jconf
...
### read waveform input
pass1_best: <s> DIAL
pass1_best_wordseq: 0 3
pass1_best_phonemeseq: sil | d ay l
pass1_best_score: -102358.328125
length: 593 frames (1.97 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=593
stack empty, search terminate now
0 sentences have found
got no candidates, output 1st pass result as a final result
sentence1: <s> DIAL
wseq1: 0 3
phseq1: sil | d ay l
cmscore1: 0.000 0.000
score1: -102358.328125
0 generated, 0 pushed, 0 nodes popped in 593
<<< please speak >>>
Here is the console output for the same utterance and julian configuration file under 3.5:
$ /usr/local/julius/julius-3.5-linuxbin/bin/julian-3.5-std -input mic -C julian.jconf
...
### read waveform input
pass1_best: <s> CALL STEVE </s>
pass1_best_wordseq: 0 2 4 1
pass1_best_phonemeseq: sil | k ao l | s t iy v | sil
pass1_best_score: -14968.178711
length: 542 frames (1.80 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=542
stack empty, search terminate now
2 sentences have found
sentence1: <s> PHONE STEVE </s>
wseq1: 0 2 4 1
phseq1: sil | f ow n | s t iy v | sil
cmscore1: 1.000 0.000 1.000 1.000
score1: -15512.497070
14 generated, 14 pushed, 16 nodes popped in 542
<<< please speak >>>
I am using the precompiled Julius/Julian binaries on Fedora Core 4 (64-bit) on an AMD64 PC.
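To compare the two versions over more than one utterance, it can help to scan saved console output for results that do not end in </s>. The following is only a rough sketch in Python: the function names are made up for illustration and are not part of Julius/Julian, and it assumes the console output has been captured as text with "pass1_best:" and "sentenceN:" lines in the format shown above.

```python
import re

# Hypothetical helpers for inspecting captured Julian console output.
# They are not part of Julius/Julian; the line format is assumed to match
# the "pass1_best:" and "sentenceN:" lines in the logs above.

def sentence_results(log_text):
    """Return (label, words) pairs for each pass1_best/sentenceN line."""
    results = []
    for line in log_text.splitlines():
        # The colon must directly follow the label, so lines such as
        # "pass1_best_wordseq:" are skipped.
        m = re.match(r"\s*(pass1_best|sentence\d+):\s*(.*)$", line)
        if m:
            results.append((m.group(1), m.group(2).split()))
    return results

def missing_end_silence(log_text, end_tag="</s>"):
    """Return the labels of results whose last word is not end_tag."""
    return [label for label, words in sentence_results(log_text)
            if not words or words[-1] != end_tag]

if __name__ == "__main__":
    sample = "\n".join([
        "pass1_best: <s> DIAL",
        "pass1_best_wordseq: 0 3",
        "sentence1: <s> DIAL",
    ])
    print(missing_end_silence(sample))  # ['pass1_best', 'sentence1']
```

Run against the 3.5 output above, the same check should return an empty list, since both results there end in </s>.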
Solution:
- use Julius 3.5 for Acoustic Model creation
- apply patch to Julius 3.5.1
- wait for Julius 3.5.2