voxforge.org
VoxForge Dev

Ticket #1 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

Julian 3.5.1 problems with VoxForge Tutorial Acoustic Models

Reported by: kmaclean Assigned to: kmaclean
Priority: minor Milestone: 0.1-beta
Component: Speech Rec Engine Version: 0.1-alpha
Keywords: Speech Recognition Engine Julius 3.5.1 Cc:

Description (Last modified by kmaclean)

I am running the same acoustic model under Julian 3.5 and 3.5.1. Everything seems to work OK with 3.5, but I get no recognition at all with 3.5.1. Looking at the console output, Julian 3.5.1 doesn't seem to be picking up the end silence tag </s>, and I am not sure why.

Here is part of the console output for recognition of the phrase "call Steve" under 3.5.1:

$ /usr/local/julius/julius-3.5.1-linuxbin/bin/julian-3.5.1-std -input mic -C julian.jconf
...

### read waveform input


pass1_best: <s> DIAL
pass1_best_wordseq: 0 3
pass1_best_phonemeseq: sil | d ay l
pass1_best_score: -102358.328125

length: 593 frames (1.97 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=593
stack empty, search terminate now
0 sentences have found
got no candidates, output 1st pass result as a final result
sentence1: <s> DIAL
wseq1: 0 3
phseq1: sil | d ay l
cmscore1: 0.000 0.000
score1: -102358.328125
0 generated, 0 pushed, 0 nodes popped in 593

<<< please speak >>>

Here is the console output for the same utterance and julian configuration file under 3.5:

$ /usr/local/julius/julius-3.5-linuxbin/bin/julian-3.5-std -input mic -C julian.jconf
...

### read waveform input


pass1_best: <s> CALL STEVE </s>
pass1_best_wordseq: 0 2 4 1
pass1_best_phonemeseq: sil | k ao l | s t iy v | sil
pass1_best_score: -14968.178711

length: 542 frames (1.80 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=542
stack empty, search terminate now
2 sentences have found
sentence1: <s> PHONE STEVE </s>
wseq1: 0 2 4 1
phseq1: sil | f ow n | s t iy v | sil
cmscore1: 1.000 0.000 1.000 1.000
score1: -15512.497070
14 generated, 14 pushed, 16 nodes popped in 542

<<< please speak >>>

I am using the precompiled Julius/Julian binaries on Fedora Core 4 (64-bit) on an AMD64 PC.

Solution (any one of the following):

  • use Julius 3.5 for Acoustic Model creation
  • apply patch to Julius 3.5.1
  • wait for Julius 3.5.2

Change History

07/12/06 15:59:59 changed by kmaclean

  • status changed from new to assigned.

Reply from Julius support:

We have now found a small bug that causes incorrect feature extraction when using microphone input with the 0th cepstral parameter.

The attached file is a patch for Julius-3.5.1 to fix the bug. This patch has now been applied to the current development source on CVS, and will be released as part of 3.5.2 in the near future.

LEE Akinobu

diff -crN julius-3.5.1/julius/realtime-1stpass.c julius-3.5.1-fix-c0-mic/julius/realtime-1stpass.c
*** julius-3.5.1/julius/realtime-1stpass.c    2006-03-28 16:29:01.000000000 +0900
--- julius-3.5.1-fix-c0-mic/julius/realtime-1stpass.c    2006-06-12 18:22:06.000000000 +0900
***************
*** 461,467 ****
        memcpy(&(tmpmfcc[para.baselen*2]), &(ab->vec[para.baselen*3]), sizeof(VECT) * para.baselen);
      }
 
!     if (para.delta && para.energy && para.absesup) {
        /* 絶対値パワーを除去 */
        /* suppress absolute power */
        memmove(&(tmpmfcc[para.baselen-1]), &(tmpmfcc[para.baselen]), sizeof(VECT) * (para.vecbuflen - para.baselen));
--- 461,467 ----
        memcpy(&(tmpmfcc[para.baselen*2]), &(ab->vec[para.baselen*3]), sizeof(VECT) * para.baselen);
      }
 
!     if (para.delta && (para.energy || para.c0) && para.absesup) {
        /* 絶対値パワーを除去 */
        /* suppress absolute power */
        memmove(&(tmpmfcc[para.baselen-1]), &(tmpmfcc[para.baselen]), sizeof(VECT) * (para.vecbuflen - para.baselen));
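The effect of the one-line change can be illustrated with a small standalone sketch. This is not Julius code: `ParaSketch`, `suppress_abs` and the `patched` flag are hypothetical names introduced here, the struct only mirrors the `para` fields that appear in the patch, and `VECT` is assumed to be a `float` typedef. The sketch assumes, as the `memmove` in the patch suggests, that the absolute term (energy or C0) is the last of the `baselen` static coefficients.

```c
#include <string.h>

typedef float VECT; /* assumption: stand-in for Julius's VECT type */

/* Hypothetical stand-in for the fields of `para` used in the patch. */
typedef struct {
    int delta;     /* delta coefficients are appended */
    int energy;    /* absolute term is log energy */
    int c0;        /* absolute term is the 0th cepstral coefficient */
    int absesup;   /* absolute term should be suppressed */
    int baselen;   /* number of static coefficients, absolute term last */
    int vecbuflen; /* full vector length before suppression */
} ParaSketch;

/* Removes the absolute term in place and returns the resulting length.
 * patched == 0 uses the 3.5.1 condition (energy only);
 * patched != 0 uses the condition from the patch (energy or C0). */
static int suppress_abs(VECT *v, const ParaSketch *p, int patched)
{
    int has_abs = patched ? (p->energy || p->c0) : p->energy;

    if (p->delta && has_abs && p->absesup) {
        /* Shift everything after the static block down by one slot,
         * overwriting the absolute term at index baselen-1. */
        memmove(&v[p->baselen - 1], &v[p->baselen],
                sizeof(VECT) * (p->vecbuflen - p->baselen));
        return p->vecbuflen - 1;
    }
    return p->vecbuflen;
}
```

With `c0` set and `energy` clear, the unpatched condition leaves the vector at its full length with the C0 term still in place, while the patched condition drops it. If the acoustic model was trained without the absolute term, the unpatched vector would not line up with what the model expects, which is consistent with the failed recognition reported above.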

07/12/06 16:00:38 changed by kmaclean

Getting a similar problem with Julius 3.5.1 on Windows.

Ken

07/12/06 16:03:16 changed by kmaclean

  • description changed.

07/17/06 08:03:28 changed by kmaclean

  • priority changed from major to minor.

07/18/06 16:37:35 changed by kmaclean

  • component changed from Acoustic Model to Speech Rec Engine.

07/18/06 20:56:54 changed by kmaclean

  • status changed from assigned to closed.
  • resolution set to fixed.