Ticket #1 (closed defect: fixed)

Opened 14 years ago

Last modified 14 years ago

Julian 3.5.1 problems with VoxForge Tutorial Acoustic Models

Reported by: kmaclean
Owned by: kmaclean
Priority: minor
Milestone: 0.1-beta
Component: Speech Rec Engine
Version: 0.1-alpha
Keywords: Speech Recognition Engine Julius 3.5.1
Cc:

Description (last modified by kmaclean)

I am running the same acoustic model under Julian 3.5 and 3.5.1. Everything seems to work OK with 3.5, but I get no recognition at all with 3.5.1. Looking at the console output, Julian 3.5.1 doesn't seem to be picking up the end silence tag </s>, and I am not sure why.

Here is part of the console output for recognition of the phrase "call Steve" under 3.5.1:

$ /usr/local/julius/julius-3.5.1-linuxbin/bin/julian-3.5.1-std -input mic -C julian.jconf
...

### read waveform input


pass1_best: <s> DIAL
pass1_best_wordseq: 0 3
pass1_best_phonemeseq: sil | d ay l
pass1_best_score: -102358.328125

length: 593 frames (1.97 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=593
stack empty, search terminate now
0 sentences have found
got no candidates, output 1st pass result as a final result
sentence1: <s> DIAL
wseq1: 0 3
phseq1: sil | d ay l
cmscore1: 0.000 0.000
score1: -102358.328125
0 generated, 0 pushed, 0 nodes popped in 593

<<< please speak >>>

Here is the console output for the same utterance and julian configuration file under 3.5:

$ /usr/local/julius/julius-3.5-linuxbin/bin/julian-3.5-std -input mic -C julian.jconf
...

### read waveform input


pass1_best: <s> CALL STEVE </s>
pass1_best_wordseq: 0 2 4 1
pass1_best_phonemeseq: sil | k ao l | s t iy v | sil
pass1_best_score: -14968.178711

length: 542 frames (1.80 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=542
stack empty, search terminate now
2 sentences have found
sentence1: <s> PHONE STEVE </s>
wseq1: 0 2 4 1
phseq1: sil | f ow n | s t iy v | sil
cmscore1: 1.000 0.000 1.000 1.000
score1: -15512.497070
14 generated, 14 pushed, 16 nodes popped in 542

<<< please speak >>>

I am using the precompiled Julius/Julian binaries on Fedora Core 4 (64-bit) on an AMD64 PC.
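
The symptom described above (a first-pass result that never reaches the end silence tag) can be checked mechanically when comparing logs from the two versions. The following is a minimal sketch in C, not part of Julius; the function name pass1_ends_with_sentence_end is hypothetical, and it assumes the log line format shown in the console output above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of Julius.
 * Returns true if `line` is a "pass1_best:" log line whose word
 * sequence ends with the end silence tag </s>. */
bool pass1_ends_with_sentence_end(const char *line)
{
    static const char prefix[] = "pass1_best:";
    static const char tag[] = "</s>";
    const size_t prefix_len = sizeof(prefix) - 1;
    const size_t tag_len = sizeof(tag) - 1;
    size_t len;

    /* Only look at the first-pass best hypothesis line. */
    if (strncmp(line, prefix, prefix_len) != 0)
        return false;

    /* Ignore trailing spaces and line terminators. */
    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r' ||
                       line[len - 1] == ' '))
        len--;

    return len >= tag_len &&
           memcmp(line + len - tag_len, tag, tag_len) == 0;
}
```

Applied to the logs above, the 3.5.1 line "pass1_best: <s> DIAL" fails the check, while the 3.5 line "pass1_best: <s> CALL STEVE </s>" passes it.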

Solution (any one of the following):

  • use Julius 3.5 for Acoustic Model creation
  • apply patch to Julius 3.5.1
  • wait for Julius 3.5.2

Change History

comment:1 Changed 14 years ago by kmaclean

  • Status changed from new to assigned

Reply from Julius support:

We now found a small bug that causes wrong feature extraction when using microphone input with 0'th cepstral parameter.

The attached file is a patch for Julius-3.5.1 to fix the bug. This patch is now applied to the current development source on CVS, and will be released as part of 3.5.2 in near future.

LEE Akinobu

diff -crN julius-3.5.1/julius/realtime-1stpass.c julius-3.5.1-fix-c0-mic/julius/realtime-1stpass.c
*** julius-3.5.1/julius/realtime-1stpass.c    2006-03-28 16:29:01.000000000 +0900
--- julius-3.5.1-fix-c0-mic/julius/realtime-1stpass.c    2006-06-12 18:22:06.000000000 +0900
***************
*** 461,467 ****
        memcpy(&(tmpmfcc[para.baselen*2]), &(ab->vec[para.baselen*3]), sizeof(VECT) * para.baselen);
      }
 
!     if (para.delta && para.energy && para.absesup) {
        /* 絶対値パワーを除去 */
        /* suppress absolute power */
        memmove(&(tmpmfcc[para.baselen-1]), &(tmpmfcc[para.baselen]), sizeof(VECT) * (para.vecbuflen - para.baselen));
--- 461,467 ----
        memcpy(&(tmpmfcc[para.baselen*2]), &(ab->vec[para.baselen*3]), sizeof(VECT) * para.baselen);
      }
 
!     if (para.delta && (para.energy || para.c0) && para.absesup) {
        /* 絶対値パワーを除去 */
        /* suppress absolute power */
        memmove(&(tmpmfcc[para.baselen-1]), &(tmpmfcc[para.baselen]), sizeof(VECT) * (para.vecbuflen - para.baselen));
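
To make the effect of the one-line change concrete, here is a minimal C sketch of the two conditions and of the memmove that drops the absolute term. It is an illustration only, not the Julius source: the struct feature_params, the function names, and the float typedef for VECT are assumptions. Only the field names (delta, energy, c0, absesup, baselen, vecbuflen) and the memmove expression are taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

typedef float VECT; /* assumption: a float feature element type */

/* Hypothetical stand-in for the `para` fields the patch refers to. */
struct feature_params {
    bool delta;    /* delta coefficients are used */
    bool energy;   /* log energy is the absolute term */
    bool c0;       /* 0'th cepstral coefficient is the absolute term */
    bool absesup;  /* absolute term should be suppressed */
    int baselen;   /* number of base coefficients, absolute term last */
    int vecbuflen; /* total vector length before suppression */
};

/* Condition as it appears on the 3.5.1 side of the diff: c0 is ignored. */
bool needs_abs_suppression_3_5_1(const struct feature_params *p)
{
    return p->delta && p->energy && p->absesup;
}

/* Condition as it appears on the patched side of the diff. */
bool needs_abs_suppression_patched(const struct feature_params *p)
{
    return p->delta && (p->energy || p->c0) && p->absesup;
}

/* Remove the element at index baselen-1 by shifting the tail left by
 * one, mirroring the memmove in the patch. Returns the new length. */
int suppress_absolute_power(VECT *vec, const struct feature_params *p)
{
    memmove(&vec[p->baselen - 1], &vec[p->baselen],
            sizeof(VECT) * (size_t)(p->vecbuflen - p->baselen));
    return p->vecbuflen - 1;
}
```

With delta, c0 and absesup set but energy unset, the 3.5.1 condition is false, so the absolute term stays in the vector. The patched condition is true and the term is removed, which is consistent with the wrong feature extraction described in the reply above.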

comment:2 Changed 14 years ago by kmaclean

Getting a similar problem with Julius 3.5.1 on Windows.

Ken

comment:3 Changed 14 years ago by kmaclean

  • Description modified

comment:4 Changed 14 years ago by kmaclean

  • Priority changed from major to minor

comment:5 Changed 14 years ago by kmaclean

  • Component changed from Acoustic Model to Speech Rec Engine

comment:6 Changed 14 years ago by kmaclean

  • Status changed from assigned to closed
  • Resolution set to fixed