voxforge.org
VoxForge Dev
Timestamp:
05/26/08 14:11:03 (6 months ago)
Author:
kmaclean
Message:

AudioSegmentation scripts -add POD docs

Files:

  • Trunk/Scripts/Audio_scripts/AudioSegmentation/AudioBook.pm

    r2591 r2593  
    11#! /usr/bin/perl 
    2 #################################################################### 
    3 ### 
    4 ### script name : AudioBook.pm 
    5 ### version: 0.2 
    6 ### created by: Ken MacLean 
    7 ### mail: contact@voxforge.org 
    8 ### Date: 2008.01.31 
    9 ### Command: perl ./AudioBook.pm 
    10 ###    
    11 ### Copyright (C) 2008 Ken MacLean 
    12 ### 
    13 ### This program is free software; you can redistribute it and/or 
    14 ### modify it under the terms of the GNU General Public License 
    15 ### as published by the Free Software Foundation; either version 2 
    16 ### of the License, or (at your option) any later version. 
    17 ### 
    18 ### This program is distributed in the hope that it will be useful, 
    19 ### but WITHOUT ANY WARRANTY; without even the implied warranty of 
    20 ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
    21 ### GNU General Public License for more details. 
    22 ###  
    23 ### Changes:     
    24 ### 2008/05/02 - 0.2 -  convert to calss; major refacture ; renamed fullrun.pl to AudioBook.pm                                                        
    25 #################################################################### 
     2$VERSION = 0.2; 
     3 
     4=head1 NAME 
     5 
     6AudioBook - Convert a single transcribed audio file into audio segments averaging 15 words   
     7 
     8=cut  
     9 
    2610package AudioBook; 
    2711use strict; 
     
    3519use AudioBook::Text; 
    3620use AudioBook::Dictionary; 
     21 
     22=head1 SYNOPSIS 
     23 
     24 $./AudioBook -h                                                                        display help 
     25 $./AudioBook -a speechfile.wav -t text.txt             minimal run configuration 
     26 
     27=head1 DESCRIPTION 
     28 
     29This program segments a speech audio file into 15 word (on average) speech segments.  It is executable from the command line and uses  
     30the following configuration options to help in segmenting speech: 
     31 
     32        -a      * audio file name (WAV format only) 
     33        -b      beam width for Forced Alignment with HVite (default = 250) 
     34        -d      pronunciation dictionary  (default = AudioBook/input_files/VoxforgeDict) 
     35        -h      show help 
     36        -l      LICENSE file (default = AudioBook/input_files/LICENCE) 
     37        -m      Maximum sentence length (default = 20 words) 
     38        -p      Minimum pause for sentence break (default = 2000000 in units of 100ns) 
     39        -q      log words with single quotes (default = yes) 
     40        -r      README file (default = AudioBook/input_files/README) 
     41        -s      Average sentence length (default = 15 words) 
     42        -t      * text file name (containing transcriptions of speech in audio file) 
     43        -u      username or name you want file stats collected by on VoxForge Metrics  
     44                page:   (http://www.voxforge.org/home/downloads/metrics) 
     45        -v      verify segments created from first pass Forced Alignment 
     46        -x      unique tar file suffix (max 3 characters - remainder is truncated) 
     47        -S      run sanity test 
     48        -T      create gzipped/tar file 
     49 
     50                * required for script to run 
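
The default for -p is easier to judge in seconds. The following is a minimal sketch of the conversion, assuming only what the option list states (one unit = 100ns); the helper names units_to_seconds and seconds_to_units are hypothetical and are not part of AudioBook.pm.

```perl
use strict;
use warnings;

# One unit is 100 ns, i.e. 1e-7 seconds (as stated for the -p option).
# Both helper names are hypothetical, for illustration only.
sub units_to_seconds {
    my ($units) = @_;
    return $units * 1e-7;
}

sub seconds_to_units {
    my ($seconds) = @_;
    return int($seconds / 1e-7 + 0.5);    # round to the nearest whole unit
}

# The default minimum pause of 2000000 units is one fifth of a second.
printf "%.2f s\n", units_to_seconds(2000000);
```

To require a half-second pause, for example, seconds_to_units(0.5) gives the value to pass with -p.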
     51 
     52=head2 NOTES 
     53 
     54=head3 Text Does not Match Audio 
     55 
     56If the contents of the text file do not *exactly* match the contents of the speech audio file, the segmentation process necessarily becomes  
     57a manual, iterative process. 
     58 
     59If there is a large divergence between the text and the speech audio, then you will have to manually listen to the speech audio to determine  
     60where the biggest transcription errors lie, and then modify the original text file to match the speech audio file.   
     61 
     62If the transcription errors are minor, then the first pass forced alignment usually completes successfully.  However, if you see "No tokens survived to final node of network at beam" errors in the  
     63HVite log (located in interim_files/logs), then using the "-v" verify switch might be helpful in determining where transcription problems  
     64might exist. 
     65 
     66The verify switch performs a forced alignment on the individual segments generated from the first pass forced alignment.  Low scores  
     67(i.e. the lowest average log likelihood per frame score) indicate that the transcription text might not match the corresponding audio  
     68file.  Look at the segment text and listen to the corresponding audio file to determine if they match.  If they do not match, then fix the  
     69text in your original text transcription file and repeat this process (i.e. running the AudioBook program again with the verify switch on)  
     70until you can get a clean run. 
     71 
     72=head3 Segmenting large audio files 
     73 
     74For larger files (i.e. greater than 30 minutes of audio), you *may* need to manually segment the audio file into 30 minute segments. 
     75 
     76=head3 Automatically Adding Out-of-vocabulary words to pronunciation dictionary  
     77 
     78The pronunciations generated by the Sequitur G2P scripts need to be manually reviewed before any new pronunciations are added to the 
     79pronunciation dictionary.  Make sure you review the pronunciations before committing these changes to SVN.     
     80 
     81=head1 REQUIREMENTS 
     82 
     83=item 1 - Sequitur G2P trainable Grapheme-to-Phoneme converter (which requires Python to be installed) 
     84                http://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html 
     85 
     86=item 2 - HTK Hidden Markov Model Toolkit - note: the source is "open", but there are distribution restrictions 
     87                http://htk.eng.cam.ac.uk/ 
     88 
     89=head1 ALGORITHM 
     90 
     91This program tries to segment the speech audio file into 15 word sentences.  However, if the pause following the 15th word relative to the current  
     92sentence start position is too short, the algorithm looks at the previous word (i.e. word 14) to see if it has a pause of suitable duration.   
     93If not, it then looks at the following word (i.e. word 16), and so on until a pause of suitable 
     94duration can be found, increasing the number of words to look behind and ahead each time.  
     95 
     96The default pause duration is 2000000 in units of 100ns.  This can be changed (using the "-p" switch) if the speech audio file does not segment well  
     97enough with this default. 
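
The search described above can be sketched as follows. This is an illustration under stated assumptions, not the code in AudioBook::Audio: the function name find_break, the array of per-word pauses, and the fallback to the longest allowed segment when no pause qualifies are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of words for the next segment.
#   $pauses->[$i]  pause (in 100ns units) following word $i (0-based)
#   $start         index of the first word of the current sentence
#   $target        desired sentence length (the -s option, default 15)
#   $max           maximum sentence length (the -m option, default 20)
#   $min_pause     minimum pause for a break (the -p option)
sub find_break {
    my ($pauses, $start, $target, $max, $min_pause) = @_;
    my $remaining = @$pauses - $start;
    return $remaining if $remaining <= $target;    # tail of the file

    my $limit = $max < $remaining ? $max : $remaining;
    for my $offset (0 .. $limit) {
        # Try the target first, then one word behind, one ahead,
        # two behind, two ahead, and so on.
        for my $len ($target - $offset, $target + $offset) {
            next if $len < 1 || $len > $limit;
            return $len if $pauses->[$start + $len - 1] >= $min_pause;
        }
    }
    return $limit;    # assumption: no suitable pause, so cut at the maximum
}

# Example: only the pause after word 16 is long enough.
my @pauses = (0) x 30;
$pauses[15] = 3000000;
print find_break(\@pauses, 0, 15, 20, 2000000), "\n";
```

In the example the pauses after words 15 and 14 are too short, so the search widens by one word and settles on a 16 word segment.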
     98 
     99=head1 METHODS (not user accessible) 
     100 
     101=cut 
     102 
    37103#################################################################### 
    38104### Class Variables 
     
    58124 
    59125#################################################################### 
    60 ### Main 
    61 #################################################################### 
     126### Methods 
     127#################################################################### 
     128 
     129=head2 process 
     130 
     131Segment the user-designated speech audio file (-a) using the supplied text file (-t)  
     132 
     133=cut 
     134 
    62135sub process { 
    63136        my ($self)= @_; 
     
    91164                close LOG 
    92165        }  
    93         $command = ("cp AudioBook/interim_files/dict AudioBook/output_files"); print "cmd:$command\n" if $debug; system($command);       
     166        # dict may get manually updated; dict only includes suggested prompts 
     167        #... $command = ("cp AudioBook/interim_files/dict AudioBook/output_files"); print "cmd:$command\n" if $debug; system($command);          
     168        $command = ("cp AudioBook/interim_files/prompts AudioBook/output_files/prompts"); print "cmd:$command\n" if $debug; system($command);    
     169 
    94170        my $audio = AudioBook::Audio->new($self); 
    95171        $audio->segment($audiofile,$textContents); 
     
    102178} 
    103179 
     180=head2 cleanupFiles 
     181 
     182Removes any old files in the AudioBook/interim_files/ and AudioBook/output_files/ directories. 
     183 
     184=cut 
     185 
    104186sub cleanupFiles { 
    105187        my ($self)= @_; 
     
    121203} 
    122204 
    123 sub _createTarFile { 
     205=head2 _createTarFile  
     206 
     207Creates a gzipped tar file from the files contained in AudioBook/output_files 
     208 
     209=item * -T      Switch to turn on this functionality 
     210 
     211=item * -u      username used in name of the tar file  
     212 
     213=cut 
     214 
     215sub _createTarFile { # private 
    124216        my ($self)= @_; 
    125217        my $debug = $self->{'debug'}; 
     
    166258        return $randomString; 
    167259} 
     260 
     261=head2 getOptions  
     262 
     263Get the user submitted options ('a:b:d:hl:m:p:r:s:t:u:x:q:v:ST') 
     264 
     265=cut 
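
The option string above has the form accepted by the core Getopt::Std module (a letter followed by ":" takes an argument). The changeset does not show how getOptions parses it, so the following is only a sketch that assumes Getopt::Std; parse_args is a hypothetical helper, and the defaults are the ones listed in the DESCRIPTION section.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper, not part of AudioBook.pm: parse a list of
# command-line arguments using the documented option string.
sub parse_args {
    my (@args) = @_;
    local @ARGV = @args;
    my %opts;
    getopts('a:b:d:hl:m:p:r:s:t:u:x:q:v:ST', \%opts)
        or die "invalid options\n";
    return \%opts if $opts{h};    # help requested; nothing else to check

    # -a and -t are the two required parameters.
    die "Parms -a and -t need to be defined\n"
        unless defined $opts{a} && defined $opts{t};

    # Defaults as documented in the option list.
    $opts{b} //= 250;        # beam width
    $opts{m} //= 20;         # maximum sentence length (words)
    $opts{p} //= 2000000;    # minimum pause (100ns units)
    $opts{s} //= 15;         # average sentence length (words)

    # -x: keep at most 3 characters; the remainder is truncated.
    $opts{x} = substr($opts{x}, 0, 3) if defined $opts{x};
    return \%opts;
}

my $opts = parse_args(qw(-a speechfile.wav -t text.txt -x abcdef));
print "$opts->{a} $opts->{t} $opts->{x}\n";
```

Note that in this option string -q and -v are declared with a trailing ":", so under Getopt::Std they would expect a value rather than act as bare flags.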
    168266 
    169267sub getOptions { 
     
    290388                print "-r\tREADME file (default = AudioBook/input_files/README)\n";                              
    291389                print "-s\tAverage sentence length (default = $default_average_sentence_length words)\n";                                
    292                 print "-t\t* text file name\n"; 
     390                print "-t\t* text file name (containing transcriptions of speech in audio file)\n"; 
     391                 
    293392                print "-u\tusername or name you want file stats collected by on VoxForge Metrics \n"; 
    294393                print "\tpage:\t(http://www.voxforge.org/home/downloads/metrics)\n";     
     394                 
     395                print "-v\tverify segments created from first pass Forced Alignment\n"; 
    295396                print "-x\tunique tar file suffix (max 3 characters - remainder is truncated)\n"; 
    296397                print "-S\trun sanity test\n";           
     
    302403                print "\nVoxForge Audio Segmentation Script\n";  
    303404                print   "==================================\n";  
    304                 print "parms -a, -t, -d need to be defined, use -h parameter for more information\n\n"; 
     405                print "Parms -a and -t need to be defined. Use -h parameter for more information\n\n"; 
    305406                exit; 
    306407        } 
     
    310411} 
    311412 
    312 #################################################################### 
    313 ### Gettors - Public 
    314 #################################################################### 
     413=head2 Getters - Public (used by methods in other sub-classes) 
     414 
     415=item * getAverage_sentence_length() 
     416 
     417=cut 
     418 
    315419sub getAverage_sentence_length { 
    316420        my $self = shift; 
     
    318422} 
    319423 
     424=item * getMax_sentence_length() 
     425 
     426=cut 
     427 
    320428sub getMax_sentence_length { 
    321429        my $self = shift; 
     
    323431} 
    324432 
     433=item * getMin_pause_for_sentence_break() 
     434 
     435=cut 
     436 
    325437sub getMin_pause_for_sentence_break { 
    326438        my $self = shift; 
    327439        return $self->{"min_pause_for_sentence_break"}; 
    328440} 
    329  
    330 1; 
     441     
     442=head1 Change Log     
     443 
     444        2008/05/02 - 0.2 - convert to class; major refactoring; renamed fullrun.pl to AudioBook.pm                                                        
     445        2008/01/31 - 0.1 - created 
     446         
     447=cut 
     448 
     449=head1 AUTHOR 
     450     
     451    Ken MacLean 
     452    contact@voxforge.org 
     453       
     454=head1 COPYRIGHT AND LICENSE        
     455       
     456Copyright (C) 2008 Ken MacLean 
     457    
     458This program is free software; you can redistribute it and/or 
     459modify it under the terms of the GNU General Public License 
     460as published by the Free Software Foundation; either version 2 
     461of the License, or (at your option) any later version. 
     462    
     463This program is distributed in the hope that it will be useful, 
     464but WITHOUT ANY WARRANTY; without even the implied warranty of 
     465MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
     466GNU General Public License for more details.