voxforge.org
VoxForge Dev

Changeset 2616

Timestamp:
06/18/08 21:59:52
Author:
kmaclean
Message:

AudioSegmentation script - snapshot re: interactive Missingword update

Files:

  • Trunk/Scripts/Audio_scripts/AudioSegmentation/AudioBook.pm

    r2613 r2616  
    3737It is executable from the command line and uses the following configuration options to help in segmenting speech: 
    3838 
    39 VoxForge Audio Segmentation Script Parameters 
    40 ============================================= 
    41 -a      * audio file name (WAV format only) 
    42 -b      notify if beam width for Forced Alignment exceeds a certain level (default = 250) 
    43         (does not set HVite's beam width parameter) 
    44 -d      pronunciation dictionary  (default = AudioBook/input_files/VoxforgeDict) 
    45 -h      show help 
    46 -i      interactive validation of missing word pronunciations 
    47 -l      LICENSE file (default = AudioBook/input_files/LICENCE) 
    48 -m      Target maximum sentence length (default = 20 words) 
    49 -p      Minimum pause for sentence break (default = 2000000 in units of 100ns) 
    50 -q      log words with single quotes (default = yes) 
    51 -r      README file (default = AudioBook/input_files/README) 
    52 -s      Average sentence length (default = 15 words) 
    53 -t      * text file name (containing transcriptions of speech in audio file) 
    54 -u      username (or name) under which file statistics are collected on the VoxForge Metrics 
    55         page (http://www.voxforge.org/home/downloads/metrics) 
    56 -v      validate segment audio files against prompt text using forced alignment 
    57 -w      validate missing word pronunciations to audio recordings 
    58 -x      unique tar file suffix (max 3 characters - remainder is truncated) 
    59 -S      run sanity test 
    60 -T      create gzipped/tar file 
     39       VoxForge Audio Segmentation Script Parameters 
     40       ============================================= 
     41       -a      * audio file name (WAV format only) 
     42       -b      notify if beam width for Forced Alignment exceeds a certain level (default = 250) 
     43               (does not set HVite's beam width parameter) 
     44       -d      pronunciation dictionary  (default = AudioBook/input_files/VoxforgeDict) 
     45       -h      show help 
     46       -i      interactive validation of missing word pronunciations 
     47       -l      LICENSE file (default = AudioBook/input_files/LICENCE) 
     48       -m      Target maximum sentence length (default = 20 words) 
     49       -p      Minimum pause for sentence break (default = 2000000 in units of 100ns) 
     50       -q      log words with single quotes (default = yes) 
     51       -r      README file (default = AudioBook/input_files/README) 
     52       -s      Average sentence length (default = 15 words) 
     53       -t      * text file name (containing transcriptions of speech in audio file) 
     54       -u      username (or name) under which file statistics are collected on the VoxForge Metrics 
     55               page (http://www.voxforge.org/home/downloads/metrics) 
     56       -v      validate segment audio files against prompt text using forced alignment 
     57       -w      validate missing word pronunciations to audio recordings 
     58       -x      unique tar file suffix (max 3 characters - remainder is truncated) 
     59       -S      run sanity test 
     60       -T      create gzipped/tar file 
    6161 
    6262        * minimum required for script to run 
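As a rough illustration of how these options might be read, here is a minimal Perl sketch using the core Getopt::Std module. Which switches take a value (for example, whether -q expects yes/no) is an assumption inferred from the table above, and `parse_args` and `%DEFAULTS` are hypothetical names, not the script's actual code. Note that the -p default of 2000000 units of 100 ns works out to 0.2 seconds.

```perl
use strict;
use warnings;
use Getopt::Std;

# Defaults copied from the parameter table; the option string below
# (which flags take a value) is an assumption, not the script's code.
my %DEFAULTS = (
    b => 250,
    d => 'AudioBook/input_files/VoxforgeDict',
    l => 'AudioBook/input_files/LICENCE',
    m => 20,
    p => 2_000_000,    # 100 ns units, i.e. 0.2 seconds
    q => 'yes',
    r => 'AudioBook/input_files/README',
    s => 15,
);

# Parse an argument list into a hash ref and apply defaults.
# Dies if an unknown switch is given or if -a or -t is missing.
sub parse_args {
    local @ARGV = @_;
    my %opts;
    getopts('a:b:d:hil:m:p:q:r:s:t:u:vwx:ST', \%opts)
        or die "unrecognised option\n";
    for my $key (keys %DEFAULTS) {
        $opts{$key} = $DEFAULTS{$key} unless defined $opts{$key};
    }
    die "-a and -t are required\n"
        unless defined $opts{a} && defined $opts{t};
    # -x: keep at most 3 characters of the tar file suffix
    $opts{x} = substr($opts{x}, 0, 3) if defined $opts{x};
    return \%opts;
}
```

For example, `parse_args('-a', 'book.wav', '-t', 'book.txt', '-x', 'abcdef')` returns a hash with `x` truncated to `abc` and `m` defaulted to 20.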
     
    6666 
    6767 
    68 =head1 Step 1 - First Pass Forced Alignment - Getting it to Run Completely Without Errors 
     68=head2 Step 1 - First Pass Forced Alignment - Getting it to Run Completely Without Errors 
    6969 
    7070Execute the script as follows using only the '-a' and '-t' parameters: 
     
    7676of the sentence, and put an entry into the prompts file. 
    7777 
    78 =head2 NOTES 
    79  
    80 =head3 Text Does not Match Audio 
     78=head3 NOTES 
     79 
     80=head4 Text Does not Match Audio 
    8181 
    8282The text file *must exactly* match the contents of the speech audio file. 
     
    8989where the biggest transcription errors lie, and then modify the original text file to match the speech audio file.   
    9090 
    91 =head3 Dealing With Out-of-vocabulary Words  
     91=head4 Dealing With Out-of-vocabulary Words  
    9292 
    9393Forced Alignment is performed with HTK's HVite tool.  HVite requires that each word in the text to be forced aligned have a pronunciation entry 
     
    9696of reasonable lengths.  Using this information, the script can create prompt entries and corresponding audio segments.   
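Since HVite needs a pronunciation entry for every word, a useful first check is to list the transcription words that the dictionary lacks. This is a minimal sketch, assuming an HTK-style dictionary whose first field on each line is the headword, and assuming that punctuation other than apostrophes can be discarded; `load_lexicon` and `find_oov_words` are hypothetical helpers.

```perl
use strict;
use warnings;

# Build a set of headwords from an HTK-style dictionary, one entry per
# line: WORD [OUTSYM] phone phone ...  (only the first field is used).
sub load_lexicon {
    my ($fh) = @_;
    my %lex;
    while (my $line = <$fh>) {
        my ($word) = split ' ', $line;
        $lex{$word} = 1 if defined $word;
    }
    return \%lex;
}

# Return the distinct words in $text with no dictionary entry, in order
# of first appearance. Words are upper-cased to match the dictionary and
# everything except letters and apostrophes is stripped (an assumption).
sub find_oov_words {
    my ($text, $lex) = @_;
    my (%seen, @oov);
    for my $token (split ' ', $text) {
        (my $word = uc $token) =~ s/[^A-Z']//g;
        next if $word eq '' || $seen{$word}++;
        push @oov, $word unless $lex->{$word};
    }
    return @oov;
}
```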
    9797 
    98 =head3 Segmenting Large Audio Files 
     98=head4 Segmenting Large Audio Files 
    9999 
    100100For larger files (i.e. greater than 30 minutes of audio), you *may* need to manually split the audio file into 30 minute segments, with  
    101101corresponding text files. 
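A minimal sketch of where 30-minute cut points fall, in samples, for a recording of known length and sample rate. `chunk_ranges` is a hypothetical helper. Cutting at exact multiples of 30 minutes will usually land mid-word, so in practice each cut would be moved to a nearby pause and the text file split at the matching word.

```perl
use strict;
use warnings;

# Split a recording of $total_samples samples at $rate Hz into
# consecutive chunks of at most $minutes minutes (default 30).
# Returns a list of [start, end) sample ranges.
sub chunk_ranges {
    my ($total_samples, $rate, $minutes) = @_;
    $minutes //= 30;
    my $chunk = $rate * 60 * $minutes;
    my @ranges;
    for (my $start = 0; $start < $total_samples; $start += $chunk) {
        my $end = $start + $chunk;
        $end = $total_samples if $end > $total_samples;
        push @ranges, [$start, $end];
    }
    return @ranges;
}
```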
    102102 
    103 =head1 Step 2 - First Pass Forced Alignment - Runs OK, but there are Errors  
     103=head2 Step 2 - First Pass Forced Alignment - Runs OK, but there are Errors  
    104104 
    105105If the transcription errors are minor, then the first pass forced alignment usually completes successfully.   
     
    108108Ensure that the prompt text matches the prompt audio. 
    109109 
    110 =head1 Step 3 - First Pass Forced Alignment - Verify the Segments 
     110=head2 Step 3 - First Pass Forced Alignment - Verify the Segments 
    111111 
    112112Get the script to perform a forced alignment on each of the segments, and display the worst 15 "average log likelihood per frame" 
     
    123123the AudioBook program again with the verify switch on) until you can get a clean run.   
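Ranking segments by score is a simple sort. This is a minimal sketch, assuming the "average log likelihood per frame" values have already been collected into a hash keyed by segment name (extracting them from HVite's output is not shown), and treating lower, more negative values as worse. `worst_segments` is a hypothetical name.

```perl
use strict;
use warnings;

# Given segment name => average log likelihood per frame, return the
# $n lowest-scoring segments (default 15), worst first. Ties are broken
# by name so the output is stable.
sub worst_segments {
    my ($scores, $n) = @_;
    $n //= 15;
    my @sorted = sort { $scores->{$a} <=> $scores->{$b} || $a cmp $b }
                 keys %$scores;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}
```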
    124124 
    125 =head1 Step 4 - First Pass Forced Alignment - Adjusting Prompt Length 
     125=head2 Step 4 - First Pass Forced Alignment - Adjusting Prompt Length 
    126126 
    127127Once the First Pass Forced Alignment runs without errors, check the AudioBook.log log file (in the output_files directory) and  
     
    133133Continue making adjustments until you can get reasonable prompt lengths. 
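To show how the -p and -m settings interact, here is a minimal sketch of pause-based segmentation over a word-level alignment. It assumes the alignment is available as [start, end, word] triples in HTK's 100 ns time units and that silence is labelled `sil` or `sp`. It starts a new segment wherever the gap between two words is at least the minimum pause, then reports segments longer than the target maximum. It illustrates the idea only and is not the script's actual algorithm; `segment_on_pauses` and `too_long` are hypothetical names.

```perl
use strict;
use warnings;

use constant HTK_UNITS_PER_SEC => 10_000_000;   # HTK times are in 100 ns units

# $words: array ref of [start, end, label] triples from a forced alignment.
# A new segment starts whenever the gap between consecutive words is at
# least $min_pause (100 ns units). Silence labels 'sil'/'sp' are skipped.
sub segment_on_pauses {
    my ($words, $min_pause) = @_;
    my (@segments, $current, $prev_end);
    for my $w (@$words) {
        my ($start, $end, $label) = @$w;
        next if $label eq 'sil' || $label eq 'sp';
        if (!defined $current || $start - $prev_end >= $min_pause) {
            $current = { start => $start, words => [] };
            push @segments, $current;
        }
        push @{ $current->{words} }, $label;
        $current->{end} = $end;
        $prev_end = $end;
    }
    return @segments;
}

# Return the segments whose word count exceeds the target maximum (-m).
sub too_long {
    my ($max_words, @segments) = @_;
    return grep { @{ $_->{words} } > $max_words } @segments;
}
```

If too many segments come back from `too_long`, lowering the minimum pause produces more, shorter segments; raising it does the opposite.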
    134134 
    135 =head3 Note 
     135=head4 Note 
    136136 
    137137The worst case scenario is that you cannot segment your audio because it does not have any pauses that are long enough to use for a  
     
    139139segments because the person spoke continuously for a long period of time.  You will likely have to segment these longer prompts manually. 
    140140 
    141 =head1 Step 5 - Validate Suggested Out-of-Vocabulary Word Pronunciations  
     141=head2 Step 5 - Validate Suggested Out-of-Vocabulary Word Pronunciations  
    142142 
    143143The pronunciations generated by the Sequitur G2P scripts need to be manually reviewed before any new pronunciations are added to the 
     
    151151recognition), so you can manually validate the final pronunciations.  
    152152 
    153 =head2 Note  
     153=head4 Note  
    154154 
    155155Note that this approach is only as good as the acoustic model you are using.  The pronunciations still need to be validated against the Sequitur G2P recommended  
     
    158158Please donate some speech to VoxForge to help improve our acoustic models. 
    159159 
    160 =head1 Step 6 - Update Pronunciation Lexicon 
     160=head2 Step 6 - Update Pronunciation Lexicon 
    161161 
    162162If you are submitting your segmented audio to VoxForge, please include your validated Out-of-Vocabulary word pronunciations 
    163163with your submission as a separate file called: "OOV_pron.txt". 
    164164 
    165 Thanks. 
     165 
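A minimal sketch of merging validated pronunciations into a dictionary before writing it (or the separate OOV_pron.txt file) back out. It assumes the VoxForge dictionary layout `WORD [WORD] phone phone ...`, keeps entries sorted by headword, and skips a pronunciation the word already has; `merge_pronunciations` is a hypothetical helper.

```perl
use strict;
use warnings;

# Merge validated pronunciations (word => phone string) into an existing
# dictionary (word => [phone strings]) and return sorted lines of the form
#   WORD [WORD] phone phone ...
# The "[WORD]" output-symbol field mirrors the assumed VoxForge layout.
sub merge_pronunciations {
    my ($dict, $validated) = @_;
    for my $word (keys %$validated) {
        my $pron = $validated->{$word};
        my $list = $dict->{$word} //= [];
        push @$list, $pron unless grep { $_ eq $pron } @$list;
    }
    my @lines;
    for my $word (sort keys %$dict) {
        push @lines, "$word [$word] $_" for @{ $dict->{$word} };
    }
    return @lines;
}
```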
      166=head2 Step 7 - Missing Word Processing 
     167 
      168Use the interactive command line tool (the -i switch) to generate suggested pronunciations (phone lists) for missing words using  
      169Sequitur G2P, with HVite forced alignment used to select the most probable pronunciation.  Run the script with the -v and -w switches  
      170first: this class requires the missingword.xml file to work properly. 
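The interactive choice could be handled along these lines: a number picks one of the suggested pronunciations, and anything else is treated as a typed phone list checked against the phone set. This is a hypothetical sketch, not the CommandLine class itself; `resolve_choice`, `ask_pronunciation`, and the lowercase phone names are assumptions. It uses the core Term::ReadLine module, which picks up Term::ReadLine::Gnu as its backend when that is installed.

```perl
use strict;
use warnings;
use Term::ReadLine;

# Interpret one line of user input for a missing word:
#   a number N selects the Nth candidate pronunciation (1-based);
#   anything else is a typed phone list whose phones must all appear
#   in %$phone_set. Returns the chosen phone string, or undef.
sub resolve_choice {
    my ($input, $candidates, $phone_set) = @_;
    $input =~ s/^\s+|\s+$//g;
    return if $input eq '';
    if ($input =~ /^\d+$/) {
        return ($input >= 1 && $input <= @$candidates)
            ? $candidates->[$input - 1] : undef;
    }
    my @phones = split ' ', lc $input;
    return if grep { !$phone_set->{$_} } @phones;
    return join ' ', @phones;
}

# Prompt until the user gives a valid answer; EOF skips the word.
sub ask_pronunciation {
    my ($word, $candidates, $phone_set) = @_;
    my $term = Term::ReadLine->new('missing words');
    while (1) {
        print "$word:\n";
        printf "  %d) %s\n", $_ + 1, $candidates->[$_] for 0 .. $#$candidates;
        my $line = $term->readline('choice or phones> ');
        return unless defined $line;
        my $pron = resolve_choice($line, $candidates, $phone_set);
        return $pron if defined $pron;
        print "  not understood, try again\n";
    }
}
```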
    166171 
    167172=head1 ALGORITHM 
     
    197202        The HTK toolkit needs to be in your path  
    198203        (see http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/download) 
     204 
     205=item 3 - Perl packages 
     206 
     207        Term::ReadLine::Gnu 
     208 
     209 
    199210 
    200211=cut  
     
    684695=head1 Change Log     
    685696 
    686 2008/06/09 - 0.2.1 - refactor to create Chapter, Segments & MissingWords classes 
    687 2008/05/02 - 0.2 - convert to class; major refactor; renamed fullrun.pl to AudioBook.pm
    688 2008/01/31 - 0.1 - created 
     697  2008/06/12 - 0.1 - created CommandLine class to permit interactive validation of missing word pronunciations 
     698  2008/06/1 - 0.2.1 - refactor to create Chapter, Segments & MissingWords classes 
     699  2008/06/09 - 0.2.1 - refactor to create Chapter, Segments & MissingWords classes 
     700  2008/05/02 - 0.2 - convert to class; major refactor; renamed fullrun.pl to AudioBook.pm
     701  2008/01/31 - 0.1 - created 
    689702         
    690 =cut 
    691  
    692703=head1 AUTHOR 
    693704     
    694     Ken MacLean 
    695     contact@voxforge.org 
     705  Ken MacLean 
     706  contact@voxforge.org 
    696707       
    697708=head1 COPYRIGHT AND LICENSE        
  • Trunk/Scripts/Audio_scripts/AudioSegmentation/AudioBook/MissingWords/CommandLine.pm

    r2615 r2616  
    44=head1 NAME 
    55 
    6 MissingWords::CommandLine - Command line update of Pronunciation Dictionary
     6MissingWords::CommandLine - Command line validation and update of missing word pronunciations
    77 
    88You can listen to each audio segment corresponding to the missing word, and select from a Sequitur G2P-generated pronunciation, 
     
    1010acoustic model needs improvement), or create your own.   
    1111 
    12 You can then automatically update the pronunciations dictionary. 
     12You can then automatically update the pronunciation dictionary. 
    1313 
    1414=head1 Requirements 
     
    282282=head1 Change Log     
    283283 
    284 2008/08/12 - 0.1 - created 
     2842008/06/12 - 0.1 - created 
    285285         
    286286=cut