root/Tags/AudioSegmentation/AudioBook.pm

Revision 2604, 18.5 kB (checked in by kmaclean, 7 months ago)

AudioSegmentation scripts - snapshot

#!/usr/bin/perl

=head1 NAME

AudioBook - Convert a single transcribed audio file into audio segments of approximately 15 words each

=cut

package AudioBook;
use strict;
use diagnostics;
use Carp;
use Getopt::Std;
use File::Basename;
use File::Copy;
use lib '/home/kmaclean/VoxForge-dev/Main/Scripts/Audio_scripts/AudioSegmentation';
use AudioBook::Audio;
use AudioBook::Text;
use AudioBook::Dictionary;

our $VERSION = 0.2;

=head1 SYNOPSIS

 $ ./AudioBook -h                                 display help
 $ ./AudioBook -a speechfile.wav -t text.txt      minimal run configuration

=head1 DESCRIPTION

This command-line program segments a speech audio file into speech segments that average 15 words each.
It accepts the following options:

        -a      * audio file name (WAV format only)
        -b      notify if beam width for forced alignment exceeds a certain level (default = 250)
                (does not set HVite's beam width parameter)
        -d      pronunciation dictionary (default = AudioBook/input_files/VoxForgeDict)
        -h      show help
        -l      LICENSE file (default = AudioBook/input_files/LICENSE)
        -m      target maximum sentence length (default = 20 words)
        -p      minimum pause for sentence break (default = 2000000 in units of 100ns)
        -q      log words with single quotes: yes or no (default = yes)
        -r      README file (default = AudioBook/input_files/README)
        -s      average sentence length (default = 15 words)
        -t      * text file name (containing transcriptions of speech in audio file)
        -u      username or name you want file stats collected by on VoxForge Metrics
                page:   (http://www.voxforge.org/home/downloads/metrics)
        -v      validate segment audio files against prompt text using forced alignment
        -w      validate missing word pronunciations against audio recordings
        -x      unique tar file suffix (max 3 characters - remainder is truncated)
        -S      run sanity test
        -T      create gzipped tar file

                 * required for script to run
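
To make the -p units concrete: the value is expressed in 100ns units, so the default of 2000000 corresponds to 0.2 seconds. The following is a minimal sketch of the conversion; the constant and helper names are hypothetical and are not part of AudioBook.

```perl
use strict;
use warnings;

# Times are expressed in units of 100 ns, so 10_000_000 units make one second.
use constant HTK_UNITS_PER_SECOND => 10_000_000;

# Convert a pause length in 100 ns units (as passed to -p) to seconds.
sub htk_units_to_seconds {
    my ($units) = @_;
    return $units / HTK_UNITS_PER_SECOND;
}

# Convert seconds to 100 ns units, e.g. to choose a value for -p.
sub seconds_to_htk_units {
    my ($seconds) = @_;
    return int($seconds * HTK_UNITS_PER_SECOND + 0.5);
}

printf "default -p of 2000000 = %.1f s\n", htk_units_to_seconds(2_000_000);
printf "a 0.35 s pause = %d units\n", seconds_to_htk_units(0.35);
```

A speaker who pauses only briefly between sentences may need a smaller -p value; a slow, deliberate reader may need a larger one.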

=head1 NOTES

=head3 Text Does Not Match Audio

If the contents of the text file do not *exactly* match the contents of the speech audio file, segmentation necessarily becomes
a manual, iterative process.

If the text diverges substantially from the speech audio, you will have to listen to the speech audio to determine
where the biggest transcription errors lie, and then modify the original text file to match the speech audio file.

If the transcription errors are minor, the first-pass forced alignment usually completes successfully.  However, if you see "No tokens survived to final node of network at beam" errors in the
HVite log (located in interim_files/logs), the "-v" verify switch might help determine where transcription problems
exist.

The verify switch performs a forced alignment on the individual segments generated by the first-pass forced alignment.  Low scores
(i.e. the lowest average log likelihood per frame) indicate that the transcription text might not match the corresponding audio
file.  Look at the segment text and listen to the corresponding audio file to determine whether they match.  If they do not, fix the
text in your original transcription file and run the AudioBook program again with the verify switch on.  Repeat
until you get a clean run.
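
As an illustration of the review step described above, the sketch below ranks segments so that the lowest scores are checked first. The segment names and score values are invented, and the helper is hypothetical; it does not read AudioBook's or HVite's actual output.

```perl
use strict;
use warnings;

# Hypothetical per-segment scores: average log likelihood per frame.
# The segment names and values are invented for illustration only.
my %score_for = (
    seg0001 => -62.4,
    seg0002 => -91.7,
    seg0003 => -58.9,
    seg0004 => -75.2,
);

# Return the $count lowest-scoring segment names, worst first.
sub worst_segments {
    my ($scores, $count) = @_;
    my @ranked = sort { $scores->{$a} <=> $scores->{$b} } keys %$scores;
    $count = @ranked if $count > @ranked;
    return @ranked[0 .. $count - 1];
}

print "review first: $_ ($score_for{$_})\n" for worst_segments(\%score_for, 2);
```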

=head3 Segmenting Large Audio Files

For larger files (i.e. more than 30 minutes of audio), you *may* need to manually split the audio file into 30-minute pieces.

=head3 Automatically Adding Out-of-Vocabulary Words to Pronunciation Dictionary

The pronunciations generated by the Sequitur G2P scripts need to be manually reviewed before any new pronunciations are added to the
pronunciation dictionary.  Make sure you review the pronunciations before committing these changes to SVN.

=head1 REQUIREMENTS

=over

=item 1 - Sequitur G2P trainable Grapheme-to-Phoneme converter (GPL v2; requires Python to be installed)

        http://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html

=item 2 - HTK Hidden Markov Model Toolkit (note: the source is "open", but there are distribution restrictions)

        http://htk.eng.cam.ac.uk/

        The HTK tools need to be in your path (see http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/download)

=back

=head1 ALGORITHM

This program tries to segment the speech audio file into 15-word sentences.  However, if the pause following the 15th word (counted from the current
sentence start position) is too short, the algorithm looks at the previous word (i.e. word 14) to see if it is followed by a pause of suitable duration.
If not, it then looks at the following word (i.e. word 16), and so on until a pause of suitable
duration is found, increasing the number of words to look behind and ahead each time.

The default minimum pause duration is 2000000 in units of 100ns (0.2 seconds).  This can be changed (using the "-p" switch) if the speech audio file does not segment well
enough with this default.
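
The search order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in AudioBook::Audio: the function name, the array of per-word pauses, and the way the maximum sentence length limits the search are all hypothetical.

```perl
use strict;
use warnings;

# Find the word index at which to end a sentence.
#   $pauses  - array ref; $pauses->[$i] is the pause after word $i (100 ns units)
#   $start   - index of the first word of the current sentence
#   $target  - desired sentence length in words (cf. -s)
#   $max     - longest sentence allowed (cf. -m); how AudioBook applies it is assumed
#   $min_gap - shortest pause accepted as a break (cf. -p)
# Returns the index of the last word of the sentence, or undef if none qualifies.
sub find_break {
    my ($pauses, $start, $target, $max, $min_gap) = @_;
    my $ideal = $start + $target - 1;    # e.g. the 15th word
    my $last  = $#{$pauses};

    for my $offset (0 .. $target) {
        # Look behind first (word 14), then ahead (word 16), widening each time.
        for my $candidate ($offset ? ($ideal - $offset, $ideal + $offset) : ($ideal)) {
            next if $candidate < $start || $candidate > $last;
            next if $candidate - $start + 1 > $max;
            return $candidate if $pauses->[$candidate] >= $min_gap;
        }
    }
    return;
}

# 30 words with short pauses, plus long pauses after words 13 and 16.
my @pauses = (500_000) x 30;
@pauses[12, 15] = (2_500_000, 3_000_000);
my $end = find_break(\@pauses, 0, 15, 20, 2_000_000);
print "sentence ends after word ", $end + 1, "\n";
```

Checking the nearer candidates first keeps each segment as close to the target length as the pauses allow.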

=cut

####################################################################
### Class Variables
####################################################################
# Getopt::Std sets these package variables; they must be declared because of "use strict".
our($opt_a,$opt_b,$opt_d,$opt_h,$opt_l,$opt_m,$opt_p,$opt_r,$opt_s,$opt_t,$opt_x,$opt_q,$opt_S,$opt_T,$opt_u,$opt_v,$opt_w);
my %self;
$self{'debug'} = 0;
$self{'g2p_model'} = "AudioBook/input_files/g2p/models/model-5";
$self{'htk_files'} = "AudioBook/input_files/htk";
$self{'log'} = "AudioBook/output_files/AudioBook_Log";
my $self=\%self;
bless($self,"AudioBook");

my $default_average_sentence_length = 15;
my $default_max_sentence_length = 20;
my $default_min_pause_for_sentence_break = 2000000;
my $command;

####################################################################
### Main
####################################################################
$self->cleanupFiles();
$self->getOptions();
$self->process();
print "completed!\n";

####################################################################
### Methods
####################################################################

=head1 METHODS (not user accessible)

=head2 process

Segment the user-designated speech audio file (-a) using the supplied text file (-t).

=cut

sub process {
    my ($self)= @_;
    my $debug = $self->{'debug'};
    my $audiofile = $self->{"audiofile"};
    my $textfile = $self->{"textfile"};
    my $username = $self->{"username"};
    my $tarSuffix = $self->{"tarSuffix"};
    my $pronDict = $self->{"pronDict"};
    my $htk_files = $self->{'htk_files'};
    my $log = $self->{'log'};
    my $dict = "AudioBook/interim_files/dict";
    my $originalDict = "AudioBook/interim_files/originalDict";
    my $altDict = "AudioBook/interim_files/altDict";
    my $prompts = "AudioBook/interim_files/prompts";

    # work on a copy of the pronunciation dictionary
    my $tempPronDict = "AudioBook/interim_files/pronDict";
    copy($pronDict,$tempPronDict);

    my $textContents = AudioBook::Text->new($self,$textfile);
    $textContents->createWLISTFile("AudioBook/interim_files/wlist");

    my $dictionary = AudioBook::Dictionary->new($self);
    my $missingwordfound = $dictionary->findOutOfVocabularyWords($pronDict,"AudioBook/interim_files/MissingWords");
    if ($missingwordfound) {
        $dictionary->getRecommendedPronunciations("AudioBook/interim_files/MissingWords_out"); # uses g2p
        $dictionary->updatePronDict($tempPronDict);
        # save dict before suggested pronunciations are added - these pronunciations are only needed for segmentation of audio
        copy($dict,$originalDict);
        # need to update dict with missing words
        # can't seem to change default HDMan log file with "-l" parameter
        $command = "HDMan -A -D -T 1 -g $htk_files/global.ded -m -w AudioBook/interim_files/wlist -i -l AudioBook/interim_files/dlog $dict $tempPronDict";
        system($command) == 0 or confess "AudioBook: $command failed: $?";
        $command = "mv AudioBook/interim_files/dlog AudioBook/interim_files/logs/dlog2";
        print "cmd:$command\n" if $debug;
        system($command);
        # no longer required: $command = ("cp AudioBook/interim_files/MissingWords_out AudioBook/output_files/MissingWords"); print "cmd:$command\n" if $debug; system($command);
    } else {
        open(my $log_fh, '>>', $log) or confess "cannot open $log: $!";
        print $log_fh "\nMissing Words that need to be added to Pronunciation Dictionary, with suggested pronunciations::\n";
        print $log_fh "------------------------------------------------\n";
        print $log_fh "no missing words\n";
        close $log_fh;
    }
    # dict may get manually updated; dict only includes suggested prompts, therefore do not copy to output - suggested pronunciations are in the log regardless ...
    #... $command = ("cp AudioBook/interim_files/dict AudioBook/output_files"); print "cmd:$command\n" if $debug; system($command);

    my $audio = AudioBook::Audio->new($self);
    $audio->segment($audiofile,$textContents);
    if ($self->{"verify_segments"}) {
        $audio->verifySegments;
    }
    if ($missingwordfound) {
        if ($self->{"verify_out_of_vocabulary_pronunciations"}) {
            $dictionary->getAlternatePronunciations("AudioBook/interim_files/MissingWords_alt",15); # uses Sequitur g2p to get top N pronunciation variations
            $dictionary->createAltDict($originalDict,$altDict);     # merge & sort missing_words_alt and originalDict into altDict
            $dictionary->validateAlternatePronunciations($originalDict,$altDict,$prompts);
        }
        $dictionary->updatePronDict($pronDict);
    }

    if (defined($tarSuffix)){
        _createTarFile($self);
    }
}

=head2 cleanupFiles

Removes any old files in the AudioBook/interim_files/ and AudioBook/output_files/ directories (including their logs, missingWordsFolder and wav subdirectories) prior to processing.
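
A minimal sketch of this kind of cleanup, assuming glob patterns like the ones used by this method. The helper name is hypothetical, and the demonstration runs against a temporary directory rather than real AudioBook output.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Delete the plain files matched by each glob pattern; directories are skipped.
# Returns the number of files removed.
sub remove_files {
    my @patterns = @_;
    my $removed = 0;
    for my $pattern (@patterns) {
        my @files = grep { -f $_ } glob($pattern);
        $removed += unlink @files;
    }
    return $removed;
}

# Demonstrate on a throwaway directory rather than on real AudioBook output.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/wav" or die "mkdir failed: $!";
for my $name ("$dir/prompts", "$dir/wav/seg1.wav", "$dir/wav/seg2.wav") {
    open(my $fh, '>', $name) or die "cannot create $name: $!";
    close $fh;
}

my $count = remove_files("$dir/wav/*", "$dir/*");
print "removed $count files\n";
```

Filtering with -f leaves the subdirectories themselves in place, so later steps can still write into them.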

=cut

sub cleanupFiles {
    my ($self)= @_;
    # glob in list context returns every match; unlink returns 0 when nothing matches
    foreach my $pattern (
        'AudioBook/interim_files/*',
        'AudioBook/interim_files/logs/*',
        'AudioBook/interim_files/missingWordsFolder/*',
        'AudioBook/interim_files/wav/*',
        'AudioBook/output_files/wav/*',
        'AudioBook/output_files/*',
    ) {
        unlink grep { !-d $_ } glob($pattern);    # leave the subdirectories in place
    }
}

=head2 _createTarFile

Creates a gzipped tar file from the files contained in AudioBook/output_files.

=over

=item * -T      switch to turn on this functionality

=item * -u      username used in the name of the tar file

=item * -x      suffix used in the name of the tar file

=back
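
The tar file is named from the username, the current date and the suffix. The sketch below builds such a name with POSIX strftime, which avoids the zero-based month returned by localtime; the helper name is hypothetical and is not part of AudioBook.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build "<username>-<YYYYMMDD>-<suffix>.tgz" for the given epoch time.
# strftime's %m is already 1-based, unlike the month field of localtime().
sub tar_file_name {
    my ($username, $suffix, $epoch) = @_;
    my $date = strftime("%Y%m%d", localtime($epoch));
    return "$username-$date-$suffix.tgz";
}

print tar_file_name("anonymous", "abc", time), "\n";
```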

=cut

sub _createTarFile { # private
    my ($self)= @_;
    my $debug = $self->{'debug'};
    my $username = $self->{"username"};
    my $tarSuffix = $self->{"tarSuffix"};
    my $readme = $self->{"README"};
    my $license = $self->{"LICENSE"};

    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    $year += 1900;
    $mon = sprintf("%02d", $mon + 1); # localtime months are 0-based
    $mday = sprintf("%02d", $mday);
    my $tarFile = "$username-$year$mon$mday-$tarSuffix.tgz";
    print "creating gzipped tar file:$tarFile \n";
    if (defined($readme)) {
        copy("$readme","AudioBook/output_files/README");
    } else {
        print "Warning: no README file to copy\n";
    }
    if (defined($license)) {
        copy("$license","AudioBook/output_files/LICENSE");
    } else {
        print "Warning: no LICENSE file to copy\n";
    }
    copy("AudioBook/interim_files/prompts","AudioBook/output_files/prompts");
    $command = "cp AudioBook/interim_files/wav/* AudioBook/output_files/wav/";
    print "cmd:$command\n" if $debug;
    system($command);
    # list archive members only when debugging; --exclude is placed before the path it applies to
    my $tarFlags = $debug ? "-zcvf" : "-zcf";
    $command = "tar $tarFlags $tarFile --exclude \".svn\" AudioBook/output_files";
    print "cmd:$command\n" if $debug;
    system($command);
    print "please submit your tar file to: www.voxforge.org\n";
}

sub _random_characters {
    my ($length) = @_;
    my @chars=('a'..'z');
    my $randomString = '';
    foreach (1..$length){
        $randomString.=$chars[rand @chars];
    }
    return $randomString;
}

=head2 getOptions

Get the user-submitted options ('a:b:d:hl:m:p:r:s:t:u:x:q:vwST').  A letter followed by a colon takes an argument; the other letters are boolean switches.
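
Getopt::Std can also store the parsed options in a hash instead of the $opt_* package variables. The sketch below shows that form for a subset of the switches; the parse_args wrapper is hypothetical and only illustrates how the option string is interpreted.

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse a command line into a hash instead of the $opt_* package variables.
# Only a subset of AudioBook's switches is shown.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;          # getopts() reads and consumes @ARGV
    my %opts;
    getopts('a:t:p:vT', \%opts) or die "invalid options\n";
    return \%opts;
}

my $opts = parse_args(qw(-a speechfile.wav -t text.txt -v));
print "audio: $opts->{a}, text: $opts->{t}, verify: ", ($opts->{v} ? "on" : "off"), "\n";
```

Keeping the options in one hash avoids declaring a package variable for every switch.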

=cut

sub getOptions {
    my ($self)= @_;
    my $debug = $self->{'debug'};
    getopts('a:b:d:hl:m:p:r:s:t:u:x:q:vwST');    #  sets $opt_* as a side effect.
    if ($opt_S) { # Sanity test switch
        $self->{"audiofile"}="AudioBook/test/audio.wav";
        #$self->{"textfile"}="AudioBook/test/text-simple.txt";
        $self->{"textfile"}="AudioBook/test/text-original.txt";
        $command = "cp AudioBook/input_files/VoxForgeDict AudioBook/interim_files/VoxForgeDict";
        print "cmd:$command\n";
        system($command);
        $self->{"pronDict"}="AudioBook/interim_files/VoxForgeDict";
        $self->{"tarSuffix"}=_random_characters(3);
        $self->{"username"}="test";
        $self->{"average_sentence_length"}= $default_average_sentence_length;
        $self->{"max_sentence_length"}= $default_max_sentence_length;
        $self->{"min_pause_for_sentence_break"}=$default_min_pause_for_sentence_break;

        $self->{"log_single_quotes"}= 1;
        $self->{"verify_segments"}=1;
        $self->{"verify_out_of_vocabulary_pronunciations"}=1;
        $self->{"README"}="AudioBook/input_files/README";
        $self->{"LICENSE"}="AudioBook/input_files/LICENSE";
    } elsif ($opt_a and $opt_t) {
        if (-r $opt_a) {
            $self->{"audiofile"}=$opt_a;
        } else {
            die "can't open -a $opt_a\n";
        }
        if (-r $opt_t) {
            $self->{"textfile"}=$opt_t;
        } else {
            die "can't open -t $opt_t\n";
        }
        if (defined($opt_d)) {
            if (-r $opt_d) {
                $self->{"pronDict"}=$opt_d;
            } else {
                die "can't open -d $opt_d\n";
            }
        } else {
            $self->{"pronDict"}="AudioBook/input_files/VoxForgeDict";
        }
        ### Audio Processing
        if ($opt_s) {
            $self->{"average_sentence_length"}=$opt_s;
        } else {
            $self->{"average_sentence_length"}= $default_average_sentence_length;
        }
        if ($opt_m) {
            $self->{"max_sentence_length"}=$opt_m;
        } else {
            $self->{"max_sentence_length"}= $default_max_sentence_length;
        }
        if ($opt_p) {
            $self->{"min_pause_for_sentence_break"}=$opt_p;
        } else {
            $self->{"min_pause_for_sentence_break"}= $default_min_pause_for_sentence_break;
        }
        if ($opt_q) {
            # -q takes "n" or "no" (any case) to turn logging off; anything else leaves it on
            if ($opt_q =~ /^(?:n|no)$/i){
                $self->{"log_single_quotes"}= 0;
            } else {
                $self->{"log_single_quotes"}= 1;
            }
        } else {
            $self->{"log_single_quotes"}= 1;
        }
        if ($opt_b) {
            $self->{"beam_width"}=$opt_b;
        } else {
            $self->{"beam_width"}=250;
        }
        if ($opt_v) {
            $self->{"verify_segments"}=1;
        } else {
            $self->{"verify_segments"}=0;
        }
        if ($opt_w) {
            $self->{"verify_out_of_vocabulary_pronunciations"}=1;
        } else {
            $self->{"verify_out_of_vocabulary_pronunciations"}=0;
        }
        ### Tar file processing
        if (defined($opt_T)) {
            if ($opt_x) {
                $self->{"tarSuffix"}=substr($opt_x,0,3); # only use 1st 3 characters.
            } else {
                $self->{"tarSuffix"}=_random_characters(3);
            }
            if ($opt_u) {
                $self->{"username"}=$opt_u;
            } else {
                $self->{"username"}="anonymous";
            }
            if ($opt_r) {
                if (-r $opt_r) {
                    $self->{"README"}=$opt_r;
                } else {
                    die "can't open -r $opt_r\n";
                }
            } else {
                $self->{"README"}="AudioBook/input_files/README";
            }
            if ($opt_l) {
                if (-r $opt_l) {
                    $self->{"LICENSE"}=$opt_l;
                } else {
                    die "can't open -l $opt_l\n";
                }
            } else {
                $self->{"LICENSE"}="AudioBook/input_files/LICENSE";
            }
        }
    } elsif ($opt_h) {
        print "\nVoxForge Audio Segmentation Script Parameters\n";
        print   "=============================================\n";
        print "-a\t* audio file name (WAV format only)\n";
        print "-b\tnotify if beam width for forced alignment exceeds a certain level (default = 250)\n";
        print "\t(does not set HVite's beam width parameter)\n";
        print "-d\tpronunciation dictionary (default = AudioBook/input_files/VoxForgeDict)\n";
        print "-h\tshow help\n";
        print "-l\tLICENSE file (default = AudioBook/input_files/LICENSE)\n";
        print "-m\ttarget maximum sentence length (default = $default_max_sentence_length words)\n";
        print "-p\tminimum pause for sentence break (default = $default_min_pause_for_sentence_break in units of 100ns)\n";
        print "-q\tlog words with single quotes: yes or no (default = yes)\n";
        print "-r\tREADME file (default = AudioBook/input_files/README)\n";
        print "-s\taverage sentence length (default = $default_average_sentence_length words)\n";
        print "-t\t* text file name (containing transcriptions of speech in audio file)\n";

        print "-u\tusername or name you want file stats collected by on VoxForge Metrics \n";
        print "\tpage:\t(http://www.voxforge.org/home/downloads/metrics)\n";

        print "-v\tvalidate segment audio files against prompt text using forced alignment\n";
        print "-w\tvalidate missing word pronunciations against audio recordings\n";
        print "-x\tunique tar file suffix (max 3 characters - remainder is truncated)\n";
        print "-S\trun sanity test\n";
        print "-T\tcreate gzipped tar file\n";
        print "\n\t* required for script to run\n";
        print "\n";
        print "--\n";
        print "Free Speech... Recognition\n";
        print "http://www.voxforge.org\n\n";
        exit;
    } else {
        print "\nVoxForge Audio Segmentation Script\n";
        print   "==================================\n";
        print "Parms -a and -t need to be defined. Use -h parameter for more information\n\n";
        print "--\n";
        print "Free Speech... Recognition\n";
        print "http://www.voxforge.org\n\n";
        exit;
    }
    print "audiofile:" . $self->{"audiofile"}. "\n";
    print "textfile:" . $self->{"textfile"}. "\n";
    print "pronDict:" . $self->{"pronDict"} . "\n\n";
}

=head2 Getters - Public (used by methods in other sub-classes)

=over

=item * getAverage_sentence_length()

=cut

sub getAverage_sentence_length {
    my $self = shift;
    return $self->{"average_sentence_length"};
}

=item * getMax_sentence_length()

=cut

sub getMax_sentence_length {
    my $self = shift;
    return $self->{"max_sentence_length"};
}

=item * getMin_pause_for_sentence_break()

=back

=cut

sub getMin_pause_for_sentence_break {
    my $self = shift;
    return $self->{"min_pause_for_sentence_break"};
}

=head1 Change Log

        2008/05/02 - 0.2 - convert to class; major refactoring; renamed fullrun.pl to AudioBook.pm
        2008/01/31 - 0.1 - created

=cut

=head1 AUTHOR

    Ken MacLean
    contact@voxforge.org

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2008 Ken MacLean

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.