- Timestamp: 06/18/08 21:59:52
- Files:
Trunk/Scripts/Audio_scripts/AudioSegmentation/AudioBook.pm
--- AudioBook.pm (r2613)
+++ AudioBook.pm (r2616)
@@ -37,26 +37,26 @@
 It is executable from the command line and uses the following configuration options to help in segmenting speech:
 
-VoxForge Audio Segmentation Script Parameters
-=============================================
--a * audio file name (WAV format only)
--b notify if beam width for Forced Alignment exceeds a certain level (default = 250)
-(does not set HVite's beam width parameter)
--d pronunciation dictionary (default = AudioBook/input_files/VoxforgeDict)
--h show help
--i interactive validation of missing word pronunciations
--l LICENSE file (default = AudioBook/input_files/LICENCE)
--m Target maximum sentence length (default = 20 words)
--p Minimum pause for sentence break (default = 2000000 in units of 100ns)
--q log words with single quotes (default = yes)
--r README file (default = AudioBook/input_files/README)
--s Average sentence length (default = 15 words)
--t * text file name (containing transcriptions of speech in audio file)
--u username or name you want file stats collected by on VoxForge Metrics
-page: (http://www.voxforge.org/home/downloads/metrics)
--v validate segment audio files to prompt text using forced Aligment
--w validate missing word pronunciations to audio recordings
--x unique tar file suffix (max 3 characters - remainder is truncated)
--S run sanity test
--T create gzipped/tar file
+VoxForge Audio Segmentation Script Parameters
+=============================================
+-a * audio file name (WAV format only)
+-b notify if beam width for Forced Alignment exceeds a certain level (default = 250)
+(does not set HVite's beam width parameter)
+-d pronunciation dictionary (default = AudioBook/input_files/VoxforgeDict)
+-h show help
+-i interactive validation of missing word pronunciations
+-l LICENSE file (default = AudioBook/input_files/LICENCE)
+-m Target maximum sentence length (default = 20 words)
+-p Minimum pause for sentence break (default = 2000000 in units of 100ns)
+-q log words with single quotes (default = yes)
+-r README file (default = AudioBook/input_files/README)
+-s Average sentence length (default = 15 words)
+-t * text file name (containing transcriptions of speech in audio file)
+-u username or name you want file stats collected by on VoxForge Metrics
+page: (http://www.voxforge.org/home/downloads/metrics)
+-v validate segment audio files to prompt text using forced Aligment
+-w validate missing word pronunciations to audio recordings
+-x unique tar file suffix (max 3 characters - remainder is truncated)
+-S run sanity test
+-T create gzipped/tar file
 
 * minimum required for script to run
@@ -66,5 +66,5 @@
 
 
-=head1 Step 1 - First Pass Forced Alignment - Getting it to Run Completely Without Errors
+=head2 Step 1 - First Pass Forced Alignment - Getting it to Run Completely Without Errors
 
 Execute the script as follows using only the '-a' and '-t' parameters:
@@ -76,7 +76,7 @@
 of the sentence, and put an entry into the prompts file.
 
-=head2 NOTES
-
-=head3 Text Does not Match Audio
+=head3 NOTES
+
+=head4 Text Does not Match Audio
 
 The text file *must exactly* match the contents of the speech audio file.
@@ -89,5 +89,5 @@
 where the biggest transcription errors lie, and then modify the original text file to match the speech audio file.
 
-=head3 Dealing With Out-of-vocabulary Words
+=head4 Dealing With Out-of-vocabulary Words
 
 Forced Alignment is performed with HTK's HVite tool. HVite requires that each word in the text to be forced aligned have a pronunciation entry
@@ -96,10 +96,10 @@
 of reasonable lengths. Using this information, the script can create a prompt entries and corresponding audio segment.
 
-=head3 Segmenting Large Audio Files
+=head4 Segmenting Large Audio Files
 
 For larger files (i.e. greater than 30 minutes of audio), you *may* need to manually split the audio file into 30 minute segments, with
 corresponding text files.
 
-=head1 Step 2 - First Pass Forced Alignment - Runs OK, but there are Errors
+=head2 Step 2 - First Pass Forced Alignment - Runs OK, but there are Errors
 
 If the transcription errors are minor, then the first pass forced alignment usually completes successfully.
@@ -108,5 +108,5 @@
 Ensure that the prompt text matches the prompt audio.
 
-=head1 Step 3 - First Pass Forced Alignment - Verify the Segments
+=head2 Step 3 - First Pass Forced Alignment - Verify the Segments
 
 Get the script to perform a forced alignment on each of the segments, and display the worst 15 "average log likelihood per frame"
@@ -123,5 +123,5 @@
 the AudioBook program again with the verify switch on) until you can get a clean run.
 
-=head1 Step 4 - First Pass Forced Alignment - Adjusting Prompt Length
+=head2 Step 4 - First Pass Forced Alignment - Adjusting Prompt Length
 
 After you can get the First Pass Forced Alignment to run without errors, check the AudioBook.log log file (in the output_files directory) and
@@ -133,5 +133,5 @@
 Continue making adjustments until you can get reasonable prompt lengths.
 
-=head3 Note
+=head4 Note
 
 The worst case scenario is that you cannot segment your audio because it does not have any pauses that are long enough to use for a
@@ -139,5 +139,5 @@
 segments because the person spoke continuously for a long period of time. You will likely have to segment these longer prompts manually.
 
-=head1 Step 5 - Validate Suggested Out-of-Vocabulary Word Pronunciations
+=head2 Step 5 - Validate Suggested Out-of-Vocabulary Word Pronunciations
 
 The pronunciations generated by the Sequitor G2P scripts need to be manually reviewed before any new pronunciations are added to the
@@ -151,5 +151,5 @@
 recognition), so you can manually validate the final pronunciations.
 
-=head2 Note
+=head4 Note
 
 That this approach is only as good as the acoustic model you are using. The pronunciations still need to be validated against the Sequitor G2P recommended
@@ -158,10 +158,15 @@
 Please donate some speech to Voxforge to help improve our acoutic models.
 
-=head1 Step 6 - Update Pronunciation Lexicon
+=head2 Step 6 - Update Pronunciation Lexicon
 
 If you are submitting your segmented audio to VoxForge, please include your validated Out-of-Vocabulary word pronunciations
 with your submission as a separate file called: "OOV_pron.txt".
 
-Thanks.
+
+=head2 Step 7 - Missing word processing
+
+Use interactive command line tool (using the -i switch, after having run with -v and -w swtiches - this class requires the missingword.xml to
+work properly) to line to generate suggested pronunciations (phone lists) using Sequitor G2P and HVite forced alignment to generate most
+probable pronunciation.
 
 =head1 ALGORITHM
@@ -197,4 +202,10 @@
 The HTK toolkit needs to be in your path
 (see http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/download)
+
+=item 3 - Perl packages
+
+Term::ReadLine::Gnu
+
+
 
 =cut
@@ -684,14 +695,14 @@
 =head1 Change Log
 
-2008/06/09 - 0.2.1 - refacture to create Chapter, Segments & MissingWords classes
-2008/05/02 - 0.2 - convert to class; major refacture ; renamed fullrun.pl to AudioBook.pm
-2008/01/31 - 0.1 - created
+2008/06/12 - 0.1 - created CommandLine class to permit interactive validation of missing word pronunciations
+2008/06/1 - 0.2.1 - refacture to create Chapter, Segments & MissingWords classes
+2008/06/09 - 0.2.1 - refacture to create Chapter, Segments & MissingWords classes
+2008/05/02 - 0.2 - convert to class; major refacture ; renamed fullrun.pl to AudioBook.pm
+2008/01/31 - 0.1 - created
 
-=cut
-
 =head1 AUTHOR
 
-Ken MacLean
-contact@voxforge.org
+Ken MacLean
+contact@voxforge.org
 
 =head1 COPYRIGHT AND LICENSE