| 2 | | #################################################################### |
|---|
| 3 | | ### |
|---|
| 4 | | ### script name : Text.pm |
|---|
| 5 | | ### version: 0.2 |
|---|
| 6 | | ### created by: Ken MacLean |
|---|
| 7 | | ### mail: contact@voxforge.org |
|---|
| 8 | | ### Date: 2007.3.20 |
|---|
| 9 | | ### |
|---|
| 10 | | ### Copyright (C) 2007 Ken MacLean |
|---|
| 11 | | ### |
|---|
| 12 | | ### This program is free software; you can redistribute it and/or |
|---|
| 13 | | ### modify it under the terms of the GNU General Public License |
|---|
| 14 | | ### as published by the Free Software Foundation; either version 2 |
|---|
| 15 | | ### of the License, or (at your option) any later version. |
|---|
| 16 | | ### |
|---|
| 17 | | ### This program is distributed in the hope that it will be useful, |
|---|
| 18 | | ### but WITHOUT ANY WARRANTY; without even the implied warranty of |
|---|
| 19 | | ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
|---|
| 20 | | ### GNU General Public License for more details. |
|---|
| 21 | | ### |
|---|
| 22 | | ### Change History: |
|---|
| 23 | | ### 2008/05/02 - 0.2 - Convert to class; major refactoring; renamed from etext2wlist.pl to Text.pm |
|---|
| 24 | | #################################################################### |
|---|
| | 2 | $VERSION = 0.2; |
|---|
| | 3 | |
|---|
| | 4 | =head1 NAME |
|---|
| | 5 | |
|---|
| | 6 | AudioBook::Text - Text transcription processing |
|---|
| | 7 | |
|---|
| | 8 | =cut |
|---|
| | 9 | |
|---|
| 45 | | #################################################################### |
|---|
| 46 | | ### Class Methods |
|---|
| 47 | | #################################################################### |
|---|
| 48 | | ### Cleans up eText |
|---|
| | 38 | |
|---|
| | 39 | =head2 _clean |
|---|
| | 40 | |
|---|
| | 41 | Called by the "new" constructor. Replaces line breaks with spaces, converts the text to uppercase, and removes many (not all) non-alphanumeric characters: |
|---|
| | 42 | |
|---|
| | 43 | $line =~ s/\n/ /g; # replace all line feeds with spaces |
|---|
| | 44 | $line =~ s/\r/ /g; # replace all carriage returns with spaces |
|---|
| | 45 | $line =~ tr/a-z/A-Z/; # change to uppercase |
|---|
| | 46 | $line =~ s/\.\"//g; # period followed by double quote |
|---|
| | 47 | $line =~ s/\,\"//g; # comma followed by double quote |
|---|
| | 48 | $line =~ s/\?\"//g; # question mark followed by double quote |
|---|
| | 49 | $line =~ s/\!\"//g; # exclamation mark followed by double quote |
|---|
| | 50 | $line =~ s/\.\'//g; # period followed by single quote |
|---|
| | 51 | $line =~ s/\,\'//g; # comma followed by single quote |
|---|
| | 52 | $line =~ s/\?\'//g; # question mark followed by single quote |
|---|
| | 53 | $line =~ s/\!\'//g; # exclamation mark followed by single quote |
|---|
| | 54 | $line =~ s/\"//g; # remove all double quotes |
|---|
| | 55 | $line =~ s/,//g; # remove commas |
|---|
| | 56 | $line =~ s/://g; # remove colon |
|---|
| | 57 | $line =~ s/--/ /g; # double dash |
|---|
| | 58 | $line =~ s/ - / /g; # dash punctuation |
|---|
| | 59 | $line =~ s/ -/ /g; # dash punctuation |
|---|
| | 60 | $line =~ s/-/ /g; # dash - compound word; replace with space, so they can be looked up in pronunciation dictionary |
|---|
| | 61 | $line =~ s/;//g; # semi-colon |
|---|
| | 62 | $line =~ s/!//g; # exclamation mark |
|---|
| | 63 | $line =~ s/\?//g; # question mark |
|---|
| | 64 | $line =~ s/  / /g; # cleanup double spaces |
|---|
| | 65 | $line =~ s/=//g; # remove equal sign |
|---|
| | 66 | $line =~ s/\(//g; # remove parenthesis |
|---|
| | 67 | $line =~ s/\)//g; # remove parenthesis |
|---|
| | 68 | $line =~ s/_//g; # remove underscore |
|---|
| | 69 | $line =~ s/\[//g; # remove left bracket |
|---|
| | 70 | $line =~ s/\]//g; # remove right bracket |
|---|
| | 71 | $line =~ s/\*//g; # remove star |
|---|
| | 72 | $line =~ s/&/AND/g; # replace ampersand with the word AND |
|---|
| | 73 | |
|---|
| | 74 | =cut |
|---|
| | 75 | |
|---|
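The substitutions documented for `_clean` can be condensed into a few character classes. The following is a minimal sketch, not the module's actual code: `clean_text` is a hypothetical stand-alone function (the real `_clean` is called by the constructor), and it collapses every run of spaces, which is stricter than the single-pass double-space cleanup in the listing.

```perl
use strict;
use warnings;

# Hypothetical helper that approximates the _clean substitutions.
sub clean_text {
    my ($line) = @_;
    $line =~ s/[\r\n]/ /g;              # line feeds and carriage returns become spaces
    $line =~ tr/a-z/A-Z/;               # uppercase (ASCII letters only, as in the listing)
    $line =~ s/[.,?!]["']//g;           # sentence punctuation directly followed by a quote
    $line =~ s/[",:;!?=()_\[\]*]//g;    # remaining punctuation that is dropped outright
    $line =~ s/-+/ /g;                  # dashes become spaces so compound words split
    $line =~ s/&/AND/g;                 # spell out the ampersand
    $line =~ s/ {2,}/ /g;               # assumption: collapse any run of spaces
    return $line;
}

print clean_text(qq{"Well-known," she said -- "isn't it?"}), "\n";
```

Apostrophes inside words and standalone periods are left in place, matching the "many (not all)" wording of the documentation.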
| | 510 | |
|---|
| | 511 | =head1 Change Log |
|---|
| | 512 | |
|---|
| | 513 | 2008/05/02 - 0.2 - Convert to class; major refactoring; renamed from etext2wlist.pl to Text.pm |
|---|
| | 514 | |
|---|
| | 515 | =head1 AUTHOR |
|---|
| | 516 | |
|---|
| | 517 | Ken MacLean |
|---|
| | 518 | contact@voxforge.org |
|---|
| | 519 | |
|---|
| | 520 | =head1 COPYRIGHT AND LICENSE |
|---|
| | 521 | |
|---|
| | 522 | Copyright (C) 2007 Ken MacLean |
|---|
| | 523 | |
|---|
| | 524 | This program is free software; you can redistribute it and/or |
|---|
| | 525 | modify it under the terms of the GNU General Public License |
|---|
| | 526 | as published by the Free Software Foundation; either version 2 |
|---|
| | 527 | of the License, or (at your option) any later version. |
|---|
| | 528 | |
|---|
| | 529 | This program is distributed in the hope that it will be useful, |
|---|
| | 530 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
|---|
| | 531 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
|---|
| | 532 | GNU General Public License for more details. |
|---|
| | 533 | |
|---|
| | 534 | =cut |
|---|
| | 535 | |
|---|