Ticket #473 (closed defect: fixed)

Opened 11 years ago

Last modified 11 years ago

Submission tar files without submission name as root directory

Reported by: kmaclean Owned by: kmaclean
Priority: major Milestone: WebSite 0.2.1
Component: Audio Version: Website 0.2
Keywords: Cc:

Description

From this post from nsh: Packing of the audio files

Hi Ken

Recently I tried to start retraining the sphinx models with the recent improvements that were made. The hardest step in training is actually the preparation of the data: putting it into the right folders and organizing it in the proper format.

The first issue I've met is the following: some archives available for download have the submission name as the top folder:

Aaron-20080318-liy

Aaron-20080318-liy/etc

Aaron-20080318-liy/wav

Some others have etc and wav as top folders directly, like

AdrianMcNear-20091016-psv

This creates some trouble for scripts that is better avoided. What's the best way to fix that, should we just modify the script and repackage everything?

Change History

comment:1 Changed 11 years ago by kmaclean

Hi nsh,

What's the best way to fix that, should we just modify the script and repackage everything?

The problem originates with the move from a set of scripts containing a hideous combination of Perl and make commands (to execute Linux Gzip/Tar commands), to a Perl script that only uses the Perl Tar/GZip/Zip packages for creating tar files (revision 2691) on April 19, 2009.

Therefore anything before April 19, 2009 has the submission name as a root directory (and etc & wav as subdirectories), whereas anything on or after that date has etc and wav as root directories.

I am assuming that this makes things a bit more complicated if you want to extract a bunch of files all at once in the same directory, so the preferred approach would be to have the submission name as the root directory for all submissions...
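The two layouts described above can be told apart by listing an archive's members. The following is only a minimal sketch in Python, not part of the VoxForge scripts (which are written in Perl); the function name `has_submission_root` is hypothetical, and the sketch assumes the archive is an ordinary (optionally compressed) tar file.

```python
import tarfile


def has_submission_root(tar_path, submission_name):
    """Return True if every member of the archive sits under submission_name/."""
    with tarfile.open(tar_path, "r:*") as archive:
        names = [member.name for member in archive.getmembers()]
    # Normalise a leading "./" so "./etc/x" and "etc/x" are treated the same.
    cleaned = [name[2:] if name.startswith("./") else name for name in names]
    cleaned = [name for name in cleaned if name not in ("", ".")]
    # An empty archive is reported as not having the root directory.
    return bool(cleaned) and all(
        name == submission_name or name.startswith(submission_name + "/")
        for name in cleaned
    )
```

Under these assumptions, an archive packed in the older layout would return True for its own submission name, and one with etc and wav at the top level would return False.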

Should not be a big change, but the uploading could take a long time (a few days to a week at a throttled bandwidth so as not to kill response time on the VoxForge webserver, and I'll have to watch my upload bandwidth limits... might have to split it across Jan/Feb).

Please let me know if this makes sense,

thanks,

Ken

comment:2 Changed 11 years ago by kmaclean

As a quick work-around to this issue: use Nautilus to extract the tarfiles that don't have a root directory... Nautilus will create one for you. You can do a multiple select and extract (right-click) for multiple tarfiles.

I can't figure out a way to do this from the command line using the tar command (i.e. something like "tar -zxf"), so I will fix the ones on the repository server using a script (so they will be consistent), and rsync them with the acoustic model creation server some other time.
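For illustration, a repackaging step of this kind could look roughly like the sketch below. It is not the fixTarfileDirectory.pl script referenced in this ticket, and it is written in Python rather than Perl; the function name `add_root_directory` and its parameters are hypothetical. It copies a tar file member by member and prefixes each path with the submission name, without extracting anything to disk. Hard links inside the archive are not handled.

```python
import tarfile


def add_root_directory(src_path, dst_path, root_name):
    """Copy src_path to dst_path, placing every member under root_name/."""
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if name in ("", "."):
                continue  # skip a bare "." directory entry
            # Read the file data before renaming the member.
            fileobj = src.extractfile(member) if member.isfile() else None
            # Leave members alone if they already sit under the root.
            if not (name == root_name or name.startswith(root_name + "/")):
                name = root_name + "/" + name
            member.name = name
            # Drop any stored pax "path" record so the new name is the one written.
            member.pax_headers.pop("path", None)
            dst.addfile(member, fileobj)
```

A call such as `add_root_directory("in.tgz", "out.tgz", "AdrianMcNear-20091016-psv")` (file names assumed here) would write a new archive whose members all start with the submission name, leaving the original file untouched.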

Ken

comment:3 Changed 11 years ago by kmaclean

  • Status changed from new to closed
  • Resolution set to fixed

Script to fix this problem: fixTarfileDirectory.pl

Logs of run on prod: Changeset 2819

comment:4 Changed 11 years ago by kmaclean
