voxforge.org
VoxForge Dev

Bittorrent TV shows

Possible Audio Sources

(A list of possible sources of spoken audio files that might be used to create GPL acoustic models.)

  • Collaborative subtitling of videos

== Off the wall audio sources ... ==

  • reCAPTCHA - use a similar approach, but get users to transcribe speech audio as the CAPTCHA mechanism. They were talking about using this approach to transcribe radio programs.
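
A rough sketch of how the reCAPTCHA-style idea above might work for audio: each user hears one clip whose transcript is already known (the control) and one untranscribed clip. The answer for the untranscribed clip is only recorded when the control answer is correct, and a transcript is accepted once enough users agree. All names and thresholds here (`record_submission`, `consensus`, `min_votes`, `min_share`) are hypothetical and chosen only for illustration; this is not an existing VoxForge or reCAPTCHA API.

```python
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace so that
    near-identical answers compare equal."""
    kept = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def check_captcha(control_answer, known_transcript):
    """The user passes if their answer for the control clip matches
    the transcript we already have for it."""
    return normalize(control_answer) == normalize(known_transcript)


def record_submission(store, clip_id, control_answer, known_transcript,
                      unknown_answer):
    """Keep the answer for the untranscribed clip only when the user
    passed the control clip. `store` maps clip_id -> list of answers."""
    if not check_captcha(control_answer, known_transcript):
        return False
    store.setdefault(clip_id, []).append(unknown_answer)
    return True


def consensus(candidates, min_votes=3, min_share=0.6):
    """Return the agreed transcript, or None if agreement is too weak.
    The thresholds are illustrative assumptions, not tuned values."""
    counts = Counter(normalize(c) for c in candidates)
    if not counts:
        return None
    text, votes = counts.most_common(1)[0]
    if votes >= min_votes and votes / sum(counts.values()) >= min_share:
        return text
    return None
```

Clips that never reach agreement could simply stay in the pool and be shown to more users; how the accepted transcripts would then be licensed is a separate question.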

Other possible sources, but with licensing issues:

Articles

Corpora for Non-Commercial use

  • IViE (non-commercial purposes)
  • The Speech Accent Archive (Creative Commons - non-commercial)
  • AMI Meeting Corpus - CC Attribution NonCommercial ShareAlike 2.5 Licence
  • EUSTACE - non-commercial use
  • MOCHA-TIMIT - non-commercial use
  • C-SPAN - CC non-commercial use; includes all congressional hearings and press briefings, federal agency hearings, and presidential events at the White House.
  • Podiobooks.com - Creative Commons Non-Commercial, No-Derivative, Attribution required license.

(Also see Ticket #22)