voxforge.org
VoxForge Dev

DIY corpus using Search Engines (like Google)

Natural Language Toolkit (NLTK)

ARPA

Possible sources of written data (written corpora) for the creation of Language Models

Other Sources but with Licensing Restrictions

  • TAPoR (Text Analysis Portal for Research) at the University of Alberta
    • WestburyLAB USENET corpus - Creative Commons Attribution-NonCommercial-NoDerivs
  • Google

Multilingual Corpora