wiki:LanguageModelSources

Version 26 (modified by kmaclean, 12 years ago)

DIY Corpus Using Search Engines (like Google)

Natural Language Toolkit (NLTK)

ARPA

Possible sources of written data (written corpora) for the creation of Language Models

Other Sources but with Licensing Restrictions

Multilingual Corpora