Natural Language Text Classifier

This is a demonstration of a Natural Language Processing (NLP) algorithm that determines which natural language (English, French, etc.) a document is written in.

I learned of this algorithm from Louis Monier, who said he originally used it in AltaVista. The algorithm seems to be ~90% accurate with 100 or more characters of text, and its accuracy stops improving beyond ~2000 characters. It's currently limited to 11 common European languages, but could be trained to identify more.

To try the code, either enter a URL or the text of a document below and click 'Submit'.

Test Links for known languages: en nl fr de it pt es
Test Links for unknown languages: pl ru ja


Input

text...


Output

The natural language is not en, with margin of confidence 0.000. (Took 0.000 milliseconds.)
The natural language is unknown, with confidence 0.000. (Took 0.000 milliseconds.)
Note: Natural languages known about are (it, fr, de, sv, fi, es, pt, el, da, nl, en) and confidence ranges from 0.0 to 1.0.
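
This page does not describe how the classifier works internally, so the following is only a minimal sketch of one common approach to language identification: comparing character trigram frequencies against per-language profiles. It is not claimed to be the AltaVista algorithm. The training samples, the function names (trigram_profile, cosine, detect_language), the min_score threshold, and the use of the gap between the two best scores as the confidence are all assumptions made for illustration.

```python
import math
from collections import Counter

# Tiny illustrative training samples (assumption: a real classifier
# would be trained on far more text, for every supported language).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs through the woods",
    "fr": "le renard brun saute par-dessus le chien paresseux et puis il court dans la forêt",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft dann durch den wald",
}


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def detect_language(text, min_score=0.1):
    """Return (language, confidence); language is "unknown" when nothing matches well.

    Confidence here is the gap between the best and second-best similarity,
    clamped to the range 0.0 to 1.0 (a hypothetical definition).
    """
    doc = trigram_profile(text)
    scores = sorted(((cosine(doc, profile), lang) for lang, profile in PROFILES.items()), reverse=True)
    (best, lang), (runner_up, _) = scores[0], scores[1]
    if best < min_score:
        return "unknown", 0.0
    return lang, min(1.0, max(0.0, best - runner_up))


if __name__ == "__main__":
    language, confidence = detect_language("the dog runs over the fox")
    print(f"The natural language is {language}, with confidence {confidence:.3f}.")
```

In this sketch, text that shares no trigrams with any profile (for example, text in a script none of the samples use) scores 0.0 against every language and is reported as unknown, which mirrors the "unknown" output line above. Longer input gives the trigram counts more evidence to work with, which fits the observation that accuracy improves with document length up to a point.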