Using fastText, a machine-learning library for text classification that can be used from Python, to automatically identify which language a text is written in.
With fastText and its pretrained language-identification model, which recognizes 176 languages, we can tell whether a text is Japanese, Chinese, English, French, or some other language.
- fastText/language-identification.md at master · facebookresearch/fastText
- facebookresearch/fastText: Library for fast text representation and classification.
- Language detection · Issue #878 · facebookresearch/fastText

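This lets you classify text by language, but note that "hola", which is Spanish, was identified as English, which is a little concerning.
A single short word gives the model very little to go on; then again, to an English speaker, Spanish and English may simply look alike.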
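One way to deal with shaky results like "hola" is to look at the probability as well as the label, and to give no answer when the top score is low. Below is a minimal sketch of a hypothetical helper, `confident_language`, that takes a list of `(language_code, probability)` pairs (the shape the Python script in this post produces) and returns `None` under a cutoff. The 0.5 cutoff is an arbitrary assumption; fastText's own `predict` also takes a `threshold` argument that drops labels scoring below it.

```python
from typing import List, Optional, Tuple


def confident_language(
    predictions: List[Tuple[str, float]], threshold: float = 0.5
) -> Optional[str]:
    """Return the top language code, or None if its probability is below threshold.

    `predictions` is a list of (language_code, probability) pairs, highest
    probability first. The 0.5 default is a hypothetical cutoff, not a
    fastText default.
    """
    if not predictions:
        return None
    language, probability = predictions[0]
    return language if probability >= threshold else None


# Made-up scores, for illustration only (not real model output).
print(confident_language([("en", 0.92), ("de", 0.03)]))  # en
print(confident_language([("en", 0.31), ("es", 0.28)]))  # None
```

The scores in the example are invented; a sensible cutoff depends on the kind of text you classify.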

Installation for Python
git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip install . # or, for a system-wide install: sudo pip install .
Alternative installation method (builds the fasttext command-line tool, not the Python module)
wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
unzip v0.9.1.zip
cd fastText-0.9.1
make
./fasttext
Download the pretrained language-identification model (lid.176.bin)
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
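To fetch the model from Python instead of wget, a standard-library sketch could look like this. The helper name `ensure_model` and the default local path are assumptions; the URL is the one used above. The download goes to a temporary `.part` file first, so an interrupted transfer does not leave a truncated `lid.176.bin` behind.

```python
import os
import urllib.request

# URL from the wget command above; the default local path below is an assumption.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def ensure_model(path: str = "lid.176.bin", url: str = MODEL_URL) -> str:
    """Download the language-ID model to `path` unless the file already exists."""
    if not os.path.exists(path):
        tmp_path = path + ".part"
        urllib.request.urlretrieve(url, tmp_path)  # write the download to a temp file
        os.replace(tmp_path, path)  # expose the file only once it is complete
    return path
```

Its return value can then be passed straight to `fasttext.load_model`.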
Python script
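# coding:utf-8
import fasttext
import json
# Path to the downloaded lid.176.bin (adjust for your machine)
# model = fasttext.load_model("/home/yuis/pg/python/fastText/lid.176.bin")
model = fasttext.load_model("/mnt/c/pg/python/fasttext/lid.176.bin")
def predict_language(text, model, k=1):
    # model.predict returns the k most likely labels (e.g. "__label__en") and their probabilities
    label, prob = model.predict(text, k)
    return list(zip([l.replace("__label__", "") for l in label], prob))
# {{text}} is a placeholder: replace it with the text to classify
print( json.dumps(predict_language(u'{{text}}', model, k=2)) )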
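One pitfall: fastText's `predict` works on a single line and raises a `ValueError` when the text contains a newline. A minimal sketch of a workaround, assuming it is fine to treat line breaks as spaces: the hypothetical `prepare_text` collapses all whitespace, and `predict_language_safe` (also a hypothetical name) is the script's `predict_language` with that step added and the probabilities converted to plain floats.

```python
def prepare_text(text: str) -> str:
    """Collapse runs of whitespace, including newlines, into single spaces.

    fastText's predict handles one line at a time and raises ValueError
    when the input contains a newline character.
    """
    return " ".join(text.split())


def predict_language_safe(text, model, k=1):
    """Like predict_language, but accepts multi-line text.

    Returns (language_code, probability) pairs; float() turns the numpy
    probabilities into plain Python floats.
    """
    labels, probs = model.predict(prepare_text(text), k)
    return [(label.replace("__label__", ""), float(p)) for label, p in zip(labels, probs)]
```

For example, `predict_language_safe("Bonjour\ntout le monde", model, k=2)` with `model` loaded as in the script.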
Examples
print( json.dumps(predict_language(u'こんにちは', model, k=2)) )
print( json.dumps(predict_language(u'hello', model, k=2)) )
print( json.dumps(predict_language(u'Zürich', model, k=2)) )
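When there are many texts to classify, `model.predict` also accepts a list of strings and then returns a list of label lists and a list of probability lists, one per input. A sketch of a batch version under that assumption; `predict_languages` is a hypothetical name, and newlines are replaced with spaces because `predict` rejects them.

```python
from typing import Iterable, List, Tuple


def predict_languages(
    texts: Iterable[str], model, k: int = 1
) -> List[List[Tuple[str, float]]]:
    """Classify several texts with a single model.predict call."""
    batch = [" ".join(t.split()) for t in texts]  # predict rejects "\n"
    all_labels, all_probs = model.predict(batch, k)
    # One list of (language_code, probability) pairs per input text.
    return [
        [(label.replace("__label__", ""), float(p)) for label, p in zip(labels, probs)]
        for labels, probs in zip(all_labels, all_probs)
    ]
```

For example, `predict_languages(["こんにちは", "hello", "Zürich"], model, k=2)`, with `model` loaded as in the script.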
Bash function (feel free to use)
predict_language(){
: <<<'
e.g. predict_language "hello"
e.g. predict_language "こんにちは"
Requires: mo (Mustache templates in bash), $PGDIR, and lid.176.bin at the path in the script
'
# mo replaces {{TEXT}} in the heredoc with the value of the exported TEXT variable.
# Note: text containing a single quote or a line break produces invalid Python.
export TEXT="${1}"
cat << 'EOT' | mo > "$PGDIR/python/fasttext/predict_language.py"
# coding:utf-8
import fasttext
import json
# model = fasttext.load_model("/home/yuis/pg/python/fastText/lid.176.bin")
model = fasttext.load_model("/mnt/c/pg/python/fasttext/lid.176.bin")
def predict_language(text, model, k=1):
    label, prob = model.predict(text, k)
    return list(zip([l.replace("__label__", "") for l in label], prob))
print( json.dumps(predict_language(u'{{TEXT}}', model, k=2)) )
EOT
# python "$PGDIR/python/fasttext/predict_language.py" | parsejson "[0][0]"
python "$PGDIR/python/fasttext/predict_language.py"
}
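The function above splices the text into Python source via `mo`, so the input has to survive as a Python string literal. Passing the text as a command-line argument avoids templating altogether, and quotes or line breaks in the input become harmless. A sketch of such a script follows; the file name `predict_language_cli.py` and `MODEL_PATH` are assumptions, and it prints the same `[[code, probability], ...]` JSON, so the `parsejson` pipelines work unchanged.

```python
import json
import sys

# Assumption: adjust to wherever lid.176.bin was downloaded.
MODEL_PATH = "lid.176.bin"


def format_predictions(labels, probs):
    """Convert fastText's (labels, probabilities) into [[code, probability], ...]."""
    return [[label.replace("__label__", ""), float(p)] for label, p in zip(labels, probs)]


def main(args):
    """Classify the text given as command-line arguments and print JSON."""
    text = " ".join(" ".join(args).split())  # join arguments, collapse newlines
    if not text:
        # No text given: print a usage message and return without loading the model.
        print("usage: predict_language_cli.py TEXT", file=sys.stderr)
        return
    import fasttext  # imported lazily so format_predictions works without fastText

    model = fasttext.load_model(MODEL_PATH)
    labels, probs = model.predict(text, k=2)
    print(json.dumps(format_predictions(labels, probs)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: `python predict_language_cli.py "it's hola"`. Inside the bash function, the `cat ... | mo` step could then be replaced by `python predict_language_cli.py "$1"`.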
: Basic usage
predict_language "hello"
predict_language "こんにちは"
predict_language "你好"
predict_language "hola"
predict_language "Zürich"
: Extract the top language code (parsejson is a separate JSON helper, not defined here)
predict_language "hello" | parsejson "[0][0]"
predict_language "こんにちは" | parsejson "[0][0]"
: Usage example
if [[ "$( predict_language "hello" | parsejson "[0][0]" )" == "en" ]]; then echo "This language is English." ; fi
if [[ "$( predict_language "hello" | parsejson "[0][0]" )" == "ja" ]]; then echo "This language is Japanese." ; fi
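`parsejson` is not defined in this post. Assuming `"[0][0]"` means "first pair, first element", it picks the code of the most likely language out of the JSON. A minimal Python equivalent, with hypothetical helper names `top_language` and `is_language`:

```python
import json


def top_language(json_text: str) -> str:
    """Return the first language code from predict_language's JSON output.

    The output looks like [["en", 0.9], ["de", 0.02]], so [0][0] is the
    code of the most probable language.
    """
    return json.loads(json_text)[0][0]


def is_language(json_text: str, code: str) -> bool:
    """Python version of the [[ ... == "en" ]] checks in the shell examples."""
    return top_language(json_text) == code


sample = '[["en", 0.9], ["de", 0.02]]'  # made-up output, for illustration
print(top_language(sample))       # en
print(is_language(sample, "ja"))  # False
```

From the shell, a one-liner with the same effect (under the same assumption) is `predict_language "hello" | python -c 'import json, sys; print(json.load(sys.stdin)[0][0])'`.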