Detecting the Language of Natural Language Text [fastText]

This article uses fastText, a library for text classification and word representations with Python bindings, to automatically identify what language a piece of text is written in, for example Japanese, Chinese, English, or French.

Shou Arisaka
2 min read
Nov 25, 2025

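fastText returns the most likely language codes for a piece of text, each with a probability. In my test, “hola”, which is Spanish, was reported as English, which is a little concerning.

With only a single short word to go on, Spanish and English may simply look alike to the model.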

Installation for Python

git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip install .  # sudo pip install . 
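
After installing, it may be worth confirming that the module can be found before going further. The sketch below is not part of fastText; `is_importable` is a hypothetical helper name built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "fasttext" is the module name imported by the script later in this article.
    print("fasttext importable:", is_importable("fasttext"))
```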

Alternative: building the command-line tool

wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
unzip v0.9.1.zip
cd fastText-0.9.1
make

./fasttext 

Download the pretrained language identification model

wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
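
If you would rather fetch the model from Python, something like the following sketch could work. The URL is the one from the `wget` command above; the function name `ensure_model`, the `fetch` parameter, and the default destination filename are assumptions made for illustration.

```python
import os
import urllib.request

# URL taken from the wget command above.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def ensure_model(path="lid.176.bin", url=MODEL_URL, fetch=urllib.request.urlretrieve):
    """Download the model to `path` unless something already exists there.

    `fetch` is any callable taking (url, path); it is a parameter only so
    the download step can be swapped out, for example in tests.
    """
    if not os.path.exists(path):
        fetch(url, path)
    return path
```

Calling `ensure_model()` once before `fasttext.load_model(...)` would skip the download on later runs.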

Python script

# coding:utf-8

import fasttext
import json 

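# Adjust the path to wherever you saved lid.176.bin.
# model = fasttext.load_model("/home/yuis/pg/python/fastText/lid.176.bin")
model = fasttext.load_model("/mnt/c/pg/python/fasttext/lid.176.bin")

def predict_language(text, model, k=1):
  # model.predict returns the top k labels (e.g. "__label__en") and their probabilities.
  label, prob = model.predict(text, k)
  # Strip the "__label__" prefix and pair each language code with its probability.
  return list(zip([l.replace("__label__", "") for l in label], prob))

# {{text}} is a template placeholder; replace it with the text to classify.
print( json.dumps(predict_language(u'{{text}}', model, k=2)) )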
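
The `{{text}}` placeholder above has to be filled in before the script runs. As an alternative sketch, the text could be passed as a command-line argument, which sidesteps quoting problems when the input contains `'`. The `main` function, its argument handling, and the model location are assumptions, not part of the original script. Newlines are replaced with spaces because `predict` processes one line at a time and rejects text containing `\n`.

```python
import json
import sys


def predict_language(text, model, k=1):
    """Return [(language_code, probability), ...] for the top k guesses."""
    # predict() handles one line at a time, so fold newlines into spaces.
    text = text.replace("\n", " ")
    labels, probs = model.predict(text, k)
    return [(label.replace("__label__", ""), float(p))
            for label, p in zip(labels, probs)]


def main(argv):
    # Imported here so the helper above can be used without loading fastText.
    import fasttext

    # Assumption: lid.176.bin sits in the current directory.
    model = fasttext.load_model("lid.176.bin")
    print(json.dumps(predict_language(" ".join(argv[1:]), model, k=2)))


# Only run when some text was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved under a name of your choosing, this would be run as `python your_script.py "hello"`.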

Sample

print( json.dumps(predict_language(u'こんにちは', model, k=2)) )
print( json.dumps(predict_language(u'hello', model, k=2)) )
print( json.dumps(predict_language(u'Zürich', model, k=2)) )
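
Very short inputs such as “hola” can be misclassified, as noted earlier, so it may help to treat low-probability guesses as unknown. The following is a minimal sketch that works on the `(code, probability)` pairs returned by `predict_language`; the function name `confident_language`, the `0.5` threshold, and the sample values are arbitrary assumptions for illustration, not measured output.

```python
def confident_language(predictions, threshold=0.5):
    """Return the top language code, or None if its probability is too low.

    `predictions` is a list of (language_code, probability) pairs ordered
    from most to least likely, as produced by predict_language().
    """
    if not predictions:
        return None
    code, prob = predictions[0]
    return code if prob >= threshold else None


# Made-up values for illustration only, not real model output.
print(confident_language([("ja", 0.99), ("zh", 0.01)]))  # ja
print(confident_language([("en", 0.31), ("es", 0.28)]))  # None
```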

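Bash function (feel free to use)

The function relies on the mo template tool to fill the {{TEXT}} placeholder from the exported TEXT variable, and the later examples pipe its output through a parsejson helper. Neither is defined in this article. Text containing a single quote can break the generated script.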

predict_language(){

  : <<<'
  e.g. predict_language "hello"
  e.g. predict_language "こんにちは"
  '

export TEXT="${1}"

cat << 'EOT' | mo > "$PGDIR/python/fasttext/predict_language.py"
# coding:utf-8

import fasttext
import json 

# model = fasttext.load_model("/home/yuis/pg/python/fastText/lid.176.bin") 
model = fasttext.load_model("/mnt/c/pg/python/fasttext/lid.176.bin") 

def predict_language(text, model, k=1):
  label, prob = model.predict(text, k)
  return list(zip([l.replace("__label__", "") for l in label], prob))

print( json.dumps(predict_language(u'{{TEXT}}', model, k=2)) )
EOT

# python "$PGDIR/python/fasttext/predict_language.py" | parsejson "[0][0]"
python "$PGDIR/python/fasttext/predict_language.py" 

}

: Shorthand 
predict_language "hello" 
predict_language "こんにちは" 
predict_language "你好" 
predict_language "hola" 
predict_language "Zürich" 

: JSON parsing 
predict_language "hello" | parsejson "[0][0]"
predict_language "こんにちは" | parsejson "[0][0]"

: Usage example 
if [[ "$( predict_language "hello" | parsejson "[0][0]" )" == "en" ]]; then echo "This language is English." ; fi 
if [[ "$( predict_language "hello" | parsejson "[0][0]" )" == "ja" ]]; then echo "This language is Japanese." ; fi 
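
The `parsejson "[0][0]"` calls above appear to pick the first language code out of the JSON that the script prints. Assuming that output shape (a list of `[code, probability]` pairs), the same extraction might look like this in Python; `top_language` is a hypothetical helper name and the sample string is made up.

```python
import json


def top_language(json_text):
    """Return the first language code from the script's JSON output."""
    predictions = json.loads(json_text)
    # Each entry is [language_code, probability]; [0][0] is the top code.
    return predictions[0][0]


# Made-up sample in the same shape as the script's output.
sample = '[["en", 0.9], ["es", 0.1]]'
if top_language(sample) == "en":
    print("This language is English.")
```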
