Text language detection based on trigrams.

Pyfranc

Text language detection based on trigrams. Supports the 403 languages of franc-all.
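To give a feel for what "based on trigrams" means, here is a toy sketch of trigram profiling in general. It is not Pyfranc's actual code or data: the helper names (trigrams, profile, distance), the profile size, the penalty value, and the two tiny reference texts are all assumptions made for illustration.

```python
from collections import Counter


def trigrams(text):
    """Return the overlapping three-character sequences of a padded, lowercased text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def profile(text, size=300):
    """Map the most frequent trigrams to their rank (0 = most frequent)."""
    counts = Counter(trigrams(text))
    ranked = [gram for gram, _ in counts.most_common(size)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def distance(sample_profile, language_profile, penalty=300):
    """Sum rank differences; trigrams missing from the language cost a fixed penalty."""
    total = 0
    for gram, rank in sample_profile.items():
        if gram in language_profile:
            total += abs(rank - language_profile[gram])
        else:
            total += penalty
    return total


# Hypothetical, tiny reference texts; real models are built from far more data.
models = {
    "eng": profile("all human beings are born free and equal in dignity and rights"),
    "nld": profile("alle mensen worden vrij en gelijk in waardigheid en rechten geboren"),
}

sample = profile("human beings are free")
best = min(models, key=lambda code: distance(sample, models[code]))
print(best)  # 'eng'
```

The language whose profile is closest to the sample's profile wins. Very short inputs yield few trigrams, which is one reason a minimum input length is useful.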

Install

This package is tested on Python 3.8, but should work on any Python 3 release.

pip:

pip install pyfranc

Usage

As a library

from pyfranc import franc

franc.lang_detect('Alle menslike wesens word vry')[0][0] # 'afr'
franc.lang_detect('এটি একটি ভাষা একক IBM স্ক্রিপ্ট')[0][0]  # 'ben'
franc.lang_detect('Alle menneske er fødde til fridom')[0][0] # 'nno'
franc.lang_detect('')[0][0] # 'und'

# You can change what’s too short (default: 10):
franc.lang_detect('the')[0][0] # 'und'
franc.lang_detect('the', minlength=3)[0][0] # 'sco'

[0][0] selects the first value (the ISO 639-3 language code) of the first element of the output list. Each element is a [code, score] pair, and the best match comes first.
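As a small sketch of working with that output shape without repeated indexing, the helpers below unpack a list of [code, score] pairs. The names best_match and scores_by_code are hypothetical and not part of Pyfranc; the sample data is copied from the whitelist example in this README, so the snippet runs without the package installed.

```python
def best_match(results):
    """Return (code, score) for the top-ranked entry of a lang_detect-style result."""
    code, score = results[0]
    return code, score


def scores_by_code(results):
    """Turn the list of [code, score] pairs into a dict for lookups by language code."""
    return {code: score for code, score in results}


# Sample result copied from the whitelist example in this README.
sample = [['por', 1], ['spa', 0.6034146900423971]]

print(best_match(sample))             # ('por', 1)
print(scores_by_code(sample)['spa'])  # 0.6034146900423971
```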

whitelist

franc.lang_detect('Considerando ser essencial que os direitos humanos', whitelist = ['por', 'spa'])
# [['por', 1], ['spa', 0.6034146900423971]]

blacklist

franc.lang_detect('Considerando ser essencial que os direitos humanos', blacklist = ['src', 'glg'])

#[['por', 1],
# ['ina', 0.6211756617394293], 
# ['spa', 0.6034146900423971], 
# ['ast', 0.5628509224246592], 
# ['oci', 0.5583820327718574]
# ... 310 more items]

As a CLI

CLI to detect the language of text.

usage: pyfranc_cli [-h] --string STRING [--top TOP] [--minlength MINLENGTH]
                   [--whitelist [WHITELIST [WHITELIST ...]]]
                   [--blacklist [BLACKLIST [BLACKLIST ...]]] [--percentage]

optional arguments:
  -h, --help                 show this help message and exit
  -s, --string      string   Input string.
  -t, --top         int      Print top results (Default: 5).
  -m, --minlength   int      Minimum string length to accept (Default: 10).
  -w, --whitelist   [WHITELIST [WHITELIST ...]]
                             Allow languages.
  -b, --blacklist   [BLACKLIST [BLACKLIST ...]]
                             Disallow languages.
  -p, --percentage  bool     Print relative match value (in percent).

examples:

# output language
$ pyfranc_cli -t1 -s "Alle menslike wesens word vry"
# 'afr' : 1.0

# output language from stdin (expects UTF-8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | pyfranc_cli -t 1 -s "$(cat)"
# 'ben' : 1.0

# ignore certain languages
$ pyfranc_cli -t 1 --blacklist por glg -s "O Brasil caiu 26 posições"
# 'vec' : 1.0

# output language from stdin, allowing only certain languages
$ echo "Alle mennesker er født frie og" | pyfranc_cli -t 1 --whitelist nob dan -s "$(cat)"
# 'nob' : 1.0
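When calling the CLI from a Python script, passing the arguments as a list avoids shell quoting problems with arbitrary input text. The sketch below only uses the flags listed in the help text above. The helper names build_argv and detect_via_cli are hypothetical, and the sketch assumes pyfranc_cli is on PATH and that --percentage is a plain switch, as the usage line suggests.

```python
import subprocess


def build_argv(text, top=None, minlength=None, whitelist=None,
               blacklist=None, percentage=False):
    """Build the pyfranc_cli argument list from the options shown in the help text."""
    argv = ["pyfranc_cli", "--string", text]
    if top is not None:
        argv += ["--top", str(top)]
    if minlength is not None:
        argv += ["--minlength", str(minlength)]
    if whitelist:
        argv += ["--whitelist", *whitelist]
    if blacklist:
        argv += ["--blacklist", *blacklist]
    if percentage:
        argv.append("--percentage")  # assumed to be a switch that takes no value
    return argv


def detect_via_cli(text, **options):
    """Run pyfranc_cli (assumed to be installed) and return its raw standard output."""
    completed = subprocess.run(build_argv(text, **options),
                               capture_output=True, text=True, check=True)
    return completed.stdout


# Only builds and prints the command; nothing is executed here.
print(build_argv("Alle mennesker er født frie og", top=1, whitelist=["nob", "dan"]))
```

Because no shell is involved, the text is passed to the program exactly as given.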

Derivation

Pyfranc is a direct port of franc (JavaScript, MIT), trigram-utils (JavaScript, MIT), collapse-white-space (JavaScript, MIT), and n-gram (JavaScript, MIT), all by Titus Wormer.

License

MIT © cyb3rk0tik
