Text language detection based on trigrams.
Pyfranc
Text language detection based on trigrams. Supports 403 languages from franc-all.
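The sketch below shows what "based on trigrams" means in practice: text is turned into a frequency profile of overlapping 3-character windows. It is loosely modeled on the preprocessing in franc's trigram-utils. That preprocessing cleans the text, collapses whitespace, pads it with one space on each side, and then counts the trigrams. The function names clean, trigrams and trigram_profile are hypothetical and not part of pyfranc's API. The Unicode-category cleaning rule is our own simplification, and pyfranc's exact rules may differ.

```python
import unicodedata
from collections import Counter
from typing import List, Tuple


def clean(text: str) -> str:
    """Lower-case, turn punctuation/digits/symbols into spaces, collapse whitespace."""
    # Keep letters (L*) and combining marks (M*); every other character becomes a space.
    # This Unicode-category rule is an illustrative simplification, not pyfranc's code.
    chars = [ch if unicodedata.category(ch)[0] in "LM" else " " for ch in text.lower()]
    return " ".join("".join(chars).split())


def trigrams(text: str) -> List[str]:
    """Overlapping 3-character windows of the cleaned text, padded with one space each side."""
    padded = " " + clean(text) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def trigram_profile(text: str) -> List[Tuple[str, int]]:
    """Trigram counts, most frequent first (ties broken alphabetically)."""
    counts = Counter(trigrams(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, trigram_profile("Alle menslike wesens word vry") returns (trigram, count) pairs. A detector then compares those pairs against a stored profile for each language.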
Install
This package is tested on Python 3.8, but should work on any Python 3 version.
pip:
pip install pyfranc
Usage
As a library
from pyfranc import franc
franc.lang_detect('Alle menslike wesens word vry')[0][0] # 'afr'
franc.lang_detect('এটি একটি ভাষা একক IBM স্ক্রিপ্ট')[0][0] # 'ben'
franc.lang_detect('Alle menneske er fødde til fridom')[0][0] # 'nno'
franc.lang_detect('')[0][0] # 'und'
# You can change what’s too short (default: 10):
franc.lang_detect('the')[0][0] # 'und'
franc.lang_detect('the', minlength=3)[0][0] # 'sco'
[0][0] selects the first value (the ISO 639-3 language code) of the first element of the output list, i.e. the best match.
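The output is a list of [code, score] pairs ordered from best to worst match, so [0][0] is the code of the top entry. The sketch below illustrates that shape and the fallback to 'und' for short input. It assumes, as in franc, that input shorter than minlength (default 10) yields [['und', 1]]. detect_or_und, top_code and fake_score are hypothetical helpers, and the scorer is a stand-in that returns the whitelist example's output from this README.

```python
from typing import Callable, List, Union

Result = List[List[Union[str, float]]]  # e.g. [['por', 1], ['spa', 0.6034146900423971]]


def detect_or_und(text: str, score: Callable[[str], Result], minlength: int = 10) -> Result:
    """Return [['und', 1]] when text is shorter than minlength, else score(text)."""
    if len(text) < minlength:
        return [["und", 1]]  # 'und' = undetermined
    return score(text)


def top_code(results: Result) -> str:
    """Same as results[0][0]: the code of the first (best-scoring) entry."""
    return str(results[0][0])


def fake_score(_text: str) -> Result:
    """Stand-in scorer returning the whitelist example's output."""
    return [["por", 1], ["spa", 0.6034146900423971]]


print(top_code(detect_or_und("Considerando ser essencial", fake_score)))  # por
print(top_code(detect_or_und("the", fake_score)))  # und
```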
whitelist
franc.lang_detect('Considerando ser essencial que os direitos humanos', whitelist = ['por', 'spa'])
# [['por', 1], ['spa', 0.6034146900423971]]
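The scores are relative: the best candidate gets 1 and the others get a fraction of that. The sketch below shows one way to produce such scores.

1. It compares the input's ranked trigrams against each language model using the classic "out-of-place" rank distance.
2. Any trigram the model lacks adds a fixed penalty of 300. We believe franc uses this value, but treat it as an assumption.
3. It rescales the distances so that the closest language maps to exactly 1, along the lines of franc's normalization.

franc's exact distance formula differs in details, so this sketch will not reproduce the numbers above. The toy models and all names are hypothetical.

```python
from typing import Dict, List, Union

MAX_DIFFERENCE = 300  # assumed penalty for a trigram the model lacks

Scored = List[List[Union[str, float]]]


def distance(input_trigrams: List[str], model: Dict[str, int]) -> int:
    """Out-of-place distance between the input's trigram ranks and a model's ranks.

    input_trigrams: input trigrams ordered from most to least frequent.
    model: trigram -> rank in the language's reference profile.
    """
    total = 0
    for index, trigram in enumerate(input_trigrams):
        if trigram in model:
            total += abs(model[trigram] - index)
        else:
            total += MAX_DIFFERENCE
    return total


def rank_languages(input_trigrams: List[str], models: Dict[str, Dict[str, int]],
                   text_length: int) -> Scored:
    """Score every model and rescale so the closest language gets exactly 1."""
    raw = sorted(((code, distance(input_trigrams, model)) for code, model in models.items()),
                 key=lambda pair: pair[1])
    best = raw[0][1]
    spread = text_length * MAX_DIFFERENCE - best
    # Guard against division by zero; otherwise 1 for the best, lower for the rest.
    return [[code, 1 - (d - best) / spread if spread else 0.0] for code, d in raw]


toy_models = {"lang_a": {"the": 0, "he ": 1}, "lang_b": {"xyz": 0}}
print(rank_languages(["the", "he "], toy_models, text_length=10))
```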
blacklist
franc.lang_detect('Considerando ser essencial que os direitos humanos', blacklist = ['src', 'glg'])
#[['por', 1],
# ['ina', 0.6211756617394293],
# ['spa', 0.6034146900423971],
# ['ast', 0.5628509224246592],
# ['oci', 0.5583820327718574]
# ... 310 more items]
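In both outputs above, 'por' still scores 1. This suggests the filters narrow the set of candidate languages before scores are made relative to the best remaining one. Below is a minimal sketch of such a filtering step. It assumes, as in franc, two rules: a non-empty whitelist keeps only the listed codes, and a blacklist removes codes. filter_languages is a hypothetical helper, not pyfranc's API.

```python
from typing import Iterable, List, Optional


def filter_languages(candidates: Iterable[str],
                     whitelist: Optional[List[str]] = None,
                     blacklist: Optional[List[str]] = None) -> List[str]:
    """Keep candidates allowed by the whitelist (if any) and not in the blacklist."""
    allowed = set(whitelist or [])
    denied = set(blacklist or [])
    return [code for code in candidates
            if (not allowed or code in allowed) and code not in denied]


candidates = ["por", "spa", "glg", "src", "ina"]
print(filter_languages(candidates, whitelist=["por", "spa"]))  # ['por', 'spa']
print(filter_languages(candidates, blacklist=["src", "glg"]))  # ['por', 'spa', 'ina']
```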
As a CLI
CLI to detect the language of text.
usage: pyfranc_cli [-h] --string STRING [--top TOP] [--minlength MINLENGTH]
                   [--whitelist [WHITELIST [WHITELIST ...]]]
                   [--blacklist [BLACKLIST [BLACKLIST ...]]] [--percentage]

optional arguments:
  -h, --help            show this help message and exit
  -s, --string string   Input string.
  -t, --top int         Print top results (Default: 5).
  -m, --minlength int   Minimum string length to accept (Default: 10).
  -w, --whitelist [WHITELIST [WHITELIST ...]]
                        Allow languages.
  -b, --blacklist [BLACKLIST [BLACKLIST ...]]
                        Disallow languages.
  -p, --percentage bool Print relative match value (in percent).
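The [WHITELIST [WHITELIST ...]] notation in the usage line is how argparse (before Python 3.9) renders an option declared with nargs='*'. This suggests the CLI is built on argparse. The sketch below is a hypothetical re-creation of that interface, not pyfranc_cli's actual source. Option names, short flags and defaults come from the help text; build_parser and the types are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pyfranc_cli options."""
    parser = argparse.ArgumentParser(prog="pyfranc_cli",
                                     description="CLI to detect the language of text.")
    parser.add_argument("-s", "--string", required=True, help="Input string.")
    parser.add_argument("-t", "--top", type=int, default=5,
                        help="Print top results (Default: 5).")
    parser.add_argument("-m", "--minlength", type=int, default=10,
                        help="Minimum string length to accept (Default: 10).")
    # nargs="*" lets these options take any number of language codes.
    parser.add_argument("-w", "--whitelist", nargs="*", default=[], help="Allow languages.")
    parser.add_argument("-b", "--blacklist", nargs="*", default=[], help="Disallow languages.")
    parser.add_argument("-p", "--percentage", action="store_true",
                        help="Print relative match value (in percent).")
    return parser


args = build_parser().parse_args(
    ["-t", "1", "--blacklist", "por", "glg", "-s", "O Brasil caiu 26 posições"])
print(args.blacklist, args.top)  # ['por', 'glg'] 1
```

With nargs='*', a list option keeps consuming values until it reaches the next flag. The input text should therefore be passed with -s rather than as a bare trailing word after --whitelist or --blacklist.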
Examples:
# output language
$ pyfranc_cli -t1 -s "Alle menslike wesens word vry"
# 'afr' : 1.0
# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | pyfranc_cli -t 1 -s "$(cat)"
# 'ben' : 1.0
# ignore certain languages
$ pyfranc_cli -t 1 --blacklist por glg -s "O Brasil caiu 26 posições"
# 'vec' : 1.0
# output language from stdin, considering only whitelisted languages
$ echo "Alle mennesker er født frie og" | pyfranc_cli -t 1 --whitelist nob dan -s "$(cat)"
# 'nob' : 1.0
Derivation
Pyfranc is a direct port of Franc (JavaScript, MIT), trigram-utils (JavaScript, MIT), collapse-white-space (JavaScript, MIT), and n-gram (JavaScript, MIT), all by Titus Wormer.
License
File details
Details for the file pyfranc-0.1.1.tar.gz.
File metadata
- Download URL: pyfranc-0.1.1.tar.gz
- Upload date:
- Size: 264.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd82193013820948382d3fadbed5e34147d7dd3d1f69c625b20b5dcee6b92a9a |
| MD5 | 6205bee6dad1d585ce8d12cc5ff7621a |
| BLAKE2b-256 | 9486d74c4c8f9222803078626ff42910f80177fd9438fbe7fa454ded480a2d49 |
File details
Details for the file pyfranc-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pyfranc-0.1.1-py3-none-any.whl
- Upload date:
- Size: 262.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b3b37376743b9919f9a04276c545a1f99a5532db58788eef30104457a1d12e07 |
| MD5 | 5d70e5cad787de79cf4d0fa01e95e918 |
| BLAKE2b-256 | bbcf8b663acdd2267c3a88f4fc198bcd42363855495ff4cf42b1e6c926d27eeb |