Fast and accurate language identifier

heliport

A language identification tool that aims to be both fast and accurate. It originally started as a port of HeLI-OTS to Rust.

Installation

From PyPI

Install it in your environment:

pip install heliport

Then download the model:

heliport-download

From source

Install the requirements:

Clone the repo, build the package, and compile the model:

git clone https://github.com/ZJaume/heliport
cd heliport
pip install .
heliport-convert

Usage

CLI

Run the heliport command, which reads sentences from stdin, one per line, and prints the predicted language label for each:

cat sentences.txt | heliport
eng_latn
cat_latn
rus_cyrl
...
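Because the CLI prints one label per input line, its output is easy to post-process. As a minimal sketch, the Python below tallies the labels and splits each one into a language code and a script code. It assumes every label has the form language_script, as in the examples above (eng_latn, cat_latn, rus_cyrl); the function name summarize_labels is hypothetical and not part of heliport.

```python
from collections import Counter
from typing import Iterable


def summarize_labels(lines: Iterable[str]) -> Counter:
    """Count (language, script) pairs in heliport CLI output.

    Assumes every non-empty line is a label such as "eng_latn",
    i.e. a language code and a script code joined by an underscore.
    """
    counts: Counter = Counter()
    for line in lines:
        label = line.strip()
        if not label:
            continue  # ignore blank lines
        language, _, script = label.partition("_")
        counts[(language, script)] += 1
    return counts
```

For example, a small script calling summarize_labels(sys.stdin) could sit at the end of the pipeline: cat sentences.txt | heliport | python summarize.py.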

Python package

>>> from heliport import Identifier
>>> i = Identifier()
>>> i.identify("L'aigua clara")
'cat_latn'
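A common use of a language identifier is filtering a corpus down to a single language. The sketch below keeps only the lines whose predicted label matches a target label. It takes the identification function as a parameter so that the identify method of the Identifier shown above can be passed in; keep_language is a hypothetical helper, and it assumes identify returns a label string as in the example.

```python
from typing import Callable, Iterable, Iterator


def keep_language(
    identify: Callable[[str], str],
    lines: Iterable[str],
    target: str,
) -> Iterator[str]:
    """Yield the non-empty lines whose predicted label equals ``target``.

    ``identify`` is any function mapping a sentence to a label string,
    for example the ``identify`` method of a single, reused Identifier.
    """
    for line in lines:
        text = line.rstrip("\n")
        # Skip empty lines and lines predicted as another language
        if text and identify(text) == target:
            yield text
```

For example, keep_language(i.identify, open("sentences.txt"), "cat_latn") would yield the Catalan lines. Reusing one Identifier for the whole corpus, instead of creating one per sentence, is probably preferable, on the assumption that construction loads the models as it does explicitly in the Rust example.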

Rust crate

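use std::sync::Arc;
use heliport::identifier::Identifier;
use heliport::lang::Lang;
use heliport::load_models;

fn main() {
    // Load the character-level and word-level models from the model directory
    let (charmodel, wordmodel) = load_models("/dir/to/models");
    // The identifier holds both models behind reference-counted pointers
    let identifier = Identifier::new(
        Arc::new(charmodel),
        Arc::new(wordmodel),
    );
    // identify() returns the predicted language together with a score
    let (lang, _score) = identifier.identify("L'aigua clara");
    assert_eq!(lang, Lang::cat_Latn);
}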

Benchmarks

Speed benchmark on 100k random sentences from OpenLID, with every tool running single-threaded:

Tool                        Time (s)
CLD2                            1.12
HeLI-OTS                       60.37
lingua all high preloaded      56.29
lingua all low preloaded       23.34
fasttext openlid193             8.44
heliport                        4.72
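These timings translate into throughput and relative speed with simple arithmetic: throughput is the 100k sentences divided by the time, and heliport's speedup over another tool is that tool's time divided by heliport's. The snippet below only recomputes these figures from the table; it adds no new measurements.

```python
SENTENCES = 100_000

# Single-threaded timings in seconds, copied from the benchmark table
TIMES = {
    "CLD2": 1.12,
    "HeLI-OTS": 60.37,
    "lingua all high preloaded": 56.29,
    "lingua all low preloaded": 23.34,
    "fasttext openlid193": 8.44,
    "heliport": 4.72,
}

# Sentences processed per second by each tool
throughput = {tool: SENTENCES / t for tool, t in TIMES.items()}

# How many times faster heliport is than each tool (below 1 means slower)
speedup = {tool: t / TIMES["heliport"] for tool, t in TIMES.items()}

for tool in TIMES:
    print(f"{tool:28} {throughput[tool]:>9.0f} sent/s {speedup[tool]:6.2f}x")
```

By this arithmetic heliport handles about 21,000 sentences per second, roughly 1.8 times faster than fasttext openlid193 and 12.8 times faster than HeLI-OTS, while CLD2 is still about 4.2 times faster than heliport.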

Download files

Source Distribution

  • heliport-0.5.0.tar.gz (49.3 MB)

Built Distributions

  • heliport-0.5.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): PyPy (Python 3.10), manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): PyPy (Python 3.9), manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): CPython 3.12, manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): CPython 3.11, manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): CPython 3.10, manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): CPython 3.9, manylinux, glibc 2.17+, x86-64
  • heliport-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB): CPython 3.8, manylinux, glibc 2.17+, x86-64
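Which wheel pip picks depends on the tags in its file name. Under the wheel naming convention (PEP 427), a name has the form {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and a dotted platform tag such as manylinux_2_17_x86_64.manylinux2014_x86_64 lists several compatible platforms. The sketch below splits a name into these parts with the standard library only; parse_wheel_name is a hypothetical helper, and for real compatibility checks the packaging library (for example packaging.utils.parse_wheel_filename) is the more complete option.

```python
from typing import List, NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tags: List[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    # A dotted platform tag is a compressed set of alternative platforms
    return WheelName(dist, version, build, py, abi, plat.split("."))
```

For instance, the cp311-cp311 wheel above targets CPython 3.11 on x86-64 Linux with glibc 2.17 or newer. Interpreters or platforms not covered by any listed wheel would fall back to the source distribution, which is built with maturin and therefore likely needs a Rust toolchain.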

File details

Details for the file heliport-0.5.0.tar.gz.

File metadata

  • Download URL: heliport-0.5.0.tar.gz
  • Upload date:
  • Size: 49.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.1

File hashes

Hashes for heliport-0.5.0.tar.gz
Algorithm Hash digest
SHA256 904a42839b9225f78ab0492083f068b7d9756f1943013f14a2605e4b839b7286
MD5 c7c0b44508a3c7613c7895ef79481568
BLAKE2b-256 2771e37c419b6cccdc53b4e474ef9af9637668aaf881e1ed3b2b044663e7a691

File details

Details for the file heliport-0.5.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8cd45ed73bd2f2b6a81b16cffabcd26fb9c9e3a1e024333083d9ccd218bcb011
MD5 4994ddce4c404189b3bfd35ef7ac0a44
BLAKE2b-256 131bd03517f3c3013fc045e247f5bfb298286eab81b318e71e9247dc9f14f996

File details

Details for the file heliport-0.5.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e8dcc6e48306c103f29ce85cddb3289f5b8b694c919017a99529e5d016c5ad6a
MD5 4ac141006b5d83be773fff1295878c2e
BLAKE2b-256 5b1f94857403dd6aaf9fa4edfac7768e677e16e93ac6a13addfc222a96c2251f

File details

Details for the file heliport-0.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a71cc32b48381918cdaf700ab5fabf5cd2442242853ac7b644e889e6d03336c3
MD5 ec920dbb2ce7404e1c3431e1298a0eeb
BLAKE2b-256 29259d3951c5f782698defc117132f56b4ad0a7aca9e0e6dcae77ba1afa2db1a

File details

Details for the file heliport-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 76e52f00457d42c96c659685e3e0b8ad62e534e265ac201f4bc26d96d70dcc5d
MD5 674c694dc24e6f4d4c94a092dfb4b713
BLAKE2b-256 98c6c8c4299ff04f42012b5ae1d53f8640ceb95cefaff28e028d3e80c9a5d234

File details

Details for the file heliport-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f2077158441777e4dce4c3dcda5c90c812e5ceea60b631ccda72b4450858e6c2
MD5 18555f8012818303f68911cfc142a912
BLAKE2b-256 3e35f08317b75df27c43e02ef7283ee3e8ed369bab1c25ad177ff018d2031e17

File details

Details for the file heliport-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 90c5561c7f5f8400f25f66ff9dd938fd33b9e61359d4fc618ee19709616ffebb
MD5 6e4d0f5f3865dcf4c1ddb1a2aecf2abf
BLAKE2b-256 5b6bfef7b5de53009cc8b1c235931eb8f0ce1d7d146f8b14687e4acc99da22b6

File details

Details for the file heliport-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for heliport-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 aa5fc32ff807040283d206ac0971bec77513e28ff3e09e32d21dee35f0db77d7
MD5 e4888850275d7452b6c27831690bf314
BLAKE2b-256 b1bc8f46423f88aa8a180b8acf6f651b3beb562a88d238d53c263258d0dd0fc2
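The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's hashlib. It reads the file in chunks, so a large file such as the 49.3 MB source archive is never held in memory all at once. The helper names sha256_of and matches are hypothetical. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("heliport-0.5.0.tar.gz", "904a42839b9225f78ab0492083f068b7d9756f1943013f14a2605e4b839b7286") should return True for an intact copy of the source archive.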