Fast and accurate language identifier

heliport

A language identification tool that aims to be both fast and accurate. It started as a Rust port of HeLI-OTS.

Installation

From PyPI

Install it in your environment

pip install heliport

then download the model

heliport-download

From source

Install the requirements:

Clone the repo, build the package and compile the model

git clone https://github.com/ZJaume/heliport
cd heliport
pip install .
heliport-convert

Usage

CLI

Run the heliport command, which reads lines from stdin and prints one language label per line

cat sentences.txt | heliport
eng_latn
cat_latn
rus_cyrl
...
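
When the CLI is called from a script, the one-label-per-line output shown above can be paired back with the input sentences. The following is a minimal sketch, assuming heliport is on the PATH and prints exactly one label per input line, in input order. The names label_lines and command are hypothetical helpers for this example, not part of heliport.

```python
import subprocess


def label_lines(sentences, command=("heliport",)):
    """Pipe sentences to a line-oriented command and pair each with its output line.

    Assumes the command prints exactly one line per input line, in order.
    """
    # Embedded newlines would break the one-line-per-sentence assumption.
    cleaned = [s.replace("\n", " ") for s in sentences]
    result = subprocess.run(
        list(command),
        input="".join(line + "\n" for line in cleaned),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    labels = result.stdout.splitlines()
    if len(labels) != len(cleaned):
        raise ValueError(f"expected {len(cleaned)} labels, got {len(labels)}")
    return list(zip(cleaned, labels))
```

The length check is there so that a mismatch between input and output lines fails loudly instead of silently misaligning sentences and labels.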

Python package

>>> from heliport import Identifier
>>> i = Identifier()
>>> i.identify("L'aigua clara")
'cat_latn'
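
For more than one sentence, the same identify call can be applied in a loop. The sketch below tallies the labels returned for a collection of sentences. The function language_counts is a hypothetical helper, and it takes the identifying callable as a parameter so that it does not depend on anything beyond the identify(text) call shown above.

```python
from collections import Counter


def language_counts(sentences, identify):
    """Count how often each language label is returned for the sentences.

    `identify` is any callable mapping a string to a label; with heliport
    installed this could be `Identifier().identify`, as in the session above.
    """
    counts = Counter()
    for sentence in sentences:
        text = sentence.strip()
        if text:  # skip blank lines rather than asking for a label
            counts[identify(text)] += 1
    return counts
```

With heliport and its model installed, a call such as language_counts(open("sentences.txt"), Identifier().identify) should give a per-language tally of the file, with the Identifier created once and reused for every line.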

Rust crate

use std::sync::Arc;
use heliport::identifier::Identifier;
use heliport::lang::Lang;
use heliport::load_models;

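// Load the character and word models from the model directory
let (charmodel, wordmodel) = load_models("/dir/to/models");
let identifier = Identifier::new(
    Arc::new(charmodel),
    Arc::new(wordmodel),
);
// identify returns the detected language together with its score
let (lang, score) = identifier.identify("L'aigua clara");
assert_eq!(lang, Lang::cat_Latn);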

Benchmarks

Speed benchmarks with 100k random sentences from OpenLID, all the tools running single-threaded:

tool                        time (s)
CLD2                            1.12
HeLI-OTS                       60.37
lingua all high preloaded      56.29
lingua all low preloaded       23.34
fasttext openlid193             8.44
heliport                        2.33
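
To make the timings easier to compare, they can be converted into ratios against heliport and into sentences per second. This is a small worked calculation that uses only the numbers in the table above and the stated 100k sentences. The function names are illustrative.

```python
# Timings in seconds, copied from the benchmark table above.
TIMES = {
    "CLD2": 1.12,
    "HeLI-OTS": 60.37,
    "lingua all high preloaded": 56.29,
    "lingua all low preloaded": 23.34,
    "fasttext openlid193": 8.44,
    "heliport": 2.33,
}
SENTENCES = 100_000  # "100k random sentences"


def relative_times(times, baseline="heliport"):
    """Return each tool's time divided by the baseline tool's time."""
    base = times[baseline]
    return {tool: seconds / base for tool, seconds in times.items()}


def throughput(times, sentences=SENTENCES):
    """Return sentences processed per second for each tool."""
    return {tool: sentences / seconds for tool, seconds in times.items()}


if __name__ == "__main__":
    ratios = relative_times(TIMES)
    rates = throughput(TIMES)
    # Print the tools from fastest to slowest.
    for tool in sorted(TIMES, key=TIMES.get):
        print(f"{tool:28s} {ratios[tool]:6.2f}x {rates[tool]:10.0f} sent/s")
```

On these figures heliport processes roughly 42,900 sentences per second. HeLI-OTS takes about 25.9 times as long, fasttext about 3.6 times as long, and CLD2 about 0.48 times as long, so CLD2 is the only tool in the table that is faster than heliport.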
