Gherbal

FastText-based multilingual language identification with HuggingFace Hub integration.

Supports 200+ languages, with fine-grained detection of Arabic dialects.

Installation

pip install gherbal

Quick Start

from gherbal import Gherbal

# Load from HuggingFace Hub
model = Gherbal.from_pretrained("omarkamali/gherbal")

# Predict language
model.predict("Hello, how are you?")
# => [('eng_Latn', 0.99)]

model.predict("مرحبا كيف حالك")
# => [('arb_Arab', 0.95)]
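
The predictions shown above are lists of (label, score) pairs, and the labels look like a language code and a script code joined by an underscore (eng_Latn, arb_Arab). A minimal sketch of post-processing such output, assuming every label follows that shape. split_label and top_language are hypothetical helpers, not part of gherbal, and the 0.5 cutoff is an arbitrary value chosen for illustration:

```python
from typing import List, Optional, Tuple

Prediction = Tuple[str, float]


def split_label(label: str) -> Tuple[str, str]:
    """Split a label such as 'eng_Latn' into (language, script)."""
    language, _, script = label.partition("_")
    return language, script


def top_language(predictions: List[Prediction], min_score: float = 0.5) -> Optional[str]:
    """Return the language code of the best prediction, or None if it scores below min_score."""
    if not predictions:
        return None
    label, score = max(predictions, key=lambda pair: pair[1])
    if score < min_score:
        return None
    return split_label(label)[0]


# Sample output copied from the Quick Start comments (no model is loaded here).
sample = [("eng_Latn", 0.99)]
print(split_label("arb_Arab"))  # ('arb', 'Arab')
print(top_language(sample))     # eng
```

Returning None for low scores lets the caller decide how to treat uncertain input instead of silently accepting a weak guess.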

Loading a Local Model

from gherbal import Gherbal

# Pass a local directory instead of a Hub repository id
model = Gherbal.from_pretrained("./path/to/model")

Training

import pandas as pd
from gherbal import Gherbal

df = pd.DataFrame({"text": [...], "label": [...]})
model = Gherbal.train(df, save_path="./my_model")
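
The Training example passes a DataFrame with a text column and a label column. A minimal sketch of assembling those two columns from raw (text, label) pairs, dropping blank texts and exact duplicates first. build_columns is a hypothetical helper, the sample rows are illustrative only, and whether Gherbal.train performs its own cleaning is not stated here:

```python
from typing import Dict, Iterable, List, Tuple


def build_columns(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (text, label) pairs into {"text": [...], "label": [...]} columns."""
    seen = set()
    columns: Dict[str, List[str]] = {"text": [], "label": []}
    for text, label in pairs:
        text = text.strip()
        key = (text, label)
        if not text or key in seen:
            continue  # skip blank texts and exact duplicates
        seen.add(key)
        columns["text"].append(text)
        columns["label"].append(label)
    return columns


# Illustrative data only; labels follow the 'eng_Latn' style shown in Quick Start.
pairs = [
    ("Hello, how are you?", "eng_Latn"),
    ("Hello, how are you?", "eng_Latn"),  # duplicate, dropped
    ("   ", "eng_Latn"),                  # blank, dropped
    ("مرحبا كيف حالك", "arb_Arab"),
]
columns = build_columns(pairs)
print(columns["label"])  # ['eng_Latn', 'arb_Arab']
# pd.DataFrame(columns) then yields the two-column frame used in the Training example.
```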

Preprocessing

from gherbal import preprocess_text, create_clean_script_function

text = preprocess_text("Hello @user https://example.com 🎉")
# => "hello"

# Script-aware cleaning
clean = create_clean_script_function()
clean("Latn", "Hello World 123")
# => "Hello World"
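
To illustrate the idea behind script-aware cleaning, here is a standard-library sketch that keeps only letters of one script plus whitespace. It is not gherbal's implementation: keep_script and the SCRIPT_NAME_PREFIX mapping are assumptions made for this example, and the real function may treat digits, punctuation, and diacritics differently.

```python
import unicodedata

# Assumed mapping from script codes to Unicode character-name prefixes.
SCRIPT_NAME_PREFIX = {"Latn": "LATIN", "Arab": "ARABIC", "Cyrl": "CYRILLIC"}


def keep_script(script: str, text: str) -> str:
    """Keep letters of the given script and whitespace, then collapse whitespace."""
    prefix = SCRIPT_NAME_PREFIX[script]
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")
        elif ch.isalpha() and unicodedata.name(ch, "").startswith(prefix):
            kept.append(ch)
    return " ".join("".join(kept).split())


print(keep_script("Latn", "Hello World 123"))  # Hello World
print(keep_script("Arab", "مرحبا hello"))      # مرحبا
```

Matching on Unicode character names is a rough stand-in for real script detection; it is enough to show why digits and foreign-script characters disappear from the cleaned text.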

Pushing to HuggingFace Hub

model.push_to_hub("username/my-gherbal-model")

License

MIT
