BabelVec

Position-aware, cross-lingually aligned word embeddings built on FastText.

Requires Python 3.9+.

Features

  • Position-Aware Embeddings: Sentence vectors reflect word order through RoPE, sinusoidal, or decay positional encoding
  • Cross-Lingual Alignment: Ensemble alignment (Procrustes + InfoNCE) places multiple languages in a shared vector space
  • FastText Foundation: Handles out-of-vocabulary (OOV) words through subword information
  • Multiple Training Modes: Monolingual training, multilingual training, or post-hoc alignment of existing models
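
The FastText foundation is what allows a vector for a word never seen in training: FastText represents a word through the character n-grams it contains. The sketch below only illustrates how such n-grams are extracted. `char_ngrams` is a hypothetical helper, not part of BabelVec, and the 3 to 6 range mirrors FastText's default `minn`/`maxn` settings rather than anything BabelVec documents.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' boundary markers first, so that
    prefixes and suffixes get n-grams distinct from word-internal ones.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


# Trigrams only, to keep the output short.
print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

An unseen word such as "wherever" shares most of these n-grams with "where", which is why a subword model can still place it near related words.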

Installation

pip install babelvec

For visualization support:

pip install "babelvec[viz]"

Quick Start

from babelvec import BabelVec

# Load a model
model = BabelVec.load('path/to/model.bin')

# Get word vector
vec = model.get_word_vector("hello")

# Position-aware sentence embedding (order matters)
vec1 = model.get_sentence_vector("The dog bites the man", method='rope')
vec2 = model.get_sentence_vector("The man bites the dog", method='rope')
# vec1 != vec2 because word order is different!

# Standard averaging (order-agnostic)
vec_avg = model.get_sentence_vector("The dog bites the man", method='average')
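
To see why plain averaging cannot tell the two sentences apart, here is a toy sketch that does not use BabelVec at all. The three-dimensional vectors in `toy_vectors` are made up for illustration, and `mean_pool` and `cosine` are hypothetical helpers.

```python
import math

# Made-up 3-d word vectors, for illustration only.
toy_vectors = {
    "the": [0.1, 0.0, 0.2],
    "dog": [0.9, 0.1, 0.0],
    "bites": [0.0, 0.8, 0.3],
    "man": [0.2, 0.1, 0.9],
}


def mean_pool(sentence):
    """Average the word vectors of a whitespace-tokenized sentence."""
    vectors = [toy_vectors[token] for token in sentence.lower().split()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


avg1 = mean_pool("The dog bites the man")
avg2 = mean_pool("The man bites the dog")

# Both sentences contain the same words, so their averages coincide
# (up to floating-point rounding) and the cosine similarity is about 1.
print(cosine(avg1, avg2))
```

The same `cosine` helper can be applied to the vectors returned with `method='rope'`; given the order-sensitivity described above, that similarity should come out below 1.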

Training

Monolingual Training

from babelvec.training import train_monolingual

model = train_monolingual(
    lang='en',
    corpus_path='corpus.txt',
    dim=300,
    epochs=5
)
model.save('en_300d.bin')

Multilingual Training with Alignment

from babelvec.training import train_multilingual

model = train_multilingual(
    languages=['en', 'fr', 'de'],
    corpus_paths={'en': 'en.txt', 'fr': 'fr.txt', 'de': 'de.txt'},
    dim=300,
    alignment='ensemble'
)
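
The Procrustes half of the ensemble alignment looks for an orthogonal map that brings one embedding space onto another, given pairs of points that should coincide. How BabelVec implements this is not shown here. The following is a generic sketch restricted to rotations in two dimensions, where the best angle has a closed form and no linear algebra library is needed; in higher dimensions the usual solution goes through a singular value decomposition. `rotate` and `best_rotation_angle` are hypothetical names, and the data is made up.

```python
import math


def rotate(point, angle):
    """Rotate a 2-d point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def best_rotation_angle(source, target):
    """Angle of the rotation R minimizing sum ||R(source_i) - target_i||^2.

    Expanding the squared error shows that it is minimized when
    a*cos(angle) + b*sin(angle) is maximal, where `a` sums the dot
    products and `b` the 2-d cross products of the paired points.
    """
    a = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    b = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(b, a)


# Toy "English" points, and a "French" space that is the same cloud
# rotated by 0.5 radians.
english = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -0.7)]
french = [rotate(p, 0.5) for p in english]

angle = best_rotation_angle(english, french)
aligned = [rotate(p, angle) for p in english]
print(round(angle, 6))  # 0.5
```

After the rotation, each point in `aligned` sits on top of its counterpart in `french`, which is the sense in which two monolingual spaces become comparable.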

Post-hoc Alignment

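from babelvec import BabelVec
from babelvec.training import align_models

# Load two separately trained monolingual models (paths are placeholders)
model_en = BabelVec.load('en_300d.bin')
model_fr = BabelVec.load('fr_300d.bin')

# parallel_sentences must hold your sentence-aligned en/fr data
aligned = align_models(
    models={'en': model_en, 'fr': model_fr},
    method='ensemble',
    parallel_data=parallel_sentences
)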
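
InfoNCE, the other half of the ensemble, is a contrastive objective: each source sentence should be more similar to its own translation than to the other translations in the batch. The sketch below computes that loss for toy vectors in plain Python. It is a generic formulation, not BabelVec's training code; the function name `info_nce`, the `temperature` value, and the data are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(sources, targets, temperature=0.1):
    """Mean InfoNCE loss over a batch of aligned (source, target) pairs.

    For source i, target i is the positive and every other target in
    the batch serves as a negative.
    """
    total = 0.0
    for i, src in enumerate(sources):
        logits = [cosine(src, tgt) / temperature for tgt in targets]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(sources)


# Made-up 2-d sentence vectors.
en = [[1.0, 0.0], [0.0, 1.0]]
fr_aligned = [[0.9, 0.1], [0.1, 0.9]]
fr_swapped = [[0.1, 0.9], [0.9, 0.1]]

# Correctly paired translations give a lower loss than mismatched ones.
print(info_nce(en, fr_aligned) < info_nce(en, fr_swapped))  # True
```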

Positional Encoding Methods

| Method | Description | Use Case |
|---|---|---|
| average | Simple averaging (no position) | Bag-of-words tasks |
| rope | Rotary Position Embedding | Best for semantic similarity |
| sinusoidal | Transformer-style positional encoding | General purpose |
| decay | Exponential position decay | Emphasis on early words |
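
The table names the methods but not how they act on word vectors. Below is a rough, generic sketch of two of them, `decay` and `rope`, applied to toy vectors before averaging. It follows the textbook formulations (exponentially shrinking weights, and rotating consecutive pairs of dimensions by a position-dependent angle). BabelVec's actual formulas and parameters may differ, and `decay_pool`, `rope_rotate`, `rope_pool`, `rate` and `base` are assumed names and values.

```python
import math


def decay_pool(vectors, rate=0.5):
    """Weighted average with weights exp(-rate * position)."""
    weights = [math.exp(-rate * pos) for pos in range(len(vectors))]
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for w, v in zip(weights, vectors)) / total
        for i in range(dim)
    ]


def rope_rotate(vector, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by position * frequency.

    Assumes an even number of dimensions.
    """
    dim = len(vector)
    rotated = []
    for k in range(0, dim, 2):
        angle = position * base ** (-k / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vector[k], vector[k + 1]
        rotated.extend([c * x - s * y, s * x + c * y])
    return rotated


def rope_pool(vectors):
    """Average the vectors after rotating each one by its position."""
    rotated = [rope_rotate(v, pos) for pos, v in enumerate(vectors)]
    dim = len(rotated[0])
    return [sum(v[i] for v in rotated) / len(rotated) for i in range(dim)]


# Made-up 4-d vectors standing in for "dog", "bites", "man".
dog = [1.0, 0.0, 0.0, 0.5]
bites = [0.0, 1.0, 0.5, 0.0]
man = [0.5, 0.5, 1.0, 0.0]

# Swapping "dog" and "man" changes both pooled vectors.
print(decay_pool([dog, bites, man]) != decay_pool([man, bites, dog]))  # True
print(rope_pool([dog, bites, man]) != rope_pool([man, bites, dog]))    # True
```

With these decreasing weights the first word contributes most to the `decay` result, which matches the "emphasis on early words" use case in the table.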

Citation

@software{babelvec2025,
  title = {BabelVec: Position-Aware Cross-Lingual Word Embeddings},
  author = {Kamali, Omar},
  year = {2025},
  url = {https://github.com/omarkamali/babelvec}
}

License

MIT License - see LICENSE for details.

Copyright © 2025 Omar Kamali
