
Position-aware, cross-lingually aligned word embeddings built on FastText

Project description

BabelVec


Requires Python 3.9+. MIT License.

Features

  • Position-Aware Embeddings: word order matters. Sentence vectors can encode position with RoPE, sinusoidal, or decay positional encoding
  • Cross-Lingual Alignment: ensemble alignment (Procrustes + InfoNCE) makes embeddings from different languages compatible with each other
  • FastText Foundation: handles out-of-vocabulary (OOV) words using subword (character n-gram) information
  • Multiple Training Modes: monolingual training, multilingual training with alignment, or post-hoc alignment of existing models
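FastText's subword handling can be sketched as follows. Each word is wrapped in the boundary markers `<` and `>` and split into character n-grams (FastText's defaults are lengths 3 to 6); a vector for an out-of-vocabulary word is then built from the vectors of those n-grams. This minimal stdlib sketch covers only the n-gram extraction step; `char_ngrams` is a hypothetical helper, not part of the BabelVec API.

```python
def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list:
    """Return FastText-style character n-grams of `word`.

    The boundary markers let prefixes and suffixes produce n-grams that
    differ from the same letters in the middle of a word.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


print(char_ngrams("cat"))  # ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']
```

An unseen word such as "cats" still shares n-grams like `<ca` and `cat` with "cat", which is why its vector can land near related words.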

Installation

pip install babelvec

For visualization support:

pip install "babelvec[viz]"

Quick Start

from babelvec import BabelVec

# Load a model
model = BabelVec.load('path/to/model.bin')

# Get word vector
vec = model.get_word_vector("hello")

# Position-aware sentence embedding (order matters)
vec1 = model.get_sentence_vector("The dog bites the man", method='rope')
vec2 = model.get_sentence_vector("The man bites the dog", method='rope')
# vec1 and vec2 differ because the word order differs

# Standard averaging (order-agnostic)
vec_avg = model.get_sentence_vector("The dog bites the man", method='average')
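To illustrate why `method='rope'` separates the two sentences above while `method='average'` does not, here is a minimal NumPy sketch of generic rotary position embedding applied to word vectors and then averaged. It assumes BabelVec's `rope` mode works roughly this way, which this README does not confirm; the helper names and the toy random vocabulary (a stand-in for `model.get_word_vector`) are hypothetical.

```python
import numpy as np


def rope_rotate(vec: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each dimension pair (2i, 2i+1) by the angle pos * base**(-2i/d)."""
    d = vec.shape[0]
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    theta = pos * base ** (-2 * np.arange(d // 2) / d)
    cos, sin = np.cos(theta), np.sin(theta)
    even, odd = vec[0::2], vec[1::2]
    out = np.empty(d)
    out[0::2] = even * cos - odd * sin
    out[1::2] = even * sin + odd * cos
    return out


def rope_sentence_vector(words, lookup) -> np.ndarray:
    """Rotate each word vector by its position in the sentence, then average."""
    return np.mean([rope_rotate(lookup(w), pos) for pos, w in enumerate(words)], axis=0)


# Hypothetical toy vocabulary of random 8-d vectors
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in ["the", "dog", "bites", "man"]}
s1 = "the dog bites the man".split()
s2 = "the man bites the dog".split()

v1 = rope_sentence_vector(s1, vocab.__getitem__)
v2 = rope_sentence_vector(s2, vocab.__getitem__)
a1 = np.mean([vocab[w] for w in s1], axis=0)
a2 = np.mean([vocab[w] for w in s2], axis=0)

print(np.allclose(v1, v2))  # False: rotations depend on position
print(np.allclose(a1, a2))  # True: plain averaging ignores order
```

Because a rotation preserves each vector's length, position changes the direction a word contributes rather than how much it counts.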

Training

Monolingual Training

from babelvec.training import train_monolingual

model = train_monolingual(
    lang='en',
    corpus_path='corpus.txt',
    dim=300,
    epochs=5
)
model.save('en_300d.bin')

Multilingual Training with Alignment

from babelvec.training import train_multilingual

model = train_multilingual(
    languages=['en', 'fr', 'de'],
    corpus_paths={'en': 'en.txt', 'fr': 'fr.txt', 'de': 'de.txt'},
    dim=300,
    alignment='ensemble'
)
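The README does not explain how `alignment='ensemble'` combines its two components. As a rough illustration of the InfoNCE part alone, the sketch below computes a standard InfoNCE loss over a batch of parallel pairs: each source embedding should score its own translation higher than every other target in the batch. The function name and the temperature of 0.07 are illustrative assumptions, and no optimizer or training loop is included.

```python
import numpy as np


def info_nce_loss(src: np.ndarray, tgt: np.ndarray, temperature: float = 0.07) -> float:
    """Mean cross-entropy of picking the true partner tgt[i] for each src[i]
    among all targets in the batch, scored by cosine similarity / temperature."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature                     # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)    # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # true pairs lie on the diagonal


rng = np.random.default_rng(0)
src = rng.normal(size=(16, 32))
tgt = src + 0.01 * rng.normal(size=(16, 32))  # row i of tgt "translates" row i of src
aligned_loss = info_nce_loss(src, tgt)
shuffled_loss = info_nce_loss(src, np.roll(tgt, 1, axis=0))  # pairs deliberately mismatched
print(aligned_loss < shuffled_loss)  # True
```

Symmetric variants average this loss with the same term computed in the target-to-source direction.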

Post-hoc Alignment

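from babelvec import BabelVec
from babelvec.training import align_models

# Independently trained monolingual models (file names are illustrative)
model_en = BabelVec.load('en_300d.bin')
model_fr = BabelVec.load('fr_300d.bin')

# parallel_sentences: translation pairs used to learn the alignment
# (its expected format is not documented in this README)
aligned = align_models(
    models={'en': model_en, 'fr': model_fr},
    method='ensemble',
    parallel_data=parallel_sentences
)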
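Procrustes, the other alignment component named under Features, has a closed-form solution. Given paired embeddings `src` and `tgt` (row i of each describing the same item in two languages), the orthogonal matrix W minimising ||src @ W - tgt||_F is W = U Vᵀ, where U Σ Vᵀ is the SVD of srcᵀ tgt. The NumPy sketch below is a generic textbook version; how `align_models` derives its training pairs from `parallel_data` is not documented here, and `procrustes` is a hypothetical name.

```python
import numpy as np


def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||src @ W - tgt||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic check: tgt is an exact rotation of src, so W should recover it
rng = np.random.default_rng(0)
d = 10
src = rng.normal(size=(100, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
tgt = src @ Q
W = procrustes(src, tgt)
print(np.allclose(W, Q))  # True
```

Constraining W to be orthogonal preserves distances and cosine similarities inside the source space, so mapping into the shared space does not distort monolingual structure.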

Positional Encoding Methods

| Method | Description | Use case |
|---|---|---|
| average | Simple averaging (no position information) | Bag-of-words tasks |
| rope | Rotary Position Embedding (RoPE) | Best for semantic similarity |
| sinusoidal | Transformer-style sinusoidal positional encoding | General purpose |
| decay | Exponential position decay | Emphasis on early words |
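The table does not give formulas for `sinusoidal` or `decay`. A minimal NumPy sketch of common textbook versions follows: the standard Transformer sinusoidal table, and a weighted mean with weights decay**position so that earlier words count more. The decay rate of 0.9 and all function names are assumptions, and because the README does not say how BabelVec combines the sinusoidal table with word vectors, the sketch only builds the table.

```python
import numpy as np


def sinusoidal_table(n_positions: int, dim: int, base: float = 10000.0) -> np.ndarray:
    """Transformer table: PE[p, 2i] = sin(p / base**(2i/dim)), PE[p, 2i+1] = cos(same angle)."""
    if dim % 2:
        raise ValueError("dim must be even")
    pos = np.arange(n_positions)[:, None]      # (n, 1)
    i = np.arange(dim // 2)[None, :]           # (1, dim/2)
    angles = pos / base ** (2 * i / dim)       # (n, dim/2)
    table = np.zeros((n_positions, dim))
    table[:, 0::2] = np.sin(angles)
    table[:, 1::2] = np.cos(angles)
    return table


def decay_pool(vectors: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Weighted mean with weights decay**position (1, 0.9, 0.81, ...)."""
    weights = decay ** np.arange(len(vectors))
    return weights @ vectors / weights.sum()


def average_pool(vectors: np.ndarray) -> np.ndarray:
    """Order-agnostic mean of the word vectors."""
    return vectors.mean(axis=0)


words = np.eye(3)  # three toy "word vectors", one per row, in sentence order
print(decay_pool(words))  # first word receives the largest weight
print(np.allclose(decay_pool(words), decay_pool(words[::-1])))      # False
print(np.allclose(average_pool(words), average_pool(words[::-1])))  # True
```

Simply adding the sinusoidal table to word vectors and then taking a plain mean would shift every sentence of a given length by the same offset, so any position sensitivity depends on how the encoding is combined with the vectors.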

Citation

@software{babelvec2025,
  title = {BabelVec: Position-Aware Cross-Lingual Word Embeddings},
  author = {Kamali, Omar},
  year = {2025},
  url = {https://github.com/omarkamali/babelvec}
}

License

MIT License - see LICENSE for details.

Copyright © 2025 Omar Kamali


Download files

Download the file for your platform.

Source Distribution

babelvec-0.1.2.tar.gz (28.2 kB)

Uploaded Source

Built Distribution


babelvec-0.1.2-py3-none-any.whl (37.9 kB)

Uploaded Python 3

File details

Details for the file babelvec-0.1.2.tar.gz.

File metadata

  • Download URL: babelvec-0.1.2.tar.gz
  • Upload date:
  • Size: 28.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for babelvec-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | faf8857e5dcc4f5aeb08223f7e42d1a51121600c5b80acb81f98b4d2ff4024b9 |
| MD5 | ec34498388d7cf3cb730e1ff7b32dc8d |
| BLAKE2b-256 | bb132a33a7f5082805520f03b33ad033ab5aec1a4d6c78c550fb19db1baeb78a |
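To check a downloaded archive against the SHA256 digest listed for babelvec-0.1.2.tar.gz, a short stdlib sketch can stream the file through `hashlib`. The local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "faf8857e5dcc4f5aeb08223f7e42d1a51121600c5b80acb81f98b4d2ff4024b9"
archive = Path("babelvec-0.1.2.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if sha256_of(str(archive)) == EXPECTED else "MISMATCH")
```

pip's hash-checking mode (`--hash` options in a requirements file) can enforce the same check at install time.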


Provenance

The following attestation bundles were made for babelvec-0.1.2.tar.gz:

Publisher: publish.yml on omarkamali/babelvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file babelvec-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: babelvec-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 37.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for babelvec-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 834b18becba761da83799dd0ce56a00809f65b40a0b2448f660dbb51680912ed |
| MD5 | 5b0710921055b3b30771d2c69104ba8b |
| BLAKE2b-256 | 2c7f022a212c9813a0e00ebd36189562c8dd5f48ecca6643d6c6b87e8a67da0e |


Provenance

The following attestation bundles were made for babelvec-0.1.2-py3-none-any.whl:

Publisher: publish.yml on omarkamali/babelvec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
