BabelVec
Position-aware, cross-lingually aligned word embeddings built on FastText.
Features
- Position-Aware Embeddings: Sentence vectors can reflect word order through RoPE, sinusoidal, or decay positional encoding
- Cross-Lingual Alignment: Ensemble alignment (Procrustes + InfoNCE) for multilingual compatibility
- FastText Foundation: Handles OOV words through subword information
- Multiple Training Modes: Monolingual, multilingual, or post-hoc alignment
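To illustrate the subword idea behind the OOV handling mentioned above, here is a minimal sketch of FastText-style character n-gram extraction. It is a simplified illustration, not BabelVec's or FastText's actual code: the function name `char_ngrams` is hypothetical, and the hashing of n-grams into buckets is left out. The 3 to 6 character range mirrors FastText's usual defaults.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return character n-grams of a word, FastText style (simplified)."""
    # FastText wraps each word in boundary markers before slicing it,
    # so prefixes and suffixes get their own distinct n-grams.
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# An unseen word can still be embedded by combining the vectors of
# n-grams it shares with words seen during training.
print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```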
Installation
pip install babelvec
For visualization support:
pip install "babelvec[viz]"
Quick Start
from babelvec import BabelVec
# Load a model
model = BabelVec.load('path/to/model.bin')
# Get word vector
vec = model.get_word_vector("hello")
# Position-aware sentence embedding (order matters)
vec1 = model.get_sentence_vector("The dog bites the man", method='rope')
vec2 = model.get_sentence_vector("The man bites the dog", method='rope')
# vec1 != vec2 because word order is different!
# Standard averaging (order-agnostic)
vec_avg = model.get_sentence_vector("The dog bites the man", method='average')
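One way to check how far apart two sentence vectors are is cosine similarity. The helper below is a small sketch using only the standard library; `cosine_similarity` and the toy vectors are illustrative and not part of BabelVec. It assumes the vectors are plain sequences of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy stand-ins for vec1 and vec2 (made-up numbers, not model output).
toy_vec1 = [0.2, 0.7, 0.1]
toy_vec2 = [0.1, 0.7, 0.2]
print(round(cosine_similarity(toy_vec1, toy_vec2), 3))
```

With a loaded model, the same helper could be applied to `vec1` and `vec2` from the snippet above; a value below 1.0 would indicate that the two word orders produced different vectors.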
Training
Monolingual Training
from babelvec.training import train_monolingual
model = train_monolingual(
    lang='en',
    corpus_path='corpus.txt',
    dim=300,
    epochs=5
)
model.save('en_300d.bin')
Multilingual Training with Alignment
from babelvec.training import train_multilingual
model = train_multilingual(
    languages=['en', 'fr', 'de'],
    corpus_paths={'en': 'en.txt', 'fr': 'fr.txt', 'de': 'de.txt'},
    dim=300,
    alignment='ensemble'
)
Post-hoc Alignment
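from babelvec.training import align_models
# model_en and model_fr are BabelVec models trained or loaded beforehand;
# parallel_sentences is the parallel data used to align them.
aligned = align_models(
    models={'en': model_en, 'fr': model_fr},
    method='ensemble',
    parallel_data=parallel_sentences
)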
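The ensemble method pairs Procrustes with InfoNCE. As a rough illustration of the InfoNCE half, the sketch below computes the generic contrastive loss over a batch of paired vectors: each source vector should score highest against its own translation and lower against every other target in the batch. This is the textbook form, not BabelVec's implementation; `info_nce_loss`, the temperature value, and the toy vectors are assumptions.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def info_nce_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Mean InfoNCE loss, where src_vecs[i] and tgt_vecs[i] are a positive pair."""
    src = [_normalize(v) for v in src_vecs]
    tgt = [_normalize(v) for v in tgt_vecs]
    total = 0.0
    for i, s in enumerate(src):
        # Cosine similarity of source i against every target, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s, t)) / temperature for t in tgt]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denominator - logits[i]
    return total / len(src)


# Toy example: correctly paired vectors give a much lower loss than swapped ones.
english = [[1.0, 0.0], [0.0, 1.0]]
french_aligned = [[0.9, 0.1], [0.1, 0.9]]
french_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce_loss(english, french_aligned) < info_nce_loss(english, french_swapped))
```

Minimizing a loss of this kind pulls translation pairs together in the shared space while pushing non-matching pairs apart.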
Positional Encoding Methods
| Method | Description | Use Case |
|---|---|---|
| `average` | Simple averaging (no position) | Bag-of-words tasks |
| `rope` | Rotary Position Embedding | Best for semantic similarity |
| `sinusoidal` | Transformer-style positional | General purpose |
| `decay` | Exponential position decay | Emphasis on early words |
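To make the difference between the methods concrete, here is a small standard-library sketch of three of them in their generic textbook form: plain averaging, exponential decay weighting, and rotary (RoPE) rotation followed by averaging. The function names, the decay `rate`, the RoPE `base`, and the toy vectors are all assumptions for illustration; BabelVec's own formulas may differ.

```python
import math


def average_pool(vectors):
    """Order-agnostic mean of the word vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decay_pool(vectors, rate=0.5):
    """Weighted mean with weights exp(-rate * position), favoring early words."""
    weights = [math.exp(-rate * pos) for pos in range(len(vectors))]
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, column)) / total
        for column in zip(*vectors)
    ]


def rope_rotate(vector, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    dim = len(vector)
    rotated = list(vector)
    for i in range(0, dim - 1, 2):
        # Pair k = i / 2 uses frequency base ** (-2k / dim) = base ** (-i / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vector[i], vector[i + 1]
        rotated[i] = x * cos_t - y * sin_t
        rotated[i + 1] = x * sin_t + y * cos_t
    return rotated


def rope_pool(vectors):
    """Rotate every word vector by its position, then average."""
    return average_pool([rope_rotate(v, pos) for pos, v in enumerate(vectors)])


# Toy 4-dimensional word vectors (made up for illustration).
dog, bites, man = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]
forward = [dog, bites, man]   # "dog bites man"
reverse = [man, bites, dog]   # "man bites dog"

print(average_pool(forward) == average_pool(reverse))  # True: order is ignored
print(decay_pool(forward) == decay_pool(reverse))      # False: order changes the result
print(rope_pool(forward) == rope_pool(reverse))        # False: order changes the result
```

In this sketch only the averaging variant maps both word orders to the same vector, which matches the "Bag-of-words tasks" use case in the table.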
Citation
@software{babelvec2025,
title = {BabelVec: Position-Aware Cross-Lingual Word Embeddings},
author = {Kamali, Omar},
year = {2025},
url = {https://github.com/omarkamali/babelvec}
}
License
MIT License - see LICENSE for details.
Copyright © 2025 Omar Kamali