Swarmauri BERT Embedding Parser
Parser that converts text into embeddings using a Hugging Face BERT encoder. Produces Document objects whose metadata carries the averaged token embedding so downstream Swarmauri pipelines can work with dense vectors.
Features
- Uses `transformers.BertModel` + `BertTokenizer` (default `bert-base-uncased`).
- Accepts single strings or lists of strings and emits `Document` instances with the original text and embedding metadata.
- Runs in inference (`eval`) mode with automatic `torch.no_grad()` handling.
- Works on CPU by default; configure PyTorch device settings to leverage a GPU.
Prerequisites
- Python 3.10 or newer.
- PyTorch compatible with your hardware (installed automatically via `transformers` if not present; install CUDA-enabled wheels manually when needed).
- Internet access on first run so Hugging Face can download the tokenizer/model weights (or warm the cache ahead of time).
Installation
```bash
# pip
pip install swarmauri_parser_bertembedding

# poetry
poetry add swarmauri_parser_bertembedding

# uv (pyproject-based projects)
uv add swarmauri_parser_bertembedding
```
Quickstart
```python
from swarmauri_parser_bertembedding import BERTEmbeddingParser

parser = BERTEmbeddingParser(parser_model_name="bert-base-uncased")
documents = parser.parse([
    "Swarmauri agents cooperate over shared memory.",
    "Dense embeddings power semantic search.",
])

for doc in documents:
    vector = doc.metadata["embedding"]
    print(doc.content)
    print(len(vector), vector[:5])
```
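Because each embedding is returned as a plain sequence of floats under `doc.metadata["embedding"]`, you can compare documents without any extra framework. A minimal cosine-similarity sketch; the `vec_a`/`vec_b` values below are hypothetical stand-ins for real parser output:

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes; assumes
    # equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for two doc.metadata["embedding"] vectors.
vec_a = [0.1, 0.3, -0.2, 0.5]
vec_b = [0.05, 0.25, -0.1, 0.4]
print(cosine_similarity(vec_a, vec_b))
```

For production workloads you would typically vectorize this with NumPy, but the arithmetic is the same.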
Custom Models & Devices
```python
import torch
from transformers import BertModel

from swarmauri_parser_bertembedding import BERTEmbeddingParser


class GPUParser(BERTEmbeddingParser):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._model = BertModel.from_pretrained(self.parser_model_name).to("cuda")


parser = GPUParser(parser_model_name="bert-base-multilingual-cased")
parser._model.eval()
```
Batch Embeddings at Scale
```python
from tqdm import tqdm

from swarmauri_parser_bertembedding import BERTEmbeddingParser

texts = [f"Paragraph {i}" for i in range(1000)]
parser = BERTEmbeddingParser()

batched_docs = []
batch_size = 32
for start in tqdm(range(0, len(texts), batch_size)):
    batch = texts[start:start + batch_size]
    batched_docs.extend(parser.parse(batch))
```
Persist the resulting vectors into Swarmauri vector stores (Redis, Qdrant, etc.) via the metadata field.
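Most vector-store clients accept upsert records shaped as an id, a vector, and a payload, all of which can be pulled from the parsed documents. A sketch under the assumption that a `Document` exposes `content` and `metadata` as shown above; `FakeDoc` is a hypothetical stand-in for real parser output, and the actual store client calls are omitted:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in mirroring the Document fields used above."""
    content: str
    metadata: dict = field(default_factory=dict)


# A real pipeline would iterate over `batched_docs` instead.
docs = [
    FakeDoc("Paragraph 0", {"embedding": [0.1, 0.2]}),
    FakeDoc("Paragraph 1", {"embedding": [0.3, 0.4]}),
]

# Generic (id, vector, payload) records, ready for a store-specific upsert.
records = [
    {"id": i, "vector": doc.metadata["embedding"], "payload": {"text": doc.content}}
    for i, doc in enumerate(docs)
]
```

Keeping the original text in the payload lets downstream search return readable results alongside the matched vectors.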
Tips
- Preprocess text to match model expectations (lowercase for uncased BERT, language-specific cleanup for multilingual models).
- For extremely long documents, consider chunking before calling `parse` to respect the 512-token limit.
- Use PyTorch's `to("cuda")` or `to("mps")` to execute on GPUs or Apple silicon accelerators.
- Cache Hugging Face weights in CI/CD environments (`HF_HOME=/cache/hf`) to avoid repeated downloads.
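The chunking tip above can be sketched as a sliding window over tokens. Real code should count tokens with the same `BertTokenizer` the parser uses; the whitespace split below is a stand-in so the windowing logic stays self-contained, and the `stride` overlap is one common choice for preserving context across boundaries:

```python
def chunk_text(text, max_tokens=510, stride=64):
    # 510 leaves room for BERT's [CLS] and [SEP] special tokens (512 total).
    tokens = text.split()  # stand-in for BertTokenizer tokenization
    chunks = []
    step = max_tokens - stride  # overlapping windows keep boundary context
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break
    return chunks


long_text = " ".join(f"tok{i}" for i in range(1200))
chunks = chunk_text(long_text)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk can then be passed to `parse` individually, and the per-chunk embeddings averaged or stored separately depending on your retrieval strategy.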
Want to help?
If you want to contribute to swarmauri-sdk, read our contributing guidelines; they will help you get started.
File details
Details for the file swarmauri_parser_bertembedding-0.8.3.dev10.tar.gz.
File metadata
- Download URL: swarmauri_parser_bertembedding-0.8.3.dev10.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c3c4e0c6408e4fafc94bf268b7dcaa931c77367229f730495a892a2f8de63857` |
| MD5 | `438a002cc0cbfec8564c4cbd17a48d05` |
| BLAKE2b-256 | `1979d75895588b778e91c9f1709746d60a664af00a8a90f757c15e666ffda0cf` |
File details
Details for the file swarmauri_parser_bertembedding-0.8.3.dev10-py3-none-any.whl.
File metadata
- Download URL: swarmauri_parser_bertembedding-0.8.3.dev10-py3-none-any.whl
- Upload date:
- Size: 9.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0e58d0162742067ef4df23a6cbf3d7b52ec909405361678d34253b73cfd6adb6` |
| MD5 | `ce833c32519d1b3cd7d514bad96c1112` |
| BLAKE2b-256 | `ba96950aa80f1f169d9af248df25776580932cf556e8aaebe4d6d84f13128bc3` |