Embeddings Flow -- Tools for workflows involving semantic embeddings

These details have not been verified by PyPI

Project links

Homepage

Project description

ef — Embedding Flow

A facade for boilerplate-less semantic search, corpus indexing, and RAG-plug-in readiness.

ef makes the modern embedding pipeline (corpus → segment → embed → vector store → retrieve) usable through progressive disclosure. The light case (a list of strings, searched in one or two lines) and the heavy case (huge corpora, many segmentations and embedders, varied sources and vector DBs) share one facade. ef is not a RAG framework: it returns ranked context, and you bring your own LLM.

import ef

index = ef.ingest([
    "The cat sat on the mat",
    "Dogs are loyal companions",
    "Neural networks learn from data",
])

for hit in index.search("loyal dogs", limit=2):
    print(hit.score, hit.segment["text"])

That is the whole light path — no configuration, no install beyond ef itself. ingest returns a SearchableCorpus ready to search.

Installation

pip install ef                       # core: search, indexing, refresh, eval
pip install "ef[openai]"             # OpenAI embeddings
pip install "ef[sentence-transformers]"  # local sentence-transformers embeddings
pip install "ef[explore]"            # the L5 explore layer (UMAP, HDBSCAN)
pip install "ef[imbed]"              # imbed-backed components & cluster labelling

The core install depends only on numpy, dol, i2 and vd. The default embedder is dependency-free: it uses feature hashing, so matching is lexical, not semantic. For real semantic search, pass a sentence-transformers or provider embedder (see below).
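The following is a minimal sketch of why a feature-hashing embedder is lexical. It is not ef's HashingEmbedder, and `hashing_embed`, its 256-bucket default and the md5-based bucketing are assumptions made for illustration. Each lowercase word is hashed to a signed bucket, so two texts score as similar only when they share words. "loyal dogs" matches "Dogs are loyal companions", but it would not match "faithful puppies".

```python
import hashlib
import re

import numpy as np


def hashing_embed(texts, dim=256):
    """Hypothetical feature-hashing embedder: one signed bucket per lowercase word token."""
    texts = list(texts)
    vectors = np.zeros((len(texts), dim))
    for row, text in enumerate(texts):
        for token in re.findall(r"\w+", text.lower()):
            # hashlib rather than hash(), so buckets are stable across processes
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % dim
            sign = 1.0 if digest[4] % 2 == 0 else -1.0  # signed hashing: collisions tend to cancel
            vectors[row, bucket] += sign
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)  # unit rows, so dot product = cosine


docs = hashing_embed([
    "The cat sat on the mat",
    "Dogs are loyal companions",
    "Neural networks learn from data",
])
scores = docs @ hashing_embed(["loyal dogs"])[0]
best = int(scores.argmax())  # index 1 (shares "loyal" and "dogs") barring an unlucky hash collision
```

The embedder needs no model download. The cost is that it has no notion of synonyms, which is why the README points to a semantic embedder for real search.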

What `ef` is

ef is a facade, not a framework. It owns the schemas (Segment, Embedder, Segmenter, Corpus), the indexing core, refresh, and the RAG-plug-in surface — and it stops there: no agent loops, no prompt templating, no answer synthesis. "Bring your own LLM, your own agent framework, your own UI."

It is built on six layers (L0 to L5), and the same facade covers all of them:

L0 Sources    Corpus = MutableMapping[source_id, Source]   (dol store: fs/S3/API/RAM)
L1 Parse      pluggable text extraction
L2 Segment    Segmenter facade (chunkers)
L3 Embed      Embedder facade (provider / local adapters)
L4 Index      vd.Collection  (ef writes; vd owns the index)
L5 Derive     project / cluster / label   ("explore the corpus")
──────────────────────────────────────────────────────────────
   Search     search(query) -> ranked SearchHits
   RAG plug   retrieve(query) -> list[Segment]  handed to your LLM/agent
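The sketch below walks a toy corpus through layers L0 to L4 and then the search and RAG-plug surfaces, in pure numpy. It only illustrates the data flow in the diagram. `toy_embed`, `build_index`, `search`, `retrieve` and `Hit` are hypothetical names, not ef's API, and the in-memory matrix stands in for a real vd collection.

```python
import re
import zlib
from dataclasses import dataclass

import numpy as np


def toy_embed(texts, dim=256):
    """L3 stand-in: bag of crc32-hashed word tokens, L2-normalized (lexical only)."""
    texts = list(texts)
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in re.findall(r"\w+", text.lower()):
            out[i, zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.where(norms == 0, 1.0, norms)


@dataclass
class Hit:
    score: float
    segment: dict


def build_index(corpus):
    segments = []
    for source_id, raw in corpus.items():                               # L0: source_id -> source
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw  # L1: parse
        for piece in re.split(r"(?<=[.!?])\s+", text.strip()):        # L2: segment by sentence
            if piece:
                segments.append({"text": piece, "metadata": {"source": source_id}})
    vectors = toy_embed(s["text"] for s in segments)                    # L3: embed
    return segments, vectors                                            # L4: in-memory "index"


def search(index, query, limit=3):
    segments, vectors = index
    scores = vectors @ toy_embed([query])[0]
    order = np.argsort(-scores)[:limit]
    return [Hit(float(scores[i]), segments[i]) for i in order]


def retrieve(index, query, limit=3):
    # RAG plug: plain segments, provenance kept in metadata["source"]
    return [hit.segment for hit in search(index, query, limit)]


corpus = {
    "pets.txt": "Dogs are loyal companions. The cat sat on the mat.",
    "ml.txt": b"Neural networks learn from data.",
}
index = build_index(corpus)
top = retrieve(index, "loyal dogs", limit=1)[0]
```

Note that search and retrieve differ only in what they hand back. search returns scored hits, while retrieve returns bare segments ready to be pasted into a prompt.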

Choosing an embedder

ingest and SourceManager accept an embedder= argument: a string, a callable, a URL, or an Embedder. The as_embedder seam normalizes all of them:

import ef
from ef import as_embedder, openai_embedder, sentence_transformers_embedder

index = ef.ingest(corpus, embedder=sentence_transformers_embedder("all-MiniLM-L6-v2"))
index = ef.ingest(corpus, embedder=openai_embedder("text-embedding-3-small"))
index = ef.ingest(corpus, embedder=as_embedder("cohere:embed-v4.0"))   # also voyage:/gemini:
index = ef.ingest(corpus, embedder=as_embedder(my_callable, model_id="custom@768"))
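One way to picture a normalization seam like as_embedder is the rough sketch below. It is not ef's implementation. `SimpleEmbedder`, `FACTORIES`, `normalize_embedder` and the `"fake"` provider are all hypothetical, and the sketch leaves out URL specs. A `"provider:model"` string is dispatched through a registry. A bare callable is wrapped together with a `model_id`. An object that is already normalized passes through unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

import numpy as np


@dataclass
class SimpleEmbedder:
    """Hypothetical stand-in for an Embedder: a batch callable plus a little metadata."""
    fn: Callable[[list], object]
    model_id: str

    def __call__(self, texts: Iterable[str]) -> np.ndarray:
        return np.asarray(self.fn(list(texts)), dtype=float)


# Hypothetical registry: provider prefix -> factory taking the model name.
FACTORIES = {
    "fake": lambda model: SimpleEmbedder(
        lambda ts: [[len(t), t.count(" ")] for t in ts], f"fake:{model}"
    ),
}


def normalize_embedder(spec, model_id=None):
    if isinstance(spec, SimpleEmbedder):
        return spec  # already normalized
    if isinstance(spec, str):
        provider, _, model = spec.partition(":")
        if provider not in FACTORIES:
            raise ValueError(f"unknown embedder provider: {provider!r}")
        return FACTORIES[provider](model)
    if callable(spec):
        return SimpleEmbedder(spec, model_id or getattr(spec, "__name__", "callable"))
    raise TypeError(f"cannot make an embedder from {type(spec).__name__}")
```

With a single seam like this, downstream code only ever sees one embedder shape, whichever form the caller passed in.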

Hosted-API adapters: openai_embedder (needs ef[openai]), plus cohere_embedder, voyage_embedder and gemini_embedder. The latter three call their providers' REST endpoints directly, so they need only an API key and no SDK. Each translates ef's canonical input_type (query, document, classification, clustering) into the vendor's own task name. Local options: sentence_transformers_embedder, http_embedder (for any TEI-style service), and the dependency-free default, HashingEmbedder.
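The sketch below shows the kind of translation table involved. The vendor values come from the providers' public docs as we understand them: Cohere's `input_type`, Voyage's `input_type`, and Gemini's `taskType`. ef's own table may differ, and providers change these names over time, so check the current docs before relying on them. `TASK_NAMES`, `CANONICAL` and `vendor_task` are hypothetical names.

```python
# Canonical ef input_type -> vendor task name. Assumed from the vendors' public docs;
# ef's own translation table may differ, so verify against current provider docs.
TASK_NAMES = {
    "cohere": {"query": "search_query", "document": "search_document",
               "classification": "classification", "clustering": "clustering"},
    "voyage": {"query": "query", "document": "document"},  # no classification/clustering value
    "gemini": {"query": "RETRIEVAL_QUERY", "document": "RETRIEVAL_DOCUMENT",
               "classification": "CLASSIFICATION", "clustering": "CLUSTERING"},
}
CANONICAL = ("query", "document", "classification", "clustering")


def vendor_task(provider, input_type):
    """Return the vendor's task name, or None when the request field should be omitted."""
    if input_type not in CANONICAL:
        raise ValueError(f"unknown input_type: {input_type!r}")
    return TASK_NAMES[provider].get(input_type)
```

Returning None for unsupported combinations lets an adapter simply drop the field, rather than send a value the vendor would reject.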

An Embedder is just a batch callable Iterable[str] -> ndarray(n, dim) with a little metadata. Composition wrappers — CachedEmbedder, RetryingEmbedder, MultiEmbedder, NormalizingEmbedder — each wrap an inner embedder.
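Because an embedder is just a batch callable, wrappers compose by holding an inner callable. The sketch below shows two such wrappers in the spirit of CachedEmbedder and NormalizingEmbedder. It is not their actual code, and `CountingEmbedder`, `CacheWrapper` and `NormalizeWrapper` are hypothetical names. The cache sends only unseen texts to the inner embedder. The normalizer makes every row unit-length.

```python
import numpy as np


class CountingEmbedder:
    """Toy inner embedder (hypothetical): counts how many texts it really embeds."""

    def __init__(self):
        self.embedded = 0

    def __call__(self, texts):
        texts = list(texts)
        self.embedded += len(texts)
        return np.array([[float(len(t)), 1.0] for t in texts])


class CacheWrapper:
    """Embeds only texts it has not seen; serves repeats from an in-memory dict."""

    def __init__(self, inner):
        self.inner, self.cache = inner, {}

    def __call__(self, texts):
        texts = list(texts)
        missing = list(dict.fromkeys(t for t in texts if t not in self.cache))
        if missing:
            self.cache.update(zip(missing, self.inner(missing)))
        return np.stack([self.cache[t] for t in texts]) if texts else np.zeros((0, 0))


class NormalizeWrapper:
    """L2-normalizes each row so that dot products become cosine similarities."""

    def __init__(self, inner):
        self.inner = inner

    def __call__(self, texts):
        vecs = np.asarray(self.inner(texts), dtype=float)
        norms = np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs / np.where(norms == 0, 1.0, norms)


inner = CountingEmbedder()
embed = NormalizeWrapper(CacheWrapper(inner))
first = embed(["hi", "there", "hi"])   # embeds "hi" and "there" once each
second = embed(["there", "new"])       # only "new" reaches the inner embedder
```

The result of each wrapper is again a batch callable, so wrappers can be stacked in any order.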

The heavy case — `SourceManager`

For large or changing corpora, multiple segmentations or embedders, and explicit control, use SourceManager. Configurations that share a pipeline step also share its artifacts: the indexing core is a content-addressed producer graph, so adding a second embedder or segmenter reuses everything upstream of it.

from ef import SourceManager

manager = SourceManager(corpus, store="my_vectors")
manager.ingest(segmenter="recursive", embedder="openai:text-embedding-3-small")
index = manager.searchable()
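The sharing of artifacts through a content-addressed producer graph can be illustrated with a tiny graph. This is a sketch of the idea, not ef's indexing core. `ArtifactGraph`, `run` and the placeholder "vectors" are hypothetical. Each artifact's key hashes the step name, its parameters and the upstream key. Two configurations that agree up to some step therefore compute identical keys for that step and reuse it.

```python
import hashlib
import json


class ArtifactGraph:
    """Hypothetical content-addressed store: key = sha256(step, params, upstream key)."""

    def __init__(self):
        self.artifacts = {}
        self.computed = []  # (step, params) pairs that actually ran

    def put_source(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.artifacts.setdefault(key, text)
        return key

    def produce(self, step, params, upstream, compute):
        payload = json.dumps([step, params, upstream], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self.artifacts:  # same inputs -> same key -> reuse, no recompute
            self.computed.append((step, params))
            self.artifacts[key] = compute(self.artifacts[upstream])
        return key


def run(graph, text, segmenter, embedder):
    src = graph.put_source(text)
    # toy segmenter: the segmenter name only feeds the key here
    seg = graph.produce("segment", segmenter, src, lambda t: t.split(". "))
    # placeholder "vectors": tagged strings stand in for a real embedding call
    return graph.produce("embed", embedder, seg, lambda segs: [f"{embedder}:{s}" for s in segs])


graph = ArtifactGraph()
doc = "Dogs are loyal. Cats nap."
run(graph, doc, "sentences", "model-a")
run(graph, doc, "sentences", "model-b")  # new embedder, segmentation reused
run(graph, doc, "sentences", "model-a")  # nothing new to compute
```

Editing the source text changes its hash, and the change ripples down to every dependent key. This is also what makes staleness detectable.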

Keeping an index fresh

As sources change, an index drifts. SourceManager diagnoses and repairs it:

report = manager.diagnose()        # the four staleness conditions
manager.refresh(mode="incremental")  # none | incremental | full | scoped_full
manager = SourceManager(corpus, store="my_vectors", auto_refresh=True)  # live
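The README does not list ef's four staleness conditions, so the sketch below does not try to reproduce them. It shows only the source-level drift that an incremental refresh has to handle. Content fingerprints recorded at index time are compared with the corpus as it is now. `fingerprint`, `diff_sources` and `incremental_plan` are hypothetical names.

```python
import hashlib


def fingerprint(content):
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_sources(corpus, indexed):
    """corpus: source_id -> text now; indexed: source_id -> fingerprint at index time."""
    current = {sid: fingerprint(text) for sid, text in corpus.items()}
    added = sorted(current.keys() - indexed.keys())
    removed = sorted(indexed.keys() - current.keys())
    changed = sorted(sid for sid in current.keys() & indexed.keys() if current[sid] != indexed[sid])
    return {"added": added, "changed": changed, "removed": removed}


def incremental_plan(report):
    # re-embed new and edited sources; drop vectors of edited and deleted ones
    return {"embed": report["added"] + report["changed"],
            "delete": report["changed"] + report["removed"]}
```

An incremental refresh touches only the listed sources. A full refresh would instead rebuild everything, regardless of the diff.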

RAG plug-in & evaluation

ef hands a corpus to your RAG/agent framework and measures retrieval quality — it does not synthesize answers.

segments = index.retrieve("how do neural networks learn?", limit=5)
context = "\n\n".join(s["text"] for s in segments)   # feed context to your LLM

from ef import evaluate_retrieval, evaluate_rag
retrieval = evaluate_retrieval(index.retrieve, qrels, queries)  # BEIR-shaped, NDCG@10
rag = evaluate_rag(samples)                          # deterministic lexical metrics
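The metric evaluate_retrieval reports is NDCG@10 over BEIR-shaped relevance judgments. As a hedged reference point, here is a minimal NDCG@k with linear gains and a log2 rank discount. It may differ in detail from ef's implementation. `ndcg_at_k` is a hypothetical name. Qrels map query id → {doc id: graded relevance}, and results map query id → {doc id: score}.

```python
import math


def ndcg_at_k(qrels, results, k=10):
    """Mean NDCG@k. qrels: qid -> {doc_id: relevance}; results: qid -> {doc_id: score}."""
    per_query = []
    for qid, judged in qrels.items():
        ranked = sorted(results.get(qid, {}).items(), key=lambda item: item[1], reverse=True)[:k]
        # rank is 0-based, so the first result is discounted by log2(2) = 1
        dcg = sum(judged.get(doc, 0) / math.log2(rank + 2) for rank, (doc, _) in enumerate(ranked))
        ideal = sorted(judged.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        per_query.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(per_query) / len(per_query) if per_query else 0.0
```

A single relevant document ranked first gives 1.0. The same document ranked second gives 1/log2(3), about 0.63.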

retrieve() returns plain Segments (provenance preserved in metadata["source"]); search() returns scored SearchHits. with_reranker adds a two-stage reranking pass. as_ragas_dataset bridges to Ragas for LLM-judged metrics.
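A two-stage pass such as the one with_reranker adds typically works as follows. A cheap scorer ranks everything, and a more expensive scorer re-orders only a short candidate list. The sketch below shows that pattern. It is not ef's with_reranker, and `two_stage_search`, `overlap` and `precise` are hypothetical toy scorers standing in for vector similarity and a cross-encoder.

```python
def two_stage_search(query, docs, first_stage, reranker, candidates=20, limit=5):
    """Cheap first_stage over every doc; expensive reranker over the top candidates only."""
    shortlist = sorted(range(len(docs)), key=lambda i: first_stage(query, docs[i]),
                       reverse=True)[:candidates]
    reranked = sorted(shortlist, key=lambda i: reranker(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:limit]]


def overlap(query, doc):
    # stage 1: shared-word count
    return len(set(query.lower().split()) & set(doc.lower().split()))


reranker_calls = []


def precise(query, doc):
    # stage 2: overlap per doc word, so short exact matches win
    reranker_calls.append(doc)
    return overlap(query, doc) / len(doc.split())


docs = ["loyal dogs", "dogs are loyal companions and friends", "cats nap", "neural nets"]
best = two_stage_search("loyal dogs", docs, overlap, precise, candidates=2, limit=1)
```

The reranker runs only `candidates` times, not once per document, which is what makes an expensive second stage affordable.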

Exploring a corpus (layer L5)

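This is the secondary "see the shape of the corpus" surface. It carries on ef's visualization heritage and is the backend that an app_ef corpus map consumes. It has three functions: project and cluster take a corpus or a vector matrix, and label_clusters takes segments together with their cluster labels: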

coords = ef.project(index, dims=2)          # PCA -> UMAP, 2-D coordinates
labels = ef.cluster(index, n_clusters=8)    # k-means (or method="hdbscan")
titles = ef.label_clusters(segments, labels)  # LLM-titled clusters (via imbed)

project and cluster depend only on numpy: their default paths (PCA and k-means) need no extras. method="umap", method="hdbscan" and label_clusters rely on the ef[explore] and ef[imbed] extras, which are imported lazily.
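Why the default paths need nothing beyond numpy is easy to see in a sketch. PCA is one SVD, and k-means is a short loop. The code below is a minimal illustration, not ef's project or cluster. `pca_project` and `kmeans` are hypothetical names, and the farthest-point seeding is a choice made here to keep the toy example deterministic.

```python
import numpy as np


def pca_project(vectors, dims=2):
    """Project rows onto their top principal components via SVD of the centered matrix."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T


def kmeans(vectors, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point seeding; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = [vectors[rng.integers(len(vectors))]]
    while len(centers) < n_clusters:
        nearest = np.min([np.linalg.norm(vectors - c, axis=1) for c in centers], axis=0)
        centers.append(vectors[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            vectors[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0, 0.1, (5, 8)), rng.normal(5, 0.1, (5, 8))])
coords = pca_project(blobs, dims=2)
labels = kmeans(coords, n_clusters=2)
```

UMAP and HDBSCAN capture non-linear structure and clusters of varying density that these two cannot. That is the reason they sit behind the optional extra.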

What `ef` is not

No agent loops, no tool-calling, no conversation memory, no prompt templating, no LLM answer synthesis, no bundled UI, no global config singleton. The RAG-plug-in surface is the boundary: ef returns retrieve(query) -> list[Segment]; the application (or srag / raglab / LangGraph) takes it from there.

License

MIT

Project details

Release history

0.1.18 (May 27, 2026)
0.1.17 (May 22, 2026)
0.1.16 (May 21, 2026)
0.1.15 (May 21, 2026)
0.1.14 (May 21, 2026; this version)
0.1.13 (May 21, 2026)
0.1.12 (May 21, 2026)
0.1.11 (May 20, 2026)
0.1.10 (May 20, 2026)
0.1.9 (May 20, 2026)
0.1.8 (May 20, 2026)
0.1.7 (May 20, 2026)
0.1.6 (May 20, 2026)
0.1.5 (May 20, 2026)
0.1.4 (May 20, 2026)
0.1.3 (May 20, 2026)
0.1.2 (May 19, 2026)
0.1.1 (Oct 31, 2025)
0.0.6 (Jun 15, 2025)
0.0.5 (May 17, 2025)
0.0.4 (Oct 10, 2022)
0.0.3 (Oct 4, 2022)
0.0.2 (Jan 6, 2021)

Download files

Source Distribution

ef-0.1.14.tar.gz (155.7 kB)

Uploaded May 21, 2026 Source

Built Distribution

ef-0.1.14-py3-none-any.whl (115.3 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file ef-0.1.14.tar.gz.

File metadata

Download URL: ef-0.1.14.tar.gz
Upload date: May 21, 2026
Size: 155.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.15 (uv publish, from CI on Ubuntu 24.04)

File hashes

Hashes for ef-0.1.14.tar.gz
Algorithm	Hash digest
SHA256	`dd7fe50c7c332d5a91f33166bb14c1bfb05efb003ad79aa9d6aee68d96275249`
MD5	`38b9db183a3a83764b007ecea7b98c2a`
BLAKE2b-256	`5fd64faf6254b68adef73f0f0cbaf16dbab87b74cbe4745d64de12dab6b75eee`

File details

Details for the file ef-0.1.14-py3-none-any.whl.

File metadata

Download URL: ef-0.1.14-py3-none-any.whl
Upload date: May 21, 2026
Size: 115.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.15 (uv publish, from CI on Ubuntu 24.04)

File hashes

Hashes for ef-0.1.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3531df794b0e27761294b6cf9bf75095ea218962cc5a37b9f1b6f373db2a4529`
MD5	`9737a1f87efd400b682e189f772c7bee`
BLAKE2b-256	`391940aa9c3351720ca38102edfe0324eea92a956239b7c3b953da655549c35b`
