Embs is a Python toolkit for retrieving documents (via Docsifer), generating embeddings (via Lightweight Embeddings API), and ranking texts with an optional caching system.
embs
embs is a Python toolkit that combines:
- Document retrieval via Docsifer
- Embeddings generation with Lightweight Embeddings API
- Text ranking (reranking) based on query relevance
- Optional caching (in-memory LRU or disk) for performance and scalability
It provides both asynchronous (asyncio) and synchronous methods. If you need to ingest documents (files, URLs), convert them to text/markdown, embed them, and then sort by relevance, embs simplifies these tasks in a single package.
Note: This library references external services for Docsifer (for document conversion) and Lightweight Embeddings (for text and image embeddings). Ensure you have valid endpoints or deploy your own versions.
Why Use embs?
- Unified Pipeline: Manage document retrieval and text conversion from PDF, URLs, images, etc., then generate embeddings and rank them—all with a single API.
- Async + Sync: Choose the style that fits your application. The library uses aiohttp internally but also offers synchronous wrappers via asyncio.run().
- Caching: Supports in-memory LRU or disk-based caching with optional time-to-live (TTL) eviction to avoid repeated network calls and save resources.
- Lightweight: No heavy dependencies besides aiohttp for async requests. Minimal overhead.
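The async-plus-sync design described above can be sketched in a few lines. This is a minimal illustration of wrapping a coroutine with asyncio.run(), not the embs source code: fetch_async and fetch are hypothetical names, and the network call is simulated with asyncio.sleep.

```python
import asyncio


async def fetch_async(text: str) -> dict:
    """Hypothetical async operation; a real client would await an HTTP call here."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"text": text, "length": len(text)}


def fetch(text: str) -> dict:
    """Synchronous wrapper: start an event loop, run the coroutine, return its result."""
    return asyncio.run(fetch_async(text))


result = fetch("Hello world")
print(result)
```

Note that asyncio.run() raises a RuntimeError when called from a thread that already has a running event loop, so wrappers of this kind are meant for plain synchronous code; inside async code, await the coroutine directly.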
Installation
Install embs via pip:
pip install embs
Or add it to your pyproject.toml dependencies (if using Poetry):
[tool.poetry.dependencies]
embs = "^0.1.0"
Quick Start
1. Basic Document Retrieval
import asyncio
from embs import Embs

async def main():
    client = Embs()
    documents = await client.retrieve_documents_async(
        files=["/path/to/local/file.pdf"],
        urls=["https://example.com"]
    )
    print(documents)
    # => [{"filename": "file.pdf", "markdown": "...converted text..."}, {"filename": "example.com", "markdown": "..."}]

asyncio.run(main())
2. Generate Embeddings
import asyncio
from embs import Embs

async def main():
    client = Embs()
    embedding_result = await client.embed_async("Hello world")
    print(embedding_result)
    # => {"object": "list", "data": [{"object": "embedding", "index": 0, "embedding": [...] }], "model": "...", "usage": {...}}

asyncio.run(main())
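To work with the response shown above, you usually want just the vectors. The helper below is a small sketch that assumes the response has exactly the shape printed in the comment ("data" entries with "index" and "embedding"); extract_vectors is a hypothetical name, not part of embs, and the sample numbers are made up.

```python
from typing import Any


def extract_vectors(response: dict[str, Any]) -> list[list[float]]:
    """Return the embedding vectors, ordered by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample in the shape shown above; the values are illustrative only.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}

vectors = extract_vectors(sample)
print(vectors)
```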
3. Rank Documents
import asyncio
from embs import Embs

async def main():
    client = Embs()
    ranked = await client.rank_async(
        "What is AI?",
        ["AI is about learning", "AI stands for artificial intelligence"]
    )
    print(ranked)
    # => [{"text": "...", "probability": 0.9, "cosine_similarity": 0.85}, ...]

asyncio.run(main())
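The ranking output above carries a cosine_similarity and a probability per text. This README does not say how those values are computed, so the following is only one plausible sketch: cosine similarity between a query vector and each candidate vector, turned into probabilities with a softmax. The softmax step, the function names, and the toy vectors are all assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: list[float]) -> list[float]:
    """Convert scores to probabilities that sum to 1 (shifted for numeric stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank(query_vec: list[float], candidates: list[tuple[str, list[float]]]) -> list[dict]:
    """Score (text, vector) pairs against the query and sort best-first."""
    if not candidates:
        return []
    sims = [cosine_similarity(query_vec, vec) for _, vec in candidates]
    probs = softmax(sims)
    rows = [
        {"text": text, "probability": p, "cosine_similarity": s}
        for (text, _), p, s in zip(candidates, probs, sims)
    ]
    return sorted(rows, key=lambda row: row["probability"], reverse=True)


# Toy 2-D vectors, not real embeddings.
ranked = rank([1.0, 0.0], [("B", [0.0, 1.0]), ("A", [1.0, 0.0])])
print([row["text"] for row in ranked])
```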
4. Integrated Workflow: search_documents_async
import asyncio
from embs import Embs

async def main():
    client = Embs()
    results = await client.search_documents_async(
        query="Explain quantum computing",
        files=["/path/to/local/quantum.pdf"],
        urls=["https://example.com/quantum.html"]
    )
    for item in results:
        print(item["filename"], item["probability"], item["markdown"][:100])  # partial content

asyncio.run(main())
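Once search results come back, a common next step is to keep only the strongest matches. The sketch below assumes each result is a dict with the "filename", "probability", and "markdown" keys used in the loop above; top_results, its parameters, and the sample data are hypothetical and not part of embs.

```python
def top_results(results: list[dict], min_probability: float = 0.0, limit: int = 3) -> list[dict]:
    """Keep results at or above a probability threshold, best first, capped at `limit`."""
    kept = [r for r in results if r["probability"] >= min_probability]
    kept.sort(key=lambda r: r["probability"], reverse=True)
    return kept[:limit]


# Made-up results in the shape used above.
sample_results = [
    {"filename": "notes.txt", "probability": 0.05, "markdown": "Unrelated notes"},
    {"filename": "quantum.pdf", "probability": 0.70, "markdown": "Qubits and gates"},
    {"filename": "quantum.html", "probability": 0.25, "markdown": "Intro to quantum"},
]

best = top_results(sample_results, min_probability=0.2, limit=2)
print([r["filename"] for r in best])
```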
Using the Cache
Enable caching by specifying a cache_config:
from embs import Embs

cache_conf = {
    "enabled": True,
    "type": "memory",          # "memory" or "disk"
    "prefix": "myapp",         # optional prefix for cache keys
    "dir": "cache_folder",     # only needed if type="disk"
    "max_mem_items": 128,      # max items for LRU in memory
    "max_ttl_seconds": 86400   # 1-day TTL
}

client = Embs(cache_config=cache_conf)
- Memory Caching: Uses an LRU approach; the least recently used items are removed once the cache exceeds max_mem_items.
- Disk Caching: Stores .json files with a timestamp. Items older than max_ttl_seconds are discarded upon next read.
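To make the eviction behavior concrete, here is a minimal sketch of an in-memory LRU cache with a TTL check on read. It illustrates the general technique only and is not the embs implementation; the class name LruTtlCache, its parameters, and the injectable clock are assumptions made for the example.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Tiny LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_items=128, ttl_seconds=86400.0, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (stored_at, value), oldest first
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[key]  # expired: discard on read
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._items)


cache = LruTtlCache(max_items=2, ttl_seconds=60)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a" so "b" becomes the least recently used
cache.set("c", 3)      # exceeds max_items, so "b" is evicted
print(cache.get("b"))  # None
```

Passing the clock in as a parameter is a design choice that lets expiry be tested without sleeping.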
Testing
embs includes test suites that rely on pytest and pytest-asyncio to verify:
- Document retrieval with Docsifer
- Embeddings calls with Lightweight Embeddings
- Ranking results
- Caching behaviors (in-memory and on disk)
Run tests with:
pytest --asyncio-mode=auto
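With --asyncio-mode=auto, pytest-asyncio collects async def test functions without an explicit marker. The sketch below shows what such a test could look like; it is not taken from the embs test suite. It replaces the client with unittest.mock.AsyncMock so no network call is made, and the canned return value is made up.

```python
import asyncio
from unittest.mock import AsyncMock


async def test_rank_orders_by_probability():
    # AsyncMock stands in for a real client; rank_async returns canned data.
    client = AsyncMock()
    client.rank_async.return_value = [
        {"text": "AI stands for artificial intelligence", "probability": 0.9},
        {"text": "AI is about learning", "probability": 0.1},
    ]

    ranked = await client.rank_async("What is AI?", ["..."])

    # Results are expected best-first.
    probabilities = [row["probability"] for row in ranked]
    assert probabilities == sorted(probabilities, reverse=True)
    client.rank_async.assert_awaited_once()


if __name__ == "__main__":
    # Allows running the check directly, without pytest.
    asyncio.run(test_rank_orders_by_probability())
```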
License
This project is licensed under the MIT License. See LICENSE for details.