A semantic engine that just works - offline-first semantic search for everyday laptops

JustEmbed

A semantic engine that just works.

Offline-first semantic search for everyday laptops.

⚠️ Alpha Release

This is v0.1.0a3 - Logging Improvements!

Core functionality is complete, with transparent logging and model-loading time reported separately from work time. Full release v0.1.0 coming soon!

What is JustEmbed?

JustEmbed is an offline-first semantic search library designed for everyday laptops. No cloud. No API keys. No telemetry. Just embed your documents and search.

Philosophy

One model only: e5-small (English, fast and efficient)
Offline-first: Zero network dependencies
Just works: No configuration, no choices, no surprises
Hardware-aware: Automatic limits based on your laptop, plus time limits (soft: 5s, hard: 30s) that count only actual work, not model loading
Privacy-first: Everything stays on your machine

Quick Start

```python
import justembed as je

# Load documents from a folder
result = je.load("./documents")
print(f"Found {result['files_total']} files")

# Generate embeddings (first time only)
if not result['indexed']:
    stats = je.embed()
    print(f"Embedded {stats['files_embedded']} files in {stats['time_taken']:.2f}s")

# Search semantically
results = je.search("fruits that are red in color")
for r in results:
    print(f"Score: {r['score']:.3f} | {r['file']}")
    print(f"  {r['text'][:100]}...")

# Check status
status = je.status()
print(f"Loaded: {status['loaded']}")
print(f"Chunks: {status['chunks_used']}/{status['chunks_limit']}")

# Clear query cache
je.clear_cache()

# Unload when done
je.unload()
```

Core Features

✅ Single model (e5-small.onnx - English)
✅ Offline-first (zero network dependencies)
✅ Python 3.8+ support
✅ Polars-based storage (Parquet files)
✅ Hardware-aware limits (5s soft, 30s hard - work only, excludes model loading)
✅ Query caching for fast repeated searches
✅ Simple API (5 main functions + 2 utilities)
✅ Comprehensive error handling

Installation

```bash
pip install justembed
```

Current version: v0.1.0a3 - Logging improvements with transparent timing!

API Reference

Main Functions

`load(path: str) -> dict`

Load documents from a folder or file.

```python
result = je.load("./documents")
# Returns: {"status": "loaded"|"not_indexed", "files_total": int, "indexed": bool}
```
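The docs don't say which file types `load()` picks up or how it walks a folder. As a rough mental model only, here is a stdlib sketch of a loader that accepts either a folder or a single file. The extension set `SUPPORTED_EXTENSIONS` and the function `scan_documents` are hypothetical, not part of JustEmbed.

```python
from pathlib import Path
from typing import List

# Hypothetical: file types a loader might accept (not JustEmbed's real list).
SUPPORTED_EXTENSIONS = {".txt", ".md"}


def scan_documents(path: str) -> List[Path]:
    """Return the supported files under `path`, which may be a folder or a single file."""
    root = Path(path)
    if root.is_file():
        # A single file is accepted as long as its extension is supported.
        return [root] if root.suffix.lower() in SUPPORTED_EXTENSIONS else []
    if root.is_dir():
        # Walk the folder recursively; sort for a stable, reproducible order.
        return sorted(
            p for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
        )
    # Mirrors the documented InvalidInputError case ("Invalid path or input").
    raise ValueError(f"Invalid path: {path}")
```

In this sketch, `len(scan_documents(path))` plays roughly the role of the `files_total` field that `load()` reports.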

`embed() -> dict`

Generate embeddings for loaded documents.

```python
stats = je.embed()
# Returns: {"files_embedded": int, "chunks_created": int,
#           "time_taken": float, "model_load_time": float, "total_time": float}
```
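`embed()` reports `chunks_created`, so documents are evidently split into chunks before embedding, but the chunking rule isn't documented. A common approach is a fixed-size word window with some overlap, sketched below. The function `chunk_words` and its sizes (`max_words=200`, `overlap=40`) are hypothetical, not JustEmbed's actual strategy.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of at most `max_words` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    chunks = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some words twice.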

`search(query: str, top_k: int = 5) -> list`

Search indexed documents semantically.

```python
results = je.search("red fruits", top_k=10)
# Returns: [{"score": float, "file": str, "text": str}, ...]
```
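The Status section says search ranks results by cosine similarity. A minimal pure-Python sketch of that ranking step, assuming the chunk embeddings are already computed, looks like this. `top_k_cosine` and the in-memory `(file, text, vector)` tuples are hypothetical stand-ins for JustEmbed's Parquet-backed index; only the `score`/`file`/`text` result keys come from the docs.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, str, Sequence[float]]  # (file, text, embedding): hypothetical layout


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(query_vec: Sequence[float], chunks: List[Chunk], top_k: int = 5) -> List[Dict]:
    """Return the `top_k` chunks most similar to the query, best first."""
    scored = (
        {"score": cosine(query_vec, vec), "file": file, "text": text}
        for file, text, vec in chunks
    )
    return heapq.nlargest(top_k, scored, key=lambda r: r["score"])
```

`heapq.nlargest` avoids sorting every chunk when only the best few are needed. If embeddings are L2-normalised at indexing time, cosine similarity reduces to a plain dot product.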

`status() -> dict`

Get current index status.

```python
status = je.status()
# Returns: {"loaded": bool, "path": str, "files_indexed": int,
#           "chunks_used": int, "chunks_limit": int, "query_cache_size": int}
```

`unload() -> None`

Unload current index and clear memory.

```python
je.unload()
```

Utility Functions

`clear_cache() -> None`

Clear query cache to free disk space.

```python
je.clear_cache()
```
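Query caching means a repeated query can skip re-embedding. JustEmbed's cache appears to live on disk (clearing it frees disk space) and its layout isn't documented, so this in-memory sketch only illustrates the idea. `QueryCache`, the case/whitespace normalisation of keys, and LRU eviction via `max_entries` are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, List


class QueryCache:
    """Hypothetical LRU cache mapping a normalised query to its embedding."""

    def __init__(self, embed_fn: Callable[[str], List[float]], max_entries: int = 1000):
        self._embed_fn = embed_fn
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()

    @staticmethod
    def _key(query: str) -> str:
        # Treat "Red  Fruits" and "red fruits" as the same query (an assumption).
        return " ".join(query.lower().split())

    def get(self, query: str) -> List[float]:
        key = self._key(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        vec = self._embed_fn(query)  # cache miss: compute the embedding
        self._entries[key] = vec
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return vec

    def clear(self) -> None:
        """Counterpart of clear_cache(): drop every cached embedding."""
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)
```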

`set_verbose(verbose: bool) -> None`

Enable or disable verbose logging.

```python
je.set_verbose(False)  # Disable logging
je.set_verbose(True)   # Re-enable logging
```
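How `set_verbose()` works internally isn't documented. One common pattern for a library-wide switch like this is to change the level of a named logger from Python's standard `logging` module. The logger name `"justembed_sketch"` and this `set_verbose` are illustrative assumptions, not JustEmbed's code.

```python
import logging

# Hypothetical library logger; real libraries usually call logging.getLogger(__name__).
logger = logging.getLogger("justembed_sketch")
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)  # verbose by default, matching the "Re-enable" wording above


def set_verbose(verbose: bool) -> None:
    """Show INFO-level progress messages when True; only warnings and errors when False."""
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
```

Keeping warnings visible when verbose mode is off is a design choice of this sketch; JustEmbed's `set_verbose(False)` may silence output more completely.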

Exception Classes

JustEmbedError - Base exception
NotLoadedError - No folder or file loaded
InvalidInputError - Invalid path or input
ChunkLimitError - Too many chunks for system
TimeoutError - Operation exceeded time limit
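Because `JustEmbedError` is the documented base exception, callers can catch specific errors first and fall back to the base class. The sketch below defines local stand-ins for three of the documented classes (assuming they subclass `JustEmbedError`) so it runs without the package; in real code you would use the library's own classes, whose import path isn't stated here. JustEmbed's `TimeoutError` shares its name with Python's built-in `TimeoutError`, so refer to it through the package namespace rather than the bare name. `safe_search` is a hypothetical helper.

```python
from typing import Callable, List


# Local stand-ins for documented classes (assumption: all derive from JustEmbedError).
class JustEmbedError(Exception):
    """Base exception."""


class NotLoadedError(JustEmbedError):
    """No folder or file loaded."""


class InvalidInputError(JustEmbedError):
    """Invalid path or input."""


def safe_search(search_fn: Callable[[str], List[dict]], query: str) -> List[dict]:
    """Run a search, turning JustEmbed errors into messages and an empty result."""
    try:
        return search_fn(query)
    except NotLoadedError:
        print("Nothing loaded yet: call load() first.")
    except InvalidInputError as exc:
        print(f"Bad query: {exc}")
    except JustEmbedError as exc:
        # Catch-all for the remaining library errors (ChunkLimitError, TimeoutError, ...).
        print(f"Search failed: {exc}")
    return []
```

Errors that are not `JustEmbedError` subclasses still propagate, so genuine bugs are not hidden.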

Requirements

Python 3.8+
~100MB disk space (model + dependencies)
4GB+ RAM recommended

Dependencies

onnxruntime - ONNX inference
tokenizers - Tokenization (standalone, not transformers!)
numpy - Array operations
polars - DataFrame operations
pyarrow - Parquet I/O
psutil - Hardware detection

No pandas. No transformers. No network dependencies.
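`psutil` is listed for hardware detection and `status()` reports `chunks_limit`, so the chunk cap presumably scales with the machine. The actual formula isn't documented. This back-of-the-envelope sketch assumes float32 vectors of e5-small's 384 dimensions (384 × 4 = 1,536 bytes each), a guessed average of 1,000 bytes of stored text per chunk, and a hypothetical 10% share of available RAM. `estimate_chunk_limit` and all three constants are illustrative; in practice the memory figure could come from `psutil.virtual_memory().available`.

```python
EMBEDDING_DIM = 384      # e5-small output size (from the docs)
BYTES_PER_FLOAT = 4      # assumption: float32 storage
AVG_TEXT_BYTES = 1_000   # hypothetical average chunk text size
RAM_FRACTION = 0.10      # hypothetical share of free RAM the index may use


def estimate_chunk_limit(available_bytes: int) -> int:
    """Rough upper bound on how many chunks fit in the memory budget."""
    per_chunk = EMBEDDING_DIM * BYTES_PER_FLOAT + AVG_TEXT_BYTES  # 2,536 bytes
    budget = int(available_bytes * RAM_FRACTION)
    return budget // per_chunk
```

With 4 GiB available this works out to 169,359 chunks; a real limit would also have to account for the ONNX model and tokenizer already held in memory.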

Roadmap

v0.1.0a1 (December 2025) - Name Reservation

✅ Package name locked on PyPI
✅ Basic structure
✅ Placeholder functions

v0.1.0a2 (January 2026) - Working Implementation

✅ Full implementation complete
✅ All core functions working
✅ Property-based tests
✅ Hardware-aware limits
✅ Query caching
✅ Comprehensive error handling

v0.1.0a3 (January 2026) - Logging Improvements

✅ Transparent logging system
✅ Separate model loading time from work time
✅ Timeout limits exclude model loading
✅ New API: set_verbose(True/False)
✅ Enhanced return values with timing details
✅ Better UX for Jupyter users

v0.1.0 (February 2026) - First Stable Release

⏳ Production testing
⏳ Performance optimization
⏳ Complete documentation
⏳ Example projects

v0.2.0 (Future)

⏳ Multilingual model support (100+ languages)
⏳ Advanced search filters
⏳ Batch operations API
⏳ Progress callbacks

Why "JustEmbed"?

Because that's all you need to do:

Just embed your documents
Just search with natural language
Just works - no configuration needed

Design Decisions

One Model Only

We use e5-small.onnx (384 dimensions, English). It is fast, efficient, and small enough to fit within PyPI's default 100 MB per-file upload limit. Multilingual support is coming in v0.2.0.
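E5 models are normally used by mean-pooling the final-layer token vectors (ignoring padding via the attention mask) and then L2-normalising the result; the E5 model cards also recommend prefixing inputs with "query: " or "passage: ". Whether JustEmbed follows exactly these steps is an assumption. The sketch shows only the pooling step, in pure Python on toy 2-dimensional vectors rather than real 384-dimensional ONNX output; `mean_pool_normalize` is a hypothetical name.

```python
import math
from typing import List, Sequence


def mean_pool_normalize(
    token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the non-padding token vectors, then scale the result to unit length."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:  # padding tokens have mask 0 and are skipped
            summed = [s + v for s, v in zip(summed, vec)]
            count += 1
    if count == 0:
        raise ValueError("attention mask has no real tokens")
    mean = [s / count for s in summed]
    norm = math.sqrt(sum(m * m for m in mean))
    return [m / norm for m in mean] if norm else mean
```

Because every pooled vector comes out unit-length, the cosine similarity between two of them equals their dot product.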

Offline-First

Zero network dependencies. Everything runs locally. No telemetry. No surprises.

Hardware-Aware

Automatic limits based on your laptop's capabilities. Soft limit: 5s. Hard limit: 30s. These limits apply only to actual work (embedding/search), not model loading.
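The docs only say that the 5s soft and 30s hard limits count actual work and exclude model loading; what each limit does when crossed is not stated. This sketch assumes the soft limit emits a warning and the hard limit raises `HardLimitExceeded` (a hypothetical stand-in for the documented `TimeoutError`). It times model loading and work separately with `time.perf_counter`, echoing the `model_load_time` and `time_taken` fields that `embed()` returns, and assumes work runs in batches so the deadline can be checked between them. `timed_work` and its injectable `clock` are illustrative names, not JustEmbed's API.

```python
import time
import warnings
from typing import Callable, Dict, Iterable

SOFT_LIMIT_S = 5.0   # documented soft limit
HARD_LIMIT_S = 30.0  # documented hard limit


class HardLimitExceeded(Exception):
    """Hypothetical stand-in for the documented TimeoutError."""


def timed_work(
    load_model: Callable[[], None],
    batches: Iterable[Callable[[], None]],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Load the model, then run work batches, checking limits against work time only."""
    t0 = clock()
    load_model()
    model_load_time = clock() - t0

    work_start = clock()  # the limit clock starts only once the model is ready
    warned = False
    for batch in batches:
        batch()
        elapsed = clock() - work_start
        if elapsed > HARD_LIMIT_S:
            raise HardLimitExceeded(f"work took {elapsed:.1f}s (> {HARD_LIMIT_S}s)")
        if elapsed > SOFT_LIMIT_S and not warned:
            warnings.warn(f"work passed the {SOFT_LIMIT_S}s soft limit")
            warned = True
    time_taken = clock() - work_start
    return {
        "model_load_time": model_load_time,
        "time_taken": time_taken,
        "total_time": model_load_time + time_taken,  # assumption: total = load + work
    }
```

The injectable `clock` lets the logic be exercised with a fake timer instead of real sleeps. Because the check runs between batches, a single slow batch can overshoot the hard limit before it is detected.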

Polars, Not Pandas

We use Polars for speed and efficiency. No pandas dependency.

Tokenizers, Not Transformers

We use the standalone tokenizers library (3MB) instead of transformers (40MB). 93% smaller!

Target Users

Non-ML engineers learning AI for the first time
Business users in paranoid/restricted environments
Developers who need offline semantic search
Anyone who wants a safe sandbox to experiment

License

MIT License - see LICENSE file for details.

Author

Krishnamoorthy Sankaran

Status

✅ Core Functionality Complete! ✅

v0.1.0a3 includes:

✅ Document loading and scanning
✅ Embedding generation with ONNX
✅ Semantic search with cosine similarity
✅ Query caching for performance
✅ Status monitoring and management
✅ Hardware-aware resource limits
✅ Comprehensive error handling
✅ Property-based testing
✅ Transparent logging with timing details
✅ Separate model loading time tracking
✅ Verbose mode control

Ready for testing and feedback! Full v0.1.0 release coming soon.

JustEmbed - A semantic engine that just works.

Release history

0.1.1a9 (pre-release) - Feb 23, 2026
0.1.1a8 (pre-release) - Feb 23, 2026
0.1.1a7 (pre-release) - Feb 16, 2026
0.1.1a6 (pre-release) - Feb 16, 2026
0.1.1a5 (pre-release) - Feb 15, 2026
0.1.1a4 (pre-release) - Feb 15, 2026
0.1.1a3 (pre-release) - Feb 15, 2026
0.1.1a2 (pre-release) - Feb 15, 2026
0.1.1a1 (pre-release) - Feb 14, 2026
0.1.0a6 (pre-release) - Jan 30, 2026
0.1.0a5 (pre-release) - Jan 29, 2026
0.1.0a3 (pre-release, this version) - Jan 28, 2026
0.1.0a2 (pre-release) - Jan 28, 2026
0.1.0a1 (pre-release) - Jan 28, 2026

Download files

Source Distribution

justembed-0.1.0a3.tar.gz (78.7 MB)

Uploaded Jan 28, 2026 Source

Built Distribution

justembed-0.1.0a3-py3-none-any.whl (78.7 MB)

Uploaded Jan 28, 2026 Python 3

File details

Details for the file justembed-0.1.0a3.tar.gz.

File metadata

Download URL: justembed-0.1.0a3.tar.gz
Upload date: Jan 28, 2026
Size: 78.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for justembed-0.1.0a3.tar.gz
Algorithm	Hash digest
SHA256	`b91fd74ef9126727d99e4a01bf359b2e6339506520ab58ddf13c1480745c7aa2`
MD5	`91f9c41583ff683fe69f50c3e9da05af`
BLAKE2b-256	`08e12f5e9d409facea645a25136bede3e335637db2d1b378fb9080a2d5b1e129`

File details

Details for the file justembed-0.1.0a3-py3-none-any.whl.

File metadata

Download URL: justembed-0.1.0a3-py3-none-any.whl
Upload date: Jan 28, 2026
Size: 78.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for justembed-0.1.0a3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c7d1aae213329efbffc78912e76090a4032a11a6c59f07b092f1b118e51dd8bf`
MD5	`4501a028c5ec1da04c0e760177fe1076`
BLAKE2b-256	`57b199288800d5af37d640d09e21ba5f574f55c2b6d277f90ae1cb2d0feff5ec`
