JustEmbed

Your First Step Into Semantic Search

Experience embeddings hands-on. No cloud accounts, no setup complexity, no commitment. Just your laptop and your curiosity.

Author: Krishnamoorthy Sankaran
Email: krishnamoorthy.sankaran@sekrad.org
GitHub: https://github.com/sekarkrishna/justembed
PyPI: https://pypi.org/project/justembed/

What is JustEmbed?

JustEmbed is a focused tool for semantic search: finding text by meaning rather than by exact keyword match. It's designed as your entry point into the embedding ecosystem, letting you see how semantic search works before committing to cloud platforms or production tools.

For Non-Technical Users

Upload your documents through a web interface and search by meaning. No coding required, no technical knowledge needed. See exactly how your text is processed and understand what's happening at each step.

For Developers

A simple Python API (import justembed as je) that lets you experiment with embeddings locally. Build confidence with semantic search concepts before moving to production vector databases.

Quick Start

Installation

pip install justembed

Web Interface

justembed begin --workspace ~/my_documents

Open http://localhost:5424 in your browser.

Python API

import justembed as je

je.begin(workspace="~/docs")
je.create_kb("my_kb")
je.add(kb="my_kb", file="document.txt")
results = je.query("search term", kb="my_kb")

Understanding Semantic Search

Traditional keyword search looks for exact word matches. Semantic search understands meaning.

Example: Imagine a document with these four paragraphs:

1. "Volcanoes erupt with molten lava at temperatures exceeding 1000°C..."
2. "Industrial smelting uses high-temperature furnaces above 800°C..."
3. "Igloos are dome-shaped shelters built from compressed snow..."
4. "Icebergs float in cold ocean waters at sub-zero temperatures..."

Search for "hot":

Traditional search: No results (word "hot" doesn't appear)
Semantic search: Returns paragraphs 1 & 2 (understands heat/temperature relationship)

This is what JustEmbed lets you experience.
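
To make the keyword side of this comparison concrete, here is a minimal sketch of whole-word matching over the four example paragraphs. The `keyword_search` helper is hypothetical and is not part of JustEmbed; it only illustrates why a literal search for "hot" comes back empty.

```python
import re

paragraphs = [
    "Volcanoes erupt with molten lava at temperatures exceeding 1000°C...",
    "Industrial smelting uses high-temperature furnaces above 800°C...",
    "Igloos are dome-shaped shelters built from compressed snow...",
    "Icebergs float in cold ocean waters at sub-zero temperatures...",
]

def keyword_search(query, docs):
    """Return the 1-based positions of docs that contain the query as a whole word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [i for i, doc in enumerate(docs, start=1) if pattern.search(doc)]

print(keyword_search("hot", paragraphs))           # []
print(keyword_search("temperatures", paragraphs))  # [1, 4]
```

Note that the literal search for "temperatures" matches both the volcano and the iceberg paragraph: keyword matching sees the shared word, not whether the text is about heat or cold.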

Core Concepts

1. Chunking

Documents are broken into smaller pieces (chunks) for efficient searching. JustEmbed's UI shows you exactly how your text will be chunked before processing.
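
As a rough illustration of rule-based chunking, the sketch below splits text on blank lines, merges short paragraphs into the previous chunk, and cuts oversized paragraphs into fixed windows. It is not JustEmbed's actual algorithm: the `chunk_text` function is hypothetical, whitespace-separated words stand in for tokens, and the parameter names are borrowed from the `je.add` options only for familiarity.

```python
def chunk_text(text, max_tokens=300, merge_threshold=50):
    """Split text into paragraph-based chunks (words stand in for tokens)."""
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for words in paragraphs:
        # Merge a short paragraph into the previous chunk when the result still fits.
        fits = chunks and len(chunks[-1]) + len(words) <= max_tokens
        if fits and len(words) < merge_threshold:
            chunks[-1] = chunks[-1] + words
            continue
        # Cut an oversized paragraph into consecutive windows of max_tokens words.
        for start in range(0, len(words), max_tokens):
            chunks.append(words[start:start + max_tokens])
    return [" ".join(words) for words in chunks]

sample = "one two three\n\nfour five\n\n" + " ".join(["w"] * 7)
print(chunk_text(sample, max_tokens=5, merge_threshold=3))
# ['one two three four five', 'w w w w w', 'w w']
```

Because the rules involve no randomness, running the function twice on the same text gives the same chunks.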

2. Embedding

Each chunk is converted to a list of numbers (an embedding) that represents its meaning. Chunks with similar meanings get embeddings that lie close together.

3. Searching

When you search, your query is converted to an embedding and compared to all chunk embeddings. Results are ranked by similarity (0.0-1.0 score).
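
The ranking step can be sketched with cosine similarity, a common way to compare embeddings. The source does not say which similarity measure JustEmbed uses, so treat this as an assumption. The three-dimensional vectors below are hand-made for illustration; a real model produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings (not produced by any model).
chunk_embeddings = {
    "volcano": [0.9, 0.1, 0.2],
    "smelting": [0.8, 0.2, 0.1],
    "igloo": [0.1, 0.9, 0.3],
}
query_embedding = [1.0, 0.0, 0.1]  # stands in for the embedding of "hot"

def rank(query_vec, embeddings, top_k=2):
    """Score every chunk against the query and return the top_k best matches."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

print([name for name, _ in rank(query_embedding, chunk_embeddings)])
# ['volcano', 'smelting']
```

With non-negative vectors like these, cosine similarity falls between 0.0 and 1.0, which matches the score range described above.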

Complete API Reference

Workspace Management

import justembed as je

# Start workspace
je.begin(workspace="~/my_docs", port=5424)

# Register existing workspace
je.register_workspace("~/shared_workspace")

# List workspaces
workspaces = je.list_workspaces()

# Deregister (data stays on disk)
je.deregister_workspace("~/old_workspace", confirm=True)

# Stop server
je.terminate()

Knowledge Bases

# Create with default model
je.create_kb("general_kb")

# Create with custom model
je.create_kb("medical_kb", model_type="custom", model_name="medical_v1")

# List all KBs
kbs = je.list_kbs()

# Delete KB
je.delete_kb("old_kb")

Adding Documents

# From file
je.add(kb="my_kb", file="document.txt")

# From text
je.add(kb="my_kb", text="Your content...", filename="custom.txt")

# With chunking options
je.add(
    kb="my_kb",
    file="document.txt",
    max_tokens=300,
    merge_threshold=50,
    split_by_headings=True,
    split_by_paragraphs=True
)

Searching

# Basic search
results = je.query("search term", kb="my_kb")

# Search all KBs
results = je.query("search term", kb="all")

# Advanced options
results = je.query(
    query="search term",
    kb="my_kb",
    top_k=10,
    mode="retrieve"  # or "count"
)

# Results structure
for result in results:
    print(f"Score: {result['score']:.3f}")
    print(f"Text: {result['text']}")
    print(f"File: {result['file']}")
    print(f"KB: {result['kb']}")
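
Since each result is a dictionary with `score`, `text`, `file`, and `kb` keys, ordinary Python is enough for post-processing. The sketch below filters by a minimum score and groups the matching texts by file. The `group_by_file` helper and the mock data are hypothetical; the data only mimics the structure shown above and was not returned by JustEmbed.

```python
from collections import defaultdict

# Mock results shaped like the documented structure; all values are made up.
results = [
    {"score": 0.82, "text": "Volcanoes erupt with molten lava...", "file": "geology.txt", "kb": "my_kb"},
    {"score": 0.74, "text": "Industrial smelting uses furnaces...", "file": "industry.txt", "kb": "my_kb"},
    {"score": 0.31, "text": "Igloos are dome-shaped shelters...", "file": "geology.txt", "kb": "my_kb"},
]

def group_by_file(results, min_score=0.5):
    """Keep results at or above min_score and group their texts by source file."""
    grouped = defaultdict(list)
    for result in results:
        if result["score"] >= min_score:
            grouped[result["file"]].append(result["text"])
    return dict(grouped)

for filename, texts in group_by_file(results).items():
    print(filename, len(texts))
```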

Custom Model Training

# Train from file
je.train_model(
    model_name="medical_v1",
    file="medical_textbook.txt",
    embedding_dim=128,
    max_features=5000
)

# Train from text
je.train_model(
    model_name="legal_v1",
    text="Your training corpus...",
    embedding_dim=128
)

# List models
models = je.list_models()

Key Features

Domain-Specific Models

Train models that understand your domain's vocabulary:

# Medical domain
medical_text = """
Pyrexia, commonly known as fever, is elevated body temperature.
Renal function refers to kidney performance.
A UTI affects the bladder and kidneys.
"""

je.train_model("medical_v1", text=medical_text)
je.create_kb("medical_kb", model_type="custom", model_name="medical_v1")

# Now "fever" finds "pyrexia", "kidney" finds "renal"

Multiple Knowledge Bases

Organize by topic, each with its own model:

je.create_kb("medical_kb", model_type="custom", model_name="medical_v1")
je.create_kb("legal_kb", model_type="custom", model_name="legal_v1")
je.create_kb("general_kb")  # Uses default E5-Small model

Workspace Sharing

Share by zipping the workspace folder:

# Create and populate
je.begin(workspace="~/shared_kb")
je.create_kb("team_kb")
je.add(kb="team_kb", file="docs.txt")

# Zip ~/shared_kb and share

# Recipient registers and uses
je.register_workspace("~/received_kb")
je.begin(workspace="~/received_kb")
results = je.query("search", kb="team_kb")
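
The zip step itself can be done with Python's standard library. This is a minimal sketch that uses a throwaway temporary folder in place of a real workspace; the `pack_workspace` and `unpack_workspace` helpers are hypothetical and not part of JustEmbed.

```python
import shutil
import tempfile
from pathlib import Path

def pack_workspace(workspace_dir, archive_base):
    """Zip the contents of a workspace folder; returns the path of the .zip file."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(workspace_dir))

def unpack_workspace(archive_path, target_dir):
    """Extract a workspace archive into target_dir."""
    shutil.unpack_archive(str(archive_path), extract_dir=str(target_dir))

# Demonstration on a temporary directory instead of a real workspace.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "shared_kb"
    workspace.mkdir()
    (workspace / "docs.txt").write_text("example content")

    archive = pack_workspace(workspace, Path(tmp) / "shared_kb_export")
    received = Path(tmp) / "received_kb"
    unpack_workspace(archive, received)

    extracted = sorted(p.name for p in received.iterdir())
    print(extracted)  # ['docs.txt']
```

The recipient would then pass the extracted folder to `je.register_workspace`, as in the example above.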

Architecture

User Interface (Web UI / Python API)
           ↓
    FastAPI Server
           ↓
Embedder Layer (E5-Small / Custom Models)
           ↓
Storage Layer (DuckDB / File System)

Design Decisions

Offline-First: Everything runs locally. No API keys, no cloud dependencies, no internet after installation.

ONNX Models: Portable, CPU-friendly, small size (~8-15 MB). Works on any platform.

DuckDB Storage: Embedded database, no separate server. Fast columnar storage.

Deterministic Chunking: Rule-based, predictable. Same input always produces same chunks.

Privacy: Your data never leaves your machine. No telemetry, no tracking.

Understanding Limitations

Context Matters

Example: "The igloo was decorated with fireworks for the winter celebration."

Searching for "hot" might return this (score: 0.5-0.6) because "fireworks" associates with heat.

What this reveals: Embeddings capture word associations, not deep understanding. Production systems use larger context windows, attention mechanisms, and re-ranking.

Domain Specificity

Without domain training, a word like "fever" scores similarly whether it appears in a medical or a financial context. Custom models learn domain-specific meanings.

What this shows: Why domain-specific training matters. Production systems use massive pre-training and fine-tuning.

No Generation

JustEmbed finds similar text. It doesn't generate new text, answer questions, or summarize.

What this demonstrates: Embeddings are one component. Full LLMs combine embeddings, generation, reasoning, memory, and tools.

Scale

Designed for 1-1000 documents, 1-10 queries/second, single user.

What this illustrates: Production systems handle millions of documents, thousands of queries/second, concurrent users.

The Complete Picture

JustEmbed focuses on the embedding layer, the foundation of semantic search. As a rough estimate, this is about 2-3% of what full LLM systems provide.

What JustEmbed Covers

Text chunking
Embedding generation
Vector similarity search
Basic model training

What Production Systems Add

Massive pre-training (billions of parameters)
Text generation
Reasoning and inference
Long context windows (100K+ tokens)
Memory and conversation history
Safety and alignment
Optimization (quantization, distillation)
Distributed infrastructure
Tool integration
Multimodal understanding

After experiencing JustEmbed, you'll appreciate the engineering behind systems like GPT-4, Claude, or Gemini.

JustEmbed vs Production Tools

Feature	JustEmbed	Vector DBs	Full LLMs
Purpose	Learn embeddings	Production search	Complete AI
Setup	`pip install`	Cloud account	API keys
Cost	Free	$70-500/mo	$0.002-0.06/1K tokens
Scale	1-1K docs	Millions	Unlimited
Speed	<100ms	<10ms	<1s
Offline	✅ Yes	❌ No	❌ No
Privacy	✅ Local	⚠️ Cloud	⚠️ Cloud
Learning	Gentle	Moderate	Steep
Generation	❌ No	❌ No	✅ Yes

When to Use What

Use JustEmbed:

Learning about embeddings
Small collections (up to 1000 docs)
Privacy-critical applications
Offline environments
Quick prototypes
Building confidence

Graduate to Vector DBs:

Scaling beyond 1000 docs
Production reliability
Sub-10ms latency
Team collaboration
Advanced features

Move to Full LLMs:

Need text generation
Require reasoning
Conversational AI
Multi-modal applications

Requirements

Python 3.8+
500 MB disk space
1 GB RAM
CPU (no GPU required)
No internet (after installation)

Guarantees

Technical:

Deterministic (same input → same output)
No hallucinations (only returns your text)
Offline (works without internet)
Private (data never leaves your machine)
No tracking or telemetry

File System:

Writes only to workspace and ~/.cache/justembed/
Reads only files you upload
Never deletes files outside workspace

License

MIT License

Author

Krishnamoorthy Sankaran

Email: krishnamoorthy.sankaran@sekrad.org
GitHub: https://github.com/sekarkrishna/justembed
PyPI: https://pypi.org/project/justembed/

Support

Issues: https://github.com/sekarkrishna/justembed/issues
Discussions: https://github.com/sekarkrishna/justembed/discussions
Email: krishnamoorthy.sankaran@sekrad.org

Citation

@software{justembed2026,
  title = {JustEmbed: Your First Step Into Semantic Search},
  author = {Sankaran, Krishnamoorthy},
  year = {2026},
  url = {https://github.com/sekarkrishna/justembed}
}

Acknowledgments

E5-Small model: Microsoft Research
ONNX Runtime: Microsoft
FastAPI: Sebastián Ramírez
DuckDB: DuckDB Labs
scikit-learn: scikit-learn developers

JustEmbed - Start here. Build confidence. Graduate to production tools when ready.
