Lightweight semantic search engine for text files. Zero dependencies, JSONL storage.
Cicada Vector 🦗
A lightweight semantic search engine for text files.
Cicada Vector is a simple, zero-dependency semantic search engine and RAG database. Search any text content semantically - code, documentation, commits, or custom data.
Why does this exist?
The original Cicada is powerful because it deeply understands code structure (SCIP, ASTs). However, that power often requires heavy dependencies and longer setup times.
Cicada Vector takes a complementary path: it focuses on semantic awareness for any text content. By combining local LLM embeddings (via Ollama) with a hybrid vector-plus-keyword database, it provides both semantic and exact-match search with a minimal footprint.
Features
- Lightweight: Minimal Python codebase. Zero dependencies (Standard Library only) for the core engine.
- Instant Install: No waiting for heavy ML libraries to compile.
- Semantic Intelligence: Understands intent. Searching for "auth" finds login logic, even if the word "auth" isn't present.
- Hybrid Search: Combines Vector semantic search with Keyword exact matching. Won't miss exact terms while understanding meaning.
- Simple RAG: A "Search Broad -> Scan Specific" pipeline that pinpoints relevant content snippets.
- Universal: Works on code, docs, commits, configs - any text content.
- MCP Ready: Built-in Model Context Protocol server for immediate use with AI assistants.
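The Hybrid Search feature above does not say how the vector ranking and the keyword ranking are merged. One widely used option is reciprocal rank fusion (RRF). The sketch below illustrates that general technique only; it is not Cicada Vector's actual scoring, and the function name `rrf_merge` and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Illustrative only; not Cicada Vector's scoring.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["auth.py", "session.py", "user.py"]   # semantic ranking
keyword_hits = ["user.py", "auth.py"]                # exact-term ranking
merged = rrf_merge([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in merged])
```

Documents that appear in both lists rise to the top, which is the behaviour the feature list describes: exact terms are not missed, and semantically close documents still rank.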
Database Classes
Cicada Vector provides four database classes:
- VectorDB: Pure semantic vector search
- KeywordDB: Traditional keyword-based search
- HybridDB: Combines vector + keyword search (Recommended)
- RagDB: RAG database for file-based search with line numbers
Tools
1. cigrep (Semantic File Search)
Zero-config semantic search for any text files - code, docs, configs, anything.
cigrep "how do I handle authentication" # Search code
cigrep "installation steps" docs/ # Search docs
cigrep "database config" . # Search everything
Automatically indexes changed files in the background and searches instantly.
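How cigrep decides which files have changed is not documented here. A minimal sketch of one way to do it, assuming a content-hash comparison against the previous run (the helper `changed_files` and the `seen` mapping are hypothetical, not part of cigrep):

```python
import hashlib
import os
import tempfile

def changed_files(root, seen):
    """Return paths under root whose content hash differs from `seen`.

    `seen` maps path -> SHA-256 hex digest from the previous run and is
    updated in place. Hypothetical helper, not part of cigrep.
    """
    changed = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if seen.get(path) != digest:
                seen[path] = digest
                changed.append(path)
    return changed

# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as root:
    doc = os.path.join(root, "notes.txt")
    with open(doc, "w") as f:
        f.write("first draft")
    seen = {}
    first_run = changed_files(root, seen)    # new file: needs indexing
    second_run = changed_files(root, seen)   # unchanged: nothing to do
    with open(doc, "w") as f:
        f.write("second draft")
    third_run = changed_files(root, seen)    # edited: needs re-indexing
```

Only the files returned by such a check would need fresh embeddings, which keeps repeat searches fast.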
2. cilog (Semantic Git Commit Search)
Search your git commit history semantically.
cilog "authentication bug fix"
cilog "refactor API" --limit 500
cilog "performance improvements" --since "1 month ago"
Indexes commit messages for fast semantic search. Use --no-diff (recommended) for faster indexing.
MCP Server
AI assistants can use your local knowledge base directly:
pip install 'cicada-vector[server]'
export CICADA_HYBRID_DIR=./my_db
cicada-vec-server
Tools:
- search_vectors: Pure semantic search.
- search_hybrid: Vector + Keyword search (Recommended).
- search_code_context: RAG search returning file snippets with line numbers.
- index_directory: Incrementally index a local directory into the database.
Configuration:
If using uv or uvx, ensure you include the [server] extra:
uv tool install "cicada-vector[server]"
For manual configuration (e.g., in Claude Desktop or Gemini), set the command to:
uvx --from "cicada-vector[server]" cicada-vec-server
And set the environment variable CICADA_HYBRID_DIR to your database path.
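For the manual configuration described above, the client entry might look like the output of the snippet below. It assumes the `mcpServers` layout that Claude Desktop reads from `claude_desktop_config.json`; other clients may expect a different shape. The entry name `cicada-vector` and the path `/path/to/my_db` are placeholders.

```python
import json

# "cicada-vector" is an arbitrary label for the server entry and
# "/path/to/my_db" is a placeholder path; both are assumptions.
config = {
    "mcpServers": {
        "cicada-vector": {
            "command": "uvx",
            "args": ["--from", "cicada-vector[server]", "cicada-vec-server"],
            "env": {"CICADA_HYBRID_DIR": "/path/to/my_db"},
        }
    }
}
print(json.dumps(config, indent=2))
```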
Quick Start
1. Generate Embeddings
Cicada Vector requires you to provide embeddings (vectors). Use Ollama to generate them:
import json
import urllib.request
def get_embedding(text, model="nomic-embed-text"):
    """Get embedding from Ollama API"""
    url = "http://localhost:11434/api/embeddings"
    data = json.dumps({"model": model, "prompt": text}).encode('utf-8')
    req = urllib.request.Request(
        url,
        data=data,
        headers={'Content-Type': 'application/json'}
    )
    with urllib.request.urlopen(req, timeout=30) as response:
        result = json.loads(response.read().decode('utf-8'))
    return result['embedding']
2. Create Database and Add Data
from cicada_vector import HybridDB
# Initialize HybridDB (combines vector + keyword search)
db = HybridDB("./my_knowledge_base")
# Add data with embeddings
auth_text = "def login(username, password):\n ..."
auth_vector = get_embedding(auth_text)
db.add(id="auth.py", vector=auth_vector, text=auth_text, meta={"path": "src/auth.py"})
user_text = "class User:\n ..."
user_vector = get_embedding(user_text)
db.add(id="user.py", vector=user_vector, text=user_text, meta={"path": "src/user.py"})
# Persist to disk (optional - data is written on add, but this rewrites the file)
db.persist()
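The comment above says data is written on add while `persist()` rewrites the file. One reason an append-only JSONL store benefits from an occasional rewrite is that updates pile up as extra lines. The sketch below shows that idea under an assumed record layout (`{"id": ..., "text": ...}`); the functions `append_record` and `compact` are hypothetical and do not reflect Cicada Vector's on-disk format.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one JSON object as a single line (append-only write)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def compact(path):
    """Rewrite the file keeping only the latest record for each id."""
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                latest[record["id"]] = record  # later lines win
    with open(path, "w", encoding="utf-8") as f:
        for record in latest.values():
            f.write(json.dumps(record) + "\n")
    return len(latest)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.jsonl")
    append_record(path, {"id": "auth.py", "text": "v1"})
    append_record(path, {"id": "user.py", "text": "v1"})
    append_record(path, {"id": "auth.py", "text": "v2"})  # update = new line
    with open(path, encoding="utf-8") as f:
        lines_before = sum(1 for _ in f)
    kept = compact(path)
    with open(path, encoding="utf-8") as f:
        records_after = [json.loads(line) for line in f]
```

Appends stay cheap, and the rewrite step trims the file back to one line per document.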
3. Search
# Generate embedding for query
query = "how to authenticate users"
query_vector = get_embedding(query)
# Hybrid search (recommended - combines vector + keyword)
results = db.search(query_text=query, query_vector=query_vector, k=5)
for doc_id, score, meta in results:
    print(f"[{score:.4f}] {doc_id}: {meta.get('path')}")
Indexing Custom Data
Cicada Vector isn't just for code files - index any text data:
from cicada_vector import HybridDB
import subprocess
# Example: Index git commits
db = HybridDB("./git_commits_db")
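# %x1f separates fields and %x1e ends each commit, so multi-line bodies
# and "|" characters in messages do not break parsing
result = subprocess.run(
    ["git", "log", "--format=%H%x1f%an%x1f%s%x1f%b%x1e", "-10"],
    capture_output=True, text=True
)
for record in result.stdout.split('\x1e'):
    record = record.strip()
    if not record:
        continue  # skip the empty chunk after the last commit
    sha, author, subject, body = record.split('\x1f', 3)
    commit_text = f"{subject}\n{body}"
    commit_vector = get_embedding(commit_text)
    db.add(
        id=sha,
        vector=commit_vector,
        text=commit_text,
        meta={"author": author, "subject": subject, "type": "commit"}
    )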
# Search commits
query = "authentication bug fix"
query_vector = get_embedding(query)
results = db.search(query_text=query, query_vector=query_vector, k=5)
for sha, score, meta in results:
    print(f"[{score:.4f}] {sha[:8]} - {meta['subject']}")
Other Database Classes
VectorDB (Pure Semantic Search)
from cicada_vector import VectorDB
db = VectorDB("./my_vectors.jsonl")
# Add vectors (no text storage)
db.add(id="doc1", vector=get_embedding("some text"), meta={"path": "doc1.txt"})
# Search (reuses get_embedding() helper from Quick Start)
query_vector = get_embedding("search query")
results = db.search(query=query_vector, k=5)
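To make "pure semantic search" concrete, the sketch below ranks documents by cosine similarity using only the standard library. It illustrates the general technique with toy two-dimensional vectors; the helpers `cosine` and `top_k` are hypothetical, and VectorDB's actual similarity measure is not stated in this document.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, vectors, k=5):
    """Brute-force nearest neighbours over a {doc_id: vector} mapping."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

toy_vectors = {
    "doc1": [1.0, 0.0],
    "doc2": [0.0, 1.0],
    "doc3": [1.0, 1.0],
}
hits = top_k([1.0, 0.1], toy_vectors, k=2)
print(hits)
```

A linear scan like this is simple and dependency-free, at the cost of comparing the query against every stored vector.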
KeywordDB (Traditional Search)
from cicada_vector import KeywordDB
db = KeywordDB("./my_keywords.jsonl")
# Add documents
db.add(id="doc1", text="some text to index")
# Search (OR search - matches any word)
results = db.search(query="search terms")
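The "OR search" behaviour noted in the comment above can be pictured with a toy inverted index: a document matches if it contains any query word, and documents containing more of the words rank higher. This is a sketch of the general idea; `TinyKeywordIndex` and its scoring are assumptions, not the KeywordDB implementation.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

class TinyKeywordIndex:
    """Toy inverted index with OR semantics (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of docs containing it

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        # A document matches if it contains ANY query token; its score is
        # the number of distinct query tokens it contains
        counts = Counter()
        for token in set(tokenize(query)):
            for doc_id in self.postings.get(token, ()):
                counts[doc_id] += 1
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

index = TinyKeywordIndex()
index.add("doc1", "reset the user password")
index.add("doc2", "user profile page")
index.add("doc3", "billing and invoices")
matches = index.search("password user")
print(matches)
```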
RagDB (File-based RAG)
from cicada_vector import RagDB
db = RagDB("./my_rag_db")
# Add files
with open("src/auth.py", encoding="utf-8") as f:
    file_content = f.read()
# Reuse get_embedding() helper from Quick Start
file_vector = get_embedding(file_content)
db.add_file(file_path="src/auth.py", content=file_content, vector=file_vector)
# Search (returns file + line numbers)
query_vector = get_embedding("authentication")
results = db.search(query="authentication", k=3, query_vector=query_vector)
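The "Scan Specific" half of the RAG pipeline, locating the relevant lines inside a file that the broad search has already selected, can be sketched with a sliding window over lines. The version below scores windows by plain word overlap with the query, which is an assumption made for illustration; the function `best_window` is hypothetical and RagDB may select snippets differently.

```python
import re

def best_window(content, query, size=3):
    """Return (start_line, end_line, snippet) for the window of `size`
    lines sharing the most distinct words with the query. Lines are 1-based."""
    terms = set(re.findall(r"\w+", query.lower()))
    lines = content.splitlines()
    best_start, best_score = 0, -1
    for start in range(max(1, len(lines) - size + 1)):
        window = " ".join(lines[start:start + size]).lower()
        score = len(terms & set(re.findall(r"\w+", window)))
        if score > best_score:  # strict ">" keeps the earliest best window
            best_start, best_score = start, score
    end = min(best_start + size, len(lines))
    return best_start + 1, end, "\n".join(lines[best_start:end])

source = """import os

def helper():
    return os.getcwd()

def login(username, password):
    token = check_password(username, password)
    return token
"""
start, end, snippet = best_window(source, "login password token", size=2)
print(f"lines {start}-{end}:\n{snippet}")
```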
Use cases:
- Code files (semantic search across your codebase)
- Git commits and history
- GitHub PRs and issues
- Documentation sites
- Configuration files
- Support tickets
- Any text corpus
The Stack
- Brains: Ollama (Recommended: nomic-embed-text)
- Storage: JSONL (Human-readable, append-only)
- Engine: Pure Python (with optional NumPy acceleration)
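"Optional NumPy acceleration" usually means the code works without NumPy and takes a faster path when it is installed. A minimal sketch of that pattern follows; the function `dot_scores` is hypothetical and this is not the library's own code.

```python
try:
    import numpy as np  # optional accelerator
except ImportError:
    np = None

def dot_scores(query, matrix):
    """Dot product of `query` with every row of `matrix`.

    Uses NumPy when it is installed and plain Python otherwise; both
    paths return a list of floats.
    """
    if np is not None:
        rows = np.asarray(matrix, dtype=float)
        return (rows @ np.asarray(query, dtype=float)).tolist()
    return [sum(q * x for q, x in zip(query, row)) for row in matrix]

scores = dot_scores([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(scores)
```

Keeping the fallback in the same function means callers never need to know which path ran.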
Part of the Cicada suite. Simple, effective semantic search for text.
File details
Details for the file cicada_vector-0.1.5.tar.gz.
File metadata
- Download URL: cicada_vector-0.1.5.tar.gz
- Upload date:
- Size: 93.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b98eb98171d8ce4176ddcfbb92b3afe1c3ec094e6407fd59d714e8a493bbbc0 |
| MD5 | 5ff09b50f9ce273b722ff350d28b0b07 |
| BLAKE2b-256 | 945b1057640b737acf32417aa6782600e74893e6fda139babfa9b6a43a824741 |
File details
Details for the file cicada_vector-0.1.5-py3-none-any.whl.
File metadata
- Download URL: cicada_vector-0.1.5-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7a6b8aa3c83bca265dc00c66627a0c4be6b312a3c5f240f5e0132506a733bbd |
| MD5 | 11b6e5e7b45271c3fd103592fa2eb74b |
| BLAKE2b-256 | 1e3807b2bc02362a6c0f75ee5b9487beea26e242f6c3abf156e771331469e459 |