Lightweight semantic search engine for text files. Zero dependencies, JSONL storage.

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3

Project description

Cicada Vector 🦗

A lightweight semantic search engine for text files.

Cicada Vector is a simple, zero-dependency semantic search engine and RAG database. Search any text content semantically - code, documentation, commits, or custom data.

Why does this exist?

The original Cicada is powerful because it deeply understands code structure (SCIP, ASTs). However, that power often requires heavy dependencies and longer setup times.

Cicada Vector takes a complementary path: It focuses on Semantic Awareness for any text content. By combining local LLM embeddings (via Ollama) with a hybrid database, it provides robust search capabilities with a minimal footprint and maximum flexibility.

Features

Lightweight: Minimal Python codebase. Zero dependencies (Standard Library only) for the core engine.
Instant Install: No waiting for heavy ML libraries to compile.
Semantic Intelligence: Understands intent. Searching for "auth" finds login logic, even if the word "auth" isn't present.
Hybrid Search: Combines Vector semantic search with Keyword exact matching. Won't miss exact terms while understanding meaning.
Simple RAG: A "Search Broad -> Scan Specific" pipeline that pinpoints relevant content snippets.
Universal: Works on code, docs, commits, configs - any text content.
MCP Ready: Built-in Model Context Protocol server for immediate use with AI assistants.
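
To make the Hybrid Search idea concrete, here is a minimal sketch of blending a semantic score with a keyword score. The function names (`cosine`, `keyword_score`, `hybrid_score`), the `alpha` weight, and the toy two-dimensional vectors are illustrative assumptions, not Cicada Vector's actual scoring code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear in the text (case-insensitive)."""
    words = query.lower().split()
    tokens = set(text.lower().split())
    return sum(w in tokens for w in words) / len(words) if words else 0.0

def hybrid_score(query, query_vec, text, doc_vec, alpha=0.5):
    """Weighted blend of the two scores (hypothetical weighting scheme)."""
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * keyword_score(query, text)

# Toy corpus: (text, made-up embedding) per document.
docs = {
    "login.py": ("def login user password", [1.0, 0.0]),
    "auth_notes.md": ("auth token rotation notes", [0.6, 0.8]),
}
query, query_vec = "auth", [1.0, 0.0]
ranked = sorted(docs, key=lambda d: hybrid_score(query, query_vec, *docs[d]), reverse=True)
print(ranked)
```

In this toy data, `login.py` is the closest vector but never mentions "auth", while `auth_notes.md` contains the exact term, so the blended score ranks the exact match first without discarding the semantic hit.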

Database Classes

Cicada Vector provides four database classes:

VectorDB: Pure semantic vector search
KeywordDB: Traditional keyword-based search
HybridDB: Combines vector + keyword search (Recommended)
RagDB: RAG database for file-based search with line numbers

Tools

1. `cigrep` (Semantic File Search)

Zero-config semantic search for any text files - code, docs, configs, anything.

cigrep "how do I handle authentication"     # Search code
cigrep "installation steps" docs/           # Search docs
cigrep "database config" .                  # Search everything

Automatically indexes changed files in the background and searches instantly.
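
One common way to find changed files is to compare content hashes against a saved manifest. The sketch below illustrates that general technique only; `file_digest`, `find_changed`, and the manifest layout are hypothetical and may not match how `cigrep` tracks changes internally.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_changed(root, manifest_path):
    """Return files under root whose digest differs from the saved manifest, then update it."""
    manifest_path = Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {str(p): file_digest(p) for p in sorted(Path(root).rglob("*")) if p.is_file()}
    manifest_path.write_text(json.dumps(new))
    return [p for p, digest in new.items() if old.get(p) != digest]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "src"
    root.mkdir()
    manifest = Path(tmp) / "manifest.json"  # kept outside the indexed tree
    (root / "a.txt").write_text("alpha")
    (root / "b.txt").write_text("beta")
    first = find_changed(root, manifest)   # both files are new
    (root / "b.txt").write_text("beta v2")
    second = find_changed(root, manifest)  # only b.txt changed
```

Only the files returned by `find_changed` would need fresh embeddings, which is what keeps repeat runs fast.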

2. `cilog` (Semantic Git Commit Search)

Search your git commit history semantically.

cilog "authentication bug fix"
cilog "refactor API" --limit 500
cilog "performance improvements" --since "1 month ago"

Indexes commit messages for fast semantic search. Use --no-diff (recommended) for faster indexing.

MCP Server

AI assistants can use your local knowledge base directly:

pip install 'cicada-vector[server]'
export CICADA_HYBRID_DIR=./my_db
cicada-vec-server

Tools:

search_vectors: Pure semantic search.
search_hybrid: Vector + Keyword search (Recommended).
search_code_context: RAG search returning file snippets with line numbers.
index_directory: Incrementally index a local directory into the database.

Configuration: If using uv or uvx, ensure you include the [server] extra:

uv tool install "cicada-vector[server]"

For manual configuration (e.g., in Claude Desktop or Gemini), set the command to:

uvx --from "cicada-vector[server]" cicada-vec-server

Then set the environment variable CICADA_HYBRID_DIR to your database path.
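
As a rough illustration, the snippet below builds a server entry in the `mcpServers` shape that Claude Desktop's JSON config uses and prints it. The server name `cicada-vector` and the path `/path/to/my_db` are placeholders, and other clients may expect a different schema, so check your client's documentation before copying the output.

```python
import json

def mcp_server_entry(db_dir, name="cicada-vector"):
    """Build an mcpServers entry; `name` and `db_dir` are placeholders."""
    return {
        "mcpServers": {
            name: {
                "command": "uvx",
                "args": ["--from", "cicada-vector[server]", "cicada-vec-server"],
                "env": {"CICADA_HYBRID_DIR": db_dir},
            }
        }
    }

config = mcp_server_entry("/path/to/my_db")
print(json.dumps(config, indent=2))
```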

Quick Start

1. Generate Embeddings

Cicada Vector requires you to provide embeddings (vectors). Use Ollama to generate them:

import json
import urllib.request

def get_embedding(text, model="nomic-embed-text"):
    """Get embedding from Ollama API"""
    url = "http://localhost:11434/api/embeddings"
    data = json.dumps({"model": model, "prompt": text}).encode('utf-8')
    
    req = urllib.request.Request(
        url,
        data=data,
        headers={'Content-Type': 'application/json'}
    )
    
    with urllib.request.urlopen(req, timeout=30) as response:
        result = json.loads(response.read().decode('utf-8'))
        return result['embedding']

2. Create Database and Add Data

from cicada_vector import HybridDB

# Initialize HybridDB (combines vector + keyword search)
db = HybridDB("./my_knowledge_base")

# Add data with embeddings
auth_text = "def login(username, password):\n    ..."
auth_vector = get_embedding(auth_text)
db.add(id="auth.py", vector=auth_vector, text=auth_text, meta={"path": "src/auth.py"})

user_text = "class User:\n    ..."
user_vector = get_embedding(user_text)
db.add(id="user.py", vector=user_vector, text=user_text, meta={"path": "src/user.py"})

# Persist to disk (optional - data is written on add, but this rewrites the file)
db.persist()

3. Search

# Generate embedding for query
query = "how to authenticate users"
query_vector = get_embedding(query)

# Hybrid search (recommended - combines vector + keyword)
results = db.search(query_text=query, query_vector=query_vector, k=5)
for doc_id, score, meta in results:
    print(f"[{score:.4f}] {doc_id}: {meta.get('path')}")

Indexing Custom Data

Cicada Vector isn't just for code files - index any text data:

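from cicada_vector import HybridDB
import subprocess

# Example: Index git commits (reuses get_embedding() helper from Quick Start)
db = HybridDB("./git_commits_db")

# Commit bodies can span several lines and contain "|", so separate fields
# with the ASCII unit separator (%x1f) and records with the record separator (%x1e).
result = subprocess.run(
    ["git", "log", "--format=%H%x1f%an%x1f%s%x1f%b%x1e", "-10"],
    capture_output=True, text=True, check=True
)

for record in result.stdout.split('\x1e'):
    # Strip only newlines: a bare strip() would also remove a trailing \x1f
    record = record.strip('\n')
    if not record:
        continue  # skip the empty chunk after the last separator
    sha, author, subject, body = record.split('\x1f', 3)
    commit_text = f"{subject}\n{body}"
    commit_vector = get_embedding(commit_text)

    db.add(
        id=sha,
        vector=commit_vector,
        text=commit_text,
        meta={"author": author, "subject": subject, "type": "commit"}
    )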

# Search commits
query = "authentication bug fix"
query_vector = get_embedding(query)
results = db.search(query_text=query, query_vector=query_vector, k=5)
for sha, score, meta in results:
    print(f"[{score:.4f}] {sha[:8]} - {meta['subject']}")

Other Database Classes

VectorDB (Pure Semantic Search)

from cicada_vector import VectorDB

db = VectorDB("./my_vectors.jsonl")

# Add vectors (no text storage)
db.add(id="doc1", vector=get_embedding("some text"), meta={"path": "doc1.txt"})

# Search (reuses get_embedding() helper from Quick Start)
query_vector = get_embedding("search query")
results = db.search(query=query_vector, k=5)

KeywordDB (Traditional Search)

from cicada_vector import KeywordDB

db = KeywordDB("./my_keywords.jsonl")

# Add documents
db.add(id="doc1", text="some text to index")

# Search (OR search - matches any word)
results = db.search(query="search terms")
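
For readers unfamiliar with OR search, the following sketch shows the general idea with a small inverted index: a document qualifies if it contains any query word, and documents matching more words rank higher. `build_index` and `or_search` are hypothetical helpers written for this illustration; KeywordDB's own tokenization and ranking may differ.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def or_search(index, query):
    """Rank documents by how many distinct query words they contain."""
    hits = Counter()
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    return hits.most_common()

docs = {
    "doc1": "some text to index",
    "doc2": "search terms appear here",
    "doc3": "unrelated search notes",
}
results = or_search(build_index(docs), "search terms")
print(results)
```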

RagDB (File-based RAG)

from cicada_vector import RagDB

db = RagDB("./my_rag_db")

# Add files (the with-block closes the file handle after reading)
with open("src/auth.py", encoding="utf-8") as f:
    file_content = f.read()
# Reuse get_embedding() helper from Quick Start
file_vector = get_embedding(file_content)
db.add_file(file_path="src/auth.py", content=file_content, vector=file_vector)

# Search (returns file + line numbers)
query_vector = get_embedding("authentication")
results = db.search(query="authentication", k=3, query_vector=query_vector)
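
The "Scan Specific" half of the RAG pipeline can be pictured as a second pass over a file that has already been retrieved, looking for the lines that best match the query. The sketch below uses plain substring matching as a stand-in; `scan_lines` and its `context` parameter are assumptions for illustration, not RagDB's actual implementation.

```python
def scan_lines(content, query, context=1):
    """Return (first_line, last_line, snippet) around the best-matching line, or None.

    Line numbers are 1-based and inclusive.
    """
    words = set(query.lower().split())
    lines = content.splitlines()
    best_idx, best_hits = None, 0
    for idx, line in enumerate(lines):
        lowered = line.lower()
        hits = sum(w in lowered for w in words)
        if hits > best_hits:
            best_idx, best_hits = idx, hits
    if best_idx is None:
        return None  # no line mentions any query word
    start = max(0, best_idx - context)
    end = min(len(lines), best_idx + context + 1)
    return start + 1, end, "\n".join(lines[start:end])

content = (
    "import os\n"
    "\n"
    "def login(username, password):\n"
    "    return check(username, password)\n"
    "\n"
    "def logout():\n"
    "    pass\n"
)
result = scan_lines(content, "login password")
```

Here the scan settles on the `def login` line and returns it with one line of context on each side, which is the kind of snippet-plus-line-numbers result the RAG search is described as producing.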

Use cases:

Code files (semantic search across your codebase)
Git commits and history
GitHub PRs and issues
Documentation sites
Configuration files
Support tickets
Any text corpus

The Stack

Brains: Ollama (Recommended: nomic-embed-text)
Storage: JSONL (Human-readable, append-only)
Engine: Pure Python (with optional Numpy acceleration)
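
To show what append-only JSONL storage implies in practice, here is a minimal sketch: updates are appended as new lines, the latest line per id wins on load, and a compaction step rewrites the file. The record fields (`id`, `vector`) and the helper names are assumptions for illustration; the library's on-disk schema is not documented here.

```python
import json
import tempfile
from pathlib import Path

def append_record(path, record):
    """Append one JSON object as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_records(path):
    """Replay the log; later lines with the same id override earlier ones."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                records[rec["id"]] = rec
    return records

def compact(path):
    """Rewrite the file with only the latest record per id."""
    records = load_records(path)
    with open(path, "w", encoding="utf-8") as f:
        for rec in records.values():
            f.write(json.dumps(rec) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "db.jsonl"
    append_record(path, {"id": "doc1", "vector": [0.1, 0.2]})
    append_record(path, {"id": "doc1", "vector": [0.3, 0.4]})  # update by appending
    lines_before = len(path.read_text().splitlines())
    compact(path)
    lines_after = len(path.read_text().splitlines())
    latest = load_records(path)["doc1"]["vector"]
```

The trade-off is that writes stay cheap and the file stays readable in any text editor, at the cost of stale lines accumulating until the file is rewritten.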

Part of the Cicada suite. Simple, effective semantic search for text.


Release history

- 0.1.5 (this version): Mar 3, 2026
- 0.1.2: Dec 25, 2025
- 0.1.1: Dec 25, 2025
- 0.1.0: Dec 25, 2025

Download files


Source Distribution

cicada_vector-0.1.5.tar.gz (93.8 kB)

Uploaded Mar 3, 2026 Source

Built Distribution


cicada_vector-0.1.5-py3-none-any.whl (29.7 kB)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file cicada_vector-0.1.5.tar.gz.

File metadata

Download URL: cicada_vector-0.1.5.tar.gz
Upload date: Mar 3, 2026
Size: 93.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for cicada_vector-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`0b98eb98171d8ce4176ddcfbb92b3afe1c3ec094e6407fd59d714e8a493bbbc0`
MD5	`5ff09b50f9ce273b722ff350d28b0b07`
BLAKE2b-256	`945b1057640b737acf32417aa6782600e74893e6fda139babfa9b6a43a824741`


File details

Details for the file cicada_vector-0.1.5-py3-none-any.whl.

File metadata

Download URL: cicada_vector-0.1.5-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 29.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for cicada_vector-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c7a6b8aa3c83bca265dc00c66627a0c4be6b312a3c5f240f5e0132506a733bbd`
MD5	`11b6e5e7b45271c3fd103592fa2eb74b`
BLAKE2b-256	`1e3807b2bc02362a6c0f75ee5b9487beea26e242f6c3abf156e771331469e459`

