GPU-accelerated semantic similarity and verse resonance explorer.

echoverse

Find hidden echoes across massive text corpora—with GPU power.

Overview
Features
Installation
Requirements
Use Cases
Quick Example
CLI Usage
Output Format
Benchmarks
API Reference
Roadmap
Contributing
License
Acknowledgments

Overview

echoverse is a Python package and CLI tool for discovering semantically similar pairs (“echoes”) in large collections of text. Whether you're analyzing verse, literature, or academic works, echoverse uses GPU acceleration (CUDA) to compare millions of text pairs in under a minute and billions in minutes to about an hour (see Benchmarks).

Use it to uncover thematic resonance, detect plagiarism, power search engines, or build next-gen literary analysis tools.

Features

✨ All-pairs semantic similarity: Find every matching text pair above a given threshold.
⚡ GPU acceleration: Built with CUDA (via PyCUDA) and NumPy for high throughput.
💾 Flexible I/O: Accepts embeddings from any model and exports results as CSV.
🚀 CLI & Library ready: Use as a standalone tool or integrate into your Python workflow.
🔧 Batch-safe: Handles large-scale embeddings with chunking and memory control.
⚖️ Configurable: Tune thresholds, verbosity, filtering, and more.
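
To make the all-pairs and chunking ideas in the list above concrete, here is a minimal CPU-only sketch in plain NumPy. It is not echoverse's CUDA implementation: the function name `all_pairs_above_threshold` and the `chunk_size` parameter are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption. It also assumes the rows are already L2-normalized, so a dot product equals cosine similarity.

```python
import numpy as np

def all_pairs_above_threshold(embeddings, threshold, chunk_size=1024):
    """Return (i, j, similarity) for every pair i < j with similarity >= threshold.

    Hypothetical CPU reference; assumes each row of `embeddings` has unit length.
    """
    n = embeddings.shape[0]
    found = []
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Only a (chunk, n) block of the similarity matrix is in memory at a time.
        sims = embeddings[start:stop] @ embeddings.T
        rows, cols = np.nonzero(sims >= threshold)
        rows_global = rows + start
        keep = rows_global < cols  # drop self-pairs and mirrored duplicates
        for i, j in zip(rows_global[keep], cols[keep]):
            found.append((int(i), int(j), float(sims[i - start, j])))
    return found

# Tiny example: rows 0 and 1 point the same way, row 2 is orthogonal to both.
vectors = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pairs = all_pairs_above_threshold(vectors, threshold=0.85, chunk_size=2)
print(pairs)
```

A smaller `chunk_size` lowers peak memory at the cost of more matrix multiplications; the same trade-off presumably applies on the GPU, where device memory is the limit.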

Installation

pip install echoverse
# Or clone from source:
# git clone https://github.com/buadofalbhain/echoverse.git
# cd echoverse
# pip install .

Requirements

Python 3.8+
PyCUDA
NumPy
(Optional) tqdm for progress bars

⚠️ Requires a CUDA-capable NVIDIA GPU (Compute Capability ≥ 6.1).
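
If you are unsure whether your GPU meets the requirement above, a quick check along these lines may help. This is a sketch, not part of echoverse: the helper names are hypothetical, and it relies only on PyCUDA's `pycuda.driver.init()` and `Device.compute_capability()`.

```python
MIN_COMPUTE_CAPABILITY = (6, 1)  # minimum stated in the requirement above

def meets_requirement(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """Compare (major, minor) tuples, e.g. (7, 5) >= (6, 1)."""
    return tuple(capability) >= tuple(minimum)

def detect_compute_capability(device_index=0):
    """Return (major, minor) for a CUDA device, or None if it cannot be queried."""
    try:
        import pycuda.driver as cuda
        cuda.init()
        return cuda.Device(device_index).compute_capability()
    except Exception:
        # PyCUDA not installed, no CUDA driver, or no device at that index.
        return None

capability = detect_compute_capability()
if capability is None:
    print("No usable CUDA device found")
else:
    verdict = "OK" if meets_requirement(capability) else "below the stated minimum"
    print(f"Compute capability {capability[0]}.{capability[1]}: {verdict}")
```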

Use Cases

Plagiarism Detection & Proof of Ownership: Detect semantically similar passages, even when reworded. Prove authorship by tracing echoes of original work across other texts.
Literary Analysis & Intertextuality: Explore hidden connections between verses, books, or traditions. Build resonance maps between authors, genres, or historical periods.
Content Recommendation: Suggest similar articles, verses, or ideas based on deep meaning.
Dataset Deduplication & Clustering: Eliminate redundancy and group similar entries intelligently.
Semantic Search & Retrieval: Power AI-enhanced search engines for textual archives.
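
For the deduplication and clustering use case, one common approach is to treat each echo as an edge and group items into connected components. The sketch below uses a small union-find structure; `cluster_echoes` is a hypothetical helper written for illustration, not an echoverse function, and it takes plain `(index1, index2)` pairs.

```python
def cluster_echoes(num_items, pairs):
    """Group item indices into clusters connected by at least one echo pair."""
    parent = list(range(num_items))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Attach the larger root to the smaller so the lowest index leads.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters = {}
    for item in range(num_items):
        clusters.setdefault(find(item), []).append(item)
    return list(clusters.values())

echo_pairs = [(0, 3), (3, 4), (1, 2)]
clusters = cluster_echoes(6, echo_pairs)
representatives = [members[0] for members in clusters]  # one item kept per cluster
print(clusters, representatives)
```

Connected components chain matches together (A echoes B, B echoes C, so A and C share a cluster even if they are not similar to each other), so a higher threshold is usually safer when the goal is deduplication.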

Quick Example

from echoverse import compute_all_pairs_batched_gpu, normalize_embeddings
import numpy as np

# Load and normalize your embeddings
embeddings = np.load("my_corpus_embeddings.npy")
embeddings = normalize_embeddings(embeddings)

# Find all pairs above 0.85 similarity
results = compute_all_pairs_batched_gpu(embeddings, threshold=0.85)

# results is a NumPy structured array: (index1, index2, similarity)
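
Normalization matters in the example above because, once every row has unit length, a plain dot product equals cosine similarity. The following is a NumPy sketch of that idea; `l2_normalize` is a hypothetical stand-in, and echoverse's own `normalize_embeddings` may differ in details such as zero-row handling.

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit length; `eps` guards against all-zero rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 2.0]])
unit = l2_normalize(raw)

# With unit-length rows, the matrix product holds all pairwise cosine similarities.
similarity = unit @ unit.T
print(np.round(similarity, 3))
```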

CLI Usage

python -m echoverse_cli \
  --input my_embeddings.json \
  --output echoes.csv \
  --threshold 0.85 \
  --mode allpairs

Output Format

The output CSV contains:

Column	Description
ID1	Index or ID of the first text/verse
ID2	Index or ID of the second text/verse
Similarity	Cosine similarity score (float)
Text1 (opt.)	Text of the first item (if available)
Text2 (opt.)	Text of the second item (if available)

Example:

ID1,ID2,Similarity,Text1,Text2
42,311,0.876,"In the beginning...","And so it was..."
...
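
A CSV in the format above can be read with Python's standard `csv` module. The sketch below parses the example row from an in-memory string so that it runs as-is; with a real file you would open `echoes.csv` instead. IDs are kept as strings because the table allows either an index or an ID, and the text columns are read with `.get()` since they are optional.

```python
import csv
import io

# Stand-in for open("echoes.csv", newline=""), using the example row above.
sample = io.StringIO(
    'ID1,ID2,Similarity,Text1,Text2\n'
    '42,311,0.876,"In the beginning...","And so it was..."\n'
)

rows = []
for record in csv.DictReader(sample):
    rows.append({
        "id1": record["ID1"],
        "id2": record["ID2"],
        "similarity": float(record["Similarity"]),
        "text1": record.get("Text1"),  # None when the column is absent
        "text2": record.get("Text2"),
    })

# Strongest echoes first
strongest = sorted(rows, key=lambda r: r["similarity"], reverse=True)
print(strongest[0])
```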

Benchmarks

Dataset Size	Pairs Compared	Runtime (A100 GPU)	Notes
10k	~50M	~45 seconds	Medium corpus
100k	~5B	~12 minutes	Large corpus
250k	~31B	~1 hour	Bible-scale

🔄 A CPU-only run of the larger tasks is estimated to take days to weeks.
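
The "Pairs Compared" column follows from the number of unordered pairs among n items, n(n-1)/2. The short script below reproduces those counts and derives a rough throughput from the approximate runtimes in the table; the throughput figures are only as precise as those rounded runtimes.

```python
def unordered_pairs(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# (dataset size, approximate runtime in seconds) taken from the table above
benchmarks = [(10_000, 45), (100_000, 12 * 60), (250_000, 60 * 60)]

for size, seconds in benchmarks:
    pairs = unordered_pairs(size)
    rate = pairs / seconds / 1e6  # millions of pairs per second
    print(f"{size:>7,} items -> {pairs:>14,} pairs, ~{rate:.1f}M pairs/s")
```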

API Reference

normalize_embeddings(np.ndarray) -> np.ndarray
compute_all_pairs_batched_gpu(np.ndarray, threshold: float) -> np.ndarray
compute_similarity_cuda_filtered(np.ndarray, threshold: float) -> np.ndarray

See docs/ for detailed parameters, modes, and customization options.
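
Because the results come back as a NumPy structured array, they can be sorted and sliced by field name. The sketch below builds a stand-in array rather than calling the GPU functions; the field names `index1`, `index2`, and `similarity` and their dtypes are assumptions based on the Quick Example comment, so check `results.dtype.names` on real output.

```python
import numpy as np

# Stand-in for the array returned by compute_all_pairs_batched_gpu.
# Field names and dtypes are assumed, not confirmed by the echoverse docs.
result_dtype = np.dtype(
    [("index1", np.int64), ("index2", np.int64), ("similarity", np.float32)]
)
results = np.array([(0, 7, 0.91), (2, 5, 0.97), (3, 9, 0.86)], dtype=result_dtype)

# Sort strongest first and keep the top two echoes.
order = np.argsort(results["similarity"])[::-1]
top = results[order][:2]

for row in top:
    print(int(row["index1"]), int(row["index2"]), round(float(row["similarity"]), 3))
```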

Roadmap

CPU fallback mode
Sparse matrix mode for memory-constrained environments
LangChain / HuggingFace integration
Interactive web-based visualization of echo networks

Contributing

We welcome contributions of all kinds:

New features, bug fixes, and optimization ideas
Documentation, tutorials, and example datasets
Integrations with external tools or frameworks

See CONTRIBUTING.md to get started.

License

MIT License — do whatever you want, just credit the work.

Acknowledgments

Built with love by the open-source community—powered by CUDA, NumPy, and the spirit of discovery.

Ready to dive in? Get started here →

This version

0.1.0

Apr 27, 2025

Download files

Source Distribution

echoverse-0.1.0.tar.gz (5.1 kB view details)

Uploaded Apr 27, 2025 Source

Built Distribution

echoverse-0.1.0-py3-none-any.whl (4.7 kB view details)

Uploaded Apr 27, 2025 Python 3

File details

Details for the file echoverse-0.1.0.tar.gz.

File metadata

Download URL: echoverse-0.1.0.tar.gz
Upload date: Apr 27, 2025
Size: 5.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for echoverse-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e4e8808c61a6ce1a0d10d49bda54e061b7fd72813bf1b72f4961e3a893c38e55`
MD5	`c22a949e4fd55fb8ffcb55c607cd7dbb`
BLAKE2b-256	`05ef6f4ffa9a5899c7f5aebb85beb4c8bd0f1c394fcad9852e2385d79e47ef93`

File details

Details for the file echoverse-0.1.0-py3-none-any.whl.

File metadata

Download URL: echoverse-0.1.0-py3-none-any.whl
Upload date: Apr 27, 2025
Size: 4.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for echoverse-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`621cf3ffaeb6b59aa5dd7a569edb59288fb83666af5977c1d2e2bd0d8569d8cb`
MD5	`65780ca462b27b5eae8e553c1cc6c7e9`
BLAKE2b-256	`b573e23f45494733d80ffd88d9ff2a075e0669df4c74ae9aaa198c18e11e5640`
