A tiny, persistent, zero-dependency full-text search engine in pure Python.

looseene 🕵️‍♂️

A tiny, persistent, full-text search engine in a single Python file.

It's like Lucene, but... looser.

What is `looseene`?

looseene is a lightweight, zero-dependency search library for Python projects where setting up Elasticsearch or Solr is overkill. It provides a simple API to index documents, persist them to disk efficiently, and perform relevant full-text searches with modern ranking and highlighting.

It's the perfect solution for:

Adding search to a static site generator (e.g., indexing Markdown files).
Searching through application logs or local documents.
Desktop applications needing offline search capabilities.
Prototyping search features before scaling up to a larger system.
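
For the static-site case, Markdown files first have to be turned into documents. The following is a minimal sketch, assuming the `{'id': int, 'title': str, 'content': str}` schema used in the Quick Start. `markdown_to_docs` is a hypothetical helper, not part of looseene, and taking the title from the first `# ` heading is an assumption about your files.

```python
from pathlib import Path


def markdown_to_docs(root):
    """Turn every *.md file under `root` into a dict with id, title, content.

    Hypothetical helper: IDs are assigned in sorted-path order, and the title
    is the first '# ' heading (falling back to the file stem).
    """
    docs = []
    for doc_id, path in enumerate(sorted(Path(root).rglob("*.md")), start=1):
        text = path.read_text(encoding="utf-8")
        title = path.stem
        for line in text.splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        docs.append({"id": doc_id, "title": title, "content": text})
    return docs
```

Each resulting dict can then be passed to add_to_index() as shown in the Quick Start. Note that IDs derived from sort order shift when files are added or removed, so a real site would likely want a more stable ID scheme.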

Installation

To install looseene, you can clone the repository and install it directly using pip:

git clone https://github.com/YOUR_USERNAME/looseene.git
cd looseene
pip install .

(Note: looseene 1.0.0 is also published on PyPI, so pip install looseene works as well.)

Quick Start

Get up and running in less than a minute.

from looseene import create_index, add_to_index, search_text, highlight_result, save_index

# 1. Create a new index or load an existing one from disk.
# The schema defines your document structure. 'id' must be an integer primary key.
create_index(
    'my_docs', 
    schema={'id': int, 'title': str, 'content': str}, 
    path='./my_index_data'
)

# 2. Add some documents. You can add them in batches.
docs = [
    {'id': 1, 'title': 'The Fox', 'content': 'The quick brown fox jumps over the lazy dog.'},
    {'id': 2, 'title': 'The Engine', 'content': 'A lazy developer never creates a good search engine.'}
]
for doc in docs:
    add_to_index('my_docs', doc)

# 3. Flush the in-memory buffer to disk to make the index persistent.
save_index('my_docs')

# 4. Search returns results ranked by BM25 relevance.
query = "lazy fox search"
print(f"Searching for: '{query}'\n")

for doc in search_text('my_docs', query):
    # The 'content' field will be used for highlighting.
    snippet = highlight_result(doc, 'content', query)
    print(f"📄 ID: {doc['id']} | Title: {doc['title']}")
    print(f"   Snippet: {snippet}\n")

Features

looseene is packed with features typically found in much larger search systems:

🗄️ Persistent On-Disk Storage: Your index lives on disk. It uses a Log-Structured Merge-tree (LSM) architecture, flushing data in immutable, compressed segments. This means your data is safe even if your application restarts.
🚀 Fast & Memory-Efficient: Leverages mmap to search through gigabytes of data without loading everything into memory. Vocabularies are kept in RAM for quick lookups, while posting lists are read on demand.
🏆 Modern Ranking (BM25): Forget simple keyword counts. looseene uses the industry-standard BM25 algorithm to rank results by relevance, considering term frequency (TF), inverse document frequency (IDF), and document length.
✨ Result Highlighting: Automatically generates highlighted snippets from your documents, showing users exactly where their query matched.
🗑️ Manual Compaction: Includes a compact_index() function to merge segments, reclaim disk space from deleted/updated documents, and keep searches fast over time.
🐍 Pure Python, Zero Dependencies: Just one file. No complex setup, no external services.
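
To make the BM25 ranking mentioned above more concrete, here is a sketch of the standard Okapi BM25 formula with a Lucene-style IDF. It is not looseene's actual code: the function name `bm25_scores` is hypothetical, and `k1=1.2` and `b=0.75` are common textbook defaults, not values confirmed for this library.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` (a non-empty list) against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher IDF; the +1 inside the log keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With equal term matches, a shorter document scores higher than a longer one, which is the document-length effect described in the feature list.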

Advanced Usage

Document Updates and Deletions

looseene supports the full CRUD lifecycle.

from looseene import update_document, delete_document

# Update a document by providing its full data with the same ID.
update_document('my_docs', {'id': 2, 'title': 'The Engine', 'content': 'A proactive developer creates a great search engine.'})

# Delete a document by its ID.
delete_document('my_docs', 1)

Compaction

Over time, your index directory will accumulate segment files. Compaction merges them into a single, optimized segment, removing deleted data and speeding up searches. It's recommended to run this periodically as part of a maintenance task.

from looseene import compact_index

# This can take some time on large indexes.
print("Starting compaction...")
compact_index('my_docs')
print("Compaction finished.")
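
Conceptually, compaction in an LSM-style store merges segments from oldest to newest, so the newest version of each document wins and deleted entries are dropped. The sketch below illustrates that idea with plain dicts. `merge_segments` and the `TOMBSTONE` marker are hypothetical and say nothing about looseene's real on-disk format.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted document


def merge_segments(segments):
    """Merge segments (ordered oldest to newest) into one dict.

    Each segment maps doc_id -> document, or doc_id -> TOMBSTONE for a delete.
    """
    merged = {}
    for segment in segments:
        # Later segments overwrite earlier ones, so the newest version wins.
        merged.update(segment)
    # Drop deleted entries; this is where disk space would be reclaimed.
    return {doc_id: doc for doc_id, doc in merged.items() if doc is not TOMBSTONE}
```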

Schema and Data Types

The schema dictionary defines the structure of your documents.

Primary Key: The primary key field must be named id and its type must be int. This is a current limitation for simplicity.
Indexed Fields: All fields with type str will be tokenized and indexed for full-text search.
Other Types: Other standard Python types (int, float, bool, etc.) are stored but not indexed. You cannot search on them directly.
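
The rules above can be sketched as two small helpers. Both `indexed_fields` and `check_document` are hypothetical names for illustration; looseene's own validation may differ, and requiring every schema field to be present is an assumption.

```python
def indexed_fields(schema):
    """Names of the fields that would be tokenized: those typed as str."""
    return [name for name, typ in schema.items() if typ is str]


def check_document(schema, doc):
    """Raise if `doc` does not satisfy the schema rules described above."""
    if schema.get("id") is not int:
        raise ValueError("schema must declare 'id' as int")
    for name, typ in schema.items():
        # Assumption: every field declared in the schema must be present.
        if name not in doc:
            raise KeyError(f"missing field: {name}")
        if not isinstance(doc[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
```

For a schema such as `{'id': int, 'title': str, 'content': str, 'views': int}`, only `title` and `content` would be searchable, while `views` is stored alongside the document.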

Performance Characteristics

looseene is designed for performance on a single machine. Benchmarks on consumer hardware (e.g., a modern SSD and CPU) show:

Indexing Speed: Can index 3,000+ documents in under 0.1 seconds.
Search Latency: Typical queries return results in under 1 millisecond on a moderately sized index (thousands of documents).

Actual numbers depend on document size and hardware. Because writes go to an in-memory buffer and are flushed as new immutable segments, write throughput should stay high as the index grows.
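
Since figures like these vary by machine and data, it is worth measuring on your own setup. Here is a small timing sketch using only the standard library. `timed` is a hypothetical helper, and the workload shown is a stand-in rather than a looseene call.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example with a stand-in workload; swap in your own indexing or search call,
# e.g. timed(lambda: list(search_text('my_docs', 'lazy fox'))).
total, seconds = timed(sum, range(100_000))
print(f"sum took {seconds * 1000:.3f} ms")
```

Because search_text() returns a generator, wrap it in list() when timing so the search is actually consumed.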

When Not to Use `looseene`

Honesty is the best policy. looseene is a powerful tool, but it's not a silver bullet. You should consider more robust solutions like Elasticsearch or Meilisearch if you need:

Distributed Search: looseene runs on a single node and cannot be clustered.
Terabyte-Scale Data: While it handles data larger than RAM, it's not optimized for massive, TB-scale indexes.
Real-Time, Sub-Millisecond Indexing: Indexing is fast, but it's not real-time. There's a delay until save_index() is called.
Complex Queries: No support for geographical queries, faceted search, or complex aggregations.
Fine-grained Security: No built-in access control or user management.

API Reference

Here is a summary of the public API:

# --- Index Management ---
create_index(name: str, schema: Dict, path: Optional[str] = None) -> None
save_index(name: str) -> None
compact_index(name: str) -> None

# --- Document Operations ---
add_to_index(name: str, data: Dict) -> None
update_document(name: str, data: Dict) -> None
delete_document(name: str, doc_id: int) -> None

# --- Searching ---
search_text(name: str, query: str) -> Generator[Dict, None, None]
highlight_result(doc: Dict, field: str, query: str, window: int = 60) -> str
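
The `window` parameter of highlight_result() suggests that snippets are cut around a match. The sketch below shows one way such highlighting could work. It is not looseene's implementation: the function name `highlight`, the `**` markers, and the reading of `window` as "characters kept on each side of the first match" are all assumptions.

```python
import re


def highlight(text, query, window=60):
    """Return a snippet around the first query-term match, terms wrapped in **."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return text[:window]
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No term found: fall back to the start of the text.
        return text[:window]
    # Keep `window` characters on each side of the first match.
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    return pattern.sub(r"**\1**", text[start:end])
```

This simple version may cut words at the snippet edges; a fuller implementation would likely snap to word boundaries.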

Thread Safety

looseene is thread-safe for common use cases.

You can safely have multiple threads reading (searching) from an index concurrently.
You can safely have one thread writing (add, update, delete) while other threads are reading.
Writing from multiple threads simultaneously is also safe, as write operations are protected by a lock.
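
The locking pattern described above can be illustrated with a toy in-memory index. `TinyIndex` is a hypothetical stand-in written for this example and does not reflect looseene's internals.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TinyIndex:
    """Toy in-memory index whose writes are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}

    def add(self, doc):
        with self._lock:  # only one writer mutates the dict at a time
            self._docs[doc["id"]] = doc

    def search(self, word):
        with self._lock:  # take a consistent snapshot, then scan outside the lock
            docs = list(self._docs.values())
        return [d for d in docs if word in d["content"].split()]


index = TinyIndex()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Four worker threads write concurrently.
    list(pool.map(index.add, ({"id": i, "content": f"doc {i}"} for i in range(100))))
print(len(index.search("doc")))
```

Readers copy the document list under the lock and scan it afterwards, so a long search does not block writers for its whole duration.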

Running Tests

The library includes a comprehensive test suite using Python's standard unittest library. The tests cover indexing, search correctness, BM25 ranking, document updates, deletions, segment flushing, and compaction logic.

To run the tests, navigate to the project's root directory and execute:

python -m unittest tests/test_engine.py

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

This version: 1.0.0 (Dec 13, 2025)

Download files

Source Distribution

looseene-1.0.0.tar.gz (16.8 kB)

Uploaded Dec 13, 2025 Source

Built Distribution

looseene-1.0.0-py3-none-any.whl (10.5 kB)

Uploaded Dec 13, 2025 Python 3

File details

Details for the file looseene-1.0.0.tar.gz.

File metadata

Download URL: looseene-1.0.0.tar.gz
Upload date: Dec 13, 2025
Size: 16.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for looseene-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1afe0b38128956c45714764a1d9c7ba91348b4f4863e6a14acdc1379edfe0c41`
MD5	`76ec92b24155716cf975845929d209ea`
BLAKE2b-256	`fe1a9237353b5839d0b97831023085e5f6d3b1715104480fbd9633b8d189e38d`

File details

Details for the file looseene-1.0.0-py3-none-any.whl.

File metadata

Download URL: looseene-1.0.0-py3-none-any.whl
Upload date: Dec 13, 2025
Size: 10.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for looseene-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8decb5ca179fab94e6275ecfd1993a48e7921d2e169874a4c3d6181da3a58839`
MD5	`1e694603beb5034633ad0ae7c2084893`
BLAKE2b-256	`83f003d243bc2878a44a8c6827a81524f8a797e3b3ef934ee7349b3e7236ad4b`
