
Fast, pure-Python full-text indexing, search, and spell-checking library with semantic search capabilities.

About semlix

semlix is a fast, featureful full-text indexing and searching library implemented in pure Python. Based on the excellent Whoosh library, semlix extends it with modern semantic search capabilities while maintaining full backward compatibility. Programmers can use it to easily add search functionality to their applications and websites. Every part of how semlix works can be extended or replaced to meet your needs exactly.

What does "semlix" mean?

The name semlix stands for:

  • Semantic - Understanding meaning and context beyond keywords
  • Lexical - Traditional keyword matching (BM25/TF-IDF)
  • Index - Fast, efficient indexing and retrieval

semlix combines all three: it indexes your documents, searches them using both lexical (keyword) and semantic (meaning-based) methods, then fuses the two ranked result lists into a single ranking.

Some of semlix's features include:

  • Pythonic API.
  • Pure-Python. No compilation or binary packages needed, no mysterious crashes.
  • Fielded indexing and search.
  • Fast indexing and retrieval -- faster than any other pure-Python, scoring, full-text search solution I know of.
  • Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
  • Powerful query language.
  • Pure Python spell-checker (as far as I know, the only one).
  • Semantic search - Hybrid search combining traditional lexical matching (BM25/TF-IDF) with modern vector-based semantic similarity for understanding meaning beyond keywords.
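
To make the lexical side of the scoring concrete, here is a minimal sketch of plain single-field BM25 in pure Python. It is not semlix's BM25F implementation: the function name ``bm25_scores``, the whitespace tokenization, and the defaults ``k1=1.2`` and ``b=0.75`` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with plain BM25.

    Illustrative only: the name and the default parameters are assumptions,
    not semlix's API.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "python programming basics".split(),
    "how to fix login issues".split(),
    "advanced python login handling in python".split(),
]
scores = bm25_scores(["python", "login"], docs)
```

The third document contains both query terms, so it scores highest. A document that shares no term with the query scores zero, and that is the gap semantic search is meant to cover.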

semlix might be useful in the following circumstances:

  • Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
  • As a research platform (at least for programmers that find Python easier to read and work with than Java ;)
  • When an easy-to-use Pythonic interface is more important to you than raw speed.

semlix is based on Whoosh, which was created and is maintained by Matt Chaput. Whoosh was originally created for use in the online help system of Side Effects Software's 3D animation software Houdini. Side Effects Software Inc. graciously agreed to open-source the code. semlix extends Whoosh with semantic search capabilities while honoring its pure-Python philosophy.

This software is licensed under the terms of the simplified BSD (A.K.A. "two clause" or "FreeBSD") license. See LICENSE.txt for information.

Installing semlix

Install from PyPI: https://pypi.org/project/semlix/

Basic installation::

    pip install semlix

For semantic search capabilities::

    pip install "semlix[semantic]"

For full semantic search with all providers and FAISS support::

    pip install "semlix[semantic-full]"

Or using uv::

    uv pip install "semlix[semantic-full]"

Semantic Search

semlix includes optional semantic search capabilities that combine traditional lexical matching with vector-based semantic similarity. This enables queries like "how to fix authentication issues" to match documents containing "resolving login problems" even without shared keywords.
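
Vector-based matching of this kind usually comes down to comparing embedding vectors, most often by cosine similarity. The sketch below shows the idea in pure Python. The three-dimensional vectors are made-up toy values rather than real model output, and ``cosine_similarity`` and ``nearest`` are hypothetical helper names, not part of semlix.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, top_k=2):
    """Return (doc_id, similarity) pairs, most similar first (brute force)."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" (assumed values): a real provider returns hundreds of dimensions.
doc_vecs = {
    "1": [0.9, 0.1, 0.0],  # stands in for "Python programming basics"
    "2": [0.1, 0.8, 0.3],  # stands in for "resolving login problems"
}
query_vec = [0.2, 0.9, 0.2]  # stands in for "how to fix authentication issues"
ranked = nearest(query_vec, doc_vecs)
```

Here document "2" ranks first because its vector points in nearly the same direction as the query vector, even though the underlying texts share no keywords.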

Key features:

  • Hybrid Search: Combines BM25/TF-IDF lexical search with semantic vector search
  • Multiple Embedding Providers: Support for sentence-transformers, OpenAI, Cohere, and HuggingFace Inference API
  • Flexible Vector Stores: Pure-Python NumPy backend for small datasets, FAISS backend for large-scale deployments
  • Result Fusion: Multiple fusion algorithms (RRF, Linear, DBSF) for optimal ranking
  • Backward Compatible: Existing Whoosh code continues to work without modification
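
As an illustration of result fusion, the following is a minimal pure-Python sketch of Reciprocal Rank Fusion (RRF), which sums ``1 / (k + rank)`` for each document over every ranked list it appears in. The function name ``reciprocal_rank_fusion`` and the constant ``k=60`` (a commonly used value) are assumptions for this sketch. semlix's own fusion classes and defaults may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Illustrative only: the name and the default k are assumptions,
    not semlix's API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


lexical = ["a", "b", "c"]   # e.g. ids ranked by BM25
semantic = ["b", "d", "a"]  # e.g. ids ranked by vector similarity
fused = reciprocal_rank_fusion([lexical, semantic])
fused_ids = [doc_id for doc_id, _ in fused]
```

RRF uses only ranks, never raw scores, so lexical and semantic scores on different scales can be combined without normalization. Documents found by both searches ("a" and "b") rise above those found by only one.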

Quick example::

>>> from semlix.index import create_in
>>> from semlix.fields import Schema, TEXT, ID
>>> from semlix.semantic import (
...     HybridSearcher, HybridIndexWriter,
...     SentenceTransformerProvider
... )
>>> from semlix.semantic.stores import NumpyVectorStore
>>> 
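>>> import os
>>> # Create schema and index. Assumption: as in Whoosh, create_in()
>>> # expects the index directory to exist already.
>>> schema = Schema(id=ID(stored=True, unique=True), content=TEXT(stored=True))
>>> os.makedirs("my_index", exist_ok=True)
>>> ix = create_in("my_index", schema)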
>>> 
>>> # Create semantic components
>>> embedder = SentenceTransformerProvider("all-MiniLM-L6-v2")
>>> vector_store = NumpyVectorStore(dimension=embedder.dimension)
>>> 
>>> # Index documents
>>> with HybridIndexWriter(ix, vector_store, embedder) as writer:
...     writer.add_document(id="1", content="Python programming basics")
...     writer.add_document(id="2", content="How to fix login issues")
>>> 
>>> # Search with hybrid search
>>> searcher = HybridSearcher(ix, vector_store, embedder)
>>> results = searcher.search("authentication problems")  # Matches "login issues"!

See the semantic search documentation for more details.


