Fast, pure-Python full text indexing, search, and spell checking library with semantic search capabilities.

About semlix

semlix is a fast, featureful full-text indexing and searching library implemented in pure Python. Based on the excellent Whoosh library, semlix extends it with modern semantic search capabilities while maintaining full backward compatibility. Programmers can use it to easily add search functionality to their applications and websites. Every part of how semlix works can be extended or replaced to meet your needs exactly.

What does "semlix" mean?

The name semlix stands for:

  • Semantic - Understanding meaning and context beyond keywords
  • Lexical - Traditional keyword matching (BM25/TF-IDF)
  • Index - Fast, efficient indexing and retrieval

semlix combines all three: it indexes your documents, searches them using both lexical (keyword) and semantic (meaning-based) methods, then fuses the two result sets into a single ranking.

Some of semlix's features include:

  • Pythonic API.
  • Pure-Python. No compilation or binary packages needed, no mysterious crashes.
  • Fielded indexing and search.
  • Fast indexing and retrieval -- faster than any other pure-Python, scoring, full-text search solution I know of.
  • Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
  • Powerful query language.
  • Pure Python spell-checker (as far as I know, the only one).
  • Semantic search - Hybrid search combining traditional lexical matching (BM25/TF-IDF) with modern vector-based semantic similarity for understanding meaning beyond keywords.

semlix might be useful in the following circumstances:

  • Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
  • As a research platform (at least for programmers who find Python easier to read and work with than Java ;)
  • When an easy-to-use Pythonic interface is more important to you than raw speed.

semlix is based on Whoosh, which was created and is maintained by Matt Chaput. Whoosh was originally created for use in the online help system of Side Effects Software's 3D animation software Houdini. Side Effects Software Inc. graciously agreed to open-source the code. semlix extends Whoosh with semantic search capabilities while honoring its pure-Python philosophy.

This software is licensed under the terms of the simplified BSD (A.K.A. "two clause" or "FreeBSD") license. See LICENSE.txt for information.

Installing semlix

Basic installation::

    pip install semlix

For semantic search capabilities::

    pip install "semlix[semantic]"

For full semantic search with all providers and FAISS support::

    pip install "semlix[semantic-full]"

Or using uv::

    uv pip install "semlix[semantic-full]"

Semantic Search

semlix includes optional semantic search capabilities that combine traditional lexical matching with vector-based semantic similarity. This enables queries like "how to fix authentication issues" to match documents containing "resolving login problems" even without shared keywords.
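To illustrate the idea of vector-based similarity, here is a minimal, library-independent sketch. It does not use semlix at all: the three-dimensional vectors are hand-made, hypothetical stand-ins for real embeddings, and ``cosine_similarity`` is an illustrative helper, not part of the semlix API. Cosine similarity is one common way to compare embeddings; this README does not state which measure semlix uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; a real provider would return
# vectors with hundreds of dimensions.
toy_vectors = {
    "resolving login problems": [0.9, 0.1, 0.0],
    "python programming basics": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for "how to fix authentication issues"

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    toy_vectors,
    key=lambda text: cosine_similarity(query_vector, toy_vectors[text]),
    reverse=True,
)
print(ranked[0])
```

With these toy values the login document ranks first even though it shares no keywords with the query, which is the behaviour the paragraph above describes.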

Key features:

  • Hybrid Search: Combines BM25/TF-IDF lexical search with semantic vector search
  • Multiple Embedding Providers: Support for sentence-transformers, OpenAI, Cohere, and HuggingFace Inference API
  • Flexible Vector Stores: Pure-Python NumPy backend for small datasets, FAISS backend for large-scale deployments
  • Result Fusion: Multiple fusion algorithms (RRF, Linear, DBSF) for optimal ranking
  • Backward Compatible: Existing Whoosh code continues to work without modification
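
As a rough illustration of result fusion, the sketch below implements Reciprocal Rank Fusion (RRF) in plain Python. It is a generic version of the technique, not semlix's own implementation: the function name, the input format (lists of document IDs ordered best first), and the constant ``k=60`` (a value commonly used for RRF) are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a lexical and a semantic search.
lexical = ["doc2", "doc1", "doc3"]
semantic = ["doc2", "doc3", "doc4"]

fused = reciprocal_rank_fusion([lexical, semantic])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

RRF only looks at ranks, so it needs no normalization between BM25 scores and vector similarities, which live on different scales. Score-based approaches such as linear fusion typically require that normalization step.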

Quick example::

    >>> from semlix.index import create_in
    >>> from semlix.fields import Schema, TEXT, ID
    >>> from semlix.semantic import (
    ...     HybridSearcher, HybridIndexWriter,
    ...     SentenceTransformerProvider
    ... )
    >>> from semlix.semantic.stores import NumpyVectorStore
    >>>
    >>> # Create schema and index
    >>> schema = Schema(id=ID(stored=True, unique=True), content=TEXT(stored=True))
    >>> ix = create_in("my_index", schema)
    >>>
    >>> # Create semantic components
    >>> embedder = SentenceTransformerProvider("all-MiniLM-L6-v2")
    >>> vector_store = NumpyVectorStore(dimension=embedder.dimension)
    >>>
    >>> # Index documents
    >>> with HybridIndexWriter(ix, vector_store, embedder) as writer:
    ...     writer.add_document(id="1", content="Python programming basics")
    ...     writer.add_document(id="2", content="How to fix login issues")
    >>>
    >>> # Search with hybrid search
    >>> searcher = HybridSearcher(ix, vector_store, embedder)
    >>> results = searcher.search("authentication problems")  # Matches "login issues"!

See the semantic search documentation for more details.

Project

semlix is maintained at https://github.com/semlix/semlix
