Fast, pure-Python full-text indexing, search, and spell-checking library with semantic search capabilities.
About semlix
semlix is a fast, featureful full-text indexing and searching library implemented in pure Python. Based on the excellent Whoosh library, semlix extends it with modern semantic search capabilities while maintaining full backward compatibility. Programmers can use it to easily add search functionality to their applications and websites. Every part of how semlix works can be extended or replaced to meet your needs exactly.
What does "semlix" mean?
The name semlix stands for:
- Semantic - Understanding meaning and context beyond keywords
- Lexical - Traditional keyword matching (BM25/TF-IDF)
- Index - Fast, efficient indexing and retrieval
semlix combines all three: it indexes your documents, searches them using both lexical (keyword) and semantic (meaning-based) methods, then intelligently combines the results for superior search quality.
Some of semlix's features include:
- Pythonic API.
- Pure-Python. No compilation or binary packages needed, no mysterious crashes.
- Fielded indexing and search.
- Fast indexing and retrieval -- faster than any other pure-Python, scoring, full-text search solution I know of.
- Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
- Powerful query language.
- Pure Python spell-checker (as far as I know, the only one).
- Semantic search - Hybrid search combining traditional lexical matching (BM25/TF-IDF) with modern vector-based semantic similarity for understanding meaning beyond keywords.
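To make the "lexical" side of the feature list more concrete, here is a minimal sketch of the textbook BM25 score for a single query term in a single document. It is an independent illustration, not semlix's or Whoosh's scoring code: the function name ``bm25_term_score``, the particular IDF variant, and the defaults ``k1=1.2`` and ``b=0.75`` are assumptions chosen for the example.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with the BM25 formula.

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document, in terms
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection
    doc_freq    -- number of documents containing the term
    """
    # Rare terms get a higher weight (assumed IDF variant; always non-negative).
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Documents longer than average are penalised; b controls how strongly.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Hypothetical collection statistics, made up for illustration.
score = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
```

A full query score is the sum of these per-term scores over all query terms. BM25F extends the same idea by weighting term frequencies per field before saturation.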
semlix might be useful in the following circumstances:
- Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
- As a research platform (at least for programmers that find Python easier to read and work with than Java ;)
- When an easy-to-use Pythonic interface is more important to you than raw speed.
semlix is based on Whoosh, which was created and is maintained by Matt Chaput. Whoosh was originally created for use in the online help system of Side Effects Software's 3D animation software Houdini. Side Effects Software Inc. graciously agreed to open-source the code. semlix extends Whoosh with semantic search capabilities while honoring its pure-Python philosophy.
This software is licensed under the terms of the simplified BSD (A.K.A. "two clause" or "FreeBSD") license. See LICENSE.txt for information.
Installing semlix
Basic installation::

    pip install semlix

For semantic search capabilities (the quotes keep shells such as zsh from treating the brackets as a glob pattern)::

    pip install "semlix[semantic]"

For full semantic search with all providers and FAISS support::

    pip install "semlix[semantic-full]"

Or using uv::

    uv pip install "semlix[semantic-full]"
Semantic Search
semlix includes optional semantic search capabilities that combine traditional lexical matching with vector-based semantic similarity. This enables queries like "how to fix authentication issues" to match documents containing "resolving login problems" even without shared keywords.
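As a rough illustration of how vector-based matching can connect texts with no shared keywords, the sketch below ranks documents by cosine similarity between embedding vectors. It uses only the standard library and is not semlix's implementation; the helper names ``cosine_similarity`` and ``nearest`` and the three-dimensional vectors are made up for the example (real embedding models produce vectors with hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, top_k=2):
    """Brute-force search: score every document, return the top_k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model output (hypothetical values).
doc_vecs = {
    "resolving login problems": [0.9, 0.1, 0.0],
    "python programming basics": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for "how to fix authentication issues"

best = nearest(query_vec, doc_vecs, top_k=1)
```

Brute-force scoring like this is linear in the number of documents, which is why an approximate index such as FAISS becomes attractive for large collections.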
Key features:
- Hybrid Search: Combines BM25/TF-IDF lexical search with semantic vector search
- Multiple Embedding Providers: Support for sentence-transformers, OpenAI, Cohere, and HuggingFace Inference API
- Flexible Vector Stores: Pure-Python NumPy backend for small datasets, FAISS backend for large-scale deployments
- Result Fusion: Multiple fusion algorithms (RRF, Linear, DBSF) for optimal ranking
- Backward Compatible: Existing Whoosh code continues to work without modification
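To show what result fusion does, here is a minimal sketch of Reciprocal Rank Fusion (RRF), one of the algorithms named above. It is a generic, standalone version of the technique and not semlix's own code: the function name ``reciprocal_rank_fusion`` is hypothetical, and ``k=60`` is the constant commonly used for RRF, assumed here rather than taken from semlix.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); the contributions are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical_hits = ["doc1", "doc3", "doc2"]   # e.g. ordered by BM25 score
semantic_hits = ["doc2", "doc1", "doc4"]  # e.g. ordered by vector similarity

fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF looks only at ranks, it needs no score normalisation, which is convenient when lexical and semantic scores live on different scales. Documents that appear in both lists (``doc1`` and ``doc2`` here) end up ahead of documents found by only one method.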
Quick example::

    >>> from semlix.index import create_in
    >>> from semlix.fields import Schema, TEXT, ID
    >>> from semlix.semantic import (
    ...     HybridSearcher, HybridIndexWriter,
    ...     SentenceTransformerProvider
    ... )
    >>> from semlix.semantic.stores import NumpyVectorStore
    >>>
    >>> # Create schema and index
    >>> schema = Schema(id=ID(stored=True, unique=True), content=TEXT(stored=True))
    >>> ix = create_in("my_index", schema)
    >>>
    >>> # Create semantic components
    >>> embedder = SentenceTransformerProvider("all-MiniLM-L6-v2")
    >>> vector_store = NumpyVectorStore(dimension=embedder.dimension)
    >>>
    >>> # Index documents
    >>> with HybridIndexWriter(ix, vector_store, embedder) as writer:
    ...     writer.add_document(id="1", content="Python programming basics")
    ...     writer.add_document(id="2", content="How to fix login issues")
    >>>
    >>> # Search with hybrid search
    >>> searcher = HybridSearcher(ix, vector_store, embedder)
    >>> results = searcher.search("authentication problems")  # Matches "login issues"!

See the semantic search documentation for more details.
Project
semlix is maintained at https://github.com/semlix/semlix
File details
Details for the file semlix-3.0.0.tar.gz.
File metadata
- Download URL: semlix-3.0.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4531f79c9530c774fb2c0462399f8768e8046181d327af1122becb22dc788054 |
| MD5 | 998a06f853e7b64df1dd0eda5cd77f2f |
| BLAKE2b-256 | 050a59d2606e9394db1b34b3748eaaed15bb2435cb6059f09c3d8e33ced25278 |
Provenance
The following attestation bundles were made for semlix-3.0.0.tar.gz:
Publisher: publish.yml on semlix/semlix
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semlix-3.0.0.tar.gz
- Subject digest: 4531f79c9530c774fb2c0462399f8768e8046181d327af1122becb22dc788054
- Sigstore transparency entry: 729801426
- Sigstore integration time:
- Permalink: semlix/semlix@a057bbfcaf6af4cc2e817f05982357dc9cf75533
- Branch / Tag: refs/heads/main
- Owner: https://github.com/semlix
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a057bbfcaf6af4cc2e817f05982357dc9cf75533
- Trigger Event: workflow_dispatch
File details
Details for the file semlix-3.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: semlix-3.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 492.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dd4d49dfc96018198b63018d566123f9704de26efefd670c3d38ae676d38f2b |
| MD5 | 1d3e879b90b43b4635abab5212066f69 |
| BLAKE2b-256 | 322fef328830a5cb1a6010c61714c7ca862f3cfcd3190bd9495ae1cfb1de26eb |
Provenance
The following attestation bundles were made for semlix-3.0.0-py2.py3-none-any.whl:
Publisher: publish.yml on semlix/semlix
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semlix-3.0.0-py2.py3-none-any.whl
- Subject digest: 1dd4d49dfc96018198b63018d566123f9704de26efefd670c3d38ae676d38f2b
- Sigstore transparency entry: 729801464
- Sigstore integration time:
- Permalink: semlix/semlix@a057bbfcaf6af4cc2e817f05982357dc9cf75533
- Branch / Tag: refs/heads/main
- Owner: https://github.com/semlix
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a057bbfcaf6af4cc2e817f05982357dc9cf75533
- Trigger Event: workflow_dispatch