Run ColBERT Wikipedia server

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

nielsgl

These details have not been verified by PyPI

Project links

Dataset

Project description

ColBERT Wikipedia Server

CLI tooling to fetch the ColBERT Wikipedia 2017 dataset and run a lightweight Flask API on top of the ColBERT v2 searcher.

$> I wrote this because the ColBERT server is down and I couldn't try one of the tutorials from DSPy.
$> I only tested this on my MacBook; please open an issue if you have problems or feature requests.

Features

One-command install and execution via uv tool.
Automatically downloads either ready-to-serve indexes/collection or the original archives.
Optional archive extraction flow for offline usage.
Caches ColBERT queries for fast, repeated lookups.
Exposes a simple /api/search endpoint for programmatic access.
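
The query caching listed above can be illustrated with a small sketch. This is not the package's actual implementation; it only assumes that results for a repeated (query, k) pair can be memoized. run_colbert_search and cached_search are hypothetical names, and the search itself is a stand-in that returns placeholder strings.

```python
from functools import lru_cache

# Counts how often the (hypothetical) expensive search actually runs.
calls = {"count": 0}


def run_colbert_search(query: str, k: int) -> tuple[str, ...]:
    """Stand-in for a real ColBERT lookup; returns placeholder passage labels."""
    calls["count"] += 1
    return tuple(f"passage {i} for {query!r}" for i in range(k))


@lru_cache(maxsize=1024)
def cached_search(query: str, k: int = 10) -> tuple[str, ...]:
    """Return memoized results when the same (query, k) pair was seen before."""
    return run_colbert_search(query, k)


first = cached_search("halloween movie", 3)   # runs the search
second = cached_search("halloween movie", 3)  # served from the cache
```

Returning a tuple rather than a list keeps the cached value immutable, so one caller cannot accidentally modify what later callers receive.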

Installation

uv tool install colbert-server

This registers a colbert-server executable in your uv toolchain.

Or if you just want to run it:

uvx colbert-server --help

Running the server

Use data from the Hugging Face cache (recommended quick start)

colbert-server serve --from-cache

This downloads only the collection/ and indexes/ folders from nielsgl/colbert-wiki2017, resolves the on-disk paths from the Hugging Face cache, and starts the server.
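
For readers who want to reproduce the partial download by hand, the following is a rough sketch using the huggingface_hub library. It assumes that nielsgl/colbert-wiki2017 is a dataset repository and that the two folders can be selected with glob patterns; the function name fetch_assets and the exact patterns are illustrative and not taken from the package.

```python
from pathlib import Path

REPO_ID = "nielsgl/colbert-wiki2017"  # repository named in the text above
# Assumed glob patterns selecting only the two folders the server needs.
ALLOW_PATTERNS = ["collection/*", "indexes/*"]


def fetch_assets(repo_id: str = REPO_ID) -> Path:
    """Download the selected folders into the Hugging Face cache and return the snapshot path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # assumption: the assets live in a dataset repo
        allow_patterns=ALLOW_PATTERNS,
    )
    return Path(local_dir)
```

Calling fetch_assets() requires network access on the first run; later runs should resolve from the local cache.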

Provide existing local assets

colbert-server serve \
  --index-root /path/to/indexes \
  --index-name wiki17.nbits.local \
  --collection-path /path/to/collection/wiki.abstracts.2017/collection.tsv

Use this mode when you already have ColBERT indexes and a collection TSV locally.

Download archives first, then serve

colbert-server serve \
  --download-archives /tmp/wiki-assets \
  --extract \
  --port 8894

This fetches the archive files into /tmp/wiki-assets/archives, extracts them into /tmp/wiki-assets, auto-detects the resulting layout (e.g. wiki17.nbits.local), and starts the Flask server on port 8894.
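
The auto-detection step can be pictured roughly as follows. This is a sketch under assumptions, not the package's code: it supposes the extracted tree contains an indexes/ folder with one sub-directory per index and a collection.tsv somewhere below the root, mirroring the paths used in the local-assets example. detect_layout is a hypothetical helper name.

```python
from pathlib import Path


def detect_layout(root: Path) -> tuple[Path, str, Path]:
    """Return (index_root, index_name, collection_path) found under root."""
    index_root = root / "indexes"
    # Each sub-directory of indexes/ is treated as one ColBERT index.
    index_dirs = sorted(p for p in index_root.iterdir() if p.is_dir())
    if not index_dirs:
        raise FileNotFoundError(f"no index directory under {index_root}")

    # The collection file may sit several levels deep, so search recursively.
    collections = sorted(root.rglob("collection.tsv"))
    if not collections:
        raise FileNotFoundError(f"no collection.tsv under {root}")

    return index_root, index_dirs[0].name, collections[0]
```

The three returned values correspond to the --index-root, --index-name, and --collection-path flags of the serve command.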

API usage

Once running, the server listens on the host/port provided (defaults to 0.0.0.0:8893) and serves ColBERT search results via:

GET /api/search?query=<text>&k=<top-k>

Example request:

http://127.0.0.1:8893/api/search?query=halloween+movie&k=3
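
The same request can be issued from Python with only the standard library. The sketch below assumes the server is reachable on 127.0.0.1:8893; build_search_url and search are illustrative helper names, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query: str, k: int = 3, host: str = "127.0.0.1", port: int = 8893) -> str:
    """Build the /api/search URL, URL-encoding the query text."""
    params = urlencode({"query": query, "k": k})
    return f"http://{host}:{port}/api/search?{params}"


def search(query: str, k: int = 3, host: str = "127.0.0.1", port: int = 8893, timeout: float = 10.0):
    """Send the GET request and return the decoded JSON payload."""
    with urlopen(build_search_url(query, k, host, port), timeout=timeout) as response:
        return json.load(response)
```

With the server running, search("halloween movie", k=3) sends the example request shown above.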

The JSON response includes the ranked passages, their scores, and normalized probabilities.
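
One common way to turn raw retrieval scores into normalized probabilities is a softmax over the returned top-k scores. Whether this server uses exactly this formula is an assumption; the sketch only shows what such a normalization could look like, and normalize_scores is a hypothetical name.

```python
import math


def normalize_scores(scores: list[float]) -> list[float]:
    """Softmax over the scores, so the results are positive and sum to 1."""
    if not scores:
        return []
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]
```

Because the normalization runs over only the k returned passages, the probabilities are relative to that result set and change when k changes.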

Managing dataset archives only

If you just want the raw archive bundles in a local directory:

colbert-server download-archives ./downloads --extract

Add --extract-to /desired/path to unpack into a different directory. You can later reuse the extracted paths with the serve command’s --index-root and --collection-path flags.

Alternative / Manual Method

If you don't want to use the script or uv tool, you can set it up as follows:

Add the dependencies to your project: uv add colbert-ai flask faiss-cpu torch
Download the archive files (both the index and the collection) from the archives directory of the Hugging Face dataset and unzip them.
Copy the standalone.py script from this repository and edit the INDEX_ROOT and COLLECTION_PATH variables.
Run the server with uv run standalone.py and <tada.wav>

Development tips

Requires Python 3.13+ (or adjust the pyproject.toml requirement to match your interpreter).
Run colbert-server --help or colbert-server serve --help to inspect available options.
The dataset helpers live under colbert_server/data.py; server configuration sits in colbert_server/server.py.
GitHub Actions runs lint/tests on every push; see .github/workflows/ci.yml for details.
Publishing uses the .github/workflows/publish.yml workflow. Before releasing, add PYPI_API_TOKEN (and optionally TEST_PYPI_API_TOKEN) to the repository secrets, bump the version in pyproject.toml, create a vX.Y.Z tag, and push it to trigger the publish job.

Happy searching! 🧠📚


Release history

0.3.1 (Nov 3, 2025)
0.3.0 (Oct 31, 2025)
0.2.1 (Oct 30, 2025)
0.1.0 (Oct 30, 2025), this version

Download files


Source Distribution

colbert_server-0.1.0.tar.gz (8.4 kB)

Uploaded Oct 30, 2025 Source

Built Distribution


colbert_server-0.1.0-py3-none-any.whl (9.9 kB)

Uploaded Oct 30, 2025 Python 3

File details

Details for the file colbert_server-0.1.0.tar.gz.

File metadata

Download URL: colbert_server-0.1.0.tar.gz
Upload date: Oct 30, 2025
Size: 8.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for colbert_server-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cd781e6e2b648115ea5524cde0ad7a0718467a939f19897020bd27eac69ce19a`
MD5	`ad57d69dddfb1dd5145327ff6360864e`
BLAKE2b-256	`1a3deca65b724d02aec0f17af1823aa11e07d4c7fc353ecc512121e0cffd738d`


Provenance

The following attestation bundles were made for colbert_server-0.1.0.tar.gz:

Publisher: publish.yml on nielsgl/colbert-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: colbert_server-0.1.0.tar.gz
- Subject digest: cd781e6e2b648115ea5524cde0ad7a0718467a939f19897020bd27eac69ce19a
- Sigstore transparency entry: 656757499
- Sigstore integration time: Oct 30, 2025
Source repository:
- Permalink: nielsgl/colbert-server@687d9fa6e2f06864c46512a2342157383a008b2c
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/nielsgl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@687d9fa6e2f06864c46512a2342157383a008b2c
- Trigger Event: push

File details

Details for the file colbert_server-0.1.0-py3-none-any.whl.

File metadata

Download URL: colbert_server-0.1.0-py3-none-any.whl
Upload date: Oct 30, 2025
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for colbert_server-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`44620818223241d20bebe77a1a740372567e3780a76eb47681b63ed9c62640bf`
MD5	`5d4cad3319fad1d92bc4891d2dac17e9`
BLAKE2b-256	`f459c08bca68e8cc92f40bd9d3843213d64087646908226b958035a1b5a4bd96`


Provenance

The following attestation bundles were made for colbert_server-0.1.0-py3-none-any.whl:

Publisher: publish.yml on nielsgl/colbert-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: colbert_server-0.1.0-py3-none-any.whl
- Subject digest: 44620818223241d20bebe77a1a740372567e3780a76eb47681b63ed9c62640bf
- Sigstore transparency entry: 656757516
- Sigstore integration time: Oct 30, 2025
Source repository:
- Permalink: nielsgl/colbert-server@687d9fa6e2f06864c46512a2342157383a008b2c
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/nielsgl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@687d9fa6e2f06864c46512a2342157383a008b2c
- Trigger Event: push
