Fast Ball Mapper construction with BallTree and optional FAISS backends
Project description
fast-ballmapper
fast-ballmapper builds Ball Mapper graph summaries of high-dimensional point
clouds. It supports scikit-learn's BallTree, optional FAISS acceleration,
deterministic farthest-point sampling, NetworkX graph construction, node
coloring, and optional Matplotlib or Plotly visualization.
The package is designed around two use cases:
- exact Ball Mapper cover construction using BallTree or exhaustive FAISS indexes;
- experimental approximate range-query construction using configurable FAISS indexes.
Installation
Core package:
pip install fast-ballmapper
With plotting support:
pip install "fast-ballmapper[plot]"
With CPU FAISS support:
pip install "fast-ballmapper[faiss]"
With GPU FAISS support:
pip install "fast-ballmapper[faiss-gpu]"
With development tools, plotting, and CPU FAISS support:
python -m pip install -e ".[dev,plot,faiss]"
pytest
The PyPI distribution name contains a hyphen, while the Python import package uses an underscore:
pip install fast-ballmapper
import fast_ballmapper
GPU FAISS support
GPU support is optional. The faiss extra installs the CPU FAISS package. To
use GPU execution, install a GPU-enabled FAISS build separately for your CUDA
and platform.
The package can request GPU execution through FaissConfig(device="gpu") or
FaissConfig(device="auto"). If GPU support is unavailable and
gpu_fallback_to_cpu=True, the backend falls back to CPU execution.
Minimal example
import numpy as np
from fast_ballmapper import build_mapper, compute_landmarks
rng = np.random.default_rng(42)
x = rng.random((500, 2))
landmarks, cover = compute_landmarks(
x,
eps=0.1,
method="ball_tree",
metric="euclidean",
leaf_size=40,
)
graph = build_mapper(cover)
print(f"Landmarks: {len(landmarks)}")
print(f"Edges: {graph.number_of_edges()}")
Fixed-landmark cover construction
The function compute_landmarks selects landmarks and constructs their cover.
The function build_cover only constructs cover sets around an already chosen
landmark collection.
This is useful for comparing different backends or approximate FAISS configurations on the same landmark set.
import numpy as np
from fast_ballmapper import FaissConfig, build_cover, compute_landmarks
rng = np.random.default_rng(42)
x = rng.random((1000, 8)).astype("float32")
landmarks, exact_cover = compute_landmarks(
x,
eps=0.25,
method="faiss",
metric="euclidean",
faiss_config=FaissConfig(factory="Flat"),
)
approximate_cover = build_cover(
x,
landmarks,
eps=0.25,
method="faiss",
metric="euclidean",
faiss_config=FaissConfig(
factory="IVF64,Flat",
search_params={"nprobe": 8},
),
)
Here, the landmark set is fixed. Therefore, differences between exact_cover
and approximate_cover come from the range-query backend rather than from
different landmark choices.
Farthest-point sampling
from fast_ballmapper import compute_landmarks_fps
landmarks, cover = compute_landmarks_fps(
x,
eps=0.1,
start_index=None,
method="ball_tree",
metric="euclidean",
leaf_size=40,
metric_kwargs=None,
)
When start_index is None, the lexicographically smallest point is selected
first, making the result deterministic. Landmark selection and cover queries use
the same backend and distance.
For FAISS, farthest-point sampling uses an exact Flat index. Approximate FAISS indexes are not used for farthest-point sampling because the algorithm requires distances from each selected landmark to every data point.
BallTree metrics
Any metric supported by scikit-learn's BallTree can be passed through metric.
Parameterized metrics use metric_kwargs:
landmarks, cover = compute_landmarks_fps(
x,
eps=0.2,
metric="minkowski",
metric_kwargs={"p": 3},
)
BallTree is useful when metric flexibility is important.
FAISS backend
FAISS supports Euclidean and cosine distances in this package:
from fast_ballmapper import compute_landmarks
landmarks, cover = compute_landmarks(
x,
eps=0.1,
method="faiss",
metric="euclidean",
)
For cosine distance, zero vectors are rejected because cosine distance is undefined for them.
By default, the FAISS backend uses a Flat index:
from fast_ballmapper import FaissConfig
config = FaissConfig(factory="Flat")
A Flat FAISS index is exhaustive. It compares the query with every indexed vector, so it accelerates distance computation without changing the Ball Mapper range sets, apart from finite-precision effects near the threshold.
Configurable FAISS indexes
The FAISS backend can be configured using FaissConfig:
from fast_ballmapper import FaissConfig, compute_landmarks
config = FaissConfig(
factory="IVF256,Flat",
search_params={"nprobe": 16},
)
landmarks, cover = compute_landmarks(
x,
eps=0.1,
method="faiss",
metric="euclidean",
faiss_config=config,
)
The factory argument is a FAISS index-factory string. Examples include:
FaissConfig(factory="Flat")
FaissConfig(factory="IVF256,Flat", search_params={"nprobe": 16})
FaissConfig(factory="HNSW32", search_params={"efSearch": 64})
FaissConfig(factory="SQ8")
FaissConfig(factory="IVF256,SQ8", search_params={"nprobe": 16})
FaissConfig(factory="IVF256,PQ16x4", query_mode="knn", candidate_k=1024)
Approximate indexes may change the computed cover. Non-exhaustive indexes can
omit true ball members, while compressed indexes can also introduce points whose
exact distance lies outside the ball. For controlled comparisons, use
build_cover with a fixed landmark set.
Candidate search and exact verification
Some FAISS configurations use a candidate-limited k-nearest-neighbour search
instead of native range search:
config = FaissConfig(
factory="IVF256,PQ16x4",
search_params={"nprobe": 16},
query_mode="knn",
candidate_k=1024,
)
Exact verification can be enabled to recompute exact distances for candidates and remove points outside the requested radius:
config = FaissConfig(
factory="IVF256,Flat",
search_params={"nprobe": 16},
query_mode="knn",
candidate_k=1024,
exact_verify=True,
)
Exact verification prevents false-positive ball memberships among the examined candidates, but it cannot recover true members that were never included in the candidate set.
Optional GPU execution
GPU execution can be requested through FaissConfig:
config = FaissConfig(
factory="Flat",
device="gpu",
gpu_device=0,
gpu_fallback_to_cpu=True,
)
Use device="auto" to use GPU only when a GPU-enabled FAISS build and a
visible GPU are available:
config = FaissConfig(
factory="Flat",
device="auto",
)
The default is device="cpu" for reproducibility.
Graph construction and coloring
from fast_ballmapper import (
build_mapper,
color_by_density,
color_by_entropy,
color_by_function,
color_by_mode,
color_by_size,
)
graph = build_mapper(cover)
sizes = color_by_size(cover)
density = color_by_density(cover)
mean_first_coordinate = color_by_function(x[:, 0], cover)
Matplotlib visualization
import matplotlib.pyplot as plt
from fast_ballmapper.plotting.matplotlib import add_colorbar, draw_ball_mapper
fig, ax = plt.subplots(figsize=(8, 6))
_, nodes = draw_ball_mapper(
graph,
colors=sizes,
sizes=sizes,
layout="spring",
node_scale=500,
ax=ax,
)
add_colorbar(nodes, ax, label="Ball size")
plt.show()
Plotly visualization
from fast_ballmapper.plotting.plotly import draw_ball_mapper_plotly
figure = draw_ball_mapper_plotly(
graph,
cover,
colorings={"Ball size": sizes, "Density": density},
sizes=sizes,
show=True,
)
Use export_html="ball_mapper.html" to create a standalone interactive HTML
file.
Experiments
Research scripts can be placed in the top-level experiments/ directory.
These scripts are not part of the public package API.
A fixed-landmark approximate FAISS experiment can be run with:
python experiments/run_fixed_landmarks.py \
--n 5000 \
--d 32 \
--seed 0 \
--target-ball-size 30
The fixed-landmark design first selects landmarks exactly and then compares different range-query backends around the same landmark set. This separates range-query approximation errors from changes in landmark selection.
Public API
Core functions and configuration objects are exported directly from
fast_ballmapper:
FaissConfigcompute_landmarkscompute_landmarks_fpsbuild_coverbuild_mappercompute_edge_overlapscolor_by_functioncolor_by_modecolor_by_entropycolor_by_sizecolor_by_density
Plotting functions live in their optional submodules so importing the core package does not import Matplotlib or Plotly.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fast_ballmapper-0.0.1.tar.gz.
File metadata
- Download URL: fast_ballmapper-0.0.1.tar.gz
- Upload date:
- Size: 26.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8b7bfb7efa5343d798b409cb75cfcab7d6aa31611aa18dd09f51e18c983ea7c2
|
|
| MD5 |
73a909786479f46f0f49d4845e71a04f
|
|
| BLAKE2b-256 |
fbdb3ef9bfb07a2da2a9080a50106b8303dd2db234c554c59599047268bf3941
|
File details
Details for the file fast_ballmapper-0.0.1-py3-none-any.whl.
File metadata
- Download URL: fast_ballmapper-0.0.1-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
547c64308d5b5619f5c80813e86d726a492e71ccdfd338804275990ead57d4d3
|
|
| MD5 |
09b8bdaa182f2c51d59335f9c3dde8f6
|
|
| BLAKE2b-256 |
9c2622d0bf999f33c71b3990fad3585387302fd6ba01d606bf7a7258a8ae67ac
|