

Project description

Flatbench

Search engine benchmark suite — compare Flatseek against Elasticsearch, tantivy, Typesense, Whoosh, ZincSearch, SQLite, and DuckDB.


Benchmarks: build speed, search latency, wildcard queries, range queries, and aggregations. Results are saved as JSON + Markdown to ./output/.


Install

pip install flatbench

Requires Python ≥ 3.10, plus Docker for the full engine comparison.


Quick Start

1. Start all search engines (Docker)

flatbench make up

Starts: Flatseek API (port 8000), Elasticsearch (9200), Typesense (8108), ZincSearch (4080).

2. Generate a dataset

flatbench generate -s article -r 500000 -o ./data/article.csv

3. Run benchmark comparison

flatbench compare -e flatseek_cli,elasticsearch,tantivy,typesense,whoosh,zincsearch -s 500000

Results → output/benchmark_YYYYMMDD_HHMMSS.json + .md.
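The timestamp portion of the result filenames appears to follow a strftime-style pattern; a minimal sketch of generating such a name (the exact format string is inferred from the example filenames, not taken from flatbench's source):

```python
from datetime import datetime

def result_basename(prefix: str = "benchmark") -> str:
    """Build a timestamped result name like benchmark_20260501_142947."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}"

name = result_basename()
print(name)  # e.g. benchmark_20260501_142947
```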


CLI Reference

Commands

| Command | Description |
|---|---|
| flatbench generate | Generate synthetic dataset |
| flatbench compare | Compare multiple engines |
| flatbench run | Benchmark single engine |
| flatbench serve | Serve report viewer locally |
| flatbench make | Run infrastructure Makefile targets |

Generate

flatbench generate --schema <schema> --rows <N> --output <path> [--format csv|jsonl]

Schemas: standard, ecommerce, logs, nested, sparse, article, adsb, campaign, devops, sosmed, blockchain
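A schema-aware generator of this kind typically maps each field name to a small random-value function and streams rows to CSV. The sketch below is a hypothetical illustration (not flatbench's actual generator), using field names from the article schema described later in this document:

```python
import csv
import os
import random
import string
import tempfile

def random_word(rng: random.Random, n: int = 8) -> str:
    """A lowercase pseudo-word; rng is seeded for reproducible datasets."""
    return "".join(rng.choices(string.ascii_lowercase, k=n))

def generate_article_rows(n_rows: int, seed: int = 42):
    """Yield dict rows loosely matching the article schema (illustrative only)."""
    rng = random.Random(seed)
    for i in range(n_rows):
        yield {
            "id": i,
            "title": " ".join(random_word(rng) for _ in range(3)),
            "content": " ".join(random_word(rng) for _ in range(30)),
            "tags": ",".join(random_word(rng, 5) for _ in range(2)),
            "views": rng.randint(0, 100_000),
            "published_at": f"2026-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
            "author": random_word(rng, 6),
        }

def write_csv(path: str, rows) -> None:
    rows = iter(rows)
    first = next(rows)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(first))
        writer.writeheader()
        writer.writerow(first)
        writer.writerows(rows)

path = os.path.join(tempfile.mkdtemp(), "article_sample.csv")
write_csv(path, generate_article_rows(100))

with open(path) as f:
    header = f.readline().strip()
    n_rows = sum(1 for _ in f)
print(header)  # id,title,content,tags,views,published_at,author
print(n_rows)  # 100
```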

Compare

flatbench compare --engines <engines> --sizes <sizes> [options]

Options:

| Flag | Description | Default |
|---|---|---|
| --schema | Data schema | standard |
| --workers, -w | Parallel index workers | 1 |
| --format | csv or jsonl | csv |
| --source, -S | Use existing CSV/JSONL instead of generating | (none) |
| --mode, -m | normal (disk) or tmpfs (RAM) | normal |
| --cache-dir, -c | Cache generated data for reuse | (none) |
| --skip-build | Skip build (use existing index) | (off) |
| --serve | After compare completes, build site and serve report | (off) |

Engines: flatseek, flatseek_cli, elasticsearch, tantivy, typesense, whoosh, zincsearch, sqlite, duckdb

Sizes: multiple sizes supported, e.g. --sizes 1000 10000 500000

Run

flatbench run --engine <engine> --data <path> --index-dir <path> [-o output] [--iterations N]

Serve

flatbench serve [--dir ./output] [--port 8080]

Opens the report viewer in your browser automatically.

Make (Infrastructure)

flatbench make <targets...>             # Run Makefile targets (default: help)
flatbench make up                       # Start all services (docker-compose up -d)
flatbench make down                     # Stop services (keep volumes)
flatbench make clean                    # Stop and remove volumes
flatbench make status                   # Show service status
flatbench make logs                     # View logs (follow mode)
flatbench make benchmark NROWS=500000   # Run benchmark via Make

Service management:

| Target | Description |
|---|---|
| up/down/clean/status/logs | Docker compose lifecycle |
| fs-health/fs-stats/fs-create/fs-delete | Flatseek API (port 8000) |
| es-health/es-stats/es-create/es-delete | Elasticsearch (port 9200) |
| ts-health/ts-stats/ts-create/ts-delete | Typesense (port 8108) |
| zs-health/zs-stats/zs-create/zs-delete | ZincSearch (port 4080) |

Examples

# Generate article dataset (500K rows)
flatbench generate -s article -r 500000 -o ./data/article.csv

# Compare at single scale
flatbench compare -e flatseek_cli,elasticsearch -s 500000

# Compare at multiple scales
flatbench compare -e flatseek,tantivy -s 1000 10000 500000

# Use existing CSV (reuse generated data)
flatbench compare -e flatseek,elasticsearch -s 500000 -S ./data/article.csv

# RAM-backed index (tmpfs mode, faster builds)
flatbench compare -e flatseek,tantivy -s 500000 -m tmpfs

# Compare and auto-serve report
flatbench compare -e flatseek,tantivy -s 500000 --serve

# Run benchmark via Make
flatbench make benchmark NROWS=500000 ENGINES="flatseek_cli,elasticsearch,tantivy"

Service URLs:

| Service | URL |
|---|---|
| Flatseek API | http://localhost:8000 |
| Elasticsearch | http://localhost:9200 |
| Typesense | http://localhost:8108 |
| ZincSearch | http://localhost:4080 |
| Kibana | http://localhost:5601 (dev profile) |

Available Schemas

| Schema | Fields | Description |
|---|---|---|
| article | 8 | Blog articles: id, title, content, tags, views, published_at, author |
| standard | 12 | Generic: id, name, email, phone, city, country, status, balance, created_at, updated_at, is_verified, tags |
| ecommerce | 12 | Order tracking data |
| logs | 11 | Log entries: timestamp, level, service, message, etc. |
| nested | 6 | Complex nested JSON objects |
| sosmed | 9 | Social media posts |
| devops | 11 | Infrastructure/monitoring data |
| adsb | 10 | Flight tracking data |
| campaign | 10 | Marketing campaign data |
| blockchain | 9 | Blockchain transaction data |

Benchmark Operations

| Operation | Description | Metrics |
|---|---|---|
| build_index | Bulk API indexing (1000 rows/batch) | duration_ms, rows/sec, index_size_mb |
| search | Full-text query | p50_ms, p95_ms, p99_ms, ops/sec |
| wildcard_search | Prefix/suffix wildcard queries | p50_ms, p95_ms, ops/sec |
| range_query | Numeric/date range filtering | duration_ms, hits, ops/sec |
| aggregate | Terms/stats aggregations | duration_ms, bucket_count, ops/sec |
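The latency metrics above are order statistics over repeated query timings. A minimal sketch of how they can be computed, assuming nearest-rank percentiles (flatbench's exact method may differ):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: p in [0, 100] over a list of timings (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(latencies_ms: list[float]) -> dict:
    """Reduce raw per-iteration timings to the reported metrics."""
    total_s = sum(latencies_ms) / 1000.0
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "ops_per_sec": len(latencies_ms) / total_s if total_s else 0.0,
    }

# 90 fast queries and 10 slow ones: the tail shows up in p95/p99, not p50.
stats = summarize([5.0] * 90 + [50.0] * 10)
print(stats)  # p50_ms = 5.0, p95_ms = 50.0, p99_ms = 50.0
```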

Output

Results written to ./output/ with timestamps:

output/
├── benchmark_20260501_142947.json   # Full structured results
├── benchmark_20260501_142947.md     # Markdown summary
└── index.json                        # Report manifest (for web viewer)
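The index.json manifest can be regenerated by scanning the output directory for result files. The sketch below is illustrative only; the actual manifest schema consumed by the flatbench viewer may differ:

```python
import json
import tempfile
from pathlib import Path

def build_manifest(output_dir: str) -> dict:
    """List benchmark_*.json reports in a directory (hypothetical manifest layout)."""
    reports = sorted(p.name for p in Path(output_dir).glob("benchmark_*.json"))
    return {"reports": reports, "count": len(reports)}

def write_manifest(output_dir: str) -> Path:
    """Write the manifest as index.json next to the reports."""
    path = Path(output_dir) / "index.json"
    path.write_text(json.dumps(build_manifest(output_dir), indent=2))
    return path

# Demo against a temporary directory with two empty report files.
tmp = tempfile.mkdtemp()
for name in ("benchmark_20260501_142947.json", "benchmark_20260502_090000.json"):
    (Path(tmp) / name).write_text("{}")
manifest = json.loads(write_manifest(tmp).read_text())
print(manifest["count"])  # 2
```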

Report Viewer

Live: bench.flatseek.io — hosted Flatbench report viewer.

Local: Run flatbench serve --port 8080, or open report_viewer.html directly in a browser.



Build Static Site

Build the output directory as a static site (for self-hosted or Vercel deploys):

make build
# or
bash build.sh

Output → public/ directory with index.html, output/*.json, output/*.md.

Deploy to Vercel

make deploy          # Deploy to production (flatbench.vercel.app)
make deploy-preview  # Deploy preview build

Project Structure

flatbench/
├── Dockerfile               # Flatseek API server container
├── docker-compose.yml       # All engine containers
├── Makefile                 # Infrastructure + build commands
├── build.sh                 # Static site build script
├── report_viewer.html       # Web UI for browsing results
├── pyproject.toml           # flatbench package definition
├── src/flatbench/
│   ├── cli.py               # CLI entry point
│   ├── benchmarks/          # Benchmark orchestration + report generation
│   ├── generators/          # Synthetic data generators (schema-aware)
│   ├── runners/             # Engine runners (HTTP API / CLI)
│   │   ├── flatseek_api.py  # Flatseek HTTP API runner
│   │   ├── flatseek_cli.py  # Flatseek CLI runner
│   │   ├── elasticsearch.py # Elasticsearch runner
│   │   ├── tantivy.py       # tantivy (Rust) runner
│   │   ├── typesense.py     # Typesense runner
│   │   ├── whoosh.py        # Whoosh runner
│   │   ├── zincsearch.py    # ZincSearch runner
│   │   ├── sqlite.py        # SQLite FTS5 runner
│   │   └── duckdb.py        # DuckDB full-text runner
│   └── output/              # Benchmark results (JSON + Markdown)

Adding a New Engine

from flatbench.runners import BaseRunner, BenchmarkResult, register_engine

@register_engine("myengine")
class MyEngineRunner(BaseRunner):
    name = "myengine"
    supports_aggregate = False
    supports_range_query = True
    supports_wildcard = True

    def build_index(self, data_path: str, **kwargs) -> BenchmarkResult:
        # Bulk API indexing logic
        pass

    def search(self, query: str, iterations: int = 10, **kwargs) -> BenchmarkResult:
        # Search via HTTP API
        pass

Then add to --engines list: --engines flatseek,myengine,...
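The register_engine decorator follows a standard plugin-registry pattern. A self-contained sketch of how such a registry typically works (a standalone illustration, not flatbench's actual source):

```python
# Name -> runner class mapping; the CLI would resolve --engines against this.
ENGINE_REGISTRY: dict[str, type] = {}

def register_engine(name: str):
    """Class decorator that records a runner under its engine name."""
    def decorator(cls):
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator

class BaseRunner:
    """Minimal stand-in for the real base class."""
    name = "base"

@register_engine("myengine")
class MyEngineRunner(BaseRunner):
    name = "myengine"

# Lookup by name, as a compare run would do for each --engines entry.
runner_cls = ENGINE_REGISTRY["myengine"]
print(runner_cls.__name__)  # MyEngineRunner
```

Registering at import time is what lets a single string like `--engines flatseek,myengine` select implementations without a hard-coded dispatch table.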


Benchmark Results (Latest: 500K rows, article schema)

Full results: bench.flatseek.io

Overall Score (60% speed · 40% correctness)

| Engine | Speed | Correctness | Score |
|---|---|---|---|
| Flatseek | 🟢 | 🟢 | 0.878 |
| typesense | 🟢 | 🟢 | 0.832 |
| zincsearch | 🟢 | 🟢 | 0.823 |
| elasticsearch | 🟢 | 🟢 | 0.820 |
| tantivy | 🟢 | 🔴 | 0.650 |
| whoosh | 🔴 | 🔴 | 0.025 |
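The composite score is described as a 60/40 weighting of speed and correctness. A sketch of that formula, with illustrative sub-scores (the per-engine speed and correctness components are not published here, so the inputs below are hypothetical):

```python
def overall_score(speed: float, correctness: float) -> float:
    """Weighted composite per the 60% speed / 40% correctness split.
    Both sub-scores are assumed normalized to [0, 1]."""
    return 0.6 * speed + 0.4 * correctness

# Hypothetical inputs, not flatbench's published sub-scores.
print(round(overall_score(0.9, 0.8), 2))  # 0.86
```

Under this weighting, an engine that is fast but incorrect (like tantivy in the table above) loses at most 0.4 of the total, which matches its mid-table score despite top speed.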

Key Takeaways

  • Correctness matters: Flatseek is the only engine with zero correctness errors; Tantivy misses 99.4% of range-query hits.
  • Search: Tantivy is fastest (0.7 ms p50) but incorrect; Flatseek is second-fastest at 7.9 ms and fully correct.
  • Build: Tantivy wins (21 s for 500K rows), but Flatseek's build time is reasonable (217 s).
  • Aggregation: competitors (Elasticsearch, Tantivy) are 20–300× faster; aggregation is a known Flatseek weakness.



Download files

Download the file for your platform.

Source Distribution

flatbench-0.1.1.tar.gz (59.5 kB)


Built Distribution


flatbench-0.1.1-py3-none-any.whl (70.6 kB)


File details

Details for the file flatbench-0.1.1.tar.gz.

File metadata

  • Download URL: flatbench-0.1.1.tar.gz
  • Size: 59.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flatbench-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 312a3c4e42b110bc560bb25e543373a5497806398b68945d1b07ed4e6802a331 |
| MD5 | 9286c5805ec7a22a208cf22b5c79fb15 |
| BLAKE2b-256 | 75dca834ba2b1a7b271e064ccb20abf7f13edf6809123d2830258b0ef07184c7 |


Provenance

The following attestation bundles were made for flatbench-0.1.1.tar.gz:

Publisher: publish.yml on flatseek/flatbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flatbench-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: flatbench-0.1.1-py3-none-any.whl
  • Size: 70.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flatbench-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d32fce3bbacab408eb77ffc0cef684c116582ff8f049304cc7045bce4e425ab |
| MD5 | 8cab36c1c5b69d0b6e536d0d035e0e4c |
| BLAKE2b-256 | 6678ad24956bb236aab00281ac4f31d8c610297ece5dad9f492a8f708ce987e8 |


Provenance

The following attestation bundles were made for flatbench-0.1.1-py3-none-any.whl:

Publisher: publish.yml on flatseek/flatbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
