Scalable engine to ingest, process, and query large datasets—transactions, logs, events, and analytics—directly from flat files.

Flatseek

The Elasticsearch alternative for flat files. Search blockchain transactions, logs, CSVs — without provisioning a cluster.

Demo: https://flatlens.demo.flatseek.io  ·  Docs: https://flatseek.io  ·  Author: judotens@flatseek.io


Why Flatseek?

| Use case | Example query |
|---|---|
| Blockchain / DeFi | program:raydium AND signer:*7xMg AND slot:>150000000 |
| DevOps / SRE | service:api-gateway AND level:ERROR AND timestamp:[NOW-1h TO NOW] |
| Social media / CRM | platform:twitter AND lang:id AND sentiment:negative AND retweets:>100 |
| Aviation / ADS-B | callsign:*GARUDA* AND origin:WIII AND altitude:>30000 |
| AdTech / DSP | campaign:*promo* AND country:ID AND bid:>50 AND status:active |

No JVM. No heap tuning. No cluster setup. Just pip install flatseek.


Features

  • Trigram index on disk — Memory-mapped I/O. Resident memory stays low regardless of index size.
  • Sub-second queries — Trigram postings skip lists narrow the search space fast, even on spinning disks.
  • Lucene-style query syntax — Wildcards (name:*john*), AND/OR/NOT, phrase match, field filters, ranges.
  • On-disk aggregations — Terms, stats, range, cardinality. No document loaded into heap.
  • Parallel multi-worker builds — Auto-planning, resume on interrupt, ETA display, --workers N.
  • ChaCha20-Poly1305 encryption — Passphrase-protected indices with PBKDF2 key derivation.
  • REST API — FastAPI backend with auto-generated docs at /docs and /redoc.
  • Flatlens dashboard — Upload CSV/JSON/JSONL/XLS/XLSX up to 500 MB directly from browser.
  • Dual-mode Python client — API mode (HTTP) or direct mode (local files, no server needed).

Quick Start

pip install flatseek

Build an index

flatseek build ./data/solana_txs.csv -o ./data

Supported formats: CSV, JSON, JSONL, XLS, XLSX (up to 500 MB per file).

Serve API + dashboard

flatseek serve -d ./data
# API:        http://localhost:8000
# Dashboard:  http://localhost:8000/dashboard

Search from CLI

flatseek search ./data "program:raydium AND signer:*7xMg AND amount:>1000000"

Search from Python

from flatseek import Flatseek

client = Flatseek("http://localhost:8000")
result = client.search(
    index="solana_txs",
    q="program:raydium AND signer:*7xMg AND amount:>1000000",
    size=20
)
print(result.total, "matching transactions")

Installation

Via install.sh (recommended — includes Flatlens dashboard)

curl -fsSL https://flatseek.io/install.sh | sh

From PyPI

pip install flatseek

From source

git clone https://github.com/flatseek/flatseek.git
cd flatseek
pip install -e .

# Clone Flatlens for the dashboard (required for flatseek serve)
git clone https://github.com/flatseek/flatlens.git

Requirements: Python >= 3.10


Running

# Serve API + dashboard
flatseek serve -d ./data
# API:        http://localhost:8000
# Dashboard:  http://localhost:8000/dashboard

# Or API only (no dashboard)
flatseek api -d ./data

CLI Reference

| Command | Description |
|---|---|
| flatseek build <file> [-o output] | Build index from CSV/JSON/JSONL/XLS/XLSX |
| flatseek serve [-d data] [-p port] | Start API server + Flatlens dashboard |
| flatseek api [-d data] [-p port] | Start API server only (no dashboard) |
| flatseek search <path> <query> | Search from CLI |
| flatseek stats <path> | Show index statistics |
| flatseek classify <path> | Detect column semantic types |
| flatseek plan <path> [-n workers] | Generate parallel build plan |
| flatseek encrypt <path> [--passphrase P] | Encrypt index with ChaCha20-Poly1305 |
| flatseek decrypt <path> [--passphrase P] | Decrypt encrypted index |
| flatseek delete <path> [--yes] | Delete index directory |

Use flatseek <command> --help for detailed options.
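
The encrypt and decrypt commands take a passphrase, and the feature list mentions PBKDF2 key derivation. As a rough illustration of what that step involves, the sketch below derives a 32-byte key (the key size ChaCha20-Poly1305 requires) using only the standard library. The hash function, salt length, and iteration count are assumptions chosen for the example; the README does not document the parameters Flatseek actually uses.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256.

    SHA-256 and the iteration count are illustrative assumptions,
    not Flatseek's documented settings.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# A random salt is generated once and must be stored next to the
# ciphertext so the same key can be derived again for decryption.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
print(len(key))  # 32
```

The same passphrase and salt always yield the same key, which is why only the salt (never the key) needs to be kept with the encrypted index.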


API Reference

Base URL: http://localhost:8000

Core endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /_cluster/health | Cluster health |
| GET | /_indices | List all indices |
| GET | /{index}/_stats | Index statistics |
| GET | /{index}/_mapping | Column type mappings |
| POST | /{index}/_search | Search ({query, size, from}) |
| GET | /{index}/_search?q=&size=&from= | Search via query params |
| POST | /{index}/_aggregate | Run aggregations |
| DELETE | /{index} | Delete an index |
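
For callers that do not use the Python client, the POST search endpoint can be reached with the standard library alone. The sketch below only builds the request, using the query, size, and from fields listed in the table. That the body is sent as JSON is an assumption, build_search_request is a hypothetical helper name, and the response shape is not documented here, so the example does not parse it.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, index, query, size=20, offset=0):
    """Build (but do not send) a POST /{index}/_search request.

    The JSON body layout is an assumption based on the field names
    {query, size, from} shown in the endpoint table.
    """
    body = json.dumps({"query": query, "size": size, "from": offset}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(index, safe='')}/_search"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "http://localhost:8000", "solana_txs", "program:raydium AND amount:>1000000"
)
print(req.get_method(), req.full_url)

# To send it against a running `flatseek serve` or `flatseek api` instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```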

Interactive docs

  • Swagger UI: GET /docs
  • ReDoc: GET /redoc

Query Syntax

# Wildcard (contains or prefix)
campaign_name:*promo*
callsign:GARUDA*

# Field filter
level:ERROR
status:active

# Range
timestamp:[20260101 TO 20261231]
amount:>1000000

# Boolean
level:ERROR AND service:api-gateway

# Bare term (searches all columns)
jakarta garuda error
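
To make the semantics of these terms concrete, here is a small in-memory evaluator for a subset of the syntax: exact field filters, * wildcards, the >N range form, and AND. It is an illustration only and says nothing about how Flatseek parses or executes queries. Case-sensitive matching is an assumption, and bracketed ranges, OR, NOT, phrases, and bare terms are deliberately left out.

```python
from fnmatch import fnmatchcase


def term_matches(term: str, record: dict) -> bool:
    """Evaluate one `field:value` term against a record (column -> value)."""
    field, _, pattern = term.partition(":")
    if field not in record:
        return False
    value = record[field]
    if pattern.startswith(">"):
        # Open-ended numeric range, e.g. amount:>1000000
        return float(value) > float(pattern[1:])
    # Exact match or `*` wildcard; case-sensitive by assumption.
    return fnmatchcase(str(value), pattern)


def all_match(query: str, record: dict) -> bool:
    """Evaluate a query whose terms are joined by AND only."""
    return all(term_matches(t.strip(), record) for t in query.split(" AND "))


record = {"level": "ERROR", "service": "api-gateway", "amount": 2500000}
print(all_match("level:ERROR AND service:api-*", record))    # True
print(all_match("level:ERROR AND amount:>3000000", record))  # False
```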

Architecture

Flatseek stores a trigram inverted index on disk using memory-mapped files.

data/
  solana_txs/
    index/           # Trigram postings (mmap-ed)
    docs/            # Column-oriented document store
    stats.json       # Doc count, index size
    mapping.json     # Semantic type per column
    manifest.json    # File manifest
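
The reason memory-mapping keeps resident memory low is that the operating system pages in only the parts of a file that are actually touched. The sketch below shows the idea with a toy posting file: sorted document IDs stored as little-endian 32-bit integers and binary-searched through mmap without reading the whole file. This layout and the file name are assumptions made for the example; Flatseek's real on-disk format is not described in this README.

```python
import mmap
import os
import struct
import tempfile


def write_postings(path, doc_ids):
    """Write doc IDs as sorted little-endian uint32 values (toy format)."""
    with open(path, "wb") as f:
        for doc_id in sorted(doc_ids):
            f.write(struct.pack("<I", doc_id))


def contains(path, target):
    """Binary-search the (non-empty) posting file through a read-only mmap."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm) // 4
        while lo < hi:
            mid = (lo + hi) // 2
            # Only the page holding this 4-byte slot needs to be resident.
            (value,) = struct.unpack_from("<I", mm, mid * 4)
            if value == target:
                return True
            if value < target:
                lo = mid + 1
            else:
                hi = mid
        return False


path = os.path.join(tempfile.mkdtemp(), "joh.postings")
write_postings(path, [42, 7, 19, 1003])
print(contains(path, 19))  # True
print(contains(path, 20))  # False
```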

Why trigram? To match *john*, Flatseek extracts trigrams (joh, ohn) from the query, then intersects their posting lists. Posting lists are sorted, compressed, and support skip-pointers.
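
The trigram lookup described above can be sketched in a few lines. The example builds an in-memory trigram index, intersects the sorted posting lists for the query's trigrams, then checks each candidate against the original text, since a document can contain both joh and ohn without containing john. Lower-casing, the final verification pass, and the plain linear merge (used here in place of compression and skip pointers) are choices made for this sketch, not confirmed details of Flatseek.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of the lower-cased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to a sorted list of document IDs containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for gram in trigrams(text):
            index[gram].append(doc_id)  # IDs arrive in increasing order
    return index


def intersect(a, b):
    """Merge-intersect two sorted lists of document IDs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_contains(index, docs, needle):
    """Find documents containing `needle`, as in the query *needle*."""
    grams = trigrams(needle)
    if not grams:
        # Needle shorter than three characters: fall back to a full scan.
        candidates = list(range(len(docs)))
    else:
        # Start from the shortest list so the intersection shrinks quickly.
        lists = sorted((index.get(g, []) for g in grams), key=len)
        candidates = lists[0]
        for postings in lists[1:]:
            candidates = intersect(candidates, postings)
    needle = needle.lower()
    # Verify candidates to discard trigram false positives.
    return [d for d in candidates if needle in docs[d].lower()]


docs = ["John Smith", "johanna ohnesorg", "Alice Johnson", "Bob"]
index = build_index(docs)
print(search_contains(index, docs, "john"))  # [0, 2]
```

Document 1 survives the intersection because it contains both trigrams, and is dropped only by the verification step.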


Deployment

Vercel (recommended)

cd flatseek && vercel --prod

Docker

docker build -t flatseek .
docker run -v /path/to/data:/data -p 8000:8000 flatseek serve -d /data

Contributing

Contributions are welcome! Please open an issue or submit a pull request.


License

Apache License 2.0. See LICENSE.
