Scalable engine to ingest, process, and query large datasets—transactions, logs, events, and analytics—directly from flat files.
Flatseek
The Elasticsearch alternative for flat files. Search blockchain transactions, logs, CSVs — without provisioning a cluster.
Demo: https://flatlens.demo.flatseek.io · Docs: https://flatseek.io · Author: judotens@flatseek.io
Why Flatseek?
| Use case | Example query |
|---|---|
| Blockchain / DeFi | program:raydium AND signer:*7xMg AND slot:>150000000 |
| DevOps / SRE | service:api-gateway AND level:ERROR AND timestamp:[NOW-1h TO NOW] |
| Social media / CRM | platform:twitter AND lang:id AND sentiment:negative AND retweets:>100 |
| Aviation / ADS-B | callsign:*GARUDA* AND origin:WIII AND altitude:>30000 |
| AdTech / DSP | campaign:*promo* AND country:ID AND bid:>50 AND status:active |
No JVM. No heap tuning. No cluster setup. Just pip install flatseek.
Features
- Trigram index on disk: Memory-mapped I/O. Resident memory stays low regardless of index size.
- Sub-second queries: Skip lists over trigram postings narrow the search space fast, even on spinning disks.
- Lucene-style query syntax: Wildcards (name:*john*), AND/OR/NOT, phrase match, field filters, ranges.
- On-disk aggregations: Terms, stats, range, cardinality. No document loaded into heap.
- Parallel multi-worker builds: Auto-planning, resume on interrupt, ETA display, --workers N.
- ChaCha20-Poly1305 encryption: Passphrase-protected indices with PBKDF2 key derivation.
- REST API: FastAPI backend with auto-generated docs at /docs and /redoc.
- Flatlens dashboard: Upload CSV/JSON/JSONL/XLS/XLSX up to 500 MB directly from the browser.
- Dual-mode Python client: API mode (HTTP) or direct mode (local files, no server needed).
Quick Start
pip install flatseek
Build an index
flatseek build ./data/solana_txs.csv -o ./data
Supported formats: CSV, JSON, JSONL, XLS, XLSX (up to 500 MB per file).
Serve API + dashboard
flatseek serve -d ./data
# API: http://localhost:8000
# Dashboard: http://localhost:8000/dashboard
Search from CLI
flatseek search ./data "program:raydium AND signer:*7xMg AND amount:>1000000"
Search from Python
from flatseek import Flatseek

client = Flatseek("http://localhost:8000")
result = client.search(
    index="solana_txs",
    q="program:raydium AND signer:*7xMg AND amount:>1000000",
    size=20,
)
print(result.total, "matching transactions")
Installation
Via install.sh (recommended — includes Flatlens dashboard)
curl -fsSL flatseek.io/install.sh | sh
From PyPI
pip install flatseek
From source
git clone https://github.com/flatseek/flatseek.git
cd flatseek
pip install -e .
# Clone Flatlens for the dashboard (required for flatseek serve)
git clone https://github.com/flatseek/flatlens.git
Requirements: Python >= 3.10
Running
# Serve API + dashboard
flatseek serve -d ./data
# API: http://localhost:8000
# Dashboard: http://localhost:8000/dashboard
# Or API only (no dashboard)
flatseek api -d ./data
CLI Reference
| Command | Description |
|---|---|
| flatseek build <file> [-o output] | Build index from CSV/JSON/JSONL/XLS/XLSX |
| flatseek serve [-d data] [-p port] | Start API server + Flatlens dashboard |
| flatseek api [-d data] [-p port] | Start API server only (no dashboard) |
| flatseek search <path> <query> | Search from CLI |
| flatseek stats <path> | Show index statistics |
| flatseek classify <path> | Detect column semantic types |
| flatseek plan <path> [-n workers] | Generate parallel build plan |
| flatseek encrypt <path> [--passphrase P] | Encrypt index with ChaCha20-Poly1305 |
| flatseek decrypt <path> [--passphrase P] | Decrypt encrypted index |
| flatseek delete <path> [--yes] | Delete index directory |
Use flatseek <command> --help for detailed options.
API Reference
Base URL: http://localhost:8000
Core endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /_cluster/health | Cluster health |
| GET | /_indices | List all indices |
| GET | /{index}/_stats | Index statistics |
| GET | /{index}/_mapping | Column type mappings |
| POST | /{index}/_search | Search ({query, size, from}) |
| GET | /{index}/_search?q=&size=&from= | Search via query params |
| POST | /{index}/_aggregate | Run aggregations |
| DELETE | /{index} | Delete an index |
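As a rough illustration of the POST search endpoint listed above, the sketch below builds the request with only the Python standard library. The body keys (query, size, from) come from the endpoint table; the helper name build_search_request, and the assumption that the server accepts and returns JSON, are illustrative and not confirmed by the Flatseek docs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL as stated in this README


def build_search_request(index, query, size=20, offset=0, base_url=BASE_URL):
    """Build (but do not send) a POST /{index}/_search request.

    Hypothetical helper: the JSON body shape {query, size, from} follows the
    endpoint table; sending and response handling are left to the caller.
    """
    body = json.dumps({"query": query, "size": size, "from": offset}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("solana_txs", "program:raydium AND amount:>1000000")
print(req.get_method(), req.full_url)

# To send it against a running server (assumes a JSON response body):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect.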
Interactive docs
- Swagger UI: GET /docs
- ReDoc: GET /redoc
Query Syntax
# Wildcard (contains or prefix)
campaign_name:*promo*
callsign:GARUDA*
# Field filter
level:ERROR
status:active
# Range
timestamp:[20260101 TO 20261231]
amount:>1000000
# Boolean
level:ERROR AND service:api-gateway
# Bare term (searches all columns)
jakarta garuda error
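When queries are assembled in code rather than typed by hand, the field filters above can be joined programmatically. The following is a minimal sketch; build_query is a hypothetical helper (not part of the flatseek package), and it makes no attempt to escape special characters in values.

```python
def build_query(filters, operator="AND"):
    """Join field:value pairs into a single query string.

    Hypothetical helper for illustration only; values are inserted verbatim,
    so callers must pass already-valid terms such as ">1000000" or "*promo*".
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"{field}:{value}" for field, value in filters.items())


q = build_query({"level": "ERROR", "service": "api-gateway", "amount": ">1000000"})
print(q)  # level:ERROR AND service:api-gateway AND amount:>1000000
```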
Architecture
Flatseek stores a trigram inverted index on disk using memory-mapped files.
data/
  solana_txs/
    index/          # Trigram postings (mmap-ed)
    docs/           # Column-oriented document store
    stats.json      # Doc count, index size
    mapping.json    # Semantic type per column
    manifest.json   # File manifest
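To show why memory-mapped files keep resident memory low, here is a small sketch using Python's mmap module. The file layout (a flat array of little-endian uint32 doc IDs) is an assumption made for illustration and is not Flatseek's actual on-disk format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a flat array of little-endian uint32 doc IDs.
# This is NOT Flatseek's real format, only an illustration of mmap access.
doc_ids = [3, 17, 42, 99, 1024]
path = os.path.join(tempfile.mkdtemp(), "postings.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))


def read_doc_id(mm, position):
    """Read the doc ID at `position` without reading the whole file."""
    (value,) = struct.unpack_from("<I", mm, position * 4)
    return value


with open(path, "rb") as f:
    # Length 0 maps the whole file; the OS pages data in lazily on access.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        third = read_doc_id(mm, 2)
        count = len(mm) // 4

print(third, count)  # 42 5
```

Only the pages that are actually touched get loaded by the operating system, which is the general reason an mmap-based index can be much larger than the process's resident memory.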
Why trigram? To match *john*, Flatseek extracts the trigrams (joh, ohn) from the query, then intersects their posting lists, so only documents containing every trigram remain as candidates. Posting lists are sorted, compressed, and support skip pointers.
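The trigram lookup described above can be sketched in a few lines of Python. This is an in-memory toy, not Flatseek's implementation: the function names are hypothetical, posting lists are plain uncompressed lists without skip pointers, and the final substring check on candidates is an assumption about how false positives would be filtered out.

```python
def trigrams(text):
    """Return the set of 3-character substrings of `text`, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def intersect_sorted(a, b):
    """Intersect two sorted lists of doc IDs with a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def build_index(docs):
    """Map each trigram to a sorted posting list of doc IDs."""
    index = {}
    for doc_id, text in enumerate(docs):
        for gram in trigrams(text):
            index.setdefault(gram, []).append(doc_id)  # IDs arrive in ascending order
    return index


def search_contains(index, docs, needle):
    """Find docs containing `needle` (needs at least 3 characters)."""
    grams = sorted(trigrams(needle))
    if not grams:
        return []
    candidates = index.get(grams[0], [])
    for gram in grams[1:]:
        candidates = intersect_sorted(candidates, index.get(gram, []))
    # Trigram matches can be false positives, so verify each candidate.
    return [d for d in candidates if needle.lower() in docs[d].lower()]


docs = ["John Smith", "johnny cash", "Joan Ohno", "alice"]
index = build_index(docs)
print(search_contains(index, docs, "john"))  # [0, 1]
```

"Joan Ohno" contains the trigram ohn but not joh, so the intersection drops it before any document text is inspected.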
Deployment
Vercel (recommended)
cd flatseek && vercel --prod
Docker
docker build -t flatseek .
docker run -v /path/to/data:/data -p 8000:8000 flatseek serve -d /data
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
License
Apache License 2.0. See LICENSE.