Scalable engine to ingest, process, and query large datasets—transactions, logs, events, and analytics—directly from flat files.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

judotens

These details have not been verified by PyPI

Project links

Project description

Flatseek

Full text search over the flat files. An Elasticsearch alternatives. Search billions of rows from CSV / JSON / XLSX without 24/7 machine.

Demo: flatlens.demo.flatseek.io · Docs: flatseek.io/docs

Why Flatseek?

Most teams don't actually need a 24/7 search cluster. They have CSVs of customer data, log archives, or scraped catalogs that need occasional querying and full text searching — and roughly nothing the rest of the time.

Use case	Example query
Blockchain / DeFi	`program:raydium AND signer:7xMg AND amount:>1000000`
DevOps / SRE	`level:ERROR AND service:api-gateway AND region:us-east1`
Social / CRM	`platform:twitter AND lang:id AND sentiment:negative AND retweets:>100`
Aviation / ADS-B	`callsign:GARUDA* AND origin:WIII AND altitude:>30000`
AdTech / DSP	`campaign:promo AND country:ID AND bid:>50 AND status:active`

No JVM. No heap tuning. No cluster setup. Disk-first, single Python process.

Quick Start

# 1. Install (CLI + Flatlens dashboard)
curl -fsSL flatseek.io/install.sh | sh

# 2. Build an index from any CSV
flatseek build ./data/solana_txs.csv -o ./data

# 3. Serve API + dashboard
flatseek serve -d ./data
# → API:        http://localhost:8000
# → Dashboard:  http://localhost:8000/dashboard

# 4. Query
flatseek search ./data/solana_txs "program:raydium AND amount:>1000000"

That's it. No config files, no cluster.

Install

One-line script — recommended

curl -fsSL flatseek.io/install.sh | sh

Installs both the flatseek CLI and the Flatlens dashboard to ~/.local/share/flatlens.

From PyPI (CLI + Python client only)

pip install flatseek

The dashboard must be installed separately if you want it:

git clone https://github.com/flatseek/flatlens.git ~/.local/share/flatlens

From source

git clone https://github.com/flatseek/flatseek.git
cd flatseek
pip install -e .

Verify

flatseek --help

Requirements: Python ≥ 3.10. macOS, Linux, WSL.

Step 1 — Index your data

Flatseek indexes any of these formats (up to 500 MB per file via dashboard): CSV, JSON, JSONL, XLS, XLSX.

Option A — Dashboard (newbie-friendly)

flatseek serve -d ./data

Open http://localhost:8000/dashboard, click + Upload, and walk through the four-step wizard:

Drop — drag a file or paste a URL.
Preview — Flatseek auto-detects column types and shows the first 5 rows.
Configure — name the index, optionally encrypt it, pick a batch size.
Ingest — streamed progress with elapsed time and ETA.

Option B — CLI (advanced)

# Single-worker build (small files)
flatseek build ./data/solana_txs.csv -o ./data

# Parallel build with 8 workers (auto-splits by byte range)
flatseek build ./data/solana_txs.csv -o ./data -w 8

# Inspect the build plan without running it
flatseek plan ./data/solana_txs.csv -o ./data

# Detect column types only (writes column_map.json, no index)
flatseek classify ./data/solana_txs.csv

# Pre-flight estimate — sample 5K rows for ETA + storage projection
flatseek build ./data/large.csv -o ./data --estimate

The result lands at ./data/solana_txs/ — that subdirectory is the index. Multiple indices live as siblings under ./data/.

Column types

Auto-detected at index time. Override via column_map.json or the dashboard's mapping step.

Type	Indexed as	Query style	Use for
`TEXT`	Trigrams + tokenization	Wildcard, contains	Free text, descriptions
`KEYWORD`	Exact value	Equals, terms aggregation	Tags, status, IDs
`DATE`	ISO date (YYYYMMDD)	Range, date_histogram	Timestamps
`FLOAT` / `INT`	Numeric value	Range, stats aggregation	Price, lat/lng, counts
`BOOL`	Boolean	Equals	Flags
`ARRAY`	JSON array, element-wise	Contains	tags, categories
`OBJECT`	Dot-path expansion	`address.city:Jakarta`	Nested records

Sample datasets — with arrays and nested objects

Two reference schemas that exercise every column type. CSVs matching these shapes work end-to-end with the queries in the rest of this README.

Blockchain — `programs` is `ARRAY`, `swap` is `OBJECT`

signature, slot,      timestamp,            fee,  status,  signer,    programs,                       swap
5xJ3k...,  250000000, 2026-01-01T00:00:00Z, 5000, success, 7xMg3...,  ["raydium","jupiter"],          {"in":"USDC","out":"SOL","amount":2500000}
3Ab2k...,  250000001, 2026-01-01T00:01:00Z, 5000, success, 9kL2n...,  ["raydium"],                    {"in":"USDT","out":"BONK","amount":1500000}
1CdefG..., 250000002, 2026-01-01T00:02:00Z, 5000, failed,  9kL2n...,  ["jupiter","pump.fun","meteora"],   {"in":"SOL","out":"USDC","amount":3200000}

Queries that exploit the structure:

programs:jupiter                                    # array element match
programs:raydium AND swap.amount:>1000000           # array + nested numeric
swap.in:USDC OR swap.out:USDC                       # nested OR
status:success AND programs:pump.fun                    # element-wise AND

Locations — `tags` is `ARRAY`, `address` is `OBJECT`

id, name,    country, lat,     lng,      tags,                                 address
1,  Depok,   ID,      -6.4413, 106.8183, ["urban","metro","west-java"],        {"province":"West Java","district":"Beji","postal":"16412"}
2,  Bandung, ID,      -6.9175, 107.6191, ["highland","tourism","west-java"],   {"province":"West Java","district":"Coblong","postal":"40132"}
3,  Bali,    ID,      -8.4095, 115.1889, ["island","tourism","beach"],         {"province":"Bali","district":"Denpasar","postal":"80111"}

Queries:

tags:tourism                                        # any doc tagged "tourism"
tags:metro AND address.province:"West Java"        # array + nested string
address.postal:[10000 TO 50000]                    # nested numeric range
* AND NOT tags:beach                               # exclusions on array

Step 2 — Query

Same query language across CLI, REST API, Python client, and dashboard.

CLI

flatseek search ./data/solana_txs "program:raydium AND signer:*7xMg* AND amount:>1000000"
flatseek search ./data/solana_txs "fee:[4000 TO 6000]" -n 50
flatseek search ./data/solana_txs "*" --and status:success --and program:jupiter

REST API

flatseek api -d ./data        # API only, no dashboard
# → http://localhost:8000

# GET — query string
curl "http://localhost:8000/solana_txs/_search?q=program:raydium AND amount:>1000000&size=10"

# POST — JSON body
curl -X POST http://localhost:8000/solana_txs/_search \
  -H "Content-Type: application/json" \
  -d '{"query": "program:raydium AND amount:>1000000", "size": 20, "from": 0}'

Interactive docs: /docs (Swagger UI), /redoc (ReDoc).

Python — API mode

from flatseek import Flatseek

client = Flatseek("http://localhost:8000")

result = client.search(
    index="solana_txs",
    q="program:raydium AND signer:*7xMg* AND amount:>1000000",
    size=20,
)
print(f"Found {result.total} matches")
for doc in result.docs:
    print(doc["signature"], doc["status"], doc["fee"])

Python — direct mode (no server)

from flatseek import Flatseek

# Open an index directory directly — same API as remote mode
qe = Flatseek("./data/solana_txs")

result = qe.search(q="program:raydium AND amount:>1000000", size=10)
print(result.total, "matches")

Step 3 — Aggregate

Run aggregations against the same on-disk index used for search. Results don't load documents into RAM.

REST API

# Top programs by transaction count + fee stats + monthly histogram
curl -X POST http://localhost:8000/solana_txs/_aggregate \
  -H "Content-Type: application/json" \
  -d '{
    "query": "*",
    "aggs": {
      "by_program":  {"terms":          {"field": "program",   "size": 10}},
      "fee_stats":   {"stats":          {"field": "fee"}},
      "by_month":    {"date_histogram": {"field": "timestamp", "interval": "month"}}
    }
  }'

Python

result = client.aggregate(
    index="solana_txs",
    body={
        "query": "status:success",
        "aggs": {
            "by_program":     {"terms":       {"field": "program", "size": 10}},
            "fee_stats":      {"stats":       {"field": "fee"}},
            "unique_signers": {"cardinality": {"field": "signer"}},
        },
    },
)
print(result.aggs["by_program"]["buckets"])

Aggregation types

Type	Output	Notes
`terms`	`buckets[{key, doc_count}]`	Works on `KEYWORD` and `ARRAY` fields
`stats`	`{count, min, max, sum, avg}`	Single pass over a numeric field
`avg` / `min` / `max` / `sum`	`{value}`	Faster than `stats` when you need just one
`cardinality`	`{value}`	HyperLogLog; ~1% relative error
`date_histogram`	`buckets[{key, doc_count}]`	Intervals: `hour`, `day`, `week`, `month`

Query syntax reference

Pattern	Example	Description
`field:value`	`city:Jakarta`	Exact on `KEYWORD`, free-text on `TEXT`
`field:term`	`name:john`	Wildcard prefix / suffix / substring (trigram-backed)
`field:[A TO B]`	`amount:[1000000 TO 5000000]`	Inclusive range on numeric or date fields
`field:>N`, `field:>=N`	`score:>=88`, `level:<=10`	Open-ended numeric comparators
`field:true` / `false`	`enabled:true`	Booleans (also accepts `yes/no`, `1/0`)
`tags:value`	`tags:graphql`	Array fields — matches if any element equals
`obj.path:value`	`address.city:Jakarta`	Auto-expanded object dot-path (any depth)
`obj.arr[i]:val`	`info.tags[0]:alpha`	Index into a nested array
`AND` / `OR` / `NOT`	`level:ERROR AND NOT region:eu-west1`	Boolean combinators with parens
`*`	`* AND NOT enabled:true`	Match-all, useful for negation
`\<char\>`	`john\+doe`	Escape: `+ - && \|\| ! ( ) { } [ ] ^ " ~ * ? : \ /`

Numeric and date queries walk a sorted block index — no document scan. Wildcard queries route through trigram postings — same shape as a literal lookup.

CLI reference

flatseek <command> --help    # detailed options

Command	Description
`flatseek build <file\|dir> [-o out] [-w N]`	Build trigram index. `-w` for parallel workers, `--daemon` for RAM-mode builds.
`flatseek classify <file>`	Detect column semantic types → `column_map.json`.
`flatseek plan <file> [-o out] [-n N]`	Generate a parallel build plan without executing.
`flatseek serve [-d data] [-p 8000]`	Start API + Flatlens dashboard.
`flatseek api [-d data] [-p 8000]`	Start API only (no dashboard).
`flatseek search <dir> <query>`	CLI search. `-c col` to scope, `-n` page size, `-p` page.
`flatseek stats <dir>`	Doc count, index size, columns, types.
`flatseek dedup <dir> [--fields col1,col2]`	Cross-shard dedup after parallel build.
`flatseek compress <dir> [-l 1-9]`	zlib-compress index files in place.
`flatseek encrypt <dir> [--passphrase P]`	ChaCha20-Poly1305 encryption at rest.
`flatseek decrypt <dir> [--passphrase P]`	Reverse of encrypt.
`flatseek delete <dir> [--yes]`	Fast parallel-unlink delete.
`flatseek join <dir> "<qA>" "<qB>" --on field`	Cross-dataset join on a shared field.
`flatseek chat <dir> [--model NAME]`	Natural-language query via Ollama.

REST API endpoints

Method	Endpoint	Description
`GET`	`/_indices`	List all indices
`GET`	`/_cluster/health`	Health check
`GET`	`/{index}/_search?q=&size=&from=`	Search (query string)
`POST`	`/{index}/_search`	Search (JSON body)
`GET`	`/{index}/_count?q=`	Match count without docs
`POST`	`/{index}/_aggregate`	Run aggregations
`POST`	`/{index}/_bulk`	Bulk index documents
`GET`	`/{index}/_stats`	Doc count, index size
`GET`	`/{index}/_mapping`	Column types
`POST`	`/{index}/_encrypt`	Encrypt at rest
`POST`	`/{index}/_decrypt`	Decrypt
`DELETE`	`/{index}`	Delete index

Encrypted indexes: pass the passphrase via header X-Index-Password: <pass>.

Architecture

Flatseek stores a trigram inverted index on disk using memory-mapped files.

data/
  solana_txs/
    index/           # Trigram postings (mmap'd, block-compressed)
    docs/            # Column-oriented document store
    stats.json       # Doc count, index size
    mapping.json     # Type per column

Why trigram? To match *john*, Flatseek extracts trigrams (joh, ohn) from the query, then intersects their posting lists. Postings are sorted, compressed, and skip-pointer enabled — wildcard queries cost the same shape as a literal lookup, and resident memory stays low regardless of index size.

Deployment

Docker

docker build -t flatseek .
docker run -v /path/to/data:/data -p 8000:8000 flatseek serve -d /data

Vercel (dashboard front-end only)

cd flatlens && vercel --prod

The Flatseek API still runs on your own host; Vercel hosts the static dashboard pointed at it.

Performance Benchmarks

Full benchmark suite: flatbench/ — compares flatseek vs Elasticsearch on 100K-row article schema with verified correctness.

Highlights (100K rows, article schema)

Operation	flatseek	Elasticsearch	Winner
Build index	155,965 ms / 641 rows/s	17,100 ms / 5,848 rows/s	ES (9x faster)
Search p50	2.31 ms	7.60 ms	flatseek (3x faster)
Range query	42.9 ms	45.3 ms	flatseek
Aggregation	300–700 ms	5–50 ms	ES

See flatbench/README.md for full results including wildcard search, per-query breakdown, and correctness verification.

Contributing

Issues and PRs welcome at github.com/flatseek/flatseek.

Run the test suites:

# Search accuracy — ~110 cases covering wildcards, ranges, booleans, arrays, nested objects, aggs
pytest src/flatseek/test/test_search.py -v

# API smoke tests
pytest src/flatseek/test/test_api.py

# CLI integration tests
pytest src/flatseek/test/test_cli.py

# Legacy interactive query script
python test.py

TODO

sort — result ranking / scoring
geo near search — distance-based queries
type: geo shape — polygon search for geographic regions
opsi tmpfs — RAM-backed index mode for in-memory search
paralel query — concurrent query execution
vector search — dense/sparse embedding similarity
grafana connector — metrics export to Grafana
s3 upload / search — cloud object storage integration
report, predefined dashboard — custom query + metric dashboards
shareable link — public dashboard sharing
auth, user leveling — per-user index/report permissions

License

Apache License 2.0. See LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

judotens

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Apr 28, 2026

0.1.0

Apr 26, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flatseek-0.1.1.tar.gz (148.2 kB view details)

Uploaded Apr 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

flatseek-0.1.1-py3-none-any.whl (151.2 kB view details)

Uploaded Apr 28, 2026 Python 3

File details

Details for the file flatseek-0.1.1.tar.gz.

File metadata

Download URL: flatseek-0.1.1.tar.gz
Upload date: Apr 28, 2026
Size: 148.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flatseek-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ff564c319abf3cf31f754c18410082cbf1a1740081933ac513712af251d71699`
MD5	`b042afcb33818086738f82da339a3982`
BLAKE2b-256	`0cc7bfe1fbba0e4ddd22620190b99047a4ed7faec82a3da01518acaf2ec0ed7a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for flatseek-0.1.1.tar.gz:

Publisher: publish.yml on flatseek/flatseek

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flatseek-0.1.1.tar.gz
- Subject digest: ff564c319abf3cf31f754c18410082cbf1a1740081933ac513712af251d71699
- Sigstore transparency entry: 1397345900
- Sigstore integration time: Apr 28, 2026
Source repository:
- Permalink: flatseek/flatseek@7461f6138b6cd0f7c0aa064b03def0a82e46c75e
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/flatseek
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7461f6138b6cd0f7c0aa064b03def0a82e46c75e
- Trigger Event: release

File details

Details for the file flatseek-0.1.1-py3-none-any.whl.

File metadata

Download URL: flatseek-0.1.1-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 151.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flatseek-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cddaccc8e8b0cd02ed00eae34068db1d57e753a601a6c3a6486eff859c8f46c5`
MD5	`da1a9231b8a0164b545b814a9aa2a13a`
BLAKE2b-256	`5b9a327b40d7f71f647fc0f1d691451988baf9be129156979b882a6ba7002b85`

See more details on using hashes here.

Provenance

The following attestation bundles were made for flatseek-0.1.1-py3-none-any.whl:

Publisher: publish.yml on flatseek/flatseek

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flatseek-0.1.1-py3-none-any.whl
- Subject digest: cddaccc8e8b0cd02ed00eae34068db1d57e753a601a6c3a6486eff859c8f46c5
- Sigstore transparency entry: 1397345912
- Sigstore integration time: Apr 28, 2026
Source repository:
- Permalink: flatseek/flatseek@7461f6138b6cd0f7c0aa064b03def0a82e46c75e
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/flatseek
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7461f6138b6cd0f7c0aa064b03def0a82e46c75e
- Trigger Event: release

flatseek 0.1.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Project description

Flatseek

Why Flatseek?

Quick Start

Install

One-line script — recommended

From PyPI (CLI + Python client only)

From source

Verify

Step 1 — Index your data

Option A — Dashboard (newbie-friendly)

Option B — CLI (advanced)

Column types

Sample datasets — with arrays and nested objects

Blockchain — programs is ARRAY, swap is OBJECT

Locations — tags is ARRAY, address is OBJECT

Step 2 — Query

CLI

REST API

Python — API mode

Python — direct mode (no server)

Step 3 — Aggregate

REST API

Python

Aggregation types

Query syntax reference

CLI reference

REST API endpoints

Architecture

Deployment

Docker

Vercel (dashboard front-end only)

Performance Benchmarks

Highlights (100K rows, article schema)

Contributing

TODO

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Blockchain — `programs` is `ARRAY`, `swap` is `OBJECT`

Locations — `tags` is `ARRAY`, `address` is `OBJECT`