
s3explore

Query S3 files with SQL — no database, no pipeline, no infrastructure.

s3explore wraps chDB (ClickHouse's embedded Python engine) and boto3 to let you run SQL directly against files sitting in S3. Drop s3explore.py next to your notebook or run it from the terminal.

Works as a CLI and as a Jupyter notebook library, and produces structured JSON output for piping into LLMs like Claude Code.


Prerequisites

  • Python 3.9+
  • AWS CLI with SSO configured (aws configure sso) — or any credentials boto3 can resolve
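
Anything boto3's default credential chain can resolve (environment variables, shared config, SSO, instance roles) will work. A quick sanity check that your profile resolves, using plain boto3 (the profile name here is a placeholder):

import boto3

# Verify that boto3 can resolve credentials for the profile s3explore will use.
session = boto3.Session(profile_name="my-profile")
creds = session.get_credentials()
print("credentials resolved:", creds is not None)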

Installation

pip install s3explore

Or directly from GitHub:

pip install git+https://github.com/PatrickRoyMac/s3_data_explorer.git

Try it now — no AWS account needed

Query a public dataset (Amazon product reviews, ~150M rows of Parquet on S3) straight away:

# Schema — what columns are in these files?
s3explore schema "s3://datasets-documentation/amazon_reviews/*.parquet"

# Count rows per file
s3explore count "s3://datasets-documentation/amazon_reviews/*.parquet"

# Sample 5 rows
s3explore sample --rows 5 "s3://datasets-documentation/amazon_reviews/*.parquet"

# Run your own SQL
s3explore query "s3://datasets-documentation/amazon_reviews/*.parquet" \
  --sql "SELECT product_category, avg(star_rating) AS avg_stars, count() AS reviews
         FROM {table}
         GROUP BY product_category
         ORDER BY reviews DESC
         LIMIT 10"

No --profile flag needed — public buckets are accessed anonymously.


Quickstart

# See what's in a bucket
s3explore --profile my-profile ls s3://my-bucket/events/year=2025/

# Understand the schema
s3explore --profile my-profile schema s3://my-bucket/events/year=2025/*.parquet

# Count rows across files
s3explore --profile my-profile count s3://my-bucket/events/year=2025/*.parquet

# Sample 10 rows
s3explore --profile my-profile sample s3://my-bucket/events/year=2025/*.parquet

# Run your own SQL
s3explore --profile my-profile query s3://my-bucket/events/year=2025/*.parquet \
  --sql "SELECT event_type, count() AS n FROM {table} GROUP BY event_type ORDER BY n DESC"

Commands

s3explore [--profile PROFILE] [--format table|json|csv] COMMAND S3_PATH [OPTIONS]
Command   What it does                                Key options
ls        List files at an S3 prefix (boto3)
schema    Show column names and types                 --fmt
sample    Show N sample rows                          --rows N, --fmt
count     Count rows per file                         --fmt
query     Run custom SQL (use {table} placeholder)    --sql, --fmt

Output formats

Flag            Output                  Use for
--format table  Pretty table            Human reading (default)
--format json   One JSON object/line    LLMs, pipes, scripts
--format csv    CSV with headers        Export, downstream tooling

Notebook usage

Open notebook_template.ipynb, fill in the config cell, and run all cells, or call the library directly:

import s3explore

creds = s3explore.get_credentials(profile="my-profile")

# Schema
print(s3explore.get_schema("s3://my-bucket/data/*.parquet", creds))

# Sample rows
print(s3explore.sample_rows("s3://my-bucket/data/*.parquet", creds, n=10))

# Custom query
print(s3explore.run_user_query(
    "SELECT event_type, count() AS n FROM {table} GROUP BY event_type",
    "s3://my-bucket/data/*.parquet",
    creds,
))

The {table} placeholder in your SQL is replaced with the full s3(...) call at runtime — you never need to handle credentials in your SQL strings.
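
Under the hood this is ordinary string substitution. A minimal sketch of the idea, with illustrative names and placeholder credentials (the real helper lives inside s3explore and may differ):

# Illustrative sketch of the {table} substitution; not s3explore's actual code.
def render_sql(user_sql: str, s3_url: str, access_key: str, secret_key: str,
               fmt: str = "Parquet") -> str:
    """Replace {table} with a ClickHouse s3() table function call."""
    table = f"s3('{s3_url}', '{access_key}', '{secret_key}', '{fmt}')"
    return user_sql.replace("{table}", table)

print(render_sql("SELECT count() FROM {table}",
                 "https://my-bucket.s3.amazonaws.com/data/*.parquet",
                 "AKIA...", "SECRET..."))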


Supported file formats

Auto-detected from the file extension in the S3 path:

Extension        Format
.parquet         Parquet
.json / .jsonl   JSONEachRow
.json.gz         JSONEachRow (auto-decompressed)
.csv             CSVWithNames
.tsv             TabSeparatedWithNames
.gz (bare)       JSONEachRow (best-effort)

Override with --fmt:

s3explore schema s3://bucket/data/*.gz --fmt JSONEachRow
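
If you need the same mapping in your own code, the table above reduces to a few suffix checks. A rough sketch (s3explore's internal detection may differ):

# Rough sketch of extension-based format detection; s3explore's internals may differ.
def detect_format(path: str) -> str:
    p = path.lower()
    if p.endswith(".parquet"):
        return "Parquet"
    if p.endswith((".json", ".jsonl", ".json.gz")):
        return "JSONEachRow"
    if p.endswith(".csv"):
        return "CSVWithNames"
    if p.endswith(".tsv"):
        return "TabSeparatedWithNames"
    if p.endswith(".gz"):  # bare .gz: best-effort guess
        return "JSONEachRow"
    raise ValueError(f"cannot detect format for {path}; pass --fmt explicitly")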

LLM / Claude Code usage

s3explore is designed to be consumed by command-line LLM agents. Use --format json to get structured output:

# Let Claude Code explore your data
s3explore --profile my-profile --format json schema s3://bucket/data/*.parquet
s3explore --profile my-profile --format json sample s3://bucket/data/*.parquet
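
The same output is easy to consume from a script. A small sketch, assuming one JSON object per line as described under "Output formats":

import json
import subprocess

# Run s3explore and parse its JSON-lines output (one object per line).
proc = subprocess.run(
    ["s3explore", "--profile", "my-profile", "--format", "json",
     "schema", "s3://bucket/data/*.parquet"],
    capture_output=True, text=True, check=True,
)
for line in proc.stdout.splitlines():
    if line.strip():
        print(json.loads(line))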

See CLAUDE.md for a full tool description including the recommended exploration workflow.


Troubleshooting

Credentials expired (SSO)

aws sso login --profile my-profile

Format not detected
Add --fmt with the explicit format name: Parquet, JSONEachRow, CSVWithNames.

Bare .gz files (e.g. Kinesis Firehose output)
These carry no inner extension hint, so s3explore defaults to JSONEachRow with a warning. Override with --fmt JSONEachRow.


How it works

  1. boto3 resolves AWS SSO credentials from your named profile
  2. boto3 lists files via list_objects_v2 for the ls command
  3. chDB builds and executes a SELECT ... FROM s3('path', creds, 'Format') query in-process — no network call to any database, no cluster, no cost beyond S3 GET requests
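
Conceptually, step 3 is equivalent to calling chdb directly with ClickHouse's s3() table function. A hand-written sketch with a placeholder URL and credentials:

import chdb

# What s3explore assembles for you, written out by hand (illustrative values).
result = chdb.query(
    "SELECT count() "
    "FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet', "
    "'AKIA...', 'SECRET...', 'Parquet')",
    "JSON",  # chdb output format
)
print(result)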

Dependencies

chdb>=2.0.2      # ClickHouse embedded engine
boto3>=1.34.0    # AWS credential resolution + S3 listing
click>=8.1.0     # CLI
pandas>=2.0.0    # CSV export in the notebook template
