Terminal-first S3 browser for scientists and data engineers
s3peek
Navigate S3 buckets from terminal — instant header quicklook for FITS, ASDF, Parquet, JSON files, plus one-command pre-signed URL sharing.
Purpose
Problem: S3 CLI navigation is clunky. `aws s3 ls` shows raw keys; inspecting FITS/ASDF files requires a full download first; sharing files means remembering `aws s3 presign` syntax and expiry flags.
Solution: s3peek, a terminal-first S3 browser combining interactive bucket navigation (arrow keys, fuzzy filter) with in-place header quicklook for astronomy/data science formats, plus instant pre-signed URL generation with clipboard copy.
Who benefits: Astronomers and data engineers at IPAC, STScI, or other institutions working with AWS-hosted science data (FITS, ASDF, Parquet, JSON). Zero Python import required: distributed as a standalone binary or Homebrew formula.
Architecture
┌──────────────────────────────────────────────────────────┐
│                        s3peek CLI                        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │ TUI browser  │  │  Quicklook   │  │  Presign cmd   │  │
│  │  (Textual)   │  │    engine    │  │    (boto3)     │  │
│  └──────┬───────┘  └──────┬───────┘  └───────┬────────┘  │
│         └─────────────────┴──────────────────┘           │
│                   S3 abstraction layer                   │
│                (boto3 + s3fs + range-GET)                │
└──────────────────────────────────────────────────────────┘
            │                               │
       AWS S3 API                 Clipboard (pyperclip)
Key design decisions:
- Range-GET for headers: FITS/Parquet headers are read via HTTP `Range` requests (first N bytes only, or the last N bytes for the Parquet footer; configurable via `max_range_get_bytes`). No full download.
- ASDF/FITS dual-mode read: the fast path is a raw byte parse of the first N KB (no heavy lib, zero overhead). On the deep-inspect path (`--deep`), `SeekableS3Stream` issues Range-GETs on demand as astropy/asdf seek through the file: full multi-HDU / full ASDF tree, no full download, works on arbitrarily large files.
- SeekableS3Stream: file-like object backed by S3 Range-GETs with a 256 KB chunk cache. Passed directly to `astropy.io.fits.open()` / `asdf.open()` so libraries only fetch the bytes they actually read.
- No local state: no DB, no cache file. All navigation state is in-memory per session.
- AWS credentials pass-through: standard boto3 credential chain (`~/.aws`, env vars, instance profile). No credential storage.
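The SeekableS3Stream idea above can be illustrated with a minimal sketch. This is not the project's implementation: the class name `SeekableRangeStream` and the `fetch(start, end)` callable are hypothetical stand-ins for whatever issues the actual Range-GET, and the cache here is a plain unbounded dict.

```python
import io
from typing import Callable

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk cache size named above


class SeekableRangeStream(io.RawIOBase):
    """Read-only, seekable file-like object that fetches byte ranges lazily."""

    def __init__(self, fetch: Callable[[int, int], bytes], size: int) -> None:
        # fetch(start, end) is a hypothetical callable returning bytes [start, end)
        self._fetch = fetch
        self._size = size
        self._pos = 0
        self._chunks: dict[int, bytes] = {}  # chunk index -> cached bytes

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def _chunk(self, index: int) -> bytes:
        # Fetch a chunk only the first time it is touched.
        if index not in self._chunks:
            start = index * CHUNK_SIZE
            end = min(start + CHUNK_SIZE, self._size)
            self._chunks[index] = self._fetch(start, end)
        return self._chunks[index]

    def readinto(self, buffer: "bytearray | memoryview") -> int:
        if self._pos >= self._size:
            return 0  # EOF
        index, offset = divmod(self._pos, CHUNK_SIZE)
        # Serve at most the rest of the current chunk; callers loop for more.
        data = self._chunk(index)[offset : offset + len(buffer)]
        buffer[: len(data)] = data
        self._pos += len(data)
        return len(data)
```

Because a library such as astropy only calls `seek()` and `read()`, a stream of this shape lets it pull headers from the start and end of a large object while the middle is never transferred.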
Repository Layout
s3peek/
├── README.md # This file — spec + public docs
├── pyproject.toml # Build config; entry_points for CLI
├── Makefile # Dev commands: lint, test, build, brew-test
├── Formula/
│ └── s3peek.rb # Homebrew formula (auto-generated by release CI)
├── src/
│ └── s3peek/
│ ├── __init__.py
│ ├── cli.py # Typer app: entry point, top-level commands
│ ├── browser.py # Textual TUI: bucket/prefix navigation widget
│ ├── quicklook.py # Format dispatch; accepts bytes or seekable stream
│ ├── streams.py # SeekableS3Stream: Range-GET-backed file-like object
│ ├── presign.py # Pre-signed URL generation + clipboard copy
│ ├── s3.py # S3 abstraction: list, stat, range-GET
│ └── config.py # Config model: defaults, env var bindings
├── tests/
│ ├── conftest.py # moto-based S3 fixtures; sample test files
│ ├── test_quicklook.py # Format readers against fixture files
│ ├── test_presign.py # Pre-signed URL generation (moto)
│ ├── test_s3.py # S3 abstraction layer (moto)
│ └── test_cli.py # CLI smoke tests via Typer test runner
├── fixtures/
│ ├── sample.fits # Minimal FITS with header only
│ ├── sample.asdf # Minimal ASDF with known tree
│ ├── sample.parquet # Minimal Parquet with schema
│ └── sample.json # Sample JSON object
├── .github/
│ └── workflows/
│ ├── ci.yml # Test + lint on push/PR
│ └── release.yml # PyPI publish + Homebrew formula bump on tag
├── .env.example # Documented env vars; never committed with values
└── CHANGELOG.md
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | CPython; PyPy untested |
| AWS credentials | any valid chain | ~/.aws/credentials, env vars, or instance profile |
| IAM permissions | s3:ListBucket, s3:GetObject | s3:GetObjectAttributes for stat; no write permissions needed |
| xclip or xsel (Linux) | any | For clipboard copy; optional (URL printed to stdout if absent) |
| macOS | 12+ | pbcopy built-in; no extra deps |
IAM minimum policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectAttributes"],
      "Resource": ["arn:aws:s3:::YOUR_BUCKET", "arn:aws:s3:::YOUR_BUCKET/*"]
    }
  ]
}
Pre-signed URLs: caller's identity signs URL. Recipient needs no AWS credentials. No s3:PutObject or s3:GetBucketPolicy required.
Quick Start
Install via Homebrew (macOS / Linux with Linuxbrew)
brew tap ejoliet/tap
brew install s3peek
Install via tarball (Linux, no Homebrew)
curl -fsSL https://github.com/ejoliet/s3peek/releases/latest/download/s3peek-linux-x86_64.tar.gz \
| tar -xz -C ~/.local/bin
chmod +x ~/.local/bin/s3peek
Install via pip / uv
pip install s3peek
# or
uv tool install s3peek
First run
# Browse a bucket interactively
s3peek browse s3://my-bucket/
# Quicklook a file header
s3peek peek s3://my-bucket/data/obs001.fits
# Copy a pre-signed URL to clipboard (1-hour default)
s3peek share s3://my-bucket/data/obs001.fits
# Pre-signed URL with custom expiry
s3peek share s3://my-bucket/data/obs001.fits --expiry 4h
# Send an S3 object to a local Firefly server
FIREFLY_URL=http://localhost:8080/firefly s3peek firefly s3://my-bucket/data/obs001.fits
Configuration Reference
All settings via env var or ~/.config/s3peek/config.toml.
| Env var | Type | Default | Description |
|---|---|---|---|
| AWS_DEFAULT_REGION | string | unset | AWS region passed through to boto3 when config omits aws_region |
| FIREFLY_URL | string | unset | Firefly server URL, e.g. http://localhost:8080/firefly |
| FIREFLY_CHANNEL | string | unset | Firefly browser channel override |
| S3PEEK_CONFIG | string | unset | Path to config TOML; defaults to ~/.config/s3peek/config.toml |
Run s3peek config to see the resolved config file path and all current values.
A fully commented template is at docs/config.toml.sample.
~/.config/s3peek/config.toml example:
aws_profile = "roman-dev"
aws_region = "us-east-1"
presign_expiry_seconds = 3600
# max bytes fetched for fast-path quicklook (--deep bypasses this via streaming)
max_range_get_bytes = 65536
firefly_url = "http://localhost:8080/firefly"
firefly_channel = "my-session"
API / Interface Contract
CLI commands
s3peek [OPTIONS] COMMAND [ARGS]
Commands:
browse Interactive TUI browser for a bucket or prefix
config Show resolved config file path and current values
du Summarize storage usage under an S3 prefix
firefly Send an S3 object to a Firefly visualization server
ls List objects under an S3 prefix
peek Print header/schema of a single S3 object to stdout
share Generate a pre-signed URL; copy to clipboard
version Print version and exit
Options:
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or customize the installation.
--help Show this message and exit.
s3peek browse
s3peek browse S3_URI
Arguments:
S3_URI s3://bucket[/prefix] required
TUI keybindings:
↑ / ↓ Navigate list
Enter Descend into prefix / auto-peek object
Backspace Go up one prefix level
p Peek selected object (fast header quicklook via Range-GET)
d Deep-peek selected object (full header via SeekableS3Stream)
s Share — generate pre-signed URL, copy to clipboard
c Copy s3:// URI of selected object to clipboard
f Send to Firefly visualization server (requires firefly_url config)
q Quit
s3peek peek
s3peek peek S3_URI [OPTIONS]
Arguments:
S3_URI s3://bucket/key or local file path required
Options:
--output TEXT Output format: text or json [default: text]
--max-hdus INTEGER Max HDUs to show [default: 1]
--deep Full header extraction via astropy/asdf (S3: streams via Range-GETs, no full download)
--max-range-bytes INT Override max bytes for fast-path Range-GET (default: max_range_get_bytes from config)
Exit codes:
0 success
1 S3 access error
2 format not supported
3 parse error (file exists but header unreadable)
s3peek share
s3peek share S3_URI [OPTIONS]
Arguments:
S3_URI s3://bucket/key required
Options:
--expiry TEXT Expiry: 1h, 30m, 7d [default: 1h]
--qr Print QR code to terminal (requires `qrcode` extra)
Output (stdout):
Pre-signed URL as plain text; copied to clipboard when possible
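The `--expiry` values accepted above could be converted to seconds along these lines. This is a sketch, not the shipped parser: the exception names and messages are taken from the Error Handling table in this README, while rejecting a zero duration is an assumption made here.

```python
import re

MAX_EXPIRY_SECONDS = 604800  # 7 days
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}
EXPIRY_RE = re.compile(r"(\d+)([mhd])")


class PresignExpirySyntaxError(ValueError):
    """Expiry string is not of the form Xd, Xh, or Xm."""


class PresignExpiryTooLongError(ValueError):
    """Expiry exceeds the 7-day maximum."""


def parse_expiry(text: str) -> int:
    """Convert an expiry such as '30m', '4h', or '7d' to seconds."""
    match = EXPIRY_RE.fullmatch(text.strip())
    # Assumption: a zero duration is treated as a syntax error.
    if match is None or int(match.group(1)) == 0:
        raise PresignExpirySyntaxError("Invalid expiry format. Use: 1d, 6h, 30m")
    seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    if seconds > MAX_EXPIRY_SECONDS:
        raise PresignExpiryTooLongError("Maximum expiry is 7 days (604800 seconds)")
    return seconds
```

Using `fullmatch` rather than `match` means inputs like `1.5h` or `4hours` are rejected instead of being silently truncated.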
s3peek du
s3peek du S3_URI [OPTIONS]
Arguments:
S3_URI s3://bucket[/prefix] required
Options:
--human-readable / --no-human-readable Print human-readable size [default: human-readable]
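For the `--human-readable` option, one plausible formatting rule is sketched below. The exact output format of `s3peek du` is not specified in this README, so the binary units and one-decimal rounding are assumptions, as is the function name.

```python
def human_readable_size(num_bytes: int) -> str:
    """Format a byte count with binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at TiB, the largest unit handled here.
        if size < 1024 or unit == "TiB":
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    raise AssertionError("unreachable")
```

With `--no-human-readable` the raw integer byte count would be printed instead.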
s3peek firefly
s3peek firefly S3_URI [OPTIONS]
Arguments:
S3_URI s3://bucket/key required
Options:
--server TEXT Firefly server URL; falls back to config `firefly_url` or `FIREFLY_URL`
--channel TEXT Browser tab channel; falls back to config `firefly_channel` or `FIREFLY_CHANNEL`
--open-browser Open Firefly in a browser tab
--preview Show metadata picker first
--title TEXT Display title
Example:
FIREFLY_URL=http://localhost:8080/firefly s3peek firefly s3://bucket/data/image.fits
Quicklook output contract
Each format reader returns HeaderResult:
from dataclasses import dataclass, field
from typing import Any

@dataclass
class HeaderResult:
    format: str                      # "fits" | "asdf" | "parquet" | "json"
    s3_uri: str
    size_bytes: int | None           # None if unavailable
    headers: list[dict[str, Any]]    # one dict per HDU (FITS) or one (others)
    truncated: bool = False          # True if range-GET hit max_bytes
    error: str | None = None         # set on parse failure
FITS headers entry structure:
{
"hdu_index": 0,
"hdu_type": "PrimaryHDU", # HDU type string from astropy
"naxis": 2,
"shape": [2048, 2048],
"cards": {"SIMPLE": True, "BITPIX": -32, ...}
}
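To make the fast path concrete: a FITS header is a sequence of fixed-width 80-byte ASCII cards ending with an END card, so the `cards` dict above can be approximated from raw Range-GET bytes without astropy. The sketch below is a simplified illustration, not the project's reader; the function names are hypothetical, and it ignores COMMENT, HISTORY, CONTINUE, and HIERARCH cards.

```python
CARD_LEN = 80  # FITS header cards are fixed-width 80-byte ASCII records


def _parse_value(raw: str) -> object:
    """Convert the value field of a card to a Python value (simplified)."""
    raw = raw.strip()
    if raw.startswith("'"):
        # Quoted string: ends at the next single quote that is not doubled.
        end = 1
        while end < len(raw):
            if raw[end] == "'":
                if raw[end + 1 : end + 2] == "'":
                    end += 2  # '' is an escaped quote inside the string
                    continue
                break
            end += 1
        return raw[1:end].replace("''", "'").rstrip()
    value = raw.split("/", 1)[0].strip()  # drop the trailing comment
    if value == "T":
        return True
    if value == "F":
        return False
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value.replace("D", "E"))  # FITS allows a D exponent
    except ValueError:
        return value


def parse_primary_cards(data: bytes) -> tuple[dict[str, object], bool]:
    """Return (cards, truncated); truncated is True if END was never reached."""
    cards: dict[str, object] = {}
    for offset in range(0, len(data) - CARD_LEN + 1, CARD_LEN):
        card = data[offset : offset + CARD_LEN].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            return cards, False
        # Columns 9-10 hold "= " when the card carries a value.
        if keyword and card[8:10] == "= ":
            cards[keyword] = _parse_value(card[10:])
    return cards, True
```

The `truncated` flag corresponds to the case where the configured byte budget ran out before the END card, which is what the `truncated` field of HeaderResult is meant to report.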
ASDF headers entry structure:
{
"asdf_version": "1.6.0",
"tree": { ... } # full YAML tree dict, no array data
}
Parquet headers entry structure:
{
"num_rows": 1048576,
"num_row_groups": 4,
"schema": [
{"name": "ra", "type": "DOUBLE", "nullable": False},
...
],
"metadata": { ... } # file-level key/value metadata
}
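Parquet keeps its schema and row-group metadata in a footer at the end of the file: the last 8 bytes are a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The sketch below shows how a tail fetch could be checked to see whether it already contains the whole footer. The function names are hypothetical, and encrypted-footer files (which use a different magic) are not handled.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def footer_range(tail: bytes, file_size: int) -> tuple[int, int]:
    """Return the [start, end) byte range of the footer metadata in the file."""
    if len(tail) < TRAILER_LEN or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    end = file_size - TRAILER_LEN
    start = end - footer_len
    if start < 4:  # the file also begins with a 4-byte PAR1 magic
        raise ValueError("corrupt Parquet footer length")
    return start, end


def tail_covers_footer(tail: bytes, file_size: int) -> bool:
    """True if the fetched tail already includes the entire footer."""
    start, _ = footer_range(tail, file_size)
    return file_size - len(tail) <= start
```

When `tail_covers_footer` returns False, a reader could issue one more Range-GET for exactly the range returned by `footer_range` instead of downloading the file.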
JSON headers entry structure:
{
"type": "object", # top-level JSON type
"keys": ["ra", "dec", "mag"], # top-level keys if object
"length": 3 # array length if top-level is array
}
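A JSON entry of this shape could be derived with the standard `json` module as sketched below. The function name is hypothetical, and emitting `keys` only for objects and `length` only for arrays is one reading of the comments in the structure above. The `raw_decode` fallback handles input where a complete first value is followed by more data, such as newline-delimited JSON.

```python
import json
from typing import Any

# Python type of the decoded value -> JSON type name.
TYPE_NAMES = {
    dict: "object",
    list: "array",
    str: "string",
    bool: "boolean",
    type(None): "null",
}


def summarize_json(text: str) -> dict[str, Any]:
    """Describe the top-level JSON value without keeping the data itself."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the first complete value and ignore what follows it.
        value, _ = json.JSONDecoder().raw_decode(text.lstrip())
    entry: dict[str, Any] = {"type": TYPE_NAMES.get(type(value), "number")}
    if isinstance(value, dict):
        entry["keys"] = list(value)
    elif isinstance(value, list):
        entry["length"] = len(value)
    return entry
```

If the fetched bytes end in the middle of the first value, `raw_decode` raises `json.JSONDecodeError` as well, which is the case the parse-error exit code is meant to cover.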
Data Model
No persistent storage. All runtime state in:
@dataclass
class SessionState:
    bucket: str
    prefix: str = ""
    history: list[str] = field(default_factory=list)  # navigation breadcrumb
    selected_key: str | None = None
Config loaded once at startup into:
from pydantic import BaseModel

class Config(BaseModel):
    aws_profile: str | None = None
    aws_region: str | None = None
    default_bucket: str | None = None
    presign_expiry_seconds: int = 3600
    max_range_get_bytes: int = 65536
    theme: str = "dark"
    firefly_url: str | None = None
    firefly_channel: str | None = None
Error Handling
| Error class | When raised | Exit code | User message |
|---|---|---|---|
| S3AccessError | NoCredentialsError, ClientError 403 | 1 | "AWS credentials missing or insufficient permissions" |
| S3KeyNotFoundError | ClientError 404 | 1 | "Object not found: s3://..." |
| FormatNotSupportedError | Extension not in supported list | 2 | "Format not supported. Supported: fits, asdf, parquet, json" |
| QuicklookParseError | Header bytes unreadable | 3 | "Could not parse header — file may be truncated or corrupt" |
| PresignExpirySyntaxError | Expiry string invalid | 1 | "Invalid expiry format. Use: 1d, 6h, 30m" |
| PresignExpiryTooLongError | Expiry > 7 days | 1 | "Maximum expiry is 7 days (604800 seconds)" |
All errors -> stderr. stdout reserved for data output only.
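One way to keep the exit codes and the stderr/stdout split in a single place is to attach the code to each exception class. The sketch below assumes a hypothetical `S3PeekError` base class and a hypothetical `run` wrapper; only the subclass names and exit codes come from the table.

```python
import sys
from typing import Callable, TextIO


class S3PeekError(Exception):
    """Hypothetical base class; exit_code values follow the table above."""

    exit_code: int = 1


class S3AccessError(S3PeekError):
    exit_code = 1


class S3KeyNotFoundError(S3PeekError):
    exit_code = 1


class FormatNotSupportedError(S3PeekError):
    exit_code = 2


class QuicklookParseError(S3PeekError):
    exit_code = 3


class PresignExpirySyntaxError(S3PeekError):
    exit_code = 1


class PresignExpiryTooLongError(S3PeekError):
    exit_code = 1


def run(
    command: Callable[[], str],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> int:
    """Run a command; data goes to out, error messages go to err."""
    try:
        result = command()
    except S3PeekError as exc:
        print(f"error: {exc}", file=err)
        return exc.exit_code
    print(result, file=out)
    return 0
```

A CLI entry point could then end with `sys.exit(run(...))`, so scripts piping stdout never see error text mixed into the data.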
Testing
# Run full suite
make test
# With coverage report
make test-cov
# Lint only
make lint
# Single module
pytest tests/test_quicklook.py -v
Test matrix
| Suite | Scope | Fixtures |
|---|---|---|
| test_s3.py | list, stat, range-GET | moto S3 mock; fixtures/ uploaded at setup |
| test_quicklook.py | all four format readers; stream input | fixtures/sample.{fits,asdf,parquet,json} |
| test_presign.py | URL generation, expiry parsing, clipboard skip | moto + monkeypatched pyperclip |
| test_cli.py | CLI smoke paths; --deep streaming, --max-range-bytes | moto + Typer CliRunner |
| test_streams.py | SeekableS3Stream seek/read/cache/EOF/guard | mocked S3Client.range_get |
| test_deep_readers.py | FITSReader/ASDFReader _read_deep with stream | BytesIO fixtures |
Constraint: Tests must never hit real AWS endpoints. moto mocking mandatory.
Deployment / Installation Targets
Homebrew (primary macOS + Linux)
Formula/s3peek.rb auto-generated by release.yml on tag push.
Manual formula update (maintainer):
make brew-bump VERSION=0.2.0 SHA256=<sha256_of_tarball>
Standalone binary (PyInstaller)
make build-binary # outputs dist/s3peek (macOS) or dist/s3peek-linux
CI builds for macos-latest and ubuntu-latest via GitHub Actions matrix. Artifacts uploaded to GitHub Release assets.
pip / uv
pip install s3peek
uv tool install s3peek # preferred; isolated env
Linux tarball (no package manager)
# Built by release.yml; SHA256 verified in formula
curl -fsSL https://github.com/ejoliet/s3peek/releases/latest/download/s3peek-linux-x86_64.tar.gz \
| tar -xz -C ~/.local/bin
Security
- Pre-signed URLs are signed with the caller's temporary or long-term AWS credentials. They grant no additional IAM permissions beyond the signing identity.
- Pre-signed URL expiry: `--expiry` accepts values like `30m`, `1h`, or `7d`; effective maximums still depend on AWS credential/session limits.
- No credentials are stored by `s3peek`. The tool is read-only by design (no `s3:PutObject`).
- Clipboard warning: `share` copies the pre-signed URL when system clipboard support is available, then prints the URL to stdout.
- Firefly handoff is local by default: the CLI downloads the selected S3 object to a transient local file and passes that path to Firefly. It does not expose bucket credentials or generate remote URLs for Firefly.
Non-Goals (v1)
- No upload or delete: read-only; no `s3:PutObject`, `s3:DeleteObject`
- No recursive download: use `aws s3 sync`
- No full file render: quicklook reads headers/schema only; not a FITS image viewer
- No bucket creation or policy management
- No Windows support — macOS and Linux only in v1; PRs welcome
- No auth management — does not create or rotate AWS credentials
- No HiPS / MOC / VOTable quicklook — v2 candidate
- No interactive diff between two S3 keys
Open Questions
| # | Question | Blocking? | Owner |
|---|---|---|---|
| 1 | Should s3peek browse support paginating >10k objects with spinner, or limit to first page_size? | No; default to limit, add pagination in v2 | Emmanuel |
| 2 | qrcode extra for --qr flag: include in default install or opt-in extra? | No; make it an extra: pip install s3peek[qr] | Emmanuel |
| 3 | Should FITS quicklook display WCS (World Coordinate System) summary if present? | No; nice-to-have for v2 | open |
| 4 | For ASDF files from Roman pipeline: should asdf-astropy be hard dep or optional? | No; optional extra: pip install s3peek[astro] | Emmanuel |
Agent Build Instructions
Authoritative build spec. Implement end-to-end using only this README — no clarifying questions needed.
Build Order
| Phase | Deliverable | Done when |
|---|---|---|
| 0 | Repo scaffold + CI skeleton | make lint passes on empty project; GitHub Actions runs |
| 1 | S3 abstraction layer (s3.py) | test_s3.py passes with moto; list, stat, range-GET work |
| 2 | Quicklook engine (quicklook.py) | test_quicklook.py passes for all 4 formats against fixtures |
| 3 | Presign module (presign.py) | test_presign.py passes; clipboard copy mocked; expiry parsing correct |
| 4 | CLI commands (cli.py), non-TUI first | test_cli.py passes for peek, share, ls, version |
| 5 | TUI browser (browser.py) | Done: s3peek browse s3://bucket/prefix/ navigates; peek/deep-peek/share/copy/firefly keybindings work; 88 tests pass |
| 6 | Build + packaging | make build-binary succeeds; brew install from local formula |
File Map
| File | Purpose | Key symbols |
|---|---|---|
| src/s3peek/config.py | Pydantic config model; env var + TOML loading | class Config(BaseModel) |
| src/s3peek/s3.py | S3 list, stat, range-GET via boto3 | list_prefix(), stat_object(), range_get() |
| src/s3peek/streams.py | Seekable S3-backed file-like object | SeekableS3Stream(io.RawIOBase) |
| src/s3peek/quicklook.py | Format dispatch; accepts bytes or stream | quicklook(data: bytes \| io.RawIOBase, ...) |
| src/s3peek/presign.py | URL generation + expiry parsing + clipboard | generate_presigned_url(), parse_expiry(), copy_to_clipboard() |
| src/s3peek/browser.py | Textual TUI app: navigation, listing, quicklook panel | S3Browser(App), Entry, ListingReady, QuicklookReady, list_dir() |
| src/s3peek/cli.py | Typer app; all commands | app = typer.Typer(), browse, peek, share, ls, version |
| tests/conftest.py | moto fixtures; fixture file upload | s3_client, populated_bucket |
| tests/test_s3.py | S3 layer tests | test_list_prefix, test_range_get |
| tests/test_quicklook.py | Format reader tests | one test per format; error path tests |
| tests/test_presign.py | Presign + expiry tests | test_expiry_parsing, test_url_structure |
| tests/test_cli.py | CLI integration tests | test_peek_fits, test_share_no_clipboard, test_ls |
| Makefile | Dev commands | lint, test, test-cov, build-binary, brew-bump |
| pyproject.toml | Build + deps + entry point | [project.scripts] s3peek = "s3peek.cli:app" |
| Formula/s3peek.rb | Homebrew formula | url, sha256, depends_on blocks |
Constraints
- Python 3.11+ only. No `match` on Python < 3.10; use 3.11+ syntax freely.
- Range-GET for FITS: read first `65536` bytes (configurable via `--max-range-bytes`). Parse with `astropy.io.fits.open(BytesIO(...))` + `ignore_missing_end=True`.
- ASDF range-GET: read first `65536` bytes; open with `asdf.open(BytesIO(...), lazy_load=True, copy_arrays=False)`.
- Parquet range-GET: use `pyarrow.parquet.ParquetFile(pa.BufferReader(bytes))`, which reads the footer from the end; for range-GET, fetch the last 65536 bytes (the footer is at the end of the file in the Parquet format).
- JSON: fetch first `65536` bytes; parse with `json.loads`; on failure try `json.JSONDecoder().raw_decode()` for streaming objects.
- Pre-signed URL expiry: parse `Xd`/`Xh`/`Xm` → seconds. Cap at `604800` (7 days). Error on invalid format.
- Clipboard: use `pyperclip`; catch `pyperclip.PyperclipException`, fall back to stdout-only with warning.
- Tests must use `moto` (`@mock_aws` decorator). No real boto3 calls in tests.
- All public functions: typed signatures + docstrings. `ruff` + `mypy` must pass at zero warnings.
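The per-format fetch rules above boil down to two kinds of HTTP `Range` header values: a prefix range for FITS, ASDF, and JSON, and a suffix range for the Parquet footer. The helper names below are hypothetical; the resulting string is the kind of value boto3's `get_object` accepts in its `Range` parameter.

```python
HEAD_FORMATS = {"fits", "asdf", "json"}


def head_range(max_bytes: int = 65536) -> str:
    """Range value for the first max_bytes bytes of an object."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return f"bytes=0-{max_bytes - 1}"  # HTTP byte ranges are inclusive


def tail_range(max_bytes: int = 65536) -> str:
    """Suffix range value for the last max_bytes bytes of an object."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return f"bytes=-{max_bytes}"


def range_for_format(fmt: str, max_bytes: int = 65536) -> str:
    """Pick the Range value for a format, following the constraints above."""
    if fmt == "parquet":
        return tail_range(max_bytes)  # Parquet metadata lives in the footer
    if fmt in HEAD_FORMATS:
        return head_range(max_bytes)
    raise ValueError(f"Format not supported: {fmt}")
```

The off-by-one in `head_range` is the detail most worth getting right: `bytes=0-65535` is exactly 65536 bytes.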
Acceptance Criteria
- `make test` passes with ≥ 80% coverage
- `make lint` passes (`ruff check` + `mypy --strict`)
- `s3peek peek s3://test-bucket/sample.fits` prints HDU table to stdout
- `s3peek share s3://test-bucket/sample.fits` prints valid pre-signed URL
- `s3peek share s3://test-bucket/sample.fits --expiry bad` exits code 1 with error on stderr
- `s3peek browse s3://test-bucket/` launches TUI without crash (manual check)
- `make build-binary` produces standalone executable on macOS and Linux
- `brew install --build-from-source Formula/s3peek.rb` succeeds locally
- All Open Questions resolved or deferred to v2 in CHANGELOG
Next Steps
Completed
| Phase | Deliverable | Status |
|---|---|---|
| 0 | Repo scaffold + CI skeleton | Done |
| 1 | S3 abstraction layer (s3.py) | Done |
| 2 | Quicklook engine (quicklook.py) | Done |
| 3 | Presign module (presign.py) | Done |
| 4 | CLI commands (cli.py) | Done |
| 5 | TUI browser (browser.py) | Done (PR #11) |
Upcoming
| # | Feature | Notes |
|---|---|---|
| 6 | Column sorting in TUI — sort object listing by file size or last-modified date | s key conflicts; use S (capital) or dedicated sort-cycle binding. DataTable sort via sort() with key= lambda on Entry.size / Entry.last_modified. Toggle asc/desc on repeated press. |
| 7 | Filter / fuzzy search in browser | / key → Input widget overlaid on DataTable; filter Entry.name in-memory |
| 8 | Build + packaging | make build-binary → standalone executable; Homebrew formula |
| 9 | LocalStack integration tests | Optional CI stage; moto covers unit tests |
- Set up `.github/workflows/release.yml` (tag → PyPI publish + binary upload + formula bump)
- Write `Formula/s3peek.rb` template; validate with `brew audit`
- Resolve all Open Questions; update CHANGELOG