Terminal-first S3 browser for scientists and data engineers

s3peek

Navigate S3 buckets from terminal — instant header quicklook for FITS, ASDF, Parquet, JSON files, plus one-command pre-signed URL sharing.

CI License: MIT Python 3.11+ Homebrew


Purpose

Problem: S3 CLI navigation clunky. aws s3 ls shows raw keys; inspecting FITS/ASDF requires download first; sharing files needs remembering aws s3 presign syntax and expiry flags.

Solution: s3peek — terminal-first S3 browser combining interactive bucket navigation (arrow keys, fuzzy filter) with in-place header quicklook for astronomy/data science formats, plus instant pre-signed URL generation with clipboard copy.

Who benefits: Astronomers and data engineers at IPAC, STScI, or institutions working with AWS-hosted science data (FITS, ASDF, Parquet, JSON). Zero Python import required — distributed as standalone binary or Homebrew formula.


Architecture

┌──────────────────────────────────────────────────────────┐
│                        s3peek CLI                        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  TUI browser │  │  Quicklook   │  │  Presign cmd   │  │
│  │  (Textual)   │  │  engine      │  │  (boto3)       │  │
│  └──────┬───────┘  └──────┬───────┘  └───────┬────────┘  │
│         └─────────────────┴──────────────────┘           │
│                   S3 abstraction layer                   │
│                (boto3 + s3fs + range-GET)                │
└──────────────────────────────────────────────────────────┘
         │                                    │
    AWS S3 API                         Clipboard (pyperclip)

Key design decisions:

  • Range-GET for headers: FITS/ASDF/JSON headers are read via HTTP Range requests for the first N bytes only; Parquet footers are read from the last N bytes. N is configurable via max_range_get_bytes. No full download.
  • ASDF/FITS dual-mode read — fast path: raw byte parse of first N KB (no heavy lib, zero overhead). Deep-inspect path (--deep): SeekableS3Stream issues Range-GETs on demand as astropy/asdf seek through the file — full multi-HDU / full ASDF tree, no full download, works on arbitrarily large files.
  • SeekableS3Stream — file-like object backed by S3 Range-GETs with a 256 KB chunk cache. Passed directly to astropy.io.fits.open() / asdf.open() so libraries only fetch the bytes they actually read.
  • No local state — no DB, no cache file. All navigation state in-memory per session.
  • AWS credentials pass-through — standard boto3 credential chain (~/.aws, env vars, instance profile). No credential storage.
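
The SeekableS3Stream idea above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name ChunkedRangeStream and the fetch(start, end_inclusive) callable are hypothetical stand-ins for the real stream and its S3 Range-GET call. Only the 256 KB chunk size comes from the text above.

```python
import io
from typing import Callable


class ChunkedRangeStream(io.RawIOBase):
    """Read-only seekable stream that fetches fixed-size chunks on demand."""

    def __init__(self, fetch: Callable[[int, int], bytes], size: int,
                 chunk_size: int = 256 * 1024) -> None:
        super().__init__()
        self._fetch = fetch            # hypothetical: fetch(start, end_inclusive) -> bytes
        self._size = size              # total object size in bytes
        self._chunk_size = chunk_size
        self._pos = 0
        self._cache: dict[int, bytes] = {}

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def _chunk(self, index: int) -> bytes:
        # Fetch each chunk at most once; later reads are served from the cache.
        if index not in self._cache:
            start = index * self._chunk_size
            end = min(start + self._chunk_size, self._size) - 1
            self._cache[index] = self._fetch(start, end)
        return self._cache[index]

    def readinto(self, buffer) -> int:
        if self._pos >= self._size:
            return 0  # EOF
        index, offset = divmod(self._pos, self._chunk_size)
        data = self._chunk(index)[offset:offset + len(buffer)]
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

A single readinto call never crosses a chunk boundary, so short reads are possible; wrapping the stream in io.BufferedReader hides that from callers that expect full reads.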

Repository Layout

s3peek/
├── README.md                  # This file — spec + public docs
├── pyproject.toml             # Build config; entry_points for CLI
├── Makefile                   # Dev commands: lint, test, build, brew-test
├── Formula/
│   └── s3peek.rb              # Homebrew formula (auto-generated by release CI)
├── src/
│   └── s3peek/
│       ├── __init__.py
│       ├── cli.py             # Typer app: entry point, top-level commands
│       ├── browser.py         # Textual TUI: bucket/prefix navigation widget
│       ├── quicklook.py       # Format dispatch; accepts bytes or seekable stream
│       ├── streams.py         # SeekableS3Stream: Range-GET-backed file-like object
│       ├── presign.py         # Pre-signed URL generation + clipboard copy
│       ├── s3.py              # S3 abstraction: list, stat, range-GET
│       └── config.py          # Config model: defaults, env var bindings
├── tests/
│   ├── conftest.py            # moto-based S3 fixtures; sample test files
│   ├── test_quicklook.py      # Format readers against fixture files
│   ├── test_presign.py        # Pre-signed URL generation (moto)
│   ├── test_s3.py             # S3 abstraction layer (moto)
│   └── test_cli.py            # CLI smoke tests via Typer test runner
├── fixtures/
│   ├── sample.fits            # Minimal FITS with header only
│   ├── sample.asdf            # Minimal ASDF with known tree
│   ├── sample.parquet         # Minimal Parquet with schema
│   └── sample.json            # Sample JSON object
├── .github/
│   └── workflows/
│       ├── ci.yml             # Test + lint on push/PR
│       └── release.yml        # PyPI publish + Homebrew formula bump on tag
├── .env.example               # Documented env vars; never committed with values
└── CHANGELOG.md

Prerequisites

Requirement | Version | Notes
--- | --- | ---
Python | 3.11+ | CPython; PyPy untested
AWS credentials | any valid chain | ~/.aws/credentials, env vars, or instance profile
IAM permissions | s3:ListBucket, s3:GetObject | s3:GetObjectAttributes for stat; no write permissions needed
xclip or xsel (Linux) | any | For clipboard copy; optional (URL is printed to stdout if absent)
macOS | 12+ | pbcopy built-in; no extra deps

IAM minimum policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectAttributes"],
      "Resource": ["arn:aws:s3:::YOUR_BUCKET", "arn:aws:s3:::YOUR_BUCKET/*"]
    }
  ]
}

Pre-signed URLs: caller's identity signs URL. Recipient needs no AWS credentials. No s3:PutObject or s3:GetBucketPolicy required.


Quick Start

Install via Homebrew (macOS / Linux with Linuxbrew)

brew tap ejoliet/tap
brew install s3peek

Install via tarball (Linux, no Homebrew)

curl -fsSL https://github.com/ejoliet/s3peek/releases/latest/download/s3peek-linux-x86_64.tar.gz \
  | tar -xz -C ~/.local/bin
chmod +x ~/.local/bin/s3peek

Install via pip / uv

pip install s3peek
# or
uv tool install s3peek

First run

# Browse a bucket interactively
s3peek browse s3://my-bucket/

# Quicklook a file header
s3peek peek s3://my-bucket/data/obs001.fits

# Copy a pre-signed URL to clipboard (1-hour default)
s3peek share s3://my-bucket/data/obs001.fits

# Pre-signed URL with custom expiry
s3peek share s3://my-bucket/data/obs001.fits --expiry 4h

# Send an S3 object to a local Firefly server
FIREFLY_URL=http://localhost:8080/firefly s3peek firefly s3://my-bucket/data/obs001.fits

Configuration Reference

All settings via env var or ~/.config/s3peek/config.toml.

Env var | Type | Default | Description
--- | --- | --- | ---
AWS_DEFAULT_REGION | string | unset | AWS region passed through to boto3 when config omits aws_region
FIREFLY_URL | string | unset | Firefly server URL, e.g. http://localhost:8080/firefly
FIREFLY_CHANNEL | string | unset | Firefly browser channel override
S3PEEK_CONFIG | string | unset | Path to config TOML; defaults to ~/.config/s3peek/config.toml

Run s3peek config to see the resolved config file path and all current values.

A fully commented template is at docs/config.toml.sample.

~/.config/s3peek/config.toml example:

aws_profile = "roman-dev"
aws_region = "us-east-1"
presign_expiry_seconds = 3600
# max bytes fetched for fast-path quicklook (--deep bypasses this via streaming)
max_range_get_bytes = 65536
firefly_url = "http://localhost:8080/firefly"
firefly_channel = "my-session"

API / Interface Contract

CLI commands

s3peek [OPTIONS] COMMAND [ARGS]

Commands:
  browse   Interactive TUI browser for a bucket or prefix
  config   Show resolved config file path and current values
  du       Summarize storage usage under an S3 prefix
  firefly  Send an S3 object to a Firefly visualization server
  ls       List objects under an S3 prefix
  peek     Print header/schema of a single S3 object to stdout
  share    Generate a pre-signed URL; copy to clipboard
  version  Print version and exit

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or customize the installation.
  --help                Show this message and exit.

s3peek browse

s3peek browse S3_URI

Arguments:
  S3_URI    s3://bucket[/prefix]  required

TUI keybindings:
  ↑ / ↓          Navigate list
  Enter           Descend into prefix / auto-peek object
  Backspace       Go up one prefix level
  p               Peek selected object (fast header quicklook via Range-GET)
  d               Deep-peek selected object (full header via SeekableS3Stream)
  s               Share — generate pre-signed URL, copy to clipboard
  c               Copy s3:// URI of selected object to clipboard
  f               Send to Firefly visualization server (requires firefly_url config)
  q               Quit
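
Every command takes an S3_URI of the form s3://bucket[/prefix]. A minimal sketch of splitting such a URI into bucket and key follows; parse_s3_uri is a hypothetical helper name, and the validation rules are assumptions.

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket[/key-or-prefix] into (bucket, key); key may be empty."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "s3" or not sep:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name: {uri!r}")
    return bucket, key
```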

s3peek peek

s3peek peek S3_URI [OPTIONS]

Arguments:
  S3_URI    s3://bucket/key or local file path   required

Options:
  --output TEXT           Output format: text or json [default: text]
  --max-hdus INTEGER      Max HDUs to show [default: 1]
  --deep                  Full header extraction via astropy/asdf (S3: streams via Range-GETs, no full download)
  --max-range-bytes INT   Override max bytes for fast-path Range-GET (default: max_range_get_bytes from config)

Exit codes:
  0   success
  1   S3 access error
  2   format not supported
  3   parse error (file exists but header unreadable)

s3peek share

s3peek share S3_URI [OPTIONS]

Arguments:
  S3_URI    s3://bucket/key   required

Options:
  --expiry TEXT      Expiry: 1h, 30m, 7d [default: 1h]
  --qr               Print QR code to terminal (requires `qrcode` extra)

Output (stdout):
  Pre-signed URL as plain text; copied to clipboard when possible
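
The --expiry syntax (a number followed by m, h, or d, capped at 7 days) can be sketched as below. This is not the project's parse_expiry: it raises plain ValueError instead of the dedicated error classes to stay self-contained, and rejecting a zero duration is an assumption.

```python
import re

MAX_EXPIRY_SECONDS = 604800  # 7 days
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_expiry(text: str) -> int:
    """Convert strings such as '30m', '1h', or '7d' into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip().lower())
    if match is None:
        raise ValueError("Invalid expiry format. Use: 1d, 6h, 30m")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds == 0:
        # Assumption: a zero-length expiry is treated as a syntax error.
        raise ValueError("Invalid expiry format. Use: 1d, 6h, 30m")
    if seconds > MAX_EXPIRY_SECONDS:
        raise ValueError("Maximum expiry is 7 days (604800 seconds)")
    return seconds
```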

s3peek du

s3peek du S3_URI [OPTIONS]

Arguments:
  S3_URI    s3://bucket[/prefix]  required

Options:
  --human-readable / --no-human-readable  Print human-readable size [default: human-readable]
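
A sketch of the summarizing step behind du, assuming object sizes have already been listed. The function names and the binary-unit output format are hypothetical; the actual output format of s3peek du is not specified here.

```python
from collections.abc import Iterable


def summarize(sizes: Iterable[int]) -> tuple[int, int]:
    """Return (object_count, total_bytes) for a listing of object sizes."""
    count = total = 0
    for size in sizes:
        count += 1
        total += size
    return count, total


def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units (assumed format)."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{num_bytes} B" if index == 0 else f"{size:.1f} {units[index]}"
```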

s3peek firefly

s3peek firefly S3_URI [OPTIONS]

Arguments:
  S3_URI    s3://bucket/key   required

Options:
  --server TEXT       Firefly server URL; falls back to config `firefly_url` or `FIREFLY_URL`
  --channel TEXT      Browser tab channel; falls back to config `firefly_channel` or `FIREFLY_CHANNEL`
  --open-browser      Open Firefly in a browser tab
  --preview           Show metadata picker first
  --title TEXT        Display title

Example:
  FIREFLY_URL=http://localhost:8080/firefly s3peek firefly s3://bucket/data/image.fits

Quicklook output contract

Each format reader returns HeaderResult:

from dataclasses import dataclass, field
from typing import Any

@dataclass
class HeaderResult:
    format: str                        # "fits" | "asdf" | "parquet" | "json"
    s3_uri: str
    size_bytes: int | None             # None if unavailable
    headers: list[dict[str, Any]]      # one dict per HDU (FITS) or one (others)
    truncated: bool = False            # True if range-GET hit max_bytes
    error: str | None = None           # set on parse failure

FITS headers entry structure:

{
    "hdu_index": 0,
    "hdu_type": "PrimaryHDU",         # HDU type string from astropy
    "naxis": 2,
    "shape": [2048, 2048],
    "cards": {"SIMPLE": True, "BITPIX": -32, ...}
}
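
The fast path parses raw header bytes without astropy. A minimal sketch of such a parser is shown below, relying on the FITS layout of 80-character cards terminated by an END card. The function names are hypothetical, and the value parsing is deliberately simplified (no doubled quotes inside strings, no CONTINUE cards, no Fortran-style D exponents).

```python
from typing import Any


def _parse_value(field: str) -> Any:
    field = field.strip()
    if field.startswith("'"):
        end = field.find("'", 1)  # simplified: ignores doubled quotes
        return (field[1:end] if end > 0 else field[1:]).rstrip()
    token = field.split("/", 1)[0].strip()  # drop the trailing comment
    if token in ("T", "F"):
        return token == "T"
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            continue
    return token


def parse_fits_cards(raw: bytes) -> tuple[dict[str, Any], bool]:
    """Return (cards, truncated); truncated is True if no END card was seen."""
    cards: dict[str, Any] = {}
    for offset in range(0, len(raw) - 79, 80):
        card = raw[offset:offset + 80].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            return cards, False
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, and blank cards carry no value
        cards[keyword] = _parse_value(card[10:])
    return cards, True
```

The truncated flag mirrors HeaderResult.truncated: if the fetched range ends before the END card, the caller can report a partial header rather than fail.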

ASDF headers entry structure:

{
    "asdf_version": "1.6.0",
    "tree": { ... }                   # full YAML tree dict, no array data
}

Parquet headers entry structure:

{
    "num_rows": 1048576,
    "num_row_groups": 4,
    "schema": [
        {"name": "ra", "type": "DOUBLE", "nullable": False},
        ...
    ],
    "metadata": { ... }               # file-level key/value metadata
}
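
Parquet metadata lives in a footer at the end of the file: the last 8 bytes hold a 4-byte little-endian footer length followed by the magic bytes PAR1. The sketch below checks whether a fetched tail contains the whole footer. footer_bounds is a hypothetical helper, and it ignores encrypted-footer files.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_bounds(tail: bytes) -> tuple[int, bool]:
    """Return (footer_length, complete) for the trailing bytes of a Parquet file.

    complete is False when the tail is too short to hold the whole footer,
    in which case a larger suffix range would have to be fetched.
    """
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    return footer_len, len(tail) >= footer_len + 8
```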

JSON headers entry structure:

{
    "type": "object",                  # top-level JSON type
    "keys": ["ra", "dec", "mag"],     # top-level keys if object
    "length": 3                        # array length if top-level is array
}
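
A sketch of producing this entry from fetched bytes, with a raw_decode fallback for inputs that hold more than one JSON value (for example JSON Lines). summarize_json is a hypothetical name, and emitting only the field that applies (keys for objects, length for arrays) is an assumption.

```python
import json
from typing import Any

_JSON_TYPES = {dict: "object", list: "array", str: "string",
               bool: "boolean", type(None): "null"}


def summarize_json(raw: bytes) -> dict[str, Any]:
    text = raw.decode("utf-8", errors="replace")
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the first complete value; raw_decode does not skip
        # leading whitespace, so strip it first.
        value, _ = json.JSONDecoder().raw_decode(text.lstrip())
    summary: dict[str, Any] = {"type": _JSON_TYPES.get(type(value), "number")}
    if isinstance(value, dict):
        summary["keys"] = list(value)
    elif isinstance(value, list):
        summary["length"] = len(value)
    return summary
```

If the range ends in the middle of the first value, raw_decode raises json.JSONDecodeError, which a caller could map to the parse-error path.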

Data Model

No persistent storage. All runtime state in:

from dataclasses import dataclass, field

@dataclass
class SessionState:
    bucket: str
    prefix: str = ""
    history: list[str] = field(default_factory=list)   # navigation breadcrumb
    selected_key: str | None = None

Config loaded once at startup into:

from pydantic import BaseModel

class Config(BaseModel):
    aws_profile: str | None = None
    aws_region: str | None = None
    default_bucket: str | None = None
    presign_expiry_seconds: int = 3600
    max_range_get_bytes: int = 65536
    theme: str = "dark"
    firefly_url: str | None = None
    firefly_channel: str | None = None

Error Handling

Error class | When raised | Exit code | User message
--- | --- | --- | ---
S3AccessError | NoCredentialsError, ClientError 403 | 1 | "AWS credentials missing or insufficient permissions"
S3KeyNotFoundError | ClientError 404 | 1 | "Object not found: s3://..."
FormatNotSupportedError | Extension not in supported list | 2 | "Format not supported. Supported: fits, asdf, parquet, json"
QuicklookParseError | Header bytes unreadable | 3 | "Could not parse header — file may be truncated or corrupt"
PresignExpirySyntaxError | Expiry string invalid | 1 | "Invalid expiry format. Use: 1d, 6h, 30m"
PresignExpiryTooLongError | Expiry > 7 days | 1 | "Maximum expiry is 7 days (604800 seconds)"

All errors -> stderr. stdout reserved for data output only.
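
The mapping from error class to exit code, with messages on stderr and data on stdout, could look like the sketch below. The S3PeekError base class and the run wrapper are hypothetical; only the subclass names and exit codes come from the table above.

```python
import sys
from typing import Callable, TextIO


class S3PeekError(Exception):
    """Hypothetical base class; exit code 1 covers access and presign errors."""
    exit_code = 1


class FormatNotSupportedError(S3PeekError):
    exit_code = 2


class QuicklookParseError(S3PeekError):
    exit_code = 3


def run(command: Callable[[], str], *, stdout: TextIO = sys.stdout,
        stderr: TextIO = sys.stderr) -> int:
    """Run a command, keeping stdout for data and stderr for errors."""
    try:
        result = command()
    except S3PeekError as exc:
        print(f"Error: {exc}", file=stderr)
        return exc.exit_code
    print(result, file=stdout)
    return 0
```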


Testing

# Run full suite
make test

# With coverage report
make test-cov

# Lint only
make lint

# Single module
pytest tests/test_quicklook.py -v

Test matrix

Suite | Scope | Fixtures
--- | --- | ---
test_s3.py | list, stat, range-GET | moto S3 mock; fixtures/ uploaded at setup
test_quicklook.py | all four format readers; stream input | fixtures/sample.{fits,asdf,parquet,json}
test_presign.py | URL generation, expiry parsing, clipboard skip | moto + monkeypatched pyperclip
test_cli.py | CLI smoke paths; --deep streaming, --max-range-bytes | moto + Typer CliRunner
test_streams.py | SeekableS3Stream seek/read/cache/EOF/guard | mocked S3Client.range_get
test_deep_readers.py | FITSReader/ASDFReader _read_deep with stream | BytesIO fixtures

Constraint: Tests must never hit real AWS endpoints. moto mocking mandatory.


Deployment / Installation Targets

Homebrew (primary macOS + Linux)

Formula/s3peek.rb auto-generated by release.yml on tag push.

Manual formula update (maintainer):

make brew-bump VERSION=0.2.0 SHA256=<sha256_of_tarball>

Standalone binary (PyInstaller)

make build-binary   # outputs dist/s3peek (macOS) or dist/s3peek-linux

CI builds for macos-latest and ubuntu-latest via GitHub Actions matrix. Artifacts uploaded to GitHub Release assets.

pip / uv

pip install s3peek
uv tool install s3peek   # preferred; isolated env

Linux tarball (no package manager)

# Built by release.yml; SHA256 verified in formula
curl -fsSL https://github.com/ejoliet/s3peek/releases/latest/download/s3peek-linux-x86_64.tar.gz \
  | tar -xz -C ~/.local/bin

Security

  • Pre-signed URLs signed with caller's temporary or long-term AWS credentials. Grant no additional IAM permissions beyond signing identity.
  • Pre-signed URL expiry: --expiry accepts values like 30m, 1h, or 7d; effective maximums still depend on AWS credential/session limits.
  • No credentials stored by s3peek. Tool is read-only by design (no s3:PutObject).
  • Clipboard warning: share copies the pre-signed URL when system clipboard support is available, then prints the URL to stdout.
  • Firefly handoff is local by default — the CLI downloads the selected S3 object to a transient local file and passes that path to Firefly. It does not expose bucket credentials or generate remote URLs for Firefly.

Non-Goals (v1)

  • No upload or delete — read-only; no s3:PutObject, s3:DeleteObject
  • No recursive download — use aws s3 sync
  • No full file render — quicklook reads headers/schema only; not FITS image viewer
  • No bucket creation or policy management
  • No Windows support — macOS and Linux only in v1; PRs welcome
  • No auth management — does not create or rotate AWS credentials
  • No HiPS / MOC / VOTable quicklook — v2 candidate
  • No interactive diff between two S3 keys

Open Questions

# | Question | Blocking? | Owner
--- | --- | --- | ---
1 | Should s3peek browse support paginating >10k objects with spinner, or limit to first page_size? | No; default to limit, add pagination in v2 | Emmanuel
2 | qrcode extra for --qr flag: include in default install or opt-in extra? | No; make it an extra: pip install s3peek[qr] | Emmanuel
3 | Should FITS quicklook display WCS (World Coordinate System) summary if present? | No; nice-to-have for v2 | open
4 | For ASDF files from Roman pipeline: should asdf-astropy be hard dep or optional? | No; optional extra: pip install s3peek[astro] | Emmanuel

Agent Build Instructions

Authoritative build spec. Implement end-to-end using only this README — no clarifying questions needed.

Build Order

Phase | Deliverable | Done when
--- | --- | ---
0 | Repo scaffold + CI skeleton | make lint passes on empty project; GitHub Actions runs
1 | S3 abstraction layer (s3.py) | test_s3.py passes with moto; list, stat, range-GET work
2 | Quicklook engine (quicklook.py) | test_quicklook.py passes for all 4 formats against fixtures
3 | Presign module (presign.py) | test_presign.py passes; clipboard copy mocked; expiry parsing correct
4 | CLI commands (cli.py), non-TUI first | test_cli.py passes for peek, share, ls, version
5 | TUI browser (browser.py) | Done: s3peek browse s3://bucket/prefix/ navigates; peek/deep-peek/share/copy/firefly keybindings work; 88 tests pass
6 | Build + packaging | make build-binary succeeds; brew install from local formula

File Map

File | Purpose | Key symbols
--- | --- | ---
src/s3peek/config.py | Pydantic config model; env var + TOML loading | class Config(BaseModel)
src/s3peek/s3.py | S3 list, stat, range-GET via boto3 | list_prefix(), stat_object(), range_get()
src/s3peek/streams.py | Seekable S3-backed file-like object | SeekableS3Stream(io.RawIOBase)
src/s3peek/quicklook.py | Format dispatch; accepts bytes or stream | quicklook(data: bytes | io.RawIOBase, ...)
src/s3peek/presign.py | URL generation + expiry parsing + clipboard | generate_presigned_url(), parse_expiry(), copy_to_clipboard()
src/s3peek/browser.py | Textual TUI app: navigation, listing, quicklook panel | S3Browser(App), Entry, ListingReady, QuicklookReady, list_dir()
src/s3peek/cli.py | Typer app; all commands | app = typer.Typer(), browse, peek, share, ls, version
tests/conftest.py | moto fixtures; fixture file upload | s3_client, populated_bucket
tests/test_s3.py | S3 layer tests | test_list_prefix, test_range_get
tests/test_quicklook.py | Format reader tests | one test per format; error path tests
tests/test_presign.py | Presign + expiry tests | test_expiry_parsing, test_url_structure
tests/test_cli.py | CLI integration tests | test_peek_fits, test_share_no_clipboard, test_ls
Makefile | Dev commands | lint, test, test-cov, build-binary, brew-bump
pyproject.toml | Build + deps + entry point | [project.scripts] s3peek = "s3peek.cli:app"
Formula/s3peek.rb | Homebrew formula | url, sha256, depends_on blocks

Constraints

  • Python 3.11+ only. Older interpreters need not be supported, so match statements (3.10+) and other 3.11+ syntax can be used freely.
  • Range-GET for FITS: read first 65536 bytes (configurable via --max-range-bytes). Parse with astropy.io.fits.open(BytesIO(...), ignore_missing_end=True).
  • ASDF range-GET: read first 65536 bytes; open with asdf.open(BytesIO(...), lazy_load=True, copy_arrays=False).
  • Parquet range-GET: use pyarrow.parquet.ParquetFile(pa.BufferReader(bytes)) — reads footer from end; for range-GET, fetch last 65536 bytes (footer at end of file in Parquet format).
  • JSON: fetch first 65536 bytes; parse with json.loads; on failure try json.JSONDecoder().raw_decode() for streaming objects.
  • Pre-signed URL expiry: parse Xd/Xh/Xm → seconds. Cap at 604800 (7 days). Error on invalid format.
  • Clipboard: use pyperclip; catch pyperclip.PyperclipException, fall back to stdout-only with warning.
  • Tests must use moto (@mock_aws decorator). No real AWS calls in tests.
  • All public functions: typed signatures + docstrings.
  • ruff + mypy must pass at zero warnings.
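
The byte ranges named in these constraints map onto HTTP Range header values: a leading range for FITS, ASDF, and JSON, and a suffix range for the Parquet footer. A minimal sketch follows. head_range, tail_range, and fetch_range are hypothetical helpers; client is assumed to be a boto3 S3 client (or any object with a compatible get_object method).

```python
from typing import Any


def head_range(num_bytes: int) -> str:
    """Range header value for the first num_bytes of an object."""
    return f"bytes=0-{num_bytes - 1}"


def tail_range(num_bytes: int) -> str:
    """Range header value for the last num_bytes of an object (suffix range)."""
    return f"bytes=-{num_bytes}"


def fetch_range(client: Any, bucket: str, key: str, byte_range: str) -> bytes:
    """Fetch part of an object; get_object accepts a Range parameter."""
    response = client.get_object(Bucket=bucket, Key=key, Range=byte_range)
    return response["Body"].read()
```

For example, fetch_range(client, "my-bucket", "data/table.parquet", tail_range(65536)) would request only the final 64 KiB, where the Parquet footer is located.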

Acceptance Criteria

  • make test passes with ≥ 80% coverage
  • make lint passes (ruff check + mypy --strict)
  • s3peek peek s3://test-bucket/sample.fits prints HDU table to stdout
  • s3peek share s3://test-bucket/sample.fits prints valid pre-signed URL
  • s3peek share s3://test-bucket/sample.fits --expiry bad exits code 1 with error on stderr
  • s3peek browse s3://test-bucket/ launches TUI without crash (manual check)
  • make build-binary produces standalone executable on macOS and Linux
  • brew install --build-from-source Formula/s3peek.rb succeeds locally
  • All Open Questions resolved or deferred to v2 in CHANGELOG

Next Steps

Completed

Phase | Deliverable | Status
--- | --- | ---
0 | Repo scaffold + CI skeleton | Done
1 | S3 abstraction layer (s3.py) | Done
2 | Quicklook engine (quicklook.py) | Done
3 | Presign module (presign.py) | Done
4 | CLI commands (cli.py) | Done
5 | TUI browser (browser.py) | Done (PR #11)

Upcoming

# | Feature | Notes
--- | --- | ---
6 | Column sorting in TUI: sort object listing by file size or last-modified date | s key conflicts; use S (capital) or dedicated sort-cycle binding. DataTable sort via sort() with key= lambda on Entry.size / Entry.last_modified. Toggle asc/desc on repeated press.
7 | Filter / fuzzy search in browser | / key → Input widget overlaid on DataTable; filter Entry.name in-memory
8 | Build + packaging | make build-binary → standalone executable; Homebrew formula
9 | LocalStack integration tests | Optional CI stage; moto covers unit tests

  1. Set up .github/workflows/release.yml (tag → PyPI publish + binary upload + formula bump)
  2. Write Formula/s3peek.rb template; validate with brew audit
  3. Resolve all Open Questions; update CHANGELOG
