Download Polygon (Massive) options flat files from S3 and store as compressed Parquet

polygon-options-puller

Download Polygon / Massive US options (OPRA) flat files from their S3 bucket and store them locally as Snappy-compressed, dictionary-encoded Parquet files, filtered by symbol prefix.

How it works

Polygon ships daily .csv.gz files containing every option ticker for an entire trading day. The quote files alone are ~120 GB compressed each. This tool streams each file directly from S3, filters to your symbol prefix in-flight, and writes only matching rows to Parquet. The full CSV never touches disk, so you don't download 120 GB just to keep 500 MB.
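
A minimal sketch of the in-flight filtering idea, using only the standard library. It is not the package's actual code: the `iter_matching_rows` helper, the `ticker` column name, and the `O:` ticker prefix are assumptions for illustration, and an in-memory buffer stands in for the S3 response body.

```python
import csv
import gzip
import io
from typing import BinaryIO, Iterator


def iter_matching_rows(raw: BinaryIO, symbol_prefix: str) -> Iterator[dict]:
    """Yield CSV rows whose ticker starts with the given symbol prefix.

    Hypothetical helper. Assumes a "ticker" column holding values such as
    "O:AAPL250321C00200000".
    """
    wanted = f"O:{symbol_prefix}"
    # gzip.open accepts a file object and decompresses lazily as lines are read,
    # so only a small window of the file is held in memory at any time.
    with gzip.open(raw, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            if row["ticker"].startswith(wanted):
                yield row


# Stand-in for a streamed S3 object: a tiny gzip-compressed CSV in memory.
sample_csv = (
    "ticker,bid_price,ask_price\n"
    "O:AAPL250321C00200000,1.10,1.15\n"
    "O:MSFT250321C00400000,2.00,2.10\n"
    "O:AAPL250321P00200000,0.90,0.95\n"
)
compressed = io.BytesIO(gzip.compress(sample_csv.encode("utf-8")))
matches = list(iter_matching_rows(compressed, "AAPL"))
```

Because the match is a prefix test, a prefix such as SPX would also match SPXW contracts; pick the prefix accordingly.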

Key features:

  • Streaming: decompresses and filters in-flight, never writes the full CSV to disk
  • Parallel: uses a thread pool to download multiple days concurrently
  • NYSE-aware: uses pandas_market_calendars to skip holidays and weekends
  • Idempotent: re-running skips days that already have valid Parquet files
  • Atomic writes: uses temp files + os.replace() to prevent corrupt output
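
The parallel, idempotent, and atomic-write behaviors listed above can be sketched together as follows. This is an illustration under stated assumptions, not the package's implementation: `write_atomically`, `process_day`, and `run` are hypothetical names, the payload is placeholder bytes rather than real Parquet, and "valid" is approximated here as "exists and is non-empty".

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_atomically(target: Path, payload: bytes) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        # os.replace is atomic on the same filesystem, so readers never
        # observe a half-written target file.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def process_day(output_dir: Path, day: str) -> str:
    """Skip days that already have output; otherwise write them."""
    target = output_dir / f"{day}.parquet"
    # Assumption: a non-empty file counts as valid for this sketch.
    if target.exists() and target.stat().st_size > 0:
        return "skipped"
    write_atomically(target, f"placeholder for {day}".encode())
    return "written"


def run(output_dir: Path, days: list[str], workers: int = 4) -> dict[str, str]:
    """Process several days concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda d: process_day(output_dir, d), days)
        return dict(zip(days, results))


with tempfile.TemporaryDirectory() as tmp_dir:
    out = Path(tmp_dir) / "quotes"
    first = run(out, ["2025-03-17", "2025-03-18"])
    # Re-running only does work for the day that is not on disk yet.
    second = run(out, ["2025-03-17", "2025-03-18", "2025-03-19"])
```

Creating the temp file in the target's own directory keeps both paths on one filesystem, which is what makes the final rename atomic.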

Installation

pip install .
# or in editable mode for development:
pip install -e ".[dev]"

Credentials

You need Polygon / Massive S3 credentials. Get them from your Massive dashboard.

export POLYGON_S3_ACCESS_KEY="your-access-key"
export POLYGON_S3_SECRET_KEY="your-secret-key"
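
When calling the Python API rather than the CLI, the same two variables can be read from the environment instead of hard-coding keys. A small sketch; `load_credentials` is a hypothetical helper, not part of the package.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("POLYGON_S3_ACCESS_KEY", "POLYGON_S3_SECRET_KEY")


def load_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (access_key, secret_key), failing loudly if either is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]]


# A plain dict stands in for the process environment in this example.
access_key, secret_key = load_credentials(
    {
        "POLYGON_S3_ACCESS_KEY": "your-access-key",
        "POLYGON_S3_SECRET_KEY": "your-secret-key",
    }
)
```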

Usage

Download data

# Download AAPL option quotes for a date range
polygon-options-puller download \
    --symbol-prefix AAPL \
    -t quotes \
    --start-date 2025-03-17 \
    --end-date 2025-03-21 \
    -o ./data/aapl

# Download SPXW trades with 16 workers
polygon-options-puller download \
    --symbol-prefix SPXW \
    -t trades \
    --start-date 2025-04-01 \
    --end-date 2025-04-30 \
    -o ./data/spxw \
    --workers 16

# Download both trades and quotes
polygon-options-puller download \
    --symbol-prefix SPY \
    -t both \
    --start-date 2025-04-01 \
    --end-date 2025-04-02 \
    -o ./data/spy

# Download minute aggregates
polygon-options-puller download \
    --symbol-prefix AAPL \
    -t minute_aggs \
    --start-date 2025-01-02 \
    --end-date 2025-01-02 \
    -o ./data/aapl

List available dates

# List all available quote files
polygon-options-puller list-dates

# List files for a specific year/month
polygon-options-puller list-dates --year 2024 --month 3

Python API

from datetime import date
from polygon_options_puller.downloader import pull

written = pull(
    access_key="your-key",
    secret_key="your-secret",
    output_dir="data/aapl",
    data_types=["quotes"],
    symbol_prefix="AAPL",
    start_date=date(2025, 3, 17),
    end_date=date(2025, 3, 21),
    workers=8,
)

Output layout

data/aapl/
├── quotes/
│   ├── 2025-03-17.parquet
│   ├── 2025-03-18.parquet
│   ├── 2025-03-19.parquet
│   ├── 2025-03-20.parquet
│   └── 2025-03-21.parquet
└── trades/
    ├── 2025-03-17.parquet
    └── ...

Each Parquet file contains only rows matching the --symbol-prefix you specified. To keep underlyings separate, give each one its own --output-dir path (the -o option in the examples above).
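
Given this layout, the path for any day is predictable, which makes it easy to check a date range for gaps. The sketch below uses hypothetical helpers (`expected_path`, `missing_days`) and only skips weekends; it does not know about exchange holidays, so a holiday will show up as "missing".

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def expected_path(output_dir, data_type: str, day: date) -> Path:
    """Build <output_dir>/<data_type>/<YYYY-MM-DD>.parquet."""
    return Path(output_dir) / data_type / f"{day.isoformat()}.parquet"


def missing_days(output_dir, data_type: str, start: date, end: date) -> list[date]:
    """List weekdays in [start, end] that have no Parquet file yet."""
    days = []
    current = start
    while current <= end:
        # weekday() < 5 means Monday through Friday.
        if current.weekday() < 5 and not expected_path(output_dir, data_type, current).exists():
            days.append(current)
        current += timedelta(days=1)
    return days


example = expected_path("data/aapl", "quotes", date(2025, 3, 17)).as_posix()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate a partially completed download: only two days are present.
    for day in (date(2025, 3, 17), date(2025, 3, 18)):
        path = expected_path(tmp_dir, "quotes", day)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"placeholder")
    missing = missing_days(tmp_dir, "quotes", date(2025, 3, 17), date(2025, 3, 23))
```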

Data types

Type         S3 prefix                        Description
quotes       us_options_opra/quotes_v1        Top-of-book quotes, nanosecond timestamps
trades       us_options_opra/trades_v1        Tick-level trades, nanosecond timestamps
day_aggs     us_options_opra/day_aggs_v1      Daily OHLCV candles
minute_aggs  us_options_opra/minute_aggs_v1   Minute OHLCV candles
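
As an illustration of how a data type and date might map to an object key, here is a short sketch. The prefixes come from the table above; the `<prefix>/<year>/<month>/<date>.csv.gz` layout and the `object_key` helper are assumptions, so compare the result against the output of list-dates before relying on it.

```python
from datetime import date

# Prefixes taken from the data types table.
S3_PREFIXES = {
    "quotes": "us_options_opra/quotes_v1",
    "trades": "us_options_opra/trades_v1",
    "day_aggs": "us_options_opra/day_aggs_v1",
    "minute_aggs": "us_options_opra/minute_aggs_v1",
}


def object_key(data_type: str, day: date) -> str:
    """Build a key for one day's file (assumed year/month folder layout)."""
    try:
        prefix = S3_PREFIXES[data_type]
    except KeyError:
        raise ValueError(f"Unknown data type: {data_type!r}") from None
    return f"{prefix}/{day:%Y}/{day:%m}/{day.isoformat()}.csv.gz"


key = object_key("quotes", date(2025, 3, 17))
```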

License

MIT
