Nanobind/C++ parsers for Polygon, bulk S3, and websocket market data.


massive-speedup

Native C++/nanobind readers for Polygon/Massive flat-file market data.

See INSTALL.md for installation details and DEVELOPMENT.md for release and PyPI publishing notes.

CSV Gzip Files

Install/build the native extension:

pip3 install -e .

Iterate parsed records directly from a .csv.gz file:

import massive_speedup

for trade in massive_speedup.FlatFiles.Stock.Trade.parse("trades.csv.gz"):
    print(trade.ticker, trade.sip_timestamp, trade.price)

for quote in massive_speedup.FlatFiles.Stock.Quote.parse("quotes.csv.gz"):
    print(quote.ticker, quote.bid_price, quote.ask_price)

for quote in massive_speedup.FlatFiles.currency.Quote.parse("currency_quotes.csv.gz"):
    print(quote.ticker, quote.participant_timestamp)

You can also iterate over the raw CSV fields as tuples of bytes:

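for row in massive_speedup.FlatFiles.Stock.Trade.parse_raw("trades.csv.gz"):
    # Fields follow the file's CSV header order: in the standard trades
    # header, index 0 is ticker and index 8 is sip_timestamp.
    print(row[0], row[8])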

Example scripts:

examples/howto_csv_gzip_daily_vwap.py computes daily stock-trade VWAP using gzip and csv.DictReader.
examples/howto_database_daily_vwap.py computes the same value from a massive-speedup binary database file using mmap and the native C++ aggregator.
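For comparison with the native readers, here is a minimal pure-Python sketch of the gzip + csv.DictReader approach that the first example script is described as using. It is not that script's actual code: it assumes the trades header contains ticker, price, and size columns, and daily_vwap is a hypothetical name.

```python
import csv
import gzip
from collections import defaultdict


def daily_vwap(path):
    """Hypothetical helper: return {ticker: VWAP} for one day of trades in a .csv.gz file."""
    notional = defaultdict(float)  # running sum of price * size per ticker
    volume = defaultdict(float)    # running sum of size per ticker
    # Every row is decoded into a dict of Python strings, which is the cost
    # the native parsers avoid.
    with gzip.open(path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            size = float(row["size"])
            notional[row["ticker"]] += float(row["price"]) * size
            volume[row["ticker"]] += size
    # VWAP = sum(price * size) / sum(size); skip tickers with zero volume.
    return {t: notional[t] / volume[t] for t in volume if volume[t] > 0}
```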

Record Access

Parsed records expose read-only attributes and are iterable in CSV field order:

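import massive_speedup

# iter() makes next() work whether parse() returns an iterator or only an iterable.
trade = next(iter(massive_speedup.FlatFiles.Stock.Trade.parse("trades.csv.gz")))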

print(trade.ticker)
print(trade.conditions)
print(trade.sip_timestamp)
print(trade.pack())
print(list(trade))

Packed records do not include the ticker, so supply it when reconstructing a record (in database files, the ticker is the file name):

packed = trade.pack()
trade2 = massive_speedup.StockTrade.from_packed(packed, trade.ticker)

Window Aggregation

The native aggregators consume iterables of parsed records and yield C++ result objects exposed through nanobind. Result attributes are read-only and lazily converted to Python objects on first access. The aggregation interval and offset are given in seconds, but the returned window_start is in nanoseconds since the epoch.

import massive_speedup

trades = massive_speedup.FlatFiles.Stock.Trade.parse("trades.csv.gz")

for bar in massive_speedup.FlatFiles.Stock.Trade.Aggregator(
    trades,
    interval_seconds=60,
):
    print(
        bar.ticker,
        bar.window_start,
        bar.open,
        bar.close,
        bar.high,
        bar.low,
        bar.avg,
        bar.volume_weighted_avg,
        bar.volume,
        bar.transactions,
        bar.stddev,
    )
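The bucketing rule is not spelled out above. A minimal sketch, assuming windows are aligned to the Unix epoch shifted by offset_seconds (window_start_ns is a hypothetical helper, not part of massive-speedup):

```python
NS_PER_SECOND = 1_000_000_000


def window_start_ns(timestamp_ns: int, interval_seconds: int, offset_seconds: int = 0) -> int:
    """Assumed bucketing: windows of interval_seconds aligned to epoch + offset_seconds."""
    interval_ns = interval_seconds * NS_PER_SECOND
    offset_ns = offset_seconds * NS_PER_SECOND
    # Integer floor division keeps nanosecond precision and places timestamps
    # that precede the offset into the earlier window.
    return (timestamp_ns - offset_ns) // interval_ns * interval_ns + offset_ns
```

Under this assumption, a 60-second interval with offset_seconds=30 produces windows that start 30 seconds past each minute.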

Available aggregators:

massive_speedup.StockTradeAggregator / FlatFiles.Stock.Trade.Aggregator
massive_speedup.StockQuoteAggregator / FlatFiles.Stock.Quote.Aggregator
massive_speedup.CurrencyQuoteAggregator / FlatFiles.currency.Quote.Aggregator

Stock trades aggregate price and use size for volume and volume_weighted_avg. Stock quotes aggregate ask and bid prices separately and use ask/bid sizes for ask/bid volume-weighted averages. Currency quotes aggregate ask and bid prices separately and omit volume and volume-weighted averages because the source rows have no size field. For example, one-second stock quote bars from a database file:

quotes = massive_speedup.StockQuoteDatabase("/data/massive-db", "2026-01-23", "A")

for quote_bar in massive_speedup.StockQuoteAggregator(
    quotes,
    interval_seconds=1,
    offset_seconds=0,
):
    print(quote_bar.ask_open, quote_bar.ask_close, quote_bar.bid_avg)

Aggregators stream consecutive (ticker, window_start) groups, so input must be ordered by ticker and then timestamp, as produced by the native database iterators or found in the default Massive/Polygon flat-file order. stddev is the population standard deviation.
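Because aggregators stream consecutive groups, their behavior resembles itertools.groupby over a (ticker, window_start) key. The sketch below is a pure-Python model of that idea for stock trades, using a hypothetical Trade tuple and epoch-aligned windows without an offset. It is not the native implementation; the Bar field names are borrowed from the attributes printed in the example above.

```python
import itertools
import math
import statistics
from dataclasses import dataclass
from typing import Iterable, Iterator, NamedTuple


class Trade(NamedTuple):  # hypothetical stand-in for a parsed stock trade
    ticker: str
    sip_timestamp: int  # nanoseconds since epoch
    price: float
    size: float


@dataclass
class Bar:  # mirrors the bar attributes printed above
    ticker: str
    window_start: int
    open: float
    close: float
    high: float
    low: float
    avg: float
    volume_weighted_avg: float
    volume: float
    transactions: int
    stddev: float


def aggregate(trades: Iterable[Trade], interval_seconds: int) -> Iterator[Bar]:
    interval_ns = interval_seconds * 1_000_000_000

    def key(t: Trade):
        return (t.ticker, t.sip_timestamp // interval_ns * interval_ns)

    # groupby merges only *adjacent* rows with equal keys, so unsorted input
    # emits the same (ticker, window_start) more than once.
    for (ticker, start), group in itertools.groupby(trades, key=key):
        rows = list(group)
        prices = [t.price for t in rows]
        volume = sum(t.size for t in rows)
        yield Bar(
            ticker=ticker,
            window_start=start,
            open=prices[0],
            close=prices[-1],
            high=max(prices),
            low=min(prices),
            avg=statistics.fmean(prices),
            volume_weighted_avg=(
                sum(t.price * t.size for t in rows) / volume if volume else math.nan
            ),
            volume=volume,
            transactions=len(rows),
            stddev=statistics.pstdev(prices),  # population standard deviation
        )
```

Since only adjacent rows are merged, input in any other order silently splits one window into several bars.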

Build Database Files

Build fixed-length binary database files from one or more input .csv.gz files:

massive-speedup-build-database --database /data/massive-db 2026-01-23.csv.gz

The input type is inferred from the CSV header. Output layout is:

{database}/{stock_trade|stock_quote|currency_quote}/{YYYY-MM-DD}/{ticker}
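The layout is a plain path join. ticker_file below is a hypothetical helper for locating files, not part of the package:

```python
from pathlib import Path

# Type directory names taken from the layout above.
RECORD_TYPES = ("stock_trade", "stock_quote", "currency_quote")


def ticker_file(database: str, record_type: str, date: str, ticker: str) -> Path:
    """Hypothetical helper: {database}/{type}/{YYYY-MM-DD}/{ticker}."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type!r}")
    return Path(database) / record_type / date / ticker
```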

Existing ticker files are not overwritten by default: the builder still reads through a ticker's rows to reach the next ticker, but writes only the ticker files that are missing. Use --force to rebuild existing ticker files, which is useful after a binary record format change:

massive-speedup-build-database --force --database /data/massive-db 2026-01-23.csv.gz

Date-level idempotency uses an .incomplete marker in {database}/{type}/{YYYY-MM-DD}. If the date directory exists without .incomplete, the input file is skipped. If the directory is new, .incomplete is created before processing and removed only after successful completion. Use --force to process a date even when .incomplete is absent.
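A sketch of the .incomplete protocol described above, with build standing in for the work of writing ticker files. process_date is a hypothetical name, and whether --force recreates the marker on an already complete date is not stated, so that branch is an assumption.

```python
from pathlib import Path
from typing import Callable


def process_date(date_dir: Path, build: Callable[[], None], force: bool = False) -> bool:
    """Sketch of the .incomplete marker protocol; returns True if build ran."""
    marker = date_dir / ".incomplete"
    if date_dir.exists() and not marker.exists() and not force:
        return False  # an earlier run completed this date: skip the input
    date_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the marker is (re)created whenever the date is processed,
    # including under --force.
    marker.touch()
    build()  # if this raises, the marker stays and the next run resumes the date
    marker.unlink()  # reached only after successful completion
    return True
```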

Use --benchmark to print throughput:

massive-speedup-build-database --benchmark --database /data/massive-db *.csv.gz

Database Files

Open a fixed-length binary file through mmap and iterate records:

records = massive_speedup.StockTradeDatabase(
    "/data/massive-db",
    "2026-01-23",
    "A",
)

for trade in records:
    print(trade.sip_timestamp, trade.price)
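Fixed-length records are what make mmap access cheap: record i starts at byte i * record_size, so indexing is a single offset computation. The sketch below assumes a made-up 16-byte layout (little-endian int64 timestamp followed by a float64 price); the real massive-speedup record format is not documented here, and FixedLengthReader is a hypothetical class.

```python
import mmap
import struct
from typing import Iterator, Tuple

# Hypothetical record layout, NOT the massive-speedup format:
# little-endian int64 sip_timestamp (ns) followed by float64 price.
RECORD = struct.Struct("<qd")


class FixedLengthReader:
    def __init__(self, path: str):
        self._file = open(path, "rb")
        # mmap cannot map an empty file; this raises ValueError for one.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        if len(self._map) % RECORD.size:
            raise ValueError("file size is not a multiple of the record size")

    def __len__(self) -> int:
        return len(self._map) // RECORD.size

    def __getitem__(self, i: int) -> Tuple[int, float]:
        if i < 0:
            i += len(self)  # support records[-1]
        if not 0 <= i < len(self):
            raise IndexError(i)
        # O(1) random access: decode straight from the mapped pages.
        return RECORD.unpack_from(self._map, i * RECORD.size)

    def __iter__(self) -> Iterator[Tuple[int, float]]:
        for i in range(len(self)):
            yield self[i]

    def close(self) -> None:
        self._map.close()
        self._file.close()
```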

Merge stock trades and quotes for one date and ticker in SIP timestamp order:

for trade, quote in massive_speedup.stock_trade_quote_timeline(
    "/data/massive-db",
    "2026-01-23",
    "A",
):
    if trade:
        print("trade", trade.sip_timestamp, trade.price, quote)
    else:
        print("quote", quote.sip_timestamp, quote.bid_price, quote.ask_price)

Quote rows yield (None, current_quote). Trade rows yield (trade, last_quote), where last_quote is None until the first quote has appeared. When a trade and quote have the same SIP timestamp, the quote is yielded first.
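These rules amount to a two-way merge of two SIP-timestamp-ordered streams in which quotes win ties. A pure-Python sketch with a hypothetical trade_quote_timeline function that takes already-loaded record sequences (the native stock_trade_quote_timeline reads the database files directly):

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def trade_quote_timeline(
    trades: Iterable[Any],
    quotes: Iterable[Any],
    ts: Callable[[Any], int] = lambda r: r.sip_timestamp,
) -> Iterator[Tuple[Optional[Any], Optional[Any]]]:
    """Merge two timestamp-sorted streams, yielding quotes first on ties."""
    trade_it, quote_it = iter(trades), iter(quotes)
    trade, quote = next(trade_it, None), next(quote_it, None)
    last_quote = None
    while trade is not None or quote is not None:
        # Take the quote when it is earlier than or equal to the next trade.
        if quote is not None and (trade is None or ts(quote) <= ts(trade)):
            last_quote = quote
            yield None, quote
            quote = next(quote_it, None)
        else:
            # last_quote is None until the first quote has been seen.
            yield trade, last_quote
            trade = next(trade_it, None)
```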

Database files support indexing and timestamp search:

first = records[0]
last = records[-1]

index = records.index_before_timestamp(1769161728012983416)
near_open = records.index_before_timestamp(1769161728012983416, galloping=0)
next_index = records.index_after_timestamp(1769161728012983416, galloping=index + 1)
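For the index_* methods, galloping appears to take a starting index hint. The sketch below models that reading: gallop forward from the hint with doubling steps, then binary-search the bracketed range. Both the hint semantics and defining "before" as the last index whose timestamp is at most the target are assumptions; index_before is a hypothetical helper over a plain sorted list.

```python
import bisect
from typing import Sequence


def index_before(timestamps: Sequence[int], target: int, start: int = 0) -> int:
    """Last index i with timestamps[i] <= target, or -1 (assumed semantics).

    Assumes timestamps is sorted and that every element before `start`
    is <= target, i.e. the hint does not overshoot the answer.
    """
    n = len(timestamps)
    lo = hi = start
    step = 1
    # Galloping phase: probe start, start+1, start+3, start+7, ... until a
    # probe passes the target or runs off the end.
    while hi < n and timestamps[hi] <= target:
        lo = hi + 1
        hi += step
        step *= 2
    hi = min(hi, n)
    # Now timestamps[:lo] are all <= target and timestamps[hi:] are > target.
    return bisect.bisect_right(timestamps, target, lo, hi) - 1
```

When successive searches move forward through a file, reusing the previous result as the hint keeps each search proportional to the log of the distance moved rather than the log of the file length.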

Timestamp arguments are nanoseconds since epoch. Database readers also accept datetime.time values, which are resolved using the reader's date:

import datetime as dt

index = records.index_before_timestamp(dt.time(9, 30))
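The time zone used to resolve datetime.time values is not stated. This sketch takes the zone as an explicit parameter; for US equities America/New_York (via zoneinfo.ZoneInfo) would be the natural choice, but that is an assumption. time_on_date_ns is a hypothetical helper.

```python
import datetime as dt


def time_on_date_ns(date: dt.date, t: dt.time, tz: dt.tzinfo) -> int:
    """Combine a reader's date with a wall-clock time; return ns since the epoch.

    The tz argument is an assumption: the README does not name the zone.
    """
    local = dt.datetime.combine(date, t, tzinfo=tz)
    delta = local - dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
    # Integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000
```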

Find the closest record before or after a participant timestamp:

before = records.find_before_participant_timestamp(
    1769161728012624580,
)
after = records.find_after_participant_timestamp(
    1769161728012624580,
    fuzz=250_000_000,
    galloping=True,
)
strict_before = records.find_before_participant_timestamp(
    1769161728012624580,
    on=False,
)

find_before_participant_timestamp returns the record with the highest participant timestamp less than or equal to the target. find_after_participant_timestamp returns the record with the lowest participant timestamp greater than or equal to the target. Set on=False for strict < or > comparisons. fuzz is a nanosecond scan window around the searched timestamp and defaults to one second (1_000_000_000). Both methods return records, not indexes.
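Files are ordered by SIP timestamp, and participant timestamps need not be monotone in that order, which would explain the fuzz window. A sketch of the "before" search under that assumption: scan every record whose SIP timestamp lies within the target ± fuzz and keep the one with the highest qualifying participant timestamp. Rec and find_before_participant are hypothetical, and the native method may search differently.

```python
import bisect
from typing import NamedTuple, Optional, Sequence


class Rec(NamedTuple):  # hypothetical record carrying both timestamps (ns)
    sip_timestamp: int
    participant_timestamp: int


def find_before_participant(
    records: Sequence[Rec],
    target: int,
    fuzz: int = 1_000_000_000,
    on: bool = True,
) -> Optional[Rec]:
    """Record with the highest participant_timestamp <= target (< if on=False),
    considering only records whose sip_timestamp is within target +/- fuzz."""
    sips = [r.sip_timestamp for r in records]  # a real reader would search in place
    lo = bisect.bisect_left(sips, target - fuzz)
    hi = bisect.bisect_right(sips, target + fuzz)
    best = None
    for r in records[lo:hi]:
        p = r.participant_timestamp
        qualifies = p <= target if on else p < target
        if qualifies and (best is None or p > best.participant_timestamp):
            best = r
    return best  # None when nothing in the window qualifies
```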

Stock database readers also expose NYSE market session timestamps in nanoseconds:

print(records.market_open)
print(records.market_close)

Project details

Release history

0.1.4 (May 8, 2026, this version)
0.1.3 (May 8, 2026)
0.1.2 (May 8, 2026)
0.1.1 (May 3, 2026)
0.1.0 (May 3, 2026)

Download files


Source Distribution

massive_speedup-0.1.3.tar.gz (58.6 MB)

Uploaded May 8, 2026 Source

File details

Details for the file massive_speedup-0.1.3.tar.gz.

File metadata

Download URL: massive_speedup-0.1.3.tar.gz
Upload date: May 8, 2026
Size: 58.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for massive_speedup-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`c3e973561de175df993df76e9106b7be98a8fedbdee99f778338fdbfc5330f2e`
MD5	`9dfcd0e231203e724795e3b798b4b983`
BLAKE2b-256	`c6bed7f65a990061ea67dcea4dc4e9cf11151e0c7d985188a9043013ac70b847`

