
Market data platform for downloading and storing financial OHLCV data


marketgoblin

Download, store, and load financial market data — fast and without fuss.


marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches multiple datasets (OHLCV, shares-outstanding, dividends), slices them into monthly Parquet files, writes JSON sidecars with metadata, and lets you load them back with a single call.


Features

  • Multi-dataset: OHLCV, shares-outstanding, and dividends, selected via a Dataset enum; per-source dispatch makes it easy to add more
  • Tidy stacked OHLCV: adjusted and raw prices live in one frame, distinguished by an is_adjusted bool column; one network call per symbol covers both
  • Single-symbol and batch fetch: fetch() for one symbol, fetch_many() for many with thread-pool concurrency
  • Disk persistence: monthly .pq slices with atomic writes, plus a JSON sidecar per slice
  • Lazy evaluation: all data paths return pl.LazyFrame (Polars)
  • Date flexibility: dates are stored on disk as int32 YYYYMMDD; pass parse_dates=True to get pl.Date
  • Retry logic: YahooSource retries transient failures with exponential backoff (3 attempts)
  • Rate limiting: fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
  • Input validation: dates are validated before any I/O; unsupported (provider, dataset) pairs raise at the dispatch boundary
  • Pluggable providers: subclass BaseSource, implement _build_dispatch(), and register it in one line; CSVSource is included
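
The retry feature above is described only as "exponential backoff, 3 attempts". The sketch below is a generic version of that idea, not YahooSource's actual code: retry_with_backoff is a hypothetical helper, and both the set of "transient" exceptions (ConnectionError, TimeoutError) and the 0.5 s base delay are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying exceptions listed in retry_on with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error unchanged
            # Wait base_delay * 2**attempt seconds: 0.5 s, then 1.0 s, ...
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

Injecting sleep keeps the helper testable. A production client would usually also add random jitter to each delay so that many workers do not retry in lockstep.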

Installation

pip install marketgoblin

Or with uv:

uv add marketgoblin

For development:

git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev

Quick Start

import polars as pl
from marketgoblin import Dataset, MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist OHLCV — tidy stacked frame: each trading day appears
# twice (is_adjusted=True / False). Filter to pick a variant.
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
adjusted = lf.filter(pl.col("is_adjusted")).collect()
print(adjusted)

# Load back from disk (no network call)
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Shares outstanding — sparse, corporate-action-driven series
shares = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.SHARES, parse_dates=True)
print(shares.collect())

# Dividends — event-driven (typically quarterly)
dividends = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.DIVIDENDS, parse_dates=True)
print(dividends.collect())

# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")

Run the full walkthrough:

python example.py

API

MarketGoblin

MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None, **source_kwargs)
| Method | Description |
|---|---|
| fetch(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Download, save to disk (if save_path set), return LazyFrame |
| load(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Load from disk; raises RuntimeError if no save_path |
| fetch_many(symbols, start, end, dataset=Dataset.OHLCV, parse_dates=False, max_workers=8, requests_per_second=2.0) | Batch fetch via ThreadPoolExecutor, rate-limited |
| supported_datasets (property) | frozenset[Dataset] of datasets the configured provider supports |
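
fetch_many() is documented as a ThreadPoolExecutor batch capped at requests_per_second, where a failing symbol is logged rather than crashing the batch. The sketch below illustrates that combination and is not marketgoblin's implementation: RateLimiter and fetch_many_sketch are hypothetical names, and spacing request start times evenly is only one way to enforce a cap (a token bucket that permits short bursts is a common alternative).

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class RateLimiter:
    """Thread-safe limiter that spaces request start times 1 / rate seconds apart."""

    def __init__(self, requests_per_second: float) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free slot under the lock...
        with self._lock:
            slot = max(self._next_slot, time.monotonic())
            self._next_slot = slot + self._interval
        # ...then sleep outside it so other threads can reserve theirs.
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def fetch_many_sketch(
    symbols: list[str],
    fetch_one: Callable[[str], T],
    max_workers: int = 8,
    requests_per_second: float = 2.0,
) -> dict[str, T]:
    """Fetch symbols concurrently; failures are logged and skipped, not raised."""
    limiter = RateLimiter(requests_per_second)

    def task(symbol: str) -> T:
        limiter.wait()
        return fetch_one(symbol)

    results: dict[str, T] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:
                logger.warning("fetch failed for %s: %s", symbol, exc)
    return results
```

Note that max_workers bounds concurrency while the limiter bounds throughput; with the defaults, eight threads still start at most two requests per second.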

Datasets

| Dataset | Provider support | Columns |
|---|---|---|
| Dataset.OHLCV | yahoo, csv | date (int32), open / high / low / close (float32), volume (int64), is_adjusted (bool), symbol |
| Dataset.SHARES | yahoo | date (int32), shares (int64), symbol |
| Dataset.DIVIDENDS | yahoo | date (int32), dividend (float32), symbol |

OHLCV is returned as a tidy stacked frame: each trading day appears twice, once with is_adjusted=True and once with is_adjusted=False. Filter downstream with .filter(pl.col("is_adjusted")) (or ~pl.col("is_adjusted") for raw prices) to pick a variant. Adjusted Open/High/Low are derived locally by scaling the raw values by the Adj Close / Close ratio. This has been verified to match yfinance's auto_adjust=True output exactly, and it needs one network call per symbol instead of two.

Data on disk

| Property | Detail |
|---|---|
| Date column | int32 YYYYMMDD (e.g. 20240101); parse_dates=True → pl.Date |
| OHLC columns | float32 |
| Volume column | int64 |
| Shares column | int64 |
| Dividend column | float32 |
| Parquet path | {save_path}/{provider}/{dataset}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq |
| JSON sidecar | Same path with a .json extension: row count, date range, and per-dataset stats (OHLCV also records has_adjusted/has_raw and missing trading days) |

Adding a Provider

import polars as pl

from marketgoblin import Dataset
from marketgoblin.sources.base import BaseSource, Fetcher

class MySource(BaseSource):
    name = "mysource"

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame with an is_adjusted column

Per-dataset fetchers all share the (symbol, start, end) signature — there is no adjusted toggle, since OHLCV variants are stacked into a single frame distinguished by the is_adjusted column.

Then register it in goblin.py:

_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}

A CSVSource is included out of the box for loading local CSV files. Each CSV holds a single variant, so pass is_adjusted=... to stamp the flag on every row:

goblin = MarketGoblin(provider="csv", data_dir="./csv_files", is_adjusted=True)
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")

Running Tests

pytest
pytest --cov=marketgoblin   # with coverage

License

MIT © Antônio Salomão



Download files


Source Distribution

marketgoblin-0.3.0.tar.gz (19.0 kB)

Uploaded Source

Built Distribution


marketgoblin-0.3.0-py3-none-any.whl (26.1 kB)

Uploaded Python 3

File details

Details for the file marketgoblin-0.3.0.tar.gz.

File metadata

  • Download URL: marketgoblin-0.3.0.tar.gz
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for marketgoblin-0.3.0.tar.gz
Algorithm Hash digest
SHA256 0df6e8d577798b55eda98e9f0adac6b6237f419057aa0f72eff51f6fd32c5d35
MD5 58c9a646dd7bf37e7c2173338ede77c8
BLAKE2b-256 1cd13cec563dea4bf1fabc95855c66346a1632fe0f57e5b029305afbde46b50e


Provenance

The following attestation bundles were made for marketgoblin-0.3.0.tar.gz:

Publisher: publish.yml on aexsalomao/marketgoblin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file marketgoblin-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: marketgoblin-0.3.0-py3-none-any.whl
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for marketgoblin-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 56b26a7c21b9fd6d4a709c926f0b97bf74822d5b01b9a505dbfdb829890d63ef
MD5 d0af03dddd571644fb6422e22227965d
BLAKE2b-256 83ce99fb0f8ac732d7df1580991fbbcc653ca15f8461eff61d2b4aad355d933c


Provenance

The following attestation bundles were made for marketgoblin-0.3.0-py3-none-any.whl:

Publisher: publish.yml on aexsalomao/marketgoblin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
