Market data platform for downloading and storing financial OHLCV data

marketgoblin

Download, store, and load financial market data — fast and without fuss.

marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches multiple datasets (OHLCV, shares-outstanding, dividends), slices them into monthly Parquet files, writes JSON sidecars with metadata, and lets you load them back with a single call.


Features

  • Multi-dataset: OHLCV, shares-outstanding, and dividends, selected via a Dataset enum; per-source dispatch makes it easy to add more
  • Tidy stacked OHLCV: adjusted and raw prices live in one frame, distinguished by an is_adjusted bool column; one network call per symbol covers both
  • Single-symbol and batch fetch: fetch() and fetch_many() with thread-pool concurrency
  • Disk persistence: monthly .pq slices with atomic writes; one JSON sidecar per slice
  • Lazy evaluation: all data paths return pl.LazyFrame (Polars)
  • Date flexibility: dates are stored as int32 YYYYMMDD on disk; use parse_dates=True to get pl.Date
  • Retry logic: YahooSource retries transient failures with exponential backoff (3 attempts)
  • Rate limiting: fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
  • Input validation: dates are validated before any I/O; unsupported (provider, dataset) pairs raise at the dispatch boundary
  • Pluggable providers: subclass BaseSource, implement _build_dispatch(), register in one line; CSVSource included
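
The retry behaviour listed above can be pictured with a small standalone sketch. This is not marketgoblin's implementation: retry_with_backoff, base_delay, and the injectable sleep parameter are hypothetical names chosen for illustration, and the sketch retries on any exception because the README does not say which errors YahooSource treats as transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,  # assumed value; the README only states 3 attempts
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # The delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


# Usage: a fake fetch that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


delays: list[float] = []  # record delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing the sleep function in as a parameter keeps the example fast and makes the delay schedule easy to inspect.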

Installation

pip install marketgoblin

Or with uv:

uv add marketgoblin

For development:

git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev

Quick Start

import polars as pl
from marketgoblin import Dataset, MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist OHLCV — tidy stacked frame: each trading day appears
# twice (is_adjusted=True / False). Filter to pick a variant.
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
adjusted = lf.filter(pl.col("is_adjusted")).collect()
print(adjusted)

# Load back from disk (no network call)
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Shares outstanding — sparse, corporate-action-driven series
shares = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.SHARES, parse_dates=True)
print(shares.collect())

# Dividends — event-driven (typically quarterly)
dividends = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.DIVIDENDS, parse_dates=True)
print(dividends.collect())

# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")

Run the full walkthrough:

python example.py

API

MarketGoblin

MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None, **source_kwargs)
| Method | Description |
| --- | --- |
| fetch(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Download, save to disk (if save_path is set), and return a LazyFrame |
| load(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Load from disk; raises RuntimeError if save_path is not set |
| fetch_many(symbols, start, end, dataset=Dataset.OHLCV, parse_dates=False, max_workers=8, requests_per_second=2.0) | Batch fetch via ThreadPoolExecutor, rate-limited |
| supported_datasets (property) | frozenset[Dataset] of the datasets the configured provider supports |
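
To make the fetch_many() description concrete, here is a rough standalone sketch of a rate-limited batch fetch on a thread pool. It only mirrors the documented parameters (max_workers, requests_per_second) and the documented behaviour that failed symbols are logged rather than raised. RateLimiter, fetch_many_sketch, and fake_fetch are hypothetical names, and the library's actual limiter may work differently.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

logger = logging.getLogger(__name__)


class RateLimiter:
    """Spaces out calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        time.sleep(max(0.0, slot - now))


def fetch_many_sketch(
    fetch_one: Callable[[str], object],
    symbols: list[str],
    max_workers: int = 8,
    requests_per_second: float = 2.0,
) -> dict[str, object]:
    limiter = RateLimiter(requests_per_second)

    def task(symbol: str) -> object:
        limiter.wait()
        return fetch_one(symbol)

    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception:
                # A failed symbol is logged and skipped; the batch continues.
                logger.exception("fetch failed for %s", symbol)
    return results


# Usage with a fake fetcher; "BAD" stands in for a symbol that fails.
def fake_fetch(symbol: str) -> int:
    if symbol == "BAD":
        raise ValueError("no data")
    return len(symbol)


out = fetch_many_sketch(fake_fetch, ["AAPL", "BAD", "MSFT"], requests_per_second=100.0)
print(sorted(out))
```

The limiter reserves time slots under a lock and sleeps outside it, so worker threads do not block each other while waiting.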

Datasets

| Dataset | Provider support | Columns |
| --- | --- | --- |
| Dataset.OHLCV | yahoo, csv | date (int32), open / high / low / close (float32), volume (int64), is_adjusted (bool), symbol |
| Dataset.SHARES | yahoo | date (int32), shares (int64), symbol |
| Dataset.DIVIDENDS | yahoo | date (int32), dividend (float32), symbol |

OHLCV is returned as a tidy stacked frame: each trading day appears twice (is_adjusted=True and is_adjusted=False). Filter downstream (.filter(pl.col("is_adjusted"))) to pick a variant. Adjusted Open/High/Low are derived locally from the Adj Close / Close ratio — verified to match yfinance's auto_adjust=True output exactly while halving network calls.
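
The ratio-based adjustment described above can be worked through on a single bar. The sketch below is illustrative only: Bar and adjust_bar are hypothetical names, the prices are made up, and the real library works on Polars frames rather than dataclasses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float
    adj_close: float


def adjust_bar(bar: Bar) -> Bar:
    """Scale open/high/low by adj_close / close; close becomes adj_close."""
    ratio = bar.adj_close / bar.close
    return replace(
        bar,
        open=bar.open * ratio,
        high=bar.high * ratio,
        low=bar.low * ratio,
        close=bar.adj_close,
    )


# Made-up prices with a ratio of exactly 0.5, as after a 2-for-1 split.
raw = Bar(open=98.0, high=102.0, low=96.0, close=100.0, adj_close=50.0)
adjusted = adjust_bar(raw)
print(adjusted)
```

Because both variants come from the same raw bar plus its Adj Close, a single download is enough to build the is_adjusted=True and is_adjusted=False rows.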

Data on disk

| Property | Detail |
| --- | --- |
| Date column | int32 YYYYMMDD (e.g. 20240101); parse_dates=True converts it to pl.Date |
| OHLC columns | float32 |
| Volume column | int64 |
| Shares column | int64 |
| Dividend column | float32 |
| Parquet path | {save_path}/{provider}/{dataset}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq |
| JSON sidecar | Same path with a .json extension; records row count, date range, and per-dataset stats (OHLCV also records has_adjusted/has_raw and missing trading days) |
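
The layout above can be sketched with the standard library alone. All helper names here (to_yyyymmdd, from_yyyymmdd, slice_path, write_json_atomic) are hypothetical, as are the lowercase "ohlcv" directory name and the sidecar keys. The temp-file-then-rename pattern is one common way to get atomic writes and is not necessarily how marketgoblin implements them.

```python
import json
import os
import tempfile
from datetime import date
from pathlib import Path


def to_yyyymmdd(d: date) -> int:
    """Encode a date as an integer, e.g. 2024-01-01 becomes 20240101."""
    return d.year * 10_000 + d.month * 100 + d.day


def from_yyyymmdd(value: int) -> date:
    """Decode an integer YYYYMMDD back into a date."""
    return date(value // 10_000, value // 100 % 100, value % 100)


def slice_path(save_path: str, provider: str, dataset: str, symbol: str, year: int, month: int) -> Path:
    """Build {save_path}/{provider}/{dataset}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq."""
    return Path(save_path) / provider / dataset / symbol / f"{symbol}_{year:04d}-{month:02d}.pq"


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # do not leave a half-written temp file behind
        raise


# Usage: write an illustrative sidecar next to where the slice would live.
with tempfile.TemporaryDirectory() as root:
    pq = slice_path(root, "yahoo", "ohlcv", "AAPL", 2024, 1)
    sidecar = pq.with_suffix(".json")
    write_json_atomic(sidecar, {"rows": 42, "start": to_yyyymmdd(date(2024, 1, 2))})
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
print(loaded)
```

Readers never see a partially written file with this pattern: the target path holds either the old content or the complete new content.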

Adding a Provider

import polars as pl

from marketgoblin import Dataset
from marketgoblin.sources.base import BaseSource, Fetcher

class MySource(BaseSource):
    name = "mysource"

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame with an is_adjusted column

Per-dataset fetchers all share the (symbol, start, end) signature — there is no adjusted toggle, since OHLCV variants are stacked into a single frame distinguished by the is_adjusted column.

Then register it in goblin.py:

_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}
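
For intuition, the dispatch-and-registry pattern can be reduced to a standalone sketch that runs without marketgoblin installed. Everything here is a stand-in: the Dataset enum values, SketchSource, the SOURCES dict, and the choice of ValueError for unsupported datasets are assumptions, since the README does not state which exception the library raises.

```python
from enum import Enum
from typing import Callable


class Dataset(Enum):  # stand-in for marketgoblin.Dataset; values are assumed
    OHLCV = "ohlcv"
    SHARES = "shares"
    DIVIDENDS = "dividends"


Fetcher = Callable[[str, str, str], list]  # (symbol, start, end) -> rows


class SketchSource:
    name = "sketch"

    def __init__(self) -> None:
        # The dispatch table is built once and consulted on every fetch.
        self._dispatch = self._build_dispatch()

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    @property
    def supported_datasets(self) -> frozenset[Dataset]:
        return frozenset(self._dispatch)

    def fetch(self, symbol: str, start: str, end: str, dataset: Dataset = Dataset.OHLCV) -> list:
        try:
            fetcher = self._dispatch[dataset]
        except KeyError:
            # Unsupported (provider, dataset) pairs fail before any I/O.
            raise ValueError(f"{self.name!r} does not support {dataset.name}") from None
        return fetcher(symbol, start, end)

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> list:
        return [(symbol, start, end)]  # placeholder rows instead of a LazyFrame


# A registry maps provider names to source classes, as _SOURCES does.
SOURCES = {"sketch": SketchSource}
source = SOURCES["sketch"]()
print(source.supported_datasets)
```

Keeping the supported datasets in one dictionary means the supported_datasets property and the error path cannot drift apart.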

A CSVSource is included out of the box for loading local CSV files (CSVs hold a single variant — pass is_adjusted=... to stamp the flag on every row):

goblin = MarketGoblin(provider="csv", data_dir="./csv_files", is_adjusted=True)
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")

Running Tests

pytest
pytest --cov=marketgoblin   # with coverage

License

MIT © Antônio Salomão
