Market data platform for downloading and storing financial OHLCV data

marketgoblin

Download, store, and load financial OHLCV data — fast and without fuss.

marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches OHLCV data, slices it into monthly Parquet files, writes JSON sidecars with metadata, and lets you load it back with a single call.
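
The monthly slicing described above can be pictured with a small standard-library sketch. It is illustrative only and not marketgoblin's actual code: the helper name month_slices is hypothetical, and it assumes ISO-formatted (YYYY-MM-DD) inclusive date bounds like those used in the Quick Start.

```python
from datetime import date, timedelta


def month_slices(start: str, end: str) -> list[tuple[str, date, date]]:
    """Split an inclusive ISO date range into (label, first_day, last_day) pieces, one per month."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if first > last:
        raise ValueError(f"start {start} is after end {end}")

    slices = []
    cursor = first
    while cursor <= last:
        # First day of the following month, rolling over the year in December.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # A slice ends at month end or at the requested end date, whichever comes first.
        slice_end = min(last, next_month - timedelta(days=1))
        slices.append((cursor.strftime("%Y-%m"), cursor, slice_end))
        cursor = next_month
    return slices


for label, lo, hi in month_slices("2024-01-15", "2024-03-31"):
    print(label, lo, hi)
```

Each label has the same YYYY-MM shape as the suffix in the Parquet file names, so one slice would correspond to one file on disk.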


Features

  • Single-symbol and batch fetch — fetch() and fetch_many() with thread-pool concurrency
  • Disk persistence — monthly .pq slices with atomic writes; JSON sidecar per slice
  • Lazy evaluation — all data paths return pl.LazyFrame (Polars)
  • Date flexibility — dates stored as int32 YYYYMMDD on disk; use parse_dates=True to get pl.Date
  • Retry logic — YahooSource retries transient failures with exponential backoff (3 attempts)
  • Rate limiting — fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
  • Input validation — dates are validated for format and ordering before any I/O
  • Pluggable providers — subclass BaseSource and register in one line; CSVSource included
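
As a rough illustration of the retry behaviour listed above, here is a minimal exponential-backoff sketch. It is not YahooSource's implementation: the function name retry_with_backoff, the base delay of 0.5 seconds, and the choice of exception types are assumptions made for the example. Only the three-attempt limit comes from the feature list.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: a function that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the demo instant and makes the delay schedule easy to inspect.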

Installation

pip install marketgoblin

Or with uv:

uv add marketgoblin

For development:

git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev

Quick Start

from marketgoblin import MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Load back from disk
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")

Run the full walkthrough:

python example.py

API

MarketGoblin

MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None)
Method                                                                                                     Description
fetch(symbol, start, end, adjusted=True, parse_dates=False)                                                Download, save to disk (if save_path is set), and return a LazyFrame
load(symbol, start, end, adjusted=True, parse_dates=False)                                                 Load from disk; raises RuntimeError if save_path is not set
fetch_many(symbols, start, end, adjusted=True, parse_dates=False, max_workers=8, requests_per_second=2.0)  Batch fetch via ThreadPoolExecutor, rate-limited
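
To make the fetch_many() row more concrete, the sketch below shows one way a rate-limited, failure-tolerant batch could be built on ThreadPoolExecutor. It is an assumption-laden illustration, not the library's code: RateLimiter, fetch_many_sketch, and fake_fetch are hypothetical names, and the limiter simply spaces out request start times.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("batch_sketch")


class RateLimiter:
    """Space out calls so that at most `per_second` of them start each second."""

    def __init__(self, per_second: float):
        self._interval = 1.0 / per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        time.sleep(max(0.0, slot - now))


def fetch_many_sketch(symbols, fetch_one, max_workers=8, requests_per_second=2.0):
    """Fetch each symbol in a thread pool; log failures and keep the successes."""
    limiter = RateLimiter(requests_per_second)

    def task(symbol):
        limiter.wait()
        return fetch_one(symbol)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception:
                logger.exception("fetch failed for %s", symbol)
    return results


def fake_fetch(symbol):
    # Stand-in for a real download; "BAD" simulates a failing symbol.
    if symbol == "BAD":
        raise ValueError("unknown symbol")
    return f"{symbol}-data"


results = fetch_many_sketch(["AAPL", "BAD", "MSFT"], fake_fetch, requests_per_second=50)
print(sorted(results))
```

The failing symbol is logged and left out of the returned dict, which matches the behaviour the Quick Start describes for batch fetches.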

Data on disk

Property       Detail
Date column    int32 YYYYMMDD (e.g. 20240101); parse_dates=True converts it to pl.Date
OHLC columns   float32
Volume column  int64
Parquet path   {save_path}/{provider}/ohlcv/{adjusted|raw}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq
JSON sidecar   Same path with a .json extension; holds row count, date range, OHLCV stats, and missing trading days
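
The layout above can be reproduced with a few lines of standard-library Python, which may help when inspecting the data directory by hand. The helpers slice_path, int_to_date, and date_to_int are hypothetical names for this sketch and are not part of marketgoblin's API. The symbol is used as given, so it is assumed to be already upper-cased.

```python
from datetime import date
from pathlib import Path


def slice_path(save_path, provider, symbol, year, month, adjusted=True, suffix=".pq"):
    """Build {save_path}/{provider}/ohlcv/{adjusted|raw}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq."""
    kind = "adjusted" if adjusted else "raw"
    name = f"{symbol}_{year:04d}-{month:02d}{suffix}"
    return Path(save_path) / provider / "ohlcv" / kind / symbol / name


def int_to_date(yyyymmdd: int) -> date:
    """Decode an integer such as 20240101 into a date."""
    year, rest = divmod(yyyymmdd, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def date_to_int(d: date) -> int:
    """Encode a date as a YYYYMMDD integer."""
    return d.year * 10_000 + d.month * 100 + d.day


parquet = slice_path("./data", "yahoo", "AAPL", 2024, 1)
sidecar = parquet.with_suffix(".json")  # the sidecar shares the path, with a .json extension
print(parquet.as_posix())
print(sidecar.name)
print(int_to_date(20240101), date_to_int(date(2024, 3, 31)))
```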

Adding a Provider

from marketgoblin.sources.base import BaseSource
import polars as pl

class MySource(BaseSource):
    name = "mysource"

    def fetch(self, symbol, start, end, adjusted=True) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame

Then register it in goblin.py:

_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}

A CSVSource is included out of the box for loading local CSV files:

goblin = MarketGoblin(provider="csv", data_dir="./csv_files")
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")

Running Tests

pytest
pytest --cov=marketgoblin   # with coverage

License

MIT © Antônio Salomão
