Market data platform for downloading and storing financial OHLCV data

marketgoblin

Download, store, and load financial OHLCV data — fast and without fuss.

marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches OHLCV data, slices it into monthly Parquet files, writes JSON sidecars with metadata, and lets you load it back with a single call.


Features

  • Single-symbol and batch fetch: fetch() and fetch_many() with thread-pool concurrency
  • Disk persistence: monthly .pq slices with atomic writes; JSON sidecar per slice
  • Lazy evaluation: all data paths return pl.LazyFrame (Polars)
  • Date flexibility: dates stored as int32 YYYYMMDD on disk; use parse_dates=True to get pl.Date
  • Retry logic: YahooSource retries transient failures with exponential backoff (3 attempts)
  • Rate limiting: fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
  • Input validation: dates are validated for format and ordering before any I/O
  • Pluggable providers: subclass BaseSource and register in one line; CSVSource included
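
To make the retry feature concrete, here is a minimal sketch of exponential backoff over three attempts. It is not marketgoblin's actual code: retry_with_backoff, base_delay, and the sleep parameter are hypothetical names chosen for illustration, and the sketch retries on every exception, whereas a real source would likely retry only on transient network errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,        # the feature list mentions 3 attempts
    base_delay: float = 0.5,  # hypothetical starting delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting base_delay * 2**n between failed attempts."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)  # 0.5 s, then 1.0 s, ...
    raise ValueError("attempts must be at least 1")
```

Passing sleep as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.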

Installation

pip install marketgoblin

Or with uv:

uv add marketgoblin

For development:

git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev

Quick Start

from marketgoblin import MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Load back from disk
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")
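
The start and end arguments above are date strings in YYYY-MM-DD form, and the feature list states that dates are validated for format and ordering before any I/O. A minimal sketch of such a check follows. validate_range is a hypothetical helper, not part of marketgoblin's API, and treating start == end as valid is an assumption.

```python
from datetime import date, datetime


def validate_range(start: str, end: str) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and check that start is not after end."""
    try:
        start_d = datetime.strptime(start, "%Y-%m-%d").date()
        end_d = datetime.strptime(end, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"dates must look like YYYY-MM-DD: {exc}") from exc
    if start_d > end_d:
        raise ValueError(f"start {start} is after end {end}")
    return start_d, end_d


# A well-formed, correctly ordered range parses into two date objects.
print(validate_range("2024-01-01", "2024-03-31"))
```

Running a check like this before touching the network or the disk means a typo in a date fails immediately instead of after a partial download.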

Run the full walkthrough:

python example.py

API

MarketGoblin

MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None)
Method | Description
fetch(symbol, start, end, adjusted=True, parse_dates=False) | Download, save to disk (if save_path is set), return a LazyFrame
load(symbol, start, end, adjusted=True, parse_dates=False) | Load from disk; raises RuntimeError if no save_path is set
fetch_many(symbols, start, end, adjusted=True, parse_dates=False, max_workers=8, requests_per_second=2.0) | Batch fetch via ThreadPoolExecutor, rate-limited
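
The combination of a ThreadPoolExecutor and a requests-per-second cap in fetch_many can be sketched with the standard library alone. This is an illustration under assumptions, not the library's implementation: RateLimiter, fetch_all, and fetch_one are hypothetical names, and the sketch assumes that failed symbols are simply left out of the returned dict.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


class RateLimiter:
    """Hand out time slots so that at most `rate` calls start per second."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        with self._lock:  # reserve the next free slot atomically
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        time.sleep(max(0.0, slot - now))  # sleep outside the lock


def fetch_all(
    fetch_one: Callable[[str], object],
    symbols: list[str],
    max_workers: int = 8,
    requests_per_second: float = 2.0,
) -> dict[str, object]:
    """Run fetch_one per symbol in a thread pool; skip symbols that raise."""
    limiter = RateLimiter(requests_per_second)

    def task(symbol: str) -> object:
        limiter.wait()  # block until this request's slot arrives
        return fetch_one(symbol)

    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:
                # Stand-in for logging: one failure never aborts the batch.
                print(f"{symbol} failed: {exc}")
    return results
```

With the defaults shown in the table, eight workers share a single limiter, so the pool can overlap slow responses while new requests still start no faster than two per second.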

Data on disk

Property       Detail
Date column    int32 YYYYMMDD (e.g. 20240101); parse_dates=True returns pl.Date
OHLC columns   float32
Volume column  int64
Parquet path   {save_path}/{provider}/ohlcv/{adjusted|raw}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq
JSON sidecar   Same path with a .json extension: row count, date range, OHLCV stats, missing trading days
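
The layout above can be reproduced with a few lines of standard-library Python, which is handy when inspecting the data directory by hand. The helpers below (to_yyyymmdd, from_yyyymmdd, slice_path) are hypothetical names for illustration and are not part of marketgoblin's API; they only follow the path template and date encoding described in the table.

```python
from datetime import date
from pathlib import Path


def to_yyyymmdd(d: date) -> int:
    """Encode a date as a YYYYMMDD integer."""
    return d.year * 10000 + d.month * 100 + d.day


def from_yyyymmdd(n: int) -> date:
    """Decode a YYYYMMDD integer back into a date."""
    return date(n // 10000, n // 100 % 100, n % 100)


def slice_path(save_path: str, provider: str, symbol: str,
               year: int, month: int, adjusted: bool = True) -> Path:
    """Build the Parquet path of one monthly slice from the template above."""
    kind = "adjusted" if adjusted else "raw"
    name = f"{symbol}_{year:04d}-{month:02d}.pq"
    return Path(save_path) / provider / "ohlcv" / kind / symbol / name


print(to_yyyymmdd(date(2024, 1, 1)))   # 20240101
print(from_yyyymmdd(20240331))         # 2024-03-31
pq = slice_path("./data", "yahoo", "AAPL", 2024, 1)
print(pq.as_posix())                   # data/yahoo/ohlcv/adjusted/AAPL/AAPL_2024-01.pq
print(pq.with_suffix(".json").name)    # AAPL_2024-01.json (the sidecar)
```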

Adding a Provider

from marketgoblin.sources.base import BaseSource
import polars as pl

class MySource(BaseSource):
    name = "mysource"

    def fetch(self, symbol, start, end, adjusted=True) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame

Then register it in goblin.py:

_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}

A CSVSource is included out of the box for loading local CSV files:

goblin = MarketGoblin(provider="csv", data_dir="./csv_files")
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")

Running Tests

pytest
pytest --cov=marketgoblin   # with coverage

License

MIT © Antônio Salomão
