Market data platform for downloading and storing financial OHLCV data

marketgoblin

Download, store, and load financial OHLCV data — fast and without fuss.

marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches OHLCV data, slices it into monthly Parquet files, writes JSON sidecars with metadata, and lets you load it back with a single call.


Features

  • Single-symbol and batch fetch: fetch() and fetch_many() with thread-pool concurrency
  • Disk persistence: monthly .pq slices with atomic writes; JSON sidecar per slice
  • Lazy evaluation: all data paths return pl.LazyFrame (Polars)
  • Date flexibility: dates stored as int32 YYYYMMDD on disk; use parse_dates=True to get pl.Date
  • Retry logic: YahooSource retries transient failures with exponential backoff (3 attempts)
  • Rate limiting: fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
  • Input validation: dates are validated for format and ordering before any I/O
  • Pluggable providers: subclass BaseSource and register in one line; CSVSource included
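
The retry behaviour in the list above can be illustrated with a small, library-independent sketch. This is not YahooSource's actual code: the helper name retry_with_backoff, the base_delay value, and the injectable sleep parameter are assumptions made for illustration. Only the three-attempt count comes from the feature list.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,           # matches the "3 attempts" in the feature list
    base_delay: float = 0.5,     # hypothetical starting delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Passing sleep as a parameter keeps the helper testable without real waiting. A production version would likely catch only the exception types that signal a transient failure.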

Installation

pip install marketgoblin

Or with uv:

uv add marketgoblin

For development:

git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev

Quick Start

from marketgoblin import MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Load back from disk
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")
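
The "logged, never crash the batch" behaviour can be pictured with the sketch below. It is an illustration under assumptions, not marketgoblin's implementation: fetch_all and fetch_one are hypothetical names, and fetch_one stands in for whatever per-symbol download function is used.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


def fetch_all(
    symbols: Iterable[str],
    fetch_one: Callable[[str], Any],  # hypothetical per-symbol fetch function
    max_workers: int = 8,
) -> dict[str, Any]:
    """Fetch every symbol in a thread pool; failures are logged and skipped."""
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its symbol so errors can be attributed.
        futures = {pool.submit(fetch_one, symbol): symbol for symbol in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception:
                # One bad symbol must not abort the rest of the batch.
                logger.exception("fetch failed for %s", symbol)
    return results
```

With this shape, the returned dictionary contains only the symbols that succeeded, so a caller can compare its keys against the requested symbols to see what failed.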

Run the full walkthrough:

python example.py

API

MarketGoblin

MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None)
Methods:

  • fetch(symbol, start, end, adjusted=True, parse_dates=False): download, save to disk (if save_path is set), and return a LazyFrame
  • load(symbol, start, end, adjusted=True, parse_dates=False): load from disk; raises RuntimeError if save_path is not set
  • fetch_many(symbols, start, end, adjusted=True, parse_dates=False, max_workers=8, requests_per_second=2.0): batch fetch via ThreadPoolExecutor, rate-limited
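
To make the requests_per_second parameter concrete, here is a minimal sketch of a thread-safe limiter that spaces calls evenly. The RateLimiter class, its acquire() method, and the injectable clock and sleep arguments are assumptions for illustration; the library's internal mechanism may differ.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Hand out evenly spaced time slots shared by all worker threads."""

    def __init__(
        self,
        requests_per_second: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0  # earliest time the next request may start

    def acquire(self) -> None:
        """Block until the caller's reserved slot arrives."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            # Reserve the slot while holding the lock, then release it
            # so other threads can queue up behind this one.
            self._next_slot = slot + self._interval
        wait = slot - now
        if wait > 0:
            self._sleep(wait)
```

Each worker would call acquire() immediately before issuing its request. Sleeping outside the lock lets other threads reserve later slots instead of waiting for the lock itself.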

Data on disk

  • Date column: int32 YYYYMMDD (e.g. 20240101); parse_dates=True converts it to pl.Date
  • OHLC columns: float32
  • Volume column: int64
  • Parquet path: {save_path}/{provider}/ohlcv/{adjusted|raw}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq
  • JSON sidecar: same path with a .json extension; holds row count, date range, OHLCV stats, and missing trading days
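
The layout above can be explored with two small helpers. Both are hypothetical and written only from the path template and date encoding described here; they are not part of marketgoblin's API. decode_yyyymmdd turns a stored integer back into a date, and slice_paths lists the monthly Parquet files a date range would touch.

```python
from datetime import date
from pathlib import Path


def decode_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20240101 into a datetime.date."""
    year, rest = divmod(value, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def slice_paths(
    save_path: str,
    provider: str,
    symbol: str,
    start: date,
    end: date,
    adjusted: bool = True,
) -> list[Path]:
    """List one .pq path per calendar month between start and end, inclusive."""
    kind = "adjusted" if adjusted else "raw"
    base = Path(save_path) / provider / "ohlcv" / kind / symbol
    paths: list[Path] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        paths.append(base / f"{symbol}_{year:04d}-{month:02d}.pq")
        # Step to the next month, rolling over into the next year.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return paths
```

For the Quick Start range (2024-01-01 to 2024-03-31) this yields three paths, one per month. The sketch uses the symbol exactly as given; whether the library normalizes case is not stated here.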

Adding a Provider

from marketgoblin.sources.base import BaseSource
import polars as pl

class MySource(BaseSource):
    name = "mysource"

    def fetch(self, symbol, start, end, adjusted=True) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame

Then register it in goblin.py:

_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}
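
The registry pattern behind this step can be sketched in isolation. Everything in the block is a stand-in: the BaseSource class here is a simplified local substitute for marketgoblin.sources.base.BaseSource, make_source is a hypothetical lookup helper, and fetch returns a plain list where a real provider would return a normalized LazyFrame.

```python
from abc import ABC, abstractmethod


class BaseSource(ABC):
    """Simplified stand-in for the library's BaseSource."""

    name: str

    @abstractmethod
    def fetch(self, symbol, start, end, adjusted=True):
        ...


class MySource(BaseSource):
    name = "mysource"

    def fetch(self, symbol, start, end, adjusted=True):
        # Placeholder payload; a real provider would build a LazyFrame here.
        return [(symbol, start, end, adjusted)]


# Provider name -> source class, mirroring the _SOURCES dict shown above.
_SOURCES: dict[str, type[BaseSource]] = {MySource.name: MySource}


def make_source(provider: str) -> BaseSource:
    """Instantiate the source registered under provider."""
    source_cls = _SOURCES.get(provider)
    if source_cls is None:
        known = ", ".join(sorted(_SOURCES))
        raise ValueError(f"unknown provider {provider!r}; known: {known}")
    return source_cls()
```

Keying the dictionary on the class's name attribute keeps the registered string and the class definition from drifting apart.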

A CSVSource is included out of the box for loading local CSV files:

goblin = MarketGoblin(provider="csv", data_dir="./csv_files")
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")

Running Tests

pytest
pytest --cov=marketgoblin   # with coverage

License

MIT © Antônio Salomão
