Market data platform for downloading and storing financial OHLCV data
marketgoblin
Download, store, and load financial market data — fast and without fuss.
marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches multiple datasets (OHLCV, shares-outstanding, dividends), slices them into monthly Parquet files, writes JSON sidecars with metadata, and lets you load them back with a single call.
Features
- Multi-dataset: OHLCV, shares outstanding, and dividends, selected via the Dataset enum; per-source dispatch makes it easy to add more
- Tidy stacked OHLCV: adjusted and raw prices live in one frame, distinguished by an is_adjusted bool column; one network call per symbol covers both
- Single-symbol and batch fetch: fetch() and fetch_many() with thread-pool concurrency
- Disk persistence: monthly .pq slices with atomic writes; one JSON sidecar per slice
- Lazy evaluation: all data paths return a Polars pl.LazyFrame
- Date flexibility: dates are stored on disk as int32 YYYYMMDD; use parse_dates=True to get pl.Date
- Retry logic: YahooSource retries transient failures with exponential backoff (3 attempts)
- Rate limiting: fetch_many() respects a configurable requests-per-second cap (default: 2 req/s)
- Input validation: dates are validated before any I/O; unsupported (provider, dataset) pairs raise at the dispatch boundary
- Pluggable providers: subclass BaseSource, implement _build_dispatch(), and register it in one line; CSVSource is included
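The retry behaviour listed above can be pictured with a small, generic helper. This is a sketch under stated assumptions, not YahooSource's actual code. The helper name with_retries, the 0.5 s base delay, and the choice to treat ConnectionError and TimeoutError as "transient" are all illustrative. The sleep function is injectable so the backoff schedule can be observed without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    transient: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")


# Usage: a function that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary glitch")
    return "ok"


delays: list[float] = []
print(with_retries(flaky, sleep=delays.append))  # ok
print(delays)  # [0.5, 1.0]
```

Non-transient exceptions propagate immediately. Only the listed types are retried, so a bad symbol or a parsing bug fails fast instead of burning all three attempts.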
Installation
```shell
pip install marketgoblin
```
Or with uv:
```shell
uv add marketgoblin
```
For development:
```shell
git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev
```
Quick Start
```python
import polars as pl
from marketgoblin import Dataset, MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")

# Fetch and persist OHLCV as a tidy stacked frame: each trading day appears
# twice (is_adjusted=True / False). Filter to pick a variant.
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
adjusted = lf.filter(pl.col("is_adjusted")).collect()
print(adjusted)

# Load back from disk (no network call)
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())

# Shares outstanding: a sparse, corporate-action-driven series
shares = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.SHARES, parse_dates=True)
print(shares.collect())

# Dividends: event-driven (typically quarterly)
dividends = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", dataset=Dataset.DIVIDENDS, parse_dates=True)
print(dividends.collect())

# Batch fetch: failed symbols are logged and never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")
```
Run the full walkthrough:
```shell
python example.py
```
API
MarketGoblin
```python
MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None, **source_kwargs)
```
| Method | Description |
|---|---|
| fetch(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Download, save to disk (if save_path is set), and return a LazyFrame |
| load(symbol, start, end, dataset=Dataset.OHLCV, parse_dates=False) | Load from disk; raises RuntimeError if no save_path is set |
| fetch_many(symbols, start, end, dataset=Dataset.OHLCV, parse_dates=False, max_workers=8, requests_per_second=2.0) | Batch fetch via ThreadPoolExecutor, rate-limited |
| supported_datasets (property) | frozenset[Dataset] of the datasets the configured provider supports |
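The batch semantics of fetch_many (a thread pool capped by max_workers, a requests-per-second limit, and failures that are logged rather than raised) can be sketched with the standard library alone. This is an illustration, not marketgoblin's implementation. RateLimiter, fetch_many_sketch, and the fetch_one callback are hypothetical names. The limiter here spaces out request *starts* at 1/rate intervals, which is one common way to enforce such a cap.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("batch_sketch")


class RateLimiter:
    """Spaces out call starts so at most `rate` begin per second (hypothetical)."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next = time.monotonic()

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next)  # reserve the next free slot
            self._next = start + self._interval
        # Sleep outside the lock so other threads can reserve their own slots.
        time.sleep(max(0.0, start - now))


def fetch_many_sketch(
    symbols: list[str],
    fetch_one: Callable[[str], T],
    max_workers: int = 8,
    requests_per_second: float = 2.0,
) -> dict[str, T]:
    limiter = RateLimiter(requests_per_second)

    def task(symbol: str) -> T:
        limiter.wait()
        return fetch_one(symbol)

    results: dict[str, T] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {symbol: pool.submit(task, symbol) for symbol in symbols}
        for symbol, future in futures.items():
            try:
                results[symbol] = future.result()
            except Exception:
                # Log and skip: one bad symbol must not sink the whole batch.
                log.exception("fetch failed for %s", symbol)
    return results


def fake_fetch(symbol: str) -> int:
    if symbol == "BAD":
        raise ValueError("unknown symbol")
    return len(symbol)


out = fetch_many_sketch(["AAPL", "BAD", "MSFT"], fake_fetch, requests_per_second=50.0)
print(out)  # {'AAPL': 4, 'MSFT': 4}
```

Because failed symbols are simply absent from the returned dict, callers can compare its keys against the requested list to find what needs a retry.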
Datasets
| Dataset | Provider support | Columns |
|---|---|---|
| Dataset.OHLCV | yahoo, csv | date (int32), open / high / low / close (float32), volume (int64), is_adjusted (bool), symbol |
| Dataset.SHARES | yahoo | date (int32), shares (int64), symbol |
| Dataset.DIVIDENDS | yahoo | date (int32), dividend (float32), symbol |
OHLCV is returned as a tidy stacked frame: each trading day appears twice (is_adjusted=True and is_adjusted=False). Filter downstream (.filter(pl.col("is_adjusted"))) to pick a variant. Adjusted Open/High/Low are derived locally by scaling the raw values by the Adj Close / Close ratio. This has been verified to match yfinance's auto_adjust=True output exactly, and it halves network calls because one download yields both variants.
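The ratio-based adjustment described above comes down to a few multiplications per row. The sketch below uses plain Python rather than Polars so that it runs anywhere. It assumes input rows keyed like yfinance's download columns ("Open", "Close", "Adj Close", ...), and the Bar class and stack_variants function are hypothetical names. Volume is passed through unscaled here; that is an assumption of the sketch. The values are made up.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    date: int  # YYYYMMDD integer, as stored on disk
    open: float
    high: float
    low: float
    close: float
    volume: int
    is_adjusted: bool


def stack_variants(rows: list[dict]) -> list[Bar]:
    """Emit a raw Bar and an adjusted Bar per day from a single download."""
    out: list[Bar] = []
    for r in rows:
        # One factor per day: how much later splits/dividends scale this day's prices.
        factor = r["Adj Close"] / r["Close"]
        out.append(Bar(r["date"], r["Open"], r["High"], r["Low"], r["Close"], r["Volume"], False))
        out.append(
            Bar(
                r["date"],
                r["Open"] * factor,
                r["High"] * factor,
                r["Low"] * factor,
                r["Adj Close"],
                r["Volume"],  # left unscaled in this sketch (an assumption)
                True,
            )
        )
    return out


# Made-up row where later corporate actions halve historical prices.
rows = [{"date": 20240102, "Open": 100.0, "High": 110.0, "Low": 90.0,
         "Close": 104.0, "Adj Close": 52.0, "Volume": 1000}]
bars = stack_variants(rows)
adjusted = [b for b in bars if b.is_adjusted]  # like .filter(pl.col("is_adjusted"))
print(adjusted[0].open, adjusted[0].high, adjusted[0].low, adjusted[0].close)  # 50.0 55.0 45.0 52.0
```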
Data on disk
| Property | Detail |
|---|---|
| Date column | int32 YYYYMMDD (e.g. 20240101); parse_dates=True → pl.Date |
| OHLC columns | float32 |
| Volume column | int64 |
| Shares column | int64 |
| Dividend column | float32 |
| Parquet path | {save_path}/{provider}/{dataset}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq |
| JSON sidecar | Same path, .json extension — row count, date range, per-dataset stats (OHLCV also records has_adjusted/has_raw and missing trading days) |
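The on-disk conventions in the table above can be reproduced with the standard library. The sketch below shows the YYYYMMDD encoding, the monthly slice path, and an atomic write (write to a temp file in the same directory, then os.replace) applied to a JSON sidecar. It is not marketgoblin's code. The function names, the "ohlcv" directory name, and the minimal sidecar keys (rows, start, end) are assumptions. Writing the Parquet slice itself would need Polars, so it is omitted.

```python
import json
import os
import tempfile
from datetime import date
from pathlib import Path


def to_yyyymmdd(d: date) -> int:
    """Encode a date as the int32-style YYYYMMDD integer used on disk."""
    return d.year * 10_000 + d.month * 100 + d.day


def from_yyyymmdd(n: int) -> date:
    """Decode YYYYMMDD back into a date (what parse_dates=True yields, conceptually)."""
    return date(n // 10_000, n // 100 % 100, n % 100)


def slice_path(root: Path, provider: str, dataset: str, symbol: str, year: int, month: int) -> Path:
    """Build {save_path}/{provider}/{dataset}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq."""
    sym = symbol.upper()
    return root / provider / dataset / sym / f"{sym}_{year:04d}-{month:02d}.pq"


def write_atomic(path: Path, data: bytes) -> None:
    """Write to a temp file in the target directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp, path)  # atomic on the same filesystem: readers never see a partial file
    except BaseException:
        os.unlink(tmp)
        raise


def write_sidecar(pq_path: Path, dates: list[int]) -> Path:
    """Write a minimal JSON sidecar next to a slice (hypothetical schema)."""
    meta = {"rows": len(dates), "start": min(dates), "end": max(dates)}
    sidecar = pq_path.with_suffix(".json")
    write_atomic(sidecar, json.dumps(meta).encode())
    return sidecar


root = Path(tempfile.mkdtemp())
d = date(2024, 1, 2)
print(to_yyyymmdd(d))  # 20240102
pq = slice_path(root, "yahoo", "ohlcv", "aapl", d.year, d.month)
print(pq.relative_to(root).as_posix())  # yahoo/ohlcv/AAPL/AAPL_2024-01.pq
side = write_sidecar(pq, [20240102, 20240131])
print(side.read_text())  # {"rows": 2, "start": 20240102, "end": 20240131}
```

Integer YYYYMMDD dates sort and compare correctly as plain ints, which keeps range filters cheap before any date parsing happens.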
Adding a Provider
```python
import polars as pl
from marketgoblin import Dataset
from marketgoblin.sources.base import BaseSource, Fetcher


class MySource(BaseSource):
    name = "mysource"

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame with an is_adjusted column
```
Per-dataset fetchers all share the (symbol, start, end) signature — there is no adjusted toggle, since OHLCV variants are stacked into a single frame distinguished by the is_adjusted column.
Then register it in goblin.py:
```python
_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}
```
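To show why unsupported (provider, dataset) pairs fail at the dispatch boundary, here is a self-contained stand-in for the BaseSource pattern. It does not import marketgoblin. The Dataset enum values, SketchBaseSource, ToySource, and the choice of ValueError are all assumptions made for illustration, and the real BaseSource may differ. It demonstrates only the pattern: a subclass returns a dataset-to-fetcher mapping, and lookups for datasets missing from it raise before any I/O.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable


class Dataset(Enum):  # stand-in for marketgoblin.Dataset (values assumed)
    OHLCV = "ohlcv"
    SHARES = "shares"
    DIVIDENDS = "dividends"


# Every per-dataset fetcher shares the (symbol, start, end) signature.
Fetcher = Callable[[str, str, str], list[dict]]


class SketchBaseSource(ABC):
    name: str = "base"

    def __init__(self) -> None:
        self._dispatch = self._build_dispatch()

    @abstractmethod
    def _build_dispatch(self) -> dict[Dataset, Fetcher]: ...

    @property
    def supported_datasets(self) -> frozenset[Dataset]:
        return frozenset(self._dispatch)

    def fetch(self, symbol: str, start: str, end: str, dataset: Dataset = Dataset.OHLCV) -> list[dict]:
        try:
            fetcher = self._dispatch[dataset]
        except KeyError:
            # Unsupported (provider, dataset) pairs fail here, before any I/O.
            raise ValueError(f"{self.name} does not support {dataset.name}") from None
        return fetcher(symbol, start, end)


class ToySource(SketchBaseSource):
    name = "toy"

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> list[dict]:
        # Canned row; a real source would download and normalize here.
        return [{"date": 20240102, "close": 1.0, "is_adjusted": True, "symbol": symbol}]


src = ToySource()
print(sorted(d.name for d in src.supported_datasets))  # ['OHLCV']
print(src.fetch("AAPL", "2024-01-01", "2024-01-31")[0]["symbol"])  # AAPL
try:
    src.fetch("AAPL", "2024-01-01", "2024-01-31", dataset=Dataset.DIVIDENDS)
except ValueError as err:
    print(err)  # toy does not support DIVIDENDS
```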
A CSVSource is included out of the box for loading local CSV files (CSVs hold a single variant — pass is_adjusted=... to stamp the flag on every row):
```python
goblin = MarketGoblin(provider="csv", data_dir="./csv_files", is_adjusted=True)
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")
```
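Since a CSV file holds only one variant, the loader has to stamp the same is_adjusted value onto every row. A stdlib sketch of that idea is below. It is not CSVSource's implementation. The read_csv_variant name, the lowercase column headers, the ISO date format in the file, and the sample values are all hypothetical.

```python
import csv
import io


def read_csv_variant(text: str, symbol: str, is_adjusted: bool) -> list[dict]:
    """Parse a single-variant OHLCV CSV and stamp one is_adjusted flag on every row."""
    rows: list[dict] = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                # Assumes ISO dates like 2024-01-02; encode them as YYYYMMDD ints.
                "date": int(rec["date"].replace("-", "")),
                "open": float(rec["open"]),
                "high": float(rec["high"]),
                "low": float(rec["low"]),
                "close": float(rec["close"]),
                "volume": int(rec["volume"]),
                "is_adjusted": is_adjusted,  # same flag for the whole file
                "symbol": symbol,
            }
        )
    return rows


# Made-up sample data for a hypothetical ticker.
sample = (
    "date,open,high,low,close,volume\n"
    "2024-01-02,100.0,101.0,99.0,100.5,1000\n"
    "2024-01-03,100.5,102.0,100.0,101.5,1200\n"
)
rows = read_csv_variant(sample, "TEST", is_adjusted=True)
print(len(rows), {r["is_adjusted"] for r in rows})  # 2 {True}
```

To get both variants from CSV, you would load two files (one per flag) and concatenate the results, which yields the same stacked shape that YahooSource produces from a single download.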
Running Tests
```shell
pytest
pytest --cov=marketgoblin  # with coverage
```
License
MIT © Antônio Salomão
Download files
File details
Details for the file marketgoblin-0.2.0.tar.gz.
File metadata
- Download URL: marketgoblin-0.2.0.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63d2a775371aa28e4fe544bf5bc91fc19b0915092e54341718ae6ba521e8d186 |
| MD5 | decf63ff9d3f3c0236dde686897a6e2d |
| BLAKE2b-256 | a9bf8076f861ee48f0c7ddcbe6d5c0fe1de75ede4b763b4472ea4149701ace41 |
Provenance
The following attestation bundles were made for marketgoblin-0.2.0.tar.gz:
Publisher: publish.yml on aexsalomao/marketgoblin
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marketgoblin-0.2.0.tar.gz
- Subject digest: 63d2a775371aa28e4fe544bf5bc91fc19b0915092e54341718ae6ba521e8d186
- Sigstore transparency entry: 1342774701
- Sigstore integration time:
- Permalink: aexsalomao/marketgoblin@db29d11e54df7a5b062679d44eec6eb393fdbc85
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/aexsalomao
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@db29d11e54df7a5b062679d44eec6eb393fdbc85
- Trigger Event: release
File details
Details for the file marketgoblin-0.2.0-py3-none-any.whl.
File metadata
- Download URL: marketgoblin-0.2.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ebe6e1c77bcc4b8130e618d9ae151e96dcc25fe5091229148d39ddcacea7133 |
| MD5 | ddbf45b4593ac8c04cebdd12e91463e3 |
| BLAKE2b-256 | 3bd3644c22395549d00cbb743406029cb622c6a6463cd1aeaae2bc257ff6c80e |
Provenance
The following attestation bundles were made for marketgoblin-0.2.0-py3-none-any.whl:
Publisher: publish.yml on aexsalomao/marketgoblin
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: marketgoblin-0.2.0-py3-none-any.whl
- Subject digest: 4ebe6e1c77bcc4b8130e618d9ae151e96dcc25fe5091229148d39ddcacea7133
- Sigstore transparency entry: 1342774719
- Sigstore integration time:
- Permalink: aexsalomao/marketgoblin@db29d11e54df7a5b062679d44eec6eb393fdbc85
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/aexsalomao
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@db29d11e54df7a5b062679d44eec6eb393fdbc85
- Trigger Event: release