Market data platform for downloading and storing financial OHLCV data
marketgoblin
Download, store, and load financial OHLCV data — fast and without fuss.
marketgoblin is a lightweight market data platform built on Polars and yfinance. It fetches OHLCV data, slices it into monthly Parquet files, writes JSON sidecars with metadata, and lets you load it back with a single call.
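As an illustration of the monthly slicing idea, the sketch below lists the calendar months that a date range touches, one key per Parquet slice. The helper name `month_keys` is hypothetical and not part of marketgoblin; the library's own slicing logic may differ.

```python
from datetime import date


def month_keys(start: str, end: str) -> list[str]:
    """Return the YYYY-MM key of every calendar month touched by [start, end]."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if first > last:
        raise ValueError(f"start {start} is after end {end}")

    keys = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        keys.append(f"{year:04d}-{month:02d}")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


print(month_keys("2024-01-01", "2024-03-31"))  # ['2024-01', '2024-02', '2024-03']
```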
Features
- Single-symbol and batch fetch: `fetch()` and `fetch_many()` with thread-pool concurrency
- Disk persistence: monthly `.pq` slices with atomic writes; JSON sidecar per slice
- Lazy evaluation: all data paths return `pl.LazyFrame` (Polars)
- Date flexibility: dates stored as `int32` YYYYMMDD on disk; use `parse_dates=True` to get `pl.Date`
- Retry logic: `YahooSource` retries transient failures with exponential backoff (3 attempts)
- Rate limiting: `fetch_many()` respects a configurable requests-per-second cap (default: 2 req/s)
- Input validation: dates are validated for format and ordering before any I/O
- Pluggable providers: subclass `BaseSource` and register in one line; `CSVSource` included
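The retry behaviour described in the list can be pictured with a small, generic sketch. This is not `YahooSource`'s actual code: the function name `retry_with_backoff`, the `base_delay` value, and the choice of `ConnectionError` as the "transient" error are all assumptions made for illustration.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); after a transient failure wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            # Only ConnectionError is treated as transient here (an assumption);
            # any other exception propagates immediately.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage: a callable that fails twice and then succeeds.
state = {"calls": 0}


def flaky_download():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary network error")
    return "payload"


print(retry_with_backoff(flaky_download, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays.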
Installation
pip install marketgoblin
Or with uv:
uv add marketgoblin
For development:
git clone https://github.com/aexsalomao/marketgoblin
cd marketgoblin
uv sync --extra dev
Quick Start
from marketgoblin import MarketGoblin
goblin = MarketGoblin(provider="yahoo", save_path="./data")
# Fetch and persist
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())
# Load back from disk
lf = goblin.load("AAPL", "2024-01-01", "2024-03-31", parse_dates=True)
print(lf.collect())
# Batch fetch — failed symbols are logged, never crash the batch
results = goblin.fetch_many(["AAPL", "MSFT", "GOOGL"], "2024-01-01", "2024-03-31")
for symbol, lf in results.items():
    print(f"{symbol}: {lf.collect().height} rows")
Run the full walkthrough:
python example.py
API
MarketGoblin
MarketGoblin(provider: str, api_key: str | None = None, save_path: str | Path | None = None)
| Method | Description |
|---|---|
| `fetch(symbol, start, end, adjusted=True, parse_dates=False)` | Download, save to disk (if `save_path` set), return LazyFrame |
| `load(symbol, start, end, adjusted=True, parse_dates=False)` | Load from disk; raises `RuntimeError` if no `save_path` |
| `fetch_many(symbols, start, end, adjusted=True, parse_dates=False, max_workers=8, requests_per_second=2.0)` | Batch fetch via `ThreadPoolExecutor`, rate-limited |
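To make the batch behaviour concrete, here is a rough standard-library sketch of a rate-limited thread-pool fetch in which a failed symbol is logged and skipped. It is one possible design, not marketgoblin's implementation; `RateLimiter`, `fetch_many_sketch`, and the `fetch_one` callable are hypothetical names.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


class RateLimiter:
    """Space calls at least 1 / requests_per_second apart across threads."""

    def __init__(self, requests_per_second: float):
        self._interval = 1.0 / requests_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)  # reserve the next free slot
            self._next_slot = slot + self._interval
        # Sleep outside the lock so other threads can reserve their own slots.
        time.sleep(max(0.0, slot - now))


def fetch_many_sketch(fetch_one, symbols, max_workers=8, requests_per_second=2.0):
    """Run fetch_one(symbol) for each symbol; return {symbol: result} for successes."""
    limiter = RateLimiter(requests_per_second)

    def task(symbol):
        limiter.wait()
        return fetch_one(symbol)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception:
                # A failed symbol is logged and skipped; the batch continues.
                logger.exception("fetch failed for %s", symbol)
    return results
```

With this shape, the worker count controls how many downloads can be in flight, while the limiter alone decides how often a new request may start.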
Data on disk
| Property | Detail |
|---|---|
| Date column | int32 YYYYMMDD (e.g. 20240101); parse_dates=True → pl.Date |
| OHLC columns | float32 |
| Volume column | int64 |
| Parquet path | {save_path}/{provider}/ohlcv/{adjusted\|raw}/{SYMBOL}/{SYMBOL}_{YYYY-MM}.pq |
| JSON sidecar | Same path, .json extension — row count, date range, OHLCV stats, missing trading days |
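The layout in the table can be reproduced with a few lines of standard-library Python, which is handy when inspecting a data directory by hand. The helpers `slice_path`, `sidecar_path`, and `decode_yyyymmdd` below are hypothetical and only mirror the documented conventions; they are not functions exported by marketgoblin.

```python
from datetime import date
from pathlib import Path


def slice_path(save_path, provider, symbol, month, adjusted=True) -> Path:
    """Build the Parquet path of one monthly slice; month is 'YYYY-MM'."""
    kind = "adjusted" if adjusted else "raw"
    return Path(save_path) / provider / "ohlcv" / kind / symbol / f"{symbol}_{month}.pq"


def sidecar_path(parquet_path: Path) -> Path:
    """The JSON sidecar sits next to the slice, with a .json extension."""
    return parquet_path.with_suffix(".json")


def decode_yyyymmdd(value: int) -> date:
    """Turn an integer such as 20240101 into a date object."""
    year, rest = divmod(value, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


pq = slice_path("data", "yahoo", "AAPL", "2024-01")
print(pq.as_posix())                # data/yahoo/ohlcv/adjusted/AAPL/AAPL_2024-01.pq
print(sidecar_path(pq).name)        # AAPL_2024-01.json
print(decode_yyyymmdd(20240101))    # 2024-01-01
```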
Adding a Provider
from marketgoblin.sources.base import BaseSource
import polars as pl
class MySource(BaseSource):
    name = "mysource"

    def fetch(self, symbol, start, end, adjusted=True) -> pl.LazyFrame:
        ...  # return a normalized LazyFrame
Then register it in goblin.py:
_SOURCES = {"yahoo": YahooSource, "csv": CSVSource, "mysource": MySource}
A CSVSource is included out of the box for loading local CSV files:
goblin = MarketGoblin(provider="csv", data_dir="./csv_files")
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")
Running Tests
pytest
pytest --cov=marketgoblin # with coverage
License
MIT © Antônio Salomão