Robust institutional-grade ingestion toolkit for JSE equity data
jse-tools
Robust institutional-grade ingestion and validation toolkit for Johannesburg Stock Exchange (JSE) equity data.
Overview
jse-tools is a lightweight but structured data engineering toolkit designed for quantitative equity research workflows focused on JSE-listed securities.
It provides:
- JSE ticker discovery via EOD Historical Data (EODHD) API
- Network resilience with retry logic
- Dataset integrity validation guardrails
- Historical OHLCV ingestion via Yahoo Finance
- Z-score–based statistical outlier filtering
- Structured CSV export pipeline
This package is built for:
- Quantitative researchers
- Systematic strategy developers
- Portfolio engineers
- Academic finance projects
- Institutional-style backtesting workflows
Installation (TestPyPI)
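pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ jse-tools
The extra index lets pip resolve dependencies such as pandas and yfinance from production PyPI, since TestPyPI may not host them.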
When released to production PyPI:
pip install jse-tools
Quick Start
from jse_tools.core import get_tickers, download_and_process, save_tickers
API_TOKEN = "YOUR_EODHD_TOKEN"
# Step 1: Fetch JSE tickers
tickers = get_tickers(API_TOKEN)
# Step 2: Download and clean historical data
download_and_process(
    tickers=tickers[:10],
    start_date="2015-01-01",
    end_date="2024-01-01",
    output_dir="Stocks"
)
# Step 3: Save consolidated ticker list
save_tickers("Stocks")
Architecture
The package follows a layered defensive data engineering approach.
1. Resilience Layer
safe_request(url, retries=3, delay=2)
Provides retry logic for unstable API connections.
Mitigates:
- Temporary server failures
- Network instability
- Timeout errors
Reduces the chance that an ingestion pipeline fails because of a transient connectivity issue; a request that still fails after the final retry is treated as an error.
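The retry behaviour described above can be illustrated with a minimal sketch. This is not the package's actual implementation: the name safe_request_sketch, the injectable fetch and sleep parameters, the 10-second timeout, and the RuntimeError raised after the last attempt are all assumptions made for illustration.

```python
import time


def safe_request_sketch(url, retries=3, delay=2, fetch=None, sleep=time.sleep):
    """Hypothetical retrying GET; mirrors the documented (url, retries, delay) shape."""
    if fetch is None:
        import requests  # imported lazily so the sketch can be exercised offline

        def fetch(target):
            response = requests.get(target, timeout=10)
            response.raise_for_status()  # treat HTTP 4xx/5xx responses as failures
            return response

    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # assumption: a real version would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(delay)  # fixed pause before the next attempt
    # Every attempt failed: surface the last underlying error as the cause.
    raise RuntimeError(f"{url} failed after {retries} attempts") from last_error
```

Passing a custom fetch callable makes the retry loop easy to test without touching the network.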
2. Reliability Layer
verify_dataset_integrity(df, required_columns=None)
Enforces strict dataset quality constraints:
- DataFrame must not be empty
- Required columns must exist
- No column may exceed 50% missing data
- Detects structurally corrupted datasets
Raises descriptive ValueError exceptions on failure.
This prevents polluted data from propagating into downstream models.
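As a rough illustration of the checks listed above, the sketch below raises ValueError for an empty frame, a missing required column, or a column with more than 50% missing values. The function name, the max_missing parameter, and the exact error messages are hypothetical; the package's own checks may differ.

```python
import pandas as pd


def verify_dataset_integrity_sketch(df, required_columns=None, max_missing=0.5):
    """Hypothetical integrity check; raises ValueError on the first failed rule."""
    if df is None or df.empty:
        raise ValueError("DataFrame is empty")

    # Every required column must be present.
    absent = [col for col in (required_columns or []) if col not in df.columns]
    if absent:
        raise ValueError(f"Missing required columns: {absent}")

    # Fraction of missing values per column; strictly more than max_missing fails.
    missing_ratio = df.isna().mean()
    too_sparse = missing_ratio[missing_ratio > max_missing]
    if not too_sparse.empty:
        raise ValueError(
            f"Columns exceed {max_missing:.0%} missing data: {list(too_sparse.index)}"
        )
```

Failing before any cleaning step keeps a half-empty download from being forward-filled into a plausible-looking series.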
3. Data Pipeline Functions
get_tickers(api_token)
Downloads JSE-listed instruments from EODHD and formats them for Yahoo Finance (.JO suffix).
Returns:
list[str]
Example output:
['AGL.JO', 'NPN.JO', 'SOL.JO', ...]
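The formatting step can be sketched without any network access. The helper below is hypothetical and assumes each record from the symbol listing is a dict carrying the exchange code under a "Code" key; it normalises the code and appends the .JO suffix used by Yahoo Finance for JSE listings.

```python
def to_yahoo_symbols(records, suffix=".JO"):
    """Hypothetical helper: turn symbol records into sorted, unique Yahoo tickers."""
    symbols = set()
    for record in records:
        # Assumption: the exchange code lives under the "Code" key.
        code = str(record.get("Code", "")).strip().upper()
        if code:  # skip records without a usable code
            symbols.add(f"{code}{suffix}")
    return sorted(symbols)


print(to_yahoo_symbols([{"Code": "npn"}, {"Code": "AGL"}, {"Code": ""}]))
```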
download_and_process(tickers, start_date, end_date, output_dir="Stocks")
Downloads historical OHLCV data using yfinance.
Processing steps:
- Downloads adjusted price data
- Keeps required OHLCV columns
- Validates dataset integrity
- Removes statistical outliers using |Z-score| < 3
- Forward-fills and backward-fills missing values
- Exports each ticker to CSV
Output structure:
Stocks/
    AGL.JO.csv
    NPN.JO.csv
    ...
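The fill and export steps can be sketched on an in-memory DataFrame. The function name and the capitalised OHLCV column names are assumptions for illustration; the download itself is left out so the example runs offline.

```python
from pathlib import Path

import pandas as pd

OHLCV = ["Open", "High", "Low", "Close", "Volume"]  # assumed column names


def fill_and_export_sketch(df, ticker, output_dir="Stocks"):
    """Hypothetical helper: keep OHLCV columns, fill gaps, write <ticker>.csv."""
    # Forward-fill first so gaps take the previous value, then back-fill
    # whatever remains at the start of the series.
    cleaned = df[OHLCV].ffill().bfill()

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = out_dir / f"{ticker}.csv"
    cleaned.to_csv(path)  # the index (dates) is written as the first column
    return path
```

Note that back-filling copies later values into earlier rows, which can introduce look-ahead bias in a backtest if the filled rows are used as signals.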
save_tickers(folder_path)
Scans a folder for CSV files and generates:
tickers.csv
Returns:
pandas.DataFrame
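A minimal sketch of the folder scan is shown below. It is hypothetical: the "Ticker" column name and the choice to write tickers.csv inside the scanned folder are assumptions, not confirmed behaviour of save_tickers.

```python
from pathlib import Path

import pandas as pd


def save_tickers_sketch(folder_path, output_name="tickers.csv"):
    """Hypothetical helper: list per-ticker CSV files and write a ticker index."""
    folder = Path(folder_path)
    # "AGL.JO.csv" -> "AGL.JO"; the index file itself is excluded from the scan.
    names = sorted(
        path.stem for path in folder.glob("*.csv") if path.name != output_name
    )
    tickers = pd.DataFrame({"Ticker": names})
    tickers.to_csv(folder / output_name, index=False)
    return tickers
```

Excluding the output file keeps the result stable when the function is run more than once on the same folder.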
Statistical Filtering
Outlier removal keeps only observations that satisfy:
|Z-score| < 3
The filter is applied row-wise across the OHLCV columns.
This reduces:
- Data spikes
- Erroneous price prints
- Corrupted observations
Improves robustness of:
- Volatility calculations
- Factor models
- Backtesting systems
- Risk analytics
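One plausible reading of the filter is sketched below: compute a Z-score per OHLCV column and keep a row only if every column stays under the threshold. The function name, the column names, and the treatment of missing values and constant columns (both are left in place) are assumptions. The population standard deviation (ddof=0) is used, which matches the default of scipy.stats.zscore.

```python
import numpy as np
import pandas as pd

OHLCV = ["Open", "High", "Low", "Close", "Volume"]  # assumed column names


def filter_outliers_sketch(df, columns=OHLCV, threshold=3.0):
    """Hypothetical filter: drop rows where any column has |Z-score| >= threshold."""
    values = df[columns].astype(float)
    # A zero standard deviation (constant column) would divide by zero,
    # so it is replaced with NaN and that column never rejects a row.
    std = values.std(ddof=0).replace(0, np.nan)
    z = (values - values.mean()) / std
    # NaN Z-scores (missing data or constant columns) are not treated as outliers.
    keep = (z.abs() < threshold) | z.isna()
    return df[keep.all(axis=1)]
```

Because the mean and standard deviation are computed over the full history, a long trending price series can flag genuine early or late prices as outliers; computing Z-scores on returns or within a rolling window is a common alternative.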
Defensive Data Design Philosophy
The package follows three engineering principles:
- Fail early on corrupted datasets
- Retry transient failures automatically
- Clean statistical anomalies before storage
This design reduces downstream model fragility.
Dependencies
- pandas
- numpy
- scipy
- requests
- yfinance
Versioning
Semantic versioning is followed:
- PATCH (0.2.x): Documentation & minor fixes
- MINOR (0.x.0): Feature additions
- MAJOR (x.0.0): Breaking changes
Roadmap
Planned institutional enhancements:
- PostgreSQL export adapter
- Corporate actions reconciliation
- Market-cap & sector enrichment
- Structured metrics dashboard
- Performance analytics module
- Cloud storage backends (S3, GCS)
- CLI entrypoint
- YAML-based configuration layer
License
MIT License
Disclaimer
This package is for research and educational purposes only. It does not constitute financial advice.
Download files
File details
Details for the file jse_tools-0.3.0.tar.gz.
File metadata
- Download URL: jse_tools-0.3.0.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85bd4bd3f19694baa7862c1d29fe81b35709907a6067bd88c2a4d4766ce4cc2c |
| MD5 | 8b21fb77da17840b2fedae817557812f |
| BLAKE2b-256 | 97972d026527dc38ad570da00d9ab0e82ad509577680022ce515aabe3c0cda41 |
File details
Details for the file jse_tools-0.3.0-py3-none-any.whl.
File metadata
- Download URL: jse_tools-0.3.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56b285e0792610e6a2ed94000ad7bbb63ddb04c2c5377ee4ee876ab72e2d16be |
| MD5 | 931411d008336aae28223ef5571895e6 |
| BLAKE2b-256 | 7f8c87a13c41eb365961f7cfdb2c63f349bb6200f6255f555beed3367534223f |