Robust institutional-grade ingestion toolkit for JSE equity data

jse-tools

Robust institutional-grade ingestion and validation toolkit for Johannesburg Stock Exchange (JSE) equity data.


Overview

jse-tools is a lightweight but structured data engineering toolkit designed for quantitative equity research workflows focused on JSE-listed securities.

It provides:

  • JSE ticker discovery via EOD Historical Data (EODHD) API
  • Network resilience with retry logic
  • Dataset integrity validation guardrails
  • Historical OHLCV ingestion via Yahoo Finance
  • Z-score–based statistical outlier filtering
  • Structured CSV export pipeline

This package is built for:

  • Quantitative researchers
  • Systematic strategy developers
  • Portfolio engineers
  • Academic finance projects
  • Institutional-style backtesting workflows

Installation (TestPyPI)

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ jse-tools

When released to production PyPI:

pip install jse-tools

Quick Start

from jse_tools.core import get_tickers, download_and_process, save_tickers

API_TOKEN = "YOUR_EODHD_TOKEN"

# Step 1: Fetch JSE tickers
tickers = get_tickers(API_TOKEN)

# Step 2: Download and clean historical data
download_and_process(
    tickers=tickers[:10],
    start_date="2015-01-01",
    end_date="2024-01-01",
    output_dir="Stocks"
)

# Step 3: Save consolidated ticker list
save_tickers("Stocks")

Architecture

The package follows a layered defensive data engineering approach.


1. Resilience Layer

safe_request(url, retries=3, delay=2)

Provides retry logic for unstable API connections.

Mitigates:

  • Temporary server failures
  • Network instability
  • Timeout errors

Reduces the chance that an ingestion pipeline fails because of a transient connectivity issue. A request that still fails after the final retry is not recovered.
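
The retry pattern can be sketched roughly as below. This is an illustration only, not the package's implementation: call_with_retries, its injectable sleep argument, and the choice of exceptions are assumptions made so the example runs without a network connection. A real HTTP version would more likely wrap requests.get and catch requests.exceptions.RequestException.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retries: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to `retries` times, pausing `delay` seconds between attempts.

    Hypothetical helper: only connection and timeout errors are retried.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            # Do not sleep after the final attempt.
            if attempt < retries:
                sleep(delay)
    raise RuntimeError(f"all {retries} attempts failed") from last_error


# Simulated endpoint that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "payload"


result = call_with_retries(flaky_fetch, retries=3, delay=0.0)
print(result, attempts["count"])
```

Once the retry budget is exhausted the helper raises, so the caller can decide whether to skip the request or abort the run.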


2. Reliability Layer

verify_dataset_integrity(df, required_columns=None)

Enforces strict dataset quality constraints:

  • DataFrame must not be empty
  • Required columns must exist
  • No column may exceed 50% missing data
  • Detects structurally corrupted datasets

Raises descriptive ValueError exceptions on failure.

This prevents polluted data from propagating into downstream models.
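
A minimal sketch of checks along these lines, assuming pandas. The function name check_integrity, the max_missing parameter, and the error messages are hypothetical and may differ from what verify_dataset_integrity actually does.

```python
import pandas as pd


def check_integrity(df, required_columns=None, max_missing=0.5):
    """Raise ValueError if df is empty, lacks a required column, or is too sparse."""
    if df.empty:
        raise ValueError("DataFrame is empty")

    missing_cols = [c for c in (required_columns or []) if c not in df.columns]
    if missing_cols:
        raise ValueError(f"Missing required columns: {missing_cols}")

    # Share of missing values per column, between 0.0 and 1.0.
    missing_share = df.isna().mean()
    too_sparse = missing_share[missing_share > max_missing]
    if not too_sparse.empty:
        raise ValueError(
            f"Columns exceed {max_missing:.0%} missing data: {list(too_sparse.index)}"
        )


# 25% of Close is missing: passes.
good = pd.DataFrame({"Close": [10.0, 10.5, None, 11.0], "Volume": [100, 120, 90, 110]})
check_integrity(good, required_columns=["Close", "Volume"])

# 75% of Close is missing: rejected.
bad = pd.DataFrame({"Close": [None, None, None, 11.0], "Volume": [100, 120, 90, 110]})
try:
    check_integrity(bad, required_columns=["Close", "Volume"])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

In this sketch a column with exactly 50% missing values is accepted. Whether the package treats that boundary the same way is not stated here.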


3. Data Pipeline Functions

get_tickers(api_token)

Downloads JSE-listed instruments from EODHD and formats them for Yahoo Finance (.JO suffix).

Returns:

list[str]

Example output:

['AGL.JO', 'NPN.JO', 'SOL.JO', ...]
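
The suffix formatting step might look like the sketch below. It covers only the conversion from exchange codes to Yahoo Finance symbols; the EODHD request is left out. to_yahoo_symbols is a hypothetical helper, and the whitespace trimming, upper-casing, and de-duplication are assumptions rather than documented behaviour of get_tickers.

```python
def to_yahoo_symbols(codes):
    """Convert raw JSE codes such as 'AGL' into Yahoo Finance symbols such as 'AGL.JO'."""
    symbols = []
    seen = set()
    for code in codes:
        cleaned = code.strip().upper()
        if not cleaned:
            continue  # skip blank entries
        symbol = cleaned if cleaned.endswith(".JO") else f"{cleaned}.JO"
        if symbol not in seen:  # keep first occurrence, preserve order
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


symbols = to_yahoo_symbols(["AGL", "npn ", "SOL", "AGL", ""])
print(symbols)  # ['AGL.JO', 'NPN.JO', 'SOL.JO']
```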

download_and_process(tickers, start_date, end_date, output_dir="Stocks")

Downloads historical OHLCV data using yfinance.

Processing steps:

  1. Downloads adjusted price data
  2. Keeps required OHLCV columns
  3. Validates dataset integrity
  4. Removes statistical outliers using |Z-score| < 3
  5. Forward-fills and backward-fills missing values
  6. Exports each ticker to CSV

Output structure:

Stocks/
    AGL.JO.csv
    NPN.JO.csv
    ...
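
Steps 2, 5 and 6 can be illustrated on a small synthetic frame, without calling yfinance. This is a sketch under assumptions: clean_and_export and the OHLCV constant are hypothetical names, and the real function may order or implement these steps differently.

```python
import tempfile
from pathlib import Path

import pandas as pd

OHLCV = ["Open", "High", "Low", "Close", "Volume"]


def clean_and_export(df, ticker, output_dir):
    """Keep OHLCV columns, fill gaps, and write <ticker>.csv into output_dir."""
    kept = df[OHLCV]                 # step 2: drop everything except OHLCV
    filled = kept.ffill().bfill()    # step 5: forward-fill, then backward-fill
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{ticker}.csv"
    filled.to_csv(path)              # step 6: one CSV per ticker
    return path


# Synthetic data with one gap per price column and an extra column to drop.
raw = pd.DataFrame(
    {
        "Open": [None, 101.0, 102.0],
        "High": [101.0, None, 103.0],
        "Low": [99.0, 100.0, None],
        "Close": [100.0, 101.0, 102.0],
        "Volume": [1000, 1100, 1200],
        "Dividends": [0.0, 0.0, 0.0],
    },
    index=pd.date_range("2024-01-01", periods=3, name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = clean_and_export(raw, "AGL.JO", Path(tmp) / "Stocks")
    exported = pd.read_csv(csv_path, index_col="Date")

print(csv_path.name)
print(exported)
```

The leading gap in Open can only be filled backward, which is why the backward-fill follows the forward-fill.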

save_tickers(folder_path)

Scans a folder for CSV files and generates:

tickers.csv

Returns:

pandas.DataFrame
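
A rough sketch of the folder scan, assuming pandas. list_tickers is a hypothetical stand-in for save_tickers. The "Ticker" column name, writing tickers.csv into the scanned folder, and skipping an existing tickers.csv are all assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd


def list_tickers(folder_path):
    """Collect ticker names from the CSV files in a folder and write tickers.csv."""
    folder = Path(folder_path)
    # 'AGL.JO.csv' has stem 'AGL.JO'; ignore a previously written tickers.csv.
    names = sorted(p.stem for p in folder.glob("*.csv") if p.name != "tickers.csv")
    table = pd.DataFrame({"Ticker": names})
    table.to_csv(folder / "tickers.csv", index=False)
    return table


with tempfile.TemporaryDirectory() as tmp:
    for name in ("NPN.JO.csv", "AGL.JO.csv"):
        (Path(tmp) / name).write_text("Date,Close\n2024-01-02,100.0\n")
    ticker_table = list_tickers(tmp)
    written = (Path(tmp) / "tickers.csv").exists()

print(ticker_table)
```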

Statistical Filtering

Outlier removal is performed using:

|Z-score| < 3

Applied row-wise across OHLCV columns.

This reduces:

  • Data spikes
  • Erroneous price prints
  • Corrupted observations

Improves robustness of:

  • Volatility calculations
  • Factor models
  • Backtesting systems
  • Risk analytics
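
One way to read the rule is sketched below, assuming pandas: each column gets its own Z-scores, and a row is dropped if any column reaches the threshold. drop_outlier_rows is a hypothetical helper, not the package's code. The population standard deviation (ddof=0) is used because that is the default of scipy.stats.zscore.

```python
import pandas as pd


def drop_outlier_rows(df, columns, threshold=3.0):
    """Drop rows where any listed column has |Z-score| >= threshold."""
    values = df[columns].astype(float)
    z = (values - values.mean()) / values.std(ddof=0)
    # NaN Z-scores (constant column or missing value) compare as False here,
    # so they never cause a row to be dropped.
    is_outlier = (z.abs() >= threshold).any(axis=1)
    return df[~is_outlier]


# 30 synthetic observations with one erroneous price print at position 15.
closes = [100.0 + (i % 5) for i in range(30)]
closes[15] = 1000.0
frame = pd.DataFrame({"Close": closes, "Volume": [1000 + 10 * (i % 3) for i in range(30)]})

filtered = drop_outlier_rows(frame, ["Close", "Volume"])
print(len(frame), len(filtered))
```

With a population standard deviation, the largest possible |Z-score| in a sample of n points is (n - 1) / sqrt(n), so a threshold of 3 cannot flag anything in series shorter than 11 observations.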

Defensive Data Design Philosophy

The package follows three engineering principles:

  1. Fail early on corrupted datasets
  2. Retry transient failures automatically
  3. Clean statistical anomalies before storage

This design reduces downstream model fragility.


Dependencies

  • pandas
  • numpy
  • scipy
  • requests
  • yfinance

Versioning

Semantic versioning is followed:

  • PATCH (0.2.x): Documentation & minor fixes
  • MINOR (0.x.0): Feature additions
  • MAJOR (x.0.0): Breaking changes

Roadmap

Planned institutional enhancements:

  • PostgreSQL export adapter
  • Corporate actions reconciliation
  • Market-cap & sector enrichment
  • Structured metrics dashboard
  • Performance analytics module
  • Cloud storage backends (S3, GCS)
  • CLI entrypoint
  • YAML-based configuration layer


License

MIT License


Disclaimer

This package is for research and educational purposes only. It does not constitute financial advice.
