
Ultra-Low Latency C++ HFT Engine for Python by falcon7


⚡SHAURYA v.0.2.0 - Scalable High-Frequency Architecture for Ultra-low Response Yield Access


🧠 Introduction

Shaurya (hft.shaurya) is an ultra-low latency heterogeneous high-frequency trading (HFT) framework that bridges Python-based AI model development with deterministic C++ execution performance.

Designed for:

  • 📈 Quantitative Researchers
  • 🏢 Proprietary Trading Engineers
  • ⚙️ Systems Programmers
  • 🎓 HPC & Compiler Enthusiasts

Shaurya enables deep learning inference, hardware-style risk control, and lock-free networking in a unified deterministic execution pipeline.

⚡ Average full-pipeline latency: ~88 µs
(Network → FIX Parse → AI Inference → FPGA Risk → Routing)


🧱 Architecture Overview

Python Training → Model Export → LLVM Fusion → Vectorized CPU Execution
│
├── Eigen AI Inference
├── FPGA Risk Firewall
└── Lock-Free Networking

Shaurya follows a Heterogeneous Software-in-the-Loop (SIL) architecture:

Layer | Purpose
🐍 Python | Train ML models (TensorFlow/Keras)
⚙️ C++ | Deterministic inference execution
🔌 RTL-style Risk | Hardware-like safety validation
🔧 LLVM/Clang | Whole-program optimization + LTO fusion

🚀 Key Features

✅ Deterministic AI Inference

  • No Python runtime
  • No GIL
  • No garbage collection pauses
  • Header-only inference
  • Eigen-backed linear algebra

✅ FPGA-Style Risk Firewall

  • Fat-finger protection
  • Kill-switch logic
  • Rate limiting
  • Price-range validation
  • Branchless logic design
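
As an illustration of the rate-limiting rule, here is a minimal sketch of a fixed-window order limiter. The `RateLimiter` type, its fields, and the nanosecond timestamps are hypothetical names chosen for this example, not Shaurya's actual interface, and this simple version uses ordinary branches rather than the branchless style listed above.

```cpp
#include <cstdint>

// Hypothetical fixed-window rate limiter (not Shaurya's real API).
// Allows at most `max_orders` orders per `window_ns` nanoseconds.
struct RateLimiter {
    std::uint64_t window_ns;            // length of one window
    std::uint32_t max_orders;           // orders allowed per window
    std::uint64_t window_start_ns = 0;  // start of the current window
    std::uint32_t count = 0;            // orders seen in the current window

    // Returns true if an order arriving at `now_ns` may pass.
    bool allow(std::uint64_t now_ns) {
        if (now_ns - window_start_ns >= window_ns) {
            window_start_ns = now_ns;   // open a new window
            count = 0;
        }
        if (count >= max_orders) return false;
        ++count;
        return true;
    }
};
```

A caller would typically feed `allow` a monotonic timestamp per outgoing order and drop or queue the order when it returns false.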

✅ Lock-Free Networking

  • SPSC ring buffer
  • std::atomic synchronization
  • Cache-line aligned memory (alignas(64))
  • Zero-copy FIX handling
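
To make "zero-copy FIX handling" concrete, the sketch below looks up a FIX tag by returning a `std::string_view` into the receive buffer instead of copying the value. The function name `fix_field` is an assumption for illustration; Shaurya's own parser may be structured differently.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not Shaurya's real API): find the value of a FIX tag
// without copying. Fields are separated by SOH (0x01). The returned view
// points into the caller's buffer, so that buffer must outlive the view.
inline std::optional<std::string_view> fix_field(std::string_view msg,
                                                 std::string_view tag) {
    constexpr char kSoh = '\x01';
    std::size_t pos = 0;
    while (pos < msg.size()) {
        std::size_t end = msg.find(kSoh, pos);
        if (end == std::string_view::npos) end = msg.size();
        const std::string_view field = msg.substr(pos, end - pos);
        const std::size_t eq = field.find('=');
        // Compare the full tag so that "5" does not match "35".
        if (eq != std::string_view::npos && field.substr(0, eq) == tag) {
            return field.substr(eq + 1);
        }
        pos = end + 1;  // skip past the delimiter
    }
    return std::nullopt;
}
```

Because only views are handed around, the message bytes stay where the network layer wrote them; the trade-off is that the buffer cannot be reused until all views into it are dead.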

✅ LLVM Fusion

  • -flto Link-Time Optimization
  • Cross-module inlining
  • Dead code elimination
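  • -march=native to target the build host's instruction set (AVX2 vectorization where the CPU supports it)
  • -ffast-math for throughput, at the cost of strict IEEE 754 floating-point semantics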

📦 Installation

🛣️ Python Gateway

pip install hft.shaurya==0.2.0

C++ Core Requires:

  1. LLVM/Clang
  2. lld linker
  3. C++17 compatible compiler

Build using provided scripts:

clang++ -O3 -flto -march=native -ffast-math ...

Then run:

bin\Shaurya.exe

🔨 Usage Guide

🕐 Step 1: Start Market Gateway

python -m hft.shaurya.gateway

or

python bridge.py

The Python layer:

  1. Aggregates exchange feeds

  2. Streams FIX messages locally

  3. Forwards data to C++ core

🕑 Step 2: Launch LLVM C++ Core

bin\Shaurya.exe

Startup Process:

  1. Loads AI weights
  2. Warms CPU instruction cache
  3. Initializes ring buffers
  4. Begins live tick processing

🕒 Step 3: Review Metrics

After shutdown (Ctrl + C), 🌠 Shaurya_Metrics.txt includes:

  1. Average latency
  2. 99th percentile
  3. Tail latency distribution
  4. Message throughput
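
For readers who want to reproduce these figures from their own samples, here is a minimal sketch of how minimum, average, and 99th-percentile latency could be computed. The `LatencySummary` and `summarize` names are hypothetical, and the nearest-rank percentile definition is an assumption; the source does not say which definition Shaurya uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary type (not Shaurya's real API); values in microseconds.
struct LatencySummary {
    double min_us;
    double avg_us;
    double p99_us;
};

// Summarize latency samples. Takes the vector by value because it sorts it.
inline LatencySummary summarize(std::vector<double> samples_us) {
    if (samples_us.empty()) return {0.0, 0.0, 0.0};
    std::sort(samples_us.begin(), samples_us.end());
    const double sum =
        std::accumulate(samples_us.begin(), samples_us.end(), 0.0);
    const std::size_t n = samples_us.size();
    // Nearest-rank percentile: 1-based rank = ceil(0.99 * n).
    const std::size_t rank = (99 * n + 99) / 100;
    return {samples_us.front(), sum / static_cast<double>(n),
            samples_us[rank - 1]};
}
```

With only around a thousand samples, the 99th percentile rests on roughly ten observations, so it is worth collecting longer runs before comparing tail latencies between builds.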

🔬 Technical Deep Dive

1️⃣ LLVM/Clang Infrastructure

Shaurya prioritizes LLVM over GCC for:

  • Whole-program analysis
  • Cross-module inlining
  • Vectorized math fusion
  • Aggressive dead-code elimination

Compiler flags used:

-flto
-march=native
-ffast-math

2️⃣ Deep Learning Alpha Engine

Model Pipeline

.h5 (Keras)
   ↓
fdeep_model.json
   ↓
Header-only C++ inference

Benefits:

  1. No Python interpreter
  2. No runtime framework
  3. Cache-friendly execution
  4. Deterministic latency
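
To show why compiled inference can have predictable latency, here is a minimal sketch of a single dense layer with ReLU activation using fixed-size arrays. It is an illustration only: `DenseRelu` is a hypothetical type, and it does not use Eigen or the exported fdeep_model.json that the real engine relies on. All sizes are known at compile time, so the forward pass performs no heap allocation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size dense layer (not Shaurya's real inference code).
// Computes y = relu(W * x + b) with compile-time dimensions.
template <std::size_t In, std::size_t Out>
struct DenseRelu {
    std::array<std::array<float, In>, Out> weights;  // row-major: one row per output
    std::array<float, Out> bias;

    std::array<float, Out> forward(const std::array<float, In>& x) const {
        std::array<float, Out> y{};
        for (std::size_t o = 0; o < Out; ++o) {
            float acc = bias[o];
            for (std::size_t i = 0; i < In; ++i) {
                acc += weights[o][i] * x[i];
            }
            y[o] = acc > 0.0f ? acc : 0.0f;  // ReLU
        }
        return y;
    }
};
```

The inner loop has a fixed trip count and contiguous memory access, which is the kind of code that auto-vectorizers and link-time inlining tend to handle well.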

3️⃣ Software-in-the-Loop FPGA Risk Engine

Traditional systems:

if(price > limit) { block(); }

🗺️ Shaurya approach:

  1. Gate-style evaluation
  2. Branchless evaluation trees
  3. Avoids branch predictor penalties
  4. Emulates RTL-style hardware logic

Sample output:

[FPGA: BLOCKED (FAT FINGER)]
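
The gate-style idea can be sketched as follows: every check is evaluated unconditionally and the results are combined with bitwise operators into a rejection mask, instead of returning early from a chain of `if` statements. The `Order`, `RiskLimits`, and `risk_mask` names and the integer tick representation are assumptions for this example, not Shaurya's actual types. Whether the compiler emits truly branch-free machine code depends on the target and optimization level.

```cpp
#include <cstdint>

// Hypothetical types for illustration (not Shaurya's real API).
struct Order {
    std::int64_t price_ticks;  // price in integer ticks
    std::int64_t quantity;
};

struct RiskLimits {
    std::int64_t max_notional;     // fat-finger limit on price * quantity
    std::int64_t min_price_ticks;  // lower bound of the allowed price range
    std::int64_t max_price_ticks;  // upper bound of the allowed price range
    bool kill_switch;              // true blocks every order
};

enum RiskFlag : std::uint32_t {
    kFatFinger = 1u << 0,
    kPriceRange = 1u << 1,
    kKillSwitch = 1u << 2,
};

// Returns 0 if the order passes, otherwise a bitmask of violated rules.
// All comparisons are evaluated; bitwise | avoids short-circuiting.
inline std::uint32_t risk_mask(const Order& o, const RiskLimits& lim) {
    const auto fat = static_cast<std::uint32_t>(
        o.price_ticks * o.quantity > lim.max_notional);
    const auto low =
        static_cast<std::uint32_t>(o.price_ticks < lim.min_price_ticks);
    const auto high =
        static_cast<std::uint32_t>(o.price_ticks > lim.max_price_ticks);
    const auto kill = static_cast<std::uint32_t>(lim.kill_switch);
    return (fat * kFatFinger) | ((low | high) * kPriceRange) |
           (kill * kKillSwitch);
}
```

An order is routed only when the mask is zero; a non-zero mask identifies which rules fired, which is how a message such as the fat-finger block above could be produced.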

4️⃣ Zero-Copy Lock-Free Pipeline

  1. Single-producer single-consumer (SPSC)
  2. Atomic pointer arithmetic
  3. Cache-aligned buffers
  4. No mutex locks
  5. No scheduler interference
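
The properties above can be sketched with a minimal bounded SPSC ring buffer. This is a generic textbook-style design under stated assumptions, not Shaurya's implementation: the `SpscRing` name is hypothetical, capacity must be a power of two, and it uses monotonically increasing indices rather than raw pointer arithmetic. Exactly one thread may call `try_push` and exactly one other thread may call `try_pop`.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical single-producer single-consumer ring (not Shaurya's real API).
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side: returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns std::nullopt if the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

This sketch copies `T` in and out of the ring; to keep the path zero-copy, `T` would be a small handle such as an index or view into a pre-allocated message buffer.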

📊 Performance Metrics

💪🏻 Benchmark Method

  • Windows QueryPerformanceCounter
  • Full tick lifecycle measurement:
    • Network Buffer
    • FIX Parse
    • AI Inference
    • FPGA Risk Gate
    • Routing
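
A portable way to take the same kind of measurement is sketched below using `std::chrono::steady_clock`, which recent MSVC standard libraries are understood to implement on top of QueryPerformanceCounter. The helper name `time_call_us` is hypothetical, and this is a swapped-in technique for illustration rather than the benchmark harness Shaurya ships.

```cpp
#include <chrono>

// Hypothetical helper (not Shaurya's real API): run `f` once and return the
// elapsed wall-clock time in microseconds, measured with a monotonic clock.
template <typename F>
double time_call_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Wrapping the whole tick lifecycle in one call gives the end-to-end figure; wrapping each stage separately shows where the time goes, at the cost of extra clock reads on the hot path.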

✅ Results

Metric | Value
Messages Tested | 1000+
Minimum Latency | 3.6 µs
Average Latency | 88.38 µs
99th Percentile | 237.0 µs

99% of measured messages complete the full pipeline in under 0.25 milliseconds, even under OS scheduler load.


🔩 Configuration

Key Optimization Flags

-O3
-flto
-march=native
-ffast-math

🍁 Recommended System Tuning

  • Disable power-saving modes
  • Pin threads to dedicated CPU cores
  • Use performance CPU governor (Linux)
  • Disable unnecessary background processes

💡 Examples

Running a Trained Model

  1. Train model in Python
  2. Export .h5
  3. Convert to fdeep_model.json
  4. Place model in inference directory
  5. Launch core engine

Risk Rule Example

RiskGate fatFinger( max_notional = 1'000'000 );
RiskGate priceClamp( max_slippage = 0.5% );

🎯 Who Benefits?

📈 Retail & Quant Traders

  • AI-driven live execution
  • Sub-millisecond architecture
  • Institutional-grade safety

🏢 Proprietary Firms

  • Rapid FPGA prototyping (SIL)
  • Deterministic backtesting
  • Infrastructure experimentation

🎓 Computer Science Students

Real-world examples of:

  • Lock-free systems
  • LLVM optimization
  • Vectorized math
  • HPC finance pipelines

🚀 Roadmap

  • GPU kernel fusion experiments
  • Native FPGA backend
  • Linux ultra-low-latency build
  • Advanced order routing simulator
  • Real exchange connectivity modules

🧪 Troubleshooting

📈 High Latency Spikes

  • Verify that CPU frequency scaling is disabled
  • Ensure LTO is enabled
  • Confirm AVX2 is available

🙅🏻‍♀️ Model Not Loading

  • Validate fdeep_model.json
  • Ensure correct path
  • Check weight precision compatibility

🏢 Build Issues

  • Confirm Clang version compatibility
  • Ensure lld is installed
  • Rebuild with verbose logging

☢️ Disclaimer

Shaurya is intended for:

  • Research
  • Education
  • Systems experimentation

It is **not financial advice** and **not production-certified trading infrastructure**.

Users assume full responsibility for:

  • Trading decisions
  • Compliance
  • Regulatory adherence
  • Capital risk

🏁 Final Note

This engine was developed solely by me, Harshit Kumar Singh.

Shaurya v.0.2.0 represents a shift toward democratized institutional-grade infrastructure: merging AI, compiler engineering, and hardware-style safety into a single deterministic execution engine.

If this project helps you, consider ⭐ starring the repository and contributing to future releases. Until then, happy coding 😊.

ad astra per aspera 🛩️
