Ultra-Low Latency C++ HFT Engine for Python by falcon7

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- Microsoft :: Windows
- POSIX :: Linux
Programming Language
- C++
- Python :: 3

Project description

⚡ SHAURYA v0.2.0 - Scalable High-Frequency Architecture for Ultra-low Response Yield Access

🧠 Introduction

Shaurya (hft.shaurya) is an ultra-low-latency, heterogeneous high-frequency trading (HFT) framework that connects Python-based AI model development to deterministic C++ execution.

Designed for:

📈 Quantitative Researchers
🏢 Proprietary Trading Engineers
⚙️ Systems Programmers
🎓 HPC & Compiler Enthusiasts

Shaurya combines deep-learning inference, hardware-style risk control, and lock-free networking in a single deterministic execution pipeline.

⚡ Average full-pipeline latency: ~88 µs
(Network → FIX Parse → AI Inference → FPGA Risk → Routing)

🧱 Architecture Overview

Python Training → Model Export → LLVM Fusion → Vectorized CPU Execution
    │
    ├── Eigen AI Inference
    ├── FPGA Risk Firewall
    └── Lock-Free Networking

Shaurya follows a Heterogeneous Software-in-the-Loop (SIL) architecture:

Layer	Purpose
🐍 Python	Train ML models (TensorFlow/Keras)
⚙️ C++	Deterministic inference execution
🔌 RTL-style Risk	Hardware-like safety validation
🔧 LLVM/Clang	Whole-program optimization via link-time optimization (LTO)

🚀 Key Features

✅ Deterministic AI Inference

No Python runtime
No GIL
No garbage collection pauses
Header-only inference
Eigen-backed linear algebra

✅ FPGA-Style Risk Firewall

Fat-finger protection
Kill-switch logic
Rate limiting
Price-range validation
Branchless logic design

✅ Lock-Free Networking

SPSC ring buffer
std::atomic synchronization
Cache-line aligned memory (alignas(64))
Zero-copy FIX handling
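A minimal sketch of what zero-copy FIX handling can look like: field values are returned as `std::string_view`s that point into the receive buffer instead of being copied into new strings. It assumes a complete message already sits in one contiguous buffer with standard SOH (`0x01`) field delimiters; `find_fix_field` is a hypothetical helper, not Shaurya's parser.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Field delimiter in FIX tag=value encoding (ASCII SOH, 0x01).
constexpr char kSoh = '\x01';

// Hypothetical helper: returns a view of the value for `tag` without copying
// any bytes out of `msg`. The view is valid only while the buffer is alive.
std::optional<std::string_view> find_fix_field(std::string_view msg, std::string_view tag) {
    std::size_t pos = 0;
    while (pos < msg.size()) {
        // Each field runs from `pos` to the next SOH (or the end of the buffer).
        std::size_t end = msg.find(kSoh, pos);
        if (end == std::string_view::npos) end = msg.size();
        const std::string_view field = msg.substr(pos, end - pos);
        const std::size_t eq = field.find('=');
        // Compare the whole tag so "5" does not match "55".
        if (eq != std::string_view::npos && field.substr(0, eq) == tag) {
            return field.substr(eq + 1);
        }
        pos = end + 1;
    }
    return std::nullopt;
}
```

Because the returned view aliases the buffer, it must not outlive it; converting a value such as tag 44 (Price) to a number, for example with `std::from_chars`, is a separate step.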

✅ LLVM Fusion

-flto link-time optimization
Cross-module inlining
Dead-code elimination
-march=native code generation (AVX2 vectorization on supporting CPUs)
-ffast-math throughput optimization (relaxes strict IEEE 754 floating-point rules)

📦 Installation

🛣️ Python Gateway

pip install hft.shaurya==0.2.0

The C++ core requires:

LLVM/Clang
The lld linker
A C++17-compatible compiler

Build using the provided scripts:

clang++ -O3 -flto -march=native -ffast-math ...

Then run:

bin\Shaurya.exe

🔨 Usage Guide

🕐 Step 1: Start Market Gateway

python -m hft.shaurya.gateway

python bridge.py

The Python layer:

Aggregates exchange feeds
Streams FIX messages locally
Forwards data to C++ core

🕑 Step 2: Launch LLVM C++ Core

bin\Shaurya.exe

Startup Process:

Loads AI weights
Warms CPU instruction cache
Initializes ring buffers
Begins live tick processing

🕒 Step 3: Review Metrics

After shutdown (Ctrl+C), 🌠 Shaurya_Metrics.txt reports:

Average latency
99th percentile
Tail latency distribution
Message throughput
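The report's figures can be derived from per-tick latency samples. The sketch below assumes samples in microseconds, defines throughput as messages per second of wall-clock run time, and uses the nearest-rank percentile definition; the actual formulas behind Shaurya_Metrics.txt are not documented here, and `summarize` / `LatencySummary` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary record mirroring the metrics listed above.
struct LatencySummary {
    double min_us;
    double avg_us;
    double p99_us;
    double throughput_msgs_per_s;  // messages per second of wall-clock run time
};

// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are <= it. Takes a copy because it sorts (done after the run, off the hot path).
double percentile_nearest_rank(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(std::ceil(p * samples.size() / 100.0));
    return samples[rank - 1];
}

LatencySummary summarize(const std::vector<double>& latencies_us, double wall_clock_s) {
    assert(!latencies_us.empty() && wall_clock_s > 0.0);
    double sum = 0.0;
    for (double x : latencies_us) sum += x;
    return {
        *std::min_element(latencies_us.begin(), latencies_us.end()),
        sum / latencies_us.size(),
        percentile_nearest_rank(latencies_us, 99.0),
        latencies_us.size() / wall_clock_s,
    };
}
```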

🔬 Technical Deep Dive

1️⃣ LLVM/Clang Infrastructure

Shaurya prefers LLVM/Clang over GCC for:

Whole-program analysis
Cross-module inlining
Vectorized math fusion
Aggressive dead-code elimination

Compiler flags used:

-O3
-flto
-march=native
-ffast-math

2️⃣ Deep Learning Alpha Engine

Model Pipeline

.h5 (Keras)
   ↓
fdeep_model.json
   ↓
Header-only C++ inference

Benefits:

No Python interpreter
No runtime framework
Cache-friendly execution
Deterministic latency
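To illustrate why header-only inference gives predictable latency, here is a plain-C++ stand-in for a single dense layer with ReLU. It is a sketch under assumptions, not the Eigen-backed code path that loads fdeep_model.json: sizes are compile-time constants, nothing is allocated on the hot path, and no interpreter or framework sits between input and result. `DenseRelu` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical fully connected layer + ReLU with compile-time sizes:
// no heap allocation and no runtime framework on the hot path.
template <std::size_t In, std::size_t Out>
struct DenseRelu {
    std::array<std::array<float, In>, Out> weights{};  // row-major: weights[out][in]
    std::array<float, Out> bias{};

    std::array<float, Out> forward(const std::array<float, In>& x) const {
        std::array<float, Out> y{};
        for (std::size_t o = 0; o < Out; ++o) {
            float acc = bias[o];
            for (std::size_t i = 0; i < In; ++i) {
                // Fixed trip count: the optimizer is free to unroll or vectorize.
                acc += weights[o][i] * x[i];
            }
            y[o] = std::max(acc, 0.0f);  // ReLU
        }
        return y;
    }
};
```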

3️⃣ Software-in-the-Loop FPGA Risk Engine

Traditional systems branch on each check:

if (price > limit) { block(); }

🗺️ Shaurya's approach:

Gate-style evaluation
Branchless evaluation trees
Avoids branch-misprediction penalties
Emulates RTL-style hardware logic

Sample output:

[FPGA: BLOCKED (FAT FINGER)]
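A sketch of the gate-style idea, assuming a simple order with price and quantity: every check produces a 0/1 signal, and the signals are OR-ed into a bitmask (0 = pass) instead of returning early from a chain of `if`s. `Order`, `RiskLimits`, and `evaluate_risk` are hypothetical names, rate limiting is left out, and whether the compiler actually emits branch-free machine code is up to the optimizer, so check the generated assembly.

```cpp
#include <cstdint>

// Hypothetical order and limits; field names are illustrative only.
struct Order {
    double price;
    double quantity;
};

struct RiskLimits {
    double max_notional;  // fat-finger cap on price * quantity
    double min_price;     // lower edge of the accepted price band
    double max_price;     // upper edge of the accepted price band
    bool kill_switch;     // true = block every order
};

// One bit per "gate" output, like wires leaving a combinational circuit.
enum RiskFlag : std::uint32_t {
    kFatFinger  = 1u << 0,
    kPriceRange = 1u << 1,
    kKillSwitch = 1u << 2,
};

// Evaluates all checks unconditionally and combines them with bitwise ops.
// Returns 0 when the order passes, otherwise the set of tripped flags.
std::uint32_t evaluate_risk(const Order& o, const RiskLimits& lim) {
    const std::uint32_t fat   = o.price * o.quantity > lim.max_notional;
    const std::uint32_t below = o.price < lim.min_price;
    const std::uint32_t above = o.price > lim.max_price;
    const std::uint32_t kill  = lim.kill_switch;
    return (fat * kFatFinger) | ((below | above) * kPriceRange) | (kill * kKillSwitch);
}
```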

4️⃣ Zero-Copy Lock-Free Pipeline

Single-producer, single-consumer (SPSC) ring buffer
Atomic pointer arithmetic
Cache-aligned buffers
No mutex locks
No lock-induced context switches (threads never block in the kernel)
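The SPSC pattern described above can be sketched as a bounded ring buffer with two atomic counters: the producer only writes `tail_`, the consumer only writes `head_`, and acquire/release ordering publishes each slot. This is a generic textbook version that assumes exactly one producer thread and one consumer thread, not Shaurya's buffer; `SpscRing` is a hypothetical name.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical bounded SPSC queue. head_/tail_ are free-running counters
// (size = tail - head), so Capacity must be a power of two for the mask wrap.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;         // full
        buffer_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;              // empty
        T value = buffer_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);   // hand the slot back
        return value;
    }

private:
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the producer
    alignas(64) std::array<T, Capacity> buffer_{};
};
```

The producer thread calls `try_push` and the consumer thread busy-polls `try_pop`; with more than one thread on either side the counters race and the queue is no longer correct.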

📊 Performance Metrics

💪🏻 Benchmark Method

Timer: Windows QueryPerformanceCounter
Measured span (full tick lifecycle):
- Network Buffer
- FIX Parse
- AI Inference
- FPGA Risk Gate
- Routing
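`QueryPerformanceCounter` is Windows-only; the sketch below uses `std::chrono::steady_clock`, a monotonic clock, as a portable substitute to time each tick from start to finish. `measure_ticks` is a hypothetical helper, and the callable stands in for the real network → parse → inference → risk → routing path.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: runs `process_tick` n times and records each
// end-to-end duration in microseconds.
template <typename TickFn>
std::vector<double> measure_ticks(std::size_t n, TickFn&& process_tick) {
    std::vector<double> samples_us;
    samples_us.reserve(n);  // allocate up front, outside the timed region
    for (std::size_t i = 0; i < n; ++i) {
        const auto start = std::chrono::steady_clock::now();
        process_tick(i);  // placeholder for the full tick lifecycle
        const auto stop = std::chrono::steady_clock::now();
        samples_us.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return samples_us;
}
```

Reserving the sample vector before the loop keeps allocation out of the timed region; the clock reads themselves still add a small, roughly constant overhead to every sample.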

✅ Results

Metric	Value
Messages Tested	1000+
Minimum Latency	3.6 µs
Average Latency	88.38 µs
99th Percentile	237.0 µs

In this run, 99% of ticks completed the full pipeline in under 0.25 ms (p99 = 237.0 µs), even with normal OS scheduler activity on the host.

🔩 Configuration

Key Optimization Flags

-O3
-flto
-march=native
-ffast-math

🍁 Recommended System Tuning

Disable power-saving modes
Pin threads to dedicated CPU cores
Use the performance CPU frequency governor (Linux)
Disable unnecessary background processes
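Thread pinning on Linux can be sketched with `sched_getaffinity` / `sched_setaffinity` from `<sched.h>`; on Windows the corresponding Win32 call is `SetThreadAffinityMask`. The sketch pins the calling thread to the first CPU it is currently allowed to use, which is only a placeholder policy; `pin_current_thread_to_first_allowed_cpu` is a hypothetical helper.

```cpp
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: pins the calling thread to the first CPU in its
// current affinity mask. Linux-only; returns the CPU index, or -1 on
// failure or on other platforms.
int pin_current_thread_to_first_allowed_cpu() {
#if defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t target;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        return sched_setaffinity(0, sizeof(target), &target) == 0 ? cpu : -1;
    }
#endif
    return -1;
}
```

In a real deployment you would choose a specific core, ideally one kept free of other work (on Linux, for example, via the `isolcpus` kernel parameter), rather than the first allowed one.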

💡 Examples

Running a Trained Model

Train model in Python
Export .h5
Convert to fdeep_model.json
Place model in inference directory
Launch core engine

Risk Rule Example (pseudocode)

RiskGate fatFinger( max_notional = 1'000'000 );
RiskGate priceClamp( max_slippage = 0.5% );

🎯 Who Benefits?

📈 Retail & Quant Traders

AI-driven live execution
Sub-millisecond architecture
Institutional-style pre-trade risk checks

🏢 Proprietary Firms

Rapid FPGA prototyping (SIL)
Deterministic backtesting
Infrastructure experimentation

🎓 Computer Science Students

Real-world examples of:

Lock-free systems
LLVM optimization
Vectorized math
HPC finance pipelines

🚀 Roadmap

GPU kernel fusion experiments
Native FPGA backend
Linux ultra-low-latency build
Advanced order routing simulator
Real exchange connectivity modules

🧪 Troubleshooting

📈 High Latency Spikes

Verify that CPU frequency scaling is disabled
Ensure LTO is enabled
Confirm AVX2 is available
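One quick way to confirm that the AVX2 and fast-math settings actually reached the compiler is to check the macros Clang and GCC predefine: `__AVX2__` when AVX2 code generation is enabled (for example by `-march=native` on an AVX2-capable host), `__FAST_MATH__` under `-ffast-math`, and `__OPTIMIZE__` at `-O1` and above. `build_feature_report` is a hypothetical helper; it does not detect `-flto`, which is easier to confirm from verbose build output.

```cpp
#include <string>

// Hypothetical helper: reports which code-generation options this
// translation unit was compiled with (Clang/GCC predefined macros).
std::string build_feature_report() {
    std::string report;
#if defined(__AVX2__)
    report += "AVX2: enabled\n";
#else
    report += "AVX2: NOT enabled (check -march=native and the host CPU)\n";
#endif
#if defined(__FAST_MATH__)
    report += "fast-math: enabled\n";
#else
    report += "fast-math: NOT enabled\n";
#endif
#if defined(__OPTIMIZE__)
    report += "optimizations: enabled\n";
#else
    report += "optimizations: NOT enabled\n";
#endif
    return report;
}
```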

🙅🏻‍♀️ Model Not Loading

Validate fdeep_model.json
Ensure the model path is correct
Check weight precision compatibility

🏢 Build Issues

Confirm Clang version compatibility
Ensure lld is installed
Rebuild with verbose logging

☢️ Disclaimer

Shaurya is intended for:

Research
Education
Systems experimentation

It is **not financial advice** and **not production-certified trading infrastructure**.

Users assume full responsibility for:

Trading decisions
Compliance
Regulatory adherence
Capital risk

🏁 Final Note

The engine is developed solely by Harshit Kumar Singh (that's me ;)).

Shaurya v0.2.0 is a step toward democratizing institutional-grade infrastructure: it merges AI, compiler engineering, and hardware-style safety into a single deterministic execution engine.

If this project helps you, consider ⭐ starring the repository and contributing to future releases. Until then, happy coding 😊.

ad astra per aspera (to the stars through hardships) 🛩️

Release history

0.3.2: Apr 6, 2026
0.3.1: Apr 6, 2026
0.3.0: Apr 6, 2026
0.2.0 (this version): Feb 18, 2026
0.1.0: Dec 14, 2025

Download files

Source Distribution

hft_shaurya-0.2.0.tar.gz (5.3 kB)

Uploaded Feb 18, 2026 Source

Built Distribution

hft_shaurya-0.2.0-py3-none-any.whl (5.0 kB)

Uploaded Feb 18, 2026 Python 3

File details

Details for the file hft_shaurya-0.2.0.tar.gz.

File metadata

Download URL: hft_shaurya-0.2.0.tar.gz
Upload date: Feb 18, 2026
Size: 5.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hft_shaurya-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4e0ad054e3e787ede3ab16bec1a08e5c866416dbe5e60d3f98afcb6156e68f11`
MD5	`2becd938eba6f5c2773fc53d48f0a05b`
BLAKE2b-256	`04b4f70b486abfd3d12ba2e9d980173333c71ed2f11a5d5203192822f5795ea1`

File details

Details for the file hft_shaurya-0.2.0-py3-none-any.whl.

File metadata

Download URL: hft_shaurya-0.2.0-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 5.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hft_shaurya-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dd632bf611f9ceea23f20a9173193918a802f27cbffcbf96f8d39cc12b223754`
MD5	`3f37636710b4b10394dd6fab3e7b1baa`
BLAKE2b-256	`497403c82bdfb67361fad8d454c1926cb518aa584d052d71b21e5eef87b0446f`
