Ultra-Low Latency C++ HFT Engine for Python by falcon7
⚡SHAURYA v.0.2.0 - Scalable High-Frequency Architecture for Ultra-low Response Yield Access
🧠 Introduction
Shaurya (hft.shaurya) is an ultra-low latency heterogeneous high-frequency trading (HFT) framework that bridges Python-based AI model development with deterministic C++ execution performance.
Designed for:
- 📈 Quantitative Researchers
- 🏢 Proprietary Trading Engineers
- ⚙️ Systems Programmers
- 🎓 HPC & Compiler Enthusiasts
Shaurya enables deep learning inference, hardware-style risk control, and lock-free networking in a unified deterministic execution pipeline.
⚡ Average full-pipeline latency: ~88 µs
(Network → FIX Parse → AI Inference → FPGA Risk → Routing)
📑 Table of Contents
- Architecture Overview
- Key Features
- Installation
- Usage Guide
- Technical Deep Dive
- Performance Metrics
- Configuration
- Examples
- Troubleshooting
- Roadmap
🧱 Architecture Overview
Python Training → Model Export → LLVM Fusion → Vectorized CPU Execution
│
├── Eigen AI Inference
├── FPGA Risk Firewall
└── Lock-Free Networking
Shaurya follows a Heterogeneous Software-in-the-Loop (SIL) architecture:
| Layer | Purpose |
|---|---|
| 🐍 Python | Train ML models (TensorFlow/Keras) |
| ⚙️ C++ | Deterministic inference execution |
| 🔌 RTL-style Risk | Hardware-like safety validation |
| 🔧 LLVM/Clang | Whole-program optimization + LTO fusion |
🚀 Key Features
✅ Deterministic AI Inference
- No Python runtime
- No GIL
- No garbage collection pauses
- Header-only inference
- Eigen-backed linear algebra
✅ FPGA-Style Risk Firewall
- Fat-finger protection
- Kill-switch logic
- Rate limiting
- Price-range validation
- Branchless logic design
✅ Lock-Free Networking
- SPSC ring buffer
- `std::atomic` synchronization
- Cache-line aligned memory (`alignas(64)`)
- Zero-copy FIX handling
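As a rough illustration of zero-copy FIX handling, the sketch below locates a field inside the receive buffer and returns it as a `std::string_view`, so no bytes are copied. The helper name `find_fix_tag` is hypothetical and is not taken from Shaurya's code; only the FIX `tag=value` layout with the SOH (0x01) separator is assumed.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not Shaurya's API): return the value of a FIX tag as a
// view into the original buffer. FIX fields have the form "tag=value" and are
// separated by the SOH byte (0x01).
inline std::optional<std::string_view> find_fix_tag(std::string_view msg,
                                                    std::string_view tag) {
    constexpr char SOH = '\x01';
    std::size_t pos = 0;
    while (pos < msg.size()) {
        std::size_t end = msg.find(SOH, pos);
        if (end == std::string_view::npos) end = msg.size();
        const std::string_view field = msg.substr(pos, end - pos);
        const std::size_t eq = field.find('=');
        if (eq != std::string_view::npos && field.substr(0, eq) == tag) {
            return field.substr(eq + 1);  // still points into msg
        }
        pos = end + 1;  // skip past the separator
    }
    return std::nullopt;  // tag not present
}
```

The returned view is only valid while the underlying buffer is alive and unmodified, which is the usual trade-off of a zero-copy design.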
✅ LLVM Fusion
- `-flto` Link-Time Optimization
- Cross-module inlining
- Dead code elimination
- `-march=native` AVX2 vectorization
- `-ffast-math` throughput optimization
📦 Installation
🛣️ Python Gateway
pip install hft.shaurya==0.2.0
C++ Core Requires:
- LLVM/Clang
- `lld` linker
- C++17 compatible compiler
Build using provided scripts:
clang++ -O3 -flto -march=native -ffast-math ...
Then run:
bin\Shaurya.exe
🔨 Usage Guide
🕐 Step 1: Start Market Gateway
python -m hft.shaurya.gateway
or
python bridge.py
The Python layer:
- Aggregates exchange feeds
- Streams FIX messages locally
- Forwards data to C++ core
🕑 Step 2: Launch LLVM C++ Core
bin\Shaurya.exe
Startup Process:
- Loads AI weights
- Warms CPU instruction cache
- Initializes ring buffers
- Begins live tick processing
🕒 Step 3: Review Metrics
After shutdown (Ctrl + C), Shaurya_Metrics.txt includes:
- Average latency
- 99th percentile
- Tail latency distribution
- Message throughput
🔬 Technical Deep Dive
1️⃣ LLVM/Clang Infrastructure
Shaurya prioritizes LLVM over GCC for:
- Whole-program analysis
- Cross-module inlining
- Vectorized math fusion
- Aggressive dead-code elimination
Compiler flags used:
-flto
-march=native
-ffast-math
2️⃣ Deep Learning Alpha Engine
Model Pipeline
.h5 (Keras)
↓
fdeep_model.json
↓
Header-only C++ inference
Benefits:
- No Python interpreter
- No runtime framework
- Cache-friendly execution
- Deterministic latency
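To make the idea of framework-free, fixed-size inference concrete, here is a minimal sketch of a single dense layer with ReLU. It uses plain `std::array` as a stand-in for Eigen and does not load `fdeep_model.json`; the `DenseRelu` type is hypothetical and only illustrates why compile-time shapes help keep latency predictable.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// One fully connected layer with ReLU: y = max(0, W x + b).
// Shapes are compile-time constants, so storage is fixed-size and the forward
// pass performs no heap allocation.
template <std::size_t In, std::size_t Out>
struct DenseRelu {
    std::array<std::array<float, In>, Out> weights{};
    std::array<float, Out> bias{};

    std::array<float, Out> forward(const std::array<float, In>& x) const {
        std::array<float, Out> y{};
        for (std::size_t o = 0; o < Out; ++o) {
            float acc = bias[o];
            for (std::size_t i = 0; i < In; ++i) {
                acc += weights[o][i] * x[i];  // dot product of row o with x
            }
            y[o] = std::max(acc, 0.0f);  // ReLU activation
        }
        return y;
    }
};
```

In a real build the inner loop would be a candidate for the AVX2 vectorization mentioned earlier, although whether the compiler vectorizes it depends on the flags and target.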
3️⃣ Software-in-the-Loop FPGA Risk Engine
Traditional systems:
if(price > limit) { block(); }
🗺️ Shaurya approach:
- Gate-style evaluation
- Branchless evaluation trees
- Avoids branch predictor penalties
- Emulates RTL-style hardware logic
Sample output:
[FPGA: BLOCKED (FAT FINGER)]
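The gate-style approach can be sketched as follows: every check is evaluated unconditionally and the results are combined with bitwise operations, much like combinational logic. All names here (`Order`, `RiskLimits`, `RiskFlag`, `evaluate_risk`) are hypothetical and chosen for illustration; they are not Shaurya's actual types.

```cpp
#include <cstdint>

// Hypothetical order and limit types, for illustration only.
struct Order {
    std::int64_t price_ticks;
    std::int64_t quantity;
};

struct RiskLimits {
    std::int64_t max_notional;
    std::int64_t min_price_ticks;
    std::int64_t max_price_ticks;
};

// Bit flags naming the check that rejected the order (0 means accepted).
enum RiskFlag : std::uint32_t {
    kFatFinger  = 1u << 0,
    kPriceRange = 1u << 1,
    kKillSwitch = 1u << 2,
};

// Evaluate all checks and OR the results together. Comparisons are converted
// to 0/1 integers and combined with | and * rather than || and if, so the
// source contains no short-circuit control flow.
inline std::uint32_t evaluate_risk(const Order& o, const RiskLimits& lim,
                                   bool kill_switch) {
    const std::int64_t notional = o.price_ticks * o.quantity;
    const auto fat   = static_cast<std::uint32_t>(notional > lim.max_notional);
    const auto below = static_cast<std::uint32_t>(o.price_ticks < lim.min_price_ticks);
    const auto above = static_cast<std::uint32_t>(o.price_ticks > lim.max_price_ticks);
    const auto kill  = static_cast<std::uint32_t>(kill_switch);
    return (fat * kFatFinger) | ((below | above) * kPriceRange) | (kill * kKillSwitch);
}
```

Whether the generated machine code is actually branch-free depends on the compiler and target, so it is worth checking the disassembly rather than assuming it.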
4️⃣ Zero-Copy Lock-Free Pipeline
- Single-producer single-consumer (SPSC)
- Atomic pointer arithmetic
- Cache-aligned buffers
- No mutex locks
- No scheduler interference
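A minimal sketch of an SPSC ring buffer with these properties is shown below. It is an illustrative implementation under common assumptions (power-of-two capacity, 64-byte cache lines, trivially copyable payloads), not the code shipped in Shaurya; the class name `SpscRing` is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer single-consumer ring buffer. Capacity must be a power of
// two so that indices can be wrapped with a bit mask instead of a modulo.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Call from the producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Release: publish the slot write before the new head becomes visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only. Returns nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot may be reused by the producer after this store.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    // Producer and consumer indices sit on separate cache lines to avoid
    // false sharing (64 bytes is assumed, as in the alignas(64) note above).
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

Because each index is written by exactly one thread, acquire/release ordering is sufficient and no mutex or compare-and-swap loop is needed.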
📊 Performance Metrics
💪🏻 Benchmark Method
- Windows `QueryPerformanceCounter`
- Full tick lifecycle measurement:
  - Network Buffer
  - FIX Parse
  - AI Inference
  - FPGA Risk Gate
  - Routing
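The measurement idea can be sketched with a small timing wrapper. Note that this sketch swaps in the portable `std::chrono::steady_clock` rather than calling `QueryPerformanceCounter` directly, and the function name `time_tick_us` is hypothetical.

```cpp
#include <chrono>
#include <utility>

// Run one complete tick lifecycle (any callable) and return the elapsed
// wall-clock time in microseconds, measured with a monotonic clock.
template <typename Fn>
double time_tick_us(Fn&& process_tick) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(process_tick)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Wrapping the whole lifecycle in one callable keeps the timer overhead outside the individual stages, at the cost of not attributing time to each stage separately.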
✅ Results
| Metric | Value |
|---|---|
| Messages Tested | 1000+ |
| Minimum Latency | 3.6 µs |
| Average Latency | 88.38 µs |
| 99th Percentile | 237.0 µs |
99% of messages complete the full pipeline in under 0.25 milliseconds (237.0 µs at the 99th percentile), even under OS scheduler load.
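For readers who want to reproduce this kind of summary from their own samples, the sketch below computes the minimum, average, and 99th percentile of a list of per-message latencies. It assumes the nearest-rank percentile definition, which may differ from the method Shaurya uses internally; `LatencySummary`, `percentile`, and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencySummary {
    double min_us;
    double avg_us;
    double p99_us;
};

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Requires a non-empty input.
inline double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[std::max<std::size_t>(rank, 1) - 1];
}

// Summarize a non-empty set of latency samples given in microseconds.
inline LatencySummary summarize(const std::vector<double>& samples) {
    const double total = std::accumulate(samples.begin(), samples.end(), 0.0);
    return LatencySummary{
        *std::min_element(samples.begin(), samples.end()),
        total / static_cast<double>(samples.size()),
        percentile(samples, 99.0),
    };
}
```

With only around a thousand samples, the 99th percentile rests on roughly the ten slowest messages, so it can shift noticeably between runs.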
🔩 Configuration
Key Optimization Flags
-O3
-flto
-march=native
-ffast-math
🍁 Recommended System Tuning
- Disable power-saving modes
- Pin threads to dedicated CPU cores
- Use performance CPU governor (Linux)
- Disable unnecessary background processes
💡 Examples
Running a Trained Model
- Train model in Python
- Export `.h5`
- Convert to `fdeep_model.json`
- Place model in inference directory
- Launch core engine
Risk Rule Example
RiskGate fatFinger( max_notional = 1'000'000 );
RiskGate priceClamp( max_slippage = 0.5% );
🎯 Who Benefits?
📈 Retail & Quant Traders
- AI-driven live execution
- Sub-millisecond architecture
- Institutional-grade safety
🏢 Proprietary Firms
- Rapid FPGA prototyping (SIL)
- Deterministic backtesting
- Infrastructure experimentation
🎓 Computer Science Students
Real-world examples of:
- Lock-free systems
- LLVM optimization
- Vectorized math
- HPC finance pipelines
🚀 Roadmap
- GPU kernel fusion experiments
- Native FPGA backend
- Linux ultra-low-latency build
- Advanced order routing simulator
- Real exchange connectivity modules
🧪 Troubleshooting
📈 High Latency Spikes
- Verify CPU scaling disabled
- Ensure LTO enabled
- Confirm AVX2 available
🙅🏻♀️ Model Not Loading
- Validate `fdeep_model.json`
- Ensure correct path
- Check weight precision compatibility
🏢 Build Issues
- Confirm Clang version compatibility
- Ensure `lld` installed
- Rebuild with verbose logging
☢️ Disclaimer
Shaurya is intended for:
- Research
- Education
- Systems experimentation
It is **not financial advice** and **not production-certified trading infrastructure**.
Users assume full responsibility for:
- Trading decisions
- Compliance
- Regulatory adherence
- Capital risk
🏁 Final Note
The engine is developed solely by me, Harshit Kumar Singh (;
Shaurya v.0.2.0 represents a shift toward democratized institutional-grade infrastructure: merging AI, compiler engineering, and hardware-style safety into a single deterministic execution engine.
If this project helps you, consider ⭐ starring the repository and contributing to future releases. Until then, happy coding 😊.
ad astra per aspera 🛩️