Fast, robots.txt-respecting NSE India market data collector for swing trading, quant research, and backtesting
nsefast
Fast NSE India data collector for swing trading, quant research, AI training, backtesting, and market intelligence.
⚠️ Ethics & Compliance:
nsefast only uses publicly downloadable NSE reports and pages allowed by NSE's robots.txt. It does not bypass logins, captchas, Cloudflare, anti-bot systems, or rate limits. Add appropriate delays and use responsibly. You are responsible for complying with NSE's terms of service.
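As a rough illustration of what a robots.txt check involves, the sketch below uses Python's standard `urllib.robotparser`. It is not nsefast's own `robots.py`; the sample rules, the `example.com` URLs, and the helper names `build_checker` and `is_allowed` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; a real checker would load
# the target site's own /robots.txt instead.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /api/
Allow: /
"""


def build_checker(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a checker without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


checker = build_checker(SAMPLE_ROBOTS)
print(is_allowed(checker, "https://example.com/reports/a.csv"))
print(is_allowed(checker, "https://example.com/api/quote"))
```

A collector following this pattern would skip any URL for which the check returns False rather than attempt the request.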
Features
- Polite, retrying HTTP client with robots.txt checks
- Modular collectors for equity, derivatives, corporate, deals, indices, surveillance, calendar, and master data
- Polars for fast dataframe processing
- Parquet primary storage, partitioned by dataset/date
- DuckDB local analytics layer
- Optional PostgreSQL storage
- Optional Rust core (rust-core/) for hashing / dedup / large parsing
- Typer-based CLI
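To make "polite, retrying" concrete, here is a minimal sketch of retrying with exponential backoff. It is an assumption about the general technique, not a copy of nsefast's `http_client.py`; `backoff_delays` and `fetch_with_retries` are hypothetical names, and the delays and retry count are illustrative defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays that double on each retry (base, 2*base, 4*base, ...), capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), sleeping with growing delays after each OSError."""
    for delay in backoff_delays(retries, base):
        try:
            return fetch()
        except OSError:
            sleep(delay)  # wait before the next attempt
    return fetch()  # final attempt: any error propagates to the caller
```

Passing `sleep` as a parameter keeps the function testable without real waiting; production code would leave it at `time.sleep`.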
Install
pip install nsefast
Optional extras:
pip install "nsefast[pandas]" # pandas export helpers
pip install "nsefast[postgres]" # PostgreSQL sink
pip install "nsefast[api]" # FastAPI server scaffold
pip install "nsefast[dev]" # pytest, ruff, build, twine
For development:
git clone https://github.com/nikhilshinde/nsefast
cd nsefast
pip install -e ".[dev]"
pytest -q
Quick start
# Discover all downloadable report links from NSE public pages
nsefast collect-reports
# Run the full scaffold
nsefast collect-all
# Equity bhavcopy for a date
nsefast collect equity-bhavcopy --date 2026-05-07
# Corporate announcements range
nsefast collect corporate-announcements --start 2026-05-01 --end 2026-05-07
# Build swing-trading features
nsefast features swing --date 2026-05-07
# Export a dataset to Parquet
nsefast export parquet --dataset daily_bhavcopy
In Python:
from nsefast.collectors.report_links import collect_report_links
from nsefast.storage.parquet_store import save_parquet
df = collect_report_links() # polars DataFrame
save_parquet(df, dataset="report_links")
Project layout
nsefast/
├── pyproject.toml
├── requirements.txt
├── main.py
├── README.md
│
├── nsefast/
│   ├── config.py          # URLs, headers, paths
│   ├── http_client.py     # session + retries
│   ├── robots.py          # robots.txt checker
│   ├── collectors/        # one module per data domain
│   ├── processing/        # normalize, features, technicals
│   ├── storage/           # parquet, duckdb, postgres
│   └── cli.py             # Typer CLI
│
└── rust-core/             # optional pyo3 module
    ├── Cargo.toml
    └── src/lib.rs
Storage zones
- data/raw/: raw downloads exactly as fetched
- data/clean/: normalized intermediate files
- data/parquet/: partitioned Parquet, the canonical store
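The partitioned layout under data/parquet/ can be pictured with a small path helper. This is a sketch only: `partition_dir` is a hypothetical function, and the year/month scheme is assumed from the example path shown in the partitioning section of this README.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_dir(dataset: str, trade_date: date, root: str = "data/parquet") -> PurePosixPath:
    """Build a Hive-style directory such as <root>/<dataset>/year=YYYY/month=MM."""
    return (
        PurePosixPath(root)
        / dataset
        / f"year={trade_date.year}"
        / f"month={trade_date.month:02d}"  # zero-padded so months sort lexically
    )


print(partition_dir("daily_bhavcopy", date(2026, 5, 7)))
```

Encoding the partition values in directory names lets readers skip whole directories when a query filters on year or month.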
Rust core (optional)
The rust-core/ crate exposes an nsefast_core Python module via
PyO3 for CPU-bound work (SHA-256 hashing, dedup,
fast CSV normalization). HTTP scraping stays in Python because it is I/O bound.
Build with maturin:
cd rust-core
maturin develop --release
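For readers without the Rust toolchain, the kind of hashing and dedup work described above can be sketched in pure Python with `hashlib`. This is not the `nsefast_core` API; `row_digest` and `dedup_rows` are hypothetical names, and treating a row as a list of strings is an assumption for the example.

```python
import hashlib


def row_digest(fields: list[str]) -> str:
    """SHA-256 hex digest of a row, joined with a separator unlikely to occur in data."""
    joined = "\x1f".join(fields)  # ASCII unit separator between fields
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dedup_rows(rows: list[list[str]]) -> list[list[str]]:
    """Keep the first occurrence of each distinct row, preserving input order."""
    seen: set[str] = set()
    unique: list[list[str]] = []
    for row in rows:
        digest = row_digest(row)
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


rows = [
    ["INFY", "2026-05-07", "1500.0"],
    ["TCS", "2026-05-07", "3900.0"],
    ["INFY", "2026-05-07", "1500.0"],  # exact duplicate of the first row
]
print(len(dedup_rows(rows)))
```

Hashing each row keeps the seen-set small and fixed-width, which is the part a compiled implementation would speed up on large files.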
Verify your install
pip install nsefast
nsefast verify # offline checks: imports, parquet, duckdb
nsefast verify --network # also pings NSE warm-up + robots.txt
nsefast version
Cache, logging, partitioning
# Cache (5-min TTL by default; collectors opt in via cached_get())
nsefast cache stats
nsefast cache clear
# Structured JSON logs (for production / log shippers)
NSEFAST_LOG_FORMAT=json NSEFAST_LOG_LEVEL=INFO nsefast collect bulk-deals --start 2026-04-01 --end 2026-05-07
# Hive-partitioned parquet writes
from nsefast.storage.parquet_store import (
    save_parquet_partitioned, read_parquet_partitioned, derive_date_partitions,
)
df = derive_date_partitions(df, "trade_date", parts=("year", "month"))
save_parquet_partitioned(df, dataset="daily_bhavcopy", by=["year", "month"])
# -> data/parquet/daily_bhavcopy/year=2026/month=05/*.parquet
q1 = read_parquet_partitioned(
    "daily_bhavcopy",
    filters=[("year", "==", 2026), ("month", ">=", 4)],
)
# DuckDB analytics
from nsefast.storage.duckdb_store import (
    connect, register_all, top_gainers, sector_leaderboard,
)
con = connect()
register_all(con)
top_gainers(con, dataset="all_indices", n=10)
sector_leaderboard(con, dataset="sector_strength")
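The 5-minute TTL mentioned for the cache can be illustrated with a small in-memory sketch. It does not reproduce nsefast's `cached_get()`; the `TTLCache` class, its method names, and the in-memory storage are assumptions made for the example.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh, otherwise call fetch() and store it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value

    def clear(self) -> None:
        """Drop every cached entry."""
        self._entries.clear()
```

With a TTL in place, repeated requests for the same URL inside the window are served locally, which also reduces load on the upstream site.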
Documentation
- docs/USAGE.md: full Python + CLI usage, canonical schemas, polite-use rules
- docs/PUBLISHING.md: how to release new versions to PyPI
- CHANGELOG.md: version history
Failure semantics
Every public collector returns a Polars DataFrame with its canonical schema, even on failure (invalid input, network error, malformed payload, Polars error, or robots.txt block). Collectors never raise, so a failed fetch cannot crash a pipeline; callers need to inspect the returned frame rather than catch exceptions.
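The never-raise contract can be sketched as a decorator that swaps any exception for a schema-shaped fallback. This is only an illustration under stated assumptions: a plain dict of column lists stands in for a Polars DataFrame, the fallback is assumed to be empty, and `never_raise`, `empty_table`, `collect_bhavcopy`, and the column names are all hypothetical.

```python
import functools
from typing import Callable

Table = dict[str, list]  # stand-in for a Polars DataFrame in this sketch


def empty_table(schema: tuple[str, ...]) -> Table:
    """An empty table that still carries every column of the schema."""
    return {column: [] for column in schema}


def never_raise(schema: tuple[str, ...]) -> Callable:
    """Decorator: on any exception, return an empty table with the given schema."""
    def decorator(collect: Callable[..., Table]) -> Callable[..., Table]:
        @functools.wraps(collect)
        def wrapper(*args, **kwargs) -> Table:
            try:
                return collect(*args, **kwargs)
            except Exception:
                return empty_table(schema)
        return wrapper
    return decorator


BHAVCOPY_SCHEMA = ("symbol", "trade_date", "close")  # hypothetical columns


@never_raise(BHAVCOPY_SCHEMA)
def collect_bhavcopy(raw_rows: list[dict]) -> Table:
    table = empty_table(BHAVCOPY_SCHEMA)
    for row in raw_rows:
        for column in BHAVCOPY_SCHEMA:
            table[column].append(row[column])  # KeyError on a malformed row
    return table
```

Under this pattern a caller distinguishes success from failure by checking whether the result has rows, since no exception will ever signal the problem.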
Tests
pytest -q # 77 unit tests, no network calls
License
MIT — see LICENSE
File details
Details for the file nsefast-0.1.1.tar.gz.
File metadata
- Download URL: nsefast-0.1.1.tar.gz
- Upload date:
- Size: 39.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93623dc1ab596eea36657d3876aae9fe1f7a52c3f0cb662fbd1a05efd957db39 |
| MD5 | 1924283f494f23a52f036a6d082125af |
| BLAKE2b-256 | ab96cc8bca00f2b15add18233cb6effbe7e3c2d7845bd7ba98285a45b1cba25f |
File details
Details for the file nsefast-0.1.1-py3-none-any.whl.
File metadata
- Download URL: nsefast-0.1.1-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 137e3ea00a9bfd73c3987ff97d0dda6b1b9da3c9ea290bcb1953af77c66ca6a7 |
| MD5 | ccec3d20a30d2dec670d17715a7edb02 |
| BLAKE2b-256 | fde917ffbe0103f8ade3b65597b18342e6990cb089fc2d741cc57da54a6da7bf |