Safe, robots.txt-respecting scraper for public NDTV Profit news data — for research and NLP sentiment training.
ndtv_profit_scraper_safe
A safe, robots.txt-respecting scraper library for collecting public NDTV Profit / NDTV business-news data for Indian market sentiment, swing trading research, and NLP training.
Rules
- Respects robots.txt.
- Does NOT bypass login, paywall, captcha, Cloudflare, anti-bot systems, or rate limits.
- Adds a delay between requests, retries on failure, request timeouts, and logging.
- Saves raw and clean data; returns pandas DataFrames.
- Intended for research and NLP sentiment training only.
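The first three rules above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual `robots_checker.py` or `http_client.py`: the names `build_robot_parser` and `polite_fetch`, the `research-bot` user agent, and the default delay and retry counts are all assumptions made for the example. The network call is passed in as a function so the policy can be exercised offline.

```python
import logging
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional

logger = logging.getLogger("polite_fetch_sketch")


def build_robot_parser(robots_lines: Iterable[str]) -> urllib.robotparser.RobotFileParser:
    """Build a parser from the lines of an already-downloaded robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(list(robots_lines))
    return parser


def polite_fetch(
    url: str,
    fetch: Callable[[str], str],
    parser: urllib.robotparser.RobotFileParser,
    user_agent: str = "research-bot",   # hypothetical user agent
    delay_seconds: float = 2.0,         # assumed default, not from the package
    max_retries: int = 3,               # assumed default, not from the package
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch a URL only if robots.txt allows it, pausing before every attempt."""
    if not parser.can_fetch(user_agent, url):
        logger.info("Skipping %s: disallowed by robots.txt", url)
        return None

    for attempt in range(1, max_retries + 1):
        sleep(delay_seconds)  # fixed pause before each request, including retries
        try:
            return fetch(url)
        except OSError as exc:  # urllib and requests network errors derive from OSError
            logger.warning("Attempt %d/%d failed for %s: %s", attempt, max_retries, url, exc)

    logger.error("Giving up on %s after %d attempts", url, max_retries)
    return None
```

In real use, `fetch` would wrap an HTTP call that sets its own timeout, for example `lambda u: urllib.request.urlopen(u, timeout=10).read().decode()`. A disallowed URL returns `None` without any request being made.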
Data Categories
Latest news, Markets, Stocks, Business, Economy, Companies, IPO, Personal Finance, Mutual Funds, Commodities, Currency, Videos metadata, Market analysis, Expert views.
Folder Structure
ndtv_profit_scraper_safe/
├── requirements.txt
├── main.py
├── README.md
├── ndtv_profit_scraper/
│   ├── __init__.py
│   ├── config.py
│   ├── http_client.py
│   ├── robots_checker.py
│   ├── url_collector.py
│   ├── html_collector.py
│   ├── parser.py
│   ├── sentiment.py
│   └── storage.py
└── data/
    ├── raw/
    └── clean/
Run
pip install -r requirements.txt
python main.py
Next Improvements
- Add sitemap collector
- Add RSS collector if feed endpoints are confirmed
- Add stock-symbol mapping using NSE master list
- Add impact score: Low / Medium / High
- Add SQLite upsert to avoid duplicates
- Add FastAPI endpoints
- Add scheduler
- Add PostgreSQL storage
- Combine with Moneycontrol, ET, LiveMint, CNBC-TV18, Business Standard
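As a rough idea of how the planned SQLite upsert could avoid duplicates, the sketch below keys each article on its URL and overwrites the stored row when the same URL is scraped again. The `articles` table, its columns, and the `upsert_articles` helper are hypothetical and not part of the current package. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical schema: the article URL serves as the de-duplication key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url          TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    published_at TEXT,
    body         TEXT
)
"""

# Insert a new row, or update the existing row when the URL is already stored.
UPSERT = """
INSERT INTO articles (url, title, published_at, body)
VALUES (:url, :title, :published_at, :body)
ON CONFLICT(url) DO UPDATE SET
    title        = excluded.title,
    published_at = excluded.published_at,
    body         = excluded.body
"""


def upsert_articles(conn: sqlite3.Connection, rows: Iterable[Mapping[str, str]]) -> int:
    """Upsert the given rows and return the total number of stored articles."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Running the scraper twice over the same pages would then leave one row per URL, with the most recently scraped title and body.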
Download files
File details
Details for the file ndtv_profit_scraper_safe-0.1.0.tar.gz.
File metadata
- Download URL: ndtv_profit_scraper_safe-0.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c70b407361fda743eb3242a9e2abc4fe8f96572fd8fc626266b45c2092c0c58 |
| MD5 | a828c50551bbe00a411858a7b706f9f7 |
| BLAKE2b-256 | ce7acf2885e1551196868f247c59bbbcdaae5ed63556e3a88e4b8bfa9ad6faa6 |
File details
Details for the file ndtv_profit_scraper_safe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ndtv_profit_scraper_safe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2851c40018775bcc35b4db0d4f0a3a07d9995710372fb9e6789b40764962833 |
| MD5 | 99adbcac150551bea1ab21ba19362f8f |
| BLAKE2b-256 | 611dd73243b2e7f46847f8078925825cee9c450af0ed6423fac15e3430d7b158 |