Safe, robots.txt-respecting scraper for public NDTV Profit news data — for research and NLP sentiment training.

Project description

ndtv_profit_scraper_safe

A safe, robots.txt-respecting scraper library for collecting public NDTV Profit and NDTV business-news data, intended for Indian market sentiment analysis, swing-trading research, and NLP training.

Rules

  • Respects robots.txt.
  • Does NOT bypass login, paywall, captcha, Cloudflare, anti-bot systems, or rate limits.
  • Adds a delay between requests, retries failed requests, enforces timeouts, and logs activity.
  • Saves both raw and cleaned data (under data/raw/ and data/clean/) and returns pandas DataFrames.
  • Intended for research and NLP sentiment training only.
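
The rules above can be illustrated with a minimal sketch, using only the Python standard library. It is not the package's actual implementation: the names `USER_AGENT`, `build_robots_parser`, and `polite_fetch` are hypothetical, and the `fetch` callable stands in for whatever HTTP client the library uses.

```python
import logging
import time
import urllib.robotparser

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("polite_fetch")

USER_AGENT = "research-bot"  # hypothetical user agent string


def build_robots_parser(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(url, fetch, robots, retries=3, delay=1.0, sleep=time.sleep):
    """Fetch url only if robots.txt allows it, pausing before each attempt.

    fetch is any callable that takes a URL and returns the page content,
    raising OSError (which includes URLError and timeouts) on failure.
    Returns None if the URL is disallowed or every attempt fails.
    """
    if not robots.can_fetch(USER_AGENT, url):
        log.info("Skipping %s: disallowed by robots.txt", url)
        return None
    for attempt in range(1, retries + 1):
        sleep(delay)  # fixed pause before every request, including retries
        try:
            return fetch(url)
        except OSError as exc:
            log.warning("Attempt %d/%d for %s failed: %s", attempt, retries, url, exc)
    return None
```

With a real client, `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read()`, so that the timeout is enforced per request. Injecting `fetch` and `sleep` keeps the sketch testable without network access.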

Data Categories

Latest news, Markets, Stocks, Business, Economy, Companies, IPO, Personal Finance, Mutual Funds, Commodities, Currency, Video metadata, Market analysis, Expert views.

Folder Structure

ndtv_profit_scraper_safe/
├── requirements.txt
├── main.py
├── README.md
├── ndtv_profit_scraper/
│   ├── __init__.py
│   ├── config.py
│   ├── http_client.py
│   ├── robots_checker.py
│   ├── url_collector.py
│   ├── html_collector.py
│   ├── parser.py
│   ├── sentiment.py
│   └── storage.py
└── data/
    ├── raw/
    └── clean/

Run

pip install -r requirements.txt
python main.py

Next Improvements

  1. Add sitemap collector
  2. Add RSS collector if feed endpoints are confirmed
  3. Add stock-symbol mapping using NSE master list
  4. Add impact score: Low / Medium / High
  5. Add SQLite upsert to avoid duplicates
  6. Add FastAPI endpoints
  7. Add scheduler
  8. Add PostgreSQL storage
  9. Combine with Moneycontrol, ET, LiveMint, CNBC-TV18, Business Standard
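
As an illustration of item 5, here is one way a SQLite upsert keyed on the article URL might look. This is a sketch under assumptions, not code from the package: the `articles` table, its columns, and the `init_db` and `upsert_article` helpers are all hypothetical, and the `ON CONFLICT ... DO UPDATE` syntax requires SQLite 3.24 or newer.

```python
import sqlite3


def init_db(conn):
    """Create a hypothetical articles table keyed on the article URL."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            url          TEXT PRIMARY KEY,
            title        TEXT NOT NULL,
            published_at TEXT,
            body         TEXT
        )
        """
    )


def upsert_article(conn, article):
    """Insert an article, or update the existing row if the URL is already stored."""
    conn.execute(
        """
        INSERT INTO articles (url, title, published_at, body)
        VALUES (:url, :title, :published_at, :body)
        ON CONFLICT(url) DO UPDATE SET
            title        = excluded.title,
            published_at = excluded.published_at,
            body         = excluded.body
        """,
        article,
    )
    conn.commit()
```

Calling `upsert_article` twice with the same `url` leaves a single row holding the most recent values, so re-running the scraper would not create duplicates.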

Download files

Source Distribution

ndtv_profit_scraper_safe-0.1.0.tar.gz (12.4 kB view details)

Uploaded Source

Built Distribution

ndtv_profit_scraper_safe-0.1.0-py3-none-any.whl (15.1 kB view details)

Uploaded Python 3

File details

Details for the file ndtv_profit_scraper_safe-0.1.0.tar.gz.

File metadata

  • Download URL: ndtv_profit_scraper_safe-0.1.0.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for ndtv_profit_scraper_safe-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1c70b407361fda743eb3242a9e2abc4fe8f96572fd8fc626266b45c2092c0c58
MD5 a828c50551bbe00a411858a7b706f9f7
BLAKE2b-256 ce7acf2885e1551196868f247c59bbbcdaae5ed63556e3a88e4b8bfa9ad6faa6
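
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest of the source distribution, as listed above.
EXPECTED_SHA256 = "1c70b407361fda743eb3242a9e2abc4fe8f96572fd8fc626266b45c2092c0c58"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("ndtv_profit_scraper_safe-0.1.0.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download saved in the current directory.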

File details

Details for the file ndtv_profit_scraper_safe-0.1.0-py3-none-any.whl.

File hashes

Hashes for ndtv_profit_scraper_safe-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b2851c40018775bcc35b4db0d4f0a3a07d9995710372fb9e6789b40764962833
MD5 99adbcac150551bea1ab21ba19362f8f
BLAKE2b-256 611dd73243b2e7f46847f8078925825cee9c450af0ed6423fac15e3430d7b158
