et-scraper-safe
A polite, RSS-first Python library for collecting public Economic Times news headlines and tagging them with simple sentiment — built for market and swing-trading research pipelines.
Why "safe"?
This library is designed to be a good citizen of the web:
- ✅ Respects `robots.txt` before fetching any HTML page
- ✅ Prefers RSS feeds over HTML scraping
- ✅ Adds a configurable delay between HTML requests
- ✅ Sends a clear, identifiable `User-Agent`
- ❌ Does not bypass logins, paywalls, captchas, Cloudflare, or rate limits
- ❌ Does not scrape any content the publisher has restricted
If robots.txt disallows a URL, the request is skipped — full stop.
Install
```
pip install et-scraper-safe
```
Requires Python 3.9+.
Quick start
1. As a command-line tool
After install, a console command is available:
```
et-scraper-safe
```
This will:
- Fetch all configured Economic Times RSS feeds.
- Score each headline's sentiment.
- Save raw + cleaned CSVs into `./data/raw/` and `./data/clean/`.
- Print a summary of bullish / bearish / neutral counts.
2. As a Python library
```python
from et_scraper import (
    fetch_all_rss_news,
    sentiment_score,
    sentiment_label,
    save_dataframe,
)

df = fetch_all_rss_news()
df["sentiment_score"] = df["title"].apply(sentiment_score)
df["sentiment_label"] = df["sentiment_score"].apply(sentiment_label)

print(df.head())
save_dataframe(df, folder="data/clean", name="et_news_clean")
```
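Once the rows are scored, summarizing them needs no extra dependencies. A minimal sketch using plain Python (the sample headlines below are made up; real rows would come from `df.to_dict("records")`):

```python
from collections import Counter

# Sample rows shaped like the library's output; titles are illustrative only.
rows = [
    {"title": "Sensex surges 500 points", "sentiment_label": "Bullish"},
    {"title": "Rupee slips to record low", "sentiment_label": "Bearish"},
    {"title": "RBI keeps rates unchanged", "sentiment_label": "Neutral"},
]

# Keep only bullish headlines and tally the overall label distribution.
bullish = [r["title"] for r in rows if r["sentiment_label"] == "Bullish"]
summary = Counter(r["sentiment_label"] for r in rows)

print(bullish)        # ['Sensex surges 500 points']
print(dict(summary))  # {'Bullish': 1, 'Bearish': 1, 'Neutral': 1}
```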
Categories collected
| Category | Source |
|---|---|
| `latest` | Top stories RSS |
| `markets` | Markets RSS |
| `stocks` | Stocks RSS |
| `economy` | Economy RSS |
| `business` | Company / business RSS |
| `ipo` | IPO RSS |
| `mutual_funds` | Mutual funds RSS |
| `commodities` | Commodities RSS |
| `forex` | Forex RSS |
Feed URLs are defined in `et_scraper/config.py` and can be extended.
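Extending the feed map is just adding key/URL pairs. The dict name (`RSS_FEEDS`) and the URLs below are assumptions for illustration, not the library's actual config — check `et_scraper/config.py` for the real layout:

```python
# Illustrative feed map; names and URLs are placeholders, not real config values.
RSS_FEEDS = {
    "latest": "https://example.com/rss/topstories.xml",
    "markets": "https://example.com/rss/markets.xml",
}

# Adding a new category is one more key/URL pair.
RSS_FEEDS["tech"] = "https://example.com/rss/tech.xml"

print(sorted(RSS_FEEDS))  # ['latest', 'markets', 'tech']
```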
Output schema
Each row of the returned pandas.DataFrame has:
| Column | Description |
|---|---|
| `source` | Always `"economic_times"` |
| `category` | One of the categories above |
| `title` | Article headline |
| `summary` | RSS summary / description |
| `link` | Canonical article URL |
| `published` | Publish timestamp from the feed |
| `fetched_at` | UTC ISO timestamp when the row was collected |
| `sentiment_score` | Integer = positive_word_count − negative_word_count |
| `sentiment_label` | `"Bullish"`, `"Bearish"`, or `"Neutral"` |
Example:
```
source,category,title,summary,link,published,sentiment_score,sentiment_label
economic_times,stocks,Tata Motors shares rally...,...,link,...,2,Bullish
economic_times,economy,Rupee falls against dollar...,...,link,...,-1,Bearish
```
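The exported CSVs can be read back with nothing but the standard library. A small sketch parsing rows like the example (columns abbreviated, and note that CSV fields arrive as strings):

```python
import csv
import io

# Two rows shaped like the export above, with some columns dropped for brevity.
data = """source,category,title,sentiment_score,sentiment_label
economic_times,stocks,Tata Motors shares rally...,2,Bullish
economic_times,economy,Rupee falls against dollar...,-1,Bearish
"""

rows = list(csv.DictReader(io.StringIO(data)))

# sentiment_score is read as text; cast it before doing arithmetic.
scores = [int(r["sentiment_score"]) for r in rows]
print(scores)                      # [2, -1]
print(rows[0]["sentiment_label"])  # Bullish
```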
Public API
| Symbol | What it does |
|---|---|
| `fetch_all_rss_news() -> DataFrame` | Fetch all configured RSS feeds into a DataFrame. |
| `sentiment_score(text: str) -> int` | Lexicon-based score: positive − negative word counts. |
| `sentiment_label(score: int) -> str` | Map a score to `"Bullish"` / `"Bearish"` / `"Neutral"`. |
| `save_dataframe(df, folder, name)` | Save a DataFrame to a timestamped CSV; returns the path. |
| `__version__` | Library version string. |
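The scoring rule is simple enough to restate. This toy re-implementation shows the positive-minus-negative logic; the lexicon here is made up for illustration and is not the library's actual word list:

```python
# Toy lexicon — illustrative only; the library ships its own word lists.
POSITIVE = {"rally", "surge", "gain"}
NEGATIVE = {"fall", "drop", "slump"}

def toy_sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the headline."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def toy_sentiment_label(score: int) -> str:
    """Map the integer score to a trading-flavored label."""
    if score > 0:
        return "Bullish"
    if score < 0:
        return "Bearish"
    return "Neutral"

print(toy_sentiment_score("Sensex surge lifts markets"))  # 1
```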
Lower-level helpers (use only if you really need raw HTML):
| Symbol | Module |
|---|---|
| `can_fetch(url, user_agent="*") -> bool` | `et_scraper.robots_checker` |
| `fetch_public_page(url) -> BeautifulSoup \| None` | `et_scraper.html_collector` |
| `extract_headlines(soup) -> list[str]` | `et_scraper.parser` |
| `extract_article_text(soup) -> str` | `et_scraper.parser` |
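The robots enforcement that `can_fetch` performs can be approximated with the standard library's `urllib.robotparser`. A self-contained sketch parsing an inline `robots.txt` (the library, of course, fetches the site's real one):

```python
from urllib.robotparser import RobotFileParser

# Inline robots.txt for illustration; in practice this is fetched from the site.
rules = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/news/article.html"))   # True
print(rp.can_fetch("*", "https://example.com/private/draft.html"))  # False
```

Any URL that falls under a `Disallow` rule for the crawler's user agent is simply never requested.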
Use in a swing trading pipeline
```
Economic Times News   (this library)
         ↓
Headline Sentiment    (this library)
         ↓
Stock Symbol Mapping  (your code)
         ↓
Technical Indicators  (your code)
         ↓
Final Swing Score     (your code)
```
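The "Stock Symbol Mapping" step is up to you; one naive approach is substring matching against a hand-maintained keyword map. The tickers below are illustrative examples, not a shipped dataset:

```python
# Hand-maintained keyword → ticker map; entries are illustrative only.
SYMBOL_MAP = {
    "tata motors": "TATAMOTORS",
    "reliance": "RELIANCE",
    "infosys": "INFY",
}

def map_symbols(headline: str) -> list[str]:
    """Return tickers whose keyword appears in the headline (naive substring match)."""
    text = headline.lower()
    return [ticker for kw, ticker in SYMBOL_MAP.items() if kw in text]

print(map_symbols("Tata Motors shares rally after strong Q3"))  # ['TATAMOTORS']
print(map_symbols("Rupee falls against dollar"))                # []
```

Substring matching is crude (it misses abbreviations and can over-match); an NER model or a symbol-alias table is the usual next step.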
Project layout
```
et_scraper_safe/
├── pyproject.toml        # PyPI packaging metadata
├── LICENSE               # MIT
├── README.md
├── requirements.txt      # For running from source
├── main.py               # Convenience runner (same as the CLI)
├── et_scraper/
│   ├── __init__.py
│   ├── cli.py            # Entry point for `et-scraper-safe` console command
│   ├── config.py         # RSS feed URLs, headers, timeouts
│   ├── robots_checker.py # robots.txt enforcement
│   ├── rss_collector.py  # RSS → DataFrame
│   ├── html_collector.py # Polite, robots-aware HTML fetcher
│   ├── parser.py         # Headline / article-text extraction
│   ├── sentiment.py      # Lexicon-based sentiment
│   └── storage.py        # Timestamped CSV writer
└── data/
    ├── raw/              # Raw scraped CSVs
    └── clean/            # Cleaned + scored CSVs
```
Development
Run from source:
```
git clone <your-fork>
cd et_scraper_safe
pip install -r requirements.txt
python main.py
```
Build + publish a new version (maintainers only):
```
# 1. Bump version in pyproject.toml and et_scraper/__init__.py
# 2. Build and upload
rm -rf dist build *.egg-info
python -m build
TWINE_USERNAME=__token__ TWINE_PASSWORD="$PYPI_API_TOKEN" python -m twine upload dist/*
```
Disclaimer
This library only collects data that the Economic Times publishes openly via
RSS or pages allowed by their robots.txt. It is intended for personal
research and educational use. You are responsible for complying with the
Economic Times' Terms of Service and any applicable laws when using this
library or the data it collects.
License
MIT — see LICENSE.
File details
Details for the file et_scraper_safe-0.2.0.tar.gz.
File metadata
- Download URL: et_scraper_safe-0.2.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ace99dc325b4975ee259a418664790f376268f89831dd93062161fb082d28d41` |
| MD5 | `e6e6d5aaa270c9688c7cc2cb70296588` |
| BLAKE2b-256 | `f1af01c4df8ef38ed5db6b591b229dadb2276598c0cc481cba390276230648e1` |
File details
Details for the file et_scraper_safe-0.2.0-py3-none-any.whl.
File metadata
- Download URL: et_scraper_safe-0.2.0-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fd2e6e63795a78df06e50b3fccd7f9365770f86ac92d8fa85663f25746bc4835` |
| MD5 | `6bad8a3d5df176d8340d923c86c7f350` |
| BLAKE2b-256 | `4e50dc4f15d820bae020d2b73229d7674e09f9d00b18a4c9d9ac340cf6bd8652` |