
et-scraper-safe


A polite, RSS-first Python library for collecting public Economic Times news headlines and tagging them with simple sentiment — built for market and swing-trading research pipelines.

PyPI: https://pypi.org/project/et-scraper-safe/


Why "safe"?

This library is designed to be a good citizen of the web:

  • ✅ Respects robots.txt before fetching any HTML page
  • ✅ Prefers RSS feeds over HTML scraping
  • ✅ Adds a configurable delay between HTML requests
  • ✅ Sends a clear, identifiable User-Agent
  • ❌ Does not bypass logins, paywalls, captchas, Cloudflare, or rate limits
  • ❌ Does not scrape any content the publisher has restricted

If robots.txt disallows a URL, the request is skipped — full stop.
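The robots.txt gate can be illustrated with the standard library's urllib.robotparser. This is a sketch of the idea only, not the actual et_scraper.robots_checker implementation, and the sample robots.txt rules below are invented:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; the real rules come from the site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /
"""

def can_fetch(url: str, user_agent: str = "*") -> bool:
    """Return True only if the (sample) robots rules allow this URL."""
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS.splitlines())
    return parser.can_fetch(user_agent, url)

print(can_fetch("https://example.com/markets/news"))   # True  -> fetch allowed
print(can_fetch("https://example.com/private/page"))   # False -> request skipped
```

Anything that returns False is simply never requested.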


Install

pip install et-scraper-safe

Requires Python 3.9+.


Quick start

1. As a command-line tool

After install, a console command is available:

et-scraper-safe

This will:

  1. Fetch all configured Economic Times RSS feeds.
  2. Score each headline's sentiment.
  3. Save raw + cleaned CSVs into ./data/raw/ and ./data/clean/.
  4. Print a summary of bullish / bearish / neutral counts.

2. As a Python library

from et_scraper import (
    fetch_all_rss_news,
    sentiment_score,
    sentiment_label,
    save_dataframe,
)

df = fetch_all_rss_news()
df["sentiment_score"] = df["title"].apply(sentiment_score)
df["sentiment_label"] = df["sentiment_score"].apply(sentiment_label)

print(df.head())
save_dataframe(df, folder="data/clean", name="et_news_clean")
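In spirit, save_dataframe writes rows to a CSV whose filename carries a timestamp. A sketch of that idea using the stdlib csv module instead of pandas (the helper name save_rows and the filename pattern are assumptions, not the library's API):

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def save_rows(rows: list[dict], folder: str, name: str) -> Path:
    """Write rows to <folder>/<name>_<UTC timestamp>.csv and return the path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    path = Path(folder) / f"{name}_{stamp}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path

out = save_rows([{"title": "Sensex ends higher", "sentiment_score": 1}],
                tempfile.mkdtemp(), "et_news_clean")
print(out.name)  # e.g. et_news_clean_20240101_100000.csv
```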

Categories collected

Category       Source
latest         Top stories RSS
markets        Markets RSS
stocks         Stocks RSS
economy        Economy RSS
business       Company / business RSS
ipo            IPO RSS
mutual_funds   Mutual funds RSS
commodities    Commodities RSS
forex          Forex RSS

Feed URLs are defined in et_scraper/config.py and can be extended.
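Turning a feed into rows of this shape is plain RSS parsing. A minimal offline sketch with the standard library (the feed XML is a made-up sample; the real rss_collector may use a dedicated feed parser):

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>Sensex ends higher</title>
    <description>Benchmark indices gained.</description>
    <link>https://example.com/a</link>
    <pubDate>Mon, 01 Jan 2024 10:00:00 +0530</pubDate>
  </item>
</channel></rss>"""

def parse_feed(xml_text: str, category: str) -> list[dict]:
    """Extract one dict per <item>, shaped like the library's output schema."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        rows.append({
            "source": "economic_times",
            "category": category,
            "title": item.findtext("title", ""),
            "summary": item.findtext("description", ""),
            "link": item.findtext("link", ""),
            "published": item.findtext("pubDate", ""),
        })
    return rows

print(parse_feed(SAMPLE_FEED, "markets")[0]["title"])  # Sensex ends higher
```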


Output schema

Each row of the returned pandas.DataFrame has:

Column           Description
source           Always "economic_times"
category         One of the categories above
title            Article headline
summary          RSS summary / description
link             Canonical article URL
published        Publish timestamp from the feed
fetched_at       UTC ISO timestamp when the row was collected
sentiment_score  Integer = positive_word_count − negative_word_count
sentiment_label  "Bullish", "Bearish", or "Neutral"

Example:

source,category,title,summary,link,published,fetched_at,sentiment_score,sentiment_label
economic_times,stocks,Tata Motors shares rally...,...,link,...,...,2,Bullish
economic_times,economy,Rupee falls against dollar...,...,link,...,...,-1,Bearish

Public API

Symbol                              What it does
fetch_all_rss_news() -> DataFrame   Fetch all configured RSS feeds into a DataFrame.
sentiment_score(text: str) -> int   Lexicon-based score: positive − negative word counts.
sentiment_label(score: int) -> str  Map a score to "Bullish" / "Bearish" / "Neutral".
save_dataframe(df, folder, name)    Save a DataFrame to a timestamped CSV; returns the path.
__version__                         Library version string.
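The scoring scheme is simple enough to sketch directly. The word lists below are illustrative only, not the library's actual lexicon:

```python
# Illustrative lexicons; the real sentiment.py ships its own word lists.
POSITIVE = {"rally", "gains", "surge", "jumps", "record", "beats"}
NEGATIVE = {"falls", "drops", "slump", "losses", "weak", "misses"}

def sentiment_score(text: str) -> int:
    """positive_word_count - negative_word_count over whitespace tokens."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_label(score: int) -> str:
    """Map a score to Bullish / Bearish / Neutral."""
    if score > 0:
        return "Bullish"
    if score < 0:
        return "Bearish"
    return "Neutral"

print(sentiment_score("Tata Motors shares rally on strong gains"))  # 2
print(sentiment_label(-1))  # Bearish
```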

Lower-level helpers (use only if you really need raw HTML):

Symbol                                         Module
can_fetch(url, user_agent="*") -> bool         et_scraper.robots_checker
fetch_public_page(url) -> BeautifulSoup|None   et_scraper.html_collector
extract_headlines(soup) -> list[str]           et_scraper.parser
extract_article_text(soup) -> str              et_scraper.parser
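If you do work with raw HTML, headline extraction boils down to collecting text from heading tags. A stdlib-only sketch (the real parser uses BeautifulSoup, and the choice of h2 here is an assumption):

```python
from html.parser import HTMLParser

class HeadlineExtractor(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.headlines: list[str] = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.headlines.append(data.strip())

extractor = HeadlineExtractor()
extractor.feed("<h1>ET</h1><h2>Rupee falls</h2><h2>IPO opens today</h2>")
print(extractor.headlines)  # ['Rupee falls', 'IPO opens today']
```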

Use in a swing trading pipeline

Economic Times News (this library)
        ↓
Headline Sentiment (this library)
        ↓
Stock Symbol Mapping (your code)
        ↓
Technical Indicators (your code)
        ↓
Final Swing Score (your code)
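The "Stock Symbol Mapping (your code)" step can be as simple as keyword matching against headlines. A toy sketch (the company-to-ticker map is invented and would need to cover your own universe):

```python
# Hypothetical company-name -> NSE ticker map; extend for your watchlist.
SYMBOL_MAP = {
    "tata motors": "TATAMOTORS",
    "infosys": "INFY",
    "reliance": "RELIANCE",
}

def map_symbols(headline: str) -> list[str]:
    """Return every ticker whose company name appears in the headline."""
    text = headline.lower()
    return [ticker for name, ticker in SYMBOL_MAP.items() if name in text]

print(map_symbols("Tata Motors shares rally after strong results"))  # ['TATAMOTORS']
```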

Project layout

et_scraper_safe/
├── pyproject.toml          # PyPI packaging metadata
├── LICENSE                 # MIT
├── README.md
├── requirements.txt        # For running from source
├── main.py                 # Convenience runner (same as the CLI)
├── et_scraper/
│   ├── __init__.py
│   ├── cli.py              # Entry point for `et-scraper-safe` console command
│   ├── config.py           # RSS feed URLs, headers, timeouts
│   ├── robots_checker.py   # robots.txt enforcement
│   ├── rss_collector.py    # RSS → DataFrame
│   ├── html_collector.py   # Polite, robots-aware HTML fetcher
│   ├── parser.py           # Headline / article-text extraction
│   ├── sentiment.py        # Lexicon-based sentiment
│   └── storage.py          # Timestamped CSV writer
└── data/
    ├── raw/                # Raw scraped CSVs
    └── clean/              # Cleaned + scored CSVs

Development

Run from source:

git clone <your-fork>
cd et_scraper_safe
pip install -r requirements.txt
python main.py

Build + publish a new version (maintainers only):

# 1. Bump version in pyproject.toml and et_scraper/__init__.py
# 2. Build and upload
rm -rf dist build *.egg-info
python -m build
TWINE_USERNAME=__token__ TWINE_PASSWORD="$PYPI_API_TOKEN" python -m twine upload dist/*

Disclaimer

This library only collects data that the Economic Times publishes openly via RSS or pages allowed by their robots.txt. It is intended for personal research and educational use. You are responsible for complying with the Economic Times' Terms of Service and any applicable laws when using this library or the data it collects.


License

MIT — see LICENSE.
