Multi-source financial news library with quality scoring and ticker extraction

newsquant

The financial news data library for Python.

pip install newsquant

Fetch, extract, and score financial news articles from 13 sources in one call. Returns clean Article objects with full text, extracted tickers, and a quality score — no database required.

What gap does this fill? Existing options are either single-source (newsapi-python, feedparser), return no article body (yfinance), or have no quality layer at all. newsquant covers multiple sources, runs full-text extraction via trafilatura, deduplicates across sources, and scores every article on a 0–1 quality scale.
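The scoring model itself is not documented here. The sketch below shows one way a composite 0-1 score could be paired with a list of flags that explain each reduction. The flag names, penalty weights, and additive scheme are all assumptions. Only the two thresholds mirror the MIN_WORD_COUNT and LANGUAGE_CONFIDENCE_THRESHOLD defaults listed under Configuration.

```python
from dataclasses import dataclass, field

# Thresholds mirror the documented defaults (MIN_WORD_COUNT, LANGUAGE_CONFIDENCE_THRESHOLD).
MIN_WORD_COUNT = 150
LANGUAGE_CONFIDENCE_THRESHOLD = 0.95

# Hypothetical penalty per flag; newsquant's real weights are not documented.
PENALTIES = {
    "short_body": 0.4,
    "low_language_confidence": 0.3,
    "paywall": 0.3,
    "no_tickers": 0.1,
}


@dataclass
class QualityResult:
    score: float
    flags: list[str] = field(default_factory=list)


def score_article(word_count: int, language_confidence: float,
                  is_paywall: bool, tickers: list[str]) -> QualityResult:
    """Start from 1.0 and subtract a penalty for every flag raised."""
    flags = []
    if word_count < MIN_WORD_COUNT:
        flags.append("short_body")
    if language_confidence < LANGUAGE_CONFIDENCE_THRESHOLD:
        flags.append("low_language_confidence")
    if is_paywall:
        flags.append("paywall")
    if not tickers:
        flags.append("no_tickers")
    score = 1.0 - sum(PENALTIES[f] for f in flags)
    # Clamp so the result always stays on the 0-1 scale.
    return QualityResult(score=max(0.0, min(1.0, score)), flags=flags)
```

Keeping the flags alongside the score lets a caller see why an article fell below a cutoff such as min_quality=0.8, rather than only that it did.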

Install

pip install newsquant

Python 3.9+ required. For PostgreSQL persistence add the optional extra:

pip install "newsquant[postgres]"

Quickstart

from newsquant import Scraper

scraper = Scraper(sources=["yahoofinance", "cnbc"])
articles = scraper.fetch(days_back=1)

for a in articles:
    print(a.title, a.tickers, a.quality_score)

Examples

Filter by ticker

scraper = Scraper(sources=["yahoofinance", "benzinga", "cnbc"])
articles = scraper.fetch(tickers=["AAPL", "MSFT"], days_back=7)

for a in articles:
    print(f"[{', '.join(a.tickers)}] {a.title}")
    print(f"  source={a.source_name}  quality={a.quality_score:.2f}  words={a.word_count}")
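Filtering with fetch(tickers=[...]) presumably works against each article's extracted tickers. The extractor newsquant uses is not described here, so the following is only a sketch of a common approach: match $AAPL-style cashtags and exchange-qualified mentions such as "(NASDAQ: MSFT)", then keep symbols found in a known-ticker list. The hard-coded symbol set, the regex, and both helper names are hypothetical.

```python
import re

# Hypothetical symbol universe; a real extractor would load a full exchange listing.
KNOWN_TICKERS = {"AAPL", "MSFT", "NVDA", "AMZN", "TSLA"}

# Group 1: "$AAPL" cashtags. Group 2: "(NASDAQ: MSFT)"-style exchange mentions.
TICKER_PATTERN = re.compile(
    r"\$([A-Z]{1,5})\b|\((?:NASDAQ|NYSE|AMEX):\s*([A-Z]{1,5})\)"
)


def extract_tickers(text: str) -> list[str]:
    """Return known tickers in the order they first appear, without repeats."""
    seen: list[str] = []
    for match in TICKER_PATTERN.finditer(text):
        symbol = match.group(1) or match.group(2)
        if symbol in KNOWN_TICKERS and symbol not in seen:
            seen.append(symbol)
    return seen


def mentions_any(article_tickers: list[str], wanted: list[str]) -> bool:
    """True if the article mentions at least one of the requested tickers."""
    return not set(article_tickers).isdisjoint(wanted)
```

Checking candidates against a known list is what keeps ordinary capitalized words, or unknown cashtags like $XYZ, out of the result.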

Use API sources

scraper = Scraper(
    sources=["newsapi", "finnhub"],
    newsapi_key="YOUR_KEY",
    finnhub_api_key="YOUR_KEY",
)
articles = scraper.fetch(tickers=["NVDA"], days_back=3, min_quality=0.8)

API keys can also be set via environment variables — see Configuration.

Persist to a database

# SQLite
articles = scraper.fetch(save_to="sqlite:///./financial_news.db")

# PostgreSQL
articles = scraper.fetch(save_to="postgresql://user:pw@localhost/mydb")

Tables are created automatically if they don't exist. fetch() always returns the Article list regardless.
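fetch(save_to=...) handles persistence internally. As an illustration of the "create the table if it doesn't exist" behaviour, here is a minimal standard-library sqlite3 sketch. The schema, the column subset, and the policy of silently skipping URLs that are already stored are assumptions, not newsquant's actual storage layout.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: newsquant's real table layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url           TEXT PRIMARY KEY,
    title         TEXT NOT NULL,
    source_name   TEXT NOT NULL,
    published_at  TEXT NOT NULL,  -- ISO 8601 with UTC offset
    tickers       TEXT NOT NULL,  -- comma-separated symbols
    quality_score REAL NOT NULL
)
"""


def save_articles(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Create the table if needed, insert rows, and return how many were new."""
    conn.execute(SCHEMA)  # no-op when the table already exists
    before = conn.total_changes
    conn.executemany(
        # INSERT OR IGNORE skips rows whose URL is already stored (assumed policy).
        "INSERT OR IGNORE INTO articles "
        "VALUES (:url, :title, :source_name, :published_at, :tickers, :quality_score)",
        [
            {**row, "tickers": ",".join(row["tickers"]),
             "published_at": row["published_at"].isoformat()}
            for row in rows
        ],
    )
    conn.commit()
    return conn.total_changes - before


# Usage with an in-memory database and one illustrative row.
conn = sqlite3.connect(":memory:")
row = {
    "url": "https://example.com/story",
    "title": "Example headline",
    "source_name": "cnbc_rss",
    "published_at": datetime(2026, 3, 16, 14, 30, tzinfo=timezone.utc),
    "tickers": ["AAPL", "MSFT"],
    "quality_score": 0.87,
}
first = save_articles(conn, [row])   # inserts the row
second = save_articles(conn, [row])  # same URL, so nothing new is written
```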

Custom source

Subclass BaseFetcher to add any source the pipeline doesn't cover.

from newsquant import Scraper, BaseFetcher, SourceConfig
from scraper.models.article import RawArticle
from datetime import datetime, timezone

class MySource(BaseFetcher):
    def __init__(self, api_key: str):
        super().__init__(SourceConfig(name="my_source", type="api", rate_limit_rps=2.0))
        self._api_key = api_key

    def fetch(self, from_dt=None, to_dt=None, ticker=None, **kwargs) -> list[RawArticle]:
        resp = self._get(
            "https://api.example.com/news",
            params={"key": self._api_key, "symbol": ticker},
        )
        return [
            RawArticle(
                url=item["url"],
                title=item["title"],
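                # Assumes the API returns offset-aware ISO 8601; normalized to UTC like Article.published_at
                published_at=datetime.fromisoformat(item["published_at"]).astimezone(timezone.utc),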
                source_name=self.config.name,
                summary=item.get("summary"),
            )
            for item in resp.json()["articles"]
        ]

# Mix custom and built-in sources freely
scraper = Scraper(sources=[MySource(api_key="secret"), "cnbc", "yahoofinance"])
articles = scraper.fetch(tickers=["AAPL"])

BaseFetcher provides _get() and _post() with automatic rate limiting and exponential-backoff retries. The rest of the pipeline (extraction, dedup, quality scoring, ticker extraction) runs automatically.
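The internals of _get() and _post() are not shown. The sketch below illustrates the two techniques they are described as providing, using only the standard library: a limiter that spaces calls at least 1/rps seconds apart, and a retry loop whose delay doubles on each attempt. RateLimiter and with_retries are hypothetical names; the defaults of 3 retries and 2.0 requests per second are borrowed from MAX_RETRIES and the rate_limit_rps example above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Space calls at least 1/rps seconds apart (rps=2.0 means one call per 0.5 s)."""

    def __init__(self, rps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / rps
        self._next_allowed = float("-inf")
        self._clock = clock
        self._sleep = sleep

    def wait(self) -> None:
        now = self._clock()
        if now < self._next_allowed:
            # Too early: block until the next slot opens.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


def with_retries(call: Callable[[], T], max_retries: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying up to max_retries times with exponential backoff.

    Delays are base_delay * 2**attempt plus up to 50% random jitter.
    A real HTTP client would retry only transient errors (timeouts, 429, 5xx).
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

In a fetcher, limiter.wait() would go inside the callable passed to with_retries, so that retried requests also respect the source's rate limit. The injectable clock and sleep make both pieces testable without real delays.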

Built-in sources

Name	Type	Full text	Notes
`yahoofinance`	RSS	✓
`cnbc`	RSS	✓
`motleyfool`	RSS	✓
`benzinga`	RSS	✓
`businessinsider`	RSS	✓
`fortune`	RSS	✓
`prnewswire`	RSS	✓	Financial press releases only
`bloomberg`	RSS	—	Title + summary (paywalled)
`wsj`	RSS	—	Title + summary (paywalled)
`ft`	RSS	—	Title + summary (paywalled)
`seekingalpha`	RSS	—	Title + summary (paywalled)
`newsapi`	API	✓	Key required
`finnhub`	API	✓	Key required

Scraper() with no sources argument defaults to all RSS sources.
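Because the paywalled feeds return only a title and summary, it can help to pick sources by capability. The helper below is a hypothetical convenience, not part of newsquant: its mapping is transcribed by hand from the built-in sources table and would need updating if the table changes.

```python
from typing import Optional

# Transcribed from the built-in sources table: name -> (type, returns full text).
BUILT_IN_SOURCES = {
    "yahoofinance": ("RSS", True),
    "cnbc": ("RSS", True),
    "motleyfool": ("RSS", True),
    "benzinga": ("RSS", True),
    "businessinsider": ("RSS", True),
    "fortune": ("RSS", True),
    "prnewswire": ("RSS", True),
    "bloomberg": ("RSS", False),
    "wsj": ("RSS", False),
    "ft": ("RSS", False),
    "seekingalpha": ("RSS", False),
    "newsapi": ("API", True),
    "finnhub": ("API", True),
}


def select_sources(kind: Optional[str] = None,
                   full_text: Optional[bool] = None) -> list[str]:
    """Filter source names by type ("RSS" or "API") and/or full-text support."""
    return [
        name
        for name, (source_kind, has_full_text) in BUILT_IN_SOURCES.items()
        if (kind is None or source_kind == kind)
        and (full_text is None or has_full_text == full_text)
    ]
```

select_sources("RSS") reproduces the documented default, and Scraper(sources=select_sources("RSS", full_text=True)) would skip the four title-plus-summary feeds.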

The `Article` object

Every item returned by fetch() is a Pydantic model with these fields:

Field	Type	Description
`title`	`str`	Article headline
`body`	`str`	Full extracted text
`summary`	`str`	Lead paragraph or RSS summary
`url`	`str`	Canonical URL
`source_name`	`str`	Source identifier (e.g. `"cnbc_rss"`)
`author`	`str | None`	Byline if available
`published_at`	`datetime`	Publication time (UTC)
`tickers`	`list[str]`	Extracted ticker symbols
`quality_score`	`float`	0–1 composite quality score
`quality_flags`	`list[str]`	Flags that reduced the score
`word_count`	`int`	Body word count
`language`	`str`	Detected language code
`is_paywall`	`bool`	Paywall detected
`is_duplicate`	`bool`	Exact duplicate (URL or body hash)
`is_near_duplicate`	`bool`	Near-duplicate (SimHash)
`is_metadata_only`	`bool`	Full-text extraction skipped
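The is_duplicate and is_near_duplicate fields separate exact matches (same URL or body hash) from SimHash near-matches. The sketch below shows both checks with the standard library only. The URL normalization rules (dropping query and fragment), the word 3-gram shingles, BLAKE2b as the per-shingle hash, and the 3-bit Hamming threshold are assumptions, not newsquant's documented settings.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop query and fragment, trim a trailing slash."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), "", "")
    )


def body_hash(text: str) -> str:
    """SHA-256 of the lower-cased body with whitespace collapsed (exact-duplicate key)."""
    canonical = " ".join(text.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def simhash(text: str) -> int:
    """64-bit SimHash over overlapping word 3-grams."""
    words = re.findall(r"\w+", text.lower())
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * 64
    for shingle in shingles:
        # An 8-byte BLAKE2b digest gives a stable 64-bit hash per shingle.
        h = int.from_bytes(hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest(), "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # An output bit is set when more shingles voted 1 than 0 for that position.
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    return hamming(simhash(a), simhash(b)) <= max_distance
```

Exact hashes catch syndicated copies of the same story, while SimHash tolerates small edits such as a changed byline or appended disclaimer, which is why the two flags are tracked separately.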

Configuration

Set these in a .env file or as environment variables. Only the API keys for sources you actually use are required.

# Required for Scraper(sources=["newsapi"])
NEWSAPI_KEY=

# Required for Scraper(sources=["finnhub"])
FINNHUB_API_KEY=

# Optional — defaults shown
DATABASE_URL=sqlite:///./financial_news.db
MIN_WORD_COUNT=150
LANGUAGE_CONFIDENCE_THRESHOLD=0.95
REQUEST_TIMEOUT_SECONDS=30
MAX_RETRIES=3
LOG_LEVEL=INFO

Copy .env.example to get started:

cp .env.example .env
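How newsquant reads these values internally is not shown. As a rough sketch, the variables above map onto a typed settings object like this, with the documented defaults as fallbacks. Settings, load_settings, and missing_keys are hypothetical names, and the sketch reads only the process environment; loading the .env file itself would need a loader such as python-dotenv's load_dotenv().

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    newsapi_key: Optional[str]
    finnhub_api_key: Optional[str]
    database_url: str
    min_word_count: int
    language_confidence_threshold: float
    request_timeout_seconds: int
    max_retries: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the listed defaults."""
    return Settings(
        # Blank values, as in a freshly copied .env.example, count as unset.
        newsapi_key=env.get("NEWSAPI_KEY") or None,
        finnhub_api_key=env.get("FINNHUB_API_KEY") or None,
        database_url=env.get("DATABASE_URL", "sqlite:///./financial_news.db"),
        min_word_count=int(env.get("MIN_WORD_COUNT", "150")),
        language_confidence_threshold=float(env.get("LANGUAGE_CONFIDENCE_THRESHOLD", "0.95")),
        request_timeout_seconds=int(env.get("REQUEST_TIMEOUT_SECONDS", "30")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# Only API sources need a key; RSS sources need nothing.
REQUIRED_KEYS = {"newsapi": "NEWSAPI_KEY", "finnhub": "FINNHUB_API_KEY"}


def missing_keys(sources: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables the requested sources need but that are unset or blank."""
    return [REQUIRED_KEYS[s] for s in sources
            if s in REQUIRED_KEYS and not env.get(REQUIRED_KEYS[s])]
```

Checking missing_keys() before building a Scraper turns a forgotten key into an immediate, named error instead of a failed request later.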

CLI

A command-line interface ships alongside the Python API for ops tasks:

# One-time setup
scraper db init

# Run sources
scraper scrape --all
scraper scrape --source cnbc_rss

# Historical backfill (GDELT and Wayback Machine)
scraper backfill --source gdelt --start 2020-01-01 --end 2025-01-01 --workers 4

# Query stored articles
scraper query --ticker AAPL --min-quality 0.8 --format csv

# Real-time daemon
scraper scheduler start --daemon

Contributing

Bug reports and pull requests are welcome. For major changes, open an issue first to discuss what you'd like to change.

git clone https://github.com/your-username/newsquant
cd newsquant
pip install -e ".[dev]"
pytest

License

MIT

Project details

This version: 0.1.1 (Mar 16, 2026)

Download files

Source Distribution

newsquant-0.1.1.tar.gz (40.4 kB view details)

Uploaded Mar 16, 2026 Source

Built Distribution

newsquant-0.1.1-py3-none-any.whl (49.1 kB view details)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file newsquant-0.1.1.tar.gz.

File metadata

Download URL: newsquant-0.1.1.tar.gz
Upload date: Mar 16, 2026
Size: 40.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for newsquant-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3c0ffafa17a0af9d676020e84874c0f56bd9ef71101b0263e7d60ab474a79ca8`
MD5	`9414d103ad48840362b799b515b11667`
BLAKE2b-256	`bb9c31c7a25ebdff0642e09ebb91d97438949f5e5394dc4f0b5068475a8e2e88`

File details

Details for the file newsquant-0.1.1-py3-none-any.whl.

File metadata

Download URL: newsquant-0.1.1-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 49.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for newsquant-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9834b80d06789eb60073991a8e37ae5c797f003545d63d5f093a62f817aa179d`
MD5	`6437964b006aba5dbc8c11bf6def79c6`
BLAKE2b-256	`1e4cdcf95d6588042a53d3ad5c62fa3135265e04e72d20f53faf14f37be142dd`
