Multi-source financial news library with quality scoring and ticker extraction

Project description

finews

The financial news data library for Python.

pip install financial-news-scraper


Fetch, extract, and score financial news articles from 13 sources in one call. Returns clean Article objects with full text, extracted tickers, and a quality score — no database required.

What gap does this fill? Existing options are either single-source (newsapi-python, feedparser), return no article body (yfinance), or have no quality layer at all. finews covers multiple sources, runs full-text extraction via trafilatura, deduplicates across sources, and scores every article on a 0–1 quality scale.
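The exact scoring formula isn't documented here. As a rough mental model only (the weights and signals below are hypothetical, not finews' actual formula), a penalty-based composite 0–1 score could look like:

```python
def quality_score(word_count, has_author, is_paywalled, is_duplicate):
    # Hypothetical penalty-based composite; finews' real signals and weights
    # are not documented here.
    score = 1.0
    if word_count < 150:
        score -= 0.3  # thin content
    if not has_author:
        score -= 0.1  # missing byline
    if is_paywalled:
        score -= 0.4  # body likely truncated
    if is_duplicate:
        score -= 0.5  # already seen elsewhere
    return max(0.0, min(1.0, score))

print(quality_score(800, True, False, False))  # 1.0
```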


Install

pip install financial-news-scraper

Python 3.9+ required. For PostgreSQL persistence add the optional extra:

pip install "financial-news-scraper[postgres]"

Quickstart

from finews import Scraper

scraper = Scraper(sources=["yahoofinance", "cnbc"])
articles = scraper.fetch(days_back=1)

for a in articles:
    print(a.title, a.tickers, a.quality_score)

Examples

Filter by ticker

scraper = Scraper(sources=["yahoofinance", "benzinga", "cnbc"])
articles = scraper.fetch(tickers=["AAPL", "MSFT"], days_back=7)

for a in articles:
    print(f"[{', '.join(a.tickers)}] {a.title}")
    print(f"  source={a.source_name}  quality={a.quality_score:.2f}  words={a.word_count}")

Use API sources

scraper = Scraper(
    sources=["newsapi", "finnhub"],
    newsapi_key="YOUR_KEY",
    finnhub_api_key="YOUR_KEY",
)
articles = scraper.fetch(tickers=["NVDA"], days_back=3, min_quality=0.8)

API keys can also be set via environment variables — see Configuration.
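For example, in a shell (variable names taken from the Configuration section below):

```shell
export NEWSAPI_KEY="YOUR_KEY"
export FINNHUB_API_KEY="YOUR_KEY"
```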

Persist to a database

# SQLite
articles = scraper.fetch(save_to="sqlite:///./financial_news.db")

# PostgreSQL
articles = scraper.fetch(save_to="postgresql://user:pw@localhost/mydb")

Tables are created automatically if they don't exist, and fetch() returns the Article list whether or not save_to is set.
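Once saved, articles can be queried with any SQL client. A sketch against an in-memory stand-in — the table and column names ("articles", "quality_score") are assumptions; inspect the generated schema for the real ones:

```python
import sqlite3

# In-memory stand-in for sqlite:///./financial_news.db, seeded with two rows
# that mimic the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (url TEXT, title TEXT, quality_score REAL)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("https://example.com/a", "Fed holds rates steady", 0.91),
        ("https://example.com/b", "Low-effort listicle", 0.35),
    ],
)
# Same kind of filter min_quality applies at fetch time.
rows = conn.execute(
    "SELECT title FROM articles WHERE quality_score >= 0.8"
).fetchall()
print(rows)  # [('Fed holds rates steady',)]
```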

Custom source

Subclass BaseFetcher to add any source the pipeline doesn't cover.

from finews import Scraper, BaseFetcher, SourceConfig
from scraper.models.article import RawArticle
from datetime import datetime, timezone

class MySource(BaseFetcher):
    def __init__(self, api_key: str):
        super().__init__(SourceConfig(name="my_source", type="api", rate_limit_rps=2.0))
        self._api_key = api_key

    def fetch(self, from_dt=None, to_dt=None, ticker=None, **kwargs) -> list[RawArticle]:
        resp = self._get(
            "https://api.example.com/news",
            params={"key": self._api_key, "symbol": ticker},
        )
        return [
            RawArticle(
                url=item["url"],
                title=item["title"],
                published_at=datetime.fromisoformat(item["published_at"]),
                source_name=self.config.name,
                summary=item.get("summary"),
            )
            for item in resp.json()["articles"]
        ]

# Mix custom and built-in sources freely
scraper = Scraper(sources=[MySource(api_key="secret"), "cnbc", "yahoofinance"])
articles = scraper.fetch(tickers=["AAPL"])

BaseFetcher provides _get() and _post() with automatic rate limiting and exponential-backoff retries. The rest of the pipeline (extraction, dedup, quality scoring, ticker extraction) runs automatically.
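As an illustration of the retry behavior (a minimal sketch of exponential backoff with jitter, not finews' internals):

```python
import random
import time

def get_with_retries(fetch, max_retries=3, base_delay=0.5):
    # Illustrative exponential backoff with jitter; finews' internals may differ.
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# A fetch that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(get_with_retries(flaky, base_delay=0.01))  # ok
```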


Built-in sources

| Name | Type | Full text | Notes |
|---|---|---|---|
| yahoofinance | RSS | Yes | |
| cnbc | RSS | Yes | |
| motleyfool | RSS | Yes | |
| benzinga | RSS | Yes | |
| businessinsider | RSS | Yes | |
| fortune | RSS | Yes | |
| prnewswire | RSS | Yes | Financial press releases only |
| bloomberg | RSS | No | Title + summary (paywalled) |
| wsj | RSS | No | Title + summary (paywalled) |
| ft | RSS | No | Title + summary (paywalled) |
| seekingalpha | RSS | No | Title + summary (paywalled) |
| newsapi | API | Yes | Key required |
| finnhub | API | Yes | Key required |

Scraper() with no sources argument defaults to all RSS sources.


The Article object

Every item returned by fetch() is a Pydantic model with these fields:

| Field | Type | Description |
|---|---|---|
| title | str | Article headline |
| body | str | Full extracted text |
| summary | str | Lead paragraph or RSS summary |
| url | str | Canonical URL |
| source_name | str | Source identifier (e.g. "cnbc_rss") |
| author | str \| None | Byline if available |
| published_at | datetime | Publication time (UTC) |
| tickers | list[str] | Extracted ticker symbols |
| quality_score | float | 0–1 composite quality score |
| quality_flags | list[str] | Flags that reduced the score |
| word_count | int | Body word count |
| language | str | Detected language code |
| is_paywall | bool | Paywall detected |
| is_duplicate | bool | Exact duplicate (URL or body hash) |
| is_near_duplicate | bool | Near-duplicate (SimHash) |
| is_metadata_only | bool | Full-text extraction skipped |
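The near-duplicate flag is computed with SimHash. To illustrate the idea (a minimal word-level sketch, not finews' actual implementation): each token votes on every bit of a 64-bit fingerprint, so similar texts land close together in Hamming distance.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    # Minimal word-level SimHash (illustrative; not finews' implementation).
    # Each token votes +1/-1 on every bit position of its own hash.
    v = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# SimHash treats text as a bag of words: reordered text collides exactly,
# while unrelated text ends up far away in Hamming distance.
a = simhash("apple shares rise after strong iphone sales report")
b = simhash("after strong iphone sales report apple shares rise")
c = simhash("oil prices fall on opec supply increase")
print(hamming(a, b))  # 0
```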

Configuration

Set these in a .env file or as environment variables. Only the API keys for sources you actually use are required.

# Required for Scraper(sources=["newsapi"])
NEWSAPI_KEY=

# Required for Scraper(sources=["finnhub"])
FINNHUB_API_KEY=

# Optional — defaults shown
DATABASE_URL=sqlite:///./financial_news.db
MIN_WORD_COUNT=150
LANGUAGE_CONFIDENCE_THRESHOLD=0.95
REQUEST_TIMEOUT_SECONDS=30
MAX_RETRIES=3
LOG_LEVEL=INFO

Copy .env.example to get started:

cp .env.example .env

CLI

A command-line interface ships alongside the Python API for ops tasks:

# One-time setup
scraper db init

# Run sources
scraper scrape --all
scraper scrape --source cnbc_rss

# Historical backfill (GDELT and Wayback Machine)
scraper backfill --source gdelt --start 2020-01-01 --end 2025-01-01 --workers 4

# Query stored articles
scraper query --ticker AAPL --min-quality 0.8 --format csv

# Real-time daemon
scraper scheduler start --daemon

Contributing

Bug reports and pull requests are welcome. For major changes, open an issue first to discuss what you'd like to change.

git clone https://github.com/your-username/financial-news-scraper
cd financial-news-scraper
pip install -e ".[dev]"
pytest

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

financial_news_scraper-0.1.0.tar.gz (36.9 kB)

Built Distribution

financial_news_scraper-0.1.0-py3-none-any.whl (49.3 kB)

File details

Details for the file financial_news_scraper-0.1.0.tar.gz.

File metadata

  • Download URL: financial_news_scraper-0.1.0.tar.gz
  • Upload date:
  • Size: 36.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for financial_news_scraper-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fec97f39c9da65c10eac41183b8b6e88132db5aef451dcbacd64eafae5a3751d
MD5 58f053df749060f4f98b62a5c78265fa
BLAKE2b-256 f34fc4e17a2b8890553852fb8e25c5a2138018c525d422797efad1155ed27b0f

Provenance

The following attestation bundles were made for financial_news_scraper-0.1.0.tar.gz:

Publisher: publish.yml on fayaz21/financial-news-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file financial_news_scraper-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for financial_news_scraper-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f12914709b88fdf5bfd93075a304b2b0eb8336bfee4c3e239a699a11e2e553d6
MD5 12035b3936209f3bcb7709d5086d7183
BLAKE2b-256 c2de6440af6f29cb5e9aa121a75b501cf6e61742532fd7b143e66448d15ce665

Provenance

The following attestation bundles were made for financial_news_scraper-0.1.0-py3-none-any.whl:

Publisher: publish.yml on fayaz21/financial-news-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
