news-please is an open source, easy-to-use news extractor that just works.

Project description

news-please is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and older, archived articles. You only need to provide the root URL of the news website. Its API also lets developers use the extraction functionality within their own software. In addition, news-please implements a workflow optimized for the news archive provided by commoncrawl.org, which lets users efficiently crawl and extract news articles and apply various filter options.
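As a rough illustration of the API access mentioned above, the sketch below extracts a single article and reduces it to a few fields. The helper names `fetch_article` and `summarize` are hypothetical. The call `NewsPlease.from_url` and the attributes `title`, `maintext`, and `date_publish` are assumed from the project's documented library mode, so check them against the version you install.

```python
from types import SimpleNamespace


def fetch_article(url):
    """Download and extract one article (needs network access and news-please)."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from newsplease import NewsPlease  # assumed import path for library mode

    return NewsPlease.from_url(url)


def summarize(article, max_chars=80):
    """Reduce an article-like object to a small dict (hypothetical helper)."""
    # getattr with defaults keeps the sketch tolerant of missing fields.
    text = getattr(article, "maintext", None) or ""
    return {
        "title": getattr(article, "title", None),
        "published": getattr(article, "date_publish", None),
        "preview": text[:max_chars],
    }


# Offline stand-in with the assumed attribute names, so no request is made here.
demo = SimpleNamespace(title="Example headline", maintext="Body text. " * 20, date_publish=None)
print(summarize(demo))
```

With the package installed and network access available, `summarize(fetch_article(url))` would apply the same reduction to a real article URL of your choice.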

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

news_please-1.6.15.tar.gz (75.7 kB)

Uploaded Source

Built Distribution

news_please-1.6.15-py3-none-any.whl (96.2 kB)

Uploaded Python 3

File details

Details for the file news_please-1.6.15.tar.gz.

File metadata

  • Download URL: news_please-1.6.15.tar.gz
  • Upload date:
  • Size: 75.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for news_please-1.6.15.tar.gz
Algorithm    Hash digest
SHA256       827fb4a88622e673308ad1570fbc449c84a2d65d582fc489f05ae38864a79885
MD5          4d4993a54439eff405a87b29189e9df3
BLAKE2b-256  4d3354c0fc47f8ffba9257449b91eddbdb12e8813bf01a2bd87c681e2e612c67
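
One way to use the digests listed above is to hash the downloaded file locally and compare the result with the published SHA256 value. The following is a minimal sketch using only the Python standard library. The function names `sha256_of` and `matches` are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
import hmac

# SHA256 digest listed above for news_please-1.6.15.tar.gz.
EXPECTED_SHA256 = "827fb4a88622e673308ad1570fbc449c84a2d65d582fc489f05ae38864a79885"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with the expected hex string, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches("news_please-1.6.15.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download of that file and `False` if the content differs.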

File details

Details for the file news_please-1.6.15-py3-none-any.whl.

File metadata

  • Download URL: news_please-1.6.15-py3-none-any.whl
  • Upload date:
  • Size: 96.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for news_please-1.6.15-py3-none-any.whl
Algorithm    Hash digest
SHA256       1d9cd771889b7897e609aae5b2ef96849d3f29b96014f7c250e41862c99404e5
MD5          0b5b0651eb46607615444803f7945e67
BLAKE2b-256  da34e0fc3818233cafde263f7f71f4cff3cbe67fb7c59557fe61986a75b78446
