news-please is an open source easy-to-use news extractor that just works.

Project description

news-please is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and older, archived articles. You only need to provide the root URL of the news website. Its API lets developers use the extraction functionality within their own software. news-please also implements a workflow optimized for the news archive provided by commoncrawl.org, which lets users efficiently crawl and extract news articles and apply various filter options.
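To make the idea of recursively following internal hyperlinks from a root URL concrete, here is a minimal standard-library sketch. It is not news-please's implementation or API: the names `LinkCollector`, `crawl_internal`, `fetch`, and `max_pages` are hypothetical and introduced only for illustration, and "internal" is assumed to mean "same host as the root URL".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_internal(root_url, fetch, max_pages=100):
    """Breadth-first walk over links that stay on the root URL's host.

    `fetch` is an assumed callable that maps a URL to its HTML text,
    or returns None when the page is unavailable.
    """
    host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([root_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            # Keep only links on the same host that were not queued before.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in as a parameter keeps the sketch free of network code: a dictionary's `get` method works for experiments, and a real HTTP client could be substituted later.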

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

news-please-1.2.52.tar.gz (57.3 kB view details)

Uploaded Source

File details

Details for the file news-please-1.2.52.tar.gz.

File metadata

  • Download URL: news-please-1.2.52.tar.gz
  • Upload date:
  • Size: 57.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Python-urllib/3.7

File hashes

Hashes for news-please-1.2.52.tar.gz

  • SHA256: 28d73381aebd6ed318c5a792c5e0c131d3ee1bb68b0b9754dbc6a0c01897a7d9
  • MD5: 241c4b698108e07e733e094752e34573
  • BLAKE2b-256: 41fc6e71921f49bc9dfce50fd619af5c35bb40a8a70da0bac9f782e0eed2ffab
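
One way to use the SHA256 digest listed above is to compare it against the digest of the downloaded archive. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path is assumed to point at a local copy of news-please-1.2.52.tar.gz.

```python
import hashlib

# SHA256 digest copied from the file hashes listed for news-please-1.2.52.tar.gz.
EXPECTED_SHA256 = "28d73381aebd6ed318c5a792c5e0c131d3ee1bb68b0b9754dbc6a0c01897a7d9"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Check a local file against an expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("news-please-1.2.52.tar.gz")` returns `True` only if the local file's digest equals the listed one; a mismatch suggests a corrupted or altered download.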