
news-please is an open source, easy-to-use news extractor that just works.

Project description

news-please is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both the most recent and older, archived articles. You only need to provide the root URL of the news website. Furthermore, its API allows developers to access the extraction functionality within their own software. news-please also implements a workflow optimized for the news archive provided by commoncrawl.org, allowing users to efficiently crawl and extract news articles, with various filter options.
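As a rough illustration of the library-style access mentioned in the description, the sketch below wraps a single-URL extraction. It assumes the package is installed (`pip install news-please`) and that `NewsPlease.from_url` returns an article object exposing `title`, `maintext`, and `date_publish` attributes; the helper names `summarize` and `fetch_summary` are hypothetical and not part of news-please.

```python
def summarize(article):
    """Collect a few fields from an article-like object into a plain dict.

    Hypothetical helper: it only assumes the object may carry `title`,
    `date_publish`, and `maintext` attributes, and tolerates missing ones.
    """
    return {
        "title": getattr(article, "title", None),
        "published": getattr(article, "date_publish", None),
        "text_length": len(getattr(article, "maintext", None) or ""),
    }


def fetch_summary(url):
    """Download and extract one article, then summarize it.

    The import is done lazily so that this module can be loaded even when
    news-please is not installed. Requires network access when called.
    """
    from newsplease import NewsPlease  # third-party package

    article = NewsPlease.from_url(url)
    return summarize(article)


# Example call (needs news-please installed and network access):
# print(fetch_summary("https://www.example.com/some-news-article"))
```

Keeping the network call in its own function makes the field handling easy to check in isolation; the example URL is a placeholder.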


Download files


Source Distribution

news-please-1.2.41.tar.gz (55.8 kB)


File details

Details for the file news-please-1.2.41.tar.gz.

File metadata

  • Download URL: news-please-1.2.41.tar.gz
  • Size: 55.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for news-please-1.2.41.tar.gz
Algorithm Hash digest
SHA256 7245ecd7a4fe3b24413f24bc611636297dff50783e08dfccf7af498733ba15c1
MD5 e5b28f5f8f32f32b21744c94ca440616
BLAKE2b-256 e0f7f00c4ede64b4acf31540f1684ffb2e49f702a725dc3fa84fc0344c7bedcb
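
The SHA256 digest listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch using only Python's standard `hashlib` module; the function names `sha256_of_file` and `matches_expected` are illustrative, and the file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for news-please-1.2.41.tar.gz
EXPECTED_SHA256 = "7245ecd7a4fe3b24413f24bc611636297dff50783e08dfccf7af498733ba15c1"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.lower()


# Example call (assumes the archive sits in the current directory):
# print(matches_expected("news-please-1.2.41.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of file size. pip can perform an equivalent check itself when a requirements file pins hashes and `--require-hashes` is used.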

