
Sitemap scraper for news article selection within a certain time range

Project description

About

This module provides the SitemapRange class and a command-line tool, sitemap_fetch.py.

The SitemapRange class is meant primarily as a generic building block for news-aggregation applications whose data sources are spec-compliant news websites.

Some fault-tolerance features are included to cope with certain inconsistencies found in sitemaps.

Install

To install from PyPI:

pip install --user sitemap-range-fetch

Usage

To fetch all news articles published on cnn.com in the past 6 days and format the result as JSON:

sitemap_fetch.py --site "https://cnn.com" --format json --daysago 6

For more customized filtering, use the SitemapRange class directly.
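The SitemapRange API itself is not documented on this page, so the following does not use or reproduce its interface. It is a standalone sketch, using only the Python standard library, of the kind of date-range selection the package performs: parse a sitemap's `<url>` entries, read each entry's Google News `<news:publication_date>` (falling back to `<lastmod>`), and keep the URLs dated within the last N days, skipping entries with a missing or unparseable date. The function names `urls_in_range` and `parse_w3c_datetime`, the fallback order and the skip rule are all hypothetical choices, not the package's documented behavior.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# XML namespaces of the sitemaps.org protocol and the Google News extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "news": "http://www.google.com/schemas/sitemap-news/0.9",
}


def parse_w3c_datetime(text):
    """Parse a sitemap date like '2020-11-20' or '2020-11-20T08:00:00Z'.

    Returns a timezone-aware datetime (UTC is assumed when no offset is
    given), or None when the value is missing or cannot be parsed.
    """
    if not text:
        return None
    text = text.strip()
    if text.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11.
        text = text[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(text)
    except ValueError:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def urls_in_range(sitemap_xml, days_ago, now=None):
    """Return (url, published) pairs dated within the last `days_ago` days.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_ago)
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url_el in root.findall("sm:url", NS):
        loc = (url_el.findtext("sm:loc", namespaces=NS) or "").strip()
        # Prefer the Google News publication date, fall back to <lastmod>.
        raw = url_el.findtext("news:news/news:publication_date", namespaces=NS)
        if raw is None:
            raw = url_el.findtext("sm:lastmod", namespaces=NS)
        published = parse_w3c_datetime(raw)
        # Hypothetical fault tolerance: skip malformed entries instead of failing.
        if not loc or published is None:
            continue
        # Entries dated in the future are excluded as well.
        if cutoff <= published <= now:
            selected.append((loc, published))
    return selected


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/recent</loc>
    <news:news><news:publication_date>2020-11-18T10:00:00Z</news:publication_date></news:news>
  </url>
  <url>
    <loc>https://example.com/old</loc>
    <lastmod>2020-10-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/bad-date</loc>
    <lastmod>not a date</lastmod>
  </url>
</urlset>"""

now = datetime(2020, 11, 20, tzinfo=timezone.utc)
for url, published in urls_in_range(SAMPLE, days_ago=6, now=now):
    print(url, published.isoformat())
# https://example.com/recent 2020-11-18T10:00:00+00:00
```

Passing `now` explicitly keeps the selection reproducible; the `days_ago=6` argument mirrors the `--daysago 6` flag in the command-line example above, although the real tool's exact cutoff rule may differ.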

Details

This module is provided as is under the MIT License. For extensions, customizations or business inquiries you can get in touch here.


Download files


Source Distribution

sitemap-range-fetch-0.9.1.tar.gz (4.0 kB)

Built Distribution


sitemap_range_fetch-0.9.1-py3-none-any.whl (5.7 kB)

File details

Details for the file sitemap-range-fetch-0.9.1.tar.gz.

File metadata

  • Download URL: sitemap-range-fetch-0.9.1.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.21.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.3

File hashes

Hashes for sitemap-range-fetch-0.9.1.tar.gz

Algorithm     Hash digest
SHA256        7de94d780be5e30128a1d7c2ce8a2dd13152ca8c6bb13ad4d5415e00596f55c7
MD5           06b3e1833255cccdd8292ab125cbb351
BLAKE2b-256   d2e4971755a85a5581125c2800904ff2c037298fe1fd92b3c2833a72f14835ca
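To check that a downloaded file matches one of the SHA256 digests listed on this page, you can hash it locally. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `verify_sha256` are hypothetical and not part of this package.

```python
import hashlib

# Published SHA256 digest of sitemap-range-fetch-0.9.1.tar.gz (copied from this page).
SDIST_SHA256 = "7de94d780be5e30128a1d7c2ce8a2dd13152ca8c6bb13ad4d5415e00596f55c7"


def sha256_of_file(path, chunk_size=65536):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `verify_sha256("sitemap-range-fetch-0.9.1.tar.gz", SDIST_SHA256)` on an intact download should return True. Alternatively, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`, with a `--hash=sha256:...` option on the requirement line) performs the same check during installation.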


File details

Details for the file sitemap_range_fetch-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: sitemap_range_fetch-0.9.1-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.21.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.3

File hashes

Hashes for sitemap_range_fetch-0.9.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        e7f9f4285cabff10c9f3e9040c90a2c7888dbbbccf6144c07ff4bb1e1f699c94
MD5           b4f1c826f68fc0652f2d9c1e8e1cd842
BLAKE2b-256   53187bd406935574188f05c243916795d3d69ef1fd12cf21e38e550fc8af4c4c

