Google News Sitemap Parser
NewsGrabber is a Python library that parses Google News sitemap structures into Python objects, enabling developers to easily extract and analyze news-related metadata.
Features
Parses Google News sitemaps into structured Python objects.
Handles sitemap parsing with robust error tolerance.
Lightweight and efficient, leveraging:
lxml for fast XML parsing.
requests for HTTP requests.
python-dateutil for flexible date parsing.
Python 3.11+ compatible.
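To make the first two features concrete, here is a minimal sketch of what turning a Google News sitemap into Python objects can look like. It is not NewsGrabber's implementation: it uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra dependencies, and `NewsEntry`, `parse_news_sitemap`, and the sample XML are hypothetical names and data made up for illustration. Skipping incomplete entries is one possible reading of "error tolerance", assumed here rather than taken from the library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
import xml.etree.ElementTree as ET

# Namespaces used by the sitemap protocol and the Google News extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "news": "http://www.google.com/schemas/sitemap-news/0.9",
}


@dataclass
class NewsEntry:
    """Hypothetical container for one <url> entry (not NewsGrabber's class)."""
    url: str
    title: str
    publication: Optional[str]
    published: Optional[datetime]


def parse_news_sitemap(xml_text: str) -> list:
    """Return a list of NewsEntry objects, skipping entries without a URL or title."""
    entries = []
    root = ET.fromstring(xml_text)
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        title = url_el.findtext(
            "news:news/news:title", default="", namespaces=NS
        ).strip()
        if not loc or not title:
            continue  # tolerate incomplete entries by skipping them
        name = url_el.findtext(
            "news:news/news:publication/news:name", namespaces=NS
        )
        raw_date = url_el.findtext("news:news/news:publication_date", namespaces=NS)
        try:
            published = datetime.fromisoformat(raw_date.strip()) if raw_date else None
        except ValueError:
            published = None  # keep the entry even if the date is malformed
        entries.append(
            NewsEntry(loc, title, name.strip() if name else None, published)
        )
    return entries


# Made-up sample: the second <url> has no news:title and is skipped.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/1</loc>
    <news:news>
      <news:publication>
        <news:name>Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2024-12-30T10:00:00+00:00</news:publication_date>
      <news:title>Example headline</news:title>
    </news:news>
  </url>
  <url>
    <loc>https://example.com/news/2</loc>
  </url>
</urlset>"""

if __name__ == "__main__":
    for entry in parse_news_sitemap(SAMPLE):
        print(entry.title, entry.published)
```

A real parser would likely use lxml and python-dateutil, as listed above, to handle larger files and looser date formats.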
Installation
Install NewsGrabber via pip:
pip install newsgrabber
Usage
Parsing a Google News Sitemap
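from newsgrabber import NewsGrabber

# Point the parser at a Google News sitemap URL.
grabber = NewsGrabber("https://www.bbc.com/sitemaps/https-sitemap-com-news-1.xml")
# Parse the sitemap; the entries are then available as grabber.news_urls.
grabber.parse()
# Print the titles of the first five entries.
print("\n".join(x.title for x in grabber.news_urls[:5]))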
Example Output
BBC Look East: Latest weather forecast for the East
Syria country profile
Namibia country profile
How working parents can get 15 and 30 hours free childcare
South East England weather forecast
Requirements
NewsGrabber requires Python 3.11+ and the following dependencies:
lxml>=5.3.0: For XML parsing.
requests>=2.32.3: For HTTP requests.
python-dateutil>=2.1,<3.0.0: For flexible date parsing.
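If you want to check an environment against this list before installing, the following sketch reports the interpreter version and the installed version of each dependency using the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the script only reports versions; it does not compare them against the minimums above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above.
REQUIRED = ("lxml", "requests", "python-dateutil")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python 3.11+:", sys.version_info >= (3, 11))
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'not installed'}")
```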
Development and Testing
To set up a development environment:
Clone the repository:
    git clone https://github.com/yibudak/newsgrabber
    cd newsgrabber
Install dependencies:
    pip install -e .[test]
Run tests:
    pytest
Contributing
Contributions are welcome! If you’d like to contribute, please fork the repository and submit a pull request. Make sure to include tests for any new functionality.
License
This library is licensed under the AGPL-3.0 License.
File details
Details for the file newsgrabber-24.1.2.tar.gz.
File metadata
- Download URL: newsgrabber-24.1.2.tar.gz
- Upload date:
- Size: 43.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd6cd546f7d836f7ce4930ccaff4f151af3783f5158cb06cfca22e97a056a08a |
| MD5 | ff7b3afd39dd80deebe05d72d03a4c9f |
| BLAKE2b-256 | c4817c8f542af57ea5654000b02f753d0603e41ace7cff13c08507c4ce96eaa1 |
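The SHA256 digest can be used to check a downloaded file. The sketch below computes a file's SHA256 with the standard library's `hashlib` and compares it to an expected hex digest; the helper names `sha256_of` and `matches` are hypothetical, and the local path in the usage lines assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    expected = "bd6cd546f7d836f7ce4930ccaff4f151af3783f5158cb06cfca22e97a056a08a"
    print(matches("newsgrabber-24.1.2.tar.gz", expected))
```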
Provenance
The following attestation bundles were made for newsgrabber-24.1.2.tar.gz:
Publisher: release.yml on yibudak/newsgrabber
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: newsgrabber-24.1.2.tar.gz
- Subject digest: bd6cd546f7d836f7ce4930ccaff4f151af3783f5158cb06cfca22e97a056a08a
- Sigstore transparency entry: 157047113
- Sigstore integration time:
- Permalink: yibudak/newsgrabber@16d04a0d6339be41e53f0000255e982904a628c6
- Branch / Tag: refs/tags/v24.1.2
- Owner: https://github.com/yibudak
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@16d04a0d6339be41e53f0000255e982904a628c6
- Trigger Event: push
File details
Details for the file newsgrabber-24.1.2-py3-none-any.whl.
File metadata
- Download URL: newsgrabber-24.1.2-py3-none-any.whl
- Upload date:
- Size: 33.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3745bd39c2b45198d64ff9329039d51d819ed73d0743575f3ac5be3491522a00 |
| MD5 | ccc7cd3961d21dfbab5d2a8654115132 |
| BLAKE2b-256 | 2781784afa49991ae24b6bed716ed710e2672a25df20616bb4627717e5e0f386 |
Provenance
The following attestation bundles were made for newsgrabber-24.1.2-py3-none-any.whl:
Publisher: release.yml on yibudak/newsgrabber
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: newsgrabber-24.1.2-py3-none-any.whl
- Subject digest: 3745bd39c2b45198d64ff9329039d51d819ed73d0743575f3ac5be3491522a00
- Sigstore transparency entry: 157047114
- Sigstore integration time:
- Permalink: yibudak/newsgrabber@16d04a0d6339be41e53f0000255e982904a628c6
- Branch / Tag: refs/tags/v24.1.2
- Owner: https://github.com/yibudak
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@16d04a0d6339be41e53f0000255e982904a628c6
- Trigger Event: push