Skip to main content

No project description provided

Project description

Google News API Client

PyPI Downloads Python Version PyPI Version

A robust Python client library for the Google News RSS feed API that provides both synchronous and asynchronous implementations with built-in rate limiting, caching, and error handling.

Features

  • ✨ Comprehensive news search and retrieval functionality
    • Search by keywords with advanced filtering
    • Get top news by topic (WORLD, NATION, BUSINESS, TECHNOLOGY, etc.)
    • Batch search support for multiple queries
    • URL decoding for original article sources
  • 🔄 Both synchronous and asynchronous APIs
    • GoogleNewsClient for synchronous operations
    • AsyncGoogleNewsClient for async/await support
  • 🕒 Advanced time-based search capabilities
    • Date range filtering (after/before)
    • Relative time filtering (e.g., "1h", "24h", "7d")
    • Maximum 100 results for date-based searches
  • 🚀 High performance features
    • In-memory caching with configurable TTL
    • Built-in rate limiting with token bucket algorithm
    • Automatic retries with exponential backoff
    • Concurrent batch searches in async mode
  • 🌍 Multi-language and country support
    • ISO 639-1 language codes (e.g., "en", "fr", "de")
    • ISO 3166-1 country codes (e.g., "US", "GB", "DE")
    • Language-country combinations (e.g., "en-US", "fr-FR")
  • 🛡️ Robust error handling
    • Specific exceptions for different error scenarios
    • Detailed error messages with context
    • Graceful fallbacks and retries
  • 📦 Modern Python packaging with Poetry

Requirements

  • Python 3.9 or higher
  • Poetry (recommended for installation)

Installation

Using Poetry (recommended)

# Install using Poetry
poetry add google-news-api

# Or clone and install from source
git clone https://github.com/ma2za/google-news-api.git
cd google-news-api
poetry install

Using pip

pip install google-news-api

Usage Examples

Synchronous Client

from google_news_api import GoogleNewsClient

# Initialize client with custom configuration
client = GoogleNewsClient(
    language="en",
    country="US",
    requests_per_minute=60,
    cache_ttl=300
)

try:
    # Get top news by topic
    world_news = client.top_news(topic="WORLD", max_results=5)
    tech_news = client.top_news(topic="TECHNOLOGY", max_results=3)
    
    # Search with date range
    date_articles = client.search(
        "Ukraine war",
        after="2024-01-01",
        before="2024-03-01",
        max_results=5
    )
    
    # Search with relative time
    recent_articles = client.search(
        "climate change",
        when="24h",  # Last 24 hours
        max_results=5
    )
    
    # Batch search multiple queries
    batch_results = client.batch_search(
        queries=["AI", "machine learning", "deep learning"],
        when="7d",  # Last 7 days
        max_results=3
    )
    
    # Process results
    for topic, articles in batch_results.items():
        print(f"\nTop {topic} news:")
        for article in articles:
            print(f"- {article['title']} ({article['source']})")
            print(f"  Published: {article['published']}")
            print(f"  Summary: {article['summary'][:100]}...")

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up resources
    del client

Asynchronous Client

from google_news_api import AsyncGoogleNewsClient
import asyncio

async def main():
    async with AsyncGoogleNewsClient(
        language="en",
        country="US",
        requests_per_minute=60
    ) as client:
        # Fetch multiple news categories concurrently
        world_news = await client.top_news(topic="WORLD", max_results=3)
        tech_news = await client.top_news(topic="TECHNOLOGY", max_results=3)
        
        # Batch search with concurrent execution
        batch_results = await client.batch_search(
            queries=["AI", "machine learning", "deep learning"],
            when="7d",
            max_results=3
        )
        
        # Decode Google News URLs to original sources
        for topic, articles in batch_results.items():
            print(f"\nTop {topic} news:")
            for article in articles:
                original_url = await client.decode_url(article['link'])
                print(f"- {article['title']} ({article['source']})")
                print(f"  Original URL: {original_url}")

if __name__ == "__main__":
    asyncio.run(main())

Configuration

The library provides extensive configuration options through the client initialization:

Parameter Description Default Example Values
language Two-letter language code (ISO 639-1) or language-country format "en" "en", "fr", "de", "en-US", "fr-FR"
country Two-letter country code (ISO 3166-1 alpha-2) "US" "US", "GB", "DE", "JP"
requests_per_minute Rate limit threshold for API requests 60 30, 100, 120
cache_ttl Cache duration in seconds for responses 300 600, 1800, 3600

Available Topics

The top_news() method supports the following topics:

  • "WORLD" - World news
  • "NATION" - National news
  • "BUSINESS" - Business news
  • "TECHNOLOGY" - Technology news
  • "ENTERTAINMENT" - Entertainment news
  • "SPORTS" - Sports news
  • "SCIENCE" - Science news
  • "HEALTH" - Health news

Time-Based Search

The library supports two types of time-based search:

  1. Date Range Search

    • Use after and before parameters
    • Format: YYYY-MM-DD
    • Maximum 100 results
    • Example: after="2024-01-01", before="2024-03-01"
  2. Relative Time Search

    • Use the when parameter
    • Hours: "1h" to "101h"
    • Days: Any number of days (e.g., "7d", "30d")
    • Cannot be used with after/before
    • Example: when="24h" for last 24 hours

Article Structure

Each article in the results contains the following fields:

  • title: Article title
  • link: Google News article URL
  • published: Publication date and time
  • summary: Article summary/description
  • source: News source name

Error Handling

The library provides specific exceptions for different error scenarios:

from google_news_api.exceptions import (
    ConfigurationError,  # Invalid client configuration
    ValidationError,     # Invalid parameters
    HTTPError,          # Network or server issues
    RateLimitError,     # Rate limit exceeded
    ParsingError        # RSS feed parsing errors
)

try:
    articles = client.search("technology")
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after {e.retry_after} seconds")
except HTTPError as e:
    print(f"HTTP error {e.status_code}: {str(e)}")
except ValidationError as e:
    print(f"Invalid parameters: {str(e)}")
except Exception as e:
    print(f"Unexpected error: {str(e)}")

Best Practices

Resource Management

  • Use context managers (async with) for async clients
  • Explicitly close synchronous clients when done
  • Implement proper error handling and cleanup

Performance Optimization

  • Utilize caching for frequently accessed queries
  • Use the async client for concurrent operations
  • Batch related requests to maximize cache efficiency
  • Configure appropriate cache TTL based on your needs

Rate Limiting

  • Set requests_per_minute based on your requirements
  • Implement exponential backoff for rate limit errors
  • Monitor rate limit usage in production

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/ma2za/google-news-api.git
cd google-news-api

# Install development dependencies
poetry install --with dev

# Set up pre-commit hooks
pre-commit install

Running Tests

# Run tests with Poetry
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=google_news_api

# Run pre-commit on all files
pre-commit run --all-files

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and linting (poetry run pytest and poetry run flake8)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Paolo Mazza (mazzapaolo2019@gmail.com)

Acknowledgments

Support

For issues, feature requests, or questions:

  • Open an issue on GitHub
  • Contact the author via email
  • Check the examples directory for more usage scenarios

Time-Based Search

The library supports two types of time-based search:

Date Range Search

Use after and before parameters to search within a specific date range:

articles = client.search(
    "Ukraine war",
    after="2024-01-01",  # Start date (YYYY-MM-DD)
    before="2024-03-01", # End date (YYYY-MM-DD)
    max_results=5
)

Relative Time Search

Use the when parameter for relative time searches:

# Last hour
articles = client.search("climate change", when="1h")

# Last 24 hours
articles = client.search("climate change", when="24h")

# Last 7 days
articles = client.search("climate change", when="7d")

Notes:

  • Date range parameters (after/before) must be in YYYY-MM-DD format
  • Relative time (when) supports:
    • Hours (h): 1-101 hours (e.g., "1h", "24h", "101h")
    • Days (d): Any number of days (e.g., "1d", "7d", "30d")
  • when parameter cannot be used together with after or before
  • All searches return articles sorted by relevance and recency

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

google_news_api-0.0.6.tar.gz (18.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

google_news_api-0.0.6-py3-none-any.whl (18.0 kB view details)

Uploaded Python 3

File details

Details for the file google_news_api-0.0.6.tar.gz.

File metadata

  • Download URL: google_news_api-0.0.6.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.0 Linux/6.11.0-1013-azure

File hashes

Hashes for google_news_api-0.0.6.tar.gz
Algorithm Hash digest
SHA256 333a5d49a700c5b06a79a8676e7230404d2c5329d27f101bb6ad136e3cacc5f8
MD5 167e5ca8e0e052b651b91f69cb1faa80
BLAKE2b-256 ebc41284d70281671e0a822e1765ea3a92a6a3b06f6c1e7eec2aabf6811b258b

See more details on using hashes here.

File details

Details for the file google_news_api-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: google_news_api-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.0 Linux/6.11.0-1013-azure

File hashes

Hashes for google_news_api-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1d5a51545e8b98c25187c90a15d4ccac613692b6753637269e03aa047cc72153
MD5 96c79f213589ef0860880635dfaa8a9c
BLAKE2b-256 f997d45120bdd68bf0184ffa6c0785d16d5b101151b8fbf20502a9f6cbe2890c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page