Newscatcher Python Library

The Newscatcher Python library gives you convenient access to the Newscatcher API from Python.

Documentation

View API reference documentation at newscatcherapi.com/docs.

Installation

pip install newscatcher-sdk

Reference

A full reference for this library is available here.

Usage

Create and use the client:

import datetime

from newscatcher import NewscatcherApi

client = NewscatcherApi(
    api_key="YOUR_API_KEY",
)
client.search.post(
    q="renewable energy",
    predefined_sources=["top 50 US"],
    lang=["en"],
    from_=datetime.datetime.fromisoformat(
        "2024-01-01 00:00:00+00:00",
    ),
    to=datetime.datetime.fromisoformat(
        "2024-06-30 00:00:00+00:00",
    ),
    additional_domain_info=True,
    is_news_domain=True,
)

Async client

Use the AsyncNewscatcherApi client to make non-blocking calls to the API:

import asyncio
import datetime

from newscatcher import AsyncNewscatcherApi

client = AsyncNewscatcherApi(
    api_key="YOUR_API_KEY",
)


async def main() -> None:
    await client.search.post(
        q="renewable energy",
        predefined_sources=["top 50 US"],
        lang=["en"],
        from_=datetime.datetime.fromisoformat(
            "2024-01-01 00:00:00+00:00",
        ),
        to=datetime.datetime.fromisoformat(
            "2024-06-30 00:00:00+00:00",
        ),
        additional_domain_info=True,
        is_news_domain=True,
    )


asyncio.run(main())
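
Because the async client is non-blocking, several searches can run concurrently. The sketch below is illustrative and not part of the SDK: `run_searches` is a hypothetical helper that accepts any async callable taking a `q` keyword (for example `client.search.post`) and limits how many requests are in flight at once. The `fake_search` stand-in exists only so the sketch runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_searches(
    search: Callable[..., Awaitable[Any]],
    queries: List[str],
    max_concurrency: int = 3,
    **common_params: Any,
) -> Dict[str, Any]:
    """Run one search per query concurrently and map each query to its result.

    `search` is any async callable that accepts `q=...`.
    This helper is hypothetical and not provided by the SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query: str) -> Any:
        # The semaphore caps the number of simultaneous requests.
        async with semaphore:
            return await search(q=query, **common_params)

    results = await asyncio.gather(*(run_one(query) for query in queries))
    return dict(zip(queries, results))


async def fake_search(q: str, **params: Any) -> Dict[str, Any]:
    """Stand-in for a real API call so the sketch runs offline."""
    await asyncio.sleep(0)
    return {"query": q, "params": params}


demo_results = asyncio.run(
    run_searches(fake_search, ["renewable energy", "solar power"], lang=["en"])
)
print(sorted(demo_results))
```

With a real client, pass `client.search.post` in place of `fake_search`. Keeping `max_concurrency` low is a cautious default, assuming your plan enforces a rate limit.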

Exception handling

The SDK raises an ApiError when the API returns a non-success status code (4xx or 5xx response):

from newscatcher.core.api_error import ApiError

try:
    client.search.post(...)
except ApiError as e:
    print(e.status_code)
    print(e.body)
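
How to react to an ApiError usually depends on its status code. The sketch below shows one possible policy, not SDK behavior: `classify_status` is a hypothetical helper that groups codes by the standard HTTP status classes.

```python
def classify_status(status_code: int) -> str:
    """Group an HTTP status code into a coarse category (hypothetical helper)."""
    if status_code in (401, 403):
        return "auth"          # check the API key or plan permissions
    if status_code == 429:
        return "rate_limited"  # slow down and retry later
    if 400 <= status_code < 500:
        return "client_error"  # fix the request before retrying
    if 500 <= status_code < 600:
        return "server_error"  # usually transient
    return "unexpected"


for code in (401, 422, 429, 503):
    print(code, classify_status(code))
```

Inside the `except ApiError as e:` block, `classify_status(e.status_code)` can then decide whether to log, retry, or abort.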

Retrieving more articles

The standard News API endpoints have a limit of 10,000 articles per query. To retrieve more articles when needed, use these methods that automatically break down your request into smaller time chunks:

Get all articles

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Get articles about renewable energy from the past 10 days
articles = client.get_all_articles(
    q="renewable energy",
    from_="10d",  # Last 10 days
    time_chunk_size="1d",  # Split into 1-day chunks
    max_articles=50000,    # Limit to 50,000 articles
    show_progress=True     # Show progress indicator
)

print(f"Retrieved {len(articles)} articles")

Get all latest headlines

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Get all latest headlines from the past week
articles = client.get_all_headlines(
    when="7d",
    time_chunk_size="1h",  # Split into 1-hour chunks
    show_progress=True
)

print(f"Retrieved {len(articles)} articles")

These methods handle pagination and deduplication automatically, so you can retrieve large datasets with a single call.

You can also use async versions of these methods with the AsyncNewscatcherApi client.
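
To make the chunking idea concrete, the sketch below splits a date range into fixed-size windows and removes duplicates from merged results. It illustrates the general technique only; the SDK's internal implementation may differ, and `split_into_chunks`, `deduplicate`, and the `id` key used to identify an article are assumptions made for this example.

```python
import datetime
from typing import Any, Dict, Iterator, List, Tuple


def split_into_chunks(
    start: datetime.datetime,
    end: datetime.datetime,
    chunk: datetime.timedelta,
) -> Iterator[Tuple[datetime.datetime, datetime.datetime]]:
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if chunk <= datetime.timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + chunk, end)  # the last window may be shorter
        yield cursor, chunk_end
        cursor = chunk_end


def deduplicate(articles: List[Dict[str, Any]], key: str = "id") -> List[Dict[str, Any]]:
    """Keep the first article seen for each key value, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        if article[key] not in seen:
            seen.add(article[key])
            unique.append(article)
    return unique


utc = datetime.timezone.utc
windows = list(
    split_into_chunks(
        datetime.datetime(2024, 1, 1, tzinfo=utc),
        datetime.datetime(2024, 1, 3, 12, tzinfo=utc),
        datetime.timedelta(days=1),
    )
)
print(len(windows))  # 3 windows: two full days and one half day

merged = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
print(deduplicate(merged))  # the second "a" is dropped
```

Smaller windows keep each request under the per-query article limit at the cost of more API calls.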

Query validation

The SDK includes client-side query validation to help you catch syntax errors before making API calls:

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Validate query syntax
is_valid, error_message = client.validate_query("machine learning")
if is_valid:
    print("Query is valid!")
else:
    print(f"Invalid query: {error_message}")
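
To check several queries before a long-running job, the same call can be applied in a loop. The sketch below assumes only that `validate_query` returns an `(is_valid, error_message)` tuple, as in the example above. `find_invalid_queries` is a hypothetical helper, and `fake_validate` is a toy stand-in (it merely checks for balanced double quotes) so the sketch runs without a client; it does not reproduce the SDK's validation rules.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Validator = Callable[[str], Tuple[bool, Optional[str]]]


def find_invalid_queries(
    validate: Validator, queries: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Return a mapping of each invalid query to its error message."""
    invalid = {}
    for query in queries:
        is_valid, error_message = validate(query)
        if not is_valid:
            invalid[query] = error_message
    return invalid


def fake_validate(query: str) -> Tuple[bool, Optional[str]]:
    """Toy validator for demonstration only; not the SDK's logic."""
    if query.count('"') % 2 != 0:
        return False, "unbalanced double quotes"
    return True, None


problems = find_invalid_queries(
    fake_validate,
    ['AI OR "artificial intelligence"', '"machine learning'],
)
print(problems)
```

With a real client, pass `client.validate_query` as the first argument.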

Automatic validation

Methods like get_all_articles() validate queries by default and raise a ValueError for invalid queries. You can disable validation by setting validate_query=False:

# Enable validation (default)
articles = client.get_all_articles(
    q="AI OR \"artificial intelligence\"",  # Valid query
    validate_query=True,  # Optional, True by default
    from_="7d"
)

# Disable validation (not recommended)
articles = client.get_all_articles(
    q="some query",
    validate_query=False,  # Skip client-side validation
    from_="7d"
)

For complete validation rules, bulk validation techniques, and troubleshooting, see Validate queries with Python SDK.

Advanced

Retries

The SDK retries failed requests automatically with exponential backoff. A request is retried when it is retriable and fewer retries have been attempted than the configured limit (default: 2).

The SDK retries requests when the API returns these HTTP status codes:

  • 408 (Request Timeout)
  • 429 (Too Many Requests)
  • 5XX (Server Errors)

Use the max_retries request option to configure this behavior:

client.search.post(..., request_options={
    "max_retries": 1
})
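
The retry rule described in this section can be sketched as follows. The status codes and the default limit of 2 come from this README; the base delay, the cap, and the helper names are assumptions chosen for illustration, since the SDK's actual backoff timings are not documented here.

```python
from typing import List


def is_retriable(status_code: int) -> bool:
    """Return True for the status codes this README lists as retried."""
    return status_code in (408, 429) or 500 <= status_code < 600


def should_retry(status_code: int, attempts_made: int, max_retries: int = 2) -> bool:
    """Retry only while the response is retriable and attempts remain."""
    return is_retriable(status_code) and attempts_made < max_retries


def backoff_delays(max_retries: int = 2, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.

    `base` and `cap` are assumed values, not the SDK's.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


print(should_retry(429, attempts_made=0))  # True
print(should_retry(404, attempts_made=0))  # False
print(should_retry(503, attempts_made=2))  # False
print(backoff_delays(max_retries=4))       # [0.5, 1.0, 2.0, 4.0]
```

Doubling the delay on each attempt gives a struggling server progressively more time to recover, and the cap keeps the worst-case wait bounded.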

Timeouts

The SDK uses a 60-second timeout by default. Configure timeouts at the client or request level:

from newscatcher import NewscatcherApi

client = NewscatcherApi(
    ...,
    timeout=20.0,
)


# Override the timeout for a specific request
client.search.post(..., request_options={
    "timeout_in_seconds": 1
})

Custom client

Override the httpx client to customize it for your use case. Common use cases include proxies and transports:

import httpx
from newscatcher import NewscatcherApi

client = NewscatcherApi(
    ...,
    httpx_client=httpx.Client(
        # httpx 0.26+ takes `proxy`; older versions use `proxies` instead
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)

Contributing

We value open-source contributions to this SDK. This library is generated programmatically, but we can implement custom methods and use .fernignore to preserve certain files. However, implementing custom solutions in the SDK involves a complex process. Please open an issue first to discuss your ideas before submitting a PR.

On the other hand, contributions to the README are always very welcome!
