Newscatcher Python Library

The Newscatcher Python library provides convenient access to the Newscatcher API from Python.

Documentation

API reference documentation is available here.

Installation

pip install newscatcher-sdk

Reference

A full reference for this library is available here.

Usage

Instantiate and use the client with the following:

import datetime

from newscatcher import NewscatcherApi

client = NewscatcherApi(
    api_key="YOUR_API_KEY",
)
client.search.post(
    q="renewable energy",
    predefined_sources=["top 50 US"],
    lang=["en"],
    from_=datetime.datetime.fromisoformat(
        "2024-01-01 00:00:00+00:00",
    ),
    to=datetime.datetime.fromisoformat(
        "2024-06-30 00:00:00+00:00",
    ),
    additional_domain_info=True,
    is_news_domain=True,
)

Async Client

The SDK also exports an async client so that you can make non-blocking calls to our API.

import asyncio
import datetime

from newscatcher import AsyncNewscatcherApi

client = AsyncNewscatcherApi(
    api_key="YOUR_API_KEY",
)


async def main() -> None:
    await client.search.post(
        q="renewable energy",
        predefined_sources=["top 50 US"],
        lang=["en"],
        from_=datetime.datetime.fromisoformat(
            "2024-01-01 00:00:00+00:00",
        ),
        to=datetime.datetime.fromisoformat(
            "2024-06-30 00:00:00+00:00",
        ),
        additional_domain_info=True,
        is_news_domain=True,
    )


asyncio.run(main())
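
One reason to reach for the async client is to run several searches concurrently. The following is a minimal sketch, assuming only that the search method is an awaitable callable accepting a `q` keyword, as in the example above. `run_queries` and `fake_search` are hypothetical names used for illustration; `fake_search` stands in for `client.search.post` so the snippet runs without network access or an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_queries(
    search: Callable[..., Awaitable[Any]],
    queries: List[str],
    **common_params: Any,
) -> Dict[str, Any]:
    """Run one search per query concurrently and map each query to its result."""
    # asyncio.gather starts all coroutines together and preserves input order.
    results = await asyncio.gather(
        *(search(q=query, **common_params) for query in queries)
    )
    return dict(zip(queries, results))


async def fake_search(q: str, **params: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for client.search.post; echoes its arguments."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"q": q, "params": params}


results = asyncio.run(
    run_queries(fake_search, ["renewable energy", "solar power"], lang=["en"])
)
print(list(results))  # ['renewable energy', 'solar power']
```

With the real client, `client.search.post` would be passed as `search` in place of `fake_search`.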

Exception Handling

When the API returns a non-success status code (a 4xx or 5xx response), the SDK raises a subclass of the following error.

from newscatcher.core.api_error import ApiError

try:
    client.search.post(...)
except ApiError as e:
    print(e.status_code)
    print(e.body)
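
Different status codes usually call for different handling, so it can help to branch on `e.status_code` inside the `except` block. The following is a minimal sketch: `classify_status` is a hypothetical helper, not part of the SDK, and its categories are just one reasonable grouping of standard HTTP status codes. It accepts `None` on the assumption that an error may occasionally carry no status code.

```python
from typing import Optional


def classify_status(status_code: Optional[int]) -> str:
    """Map an HTTP status code to a coarse handling category."""
    if status_code is None:
        return "unexpected"
    if status_code in (401, 403):
        return "auth"          # check the API key or account permissions
    if status_code in (408, 429) or 500 <= status_code < 600:
        return "transient"     # may succeed if tried again later
    if 400 <= status_code < 500:
        return "client_error"  # fix the request; retrying will not help
    return "unexpected"


print(classify_status(429))  # transient
print(classify_status(422))  # client_error
```

Inside `except ApiError as e:`, calling `classify_status(e.status_code)` gives a single value to branch on.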

Retrieving More Articles

The standard News API endpoints have a limit of 10,000 articles per query. To help retrieve more articles when needed, this SDK provides methods that automatically break down your request into smaller time chunks:

Get All Articles

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Get articles about renewable energy from the past 10 days
articles = client.get_all_articles(
    q="renewable energy",
    from_="10d",  # Last 10 days
    time_chunk_size="1d",  # Split into 1-day chunks
    max_articles=50000,    # Limit to 50,000 articles
    show_progress=True     # Show progress indicator
)

print(f"Retrieved {len(articles)} articles")
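
To make the chunking idea concrete, here is a minimal sketch of how a date range could be divided into fixed-size windows, each of which would then be queried separately. It illustrates the general technique only and is not the SDK's internal implementation; `split_time_range` is a hypothetical helper.

```python
import datetime
from typing import Iterator, Tuple


def split_time_range(
    start: datetime.datetime,
    end: datetime.datetime,
    chunk: datetime.timedelta,
) -> Iterator[Tuple[datetime.datetime, datetime.datetime]]:
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if chunk <= datetime.timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past the requested end.
        chunk_end = min(cursor + chunk, end)
        yield cursor, chunk_end
        cursor = chunk_end


start = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
end = datetime.datetime(2024, 1, 4, 12, tzinfo=datetime.timezone.utc)
windows = list(split_time_range(start, end, datetime.timedelta(days=1)))
print(len(windows))  # 4: three full days plus one 12-hour remainder
```

Smaller windows mean more requests, but each one is less likely to hit the per-query article limit.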

Get All Latest Headlines

from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

# Get all headlines from the past week
articles = client.get_all_headlines(
    when="7d",
    time_chunk_size="1h",  # Split into 1-hour chunks
    show_progress=True
)

print(f"Retrieved {len(articles)} articles")

These methods handle pagination and deduplication automatically, so a single call can retrieve a large dataset.
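
Deduplication matters because the same article can come back from more than one page or time chunk. The following is a minimal sketch of the idea, assuming each article is a dict carrying a unique `id` key. That shape is an assumption made for illustration, and `dedupe_articles` is a hypothetical helper; the SDK's own article objects and matching logic may differ.

```python
from typing import Any, Dict, Iterable, List


def dedupe_articles(
    batches: Iterable[Iterable[Dict[str, Any]]],
    key: str = "id",
) -> List[Dict[str, Any]]:
    """Flatten batches of articles, keeping the first occurrence of each key."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for batch in batches:
        for article in batch:
            identifier = article[key]
            if identifier in seen:
                continue  # already collected from an earlier page or chunk
            seen.add(identifier)
            unique.append(article)
    return unique


batches = [
    [{"id": "a1", "title": "Solar"}, {"id": "a2", "title": "Wind"}],
    [{"id": "a2", "title": "Wind"}, {"id": "a3", "title": "Hydro"}],
]
print([a["id"] for a in dedupe_articles(batches)])  # ['a1', 'a2', 'a3']
```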

Async versions of these methods are available on the AsyncNewscatcherApi client.

Advanced

Retries

The SDK automatically retries failed requests with exponential backoff. A request is retried as long as it is deemed retriable and the number of retry attempts does not exceed the configured retry limit (default: 2).

A request is deemed retriable when any of the following HTTP status codes is returned:

  • 408 (Request Timeout)
  • 429 (Too Many Requests)
  • 5XX (Server Errors)

Use the max_retries request option to configure this behavior.

client.search.post(..., request_options={
    "max_retries": 1
})
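
The retry rule above can be sketched in a few lines. This illustrates the general pattern and is not the SDK's code: the list of statuses comes from this README, while the base delay, the doubling factor, and the cap are assumed values, and both function names are hypothetical.

```python
from typing import List


def is_retriable(status_code: int) -> bool:
    """True for the statuses this README lists as retriable: 408, 429, and 5XX."""
    return status_code in (408, 429) or 500 <= status_code < 600


def backoff_delays(
    max_retries: int = 2,
    base_seconds: float = 0.5,  # assumed starting delay
    max_seconds: float = 10.0,  # assumed upper bound on any single delay
) -> List[float]:
    """Delay before each retry attempt, doubling every time up to the cap."""
    return [
        min(base_seconds * (2 ** attempt), max_seconds)
        for attempt in range(max_retries)
    ]


print(is_retriable(429))  # True
print(is_retriable(404))  # False
print(backoff_delays())   # [0.5, 1.0]
```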

Timeouts

The SDK defaults to a 60-second timeout. You can configure this with a timeout option at the client or request level.

from newscatcher import NewscatcherApi

client = NewscatcherApi(
    ...,
    timeout=20.0,
)


# Override timeout for a specific method
client.search.post(..., request_options={
    "timeout_in_seconds": 1
})

Custom Client

You can override the httpx client to customize it for your use-case. Some common use-cases include support for proxies and transports.

import httpx
from newscatcher import NewscatcherApi

client = NewscatcherApi(
    ...,
    httpx_client=httpx.Client(
        # httpx 0.26 and later take `proxy=`; older releases use `proxies=` instead.
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)

Contributing

While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code; otherwise, they would be overwritten by the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss it with us!

On the other hand, contributions to the README are always very welcome!
