Structured API clients for 17 external data sources - academic, government, news, and web APIs

Research Data Clients

Structured API clients for 17 external data sources. Provides consistent interfaces for retrieving data from academic, government, news, and web APIs.

Features

  • Unified Factory Pattern: Create any client with ClientFactory.create_client('arxiv')
  • Consistent Interface: All clients follow similar patterns
  • Dataclasses: Structured return types for papers, articles, snapshots
  • Convenience Functions: Quick helpers for common operations
  • Full Type Hints: IDE-friendly development
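As a rough illustration of the factory idea in the list above, here is a minimal, self-contained sketch of a registry-based factory. `DemoClient` and `DemoFactory` are hypothetical stand-ins invented for this example; only the method names `create_client` and `list_sources` come from this README, and the package's real implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DemoClient:
    """Hypothetical stand-in for a real client; holds only its configuration."""
    source: str
    api_key: str = ""


class DemoFactory:
    """Hypothetical registry-based factory (not the package's actual code)."""

    _registry: Dict[str, Callable[..., DemoClient]] = {}

    @classmethod
    def register(cls, name: str, builder: Callable[..., DemoClient]) -> None:
        # Store builders under a lowercase key so lookups are case-insensitive.
        cls._registry[name.lower()] = builder

    @classmethod
    def create_client(cls, name: str, **kwargs) -> DemoClient:
        try:
            builder = cls._registry[name.lower()]
        except KeyError:
            raise ValueError(f"Unknown source: {name!r}") from None
        return builder(**kwargs)

    @classmethod
    def list_sources(cls) -> List[str]:
        return sorted(cls._registry)


DemoFactory.register("arxiv", lambda **kw: DemoClient("arxiv", **kw))
DemoFactory.register("nasa", lambda **kw: DemoClient("nasa", **kw))

client = DemoFactory.create_client("nasa", api_key="demo-key")
print(client.source, DemoFactory.list_sources())
```

The point of the pattern is that callers only need a source name and keyword arguments, so adding a new source means registering one more builder.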

Available Clients

| Client | Source | Requires API Key |
|---|---|---|
| ArxivClient | Academic papers from arXiv.org | No |
| SemanticScholarClient | Research papers with citations | No |
| PubMedClient | Biomedical literature from NCBI | No |
| ArchiveClient | Internet Archive / Wayback Machine | No |
| MultiArchiveClient | Wayback, Archive.is, Memento, 12ft | No |
| CensusClient | US Census Bureau data | Yes |
| FECClient | Campaign finance data | Yes |
| JudiciaryClient | Court records | No |
| GitHubClient | Repository and user data | Optional |
| NASAClient | Space imagery and data | Yes (free) |
| NewsClient | News articles | Yes |
| WikipediaClient | Wikipedia content | No |
| WeatherClient | Weather forecasts | Yes |
| OpenLibraryClient | Book metadata | No |
| YouTubeClient | Video metadata | Yes |
| FinanceClient | Stock data | Yes |
| MALClient | Anime/manga data | Yes |
| WolframAlphaClient | Computational queries | Yes |

Installation

pip install research-data-clients

Quick Start

Using the Factory

from research_data_clients import ClientFactory

# Create any client by name
arxiv = ClientFactory.create_client('arxiv')
papers = arxiv.search('machine learning', max_results=10)

github = ClientFactory.create_client('github', token='your-token')
repos = github.search_repositories('python framework')

# List all available clients
sources = ClientFactory.list_sources()

ArxivClient - Academic Papers

from research_data_clients import ArxivClient, search_arxiv

client = ArxivClient()
papers = client.search(
    query='quantum computing',
    max_results=10,
    sort_by='relevance'
)

for paper in papers:
    print(f"{paper.title}")
    print(f"Authors: {paper.authors}")
    print(f"PDF: {paper.pdf_url}")

# Convenience function
results = search_arxiv('neural networks', max_results=5)

SemanticScholarClient - Citations

from research_data_clients import SemanticScholarClient, get_paper_by_doi

client = SemanticScholarClient()
papers = client.search(
    query='transformers attention',
    limit=10,
    fields=['title', 'authors', 'citations']
)

# Get paper by DOI
paper = get_paper_by_doi('10.1000/xyz123')
print(f"Citations: {paper.citation_count}")

PubMedClient - Medical Literature

from research_data_clients import PubMedClient

client = PubMedClient()
articles = client.search('COVID-19 treatment', max_results=10)

# Specialized searches
trials = client.search_clinical_trials('diabetes', max_results=5)
reviews = client.search_reviews('cancer immunotherapy', max_results=5)
by_mesh = client.search_by_mesh('Alzheimer Disease', max_results=5)
by_author = client.search_by_author('Fauci AS', max_results=5)

ArchiveClient - Wayback Machine

from research_data_clients import ArchiveClient, archive_url

client = ArchiveClient()

# Save page to archive
snapshot = client.save_page('https://example.com')
print(f"Archived: {snapshot.archive_url}")

# Get historical snapshots
snapshots = client.get_snapshots(
    url='https://example.com',
    from_date='2020-01-01',
    to_date='2024-01-01'
)

# Convenience function
archived = archive_url('https://example.com')

MultiArchiveClient - Multiple Providers

from research_data_clients import MultiArchiveClient

client = MultiArchiveClient()

# Try specific provider
result = client.get_archive('https://example.com', provider='wayback')
result = client.get_archive('https://example.com', provider='archiveis')
result = client.get_archive('https://example.com', provider='memento')
result = client.get_archive('https://example.com', provider='12ft')

# Try all providers
all_results = client.get_all_archives('https://example.com')
for provider, result in all_results.items():
    if result.success:
        print(f"{provider}: {result.archive_url}")
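When only one working copy is needed, the per-provider results can be reduced to the first success in a preferred order. The sketch below is self-contained and uses a hypothetical `DemoArchiveResult` stand-in that mirrors only the `success` and `archive_url` attributes used above; `first_successful` is an illustrative helper, not part of the package.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class DemoArchiveResult:
    """Hypothetical stand-in with the two attributes used in the loop above."""
    success: bool
    archive_url: Optional[str] = None


def first_successful(
    results: Dict[str, DemoArchiveResult],
    preferred: Sequence[str] = ("wayback", "archiveis", "memento", "12ft"),
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (provider, archive_url) for the first successful provider, else None."""
    for provider in preferred:
        result = results.get(provider)
        if result is not None and result.success:
            return provider, result.archive_url
    return None


demo_results = {
    "wayback": DemoArchiveResult(success=False),
    "archiveis": DemoArchiveResult(success=True, archive_url="https://archive.example/abc"),
}
print(first_successful(demo_results))
```

The same function should work on the dictionary returned by `get_all_archives`, assuming its values expose those two attributes as the example above suggests.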

GitHubClient - Repositories

from research_data_clients import GitHubClient

client = GitHubClient(token='your-github-token')

# Search repos
repos = client.search_repositories(
    query='machine learning',
    language='python',
    sort='stars'
)

# Get repo details
repo = client.get_repository('owner/repo')
print(f"Stars: {repo['stargazers_count']}")

WikipediaClient - Articles

from research_data_clients import WikipediaClient

client = WikipediaClient()

# Search
results = client.search('quantum physics')

# Get article
article = client.get_article('Quantum_mechanics')
print(article['summary'])

# Different language
article_de = client.get_article('Quantenmechanik', lang='de')

NASAClient - Space Data

from research_data_clients import NASAClient

client = NASAClient(api_key='your-nasa-key')

# Astronomy Picture of the Day
apod = client.get_apod(date='2024-01-01')

# Mars rover photos
photos = client.get_mars_photos(
    rover='curiosity',
    sol=1000,
    camera='FHAZ'
)

# Near Earth Objects
neos = client.get_near_earth_objects(
    start_date='2024-01-01',
    end_date='2024-01-07'
)

WolframAlphaClient - Computation

from research_data_clients import WolframAlphaClient, wolfram_query

client = WolframAlphaClient(api_key='your-wolfram-key')

# Simple query
result = client.query('What is the population of France?')
print(result.result)

# Mathematical calculation
calc = client.calculate('integrate x^2 from 0 to 1')

# Unit conversion
converted = client.convert('100', 'miles', 'kilometers')

# Full query with all pods
full = client.query_full('derivative of sin(x)')

CensusClient - Demographics

from research_data_clients import CensusClient

client = CensusClient(api_key='your-census-key')

# Population data
data = client.get_population(
    year=2020,
    geography='state',
    state='CA'
)

# Economic data
econ = client.get_economic_data(
    dataset='acs/acs5',
    year=2020,
    variables=['B01001_001E'],
    geography='county:*',
    state='CA'
)

Other Clients

from research_data_clients import ClientFactory

# News
news = ClientFactory.create_client('news', api_key='key')
articles = news.search('AI regulation', language='en')

# Weather
weather = ClientFactory.create_client('weather', api_key='key')
forecast = weather.get_forecast(lat=37.77, lon=-122.42, days=7)

# YouTube
youtube = ClientFactory.create_client('youtube', api_key='key')
videos = youtube.search('python tutorial', max_results=10)

# Finance
finance = ClientFactory.create_client('finance', api_key='key')
stock = finance.get_quote('AAPL')

# Open Library
books = ClientFactory.create_client('openlibrary')
results = books.search('science fiction')
book = books.get_book_by_isbn('9780451524935')

Data Classes

from research_data_clients import (
    ArxivPaper,
    SemanticScholarPaper,
    PubMedArticle,
    ArchivedSnapshot,
    ArchiveResult,
    WolframResult
)

# Structured return types with attributes
paper = ArxivPaper(
    id='2103.12345',
    title='Paper Title',
    authors=['Author 1', 'Author 2'],
    abstract='...',
    published='2021-03-15',
    pdf_url='https://arxiv.org/pdf/...'
)
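To show why dataclass return types are convenient, the sketch below defines a hypothetical `DemoPaper` with the same field names as the `ArxivPaper` example above and converts it to a plain dictionary for JSON output. The defaults and field types are assumptions; the real `ArxivPaper` definition may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DemoPaper:
    """Hypothetical stand-in mirroring the ArxivPaper fields shown above."""
    id: str
    title: str
    authors: List[str] = field(default_factory=list)
    abstract: str = ""
    published: str = ""
    pdf_url: str = ""


paper = DemoPaper(
    id="2103.12345",
    title="Paper Title",
    authors=["Author 1", "Author 2"],
    published="2021-03-15",
)

# asdict() recursively converts the dataclass into plain dicts and lists,
# which the json module can serialize directly.
record = asdict(paper)
print(json.dumps(record, indent=2))
```

If the package's classes are standard dataclasses, `dataclasses.asdict` should work on them the same way.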

API Key Management

Set via environment variables:

export CENSUS_API_KEY=...
export NASA_API_KEY=...
export NEWS_API_KEY=...
export WEATHER_API_KEY=...
export YOUTUBE_API_KEY=...
export ALPHAVANTAGE_API_KEY=...
export WOLFRAM_APP_ID=...
export GITHUB_TOKEN=...

Or pass directly:

from research_data_clients import NASAClient

client = NASAClient(api_key='your-key-here')
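The two options can be combined in application code with a small helper that prefers an explicit key and falls back to the environment. `resolve_api_key` is a hypothetical helper written for this example; how the clients themselves read these variables is not specified here.

```python
import os
from typing import Optional


def resolve_api_key(env_var: str, explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise the value of env_var.

    Raises RuntimeError with a clear message when neither is available.
    """
    if explicit:
        return explicit
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"No API key found: pass one directly or set {env_var}")
    return value


key = resolve_api_key("NASA_API_KEY", explicit="your-key-here")
print(key)
```

The resolved value can then be passed on, for example as `NASAClient(api_key=key)`. Failing early with a named variable tends to be easier to debug than an authentication error from the remote API.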

Error Handling

import requests

from research_data_clients import ArxivClient

try:
    client = ArxivClient()
    papers = client.search('query')
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")
except ValueError as e:
    print(f"Invalid parameters: {e}")
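Transient network failures are often worth retrying, while invalid parameters are not. The sketch below is a generic retry helper with exponential backoff; `with_retries` is a hypothetical name, not part of the package. It retries on `OSError` by default, which also covers `requests.exceptions.RequestException` because that class derives from `IOError`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Run call(); on a retryable error wait base_delay * 2**attempt and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a function that fails twice before succeeding.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(with_retries(flaky, base_delay=0.01))
```

A search could then be wrapped as `with_retries(lambda: client.search('query'))`. A `ValueError` is not an `OSError`, so it propagates immediately instead of being retried.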

Dependencies

Core: requests

Optional (for specific clients):

  • arxiv - ArxivClient
  • feedparser - RSS parsing

License

MIT License - see LICENSE file

Author

Luke Steuber
