Structured API clients for 17 external data sources - academic, government, news, and web APIs
Project description
Research Data Clients
Structured API clients for 17 external data sources. Provides consistent interfaces for retrieving data from academic, government, news, and web APIs.
Features
- Unified Factory Pattern: Create any client with
ClientFactory.create_client('arxiv') - Consistent Interface: All clients follow similar patterns
- Dataclasses: Structured return types for papers, articles, snapshots
- Convenience Functions: Quick helpers for common operations
- Full Type Hints: IDE-friendly development
Available Clients
| Client | Source | Requires API Key |
|---|---|---|
| ArxivClient | Academic papers from arXiv.org | No |
| SemanticScholarClient | Research papers with citations | No |
| PubMedClient | Biomedical literature from NCBI | No |
| ArchiveClient | Internet Archive / Wayback Machine | No |
| MultiArchiveClient | Wayback, Archive.is, Memento, 12ft | No |
| CensusClient | US Census Bureau data | Yes |
| FECClient | Campaign finance data | Yes |
| JudiciaryClient | Court records | No |
| GitHubClient | Repository and user data | Optional |
| NASAClient | Space imagery and data | Yes (free) |
| NewsClient | News articles | Yes |
| WikipediaClient | Wikipedia content | No |
| WeatherClient | Weather forecasts | Yes |
| OpenLibraryClient | Book metadata | No |
| YouTubeClient | Video metadata | Yes |
| FinanceClient | Stock data | Yes |
| MALClient | Anime/manga data | Yes |
| WolframAlphaClient | Computational queries | Yes |
Installation
pip install research-data-clients
Quick Start
Using the Factory
from research_data_clients import ClientFactory
# Create any client by name
arxiv = ClientFactory.create_client('arxiv')
papers = arxiv.search('machine learning', max_results=10)
github = ClientFactory.create_client('github', token='your-token')
repos = github.search_repositories('python framework')
# List all available clients
sources = ClientFactory.list_sources()
ArxivClient - Academic Papers
from research_data_clients import ArxivClient, search_arxiv
client = ArxivClient()
papers = client.search(
query='quantum computing',
max_results=10,
sort_by='relevance'
)
for paper in papers:
print(f"{paper.title}")
print(f"Authors: {paper.authors}")
print(f"PDF: {paper.pdf_url}")
# Convenience function
results = search_arxiv('neural networks', max_results=5)
SemanticScholarClient - Citations
from research_data_clients import SemanticScholarClient, get_paper_by_doi
client = SemanticScholarClient()
papers = client.search(
query='transformers attention',
limit=10,
fields=['title', 'authors', 'citations']
)
# Get paper by DOI
paper = get_paper_by_doi('10.1000/xyz123')
print(f"Citations: {paper.citation_count}")
PubMedClient - Medical Literature
from research_data_clients import PubMedClient
client = PubMedClient()
articles = client.search('COVID-19 treatment', max_results=10)
# Specialized searches
trials = client.search_clinical_trials('diabetes', max_results=5)
reviews = client.search_reviews('cancer immunotherapy', max_results=5)
by_mesh = client.search_by_mesh('Alzheimer Disease', max_results=5)
by_author = client.search_by_author('Fauci AS', max_results=5)
ArchiveClient - Wayback Machine
from research_data_clients import ArchiveClient, archive_url
client = ArchiveClient()
# Save page to archive
snapshot = client.save_page('https://example.com')
print(f"Archived: {snapshot.archive_url}")
# Get historical snapshots
snapshots = client.get_snapshots(
url='https://example.com',
from_date='2020-01-01',
to_date='2024-01-01'
)
# Convenience function
archived = archive_url('https://example.com')
MultiArchiveClient - Multiple Providers
from research_data_clients import MultiArchiveClient
client = MultiArchiveClient()
# Try specific provider
result = client.get_archive('https://example.com', provider='wayback')
result = client.get_archive('https://example.com', provider='archiveis')
result = client.get_archive('https://example.com', provider='memento')
result = client.get_archive('https://example.com', provider='12ft')
# Try all providers
all_results = client.get_all_archives('https://example.com')
for provider, result in all_results.items():
if result.success:
print(f"{provider}: {result.archive_url}")
GitHubClient - Repositories
from research_data_clients import GitHubClient
client = GitHubClient(token='your-github-token')
# Search repos
repos = client.search_repositories(
query='machine learning',
language='python',
sort='stars'
)
# Get repo details
repo = client.get_repository('owner/repo')
print(f"Stars: {repo['stargazers_count']}")
WikipediaClient - Articles
from research_data_clients import WikipediaClient
client = WikipediaClient()
# Search
results = client.search('quantum physics')
# Get article
article = client.get_article('Quantum_mechanics')
print(article['summary'])
# Different language
article_de = client.get_article('Quantenmechanik', lang='de')
NASAClient - Space Data
from research_data_clients import NASAClient
client = NASAClient(api_key='your-nasa-key')
# Astronomy Picture of the Day
apod = client.get_apod(date='2024-01-01')
# Mars rover photos
photos = client.get_mars_photos(
rover='curiosity',
sol=1000,
camera='FHAZ'
)
# Near Earth Objects
neos = client.get_near_earth_objects(
start_date='2024-01-01',
end_date='2024-01-07'
)
WolframAlphaClient - Computation
from research_data_clients import WolframAlphaClient, wolfram_query
client = WolframAlphaClient(api_key='your-wolfram-key')
# Simple query
result = client.query('What is the population of France?')
print(result.result)
# Mathematical calculation
calc = client.calculate('integrate x^2 from 0 to 1')
# Unit conversion
converted = client.convert('100', 'miles', 'kilometers')
# Full query with all pods
full = client.query_full('derivative of sin(x)')
CensusClient - Demographics
from research_data_clients import CensusClient
client = CensusClient(api_key='your-census-key')
# Population data
data = client.get_population(
year=2020,
geography='state',
state='CA'
)
# Economic data
econ = client.get_economic_data(
dataset='acs/acs5',
year=2020,
variables=['B01001_001E'],
geography='county:*',
state='CA'
)
Other Clients
# News
news = ClientFactory.create_client('news', api_key='key')
articles = news.search('AI regulation', language='en')
# Weather
weather = ClientFactory.create_client('weather', api_key='key')
forecast = weather.get_forecast(lat=37.77, lon=-122.42, days=7)
# YouTube
youtube = ClientFactory.create_client('youtube', api_key='key')
videos = youtube.search('python tutorial', max_results=10)
# Finance
finance = ClientFactory.create_client('finance', api_key='key')
stock = finance.get_quote('AAPL')
# Open Library
books = ClientFactory.create_client('openlibrary')
results = books.search('science fiction')
book = books.get_book_by_isbn('9780451524935')
Data Classes
from research_data_clients import (
ArxivPaper,
SemanticScholarPaper,
PubMedArticle,
ArchivedSnapshot,
ArchiveResult,
WolframResult
)
# Structured return types with attributes
paper = ArxivPaper(
id='2103.12345',
title='Paper Title',
authors=['Author 1', 'Author 2'],
abstract='...',
published='2021-03-15',
pdf_url='https://arxiv.org/pdf/...'
)
API Key Management
Set via environment variables:
export CENSUS_API_KEY=...
export NASA_API_KEY=...
export NEWS_API_KEY=...
export WEATHER_API_KEY=...
export YOUTUBE_API_KEY=...
export ALPHAVANTAGE_API_KEY=...
export WOLFRAM_APP_ID=...
export GITHUB_TOKEN=...
Or pass directly:
client = NASAClient(api_key='your-key-here')
Error Handling
import requests
try:
client = ArxivClient()
papers = client.search('query')
except requests.exceptions.RequestException as e:
print(f"Network error: {e}")
except ValueError as e:
print(f"Invalid parameters: {e}")
Dependencies
Core: requests
Optional (for specific clients):
arxiv- ArxivClientfeedparser- RSS parsing
License
MIT License - see LICENSE file
Author
Luke Steuber
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file research_data_clients-0.1.0.tar.gz.
File metadata
- Download URL: research_data_clients-0.1.0.tar.gz
- Upload date:
- Size: 39.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
274b8af2baaad16f4fd9472de36914f947cad53aa3fbd66ab08a41d3b0f7a472
|
|
| MD5 |
1053df8b9d860068b39b55d31e087f4f
|
|
| BLAKE2b-256 |
045904f719ff7ae4dd617716190f81524d6021573213711a4dc2f050b37322f8
|
File details
Details for the file research_data_clients-0.1.0-py3-none-any.whl.
File metadata
- Download URL: research_data_clients-0.1.0-py3-none-any.whl
- Upload date:
- Size: 46.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
57177f59e3d78f1b3834dfe12ea5e0e4bc991f2b5b19d16b124893722df34403
|
|
| MD5 |
aa3f0712775c1f795cca0258860c323c
|
|
| BLAKE2b-256 |
e37f708594d69c73059787ea7e37b3bf500427a499d594c40181db8ea474be50
|