Semantic Scholar Dataset API Wrapper

A Python wrapper for the Semantic Scholar Dataset API that provides easy access to academic papers, citations, and related data.

Description

This library provides a simple interface to interact with the Semantic Scholar Dataset API, allowing you to:

  • Access various academic datasets (papers, citations, authors, etc.)
  • Download dataset releases
  • Get diffs between releases
  • Manage large dataset downloads efficiently

Installation

pip install semanticscholar-datasetapi

Requirements

  • Python 3.7+
  • requests

Basic Usage

from semanticscholar_datasetapi import SemanticScholarDataset
import os

# Initialize the client with your API key
api_key = os.getenv("SEMANTIC_SCHOLAR_API_KEY")
client = SemanticScholarDataset(api_key=api_key)

# List available datasets
datasets = client.get_available_datasets()
print(datasets)

# List available releases
releases = client.get_available_releases()
print(releases)

# Download latest release of a specific dataset
client.download_latest_release(datasetname="papers")

# Download the diffs between two releases
client.download_diffs(
    start_release_id="2024-12-31",
    end_release_id="latest",
    datasetname="papers"
)

Available Datasets

The API provides access to the following datasets:

  • abstracts
  • authors
  • citations
  • embeddings-specter_v1
  • embeddings-specter_v2
  • paper-ids
  • papers
  • publication-venues
  • s2orc
  • tldrs
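Because a mistyped dataset name is easy to miss, it can help to check the name locally before starting a download. The following is a minimal sketch, not part of the library: `KNOWN_DATASETS` and `check_dataset_name` are hypothetical names, and the set simply mirrors the list above, so it may fall out of date if the API adds or removes datasets.

```python
# Dataset names copied from the list above (assumption: the live API
# may differ, so treat this set as a convenience, not a source of truth).
KNOWN_DATASETS = frozenset({
    "abstracts",
    "authors",
    "citations",
    "embeddings-specter_v1",
    "embeddings-specter_v2",
    "paper-ids",
    "papers",
    "publication-venues",
    "s2orc",
    "tldrs",
})


def check_dataset_name(name: str) -> str:
    """Return the normalized dataset name, or raise ValueError if unknown."""
    # Normalizing case and whitespace is a choice made for this sketch.
    normalized = name.strip().lower()
    if normalized not in KNOWN_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {sorted(KNOWN_DATASETS)}"
        )
    return normalized
```

The returned value can then be passed as `datasetname` to the download methods.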

API Reference

Main Methods

SemanticScholarDataset(api_key: Optional[str] = None)

Initialize the API client with an optional API key.

get_available_releases() -> list

Get a list of all available dataset releases.

get_available_datasets() -> list

Get a list of all available datasets.

download_latest_release(datasetname: Optional[str] = None) -> None

Download the latest release of a specific dataset.

download_past_release(release_id: str, datasetname: Optional[str] = None) -> None

Download a specific past release of a dataset.

download_diffs(start_release_id: str, end_release_id: str, datasetname: Optional[str] = None) -> None

Download the differences between two releases of a dataset.
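The two release download methods can be combined behind a single helper that picks one based on the release ID. This is a hedged sketch rather than library code: `download_release` is a hypothetical name, the client is passed in so that any object exposing the two documented methods works, and treating `"latest"` as a request for the latest release is an assumption borrowed from the `download_diffs` example.

```python
from typing import Any, Optional


def download_release(
    client: Any, datasetname: str, release_id: Optional[str] = None
) -> str:
    """Call the matching download method and return its name.

    `client` is expected to expose download_latest_release and
    download_past_release as listed in the API reference above.
    """
    # Assumption: no release ID, or the string "latest", means the latest release.
    if release_id is None or release_id == "latest":
        client.download_latest_release(datasetname=datasetname)
        return "download_latest_release"
    # Any other value is passed through as a specific past release.
    client.download_past_release(release_id=release_id, datasetname=datasetname)
    return "download_past_release"
```

For example, `download_release(client, "papers")` would fetch the latest release, while `download_release(client, "papers", "2024-12-31")` would request that specific release.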

Error Handling

The library handles the following error conditions:

  • Invalid dataset names
  • Missing API keys
  • Network errors
  • Invalid release IDs
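Some of these conditions can also be caught before any request is made. The sketch below checks a release ID locally. It assumes release IDs are either the string `"latest"` or a calendar date in `YYYY-MM-DD` form, as in the usage example above; `is_valid_release_id` is a hypothetical helper and does not confirm that the release actually exists.

```python
import re
from datetime import date

# Assumption: release IDs look like "2024-12-31", matching the usage example.
_RELEASE_ID_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_release_id(release_id: str) -> bool:
    """Return True if the ID is "latest" or a well-formed YYYY-MM-DD date."""
    if release_id == "latest":
        return True
    # Check the shape first so compact forms such as "20241231" are rejected.
    if not _RELEASE_ID_PATTERN.fullmatch(release_id):
        return False
    try:
        # Rejects impossible dates such as month 13 or February 30.
        date.fromisoformat(release_id)
    except ValueError:
        return False
    return True
```

A check like this only filters out malformed input; whether a given release exists can still only be determined by the API.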

Environment Variables

  • SEMANTIC_SCHOLAR_API_KEY: Your API key for the Semantic Scholar Dataset API
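If your workflow requires a key, failing early with a clear message can be friendlier than passing `None` to the client. The following is a minimal sketch under that assumption: `load_api_key` is a hypothetical helper, and since the constructor accepts an optional key, this check is a design choice rather than a library requirement.

```python
import os
from typing import Mapping


def load_api_key(
    env: Mapping[str, str] = os.environ,
    var: str = "SEMANTIC_SCHOLAR_API_KEY",
) -> str:
    """Read the API key from the environment, raising if it is missing or blank."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var} is not set or is empty")
    return key
```

The result can be passed directly as `SemanticScholarDataset(api_key=load_api_key())`. Accepting the environment as a parameter keeps the helper easy to test with a plain dictionary.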

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Semantic Scholar for providing the Dataset API
  • The academic community for maintaining and contributing to the datasets
