The official Python client for the Connected Papers API.

connectedpapers-py

Installation

pip install connectedpapers-py

Usage

from connectedpapers import ConnectedPapersClient

DEEPFRUITS_PAPER_ID = "9397e7acd062245d37350f5c05faf56e9cfae0d6"

# TEST_TOKEN allows access ONLY to the paper with the id DEEPFRUITS_PAPER_ID
client = ConnectedPapersClient(access_token="TEST_TOKEN")
remaining_uses_count = client.get_remaining_usages_sync()
print(f"Remaining uses count: {remaining_uses_count}")
free_access_papers = client.get_free_access_papers_sync()
print(f"Free access papers: {free_access_papers}")
graph = client.get_graph_sync(DEEPFRUITS_PAPER_ID)
assert graph.graph_json.start_id == DEEPFRUITS_PAPER_ID

See more examples in the usage samples directory.

See graph structure at graph.py.

Configuring the API key

There are two ways to configure the API key:

  1. Set the environment variable CONNECTED_PAPERS_API_KEY:
    export CONNECTED_PAPERS_API_KEY="YOUR_API_KEY"
    
    Then you can use the client without parameters:
    from connectedpapers import ConnectedPapersClient
    
    client = ConnectedPapersClient()
    
  2. Pass the key to the client's constructor:
    from connectedpapers import ConnectedPapersClient
    
    client = ConnectedPapersClient(access_token="YOUR_API_KEY")
    

Getting an access token

Contact us at hello@connectedpapers.com to get an early-access token.

Using the token TEST_TOKEN, or not passing a token at all, lets you access only the graph of the paper with the ID 9397e7acd062245d37350f5c05faf56e9cfae0d6, for testing purposes.
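If you want one code path that works both with and without a configured key, you can resolve the token yourself before constructing the client. The sketch below is illustrative only: resolve_access_token and make_client are hypothetical helpers, not part of the library, and the fallback order (environment variable first, then TEST_TOKEN) is an assumed policy rather than documented client behavior.

```python
import os
from typing import Mapping, Optional

# Documented test token; it only grants access to the DeepFruits test paper.
TEST_TOKEN = "TEST_TOKEN"


def resolve_access_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: choose the access token to pass to the client."""
    env = os.environ if env is None else env
    token = env.get("CONNECTED_PAPERS_API_KEY", "").strip()
    # Assumed policy: fall back to the test token when no key is configured.
    return token or TEST_TOKEN


def make_client():
    # Imported lazily so resolve_access_token works without the package installed.
    from connectedpapers import ConnectedPapersClient

    return ConnectedPapersClient(access_token=resolve_access_token())
```

Passing the token explicitly keeps it visible in your code which credential is in use, which can help when a script silently ends up running against the test token.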

Paper IDs

We use the ShaIDs from Semantic Scholar as paper IDs. You can find the ShaID of a paper by searching for it on Semantic Scholar and copying the ID from the URL.
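As an illustration of copying the ID from the URL, the sketch below pulls the trailing ID out of a Semantic Scholar paper link. It assumes the ID is the last path segment of the URL and is a 40-character hexadecimal string, as the example ID in this document is. extract_paper_id is a hypothetical helper, not part of the client.

```python
import re
from typing import Optional

# Assumption: a ShaID is 40 hexadecimal characters, like the example ID above.
_SHA_ID_RE = re.compile(r"[0-9a-f]{40}")


def extract_paper_id(url_or_id: str) -> Optional[str]:
    """Hypothetical helper: return the ShaID from a paper URL or bare ID, else None."""
    last_segment = url_or_id.strip().rstrip("/").split("/")[-1]
    # Drop any query string or fragment that follows the ID.
    candidate = re.split(r"[?#]", last_segment)[0].lower()
    return candidate if _SHA_ID_RE.fullmatch(candidate) else None
```

For example, extract_paper_id applied to a link ending in /9397e7acd062245d37350f5c05faf56e9cfae0d6 returns that ID, while a string that does not end in a 40-character hexadecimal segment returns None.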

API

Most functions offer a synchronous and an asynchronous version.

If a graph is already built, you can get it within standard API call latency. If it is outdated (older than 30 days), you can still get it using the fresh_only=False parameter. If you wait for a rebuild (either fresh_only=True with a graph older than 30 days, or no graph built for the paper yet), a rebuild will be triggered. A graph build usually takes around 10 seconds and can take up to 1 minute.

The graph structure is documented at graph.py.

API structure

We have the following API calls available:

  • Fetching a graph
  • Getting the remaining usages count for your API key
  • Getting the list of papers that are free to access

Free re-access to papers

If you have fetched a paper, it remains free to access for a month after the first access without counting towards your usages count.

Synchronous API

from connectedpapers import ConnectedPapersClient

client = ConnectedPapersClient(access_token="YOUR_API_KEY")
client.get_graph_sync("YOUR_PAPER_ID")  # Fetch a graph for a single paper
client.get_remaining_usages_sync()  # Get the remaining usages count for your API key
client.get_free_access_papers_sync()  # Get the list of papers that are free to access

Asynchronous API

import asyncio

from connectedpapers import ConnectedPapersClient

client = ConnectedPapersClient(access_token="YOUR_API_KEY")


async def usage_sample() -> None:
    await client.get_graph_async("YOUR_PAPER_ID")  # Fetch a graph for a single paper
    await client.get_remaining_usages_async()  # Get the remaining usages count for your API key
    await client.get_free_access_papers_async()  # Get the list of papers that are free to access


# Calling usage_sample() alone only creates a coroutine; asyncio.run executes it
asyncio.run(usage_sample())

Async iterator API

The client offers support for Python's asynchronous iterator access to the API, allowing for real-time monitoring of the progress of graph builds and retrieval of both current and rebuilt papers.

The method get_graph_async_iterator returns an asynchronous iterator that yields values of the GraphResponse type. Here's what GraphResponse and the related GraphResponseStatuses look like in Python:

class GraphResponseStatuses(Enum):
    BAD_ID = "BAD_ID"
    ERROR = "ERROR"
    NOT_IN_DB = "NOT_IN_DB"
    OLD_GRAPH = "OLD_GRAPH"
    FRESH_GRAPH = "FRESH_GRAPH"
    IN_PROGRESS = "IN_PROGRESS"
    QUEUED = "QUEUED"
    BAD_TOKEN = "BAD_TOKEN"
    BAD_REQUEST = "BAD_REQUEST"
    OUT_OF_REQUESTS = "OUT_OF_REQUESTS"
    OVERLOADED = "OVERLOADED"


@dataclasses.dataclass
class GraphResponse:
    status: GraphResponseStatuses
    graph_json: Optional[Graph] = None
    progress: Optional[float] = None
    remaining_requests: Optional[int] = None

Once the status falls into one of the terminal states (BAD_ID, ERROR, NOT_IN_DB, BAD_TOKEN, BAD_REQUEST, OUT_OF_REQUESTS), the iterator will cease to yield further values.
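When handling the final response, it can be convenient to test for these terminal states in one place. The following is a minimal sketch: is_terminal and TERMINAL_STATUS_NAMES are hypothetical names, not part of the library, and the set simply mirrors the list in the paragraph above.

```python
from enum import Enum
from typing import Union

# Names of the terminal statuses listed in the text above.
TERMINAL_STATUS_NAMES = frozenset(
    {"BAD_ID", "ERROR", "NOT_IN_DB", "BAD_TOKEN", "BAD_REQUEST", "OUT_OF_REQUESTS"}
)


def is_terminal(status: Union[Enum, str]) -> bool:
    """Hypothetical helper: accept an enum member or its name as a string."""
    name = getattr(status, "name", status)
    return name in TERMINAL_STATUS_NAMES
```

Comparing by name keeps the helper independent of where the enum is imported from, at the cost of not catching typos in the status names.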

Here's the signature for invoking this method:

class ConnectedPapersClient:
    # ...
    async def get_graph_async_iterator(
        self, paper_id: str, fresh_only: bool = False, loop_until_fresh: bool = True
    ) -> AsyncIterator[GraphResponse]:
        # ...

Call this method with fresh_only=False and loop_until_fresh=True to request the existing graph and continue waiting for a rebuild if necessary.

The initial response will contain the status GraphResponseStatuses.OLD_GRAPH, then transition through GraphResponseStatuses.QUEUED and GraphResponseStatuses.IN_PROGRESS, with the progress field reflecting the percentage of the graph build completed. Upon completion of the rebuild, the status will change to GraphResponseStatuses.FRESH_GRAPH, and the iteration will end.

The graph_json field will contain the graph corresponding to each of these responses, and will remain non-None as long as any version of the graph is available.
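The flow described above can be sketched as a small consumer of the iterator. This is an illustrative sketch, not the library's own sample: wait_for_graph and main are hypothetical names, and the code relies only on the get_graph_async_iterator signature and the GraphResponse fields shown earlier.

```python
import asyncio
from typing import Any, List, Optional, Tuple


async def wait_for_graph(client: Any, paper_id: str) -> Tuple[Optional[Any], List[str]]:
    """Hypothetical helper: follow a build and return (latest graph, status names seen)."""
    latest_graph = None
    seen: List[str] = []
    async for response in client.get_graph_async_iterator(
        paper_id, fresh_only=False, loop_until_fresh=True
    ):
        seen.append(response.status.name)
        if response.progress is not None:
            print(f"{response.status.name}: progress={response.progress}")
        if response.graph_json is not None:
            # Keep the most recent graph, whether outdated or freshly rebuilt.
            latest_graph = response.graph_json
    return latest_graph, seen


async def main() -> None:
    # Imported lazily so wait_for_graph can be exercised without the package.
    from connectedpapers import ConnectedPapersClient

    client = ConnectedPapersClient(access_token="TEST_TOKEN")
    graph, seen = await wait_for_graph(client, "9397e7acd062245d37350f5c05faf56e9cfae0d6")
    print(seen, graph is not None)
```

Run it with asyncio.run(main()). Because the helper keeps the last non-None graph, a caller still gets the outdated graph if the rebuild ends in an error status.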

Rate Limiting and Overload Handling

OVERLOADED Status

As of the 2025-11 API update, the API may return an OVERLOADED status when:

  • Your API key exceeds the rate limit (default: 5 builds per minute)
  • The system is temporarily overloaded

Previously, throttling returned HTTP 500 errors. Now it returns HTTP 200 with status: "OVERLOADED".

Automatic Retry with Exponential Backoff

By default, when the server returns the OVERLOADED status, the client automatically retries with exponential backoff delays of 5, 10, 20, and 40 seconds.

from connectedpapers import ConnectedPapersClient

# Default behavior: automatic retries enabled
client = ConnectedPapersClient(access_token="YOUR_API_KEY")

# The client will automatically retry with exponential backoff: 5s, 10s, 20s, 40s
graph = client.get_graph_sync("YOUR_PAPER_ID")

When retry_on_overload=True (default):

  • The client will automatically retry up to 4 times with delays of 5, 10, 20, and 40 seconds
  • If all retries are exhausted, it will return the OVERLOADED status
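To see what this schedule means in practice, the delays follow a doubling pattern and add up to a fixed worst-case wait. The constant names below are illustrative, not library settings.

```python
# Delays documented above: 5, 10, 20 and 40 seconds (doubling on each retry).
BASE_DELAY_SECONDS = 5
MAX_RETRIES = 4

delays = [BASE_DELAY_SECONDS * 2**attempt for attempt in range(MAX_RETRIES)]
total_wait = sum(delays)

print(delays)      # [5, 10, 20, 40]
print(total_wait)  # 75
```

So a call that stays overloaded throughout sleeps for 75 seconds in total, plus the time of the requests themselves, before the OVERLOADED status is returned.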

To disable automatic retries:

# Disable automatic retries
client = ConnectedPapersClient(
    access_token="YOUR_API_KEY",
    retry_on_overload=False
)

When retry_on_overload=False:

  • The client will immediately return the OVERLOADED status without retrying
  • You can handle this status in your code and implement your own retry logic
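A custom retry loop might look like the sketch below. It assumes only what this document states: that get_graph_sync returns a response whose status can be OVERLOADED. fetch_with_retry and main are hypothetical names, and the default delays copy the client's documented schedule.

```python
import time
from typing import Any, Callable, Sequence


def fetch_with_retry(
    fetch: Callable[[], Any],
    delays: Sequence[float] = (5, 10, 20, 40),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Hypothetical helper: call fetch() and retry while it reports OVERLOADED."""
    response = fetch()
    for delay in delays:
        if response.status.name != "OVERLOADED":
            break
        sleep(delay)
        response = fetch()
    # May still be OVERLOADED if every retry was used up.
    return response


def main() -> None:
    # Imported lazily so fetch_with_retry can be exercised without the package.
    from connectedpapers import ConnectedPapersClient

    client = ConnectedPapersClient(access_token="YOUR_API_KEY", retry_on_overload=False)
    response = fetch_with_retry(lambda: client.get_graph_sync("YOUR_PAPER_ID"))
    print(response.status)
```

Taking the sleep function as a parameter makes the loop easy to test without waiting, and lets you swap in a different schedule, for example shorter delays for interactive tools.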

Avoiding Rate Limits

To minimize rate limit hits, you can check which papers are available for free re-access:

# Papers accessed within the last 31 days don't count toward rate limit
free_papers = client.get_free_access_papers_sync()
print(f"Free access papers: {free_papers}")

Papers accessed within 31 days can be re-accessed without counting toward your rate limit.

Download files

Source distribution: connectedpapers_py-0.2.3.tar.gz (6.5 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4
  • SHA256: 540917638d61d194e4d504d931b72bd802cd22445893575b1bec9092e694abd1
  • MD5: 1a5ec027778ba3a5eb382d6199d570f2
  • BLAKE2b-256: 41742a4817c1707294868f6b2523451d4c41785fd19ca7df5a0e70e24fe7e9c1

Built distribution: connectedpapers_py-0.2.3-py3-none-any.whl (8.3 kB, Python 3)

  • SHA256: 5b2b1a15a29a90c147af2238f234a88c71da16b4ada497dea131ece1f8644824
  • MD5: 1c269904b7675aeca390531d681c2205
  • BLAKE2b-256: 0972898609299c091649e1eca49da68ec007fa3550d9dd727f9fcd476c88d13b
