
Publication ranking and citation network analysis tools.

Project description

paperank

Publication Ranking and Citation Network Analysis Tools

paperank is a Python package for analyzing scholarly impact using citation networks. It provides tools to build citation graphs from DOIs, compute PapeRank (a PageRank-like score), fetch publication metadata, and export ranked results. The package is designed for researchers, bibliometricians, and developers interested in quantifying publication influence within local or global citation networks.

For a discussion of PageRank-like scores beyond the web, see Gleich (2014).



Features

  • Citation Graph Construction:
    Automatically builds a citation network from a starting DOI, including both cited and citing works, with configurable depth.

  • PapeRank Computation:
    Calculates PageRank-like scores for all publications in the network, quantifying their relative importance.

  • Metadata Retrieval:
    Fetches publication metadata (authors, title, year, etc.) from Crossref and OpenCitations; a request sketch follows this list.

  • Export Ranked Results:
    Outputs ranked publication lists to JSON or CSV files, including scores and metadata.

  • Robust HTTP Handling:
    Uses retry logic for API requests to handle rate limits and transient errors.
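
As a concrete illustration of the metadata-retrieval and HTTP-handling points above, here is a minimal sketch of a retried Crossref request built from the requests and urllib3 dependencies listed under Installation. This is not paperank's internal code; the endpoint and response schema are Crossref's own, and the retry settings are illustrative.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry rate limits and transient server errors (HTTP 429/5xx)
# with exponential backoff before giving up.
session = requests.Session()
retries = Retry(total=5, backoff_factor=1.0,
                status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

# Fetch Crossref metadata for a DOI; "message" holds the work record.
resp = session.get("https://api.crossref.org/works/10.1016/j.ejor.2005.01.053",
                   timeout=30)
resp.raise_for_status()
meta = resp.json()["message"]
print(meta.get("title", ["<untitled>"])[0])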


Installation

Install via pip (recommended):

pip install paperank

Or clone the repository and install locally:

git clone https://github.com/gwr3n/paperank.git
cd paperank
pip install .

Dependencies are managed via pyproject.toml and include:

  • numpy
  • scipy
  • requests
  • tqdm
  • urllib3

Quick Start

Here’s a minimal example to rank publications in a citation neighborhood:

from paperank.paperank_core import crawl_and_rank

# Set your target DOI
doi = "10.1016/j.ejor.2005.01.053"

# Run the analysis
results = crawl_and_rank(
    doi=doi,
    forward_steps=2,
    backward_steps=2,
    alpha=0.85,
    output_format="json",  # or "csv"
    debug=False,
    progress=True
)

This will:

  • Collect the citation neighborhood around the DOI
  • Compute PapeRank scores
  • Save results to a file (<DOI>.json or <DOI>.csv)

Advanced Parameters

You can fine-tune the PageRank iteration via both rank and crawl_and_rank:

  • tol: Convergence tolerance (default 1e-12).
  • max_iter: Maximum number of iterations (default 10000).
  • teleport: Optional teleportation distribution (numpy array of size N), non-negative and summing to 1. If None, a uniform distribution is used.

Example:

results = crawl_and_rank(
    doi=doi,
    forward_steps=1,
    backward_steps=1,
    alpha=0.85,
    tol=1e-12,
    max_iter=20000,
    teleport=None,
)
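
A custom teleport vector just needs to be non-negative and sum to 1, with one entry per publication in the crawled network. The numpy sketch below biases teleportation toward a single node; note that the mapping from DOIs to vector indices is an internal detail of the crawled graph, so the size and index chosen here are purely illustrative.

import numpy as np

# Hypothetical 5-publication network: put 90% of the teleport mass
# on index 0 and spread the remaining 10% uniformly over the rest.
# In practice n must equal the number of publications in the network.
n = 5
teleport = np.full(n, 0.1 / (n - 1))
teleport[0] = 0.9
assert teleport.min() >= 0 and np.isclose(teleport.sum(), 1.0)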

Deprecated Function

  • apply_random_jump is deprecated. It materializes a dense Google matrix and is intended only for very small graphs.
    Prefer compute_publication_rank_teleport, which applies teleportation during iteration without building a dense matrix.
    If you already used apply_random_jump, pass the result to compute_publication_rank (not the teleport variant).
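
To make the distinction concrete, here is a generic power-iteration sketch (not paperank's internal code) that applies teleportation on the fly. It only ever multiplies by the sparse matrix, whereas materializing the dense Google matrix alpha*S + (1-alpha)*v*1^T costs O(N^2) memory.

import numpy as np

def pagerank_sketch(S, alpha=0.85, v=None, tol=1e-12, max_iter=10000):
    # S: column-stochastic sparse citation matrix (scipy.sparse),
    # with dangling nodes assumed to be handled already.
    # v: teleport distribution; uniform if None.
    n = S.shape[0]
    v = np.full(n, 1.0 / n) if v is None else v
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_next = alpha * (S @ x) + (1.0 - alpha) * v  # teleport applied per step
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x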

Main API

  • crawl_and_rank:
    End-to-end workflow for crawling a citation network and ranking publications.

  • rank:
    Compute PapeRank scores for a list of DOIs.

  • rank_and_save_publications_JSON:
    Save ranked results to a JSON file.

  • rank_and_save_publications_CSV:
    Save ranked results to a CSV file.

  • get_citation_neighborhood:
    Collects DOIs in the citation neighborhood of a target publication.


Submodules

  • citation_crawler:
    Functions for recursive citation/citing DOI collection.

  • citation_matrix:
    Builds sparse adjacency matrices for citation graphs (see the sketch after this list).

  • paperank_matrix:
    Matrix utilities for stochastic and PageRank computations.

  • crossref:
    Metadata retrieval from Crossref.

  • open_citations:
    Citing DOI retrieval from OpenCitations.

  • doi_utils:
    DOI normalization and utility functions.
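
For intuition, the structure produced by citation_matrix is the standard sparse adjacency pattern shown below. This is a generic scipy.sparse sketch; the index conventions and matrix orientation are assumptions here, not documented behavior of the package.

import scipy.sparse as sp

# Hypothetical edges (citing_index, cited_index) among four papers.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
rows, cols = zip(*edges)
A = sp.csr_matrix(([1] * len(edges), (rows, cols)), shape=(4, 4))
print(A.toarray())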


Example

See example.py for a comprehensive script demonstrating the workflow (including advanced parameters).


Testing

Unit tests are provided in the tests directory. Run with:

python -m unittest discover tests

License

MIT License. See LICENSE for details.


Citation

If you use paperank in published work, please cite the repository:

@software{rossi2025paperank,
  author = {Roberto Rossi},
  title = {paperank: publication ranking and citation network analysis tools},
  year = {2025},
  url = {https://github.com/gwr3n/paperank}
}

Support & Contributions

  • Issues and feature requests: GitHub Issues
  • Pull requests welcome!

Project Homepage

https://github.com/gwr3n/paperank



Download files

Download the file for your platform.

Source Distribution

paperank-0.1.1.tar.gz (17.5 kB)

Built Distribution

paperank-0.1.1-py3-none-any.whl (19.5 kB)

File details

Details for the file paperank-0.1.1.tar.gz.

File metadata

  • Download URL: paperank-0.1.1.tar.gz
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for paperank-0.1.1.tar.gz:

  • SHA256: a995cbad765ec9e998a23db7ff165c9588e3e57adf128c717d644cb9b03d2398
  • MD5: 2365370b58408a3b1106802e5f065870
  • BLAKE2b-256: ec8974ab1a24cbe58705fff40120d0c82dae5e62d3748d8f82c4f062e9c8f22c


File details

Details for the file paperank-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: paperank-0.1.1-py3-none-any.whl
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for paperank-0.1.1-py3-none-any.whl:

  • SHA256: dc4d275021f048df7486c966c1f8cd64da38d8efa58b4b2c0340d7a2e119d9a7
  • MD5: da627f4d22709584ab3e60a369bd5faa
  • BLAKE2b-256: 57939595e12d0db6e0b52527291059484394f706efa71eb30614f9b8539d078d

