
A package (and website) to automatically attempt to find the code associated with a paper.


papers-without-code


A Python package (and website) to automatically attempt to find GitHub repositories that are similar to academic papers.

[Image: the Papers without Code web application homepage]


Installation

Stable Release: pip install papers-without-code
Development Head: pip install git+https://github.com/evamaxfield/papers-without-code.git

Usage

Provide a DOI, Semantic Scholar ID, CorpusID, arXiv ID, ACL Anthology ID, or a URL from semanticscholar.org, arxiv.org, aclweb.org, acm.org, or biorxiv.org. DOIs can be provided as is; all other IDs should be prefixed with their type, for example CorpusID:202558505 or url:https://arxiv.org/abs/2004.07180. A DOI may also carry its prefix, as in doi:10.18653/v1/2020.acl-main.447. The CLI and Python API also accept a path to a local PDF file.
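The ID convention above can be illustrated with a small normalization step. This is a minimal sketch, not part of papers-without-code: `normalize_query` and the DOI pattern are hypothetical, and the package may handle bare DOIs through a different mechanism.

```python
import re

# Hypothetical pattern for a bare DOI: "10.", a numeric registrant code,
# "/", then a non-whitespace suffix (e.g. 10.18653/v1/2020.acl-main.447).
BARE_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_query(query: str) -> str:
    """Prefix a bare DOI with 'doi:'; return any other query unchanged.

    Hypothetical helper for illustration only; not a papers-without-code API.
    """
    query = query.strip()
    if BARE_DOI.match(query):
        return f"doi:{query}"
    # Typed IDs such as CorpusID:..., url:..., or doi:... pass through as is.
    return query
```

For example, `normalize_query("10.18653/v1/2020.acl-main.447")` returns `"doi:10.18653/v1/2020.acl-main.447"`, while `"CorpusID:202558505"` is returned unchanged.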

CLI

pip install papers-without-code

pwoc query
# where query is any input listed above, e.g. pwoc CorpusID:202558505
# or pwoc path/to/file.pdf

Python

from papers_without_code import search_for_repos

# Pass any supported query, e.g. a DOI or "CorpusID:202558505"
search_for_repos("query")
# or a local PDF: search_for_repos("path/to/file.pdf")

⚠️ Before using PWOC with a PDF, you must be logged in to the Docker CLI (via docker login), because we automatically fetch, spin up, and tear down the containers used for processing. ⚠️
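To illustrate the fetch, spin-up, and teardown cycle, here is a minimal sketch that drives the Docker CLI through `subprocess`. It assumes the container runs GROBID (the PDF processor mentioned in How it Works) from the `lfoppiano/grobid:0.7.2` image on GROBID's default port 8070; the image tag, the `grobid_container` helper, and the injectable `run` parameter are all hypothetical, not PWOC's actual implementation.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

# Assumption: image name and tag; the image PWOC actually uses may differ.
GROBID_IMAGE = "lfoppiano/grobid:0.7.2"

# A runner takes docker CLI arguments (without "docker") and returns stdout.
Runner = Callable[[List[str]], str]


def docker_cli(args: List[str]) -> str:
    """Run `docker <args>` and return trimmed stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


@contextmanager
def grobid_container(
    image: str = GROBID_IMAGE, port: int = 8070, run: Runner = docker_cli
) -> Iterator[str]:
    """Pull and start a GROBID container, yield its base URL, and always stop it."""
    run(["pull", image])  # fetch the image from the registry
    # Spin up detached; `docker run -d` prints the new container's ID.
    container_id = run(["run", "--rm", "-d", "-p", f"{port}:8070", image])
    try:
        # A real implementation would also poll <base URL>/api/isalive
        # until GROBID reports it is ready before sending PDFs.
        yield f"http://localhost:{port}"
    finally:
        run(["stop", container_id])  # tear down; --rm then deletes the container
```

The `try`/`finally` block ensures the container is stopped even if processing raises, and injecting `run` lets the lifecycle be exercised without a Docker daemon.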

How it Works

In short, we pass the query to the Semantic Scholar search service (wrapped by danielnsilva/semanticscholar), which returns basic details about the paper, such as its title and abstract. Next, we use KeyBERT to extract keywords from the paper's title and abstract, and make multiple threaded requests to GitHub's API for repositories that match those keywords. Once all candidate repositories are returned, we rank them by the similarity between each repository's README and the paper's abstract (or, if no abstract is available, its title).
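The search-and-rank steps can be sketched with the standard library. This is an illustration under several assumptions: `search` stands in for a hypothetical wrapper around GitHub's repository search endpoint (`GET /search/repositories`) that also returns each repository's README text, keywords are assumed to be already extracted (PWOC uses KeyBERT for that), and bag-of-words cosine similarity is a simple stand-in for whatever similarity measure the package actually uses.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Repo = Dict[str, str]  # assumed shape: {"name": ..., "readme": ...}


def _bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts' bag-of-words vectors (0.0 if either is empty)."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def find_similar_repos(
    keywords: List[str],
    search: Callable[[str], List[Repo]],
    title: str,
    abstract: Optional[str] = None,
    max_workers: int = 8,
) -> List[Repo]:
    """Search each keyword in parallel threads, de-duplicate, rank by README similarity."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_lists = list(pool.map(search, keywords))
    # The same repository can match several keywords; keep one entry per name.
    repos = {repo["name"]: repo for results in result_lists for repo in results}
    reference = abstract or title  # fall back to the title when there is no abstract
    return sorted(
        repos.values(),
        key=lambda repo: cosine(repo.get("readme", ""), reference),
        reverse=True,
    )
```

Passing `search` as a parameter keeps the network call out of the ranking logic, so the sketch can be tested with a fake search function; threads suit this step because the work is dominated by waiting on HTTP responses.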

When using Papers without Code locally with a filepath, the only change to this workflow is keyword extraction: we use GROBID to process the PDF so that keywords are extracted from the paper's full text in addition to its title and abstract.
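GROBID's full-text service (`/api/processFulltextDocument`) returns the parsed paper as TEI XML. Below is a minimal sketch of flattening that XML into plain text for keyword extraction, using only `xml.etree.ElementTree`. The `tei_full_text` name and the choice to keep only the title, abstract, and body paragraphs are assumptions, not necessarily what PWOC does.

```python
import xml.etree.ElementTree as ET

# GROBID's TEI output uses the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_full_text(tei_xml: str) -> str:
    """Join the title, abstract paragraphs, and body paragraphs of a TEI document."""
    root = ET.fromstring(tei_xml)
    parts = []
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is not None and title.text:
        parts.append(title.text.strip())
    # Abstract paragraphs first, then body paragraphs (the back matter is skipped).
    for path in (".//tei:profileDesc/tei:abstract//tei:p", ".//tei:text/tei:body//tei:p"):
        for paragraph in root.findall(path, TEI_NS):
            # itertext() also collects text nested in inline tags such as <ref>.
            text = "".join(paragraph.itertext()).strip()
            if text:
                parts.append(text)
    return "\n".join(parts)
```

The returned string can then be handed to the same keyword extractor used for the title and abstract.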

Documentation

For full package documentation, please visit evamaxfield.github.io/papers-without-code.

Exploratory data analysis of the dataset used for testing

Development

See CONTRIBUTING.md for information related to developing the code.

MIT License



Download files


Source Distribution

papers-without-code-0.3.0.tar.gz (12.1 MB)

Built Distribution

papers_without_code-0.3.0-py3-none-any.whl (12.2 MB, Python 3)
