A package (and website) to automatically attempt to find the code associated with a paper.

papers-without-code

A Python package (and website) to automatically attempt to find GitHub repositories that are similar to academic papers.


Installation

Stable Release: pip install papers-without-code
Development Head: pip install git+https://github.com/evamaxfield/papers-without-code.git

Usage

Provide a DOI, SemanticScholarID, CorpusID, ArXivID, ACL ID, or a URL from semanticscholar.org, arxiv.org, aclweb.org, acm.org, or biorxiv.org. A DOI can be provided as is. Every other ID should be prefixed with its type, for example: doi:10.18653/v1/2020.acl-main.447 or CorpusID:202558505 or url:https://arxiv.org/abs/2004.07180.
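To make the `type:value` convention above concrete, here is a minimal sketch of how such a query string could be split into its type and value. The helper name `split_typed_id` and the `KNOWN_PREFIXES` set are hypothetical and are not part of the papers-without-code API. The set only covers the prefixes shown in the examples above, and the package itself may handle queries differently.

```python
# Hypothetical helper, not part of papers-without-code.
# Only the prefixes that appear in the examples above are listed (assumption).
KNOWN_PREFIXES = {"doi", "corpusid", "url"}


def split_typed_id(query: str) -> tuple[str, str]:
    """Split 'type:value' into (type, value); treat a bare '10.' string as a DOI."""
    # Split on the first colon only, so URLs keep their own 'https://' colon.
    prefix, sep, rest = query.partition(":")
    if sep and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), rest
    # DOIs start with the '10.' directory indicator and may be given as is.
    if query.startswith("10."):
        return "doi", query
    return "unknown", query


print(split_typed_id("CorpusID:202558505"))
print(split_typed_id("url:https://arxiv.org/abs/2004.07180"))
print(split_typed_id("10.18653/v1/2020.acl-main.447"))
```

Splitting on the first colon only is the design choice that matters here: a URL contains further colons, and those must stay in the value.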

CLI

pip install papers-without-code

pwoc query
# or pwoc path/to/file.pdf

Python

from papers_without_code import search_for_repos

search_for_repos("query")
# search_for_repos("path/to/file.pdf")

⚠️ Before using PWOC with a PDF, you must be logged in to the Docker CLI via docker login, because we automatically fetch, spin up, and tear down containers for processing. ⚠️

How it Works

In short, we pass the query on to the Semantic Scholar search service (wrapped by danielnsilva/semanticscholar), which provides us with basic details about the paper. We then use KeyBERT to extract keywords from the paper's title and abstract, and make multiple threaded requests to GitHub's API for repositories that match those keywords. Once we have all the candidate repositories back, we rank them by the similarity between each repository's README and the paper's abstract (or, if the abstract is not available, its title).
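The fan-out and ranking steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the function names (`gather_candidates`, `rank_repos`, `cosine_similarity`) are hypothetical, the `search` callable stands in for a real GitHub API request, and a simple bag-of-words cosine similarity is swapped in for whatever similarity measure the package actually uses.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def gather_candidates(
    keywords: list[str],
    search: Callable[[str], dict[str, str]],
    max_workers: int = 4,
) -> dict[str, str]:
    """Run one search per keyword in a thread pool and merge the results.

    `search` maps a keyword to {repo_name: readme_text}; it is a stand-in
    for a real GitHub API call (assumption).
    """
    candidates: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(search, keywords):
            candidates.update(result)  # duplicate repos collapse by name
    return candidates


def _bag_of_words(text: str) -> Counter:
    # Lowercase and keep alphanumeric runs as tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a simple stand-in for the real measure."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_repos(reference: str, readmes: dict[str, str]) -> list[tuple[str, float]]:
    """Sort repositories by README similarity to the abstract (or title)."""
    scored = [(name, cosine_similarity(reference, text)) for name, text in readmes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with a fake in-memory index instead of network calls.
fake_index = {
    "citation": {"a/specter": "document-level representation learning using citation-informed transformers"},
    "todo": {"b/todo": "a simple todo list app"},
}
found = gather_candidates(["citation", "todo"], lambda kw: fake_index[kw])
for name, score in rank_repos("citation-informed transformers for document-level representation learning", found):
    print(name, round(score, 3))
```

Passing the search function in as an argument keeps the sketch runnable offline and makes it easy to substitute a real API client later.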

When using Papers without Code locally and providing a filepath, the only change to this workflow is keyword extraction: we use GROBID to extract keywords from the full text of the paper, in addition to the title and abstract.

Documentation

For full package documentation please visit evamaxfield.github.io/papers-without-code.

Exploratory data analysis of the dataset used for testing

Development

See CONTRIBUTING.md for information related to developing the code.

MIT License
