A package (and website) to automatically attempt to find the code associated with a paper.
papers-without-code
A Python package (and website) to automatically attempt to find GitHub repositories that are similar to academic papers.
Installation
Stable Release: pip install papers-without-code
Development Head: pip install git+https://github.com/evamaxfield/papers-without-code.git
Usage
Provide a DOI, SemanticScholarID, CorpusID, ArXivID, ACL, or URL from semanticscholar.org, arxiv.org, aclweb.org, acm.org, or biorxiv.org. DOIs can be provided as is. All other IDs should be given with their type, for example: doi:10.18653/v1/2020.acl-main.447 or CorpusID:202558505 or url:https://arxiv.org/abs/2004.07180.
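The prefixing rule above can be captured in a small helper. This is only a sketch: format_query and DOI_PATTERN are hypothetical names, not part of papers-without-code, and the DOI pattern is a common heuristic rather than a full validator. The type labels in the usage lines are the ones shown in the examples above.

```python
import re
from typing import Optional

# Heuristic DOI shape ("10.<registrant>/<suffix>"); an assumption, not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def format_query(identifier: str, id_type: Optional[str] = None) -> str:
    """Build a query string: bare DOIs pass through, other IDs get a type prefix.

    Hypothetical helper for illustration; not part of the package.
    """
    identifier = identifier.strip()
    if id_type is not None:
        # Explicit type given, e.g. "CorpusID" or "url".
        return f"{id_type}:{identifier}"
    if DOI_PATTERN.match(identifier):
        # DOIs can be provided as is.
        return identifier
    raise ValueError(f"{identifier!r} is not a bare DOI; pass id_type explicitly")


print(format_query("10.18653/v1/2020.acl-main.447"))
print(format_query("202558505", "CorpusID"))
print(format_query("https://arxiv.org/abs/2004.07180", "url"))
```

The resulting string is what you would pass to pwoc or search_for_repos as the query.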
CLI
pip install papers-without-code
pwoc query
# or pwoc path/to/file.pdf
Python
from papers_without_code import search_for_repos
search_for_repos("query")
# search_for_repos("path/to/file.pdf")
⚠️ Before using PWOC with a PDF, you must be logged in to the Docker CLI via docker login, because we automatically fetch, spin up, and tear down containers for processing. ⚠️
How it Works
In short, we pass the query on to the Semantic Scholar search service (wrapped by danielnsilva/semanticscholar), which provides basic details about the paper. We then use KeyBERT to extract keywords from the paper's title and abstract, and make multiple threaded requests to GitHub's API for repositories that match those keywords. Once all candidate repositories are back, we rank them by the similarity between each repository's README and the paper's abstract (or, if the abstract is not available, its title).
When you run Papers without Code locally and provide a file path, the only change to this workflow is keyword extraction: we use GROBID to extract keywords from the full text of the paper in addition to the title and abstract.
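The threaded search and README ranking steps can be illustrated with a standard-library sketch. It makes several assumptions and is not the package's implementation: search_all, rank_repos, and cosine_similarity are hypothetical names, the search function is an in-memory stand-in for the GitHub API request, and similarity is a plain word-count cosine, since the text above does not say which similarity measure the package uses.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def _tokens(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Word-count cosine similarity (a stand-in for the real similarity measure)."""
    va, vb = _tokens(a), _tokens(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def search_all(
    keywords: Iterable[str],
    search_fn: Callable[[str], Dict[str, str]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one search per keyword in a thread pool; merge results by repo name."""
    repos: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(search_fn, keywords):
            repos.update(result)  # duplicates across keywords collapse here
    return repos


def rank_repos(paper_text: str, repos: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sort repositories by README similarity to the paper text, best first."""
    scored = [(name, cosine_similarity(paper_text, readme)) for name, readme in repos.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up data standing in for GitHub search results (repo name -> README text).
FAKE_INDEX = {
    "citation": {
        "example/paper-embeddings": "document level representation learning "
        "using citation informed transformers",
    },
    "transformers": {
        "example/paper-embeddings": "document level representation learning "
        "using citation informed transformers",
        "example/cookbook": "recipes for pasta and bread",
    },
}


def fake_search(keyword: str) -> Dict[str, str]:
    """Hypothetical replacement for a GitHub API request."""
    return FAKE_INDEX.get(keyword, {})


abstract = "document representation learning with citation informed transformers"
ranked = rank_repos(abstract, search_all(["citation", "transformers"], fake_search))
for name, score in ranked:
    print(f"{score:.2f}  {name}")
```

Passing the search function in as a parameter keeps the sketch runnable offline; a real version would replace fake_search with a function that calls GitHub's API and fetches each repository's README.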
Documentation
For full package documentation please visit evamaxfield.github.io/papers-without-code.
Exploratory data analysis of the dataset used for testing
Development
See CONTRIBUTING.md for information related to developing the code.
MIT License