Extract and map content from academic papers for LLM processing

Papercut

Extract knowledge from academic papers. A CLI-first Python package for researchers.

Installation

pip install papercutter

With LLM features (summarization, reports, study aids):

pip install "papercutter[llm]"

With fast PDF processing (PyMuPDF):

pip install "papercutter[fast]"

All optional dependencies:

pip install "papercutter[all]"

Development Installation

git clone https://github.com/pranjalrawat007/papercut.git
cd papercut
pip install -e ".[dev]"

Quick Start

Fetch Papers

Download papers from arXiv, a DOI, SSRN, NBER, or a direct URL:

# From arXiv
papercut fetch arxiv 2301.00001

# From DOI
papercut fetch doi 10.1257/aer.20180779

# From SSRN
papercut fetch ssrn 4123456

# From NBER
papercut fetch nber w29000

# From direct URL
papercut fetch url "https://example.com/paper.pdf" --name smith_2024
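
Each `fetch` source takes a differently shaped identifier. As an illustration only, the sketch below maps a source name and identifier to the public landing or PDF URL that each site itself uses. The helper `resolve_url` and the URL templates are assumptions; papercut's own download logic (redirects, locating the PDF link on a landing page) is not documented here and may work differently.

```python
# Illustrative only: public URL patterns for each source, NOT papercut's
# internal fetch logic (an assumption made for this sketch).
SOURCE_URLS = {
    "arxiv": "https://arxiv.org/pdf/{id}",  # PDF for an arXiv ID
    "doi": "https://doi.org/{id}",  # DOI resolver, redirects to the publisher
    "ssrn": "https://papers.ssrn.com/sol3/papers.cfm?abstract_id={id}",  # abstract page
    "nber": "https://www.nber.org/papers/{id}",  # working-paper page
}


def resolve_url(source: str, identifier: str) -> str:
    """Hypothetical helper: return the public URL for a source/identifier pair."""
    if source not in SOURCE_URLS:
        raise ValueError(f"unknown source: {source!r}")
    return SOURCE_URLS[source].format(id=identifier)


print(resolve_url("arxiv", "2301.00001"))  # https://arxiv.org/pdf/2301.00001
print(resolve_url("nber", "w29000"))  # https://www.nber.org/papers/w29000
```

A DOI only points at the publisher's landing page, so whether a PDF can actually be retrieved from it depends on the publisher and on access rights.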

Extract Text

Extract clean text from PDFs:

# Full text to stdout
papercut extract text paper.pdf

# Save to file
papercut extract text paper.pdf --output paper.txt

# Chunk for LLM processing
papercut extract text paper.pdf --chunk-size 4000 --overlap 200

# Extract specific pages
papercut extract text paper.pdf --pages "1-10,15"
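
The `--chunk-size` and `--overlap` options describe a sliding window over the extracted text. Here is a minimal sketch of that idea, assuming both values count characters. papercut may instead count tokens or split on sentence boundaries. `chunk_text` is a hypothetical name, and `chunk_size=None` mirrors `chunk_size: null` in the configuration file.

```python
from typing import List, Optional


def chunk_text(text: str, chunk_size: Optional[int], overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters (hypothetical helper).

    Consecutive chunks share `overlap` characters; chunk_size=None returns
    the whole text as a single chunk.
    """
    if chunk_size is None:
        return [text] if text else []
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of sending some text to the LLM twice.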

Extract Tables

Extract tables from PDFs as CSV or JSON:

# All tables to stdout as JSON
papercut extract tables paper.pdf

# Save as CSV files
papercut extract tables paper.pdf --output ./tables/ --format csv

# Extract from specific pages
papercut extract tables paper.pdf --pages "5-10" --format json
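
Both `extract text` and `extract tables` accept a `--pages` spec such as "1-10,15". The sketch below shows one way to parse such a spec. It assumes 1-based, inclusive ranges, which is what the examples appear to use. `parse_pages` is hypothetical and not part of papercut's documented interface.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Parse a spec like "1-10,15" into sorted, de-duplicated 1-based pages (hypothetical)."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1-3,,5"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)  # int("") raises ValueError
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive upper bound
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1-10,15"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15]
```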

Extract References

Extract the bibliography as BibTeX or JSON:

# BibTeX to stdout
papercut extract refs paper.pdf

# Save to file
papercut extract refs paper.pdf --output refs.bib

# As JSON
papercut extract refs paper.pdf --format json
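
A BibTeX entry has the shape `@type{key, field = {value}, ...}`. The sketch below renders one parsed reference in that shape. The helper `to_bibtex`, the citation key, and the field names are hypothetical. They do not describe how papercut builds keys or which fields its JSON output contains.

```python
from typing import Dict


def to_bibtex(key: str, fields: Dict[str, str], entry_type: str = "article") -> str:
    """Render one reference as a BibTeX entry (hypothetical helper; no escaping of &, %, etc.)."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


print(to_bibtex("smith2024", {"author": "Smith, Jane", "title": "An Example Paper", "year": "2024"}))
```

Wrapping values in braces rather than quotes lets commas and quotation marks inside titles pass through unchanged.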

Configuration

Papercut stores configuration in ~/.papercut/config.yaml:

output:
  directory: ~/papers

extraction:
  backend: pdfplumber
  text:
    chunk_size: null
    chunk_overlap: 200
  tables:
    format: csv

# LLM settings (v0.2)
llm:
  default_provider: anthropic
  default_model: claude-sonnet-4-20250514

Environment variables take precedence over the config file:

export PAPERCUT_ANTHROPIC_API_KEY=sk-ant-...
export PAPERCUT_OPENAI_API_KEY=sk-...
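
One way to read the override rule is a merge in which any set variable wins over the file. Here is a minimal sketch under that assumption. The mapping of `PAPERCUT_ANTHROPIC_API_KEY` and `PAPERCUT_OPENAI_API_KEY` to `llm.anthropic_api_key` and `llm.openai_api_key` is hypothetical, because the config example does not show where keys are stored.

```python
import copy
import os
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical mapping: where each variable would land in the config tree.
ENV_OVERRIDES: Dict[str, Tuple[str, str]] = {
    "PAPERCUT_ANTHROPIC_API_KEY": ("llm", "anthropic_api_key"),
    "PAPERCUT_OPENAI_API_KEY": ("llm", "openai_api_key"),
}


def apply_env_overrides(
    config: Dict[str, Any], env: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Return a copy of config in which set environment variables win over file values."""
    env = os.environ if env is None else env
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    for var, (section, key) in ENV_OVERRIDES.items():
        value = env.get(var)
        if value:  # unset or empty variables leave the file value in place
            merged.setdefault(section, {})[key] = value
    return merged


file_config = {"llm": {"default_provider": "anthropic"}}
merged = apply_env_overrides(file_config, {"PAPERCUT_ANTHROPIC_API_KEY": "sk-ant-example"})
print(merged["llm"]["anthropic_api_key"])  # sk-ant-example
```

Passing `env` explicitly keeps the function testable without touching the real process environment.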

Development

Run tests:

pytest tests/

Run linting:

ruff check src/
mypy src/

License

MIT License - see LICENSE for details.

Release history

3.1.0 (Jan 12, 2026)
3.0.2 (Jan 12, 2026)
3.0.1 (Jan 12, 2026)
3.0.0 (Jan 12, 2026)
2.0.2 (Jan 9, 2026)
2.0.1 (Jan 9, 2026)
2.0.0 (Jan 9, 2026)
1.2.0 (Jan 9, 2026)
1.1.0 (Jan 8, 2026), this version

Download files

Download the file for your platform.

Source Distribution

papercutter-1.1.0.tar.gz (99.4 kB)

Uploaded Jan 8, 2026 Source

Built Distribution

papercutter-1.1.0-py3-none-any.whl (107.8 kB)

Uploaded Jan 8, 2026 Python 3

File details

Details for the file papercutter-1.1.0.tar.gz.

File metadata

Download URL: papercutter-1.1.0.tar.gz
Upload date: Jan 8, 2026
Size: 99.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for papercutter-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`64fb2ad61390bcfcdfdc30c492ab0499ba390f8ccc29b21af1885ef6dfe08d0c`
MD5	`85e25dd1eb2945cd4b95d7bfa91c76a3`
BLAKE2b-256	`62772eae732d36a4b25e9527854a38da47c63fcf984a145b4a81ef6e8bfb8fed`

File details

Details for the file papercutter-1.1.0-py3-none-any.whl.

File metadata

Download URL: papercutter-1.1.0-py3-none-any.whl
Upload date: Jan 8, 2026
Size: 107.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for papercutter-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`237e49bab726ed5e5ec7fd8a7e79565d327902a60a32f9b44a5b2940279be69c`
MD5	`6c0be7f44630cf6dc0b33d0f571c7cc3`
BLAKE2b-256	`47eee42927a361c4eb6117cb03a68df3d53528869d22b7235c915668275caac8`
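
To check a downloaded file against the SHA256 digests listed above, you can use a short standard-library sketch. The helper names and the script usage are illustrative. The digests are copied from the hash tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for papercutter 1.1.0 (copied from the hash tables).
EXPECTED_SHA256 = {
    "papercutter-1.1.0.tar.gz": "64fb2ad61390bcfcdfdc30c492ab0499ba390f8ccc29b21af1885ef6dfe08d0c",
    "papercutter-1.1.0-py3-none-any.whl": "237e49bab726ed5e5ec7fd8a7e79565d327902a60a32f9b44a5b2940279be69c",
}


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify_hashes.py papercutter-1.1.0.tar.gz ...
    for name in sys.argv[1:]:
        print(name, "OK" if verify(Path(name)) else "MISMATCH")
```

As an alternative, pip's hash-checking mode can enforce the digest at install time. Add a requirements line such as `papercutter==1.1.0 --hash=sha256:<digest>` and install with `pip install --require-hashes -r requirements.txt`. In that mode every dependency must also be pinned with a hash.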
