Extract and map content from academic papers for LLM processing

Papercutter

Extract knowledge from academic papers. A CLI-first Python package for researchers.

Installation

pip install papercutter

With LLM features (summarization, reports, study aids):

pip install "papercutter[llm]"

With fast PDF processing (PyMuPDF):

pip install "papercutter[fast]"

All optional dependencies:

pip install "papercutter[all]"
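The `fast` extra pulls in PyMuPDF, and the configuration section names `pdfplumber` as an extraction backend. To check which of the two libraries your environment can actually import, a small standard-library sketch like the following works. It only tests importability. How Papercutter itself chooses between backends is not documented here, and the backend labels returned are assumptions of this sketch.

```python
from importlib.util import find_spec

# Import names come from the libraries themselves: PyMuPDF is imported as
# "fitz" (recent releases also provide "pymupdf"); pdfplumber as "pdfplumber".
# The dictionary keys are labels chosen for this sketch, not Papercutter values.
BACKEND_MODULES = {
    "pymupdf": ("pymupdf", "fitz"),
    "pdfplumber": ("pdfplumber",),
}


def available_pdf_backends() -> list[str]:
    """Return the labels of PDF libraries importable in this environment."""
    found = []
    for label, module_names in BACKEND_MODULES.items():
        # find_spec returns None for a top-level module that is not installed
        if any(find_spec(name) is not None for name in module_names):
            found.append(label)
    return found


if __name__ == "__main__":
    print(available_pdf_backends() or "no supported PDF library installed")
```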

Development Installation

git clone https://github.com/rawatpranjal/papercutter.git
cd papercutter
pip install -e ".[dev]"

Quick Start

Fetch Papers

Download papers from arXiv, DOI, SSRN, NBER, or a direct URL:

# From arXiv
papercutter fetch arxiv 2301.00001

# From DOI
papercutter fetch doi 10.1257/aer.20180779

# From SSRN
papercutter fetch ssrn 4123456

# From NBER
papercutter fetch nber w29000

# From direct URL
papercutter fetch url "https://example.com/paper.pdf" --name smith_2024
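Each `fetch` subcommand takes the identifier format native to its source. The sketch below validates those formats and maps each one to the site's public landing page, which can help when checking an identifier before fetching it. The URL patterns are the sites' own public schemes. They are not necessarily the endpoints Papercutter downloads from, and `landing_page` is a hypothetical helper, not part of Papercutter.

```python
import re
from urllib.parse import quote


def landing_page(source: str, identifier: str) -> str:
    """Validate an identifier and return its public landing-page URL."""
    if source == "arxiv":
        # New-style IDs only (YYMM.NNNN or YYMM.NNNNN, optional vN);
        # old-style IDs such as hep-th/9901001 are rejected by this sketch
        if not re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", identifier):
            raise ValueError(f"not a new-style arXiv ID: {identifier!r}")
        return f"https://arxiv.org/abs/{identifier}"
    if source == "doi":
        # DOIs start with "10.", a registrant code, "/", then a suffix
        if not re.fullmatch(r"10\.\d{4,9}/\S+", identifier):
            raise ValueError(f"not a DOI: {identifier!r}")
        return "https://doi.org/" + quote(identifier, safe="/")
    if source == "ssrn":
        # SSRN abstract IDs are plain integers
        if not identifier.isdigit():
            raise ValueError(f"not an SSRN abstract ID: {identifier!r}")
        return f"https://papers.ssrn.com/sol3/papers.cfm?abstract_id={identifier}"
    if source == "nber":
        # NBER working papers are numbered "w" followed by digits
        if not re.fullmatch(r"w\d+", identifier):
            raise ValueError(f"not an NBER working paper number: {identifier!r}")
        return f"https://www.nber.org/papers/{identifier}"
    raise ValueError(f"unsupported source: {source!r}")


if __name__ == "__main__":
    print(landing_page("doi", "10.1257/aer.20180779"))
```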

Extract Text

Extract clean text from PDFs:

# Full text to stdout
papercutter extract text paper.pdf

# Save to file
papercutter extract text paper.pdf --output paper.txt

# Chunk for LLM processing
papercutter extract text paper.pdf --chunk-size 4000 --overlap 200

# Extract specific pages
papercutter extract text paper.pdf --pages "1-10,15"
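`--chunk-size` and `--overlap` describe a sliding window in which consecutive chunks share some text, so that text near a chunk boundary keeps some of its context in both neighbouring chunks. This README does not say whether sizes count characters or tokens, or whether chunks snap to sentence or paragraph boundaries. The minimal sketch below assumes plain characters and fixed windows, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end so the tail is not repeated
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x" * 10_000
    print([len(c) for c in chunk_text(sample)])  # prints [4000, 4000, 2400]
```

With a window of 4000 and an overlap of 200, each window starts 3800 characters after the previous one. A 10,000-character text therefore yields windows starting at 0, 3800, and 7600.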

Extract Tables

Extract tables from PDFs as CSV or JSON:

# All tables to stdout as JSON
papercutter extract tables paper.pdf

# Save as CSV files
papercutter extract tables paper.pdf --output ./tables/ --format csv

# Extract from specific pages
papercutter extract tables paper.pdf --pages "5-10" --format json
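Both `extract text` and `extract tables` accept a `--pages` spec such as "1-10,15" or "5-10". One plausible reading, assumed here rather than documented, is comma-separated single pages and inclusive ranges with 1-based page numbers. `parse_pages` is a hypothetical helper that expands a spec under those assumptions. It can be handy for validating a spec in your own scripts before passing it on.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec like "1-10,15" into sorted, de-duplicated page numbers.

    Assumes 1-based pages, inclusive ranges, and comma separators.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas, e.g. "1-3,"
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(first, last + 1))  # +1 makes the range inclusive
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1-10,15"))  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15]
```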

Extract References

Extract bibliography as BibTeX:

# BibTeX to stdout
papercutter extract refs paper.pdf

# Save to file
papercutter extract refs paper.pdf --output refs.bib

# As JSON
papercutter extract refs paper.pdf --format json
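The exact layout of the generated entries is not documented here, but any well-formed .bib file starts each entry with `@type{key,`. The standard-library sketch below uses that convention to list citation keys and flag duplicates, for example after concatenating refs.bib files extracted from several papers. It is a regex heuristic, not a full BibTeX parser, and the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Start of an entry such as "@article{smith2020,"; groups: entry type, key
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Special blocks that are not citable entries
NON_ENTRY_TYPES = {"comment", "preamble", "string"}


def bibtex_keys(bib: str) -> list[tuple[str, str]]:
    """Return (entry_type, citation_key) pairs in file order."""
    pairs = []
    for match in ENTRY_START.finditer(bib):
        entry_type = match.group(1).lower()
        if entry_type not in NON_ENTRY_TYPES:
            pairs.append((entry_type, match.group(2)))
    return pairs


def duplicate_keys(bib: str) -> list[str]:
    """Return citation keys that occur more than once, sorted."""
    counts = Counter(key for _, key in bibtex_keys(bib))
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    text = Path("refs.bib").read_text(encoding="utf-8")
    print(len(bibtex_keys(text)), "entries; duplicates:", duplicate_keys(text))
```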

Configuration

Papercutter stores configuration in ~/.papercutter/config.yaml:

output:
  directory: ~/papers

extraction:
  backend: pdfplumber
  text:
    chunk_size: null
    chunk_overlap: 200
  tables:
    format: csv

# LLM settings (v0.2)
llm:
  default_provider: anthropic
  default_model: claude-sonnet-4-20250514

Environment variables take precedence over the config file, for example for API keys:

export PAPERCUTTER_ANTHROPIC_API_KEY=sk-ant-...
export PAPERCUTTER_OPENAI_API_KEY=sk-...
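The precedence rule, where an environment variable wins over the config file, can be sketched as follows. Variable names follow the `PAPERCUTTER_<PROVIDER>_API_KEY` pattern of the two exports above. The `llm.<provider>.api_key` fallback field is hypothetical, because this README does not say whether keys can be stored in config.yaml at all. Treating an empty variable as unset is also a choice made by this sketch.

```python
from __future__ import annotations

import os
from typing import Any, Mapping


def resolve_api_key(
    provider: str,
    file_config: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> str | None:
    """Return the API key for `provider`, preferring the environment."""
    env_name = f"PAPERCUTTER_{provider.upper()}_API_KEY"
    value = environ.get(env_name)
    if value:  # set and non-empty: the environment wins
        return value
    # Hypothetical fallback field: llm.<provider>.api_key in config.yaml
    return file_config.get("llm", {}).get(provider, {}).get("api_key")


if __name__ == "__main__":
    # Report presence only; never print the key itself
    print("key found" if resolve_api_key("anthropic", {}) else "no key set")
```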

Migration from Papercut

Papercutter is a direct rename of the original Papercut project. To upgrade an existing installation:

1. Reinstall the package: pip uninstall papercut && pip install papercutter
2. Update scripts and shell aliases to call papercutter instead of papercut.
3. If you have custom settings, rename your config directory: mv ~/.papercut ~/.papercutter
4. (Optional) To keep cached artifacts, rename the cache directory: mv ~/.cache/papercut ~/.cache/papercutter
5. Update any PAPERCUT_* environment variables to use the new PAPERCUTTER_* prefix.
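The directory and environment-variable steps can be scripted with the standard library, roughly as follows. This is a hypothetical helper, not a command shipped with Papercutter. It defaults to a dry run and never overwrites an existing Papercutter directory. For environment variables it only reports which names to rename, because a Python process cannot change variables defined in your shell profile.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping

# Legacy -> new locations relative to the home directory (from the steps above)
RENAMES = [
    (Path(".papercut"), Path(".papercutter")),
    (Path(".cache") / "papercut", Path(".cache") / "papercutter"),
]


def migrate_dirs(home: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Rename legacy Papercut directories; return the (old, new) pairs affected."""
    moved = []
    for old_rel, new_rel in RENAMES:
        old, new = home / old_rel, home / new_rel
        # Skip when there is nothing to move or the target already exists,
        # so existing Papercutter settings are never overwritten
        if old.is_dir() and not new.exists():
            if not dry_run:
                old.rename(new)
            moved.append((old, new))
    return moved


def legacy_env_vars(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Map each PAPERCUT_* variable name to its PAPERCUTTER_* replacement."""
    prefix = "PAPERCUT_"
    # "PAPERCUTTER_..." does not match: its 9th character is "T", not "_"
    return {
        name: "PAPERCUTTER_" + name[len(prefix):]
        for name in environ
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    for old, new in migrate_dirs(Path.home()):
        print(f"would rename {old} -> {new}")
    for old, new in legacy_env_vars().items():
        print(f"rename variable {old} -> {new}")
```

Call `migrate_dirs(Path.home(), dry_run=False)` once the dry-run output looks right.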

Development

Run tests:

pytest tests/

Run linting:

ruff check src/
mypy src/

License

MIT License - see LICENSE for details.

Release history

3.1.0 (Jan 12, 2026)
3.0.2 (Jan 12, 2026)
3.0.1 (Jan 12, 2026)
3.0.0 (Jan 12, 2026)
2.0.2 (Jan 9, 2026)
2.0.1 (Jan 9, 2026)
2.0.0 (Jan 9, 2026)
1.2.0 (Jan 9, 2026), this version
1.1.0 (Jan 8, 2026)

Download files

Source distribution: papercutter-1.2.0.tar.gz
Size: 180.4 kB
Upload date: Jan 9, 2026
Uploaded via: twine/6.2.0 CPython/3.11.6 (not Trusted Publishing)
SHA256: `43b083e239746ec97836c5bc9f8d6c732a198ea988884f72b111e76fd3e61ff9`
MD5: `9bbf6bac4cc126f0476c69b68ff27845`
BLAKE2b-256: `18e73d2fe8d6f8f5c8c62341d5a5ac5083276ab41337a0cccf0759e81dbdf3bb`

Built distribution: papercutter-1.2.0-py3-none-any.whl
Size: 216.2 kB
Tags: Python 3
Upload date: Jan 9, 2026
Uploaded via: twine/6.2.0 CPython/3.11.6 (not Trusted Publishing)
SHA256: `bf4c575e0c1445bfa7dcce23de311c9acd06e75ddc3c2eb8ec44ed39a2a6eeff`
MD5: `62dc91750e1d7d7e60879668fd6b624f`
BLAKE2b-256: `f6c433335165df5beccdf6b6ddc0aa753106aacacdf9f125112dbeefa1b7dc7f`
