Academic paper PDF renaming tool

Chou (瞅) - Academic Paper PDF Renamer

A Python tool to automatically rename academic PDF papers to citation-style filenames by extracting title, author, and year information from the PDF content.

Features

Extracts title and authors from the first page of the PDF using font size analysis
OCR support for scanned PDFs (5 OCR backends available)
Extracts publication year using 10 different strategies (supports English and Chinese)
Chinese name handling - automatically uses full names for Chinese authors
Chinese thesis/dissertation support - detects labeled fields like "论文题目" (thesis title) and "作者姓名" (author name)
Multiple author format options
Dry-run mode for safe preview
Handles special characters and Unicode in author names
Logs all operations and exports results to CSV
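
To illustrate the font-size idea behind the first feature, here is a minimal sketch. It is not Chou's actual implementation. It assumes the title is the text set in the largest font on the first page and works on the nested dictionary that PyMuPDF's `page.get_text("dict")` returns. The function name `largest_font_text`, the 0.5 pt tolerance, and the sample data are hypothetical.

```python
from typing import Any


def largest_font_text(page_dict: dict[str, Any]) -> str:
    """Return the text set in the largest font size on a page.

    `page_dict` is expected to have the shape of PyMuPDF's
    page.get_text("dict") output: blocks -> lines -> spans,
    where each span carries "size" and "text".
    """
    spans = [
        span
        for block in page_dict.get("blocks", [])
        for line in block.get("lines", [])  # image blocks have no "lines"
        for span in line.get("spans", [])
        if span.get("text", "").strip()
    ]
    if not spans:
        return ""  # likely a scanned page; an OCR fallback would go here
    max_size = max(span["size"] for span in spans)
    # Small tolerance (assumed value) so a title split over several spans stays whole.
    parts = [s["text"].strip() for s in spans if max_size - s["size"] < 0.5]
    return " ".join(parts)


# Hand-built stand-in for page.get_text("dict") output (hypothetical data).
sample_page = {
    "blocks": [
        {"lines": [{"spans": [{"size": 18.0, "text": "Deep Learning for"}]},
                   {"spans": [{"size": 18.0, "text": "Paper Renaming"}]}]},
        {"lines": [{"spans": [{"size": 11.0, "text": "Weihao Wang, Rufeng Zhang"}]}]},
        {"type": 1},  # image block without text lines
    ]
}
print(largest_font_text(sample_page))  # Deep Learning for Paper Renaming
```

With PyMuPDF installed, a real page dictionary could be obtained with `fitz.open("paper.pdf")[0].get_text("dict")` and passed to the same function.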

Requirements

Python >= 3.10
PyMuPDF (required)
OCR backend (optional, for scanned PDFs)

Installation

From PyPI

pip install chou

From Source

git clone https://github.com/cycleuser/Chou.git
cd Chou
pip install -e .

With OCR Support

Choose one or more OCR backends based on your needs:

# Install with all OCR backends
pip install -e ".[ocr-surya,ocr-paddle,ocr-rapid,ocr-easy,ocr-tesseract]"

# Or install specific backends:
pip install surya-ocr          # Surya - Best accuracy, transformer-based (recommended)
pip install paddleocr paddlepaddle  # PaddleOCR - Good for Chinese
pip install rapidocr-onnxruntime    # RapidOCR - Lightweight, fast
pip install easyocr                 # EasyOCR - Easy to use
pip install pytesseract Pillow      # Tesseract - Classic OCR

Quick Start

After installation, the chou command is available:

# Preview changes (dry-run mode, default)
chou --dir /path/to/papers --dry-run

# Actually rename files
chou --dir /path/to/papers --execute

# Show version
chou --version
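
The difference between `--dry-run` and `--execute` can be pictured with a small sketch. This is an assumption about how such a switch might work, not the tool's real code: the function `apply_renames`, its status strings, and the rule of skipping existing targets are all hypothetical.

```python
import tempfile
from pathlib import Path


def apply_renames(plan: dict[Path, str], execute: bool = False) -> list[tuple[str, str, str]]:
    """Preview (default) or apply a plan mapping each file to its new name."""
    results = []
    for source, new_name in plan.items():
        target = source.with_name(new_name)
        if target.exists():
            status = "skipped (target exists)"  # never overwrite a file
        elif execute:
            source.rename(target)
            status = "renamed"
        else:
            status = "dry-run"  # report only, nothing changes on disk
        results.append((source.name, new_name, status))
    return results


# Demo in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "1234.5678.pdf"
    pdf.write_bytes(b"%PDF-1.4")
    plan = {pdf: "Wang et al. (2023) - Title.pdf"}

    preview = apply_renames(plan)                # dry run
    still_there = pdf.exists()
    applied = apply_renames(plan, execute=True)  # actual rename
    renamed_exists = (Path(tmp) / "Wang et al. (2023) - Title.pdf").exists()
```

Making the preview the default means a mistyped command cannot rename anything, which matches the defaults listed in the options table.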

Usage

chou [options]

Options

Option	Short	Description
`--dir DIR`	`-d`	Directory containing PDF files (default: current)
`--dry-run`	`-n`	Preview without renaming (default: True)
`--execute`	`-x`	Actually rename files
`--format FMT`	`-f`	Author name format (see below)
`--num-authors N`	`-N`	Number of authors for n_* formats (default: 3)
`--recursive`	`-r`	Process subdirectories recursively (default: True)
`--no-recursive`		Only process the specified directory
`--ocr-engine`		Specify OCR engine (default: auto-detect)
`--no-ocr`		Disable OCR fallback
`--output FILE`	`-o`	Export results to CSV file
`--log-file FILE`	`-l`	Log file path
`--verbose`	`-v`	Verbose output

Author Format Options (`-f`)

Format	Example Output
`first_surname`	`Wang et al. (2023) - Title.pdf`
`first_full`	`Weihao Wang et al. (2023) - Title.pdf`
`all_surnames`	`Wang, Zhang, You (2023) - Title.pdf`
`all_full`	`Weihao Wang, Rufeng Zhang, Mingyu You (2023) - Title.pdf`
`n_surnames`	`Wang, Zhang et al. (2023) - Title.pdf`
`n_full`	`Weihao Wang, Rufeng Zhang et al. (2023) - Title.pdf`

Note: For Chinese authors, full names are always used (e.g., 张三 instead of just 张) since single-character surnames are not meaningful.
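
The format table and the note on Chinese names can be reproduced with a short sketch. It is an illustration under stated assumptions, not the code in `filename_gen.py`: the helper names `short_name` and `build_filename`, the "given name then surname" ordering for non-Chinese names, and the rule that "et al." appears only when some authors are omitted are all hypothetical.

```python
import re

_CJK = re.compile(r"[\u4e00-\u9fff]")  # basic CJK Unified Ideographs range


def short_name(full_name: str, use_full: bool) -> str:
    """Return the surname, or the full name if requested or if the name is Chinese."""
    if use_full or _CJK.search(full_name):
        return full_name
    return full_name.split()[-1]  # assumes "Given Surname" order


def build_filename(authors: list[str], year: int, title: str,
                   fmt: str = "first_surname", n: int = 3) -> str:
    """Build a citation-style filename for one of the six format names."""
    use_full = fmt.endswith("_full")
    if fmt.startswith("first_"):
        shown = authors[:1]
    elif fmt.startswith("all_"):
        shown = authors
    elif fmt.startswith("n_"):
        shown = authors[:n]
    else:
        raise ValueError(f"unknown format: {fmt}")
    names = ", ".join(short_name(a, use_full) for a in shown)
    # Assumption: "et al." is added only when some authors are left out.
    suffix = " et al." if len(shown) < len(authors) else ""
    return f"{names}{suffix} ({year}) - {title}.pdf"


authors = ["Weihao Wang", "Rufeng Zhang", "Mingyu You"]
print(build_filename(authors, 2023, "Title", "first_surname"))
# Wang et al. (2023) - Title.pdf
print(build_filename(authors, 2023, "Title", "n_surnames", n=2))
# Wang, Zhang et al. (2023) - Title.pdf
print(build_filename(["张三", "李四"], 2023, "Title", "first_surname"))
# 张三 et al. (2023) - Title.pdf
```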

Examples

# Use first author's full name
chou -d /path/to/papers -f first_full --dry-run

# Use first 2 authors' surnames
chou -d /path/to/papers -f n_surnames -N 2 --dry-run

# Process and export results
chou -d /path/to/papers --execute -o results.csv

# Use specific OCR engine
chou -d /path/to/papers --ocr-engine rapidocr --dry-run

# Disable OCR
chou -d /path/to/papers --no-ocr --dry-run

OCR Support

For scanned PDFs without embedded text, the tool automatically uses OCR. Available backends (in priority order):

Backend	Install Command	Notes
Surya	`pip install surya-ocr`	Best accuracy, transformer-based
PaddleOCR	`pip install paddleocr paddlepaddle`	Good for Chinese
RapidOCR	`pip install rapidocr-onnxruntime`	Lightweight, fast
EasyOCR	`pip install easyocr`	Easy to use
Tesseract	`pip install pytesseract Pillow`	Classic OCR

The tool automatically selects the best available backend. To disable a specific backend:

# Disable Surya OCR (e.g., on low-memory systems)
export CHOU_DISABLE_SURYA=1
chou --dry-run

Year Extraction Strategies

The tool tries 10 strategies to extract the publication year. Each is listed below with its confidence score in parentheses, from highest to lowest:

Conference + year (100): CVPR 2023, NeurIPS'22, AAAI-23
Ordinal edition (90): Thirty-Seventh AAAI Conference
Copyright notice (85): Copyright 2023, (c) 2023
Publication date (80): Published: 2023, Accepted: Jan 2023
Chinese year (78): 2023年, 二〇二三年
arXiv ID (75): arXiv:2301.12345
DOI with year (75): 10.1109/CVPR.2023.xxx
Journal volume (70): Vol. 35, 2023
Date pattern (60-65): March 2023, 2023/03
Frequent year (20-50): Most common year in text

Supported Conferences

AAAI, IJCAI, NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP, NAACL, SIGIR, KDD, WWW, CHI, USENIX, and 50+ more.

Project Structure

Chou/
├── chou/                  # Main package
│   ├── core/             # Core functionality
│   │   ├── processor.py       # PDF processing
│   │   ├── ocr_extractor.py   # OCR backends
│   │   ├── author_parser.py   # Author name parsing
│   │   ├── year_parser.py     # Year extraction
│   │   └── filename_gen.py    # Filename generation
│   ├── cli/              # Command-line interface
│   └── gui/              # GUI (optional)
├── tests/                # pytest tests
├── requirements.txt      # Dependencies
├── pyproject.toml        # Package configuration
├── README.md             # This file
└── README_CN.md          # Chinese documentation

GUI (Optional)

A graphical user interface is available:

pip install "chou[gui]"
chou-gui

Development

# Install development dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run with verbose output
pytest -v

License

MIT License


Release history

Version	Release date
0.1.6	Apr 2, 2026
0.1.5	Apr 2, 2026
0.1.4	Mar 10, 2026
0.1.3	Mar 9, 2026
0.1.2	Mar 9, 2026
0.1.1	Mar 7, 2026
0.1.0 (this version)	Feb 28, 2026

Download files


Source Distribution

chou-0.1.0.tar.gz (42.7 kB)

Uploaded Feb 28, 2026 Source

Built Distribution


chou-0.1.0-py3-none-any.whl (39.1 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file chou-0.1.0.tar.gz.

File metadata

Download URL: chou-0.1.0.tar.gz
Upload date: Feb 28, 2026
Size: 42.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for chou-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b18f23350c15b9b8f21a8d4b87459a750bff3fe46fa372e48eb4ae67d3d503ab`
MD5	`d40f0b59f69f047c95259e24e0dc6ca4`
BLAKE2b-256	`0ddb47fb3136316dc13481636dcc12f9c0551c9a09654ea97b151cfa242a8721`


File details

Details for the file chou-0.1.0-py3-none-any.whl.

File metadata

Download URL: chou-0.1.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 39.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for chou-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d97c675de7e114a1dde67f4aa5d89dbaddab1c9f0c668fd938ab4902a799a0b5`
MD5	`139fd220e9eef1b6c7ae196dd2f8c6c6`
BLAKE2b-256	`ee2b58f5b7437f16fb7df0933e6cd6085c175aa5ed5350f544448f0e3364bc77`

