Reference Renamer

Python 3.8+ License: MIT Status: Active

CLI tool for standardizing academic article filenames using metadata extraction and verification.

Features

  • Automatic Renaming: Renames academic articles to a standardized format (Author_Year_FiveWordTitle.ext)
  • Multiple Format Support: Handles PDF, TXT, and other document formats
  • Metadata Enrichment: Uses arXiv, Semantic Scholar, and LLM processing to verify and enrich document metadata
  • Citation Management: Maintains a BibTeX database of processed articles
  • Accessibility-Focused: Clear output formats and screen reader support
  • Detailed Logging: Tracks all file operations
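
To make the `Author_Year_FiveWordTitle.ext` pattern concrete, here is a minimal sketch of how such a name could be assembled from metadata that has already been extracted. The function name `standardized_name`, the accent folding, and the CamelCase joining of title words are assumptions made for illustration; the tool's actual rules may differ.

```python
import re
import unicodedata


def standardized_name(author: str, year: int, title: str, ext: str,
                      max_title_words: int = 5) -> str:
    """Build Author_Year_FiveWordTitle.ext from already-extracted metadata."""

    def clean(word: str) -> str:
        # Fold accents to ASCII, then drop anything that is not a letter or digit.
        folded = unicodedata.normalize("NFKD", word)
        ascii_word = folded.encode("ascii", "ignore").decode("ascii")
        return re.sub(r"[^A-Za-z0-9]", "", ascii_word)

    # Keep only the first few non-empty words of the title.
    words = [clean(w) for w in title.split()]
    words = [w for w in words if w][:max_title_words]

    # Capitalize each word's first letter and join without separators.
    title_part = "".join(w[:1].upper() + w[1:] for w in words)
    return f"{clean(author)}_{year}_{title_part}{ext}"


print(standardized_name("Vaswani", 2017, "Attention Is All You Need", ".pdf"))
# Vaswani_2017_AttentionIsAllYouNeed.pdf
```

Titles longer than five words are truncated, so two papers by the same author in the same year can collide; a real implementation would need a rule for that case.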

Installation

System Dependencies

Required for PDF processing with OCR:

# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils

# macOS
brew install tesseract poppler

# Windows
# Install Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
# Install Poppler from: https://github.com/oschwartz10612/poppler-windows/releases

Note: OCR is only used as a fallback when PDFs contain no extractable text. If you only process text-based PDFs, these dependencies are optional.
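
If you are unsure whether the OCR dependencies are installed, a quick check along these lines may help. It is only a sketch: it looks for the `tesseract` executable and for `pdftoppm` (one of the Poppler utilities) on your `PATH`, and it assumes those are the executables the OCR fallback needs.

```python
import shutil


def ocr_tools_available() -> dict:
    """Report which OCR helper executables are on PATH."""
    # `tesseract` comes from Tesseract; `pdftoppm` ships with Poppler.
    names = ("tesseract", "pdftoppm")
    return {name: shutil.which(name) is not None for name in names}


status = ocr_tools_available()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```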

Python Installation

  1. Clone the repository:
git clone https://github.com/lukeslp/reference-renamer.git
cd reference-renamer
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
  3. Install dependencies:
pip install -r requirements.txt

Ollama (Optional)

Ollama provides local LLM processing for enhanced metadata extraction. It is optional: without it, the tool relies on the arXiv and Semantic Scholar APIs.

If you want to use Ollama:

# Install Ollama from https://ollama.ai
ollama run drummer-knowledge

Without Ollama, the tool will:

  • Skip LLM-based extraction
  • Fall back to arXiv and Semantic Scholar for metadata
  • Still function correctly for most academic papers
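
The fallback behaviour described above can be pictured as a chain of metadata sources tried in turn. The sketch below is illustrative only: `first_metadata` and the three lookup functions are hypothetical stand-ins that call no real service, and the order in which the tool consults its sources is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional

Metadata = Dict[str, str]
Extractor = Callable[[str], Optional[Metadata]]


def first_metadata(text: str, extractors: Iterable[Extractor]) -> Optional[Metadata]:
    """Return the first non-empty result, skipping sources that fail."""
    for extract in extractors:
        try:
            result = extract(text)
        except Exception:
            # An unreachable source (e.g. no local Ollama) means "try the next one".
            continue
        if result:
            return result
    return None


# Hypothetical stand-ins for real lookups.
def llm_lookup(text: str) -> Optional[Metadata]:
    raise ConnectionError("Ollama is not running")


def arxiv_lookup(text: str) -> Optional[Metadata]:
    return None  # simulate "no match found"


def semantic_scholar_lookup(text: str) -> Optional[Metadata]:
    return {"author": "Doe", "year": "2020", "title": "An Example Paper"}


found = first_metadata("paper text", [llm_lookup, arxiv_lookup, semantic_scholar_lookup])
print(found)
```

Here the LLM step raises, arXiv finds nothing, and the Semantic Scholar stand-in supplies the result, so the missing Ollama server does not stop processing.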

Usage

Basic Usage

# Process a single directory
reference-renamer /path/to/papers

# Process recursively with dry run
reference-renamer --recursive --dry-run /path/to/papers

# Generate citations only
reference-renamer --citations-only /path/to/papers
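
As a rough picture of what `--recursive` changes, the following sketch collects candidate files from a directory with and without descending into subdirectories. `find_documents` is a hypothetical helper, not part of the tool's API, and the case-insensitive extension matching is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_documents(root: Path, extensions: Iterable[str] = (".pdf", ".txt"),
                   recursive: bool = False) -> List[Path]:
    """List files under root whose suffix matches one of the extensions."""
    wanted = {ext.lower() for ext in extensions}
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)


# Build a throwaway directory tree so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pdf").touch()
    (root / "notes.md").touch()
    (root / "sub").mkdir()
    (root / "sub" / "b.TXT").touch()

    top_level = [p.name for p in find_documents(root)]
    everything = [p.name for p in find_documents(root, recursive=True)]

print(top_level)
print(everything)
```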

Configuration

Create a config.yaml file in your working directory:

processing:
  supported_extensions:
    - .pdf
    - .txt
  recursive: true
  max_title_words: 5

apis:
  semantic_scholar:
    enabled: true
    timeout: 30
  arxiv:
    enabled: true
    max_results: 3

logging:
  level: INFO
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
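
For readers unfamiliar with the `logging` keys, this sketch shows how a level and format string like the ones above map onto Python's standard `logging` module. The logger name `reference_renamer` and the way the tool wires up its handlers are assumptions.

```python
import logging

# Values copied from the `logging` section of the example config.
log_config = {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
}

formatter = logging.Formatter(log_config["format"])
handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("reference_renamer")
logger.setLevel(getattr(logging, log_config["level"]))
logger.addHandler(handler)

logger.info("Renamed 3 files")       # emitted
logger.debug("Raw metadata dump")    # suppressed at INFO level
```

Setting `level: DEBUG` in the config would let the second message through as well.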

Output

The tool generates:

  1. Renamed files in the specified format
  2. A CSV log of all operations (rename_log.csv)
  3. A BibTeX database of citations (citations.bib)
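
To give a feel for the two generated files, here is a small sketch that writes one log row and formats one BibTeX entry. The CSV column names (`original`, `renamed`, `status`), the citation key, and the entry fields are assumptions chosen for illustration; the actual layout of `rename_log.csv` and `citations.bib` may differ.

```python
import csv
import io


def bibtex_entry(key: str, author: str, year: int, title: str) -> str:
    """Format one minimal @article entry."""
    return (
        f"@article{{{key},\n"
        f"  author = {{{author}}},\n"
        f"  title = {{{title}}},\n"
        f"  year = {{{year}}}\n"
        f"}}\n"
    )


def write_log(rows, stream) -> None:
    """Write rename operations as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=["original", "renamed", "status"])
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for rename_log.csv.
buffer = io.StringIO()
write_log(
    [{"original": "paper1.pdf",
      "renamed": "Doe_2020_AnExamplePaper.pdf",
      "status": "renamed"}],
    buffer,
)
print(buffer.getvalue())

entry = bibtex_entry("doe2020example", "Doe, Jane", 2020, "An Example Paper")
print(entry)
```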

Accessibility Features

  • Screen reader-friendly output formats
  • High contrast CLI interface
  • Clear error messages and status updates
  • Configurable output formats

Development

Setting up the Development Environment

  1. Install development dependencies:
pip install -r requirements-dev.txt
  2. Install pre-commit hooks:
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=reference_renamer

# Run specific test file
pytest tests/test_file_processor.py

Code Style

This project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting
  • mypy for type checking

Run the full suite:

# Format code
black .
isort .

# Check types
mypy .

# Lint
flake8

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting
  5. Submit a pull request

Please ensure your PR:

  • Includes tests for new features
  • Updates documentation as needed
  • Follows the project's code style
  • Includes a clear description of changes

License

This project is licensed under the MIT License. The full license text is included in the source distribution as the LICENSE file.

Support

For support, please:

  1. Check the documentation in this README
  2. Search existing issues
  3. Create a new issue if needed

What's New

This refactor archives old helpers, adds example snippets, and includes new funding links.
