A tool for standardizing academic article filenames
Reference Renamer
CLI tool for standardizing academic article filenames using metadata extraction and verification.
Features
- Automatic Renaming: Renames academic articles to a standardized format (Author_Year_FiveWordTitle.ext)
- Multiple Format Support: Handles PDF, TXT, and other document formats
- Metadata Enrichment: Uses arXiv, Semantic Scholar, and LLM processing to verify and enrich document metadata
- Citation Management: Maintains a BibTeX database of processed articles
- Accessibility Focused: Clear output formats and screen reader support
- Detailed Logging: Tracks all file operations
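To make the naming format concrete, here is a minimal sketch of how an Author_Year_FiveWordTitle.ext name could be assembled. The function name `standardized_name`, the accent stripping, and the CamelCase joining of title words are assumptions made for illustration; the tool's actual rules may differ.

```python
import re
import unicodedata
from typing import List


def ascii_words(text: str) -> List[str]:
    """Strip accents, then return the runs of ASCII letters and digits."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[A-Za-z0-9]+", ascii_text)


def standardized_name(author: str, year: int, title: str, ext: str,
                      max_title_words: int = 5) -> str:
    """Build an Author_Year_FiveWordTitle.ext style filename (hypothetical helper)."""
    author_part = "".join(ascii_words(author)) or "Unknown"
    # Keep only the first few title words and join them in CamelCase.
    words = ascii_words(title)[:max_title_words]
    title_part = "".join(word.capitalize() for word in words)
    return f"{author_part}_{year}_{title_part}.{ext.lstrip('.')}"


print(standardized_name("Vaswani", 2017, "Attention Is All You Need", ".pdf"))
```

Limiting the title to a fixed number of words keeps filenames short while still leaving them recognizable in a directory listing.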
Installation
System Dependencies
Required for PDF processing with OCR:
# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils
# macOS
brew install tesseract poppler
# Windows
# Install Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
# Install Poppler from: https://github.com/oschwartz10612/poppler-windows/releases
Note: OCR is only used as a fallback when PDFs contain no extractable text. If you only process text-based PDFs, these dependencies are optional.
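Before relying on the OCR fallback, it can help to confirm that the system tools are on the PATH. The check below is a small sketch using only the standard library. It assumes the relevant executables are `tesseract` and `pdftoppm` (a converter shipped with Poppler); which Poppler binary the tool actually calls is not stated here.

```python
import shutil
from typing import Dict


def ocr_tools_available() -> Dict[str, bool]:
    """Report whether each assumed OCR dependency can be found on the PATH."""
    # tesseract is the OCR engine; pdftoppm (from Poppler) renders PDF pages to images.
    tools = ("tesseract", "pdftoppm")
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in ocr_tools_available().items():
    print(f"{tool}: {'found' if found else 'missing'}")
```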
Python Installation
- Clone the repository:
git clone https://github.com/lukeslp/reference-renamer.git
cd reference-renamer
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
- Install dependencies:
pip install -r requirements.txt
Ollama (Optional)
Ollama provides local LLM processing for enhanced metadata extraction. It is optional: without it, the tool relies on the arXiv and Semantic Scholar APIs.
If you want to use Ollama:
# Install Ollama from https://ollama.ai
ollama run drummer-knowledge
Without Ollama, the tool will:
- Skip LLM-based extraction
- Fall back to arXiv and Semantic Scholar for metadata
- Still function correctly for most academic papers
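The fallback behaviour described above can be pictured as trying each metadata source in turn and keeping the first usable answer. This is only a sketch of that pattern, not the tool's code: `first_metadata` and the three lookup functions are hypothetical stand-ins, and the order in which the real tool queries its sources is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional

Metadata = Dict[str, str]
Source = Callable[[str], Optional[Metadata]]


def first_metadata(text: str, sources: Iterable[Source]) -> Optional[Metadata]:
    """Return the first non-empty result; a source that fails is skipped."""
    for source in sources:
        try:
            result = source(text)
        except Exception:
            # For example, the local LLM is not running or a request timed out.
            continue
        if result:
            return result
    return None


# Hypothetical stand-ins for the real lookups.
def llm_lookup(text: str) -> Optional[Metadata]:
    raise ConnectionError("Ollama is not running")


def arxiv_lookup(text: str) -> Optional[Metadata]:
    return None  # no match found


def semantic_scholar_lookup(text: str) -> Optional[Metadata]:
    return {"author": "Doe", "year": "2020", "title": "An Example Paper"}


found = first_metadata("paper text", [llm_lookup, arxiv_lookup, semantic_scholar_lookup])
print(found)
```

Treating an unavailable source as a skip rather than a fatal error is what lets the remaining sources still produce a result.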
Usage
Basic Usage
# Process a single directory
reference-renamer /path/to/papers
# Process recursively with dry run
reference-renamer --recursive --dry-run /path/to/papers
# Generate citations only
reference-renamer --citations-only /path/to/papers
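As a rough illustration of what --recursive and --dry-run imply, the sketch below collects planned renames and only touches the filesystem when dry_run is off. The function `plan_renames` and its `new_name` callback are hypothetical and are not part of the tool's API.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_renames(root: Path, new_name: Callable[[Path], str],
                 recursive: bool = False,
                 dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Return (source, target) pairs; rename files only when dry_run is False."""
    pattern = "**/*" if recursive else "*"
    planned = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(root.glob(pattern)):
        if not path.is_file():
            continue
        target = path.with_name(new_name(path))
        if target == path:
            continue  # already has the desired name
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling it with dry_run=True first gives a list that can be reviewed before any file is changed.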
Configuration
Create a config.yaml file in your working directory:
processing:
  supported_extensions:
    - .pdf
    - .txt
  recursive: true
  max_title_words: 5
apis:
  semantic_scholar:
    enabled: true
    timeout: 30
  arxiv:
    enabled: true
    max_results: 3
logging:
  level: INFO
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
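The logging level and format values follow the conventions of Python's standard logging module. The sketch below shows what such a format string produces when applied directly; the logger name `reference_renamer.example` is made up for the example, and how the tool wires these settings internally is not shown here.

```python
import io
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Write to an in-memory stream so the formatted output is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("reference_renamer.example")  # hypothetical logger name
logger.setLevel("INFO")
logger.addHandler(handler)
logger.propagate = False  # keep the example output out of the root logger

logger.debug("hidden at INFO level")
logger.info("renamed %s", "paper.pdf")

print(stream.getvalue(), end="")
```

With the level set to INFO, the debug message is dropped and only the second call appears, prefixed by the timestamp, logger name, and level.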
Output
The tool generates:
- Renamed files in the specified format
- A CSV log of all operations (rename_log.csv)
- A BibTeX database of citations (citations.bib)
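For orientation, the sketch below shows the general shape of a CSV log row and a BibTeX entry. The column layout (original name, new name), the citation key, and the helper names `log_row` and `bibtex_entry` are assumptions; the actual contents of rename_log.csv and citations.bib may differ.

```python
import csv
import io
from typing import Dict


def log_row(original: str, renamed: str) -> str:
    """Format one CSV line; the csv module quotes fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([original, renamed])
    return buffer.getvalue()


def bibtex_entry(key: str, fields: Dict[str, str], entry_type: str = "article") -> str:
    """Render a BibTeX entry with one field per line."""
    lines = [f"@{entry_type}{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


print(log_row("scan 01.pdf", "Doe_2020_AnExamplePaper.pdf"), end="")
print(bibtex_entry("Doe2020", {"author": "Doe, Jane", "year": "2020"}))
```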
Accessibility Features
- Screen reader-friendly output formats
- High contrast CLI interface
- Clear error messages and status updates
- Configurable output formats
Development
Setting up the Development Environment
- Install development dependencies:
pip install -r requirements-dev.txt
- Install pre-commit hooks:
pre-commit install
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=reference_renamer
# Run specific test file
pytest tests/test_file_processor.py
Code Style
This project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
Run the full suite:
# Format code
black .
isort .
# Check types
mypy .
# Lint
flake8
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
Please ensure your PR:
- Includes tests for new features
- Updates documentation as needed
- Follows the project's code style
- Includes a clear description of changes
License
This project is licensed under the MIT License. The full license text is included in the source distribution as the LICENSE file.
Acknowledgments
- Built with Python
- Uses Ollama for LLM processing
- Integrates with arXiv and Semantic Scholar
Support
For support, please:
- Check the documentation in this README
- Search existing issues
- Create a new issue if needed
What's New
This refactor archives old helpers, adds example snippets, and includes new funding links.