Skip to main content

AI-powered file renaming tool with LLM integration

Project description

๐ŸŽ‰ OnomaTool: AI-Powered File Renamer ๐Ÿš€

Python License PRs Welcome Tests Passing


โœจ What is OnomaTool?

OnomaTool is your AI-powered file renaming assistant! ๐Ÿง โœจ

  • Rename files in bulk with smart, context-aware suggestions ๐Ÿค–
  • Supports PDFs, images, markdown, SVG, PPTX, DOCX, TXT, and more! ๐Ÿ“„๐Ÿ–ผ๏ธ
  • Always preserves file extensions ๐Ÿ”’
  • CLI with dry-run, interactive, verbose, and debug modes ๐Ÿ–ฅ๏ธ
  • Configurable via .onomarc TOML config file โš™๏ธ
  • Uses Markitdown for unified file processing ๐Ÿ“
  • NEW: Advanced UTF-8 encoding detection and conversion for text files ๐Ÿ”ค
  • NEW: Configurable word count limits for filenames ๐Ÿ”ค

๐Ÿš€ Features

Core Functionality

  • ๐Ÿฆพ AI Suggestions: Get 3 smart file name ideas for every file
  • ๐Ÿค– Multiple LLM Providers: OpenAI (including local endpoints) and Google Gemini support
  • ๐Ÿงฉ Conflict Resolution: Never overwrite files - automatic numeric suffix handling
  • ๐Ÿ”’ Extension Preservation: Original file extensions are always preserved
  • ๐Ÿ“ Glob Pattern Support: Process files using flexible glob patterns

File Processing

  • ๐Ÿ“„ PDF Files: Extract markdown content + generate images for each page
  • ๐Ÿ–ผ๏ธ SVG Files: Convert to PNG for AI analysis (enforced PNG-only processing)
  • ๐Ÿ“Š PPTX Files: Extract content + generate images for each slide using LibreOffice
  • ๐Ÿ“ Text Files: UTF-8 encoding detection and conversion + markdown processing
  • ๐Ÿ–ผ๏ธ Image Files: Base64 encoding for direct AI image analysis
  • ๐Ÿ“‘ Office Documents: DOCX, XLSX support via Markitdown
  • ๐Ÿ”ค Unicode Support: Automatic encoding detection for text files with chardet

CLI Modes

  • ๐Ÿงช Dry-Run Mode: Preview changes without modifying files (--dry-run)
  • ๐Ÿค Interactive Mode: Confirm changes after dry-run preview (--interactive)
  • ๐Ÿ” Debug Mode: Preserve temp files and show processing paths (--debug)
  • ๐Ÿ“ข Verbose Mode: Show LLM requests and responses (--verbose)
  • โš™๏ธ Config Generation: Generate default config file (--save-config)

Advanced Features

  • ๐ŸŽฏ Smart Processing: Combined image + text analysis for documents
  • ๐Ÿ—๏ธ Modular Architecture: Extensible processor system
  • ๐ŸŒ Local LLM Support: Works with local OpenAI-compatible endpoints
  • ๐Ÿ“Š Multiple Naming Conventions: snake_case, CamelCase, kebab-case, and more
  • ๐Ÿ›ก๏ธ SSL Flexibility: Automatic SSL handling for local/HTTP endpoints
  • ๐Ÿ”ค Encoding Intelligence: Automatic detection and UTF-8 conversion for text files
  • ๐Ÿงช Comprehensive Testing: 13+ test cases for encoding reliability

๐Ÿ› ๏ธ Installation

Method 1: Install from Source

# Clone the repository
git clone https://github.com/yourusername/onomatool.git
cd onomatool

# Create virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# For encoding detection (included in requirements.txt)
pip install chardet

# Install the package
pip install -e .

Method 2: Direct Installation

# Install directly from the repository
pip install git+https://github.com/yourusername/onomatool.git

System Dependencies

For full functionality (SVG, PDF, PPTX processing):

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install libreoffice imagemagick libcairo2 libpango-1.0-0 libpangocairo-1.0-0

# macOS (with Homebrew)
brew install libreoffice imagemagick cairo pango

# Windows: Download and install LibreOffice and ImageMagick
# For SVG support: pip install cairosvg (requires Cairo system libraries)

โšก Quick Start

Basic Usage

# Rename all PDFs in current directory
onomatool '*.pdf'

# Process files in subdirectories
onomatool 'docs/**/*.md'

# Specify file format explicitly
onomatool '*.unknown' --format pdf

Preview Mode

# See what would be renamed (no changes made)
onomatool '*.jpg' --dry-run

# Interactive confirmation after preview
onomatool '*.pdf' --dry-run --interactive

Debug and Verbose Modes

# Debug mode - preserve temp files
onomatool '*.svg' --debug

# Verbose mode - see LLM interactions
onomatool '*.docx' --verbose

# Combined modes
onomatool '*.pptx' --debug --verbose --dry-run

โš™๏ธ Configuration

OnomaTool uses a TOML configuration file at ~/.onomarc or a custom path with --config.

Generate Default Config

onomatool --save-config

Configuration Options

# API Configuration
default_provider = "openai"  # or "google"
openai_api_key = "sk-..."
openai_base_url = "https://api.openai.com/v1"  # or local endpoint
google_api_key = "your-google-api-key"

# Model and Behavior
llm_model = "gpt-4o"  # or "gemini-pro"
naming_convention = "snake_case"  # snake_case, CamelCase, kebab-case, etc.

# Custom Prompts (optional - defaults provided)
system_prompt = "You are a file naming assistant."
user_prompt = "Suggest 3 file names for: {content}"
image_prompt = "Suggest 3 file names for this image."

# Markitdown Configuration
[markitdown]
enable_plugins = false
docintel_endpoint = ""

# Word count limits (NEW!)
min_filename_words = 5      # Minimum words required (ensures descriptive names)
max_filename_words = 15     # Maximum words allowed (prevents overly long names)

Supported Naming Conventions

  • snake_case (default)
  • CamelCase
  • kebab-case
  • PascalCase
  • dot.notation
  • natural language

๐Ÿ“ Supported File Types

File Type Processing Method Output
PDF Markitdown + PyMuPDF page images Combined text + image analysis
PPTX Markitdown + LibreOffice slide images Combined text + image analysis
SVG Convert to PNG + Markitdown Image analysis only
Images (JPG, PNG, etc.) Base64 encoding Direct image analysis
DOCX Markitdown processing Text analysis
TXT, MD, NOTE UTF-8 encoding detection + text processing Text analysis
XLSX Markitdown processing Content analysis
CSV, JSON, XML, HTML UTF-8 encoding detection + Markitdown Content analysis
Code Files (PY, JS, CSS, YAML) UTF-8 encoding detection + text processing Code analysis

๐Ÿง‘โ€๐Ÿ’ป Development

Project Structure

src/onomatool/
โ”œโ”€โ”€ cli.py                 # Command-line interface
โ”œโ”€โ”€ config.py              # Configuration management
โ”œโ”€โ”€ llm_integration.py     # OpenAI/Google API integration
โ”œโ”€โ”€ file_dispatcher.py     # File routing logic
โ”œโ”€โ”€ processors/            # File processing modules
โ”‚   โ”œโ”€โ”€ markitdown_processor.py
โ”‚   โ””โ”€โ”€ text_processor.py
โ”œโ”€โ”€ utils/                 # Utility functions
โ”‚   โ””โ”€โ”€ image_utils.py     # SVG conversion utilities
โ”œโ”€โ”€ prompts.py             # Default prompts
โ”œโ”€โ”€ renamer.py             # File renaming logic
โ”œโ”€โ”€ conflict_resolver.py   # Filename conflict handling
โ””โ”€โ”€ file_collector.py      # Glob pattern matching

Running Tests

# Install test dependencies
pip install pytest pytest-mock

# Run all tests
pytest

# Run specific test suites
pytest tests/test_usage_enduser.py    # End-to-end user tests
pytest tests/test_utf8_encoding.py    # UTF-8 encoding tests

# Run with coverage
pytest --cov=onomatool

Code Style

# Format code
ruff format .

# Lint code
ruff check --fix .

# Run all checks
ruff check . && ruff format --check .

๐Ÿ”ง Advanced Usage

Custom Configuration Files

# Use custom config file
onomatool '*.pdf' --config /path/to/custom.toml

Local LLM Endpoints

# In your .onomarc
default_provider = "openai"
openai_base_url = "http://localhost:1234/v1"
openai_api_key = "not-needed-for-local"

Processing Specific Formats

# Force format detection
onomatool 'unknown_files/*' --format pdf
onomatool 'images/*' --format image

๐Ÿ›ก๏ธ Safety Features

  • No Overwrites: Built-in conflict resolution with numeric suffixes
  • Extension Preservation: Original file extensions always maintained
  • Dry-Run Mode: Preview all changes before execution
  • Temp File Management: Automatic cleanup (preservable in debug mode)
  • Error Handling: Graceful failure with clear error messages
  • Encoding Safety: Automatic UTF-8 conversion preserves original files
  • Unicode Compatibility: Handles em dashes, smart quotes, accented characters

๐Ÿค Contributing

We welcome contributions! Please:

  1. Fork the repository ๐Ÿด
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes ๐Ÿงช
  4. Follow PEP8 and run ruff check --fix . ๐Ÿ
  5. Update CHANGELOG.md and FILETREE.md ๐Ÿ“š
  6. Submit a pull request ๐Ÿš€

Development Setup

# Clone and setup development environment
git clone https://github.com/yourusername/onomatool.git
cd onomatool
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

# Install development dependencies
pip install pytest pytest-mock ruff

# Run tests to verify setup
pytest tests/test_utf8_encoding.py
pytest tests/test_usage_enduser.py

๐Ÿ“œ License

MIT License - see LICENSE file for details.


๐Ÿ™‹ FAQ

Q: Does it work on Windows/Mac/Linux? A: Yes! Cross-platform support with Python 3.10+.

Q: Can I use local LLMs? A: Yes! Set openai_base_url to your local endpoint in .onomarc.

Q: Will it overwrite my files? A: Never! Built-in conflict resolution prevents overwrites.

Q: What if my API key is invalid? A: The tool will show clear error messages and fail gracefully.

Q: Can I customize the AI prompts? A: Yes! Set system_prompt, user_prompt, and image_prompt in your config.

Q: How does SVG processing work? A: SVGs are converted to PNG images before AI analysis for better results.

Q: Can I see what the AI is thinking? A: Use --verbose to see full LLM requests and responses.

Q: What about files with special characters or different encodings? A: OnomaTool automatically detects and converts file encodings to UTF-8, handling em dashes, accented characters, and other Unicode symbols seamlessly.

Q: Does it work with files that have encoding issues? A: Yes! The tool uses advanced encoding detection to identify and convert problematic files while preserving the original content.

Q: How do word count limits affect filename generation? A: Word count limits control the minimum and maximum number of words in generated filenames. This helps maintain descriptive and concise naming conventions.


๐ŸŒŸ Star this repo if you find it useful! ๐ŸŒŸ

Made with โค๏ธ, AI, and a lot of import os.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

onomatool-0.1.2.tar.gz (35.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

onomatool-0.1.2-py3-none-any.whl (29.1 kB view details)

Uploaded Python 3

File details

Details for the file onomatool-0.1.2.tar.gz.

File metadata

  • Download URL: onomatool-0.1.2.tar.gz
  • Upload date:
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.2

File hashes

Hashes for onomatool-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d049510d3d0ae3e0baaf242f6339e8e0762f3eb30de2eab342bab98266750edf
MD5 e579e4c84bc280f07c683d9d66e066d2
BLAKE2b-256 bbcd2065aa63ea0ac9270a9d50dac808b13043329a4bfc4351a65019ee71effb

See more details on using hashes here.

File details

Details for the file onomatool-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: onomatool-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 29.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.2

File hashes

Hashes for onomatool-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7d05607a7480ecf542982bf5da43e56c3a53ff3e58d2e2153654ae086e8ae52b
MD5 3a6038edf00b6b6b42de9df780a5b6ee
BLAKE2b-256 6eb41ee59b4ed9377116d85f4bf2d19fb5eef33ca81ed981a517c974163674d0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page