Reverse mapping tool: from Python import statements to pip package names
PyImport2Pkg
🐍 Reverse mapping from Python import statements to pip package names
📋 Table of Contents
- Introduction
- Why This Tool?
- Core Features
- Installation
- Quick Start
- Commands
- Advanced Features
- Python API
- Architecture
- FAQ
- Contributing
Introduction
PyImport2Pkg solves a core problem in the AI-assisted coding era:
Given Python import statements in code, how do we quickly and accurately know which pip packages need to be installed?
Problem Statement
In many cases the pip package name matches the import module name. In practice, however, many popular libraries are published under a package name that differs from the module you import:
- import cv2 → pip install opencv-python
- from PIL import Image → pip install Pillow
- import sklearn → pip install scikit-learn
- import google.cloud.storage → pip install google-cloud-storage
When AI generates code with dozens of imports, manually looking up each mapping is time-consuming and error-prone. PyImport2Pkg automates this.
Why This Tool?
The Challenge
When using AI code generators (like GitHub Copilot, Claude, or ChatGPT), you often get code like:
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from google.cloud import storage
import requests
Question: Which packages do you need to pip install?
Without PyImport2Pkg
- ❌ Manually Google each module name
- ❌ Check PyPI documentation
- ❌ Risk installing wrong packages
- ❌ Takes 5-10 minutes for 10 imports
With PyImport2Pkg
$ pyimport2pkg analyze ./my_ai_generated_code
Dependencies:
opencv-python
numpy
scikit-learn
google-cloud-storage
requests
Done in seconds! ✅
Core Features
🎯 Key Capabilities
| Feature | Description |
|---|---|
| Project Analysis | Recursively scan Python projects, extract all imports, generate requirements.txt |
| Smart Mapping | Multi-tier priority system for accurate module→package mapping |
| Namespace Support | Correctly handle google.*, azure.*, zope.* namespace packages |
| Optional Deps | Distinguish required vs optional imports (try-except, platform-specific) |
| Version-Aware | Auto-detect target Python version, handle backport packages |
| High-Performance DB | Smart incremental updates, true parallel processing, batch writes |
| Interrupt Recovery | Resume an interrupted build from the last saved checkpoint without data loss |
Mapping Priority
PyImport2Pkg uses a multi-tier priority system:
1. Namespace packages - when submodules are detected (e.g., google.cloud.storage → google-cloud-storage)
2. Hardcoded mappings - known special cases (e.g., cv2 → opencv-python)
3. PyPI database - from top_level.txt in wheel files
4. Smart guess - assume the module name equals the package name
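The priority order can be pictured as a chain of lookups that stops at the first hit. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the three dictionaries, the resolve function, and the labels "namespace", "database", and "guess" are hypothetical stand-ins (only "hardcoded" appears in the tool's output shown in this README).

```python
# Illustrative data only; these tiny tables are hypothetical stand-ins.
NAMESPACE_MAP = {"google.cloud.storage": "google-cloud-storage"}
HARDCODED_MAP = {"cv2": "opencv-python", "PIL": "Pillow", "sklearn": "scikit-learn"}
PYPI_DB = {"yaml": "PyYAML"}  # stand-in for the top_level.txt database


def resolve(module: str) -> tuple[str, str]:
    """Return (package_name, source_tier) for a dotted module name."""
    parts = module.split(".")
    # Tier 1: namespace packages, longest dotted prefix first.
    for end in range(len(parts), 1, -1):
        prefix = ".".join(parts[:end])
        if prefix in NAMESPACE_MAP:
            return NAMESPACE_MAP[prefix], "namespace"
    top_level = parts[0]
    # Tier 2: known special cases.
    if top_level in HARDCODED_MAP:
        return HARDCODED_MAP[top_level], "hardcoded"
    # Tier 3: database built from PyPI metadata.
    if top_level in PYPI_DB:
        return PYPI_DB[top_level], "database"
    # Tier 4: assume the package is named after the module.
    return top_level, "guess"


if __name__ == "__main__":
    for name in ("google.cloud.storage", "cv2", "yaml", "requests"):
        print(name, "->", resolve(name))
```

Checking the namespace tier first matters because a top-level name such as google is shared by many unrelated packages, so only the dotted path identifies the right one.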
Installation
Requirements
- Python 3.10+
- Minimal dependencies (only httpx>=0.25.0)
Install via pip
pip install pyimport2pkg
Install in development mode
git clone https://github.com/buptanswer/pyimport2pkg.git
cd pyimport2pkg
pip install -e ".[dev]"
Verify Installation
pyimport2pkg --version
# pyimport2pkg 0.3.0
Quick Start
Analyze a Project
# Analyze current directory
pyimport2pkg analyze .
# Output:
# Analyzing: .
# Found imports from 24 files
#
# Dependencies:
# numpy
# pandas
# requests
# scikit-learn
# matplotlib
Query a Single Module
pyimport2pkg query cv2
# Output:
# Module: cv2
# Source: hardcoded
# Candidates:
# 1. opencv-python (recommended)
# 2. opencv-contrib-python
# 3. opencv-python-headless
Save Results
# Save as requirements.txt
pyimport2pkg analyze . -o requirements.txt
# Save as JSON
pyimport2pkg analyze . -o dependencies.json -f json
Commands
analyze - Analyze Project
Scan a Python project for imports and identify the required packages.
pyimport2pkg analyze <path> [options]
Options:
| Option | Description | Default |
|---|---|---|
| -o, --output | Output file path | stdout |
| -f, --format | Format (txt, json, or simple) | txt |
| -t, --target-version | Target Python version | current |
Examples:
# Basic analysis
pyimport2pkg analyze /path/to/project
# Specify target Python version
pyimport2pkg analyze . -t 3.11
# Save as JSON
pyimport2pkg analyze . -o deps.json -f json
# Simple package list
pyimport2pkg analyze . -f simple
query - Query Module Mapping
Look up which pip package provides a specific module.
pyimport2pkg query <module_name>
Examples:
pyimport2pkg query numpy # → numpy
pyimport2pkg query cv2 # → opencv-python (+ alternatives)
pyimport2pkg query PIL # → Pillow
pyimport2pkg query google.cloud.storage # → google-cloud-storage
build-db - Build Mapping Database
Build the PyPI package mapping database by downloading metadata for the top PyPI packages.
pyimport2pkg build-db [options]
Options:
| Option | Description | Default |
|---|---|---|
| --max-packages | Target number of PyPI packages | 5000 |
| --concurrency | Number of parallel workers | 50 |
| --resume | Resume interrupted build | - |
| --retry-failed | Retry failed packages only | - |
| --rebuild | Force rebuild (delete old DB) | - |
| --db-path | Custom database path | data/mapping.db |
Examples:
# Build database with top 5000 packages
pyimport2pkg build-db --max-packages 5000
# Resume interrupted build
pyimport2pkg build-db --resume
# Retry only failed packages
pyimport2pkg build-db --retry-failed
# Expand existing database
pyimport2pkg build-db --max-packages 10000
# Force rebuild
pyimport2pkg build-db --rebuild --max-packages 5000
Features:
- ✅ Smart incremental updates (no reprocessing)
- ✅ Interrupt recovery with progress tracking
- ✅ Parallel processing (50 concurrent workers by default)
- ✅ Batch database writes
- ✅ Rate limit detection & auto-recovery
- ✅ Memory-optimized chunked processing
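The mapping priority section says the database comes from top_level.txt in wheel files. As a rough illustration of that idea, the sketch below reads the file from a wheel, which is an ordinary zip archive. The function names are hypothetical and this is not the tool's own code; note also that top_level.txt is written by setuptools, so wheels built with other backends may not contain it.

```python
import io
import zipfile


def top_level_modules(wheel) -> list[str]:
    """Return the module names listed in a wheel's top_level.txt, if any.

    `wheel` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            if name.endswith(".dist-info/top_level.txt"):
                text = archive.read(name).decode("utf-8")
                return [line.strip() for line in text.splitlines() if line.strip()]
    return []  # no top_level.txt in this wheel


def make_fake_wheel(dist_info: str, modules: list[str]) -> io.BytesIO:
    """Build a tiny in-memory archive that mimics a wheel (for the demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(f"{dist_info}/top_level.txt", "\n".join(modules) + "\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    wheel = make_fake_wheel("opencv_python-0.0.0.dist-info", ["cv2"])
    print(top_level_modules(wheel))
```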
build-status - Check Build Status
View current or last build status.
pyimport2pkg build-status
# Output:
# Build Status: completed
# Total: 5000
# Processed: 5000
# Failed: 8
# Success Rate: 99.8%
# Last Updated: 2025-12-06 10:30:45
db-info - Database Information
Show database statistics.
pyimport2pkg db-info
# Output:
# Database Information
# ===================
# Database: data/mapping.db
# Packages: 5000
# Modules: 25000
# Last Updated: 2025-12-06 08:00:00
Advanced Features
v0.3.0 Highlights
1. Smart Incremental Updates
Extend your database without reprocessing:
# Database has 500 packages, expand to 1000
pyimport2pkg build-db --max-packages 1000
# Automatically processes only 500 new packages
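The incremental behaviour amounts to a set difference between the target list and what is already stored. The sketch below assumes a ranked list of package names and a set of names already in the database; both inputs and the function name are hypothetical, since this README does not document how the tool tracks processed packages.

```python
def packages_to_process(ranked: list[str], done: set[str], max_packages: int) -> list[str]:
    """Return the packages in the top `max_packages` that are not processed yet."""
    target = ranked[:max_packages]
    # Keep ranking order so the most popular missing packages go first.
    return [name for name in target if name not in done]


if __name__ == "__main__":
    ranked = [f"pkg{i}" for i in range(1000)]   # hypothetical popularity ranking
    done = set(ranked[:500])                    # database already has 500 packages
    todo = packages_to_process(ranked, done, max_packages=1000)
    print(len(todo), "packages left to process")
```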
2. Interrupt & Resume
Resume an interrupted build from where it stopped:
# Start build
pyimport2pkg build-db --max-packages 5000
# Later, resume
pyimport2pkg build-db --resume
3. Failed Package Retry
Retry only failed packages:
# First run: 860 failed
pyimport2pkg build-db --retry-failed
# Second run: only remaining failures
pyimport2pkg build-db --retry-failed
4. Performance Improvements
- 10-50x faster database writes (batch processing)
- 50 concurrent workers by default (up from 20 in v0.2.0)
- Memory-optimized chunked processing for 15000+ packages
- Batch progress saves (every 100 packages)
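To see why batching helps, compare committing once per row with committing once per batch. The sketch below assumes the database is SQLite (the README only gives the file name data/mapping.db) and uses a made-up two-column table; the function names are hypothetical and the real schema may differ.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def write_mappings(conn: sqlite3.Connection, rows, batch_size: int = 100) -> int:
    """Insert (module, package) rows using one transaction per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS mapping (module TEXT, package TEXT)")
    written = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits once per batch instead of once per row
            conn.executemany("INSERT INTO mapping VALUES (?, ?)", chunk)
        written += len(chunk)
    return written


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    demo_rows = [(f"module{i}", f"package{i}") for i in range(250)]
    print(write_mappings(connection, demo_rows), "rows written")
```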
5. Rate Limit Detection
Automatic PyPI rate limit handling:
Detected 20 consecutive failures - possible rate limiting.
Pausing 30 seconds before retry (pause 1/5)...
Resuming...
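The log above suggests a counter of consecutive failures with a bounded number of pauses. A minimal sketch of that pattern follows; the class name and its parameters are hypothetical, with defaults chosen to mirror the numbers in the log (20 failures, 30 seconds, 5 pauses). It is not the tool's actual code.

```python
import time


class RateLimitGuard:
    """Pause after a run of consecutive failures, up to a fixed number of pauses."""

    def __init__(self, threshold=20, pause_seconds=30.0, max_pauses=5, sleep=time.sleep):
        self.threshold = threshold
        self.pause_seconds = pause_seconds
        self.max_pauses = max_pauses
        self._sleep = sleep  # injectable so the logic can be tested without waiting
        self.consecutive_failures = 0
        self.pauses_used = 0

    def record(self, success: bool) -> bool:
        """Record one result; return False once the pause budget is exhausted."""
        if success:
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures < self.threshold:
            return True
        if self.pauses_used >= self.max_pauses:
            return False  # still failing after all pauses: give up
        self.pauses_used += 1
        self.consecutive_failures = 0
        self._sleep(self.pause_seconds)
        return True
```

A caller would invoke record() after each download and stop the build when it returns False.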
6. Graceful Interruption (Ctrl+C)
^C
Saving progress, please wait... (Ctrl+C again to force quit)
Build interrupted. Processed 2500/5000 packages.
Use --resume to continue.
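One common way to get this behaviour is a SIGINT handler that only sets a flag on the first Ctrl+C and raises on the second. The sketch below illustrates that pattern under stated assumptions: InterruptFlag, run_build, and the callback parameters are hypothetical names, and the save interval of 100 echoes the batch progress saves mentioned above.

```python
import signal


class InterruptFlag:
    """First Ctrl+C requests a clean stop; a second one forces an exit."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        if self.requested:
            raise KeyboardInterrupt  # second Ctrl+C: force quit
        self.requested = True


def run_build(packages, process, save_progress, flag, save_every=100):
    """Process packages until done or interrupted; return how many were processed."""
    done = 0
    for name in packages:
        if flag.requested:
            break  # stop between packages so no item is left half-written
        process(name)
        done += 1
        if done % save_every == 0:
            save_progress(done)
    save_progress(done)  # final save so a later resume starts from here
    return done


if __name__ == "__main__":
    flag = InterruptFlag()
    signal.signal(signal.SIGINT, flag.handle)
    total = run_build(range(5000), lambda name: None, lambda count: None, flag)
    print(f"Processed {total}/5000 packages.")
```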
Python API
Use PyImport2Pkg programmatically:
Basic Usage
from pyimport2pkg import Scanner, Parser, Filter, Mapper, Exporter
from pathlib import Path
# 1. Scan project
scanner = Scanner()
files = scanner.scan(Path("./my_project"))
# 2. Parse imports
parser = Parser()
imports = []
for file_path in files:
    imports.extend(parser.parse(file_path))
# 3. Filter stdlib & local modules
import_filter = Filter(project_root=Path("./my_project"))  # avoid shadowing the built-in filter()
filtered = import_filter.filter(imports)
# 4. Map to packages
mapper = Mapper()
results = mapper.map(filtered)
# 5. Export results
exporter = Exporter()
exporter.to_requirements_txt(results, "requirements.txt")
Query Single Module
from pyimport2pkg import Mapper
mapper = Mapper()
result = mapper.map_single("cv2")
for candidate in result.package_candidates:
    print(f"{candidate.name}: {candidate.download_count} downloads")
Check Build Status
from pyimport2pkg.database import get_build_progress
progress = get_build_progress()
status = progress.get_status()
print(f"Processed: {status['processed']}/{status['total']}")
print(f"Failed: {status['failed']}")
print(f"Success Rate: {status['success_rate']:.1%}")
Architecture
Pipeline Design
Python Project
↓
Scanner (scan for .py files)
↓
Parser (extract imports via AST)
↓
Filter (remove stdlib, local modules)
↓
Mapper (map to pip packages)
↓
Resolver (handle conflicts)
↓
Exporter (generate output)
↓
requirements.txt / JSON / list
Core Modules
| Module | Purpose |
|---|---|
| scanner.py | Recursively find Python files |
| parser.py | Extract imports with context (AST-based) |
| filter.py | Filter stdlib, local, backports |
| mapper.py | Multi-tier package mapping |
| resolver.py | Handle one-to-many conflicts |
| exporter.py | Multi-format output |
| database.py | PyPI mapping database |
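A one-to-many conflict arises when several packages provide the same module, as with cv2. One plausible way to pick a recommendation is to rank candidates by popularity. The sketch below is an assumption about how such a resolver could work, not a description of resolver.py: the Candidate class mirrors the name and download_count attributes used in the API example, and the download numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    download_count: int


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by downloads (highest first), then by name for ties."""
    return sorted(candidates, key=lambda c: (-c.download_count, c.name))


if __name__ == "__main__":
    # Download counts below are made up for the demo.
    options = [
        Candidate("opencv-python-headless", 10),
        Candidate("opencv-python", 100),
        Candidate("opencv-contrib-python", 20),
    ]
    ranked = rank_candidates(options)
    print("recommended:", ranked[0].name)
```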
Performance
Analysis Speed
| Project Size | Time | Files |
|---|---|---|
| Small (<100 files) | < 1s | ~50 |
| Medium (100-1000) | 1-5s | ~500 |
| Large (1000+) | 5-30s | ~2000 |
Database Build
| Packages | Time | Memory |
|---|---|---|
| 5000 | 10-20 min | ~200 MB |
| 10000 | 20-40 min | ~400 MB |
| 15000 | 40-80 min | ~600 MB |
FAQ
Q: How do I exclude certain directories?
A: The scanner automatically excludes .git, .venv, venv, env, __pycache__, and similar directories.
For custom exclusions, use the Python API:
scanner = Scanner(exclude_dirs=["tests", "docs"])
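Directory exclusion of this kind is usually done by pruning the walk rather than filtering results afterwards. The following sketch shows the idea with os.walk; find_python_files and DEFAULT_EXCLUDES are hypothetical names, and the default set only contains the directories named in the answer above.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical default set, taken from the directories listed in the FAQ answer.
DEFAULT_EXCLUDES = {".git", ".venv", "venv", "env", "__pycache__"}


def find_python_files(root, extra_excludes=()):
    """Return all .py files under root, skipping excluded directory names."""
    excluded = DEFAULT_EXCLUDES | set(extra_excludes)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        found.extend(Path(dirpath) / f for f in filenames if f.endswith(".py"))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for sub in ("pkg", "tests", ".venv"):
            (Path(tmp) / sub).mkdir()
            (Path(tmp) / sub / "module.py").write_text("")
        for path in find_python_files(tmp, extra_excludes=["tests"]):
            print(path.relative_to(tmp))
```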
Q: Does it support relative imports?
A: Yes. Relative imports are marked as local modules and filtered out.
Q: What about conditional imports?
A: Conditional imports (inside if/try blocks) are marked as optional=True.
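Detecting such imports comes down to tracking whether an import statement sits inside an if or try block while walking the AST. The sketch below is one way to do that and is not the tool's parser: find_imports is a hypothetical name, and it treats a module as optional only when every import of it is conditional.

```python
import ast


def find_imports(source: str) -> dict[str, bool]:
    """Map each top-level module name to True if all its imports are conditional."""
    result: dict[str, bool] = {}

    def record(dotted_name: str, optional: bool) -> None:
        top_level = dotted_name.split(".")[0]
        # A single unconditional import makes the module required.
        result[top_level] = result.get(top_level, True) and optional

    def visit(node: ast.AST, optional: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.Import):
                for alias in child.names:
                    record(alias.name, optional)
            elif isinstance(child, ast.ImportFrom):
                if child.level == 0 and child.module:
                    record(child.module, optional)
            else:
                # Anything nested under if/try (including except handlers)
                # is treated as conditional.
                visit(child, optional or isinstance(child, (ast.If, ast.Try)))

    visit(ast.parse(source), optional=False)
    return result


if __name__ == "__main__":
    code = (
        "import numpy\n"
        "try:\n"
        "    import ujson\n"
        "except ImportError:\n"
        "    ujson = None\n"
    )
    print(find_imports(code))
```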
Q: How long does database build take?
A: Depends on package count and network:
- 5000 packages: ~10-20 min
- 10000 packages: ~20-40 min
- Supports pause/resume
Q: Database not found error?
A: Either:
- Build the database: pyimport2pkg build-db
- Or use online mode without a local database
Q: Why are some imports missing from the results?
A: Possible reasons:
- The package is not among the top 5000 PyPI packages (the default database size)
- The package metadata is incomplete
- The package uses a non-standard structure
Troubleshooting
No Python found
# Use explicit Python
python -m pyimport2pkg analyze .
Permission denied
# Ensure read access to project directory
chmod -R +r ./my_project
Out of memory
# Build database in chunks
pyimport2pkg build-db --max-packages 5000 # start small
pyimport2pkg build-db --max-packages 10000 # expand later
Contributing
Report Bugs
File issues at: https://github.com/buptanswer/pyimport2pkg/issues
Include:
- Python version
- PyImport2Pkg version
- Full error traceback
- Minimal reproduction example
Contribute Code
# Fork repository
git clone https://github.com/YOUR_USERNAME/pyimport2pkg.git
cd pyimport2pkg
# Create feature branch
git checkout -b feature/your-feature
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Make changes & commit
git add .
git commit -m "feat: your feature description"
# Push & create pull request
git push origin feature/your-feature
Development
Setup
pip install -e ".[dev]"
Run Tests
pytest tests/ -v
pytest tests/ --cov=pyimport2pkg # with coverage
Test Specific Module
pytest tests/test_parser.py -v
pytest tests/test_parser.py::TestParser::test_simple_import -v
License
MIT License - See LICENSE for details
Changelog
See CHANGELOG for detailed version history.
- v0.3.0 - Performance & reliability improvements (Dec 2025)
- v0.2.0 - Initial feature release
- v0.1.0 - Beta version
Support
- 📧 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation: User Guide
Acknowledgments
Built for the AI-assisted coding era. Special thanks to users who provided feedback and testing!
Made with ❤️ for developers using AI code generators
PyImport2Pkg v0.3.0 - December 2025