VCF Contact Merger
A powerful Python utility to merge and deduplicate VCF (vCard) contact files with intelligent duplicate detection and property merging.
Features
- Smart Duplicate Detection: Identifies duplicates based on normalized names, phone numbers, and email addresses
- Intelligent Property Merging: Combines contact information from multiple sources while preserving all data
- Robust File Handling: Supports various VCF formats and handles encoding issues gracefully
- Command-Line Interface: Easy-to-use CLI for batch processing multiple contact files
- Zero Dependencies: Uses only Python standard library - no external packages required
- Cross-Platform: Works on Windows, macOS, and Linux
- High Code Quality: Pylint score of 10/10 with comprehensive type hints and documentation
Installation
From Source
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
pip install -e .
Direct Download
wget https://raw.githubusercontent.com/fam007e/VCFmerger/main/merge_script.py
python3 merge_script.py output.vcf input1.vcf input2.vcf
Usage
Command Line Interface
After installation, you can use the vcf-merge command:
# Basic usage
vcf-merge merged_contacts.vcf contacts1.vcf contacts2.vcf contacts3.vcf
# Merge multiple backup files
vcf-merge all_contacts.vcf backup1.vcf backup2.vcf export.vcf
Direct Script Usage
python3 merge_script.py output.vcf input1.vcf input2.vcf [additional_files...]
Python API
from merge_script import VCFMerger
# Create merger instance
merger = VCFMerger()
# Read VCF files
with open('contacts1.vcf', 'r', encoding='utf-8') as f1, open('contacts2.vcf', 'r', encoding='utf-8') as f2:
    vcf_contents = [f1.read(), f2.read()]
# Merge contacts
merged_vcf = merger.merge_vcfs(vcf_contents)
# Write result
with open('merged.vcf', 'w', encoding='utf-8') as output:
    output.write(merged_vcf)
How It Works
Duplicate Detection Algorithm
The merger builds a key for each contact from its normalized fields and uses those keys to identify duplicates:
- Name Normalization: Converts full names (FN) and structured names (N) to lowercase
- Phone Number Normalization: Strips formatting, keeping only digits and '+' prefix
- Email Normalization: Converts email addresses to lowercase
- Composite Key: Creates unique keys from normalized names, phone sets, and email sets
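The normalization steps above can be sketched as follows. This is a minimal illustration, not the code in merge_script.py: the names ContactKey, make_key, and looks_like_duplicate are hypothetical, and the matching rule (same name plus at least one shared phone or email) is an assumption, since the exact comparison is not spelled out here.

```python
import re
from typing import FrozenSet, Iterable, NamedTuple


class ContactKey(NamedTuple):
    """Hypothetical composite key; not the script's actual data structure."""
    name: str
    phones: FrozenSet[str]
    emails: FrozenSet[str]


def normalize_name(name: str) -> str:
    # Lowercase the name; trimming outer whitespace is an extra assumption.
    return name.strip().lower()


def normalize_phone(phone: str) -> str:
    # Keep digits only, plus a leading '+' if the number starts with one.
    stripped = phone.strip()
    digits = re.sub(r"\D", "", stripped)
    return "+" + digits if stripped.startswith("+") else digits


def normalize_email(email: str) -> str:
    return email.strip().lower()


def make_key(name: str, phones: Iterable[str], emails: Iterable[str]) -> ContactKey:
    return ContactKey(
        normalize_name(name),
        frozenset(normalize_phone(p) for p in phones),
        frozenset(normalize_email(e) for e in emails),
    )


def looks_like_duplicate(a: ContactKey, b: ContactKey) -> bool:
    # Assumed rule: same name and at least one shared phone or email.
    return a.name == b.name and bool(a.phones & b.phones or a.emails & b.emails)


a = make_key("John Doe", ["+1 (234) 567-890"], ["John@Example.com"])
b = make_key("john doe", ["+1234567890"], ["john.doe@work.com"])
same_person = looks_like_duplicate(a, b)
```

Frozen sets make the phone and email parts of the key order-independent, so the same contact exported with its numbers in a different order still produces comparable keys.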
Property Merging Strategy
- Single-Value Properties: Later-processed files take priority (FN, N, ORG, TITLE, etc.)
- Multi-Value Properties: All values are preserved and combined (TEL, EMAIL, URL, ADR)
- Special Handling: PHOTO properties and quoted-printable encoding are handled correctly
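A rough sketch of the strategy above, assuming each contact is held as a dictionary mapping a property name to a list of values. That in-memory shape and the function name merge_contacts are hypothetical, and dropping exact repeats of a multi-value entry is an assumption; the script's internals may differ.

```python
from typing import Dict, List

# Multi-value properties as listed above; everything else is treated as single-value.
MULTI_VALUE = {"TEL", "EMAIL", "URL", "ADR"}

Contact = Dict[str, List[str]]  # hypothetical shape: property name -> values


def merge_contacts(earlier: Contact, later: Contact) -> Contact:
    """Merge two duplicate contacts; 'later' comes from the later-processed file."""
    merged: Contact = {name: list(values) for name, values in earlier.items()}
    for name, values in later.items():
        if name in MULTI_VALUE:
            existing = merged.setdefault(name, [])
            for value in values:
                if value not in existing:  # keep each distinct value once
                    existing.append(value)
        else:
            merged[name] = list(values)  # single-value: later file wins
    return merged


first = {"FN": ["John Doe"], "TEL": ["+1234567890"], "EMAIL": ["john@example.com"]}
second = {
    "FN": ["John Doe"],
    "TEL": ["+1234567890"],
    "EMAIL": ["john.doe@work.com"],
    "ORG": ["Acme Corp"],
}
merged = merge_contacts(first, second)
```

With these two inputs, the merged contact keeps one phone number, both email addresses, and the organization from the second file.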
Supported VCF Properties
- Names: FN (Full Name), N (Structured Name)
- Contact Info: TEL (Phone), EMAIL, URL
- Organization: ORG, TITLE
- Address: ADR (Address)
- Media: PHOTO (with multi-line support)
- Metadata: VERSION and custom properties
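Multi-line values such as PHOTO usually rely on vCard line folding, where a continuation line starts with a single space or tab. The sketch below shows one way to unfold such lines before parsing. The function name unfold_lines is hypothetical, and this is not necessarily how merge_script.py does it; quoted-printable soft line breaks (a trailing "=") are a separate mechanism and are not handled here.

```python
from typing import List


def unfold_lines(text: str) -> List[str]:
    """Join folded continuation lines onto the logical line they belong to."""
    lines: List[str] = []
    for raw in text.splitlines():
        # A line that starts with one space or tab continues the previous line.
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


sample = (
    "BEGIN:VCARD\n"
    "PHOTO;ENCODING=b;TYPE=JPEG:AAAA\n"
    " BBBB\n"
    " CCCC\n"
    "END:VCARD\n"
)
unfolded = unfold_lines(sample)
```

After unfolding, the PHOTO property is a single logical line that can be split at the first colon like any other property.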
Examples
Example 1: Basic Merging
Input files:
contacts1.vcf:
BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john@example.com
END:VCARD
contacts2.vcf:
BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john.doe@work.com
ORG:Acme Corp
END:VCARD
Command:
vcf-merge merged.vcf contacts1.vcf contacts2.vcf
Result: Single contact with both email addresses and organization information.
Example 2: Phone Number Normalization
These contacts will be detected as duplicates:
TEL:+1 (555) 123-4567
TEL:+15551234567
TEL:555.123.4567
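The first two normalize to +15551234567. The third has no country code; under the digits-and-'+' rule described in How It Works it reduces to 5551234567, so it matches the others only if the script also compares numbers without their country prefix.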
Development
Prerequisites
- Python 3.6 or later
- Git
Setting Up Development Environment
# Clone the repository
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
# Install in development mode with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Check code quality
pylint merge_script.py
# Format code
black merge_script.py
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=merge_script
# Run only the test files matching a pattern
pytest test_*.py
Code Quality
The project maintains high code quality standards:
# Pylint check (should score 10/10)
pylint merge_script.py
# Type checking
mypy merge_script.py
# Code formatting
black merge_script.py --check
File Structure
VCFmerger/
├── merge_script.py       # Main merger script
├── setup.py              # Package installation script (legacy)
├── pyproject.toml        # Modern Python project configuration
├── __init__.py           # Package initialization
├── README.md             # This file
├── LICENSE               # MIT License
├── requirements.txt      # Runtime dependencies (empty - no deps)
├── .gitignore            # Git ignore rules
└── tests/                # Test files (if any)
    ├── test_*.py         # Test modules
    └── sample_data/      # Test data files
        ├── contacts1.vcf
        └── contacts2.vcf
Contributing
Contributions are welcome! We value your input and want to make contributing to this project as easy and transparent as possible.
Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.
Community Standards
- Code of Conduct: We are committed to providing a friendly, safe and welcoming environment for all. Please read and respect our Code of Conduct.
- Security: If you discover a security vulnerability, please see our Security Policy for reporting instructions.
Troubleshooting
Common Issues
Issue: "No valid input VCF files found"
- Solution: Check file paths and ensure VCF files contain valid vCard data
Issue: Encoding errors with special characters
- Solution: The script reads input as UTF-8 with error handling. If special characters still come out garbled, re-save the source VCF files as UTF-8 and run the merge again
Issue: Large files processing slowly
- Solution: The script processes files sequentially; consider splitting very large files
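If you do need to split a very large file, a small helper along these lines may be enough. It is a sketch under stated assumptions: the function name split_vcards and the chunk size are hypothetical, the whole file is assumed to fit in memory as text, and any trailing lines without a closing END:VCARD are dropped.

```python
from typing import Iterator, List


def split_vcards(text: str, cards_per_chunk: int) -> Iterator[str]:
    """Yield VCF text chunks holding at most cards_per_chunk complete vCards."""
    chunk: List[str] = []  # lines of the completed cards in the current chunk
    card: List[str] = []   # lines of the card currently being read
    count = 0
    for line in text.splitlines():
        card.append(line)
        if line.strip().upper() == "END:VCARD":
            chunk.extend(card)
            card = []
            count += 1
            if count == cards_per_chunk:
                yield "\n".join(chunk) + "\n"
                chunk, count = [], 0
    if chunk:
        yield "\n".join(chunk) + "\n"


text = "BEGIN:VCARD\nVERSION:3.0\nFN:Sample\nEND:VCARD\n" * 5
chunks = list(split_vcards(text, 2))
```

Each chunk can then be written to its own file and passed to vcf-merge as a separate input.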
Getting Help
- Issues: Report bugs and request features on GitHub Issues
- Discussions: Join discussions on GitHub Discussions
- Email: Contact the maintainer at vcfmerger mail
Changelog
Version 1.0.0 (Initial Release)
- Smart duplicate detection based on names, phones, and emails
- Intelligent property merging with priority handling
- Command-line interface with multiple input support
- Python API for programmatic usage
- Comprehensive error handling and logging
- Cross-platform compatibility
- Zero external dependencies
- 10/10 pylint score with full type hints
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the need to merge contact backups from multiple sources
- Built with Python's robust standard library
- Thanks to the open-source community for feedback and contributions
Author
Faisal Ahmed Moshiur
- GitHub: @fam007e
- Email: vcfmerger mail
⭐ Star this repository if you find it useful! ⭐