Skip to main content

A utility to merge and deduplicate VCF contact files

Project description

VCF Contact Merger

A powerful Python utility to merge and deduplicate VCF (vCard) contact files with intelligent duplicate detection and property merging.

Python Version License: MIT Code Quality

Features

  • Smart Duplicate Detection: Identifies duplicates based on normalized names, phone numbers, and email addresses
  • Intelligent Property Merging: Combines contact information from multiple sources while preserving all data
  • Robust File Handling: Supports various VCF formats and handles encoding issues gracefully
  • Command-Line Interface: Easy-to-use CLI for batch processing multiple contact files
  • Zero Dependencies: Uses only Python standard library - no external packages required
  • Cross-Platform: Works on Windows, macOS, and Linux
  • High Code Quality: Pylint score of 10/10 with comprehensive type hints and documentation

Installation

From Source

git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
pip install -e .

Direct Download

wget https://raw.githubusercontent.com/fam007e/VCFmerger/main/merge_script.py
python3 merge_script.py output.vcf input1.vcf input2.vcf

Usage

Command Line Interface

After installation, you can use the vcf-merge command:

# Basic usage
vcf-merge merged_contacts.vcf contacts1.vcf contacts2.vcf contacts3.vcf

# Merge multiple backup files
vcf-merge all_contacts.vcf backup1.vcf backup2.vcf export.vcf

Direct Script Usage

python3 merge_script.py output.vcf input1.vcf input2.vcf [additional_files...]

Python API

from merge_script import VCFMerger

# Create merger instance
merger = VCFMerger()

# Read VCF files
with open('contacts1.vcf', 'r') as f1, open('contacts2.vcf', 'r') as f2:
    vcf_contents = [f1.read(), f2.read()]

# Merge contacts
merged_vcf = merger.merge_vcfs(vcf_contents)

# Write result
with open('merged.vcf', 'w') as output:
    output.write(merged_vcf)

How It Works

Duplicate Detection Algorithm

The merger uses a sophisticated key-based approach to identify duplicates:

  1. Name Normalization: Converts full names (FN) and structured names (N) to lowercase
  2. Phone Number Normalization: Strips formatting, keeping only digits and '+' prefix
  3. Email Normalization: Converts email addresses to lowercase
  4. Composite Key: Creates unique keys from normalized names, phone sets, and email sets

Property Merging Strategy

  • Single-Value Properties: Later-processed files take priority (FN, N, ORG, TITLE, etc.)
  • Multi-Value Properties: All values are preserved and combined (TEL, EMAIL, URL, ADR)
  • Special Handling: PHOTO properties and quoted-printable encoding are handled correctly

Supported VCF Properties

  • Names: FN (Full Name), N (Structured Name)
  • Contact Info: TEL (Phone), EMAIL, URL
  • Organization: ORG, TITLE
  • Address: ADR (Address)
  • Media: PHOTO (with multi-line support)
  • Metadata: VERSION and custom properties

Examples

Example 1: Basic Merging

Input files:

contacts1.vcf:

BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john@example.com
END:VCARD

contacts2.vcf:

BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john.doe@work.com
ORG:Acme Corp
END:VCARD

Command:

vcf-merge merged.vcf contacts1.vcf contacts2.vcf

Result: Single contact with both email addresses and organization information.

Example 2: Phone Number Normalization

These contacts will be detected as duplicates:

  • TEL:+1 (555) 123-4567
  • TEL:+15551234567
  • TEL:555.123.4567

All normalize to +15551234567.

Development

Prerequisites

  • Python 3.6 or later
  • Git

Setting Up Development Environment

# Clone the repository
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger

# Install in development mode with dev dependencies
pip install -e .[dev]

# Run tests
pytest

# Check code quality
pylint merge_script.py

# Format code
black merge_script.py

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=merge_script

# Run specific test
pytest test_*.py

Code Quality

The project maintains high code quality standards:

# Pylint check (should score 10/10)
pylint merge_script.py

# Type checking
mypy merge_script.py

# Code formatting
black merge_script.py --check

File Structure

VCFmerger/
├── merge_script.py          # Main merger script
├── setup.py                 # Package installation script (legacy)
├── pyproject.toml          # Modern Python project configuration
├── __init__.py             # Package initialization
├── README.md               # This file
├── LICENSE                 # MIT License
├── requirements.txt        # Runtime dependencies (empty - no deps)
├── .gitignore             # Git ignore rules
└── tests/                 # Test files (if any)
    ├── test_*.py          # Test modules
    └── sample_data/       # Test data files
        ├── contacts1.vcf
        └── contacts2.vcf

Contributing

Contributions are welcome! We value your input and want to make contributing to this project as easy and transparent as possible.

Please read our Contributing Guidelines for details on our code of conduct, and the process for submitting pull requests to us.

Community Standards

  • Code of Conduct: We are committed to providing a friendly, safe and welcoming environment for all. Please read and respect our Code of Conduct.
  • Security: If you discover a security vulnerability, please see our Security Policy for reporting instructions.

Troubleshooting

Common Issues

Issue: "No valid input VCF files found"

  • Solution: Check file paths and ensure VCF files contain valid vCard data

Issue: Encoding errors with special characters

  • Solution: The script handles UTF-8 with error handling, but ensure your VCF files are properly encoded

Issue: Large files processing slowly

  • Solution: The script processes files sequentially; consider splitting very large files

Getting Help

Changelog

Version 1.0.0 (Initial Release)

  • Smart duplicate detection based on names, phones, and emails
  • Intelligent property merging with priority handling
  • Command-line interface with multiple input support
  • Python API for programmatic usage
  • Comprehensive error handling and logging
  • Cross-platform compatibility
  • Zero external dependencies
  • 10/10 pylint score with full type hints

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by the need to merge contact backups from multiple sources
  • Built with Python's robust standard library
  • Thanks to the open-source community for feedback and contributions

Author

Faisal Ahmed Moshiur


Star this repository if you find it useful!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vcf_merger-2025.12.31.tar.gz (25.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vcf_merger-2025.12.31-py3-none-any.whl (9.9 kB view details)

Uploaded Python 3

File details

Details for the file vcf_merger-2025.12.31.tar.gz.

File metadata

  • Download URL: vcf_merger-2025.12.31.tar.gz
  • Upload date:
  • Size: 25.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcf_merger-2025.12.31.tar.gz
Algorithm Hash digest
SHA256 3c3edacb1efcd888563aa2a26ed8294e6a6b18299399e0f2905633e26314836c
MD5 4bd41e6278728276df1fe98f64b3a908
BLAKE2b-256 830de66b5698b775f814f0f96b7558c24e23b80f56a2d9f3a156b175af3ecb88

See more details on using hashes here.

Provenance

The following attestation bundles were made for vcf_merger-2025.12.31.tar.gz:

Publisher: release.yml on fam007e/vcfmerger

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vcf_merger-2025.12.31-py3-none-any.whl.

File metadata

File hashes

Hashes for vcf_merger-2025.12.31-py3-none-any.whl
Algorithm Hash digest
SHA256 9472b84cc39c46a85841d621cf316cade53a6d6c85e7bed7c911bdbe6ddc8bf7
MD5 c4295fa1db7a4bcd570a2cc6c0f3c134
BLAKE2b-256 f74a8b4579fae08e1c68530b9667089949b99b8b8146d1d651baab8b05019756

See more details on using hashes here.

Provenance

The following attestation bundles were made for vcf_merger-2025.12.31-py3-none-any.whl:

Publisher: release.yml on fam007e/vcfmerger

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page