A utility to merge and deduplicate VCF contact files

VCF Contact Merger

A powerful Python utility to merge and deduplicate VCF (vCard) contact files with intelligent duplicate detection and property merging.

Features

  • Smart Duplicate Detection: Identifies duplicates based on normalized names, phone numbers, and email addresses
  • Intelligent Property Merging: Combines contact information from multiple sources while preserving all data
  • Robust File Handling: Supports various VCF formats and handles encoding issues gracefully
  • Command-Line Interface: Easy-to-use CLI for batch processing multiple contact files
  • Zero Dependencies: Uses only the Python standard library; no external packages required
  • Cross-Platform: Works on Windows, macOS, and Linux
  • High Code Quality: Pylint score of 10/10 with comprehensive type hints and documentation

Installation

From Source

git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger
pip install -e .

Direct Download

wget https://raw.githubusercontent.com/fam007e/VCFmerger/main/merge_script.py
python3 merge_script.py output.vcf input1.vcf input2.vcf

Usage

Command Line Interface

After installation, you can use the vcf-merge command:

# Basic usage
vcf-merge merged_contacts.vcf contacts1.vcf contacts2.vcf contacts3.vcf

# Merge multiple backup files
vcf-merge all_contacts.vcf backup1.vcf backup2.vcf export.vcf

Direct Script Usage

python3 merge_script.py output.vcf input1.vcf input2.vcf [additional_files...]

Python API

from merge_script import VCFMerger

# Create merger instance
merger = VCFMerger()

# Read VCF files (explicit UTF-8 so the result does not depend on the platform default encoding)
with open('contacts1.vcf', 'r', encoding='utf-8') as f1, \
        open('contacts2.vcf', 'r', encoding='utf-8') as f2:
    vcf_contents = [f1.read(), f2.read()]

# Merge contacts
merged_vcf = merger.merge_vcfs(vcf_contents)

# Write result
with open('merged.vcf', 'w', encoding='utf-8') as output:
    output.write(merged_vcf)

How It Works

Duplicate Detection Algorithm

The merger identifies duplicates by building a comparison key from normalized fields:

  1. Name Normalization: Converts full names (FN) and structured names (N) to lowercase
  2. Phone Number Normalization: Strips formatting, keeping only digits and '+' prefix
  3. Email Normalization: Converts email addresses to lowercase
  4. Composite Key: Creates unique keys from normalized names, phone sets, and email sets
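The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the `Contact` dictionary shape and the function names are hypothetical, and the matching rule (equal names plus at least one shared phone or email) is an assumption, since the text does not spell out how keys are compared.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical in-memory shape: property name -> list of raw values.
Contact = Dict[str, List[str]]
ContactKey = Tuple[str, FrozenSet[str], FrozenSet[str]]


def normalize_name(name: str) -> str:
    """Lowercase the name and trim surrounding whitespace."""
    return name.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only, plus a leading '+' if the raw value had one."""
    digits = re.sub(r"\D", "", phone)
    return "+" + digits if phone.strip().startswith("+") else digits


def normalize_email(email: str) -> str:
    """Lowercase the address and trim surrounding whitespace."""
    return email.strip().lower()


def contact_key(contact: Contact) -> ContactKey:
    """Build a hashable key from the normalized name, phone set, and email set."""
    name = normalize_name((contact.get("FN") or [""])[0])
    phones = frozenset(normalize_phone(p) for p in contact.get("TEL", []))
    emails = frozenset(normalize_email(e) for e in contact.get("EMAIL", []))
    return (name, phones, emails)


def is_duplicate(a: Contact, b: Contact) -> bool:
    """Assumed rule: same name and at least one phone or email in common."""
    name_a, phones_a, emails_a = contact_key(a)
    name_b, phones_b, emails_b = contact_key(b)
    return name_a == name_b and bool(phones_a & phones_b or emails_a & emails_b)
```

Using frozensets keeps the key hashable and makes the comparison independent of the order in which phone numbers or addresses appear in a card.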

Property Merging Strategy

  • Single-Value Properties: When the same property appears in more than one file, the value from the file processed last wins (FN, N, ORG, TITLE, etc.)
  • Multi-Value Properties: All values are preserved and combined (TEL, EMAIL, URL, ADR)
  • Special Handling: PHOTO properties and quoted-printable encoding are handled correctly
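A minimal sketch of the single-value and multi-value rules, assuming contacts are held as dictionaries mapping property names to lists of values. The `merge_contacts` name, the dictionary shape, and the exact-string check used to avoid repeating a value are assumptions made for illustration; PHOTO and quoted-printable handling are not covered.

```python
from typing import Dict, List

# Hypothetical in-memory shape: property name -> list of raw values.
Contact = Dict[str, List[str]]

# Properties listed above as multi-valued.
MULTI_VALUE = {"TEL", "EMAIL", "URL", "ADR"}


def merge_contacts(earlier: Contact, later: Contact) -> Contact:
    """Merge two duplicate contacts without modifying either input."""
    merged: Contact = {name: list(values) for name, values in earlier.items()}
    for name, values in later.items():
        if name in MULTI_VALUE:
            # Multi-value: keep everything, skipping exact repeats.
            existing = merged.setdefault(name, [])
            for value in values:
                if value not in existing:
                    existing.append(value)
        elif values:
            # Single-value: the later-processed contact takes priority.
            merged[name] = list(values)
    return merged
```

Applied to the two John Doe cards from Example 1, this would yield one contact carrying both email addresses and the ORG value from the second file.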

Supported VCF Properties

  • Names: FN (Full Name), N (Structured Name)
  • Contact Info: TEL (Phone), EMAIL, URL
  • Organization: ORG, TITLE
  • Address: ADR (Address)
  • Media: PHOTO (with multi-line support)
  • Metadata: VERSION and custom properties
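Multi-line support for PHOTO matters because the vCard format folds long values across lines, marking each continuation line with a leading space or tab. The sketch below shows one way to unfold such lines before parsing; `unfold_lines` is a hypothetical helper, not necessarily how the script does it, and it does not cover quoted-printable soft line breaks.

```python
from typing import List


def unfold_lines(text: str) -> List[str]:
    """Join continuation lines (leading space or tab) onto the previous line."""
    logical: List[str] = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and logical:
            # Drop the single folding character and append the rest.
            logical[-1] += raw[1:]
        elif raw:
            logical.append(raw)
    return logical
```

After unfolding, each list entry is one complete property, so a base64 PHOTO value that spanned many physical lines can be handled as a single string.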

Examples

Example 1: Basic Merging

Input files:

contacts1.vcf:

BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john@example.com
END:VCARD

contacts2.vcf:

BEGIN:VCARD
VERSION:3.0
FN:John Doe
TEL:+1234567890
EMAIL:john.doe@work.com
ORG:Acme Corp
END:VCARD

Command:

vcf-merge merged.vcf contacts1.vcf contacts2.vcf

Result: Single contact with both email addresses and organization information.

Example 2: Phone Number Normalization

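Formatting differences alone do not prevent a match:

  • TEL:+1 (555) 123-4567
  • TEL:+15551234567
  • TEL:555.123.4567

The first two normalize to +15551234567. The third has no '+' prefix or country code, so stripping formatting alone yields 5551234567; it equals the others only if the country code is supplied or ignored during comparison.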
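The stripping rule can be tried out with a few lines of Python. This is a sketch rather than the project's code: it assumes only digits and a leading '+' are kept, and `default_country_code` is a hypothetical parameter showing one way a number written without a prefix could be brought to the international form.

```python
import re
from typing import Optional


def normalize_phone(raw: str, default_country_code: Optional[str] = None) -> str:
    """Strip formatting; optionally prepend a country code when none is present."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if default_country_code is not None:
        # Assumption: the caller knows which country the bare number belongs to.
        return "+" + default_country_code + digits
    return digits


for number in ("+1 (555) 123-4567", "+15551234567", "555.123.4567"):
    print(number, "->", normalize_phone(number), "|", normalize_phone(number, "1"))
```

Without a default country code the last number stays as 5551234567; with `default_country_code="1"` all three produce the same string.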

Development

Prerequisites

  • Python 3.6 or later
  • Git

Setting Up Development Environment

# Clone the repository
git clone https://github.com/fam007e/VCFmerger.git
cd VCFmerger

# Install in development mode with dev dependencies
# (the quotes stop shells such as zsh from treating the brackets as a glob)
pip install -e ".[dev]"

# Run tests
pytest

# Check code quality
pylint merge_script.py

# Format code
black merge_script.py

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=merge_script

# Run only the test modules matching a pattern
pytest tests/test_*.py

Code Quality

The project maintains high code quality standards:

# Pylint check (should score 10/10)
pylint merge_script.py

# Type checking
mypy merge_script.py

# Code formatting
black merge_script.py --check

File Structure

VCFmerger/
├── merge_script.py       # Main merger script
├── setup.py              # Package installation script (legacy)
├── pyproject.toml        # Modern Python project configuration
├── __init__.py           # Package initialization
├── README.md             # This file
├── LICENSE               # MIT License
├── requirements.txt      # Runtime dependencies (empty - no deps)
├── .gitignore            # Git ignore rules
└── tests/                # Test files (if any)
    ├── test_*.py         # Test modules
    └── sample_data/      # Test data files
        ├── contacts1.vcf
        └── contacts2.vcf

Contributing

Contributions are welcome! We value your input and want to make contributing to this project as easy and transparent as possible.

Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.

Community Standards

  • Code of Conduct: We are committed to providing a friendly, safe and welcoming environment for all. Please read and respect our Code of Conduct.
  • Security: If you discover a security vulnerability, please see our Security Policy for reporting instructions.

Troubleshooting

Common Issues

Issue: "No valid input VCF files found"

  • Solution: Check file paths and ensure VCF files contain valid vCard data

Issue: Encoding errors with special characters

  • Solution: The script reads files as UTF-8 with error handling, so stray bytes should not stop a merge; for correct output, make sure your VCF files are saved as UTF-8
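When calling the Python API yourself, a tolerant read can be sketched as below. `read_vcf_text` is a hypothetical helper, and the choice of `utf-8-sig` with `errors="replace"` is an assumption about what suits most exports, not a description of what the script does internally.

```python
from pathlib import Path


def read_vcf_text(path: str) -> str:
    """Read a VCF file as UTF-8, dropping a BOM and replacing undecodable bytes."""
    # 'utf-8-sig' strips a leading byte order mark if one is present;
    # errors='replace' substitutes U+FFFD for bytes that are not valid UTF-8.
    return Path(path).read_text(encoding="utf-8-sig", errors="replace")
```

A U+FFFD character in the result is a sign that the file was written in another encoding and is worth converting before merging.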

Issue: Large files process slowly

  • Solution: The script processes files sequentially; consider splitting very large files
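One way to split a very large file is to cut it on vCard boundaries. The helpers below are a hypothetical sketch (`iter_vcards` and `split_vcf` are not part of the project); they assume every card is delimited by BEGIN:VCARD and END:VCARD lines and write chunks with plain newline endings.

```python
from typing import Iterator, List


def iter_vcards(text: str) -> Iterator[str]:
    """Yield each BEGIN:VCARD ... END:VCARD block as its own string."""
    current: List[str] = []
    for line in text.splitlines():
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = [line]
        elif current:
            current.append(line)
            if marker == "END:VCARD":
                yield "\n".join(current) + "\n"
                current = []


def split_vcf(text: str, cards_per_chunk: int) -> List[str]:
    """Group vCards into chunks holding at most cards_per_chunk cards each."""
    if cards_per_chunk < 1:
        raise ValueError("cards_per_chunk must be at least 1")
    cards = list(iter_vcards(text))
    return [
        "".join(cards[i:i + cards_per_chunk])
        for i in range(0, len(cards), cards_per_chunk)
    ]
```

Each chunk can then be written to its own file and passed to the merger as a separate input.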

Changelog

Version 1.0.0 (Initial Release)

  • Smart duplicate detection based on names, phones, and emails
  • Intelligent property merging with priority handling
  • Command-line interface with multiple input support
  • Python API for programmatic usage
  • Comprehensive error handling and logging
  • Cross-platform compatibility
  • Zero external dependencies
  • 10/10 pylint score with full type hints

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by the need to merge contact backups from multiple sources
  • Built with Python's robust standard library
  • Thanks to the open-source community for feedback and contributions

Author

Faisal Ahmed Moshiur


Star this repository if you find it useful!
