
A comprehensive toolkit for BSON file manipulation

🛠️ BSON Tools

Python 3.6+ | License: MIT | Code style: black

A toolkit for analyzing, maintaining, and transforming BSON files, intended for MongoDB database maintenance, data migration, and debugging.

✨ Features

📊 Analysis

  • Generate comprehensive statistics about BSON contents
  • Field name frequencies and data type distribution
  • Document size analysis
  • Array field detection
  • Date range identification
  • Structural validation

🔄 Transformation

  • Convert BSON to JSON with proper type handling
  • Remove duplicate documents
  • Clean invalid UTF-8 and corrupt BSON data
  • Trim files to specific document counts
  • Remove specific documents

🔍 Validation

  • Full structural validation
  • UTF-8 encoding verification
  • Size field validation
  • Detailed error reporting
  • Integrity checking

🚀 Quick Start

Installation

pip install bson-tools

Basic Usage

# Analyze a BSON file
bson-tools analyze input.bson

# Convert to JSON
bson-tools export input.bson -o output.json

# Remove duplicates
bson-tools deduplicate input.bson -o clean.bson

# Validate file
bson-tools validate suspect.bson

📖 Detailed Usage

Analysis

Get detailed information about your BSON file:

bson-tools analyze large-collection.bson

Output example:

{
  "total_documents": 1000,
  "total_size_bytes": 2048576,
  "avg_doc_size_bytes": 2048.58,
  "field_names": {
    "_id": 1000,
    "name": 985,
    "data.nested": 750
  },
  "data_types": {
    "ObjectId": 1000,
    "string": 1985,
    "int": 750
  },
  "array_fields": ["tags", "categories"],
  "date_range": {
    "min": "2023-01-01T00:00:00",
    "max": "2024-12-31T23:59:59"
  }
}
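Statistics like these can be gathered in a single pass over the file. As a rough illustration of how such a pass can work, the sketch below walks the raw BSON framing using only the Python standard library: every document starts with a little-endian int32 length, followed by typed elements and a trailing NUL byte. It is not the package's analyzer. `analyze` and `_walk` are hypothetical names, only common BSON types are handled, arrays are recorded but not descended into, and embedded documents are reported with dotted paths such as `data.nested`.

```python
import struct
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, BinaryIO, Dict

# Fixed-size BSON types: type byte -> (label used in the report, payload size in bytes).
FIXED = {
    0x01: ("double", 8), 0x07: ("ObjectId", 12), 0x08: ("bool", 1),
    0x09: ("date", 8), 0x0A: ("null", 0), 0x10: ("int", 4),
    0x11: ("timestamp", 8), 0x12: ("long", 8), 0x13: ("decimal", 16),
}
EPOCH = datetime(1970, 1, 1)  # BSON dates are milliseconds since the Unix epoch (UTC)


def _walk(buf: bytes, pos: int, end: int, prefix: str, stats: Dict[str, Any]) -> None:
    """Tally fields and types in the element list buf[pos:end]; end is the trailing NUL."""
    while pos < end:
        etype = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        path = prefix + buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if etype in FIXED:
            label, size = FIXED[etype]
            if etype == 0x09:
                stats["dates"].append(struct.unpack_from("<q", buf, pos)[0])
            pos += size
        elif etype == 0x02:  # string: int32 length (including NUL), then bytes
            label = "string"
            pos += 4 + struct.unpack_from("<i", buf, pos)[0]
        elif etype == 0x05:  # binary: int32 length, subtype byte, then bytes
            label = "binData"
            pos += 5 + struct.unpack_from("<i", buf, pos)[0]
        elif etype in (0x03, 0x04):  # embedded document or array
            size = struct.unpack_from("<i", buf, pos)[0]
            if etype == 0x03:
                label = "object"
                _walk(buf, pos + 4, pos + size - 1, path + ".", stats)  # dotted paths
            else:
                label = "array"
                stats["arrays"].add(path)  # array elements are not descended into
            pos += size
        else:
            raise ValueError(f"unsupported BSON type 0x{etype:02x} in field {path!r}")
        stats["fields"][path] += 1
        stats["types"][label] += 1


def analyze(stream: BinaryIO) -> Dict[str, Any]:
    """Read every top-level document from stream and summarize its structure."""
    stats = {"fields": Counter(), "types": Counter(), "arrays": set(), "dates": []}
    count = total = 0
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break
        size = struct.unpack("<i", header)[0]
        if size < 5:
            raise ValueError(f"invalid size field {size} in document {count + 1}")
        doc = header + stream.read(size - 4)
        if len(doc) != size:
            raise ValueError(f"document {count + 1} is truncated")
        _walk(doc, 4, size - 1, "", stats)
        count += 1
        total += size
    report = {
        "total_documents": count,
        "total_size_bytes": total,
        "avg_doc_size_bytes": round(total / count, 2) if count else 0.0,
        "field_names": dict(stats["fields"]),
        "data_types": dict(stats["types"]),
        "array_fields": sorted(stats["arrays"]),
    }
    if stats["dates"]:
        def to_iso(ms: int) -> str:
            return (EPOCH + timedelta(milliseconds=ms)).isoformat()
        report["date_range"] = {"min": to_iso(min(stats["dates"])),
                                "max": to_iso(max(stats["dates"]))}
    return report
```

For example, calling `analyze` on a file opened with `open("large-collection.bson", "rb")` returns a dictionary with the same top-level keys as the output example, although the type labels are this sketch's own.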

Transformation

Export to JSON

bson-tools export input.bson -o output.json
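"Proper type handling" matters because JSON has no ObjectId, date, or binary types. The sketch below shows one way to map them, assuming the output is a single JSON array and borrowing the `$oid`, `$date`, and `$binary` wrappers from MongoDB Extended JSON (loosely; it does not claim full conformance). `export_json`, `_decode_doc`, and `_decode_value` are hypothetical names, the whole file is held in memory, and BSON types outside the handled set raise an error.

```python
import base64
import json
import struct
from datetime import datetime, timedelta
from typing import Any, Dict, Tuple

EPOCH = datetime(1970, 1, 1)


def _decode_doc(buf: bytes, pos: int) -> Tuple[Dict[str, Any], int]:
    """Decode the document starting at buf[pos]; return (fields, offset after it)."""
    size = struct.unpack_from("<i", buf, pos)[0]
    end, p, out = pos + size - 1, pos + 4, {}
    while p < end:
        etype = buf[p]
        name_end = buf.index(b"\x00", p + 1)
        name = buf[p + 1:name_end].decode("utf-8")
        value, p = _decode_value(etype, buf, name_end + 1)
        out[name] = value
    return out, pos + size


def _decode_value(etype: int, buf: bytes, p: int) -> Tuple[Any, int]:
    """Map one BSON value to a JSON-friendly Python value; return (value, next offset)."""
    if etype == 0x01:  # double
        return struct.unpack_from("<d", buf, p)[0], p + 8
    if etype == 0x02:  # UTF-8 string; the length prefix counts the trailing NUL
        n = struct.unpack_from("<i", buf, p)[0]
        return buf[p + 4:p + 3 + n].decode("utf-8"), p + 4 + n
    if etype == 0x03:  # embedded document
        return _decode_doc(buf, p)
    if etype == 0x04:  # array: stored as a document with keys "0", "1", ...
        doc, nxt = _decode_doc(buf, p)
        return list(doc.values()), nxt
    if etype == 0x05:  # binary: length, subtype byte, payload
        n = struct.unpack_from("<i", buf, p)[0]
        payload = base64.b64encode(buf[p + 5:p + 5 + n]).decode("ascii")
        return {"$binary": {"base64": payload, "subType": format(buf[p + 4], "02x")}}, p + 5 + n
    if etype == 0x07:  # ObjectId: 12 raw bytes, shown as hex
        return {"$oid": buf[p:p + 12].hex()}, p + 12
    if etype == 0x08:  # boolean
        return buf[p] != 0, p + 1
    if etype == 0x09:  # UTC datetime: int64 milliseconds since the epoch
        ms = struct.unpack_from("<q", buf, p)[0]
        iso = (EPOCH + timedelta(milliseconds=ms)).isoformat(timespec="milliseconds")
        return {"$date": iso + "Z"}, p + 8
    if etype == 0x0A:  # null
        return None, p
    if etype == 0x10:  # int32
        return struct.unpack_from("<i", buf, p)[0], p + 4
    if etype == 0x12:  # int64
        return struct.unpack_from("<q", buf, p)[0], p + 8
    raise ValueError(f"BSON type 0x{etype:02x} is not handled in this sketch")


def export_json(data: bytes) -> str:
    """Decode every top-level document in data and render them as one JSON array."""
    docs, pos = [], 0
    while pos < len(data):
        doc, pos = _decode_doc(data, pos)
        docs.append(doc)
    return json.dumps(docs, indent=2, ensure_ascii=False)
```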

Remove Duplicates

bson-tools deduplicate input.bson -o deduped.bson
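The README does not say how duplicates are detected. One plausible approach, sketched here under the assumption that a duplicate is a byte-identical document, is to remember a SHA-256 digest for each document written and skip any later document whose digest has already been seen. `deduplicate` is a hypothetical name.

```python
import hashlib
import struct
from typing import BinaryIO, Tuple


def deduplicate(src: BinaryIO, dst: BinaryIO) -> Tuple[int, int]:
    """Copy each distinct document once, keeping first occurrences in order.

    Returns (kept, dropped). Duplicates are assumed to be byte-identical documents.
    """
    seen = set()  # SHA-256 digests of documents already written
    kept = dropped = 0
    while True:
        header = src.read(4)
        if len(header) < 4:
            break  # end of input
        size = struct.unpack("<i", header)[0]
        if size < 5:
            raise ValueError(f"invalid size field {size} after {kept + dropped} documents")
        raw = header + src.read(size - 4)
        if len(raw) != size:
            raise ValueError("truncated document at end of input")
        digest = hashlib.sha256(raw).digest()
        if digest in seen:
            dropped += 1
        else:
            seen.add(digest)
            dst.write(raw)
            kept += 1
    return kept, dropped
```

Comparing bytes means `{a: 1, b: 2}` and `{b: 2, a: 1}` count as different documents, and memory grows by one 32-byte digest per distinct document.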

Trim File

bson-tools trim input.bson -o trimmed.bson -n 1000
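Because every BSON document begins with its own int32 length, keeping the first N documents only requires reading length prefixes and copying bytes; nothing needs to be decoded. A minimal sketch of that idea follows (`trim` is a hypothetical name, not the package's code). It stops early rather than write a partial document if the input ends mid-document.

```python
import struct
from typing import BinaryIO


def trim(src: BinaryIO, dst: BinaryIO, n: int) -> int:
    """Copy the first n documents from src to dst unchanged; return how many were copied."""
    copied = 0
    while copied < n:
        header = src.read(4)
        if len(header) < 4:
            break  # end of input
        size = struct.unpack("<i", header)[0]
        if size < 5:
            raise ValueError(f"invalid size field {size} in document {copied + 1}")
        body = src.read(size - 4)
        if len(body) != size - 4:
            break  # truncated final document: do not write a partial copy
        dst.write(header + body)
        copied += 1
    return copied
```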

Clean Invalid Documents

bson-tools clean corrupt.bson -o clean.bson
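What counts as "invalid" is not spelled out here. The sketch below assumes a document is kept only if its length prefix, trailing NUL, nested lengths, and UTF-8 strings all check out. `clean` and `_elements_ok` are hypothetical names; only common BSON types are recognized (anything else is treated as invalid), and processing stops at a corrupt length prefix because the start of the next document can no longer be located.

```python
import struct
from typing import BinaryIO, Tuple

# Payload sizes of fixed-width BSON types, keyed by type byte.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
               0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16}


def _elements_ok(buf: bytes, pos: int, end: int) -> bool:
    """Return True if buf[pos:end] is a well-formed element list with valid UTF-8."""
    try:
        while pos < end:
            etype = buf[pos]
            name_end = buf.index(b"\x00", pos + 1, end)  # ValueError if unterminated
            buf[pos + 1:name_end].decode("utf-8")
            pos = name_end + 1
            if etype in FIXED_SIZES:
                pos += FIXED_SIZES[etype]
                continue
            if etype not in (0x02, 0x03, 0x04, 0x05):
                return False  # unknown or unsupported type byte
            n = struct.unpack_from("<i", buf, pos)[0]
            if etype == 0x02:  # string: length includes the trailing NUL
                if n < 1 or pos + 4 + n > end or buf[pos + 3 + n] != 0:
                    return False
                buf[pos + 4:pos + 3 + n].decode("utf-8")
                pos += 4 + n
            elif etype == 0x05:  # binary: length, subtype byte, payload
                if n < 0:
                    return False
                pos += 5 + n
            else:  # embedded document or array: check recursively
                if n < 5 or pos + n > end or buf[pos + n - 1] != 0:
                    return False
                if not _elements_ok(buf, pos + 4, pos + n - 1):
                    return False
                pos += n
        return pos == end  # element lengths must add up to the document size
    except (ValueError, struct.error):  # UnicodeDecodeError is a ValueError subclass
        return False


def clean(src: BinaryIO, dst: BinaryIO) -> Tuple[int, int]:
    """Copy structurally valid documents to dst; return (kept, removed)."""
    kept = removed = 0
    while True:
        header = src.read(4)
        if len(header) < 4:
            break  # end of input (a 1-3 byte tail is silently dropped)
        size = struct.unpack("<i", header)[0]
        if size < 5:
            removed += 1
            break  # corrupt length prefix: the next document cannot be located
        raw = header + src.read(size - 4)
        if len(raw) != size:
            removed += 1
            break  # truncated final document
        if raw[-1] == 0 and _elements_ok(raw, 4, size - 1):
            dst.write(raw)
            kept += 1
        else:
            removed += 1
    return kept, removed
```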

Validation

Run comprehensive validation:

bson-tools validate suspect.bson

Output example:

{
  "valid_documents": 995,
  "invalid_documents": 5,
  "errors": [
    "Document 996: Invalid UTF-8 encoding",
    "Document 998: Truncated document"
  ],
  "warnings": [
    "File size mismatch detected"
  ],
  "integrity_check": false
}
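The messages in this example suggest checks on the size field, on truncation, and on UTF-8 strings. A rough, standard-library-only sketch of such a validator follows, returning a report with the same keys. `validate` and `_check_elements` are hypothetical names, document numbers are assumed to be 1-based, the whole file is read into memory, and the messages only approximate the ones shown.

```python
import struct
from typing import Any, Dict

# Payload sizes of fixed-width BSON types, keyed by type byte.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
               0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16}


def _decode_utf8(raw: bytes) -> None:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("Invalid UTF-8 encoding") from None


def _check_elements(buf: bytes, pos: int, end: int) -> None:
    """Raise ValueError describing the first structural problem in buf[pos:end]."""
    while pos < end:
        etype = buf[pos]
        name_end = buf.find(b"\x00", pos + 1, end)
        if name_end < 0:
            raise ValueError("Unterminated field name")
        _decode_utf8(buf[pos + 1:name_end])
        pos = name_end + 1
        if etype in FIXED_SIZES:
            pos += FIXED_SIZES[etype]
            continue
        if etype not in (0x02, 0x03, 0x04, 0x05):
            raise ValueError(f"Unknown type byte 0x{etype:02x}")
        if pos + 4 > end:
            raise ValueError("Truncated length field")
        n = struct.unpack_from("<i", buf, pos)[0]
        if etype == 0x02:  # string: length counts the trailing NUL
            if n < 1 or pos + 4 + n > end or buf[pos + 3 + n] != 0:
                raise ValueError("Bad string length")
            _decode_utf8(buf[pos + 4:pos + 3 + n])
            pos += 4 + n
        elif etype == 0x05:  # binary: length, subtype byte, payload
            if n < 0 or pos + 5 + n > end:
                raise ValueError("Bad binary length")
            pos += 5 + n
        else:  # embedded document or array: validate recursively
            if n < 5 or pos + n > end or buf[pos + n - 1] != 0:
                raise ValueError("Bad embedded document size")
            _check_elements(buf, pos + 4, pos + n - 1)
            pos += n
    if pos != end:
        raise ValueError("Size field does not match contents")


def validate(data: bytes) -> Dict[str, Any]:
    """Check every document in data and build a validation report."""
    report = {"valid_documents": 0, "invalid_documents": 0, "errors": [], "warnings": []}
    offset = 0
    while offset < len(data):
        index = report["valid_documents"] + report["invalid_documents"] + 1  # 1-based
        remaining = len(data) - offset
        if remaining < 5:
            report["warnings"].append(f"{remaining} trailing byte(s) after the last document")
            break
        size = struct.unpack_from("<i", data, offset)[0]
        if size < 5:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Invalid size field ({size})")
            break  # without a usable size, the next document cannot be located
        if size > remaining:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Truncated document")
            break
        try:
            if data[offset + size - 1] != 0:
                raise ValueError("Missing trailing NUL byte")
            _check_elements(data, offset + 4, offset + size - 1)
        except ValueError as exc:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: {exc}")
        else:
            report["valid_documents"] += 1
        offset += size
    report["integrity_check"] = not report["errors"] and not report["warnings"]
    return report
```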

🎯 Common Use Cases

Database Maintenance

  • Validate BSON dumps before restoration
  • Clean corrupted backup files
  • Remove duplicate documents
  • Analyze collection structure

Data Migration

  • Convert BSON to JSON for processing
  • Validate data integrity
  • Transform document structure
  • Clean invalid entries

Debugging

  • Analyze document structure
  • Identify data type mismatches
  • Locate corrupt documents
  • Verify file integrity

🔧 Advanced Usage

Compare Two Files

bson-tools compare original.bson --compare-with modified.bson
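How `compare` decides that two files differ is not documented. One simple approach, assumed here, treats each file as a multiset of byte-identical documents and counts what the files share and what is unique to each. `compare` and `_digests` are hypothetical names, and the sketch ignores document order.

```python
import hashlib
import struct
from collections import Counter
from typing import BinaryIO, Dict


def _digests(stream: BinaryIO) -> Counter:
    """Count SHA-256 digests of the raw documents in stream."""
    counts = Counter()
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break
        size = struct.unpack("<i", header)[0]
        if size < 5:
            raise ValueError(f"invalid size field {size}")
        raw = header + stream.read(size - 4)
        if len(raw) != size:
            raise ValueError("truncated document")
        counts[hashlib.sha256(raw).digest()] += 1
    return counts


def compare(original: BinaryIO, modified: BinaryIO) -> Dict[str, int]:
    """Summarize shared and unique documents between two BSON streams."""
    a, b = _digests(original), _digests(modified)
    return {
        "documents_in_original": sum(a.values()),
        "documents_in_modified": sum(b.values()),
        "common": sum((a & b).values()),  # multiset intersection keeps minimum counts
        "only_in_original": sum((a - b).values()),
        "only_in_modified": sum((b - a).values()),
    }
```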

Custom Output Format

bson-tools analyze input.bson --format yaml

Quiet Mode

bson-tools transform input.bson -o output.bson --quiet

📝 Command Reference

bson-tools <command> [options]

Commands:
  analyze     Generate statistics about BSON contents
  export      Convert BSON to JSON
  deduplicate Remove duplicate documents
  validate    Check file integrity
  clean       Remove invalid documents
  trim        Keep first N documents
  transform   Apply custom transformations
  compare     Compare two BSON files

Options:
  --output, -o     Output file path
  --number, -n     Number of documents to keep (trim)
  --compare-with   Second file for comparison
  --quiet, -q      Suppress progress messages
  --format         Output format (json|yaml)
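For anyone wiring up similar tooling, here is a hypothetical `argparse` skeleton that mirrors the command reference. It is not the package's `cli.py`. As a design choice, it attaches `--number` and `--compare-with` only to the subcommands that use them, whereas the reference lists them as general options.

```python
import argparse

# Command names and help strings taken from the command reference.
COMMANDS = {
    "analyze": "Generate statistics about BSON contents",
    "export": "Convert BSON to JSON",
    "deduplicate": "Remove duplicate documents",
    "validate": "Check file integrity",
    "clean": "Remove invalid documents",
    "trim": "Keep first N documents",
    "transform": "Apply custom transformations",
    "compare": "Compare two BSON files",
}


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every subcommand, inherited through `parents=`.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("input", help="Input BSON file")
    common.add_argument("--output", "-o", help="Output file path")
    common.add_argument("--quiet", "-q", action="store_true",
                        help="Suppress progress messages")
    common.add_argument("--format", choices=["json", "yaml"], default="json",
                        help="Output format")

    parser = argparse.ArgumentParser(prog="bson-tools")
    sub = parser.add_subparsers(dest="command")
    sub.required = True  # set as an attribute for compatibility with Python 3.6
    for name, help_text in COMMANDS.items():
        cmd = sub.add_parser(name, parents=[common], help=help_text)
        if name == "trim":
            cmd.add_argument("--number", "-n", type=int, required=True,
                             help="Number of documents to keep")
        elif name == "compare":
            cmd.add_argument("--compare-with", required=True,
                             help="Second file for comparison")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```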

⚠️ Best Practices

  1. Always Back Up

    • Keep original files until verification
    • Test on sample data first
    • Verify output integrity
  2. Performance

    • Use quiet mode for scripts
    • Monitor memory usage
    • Process large files in stages
  3. Validation

    • Validate files after operations
    • Check output file sizes
    • Verify document counts
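Checking document counts and file sizes after an operation is easy to script. A minimal sketch (hypothetical `count_documents` and `verify_file`) counts complete documents by following their length prefixes and reports any bytes left over, which points to truncation or trailing data.

```python
import os
import struct
from typing import Dict, Tuple


def count_documents(path: str) -> Tuple[int, int]:
    """Return (document count, bytes covered by complete documents) for a BSON file."""
    count = covered = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = struct.unpack("<i", header)[0]
            if size < 5 or len(f.read(size - 4)) != size - 4:
                break  # corrupt or truncated document: stop counting here
            count += 1
            covered += size
    return count, covered


def verify_file(path: str) -> Dict[str, int]:
    """Compare the bytes covered by complete documents with the file size."""
    count, covered = count_documents(path)
    size = os.path.getsize(path)
    return {"documents": count, "file_size_bytes": size, "trailing_bytes": size - covered}
```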

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional validation checks
  • Performance optimizations
  • New transformation features
  • Extended format support
  • Documentation improvements

📄 License

MIT License: feel free to use and modify as needed.

🛟 Support

  • Report issues on GitHub
  • Check documentation
  • Contact maintainers

🔍 Debugging Tips

If you encounter issues:

  1. Run validation first
  2. Check file permissions
  3. Verify input file integrity
  4. Monitor system resources
  5. Enable verbose logging

🏗️ Architecture

The toolkit is built with modularity in mind:

bson_tools/
├── processor/
│   ├── analyzer.py
│   ├── transformer.py
│   └── validator.py
├── utils/
│   ├── logging.py
│   └── progress.py
└── cli.py

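Each component is independent and handles one concern: processor/ holds the analysis, transformation, and validation logic; utils/ provides logging and progress reporting; and cli.py implements the command-line interface.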


Made with ❤️

Download files

Source Distribution

bson_tools-0.1.0.tar.gz (5.4 kB)

Built Distribution

bson_tools-0.1.0-py3-none-any.whl (5.8 kB)

File details

Details for the file bson_tools-0.1.0.tar.gz.

File metadata

  • Download URL: bson_tools-0.1.0.tar.gz
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for bson_tools-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d948a50ff9fb48a931386cf0b66ed903ff8edb52b58d5de4b70efef9ea2be5f1
MD5 db9096bfd918ac0f71afff3b0d8680f5
BLAKE2b-256 38456f682a5786bdcc7fc967f2c081783aa4e46cca99b64c8cff808aaf4ad7cd

File details

Details for the file bson_tools-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bson_tools-0.1.0-py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for bson_tools-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 69bd7e5f111afcbab4a07d2df1fdd61496a76a90b900cdaf2024eb69e7e23c01
MD5 da912022444bcd4418e1882a005e4187
BLAKE2b-256 9566dbbda490fa62cf6c09066f5f707966613d21f0cdc7f3f616267591513489
