A comprehensive toolkit for BSON file manipulation
🛠️ BSON Tools
A powerful toolkit for analyzing, maintaining, and transforming BSON files with ease. Perfect for MongoDB database maintenance, data migration, and debugging.
✨ Features
📊 Analysis
- Generate comprehensive statistics about BSON contents
- Field name frequencies and data type distribution
- Document size analysis
- Array field detection
- Date range identification
- Structural validation
🔄 Transformation
- Convert BSON to JSON with proper type handling
- Remove duplicate documents
- Clean invalid UTF-8 and corrupt BSON data
- Trim files to specific document counts
- Remove specific documents
🔍 Validation
- Full structural validation
- UTF-8 encoding verification
- Size field validation
- Detailed error reporting
- Integrity checking
🚀 Quick Start
Installation
pip install bson-tools
Basic Usage
# Analyze a BSON file
bson-tools analyze input.bson
# Convert to JSON
bson-tools export input.bson -o output.json
# Remove duplicates
bson-tools deduplicate input.bson -o clean.bson
# Validate file
bson-tools validate suspect.bson
📖 Detailed Usage
Analysis
Get detailed information about your BSON file:
bson-tools analyze large-collection.bson
Output example:
{
  "total_documents": 1000,
  "total_size_bytes": 2048576,
  "avg_doc_size_bytes": 2048.58,
  "field_names": {
    "_id": 1000,
    "name": 985,
    "data.nested": 750
  },
  "data_types": {
    "ObjectId": 1000,
    "string": 1985,
    "int": 750
  },
  "array_fields": ["tags", "categories"],
  "date_range": {
    "min": "2023-01-01T00:00:00",
    "max": "2024-12-31T23:59:59"
  }
}
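To make the size-related fields concrete, here is a minimal sketch of how `total_documents`, `total_size_bytes`, and `avg_doc_size_bytes` could be derived. It relies only on the BSON framing rule that every document starts with its own total size as a little-endian int32. The helper names `iter_raw_documents` and `size_stats` are hypothetical and are not part of bson-tools; the toolkit's actual implementation may differ.

```python
import struct
from typing import Iterator


def iter_raw_documents(data: bytes) -> Iterator[bytes]:
    """Yield each raw document from a buffer of concatenated BSON documents."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated length prefix at byte {offset}")
        # The first four bytes of a document hold its total size (little-endian int32).
        (size,) = struct.unpack_from("<i", data, offset)
        # The smallest legal document is 5 bytes: the size field plus a NUL terminator.
        if size < 5 or offset + size > len(data):
            raise ValueError(f"bad document size {size} at byte {offset}")
        yield data[offset:offset + size]
        offset += size


def size_stats(data: bytes) -> dict:
    """Compute document count, total size, and average size (hypothetical helper)."""
    sizes = [len(doc) for doc in iter_raw_documents(data)]
    total = sum(sizes)
    return {
        "total_documents": len(sizes),
        "total_size_bytes": total,
        "avg_doc_size_bytes": round(total / len(sizes), 2) if sizes else 0.0,
    }
```

Field frequencies and type counts would additionally require decoding each document's elements, which this sketch deliberately leaves out.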
Transformation
Export to JSON
bson-tools export input.bson -o output.json
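For readers curious what "proper type handling" involves, the following is a rough sketch of a BSON-to-JSON conversion using only the Python standard library. It covers a small subset of BSON types and renders ObjectId and datetime values in a style loosely modeled on MongoDB Extended JSON. The function names, the one-JSON-document-per-line output, and the chosen type mappings are all assumptions for illustration; they do not describe how bson-tools itself exports data.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_document(buf: bytes, start: int = 0) -> tuple:
    """Decode one BSON document at `start`; return (dict, offset after the document)."""
    (size,) = struct.unpack_from("<i", buf, start)
    if size < 5 or start + size > len(buf):
        raise ValueError(f"bad document size {size} at byte {start}")
    end = start + size
    pos = start + 4
    doc = {}
    while pos < end - 1:  # the final byte of a document is its NUL terminator
        kind = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)  # element names are NUL-terminated
        name = buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if kind == 0x01:  # 64-bit float
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif kind == 0x02:  # string: int32 byte length (including NUL) + bytes
            (length,) = struct.unpack_from("<i", buf, pos)
            value = buf[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif kind in (0x03, 0x04):  # embedded document or array
            value, pos = decode_document(buf, pos)
            if kind == 0x04:  # arrays are documents keyed "0", "1", ...
                value = list(value.values())
        elif kind == 0x07:  # ObjectId: 12 raw bytes
            value = {"$oid": buf[pos:pos + 12].hex()}
            pos += 12
        elif kind == 0x08:  # boolean
            value = buf[pos] == 1
            pos += 1
        elif kind == 0x09:  # UTC datetime: int64 milliseconds since the epoch
            (millis,) = struct.unpack_from("<q", buf, pos)
            value = {"$date": (EPOCH + timedelta(milliseconds=millis)).isoformat()}
            pos += 8
        elif kind == 0x0A:  # null
            value = None
        elif kind == 0x10:  # int32
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif kind == 0x12:  # int64
            (value,) = struct.unpack_from("<q", buf, pos)
            pos += 8
        else:
            raise ValueError(f"unsupported BSON type 0x{kind:02x} for field {name!r}")
        doc[name] = value
    return doc, end


def bson_to_json_lines(data: bytes) -> str:
    """Convert concatenated BSON documents to one JSON document per line."""
    lines = []
    offset = 0
    while offset < len(data):
        doc, offset = decode_document(data, offset)
        lines.append(json.dumps(doc))
    return "\n".join(lines)
```

Types outside this subset (binary data, regular expressions, Decimal128, and others) raise an error here rather than being silently dropped.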
Remove Duplicates
bson-tools deduplicate input.bson -o deduped.bson
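One plausible way to think about deduplication is to hash each raw document and keep only the first occurrence of each hash. The sketch below assumes that "duplicate" means byte-identical; whether bson-tools uses this definition, or compares by `_id` or by decoded content, is not stated here. The function names are hypothetical.

```python
import hashlib
import struct
from typing import Iterator


def iter_raw_documents(data: bytes) -> Iterator[bytes]:
    """Yield each raw document from a buffer of concatenated BSON documents."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated length prefix at byte {offset}")
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5 or offset + size > len(data):
            raise ValueError(f"bad document size {size} at byte {offset}")
        yield data[offset:offset + size]
        offset += size


def deduplicate(data: bytes) -> bytes:
    """Keep the first occurrence of every byte-identical document, preserving order."""
    seen = set()
    kept = []
    for doc in iter_raw_documents(data):
        digest = hashlib.sha256(doc).digest()  # fixed-size key instead of the full document
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return b"".join(kept)
```

Under this definition, two documents with the same fields in a different order count as distinct, because their byte encodings differ.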
Trim File
bson-tools trim input.bson -o trimmed.bson -n 1000
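Keeping the first N documents does not require decoding them, since each document announces its own length. As an illustration, here is a streaming sketch that copies documents one at a time between binary file objects, so memory use stays bounded by the largest single document. `trim_stream` is a hypothetical name, and this is not necessarily how the `trim` command is implemented.

```python
import struct
from typing import BinaryIO


def trim_stream(src: BinaryIO, dst: BinaryIO, limit: int) -> int:
    """Copy at most `limit` BSON documents from src to dst; return how many were copied."""
    copied = 0
    while copied < limit:
        prefix = src.read(4)
        if not prefix:
            break  # clean end of input: fewer than `limit` documents available
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        # The prefix is the total document size, including these four bytes.
        (size,) = struct.unpack("<i", prefix)
        if size < 5:
            raise ValueError(f"invalid document size {size}")
        body = src.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document")
        dst.write(prefix + body)
        copied += 1
    return copied
```

With real files, both would be opened in binary mode, for example `open("input.bson", "rb")` and `open("trimmed.bson", "wb")`.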
Clean Invalid Documents
bson-tools clean corrupt.bson -o clean.bson
Validation
Run comprehensive validation:
bson-tools validate suspect.bson
Output example:
{
  "valid_documents": 995,
  "invalid_documents": 5,
  "errors": [
    "Document 996: Invalid UTF-8 encoding",
    "Document 998: Truncated document"
  ],
  "warnings": [
    "File size mismatch detected"
  ],
  "integrity_check": false
}
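As a rough illustration of what size field validation can look like, the sketch below walks a buffer of concatenated BSON documents and checks three framing rules: the length prefix is present, the declared size fits in the remaining bytes, and the document ends with a NUL byte. The function name, the 1-based document numbering, and the report keys (borrowed from the sample output above) are assumptions; the real validator performs further checks, such as UTF-8 verification, that are not reproduced here.

```python
import struct


def validate_framing(data: bytes) -> dict:
    """Check BSON document framing only (hypothetical helper, not the bson-tools API)."""
    report = {"valid_documents": 0, "invalid_documents": 0, "errors": []}
    offset = 0
    index = 0  # 1-based document number used in error messages (an assumption)
    while offset < len(data):
        index += 1
        remaining = len(data) - offset
        if remaining < 4:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Truncated length prefix")
            break
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5:
            # Without a trustworthy size there is no way to find the next document.
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Invalid size field")
            break
        if size > remaining:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Truncated document")
            break
        if data[offset + size - 1] != 0:
            report["invalid_documents"] += 1
            report["errors"].append(f"Document {index}: Missing terminator")
        else:
            report["valid_documents"] += 1
        offset += size
    return report
```

Because framing errors make the next document boundary unknowable, this sketch stops at the first bad size instead of trying to resynchronize.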
🎯 Common Use Cases
Database Maintenance
- Validate BSON dumps before restoration
- Clean corrupted backup files
- Remove duplicate documents
- Analyze collection structure
Data Migration
- Convert BSON to JSON for processing
- Validate data integrity
- Transform document structure
- Clean invalid entries
Debugging
- Analyze document structure
- Identify data type mismatches
- Locate corrupt documents
- Verify file integrity
🔧 Advanced Usage
Compare Two Files
bson-tools compare original.bson --compare-with modified.bson
Custom Output Format
bson-tools analyze input.bson --format yaml
Quiet Mode
bson-tools transform input.bson -o output.bson --quiet
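When driving the CLI from a script, it can help to build the argument list in one place. The sketch below uses only the flags documented in this README (`-o` and `--quiet`). The helper names are hypothetical, and the assumption that `bson-tools` exits with a non-zero status on failure has not been verified.

```python
import subprocess
from typing import List, Optional


def build_command(command: str, input_path: str,
                  output: Optional[str] = None, quiet: bool = False) -> List[str]:
    """Assemble a bson-tools invocation as an argument list (no shell quoting needed)."""
    args = ["bson-tools", command, input_path]
    if output is not None:
        args += ["-o", output]
    if quiet:
        args.append("--quiet")
    return args


def run_step(command: str, input_path: str, output: Optional[str] = None) -> None:
    """Run one step quietly; raises CalledProcessError on a non-zero exit status.

    That bson-tools reports failure through its exit status is an assumption.
    """
    subprocess.run(build_command(command, input_path, output, quiet=True), check=True)
```

For example, `run_step("deduplicate", "input.bson", "clean.bson")` would run the same command as the deduplicate example in Basic Usage, with `--quiet` added.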
📝 Command Reference
bson-tools <command> [options]
Commands:
  analyze        Generate statistics about BSON contents
  export         Convert BSON to JSON
  deduplicate    Remove duplicate documents
  validate       Check file integrity
  clean          Remove invalid documents
  trim           Keep first N documents
  transform      Apply custom transformations
  compare        Compare two BSON files
Options:
  --output, -o      Output file path
  --number, -n      Number of documents to keep (for trim)
  --compare-with    Second file for comparison
  --quiet, -q       Suppress progress messages
  --format          Output format (json|yaml)
⚠️ Best Practices
- Always Backup
  - Keep original files until verification
  - Test on sample data first
  - Verify output integrity
- Performance
  - Use quiet mode for scripts
  - Monitor memory usage
  - Process large files in stages
- Validation
  - Validate files after operations
  - Check output file sizes
  - Verify document counts
🤝 Contributing
Contributions are welcome! Areas for improvement:
- Additional validation checks
- Performance optimizations
- New transformation features
- Extended format support
- Documentation improvements
📄 License
MIT License - Feel free to use and modify as needed.
🛟 Support
- Report issues on GitHub
- Check documentation
- Contact maintainers
🔍 Debugging Tips
If you encounter issues:
- Run validation first
- Check file permissions
- Verify input file integrity
- Monitor system resources
- Enable verbose logging
🏗️ Architecture
The toolkit is built with modularity in mind:
bson_tools/
├── processor/
│   ├── analyzer.py
│   ├── transformer.py
│   └── validator.py
├── utils/
│   ├── logging.py
│   └── progress.py
└── cli.py
Each component is independent and focused on specific tasks.
Made with ❤️
File details
Details for the file bson_tools-0.1.0.tar.gz.
File metadata
- Download URL: bson_tools-0.1.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d948a50ff9fb48a931386cf0b66ed903ff8edb52b58d5de4b70efef9ea2be5f1 |
| MD5 | db9096bfd918ac0f71afff3b0d8680f5 |
| BLAKE2b-256 | 38456f682a5786bdcc7fc967f2c081783aa4e46cca99b64c8cff808aaf4ad7cd |
File details
Details for the file bson_tools-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bson_tools-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69bd7e5f111afcbab4a07d2df1fdd61496a76a90b900cdaf2024eb69e7e23c01 |
| MD5 | da912022444bcd4418e1882a005e4187 |
| BLAKE2b-256 | 9566dbbda490fa62cf6c09066f5f707966613d21f0cdc7f3f616267591513489 |