MBOX to JSON Converter with attachment extraction and cross-referencing
Prakhar Sharma
Adrita Bhattacharya
MBOX to JSON
A command-line tool to convert MBOX files to JSON.
About The Project
A small package that converts MBOX files to JSON or CSV. It can also extract attachments and trace each one back to its source email.
✨ Key Features:
- 📧 Convert MBOX to JSON or CSV format
- 📎 Extract attachments with metadata tracking
- 🔗 Cross-reference attachments to source emails
- 📊 Split large outputs into manageable chunks
- 🛡️ Robust error handling and logging
- ⚡ Modern Python packaging with flexible dependencies
Getting Started
There are 2 ways to install this tool.
Prerequisites
Upgrade pip before moving on:
pip install --upgrade pip
All required dependencies are listed in requirements.txt and are installed automatically during setup.
1. Install from PyPI
pip install mbox-to-json
2. Install from GitHub
- Download the repository as a zip file and unzip it.
- cd to the repository folder.
- Run this command:
  pip install .
Usage
- Show the help message:
  mbox-to-json -h
- Most basic conversion from MBOX to JSON. Just provide the file path. The output JSON file is written to the same location as the input file.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox
- Use the -a flag to extract attachments. The files are saved in input_file_directory/attachments.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox -a
- Use -a --skip-attachment-metadata to extract attachments but keep the JSON/CSV output clean (without attachment metadata).
  mbox-to-json /Users/prakhar/downloads/random_file.mbox -a --skip-attachment-metadata
- Use the -c flag to convert to CSV instead of JSON. The output CSV file is written to the same location as the input file.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox -c
- Use -s to split large output into multiple files.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox -s 3
- Use --max-payload-size to set the maximum email payload size in MB (default: 10MB).
  mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-payload-size 50
- Use --max-body-part-size to set the maximum body part size in MB (default: 1MB).
  mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-body-part-size 5
- Use --max-recursion-depth to set the maximum recursion depth for nested emails (default: 50).
  mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-recursion-depth 100
- Use --workers to set the number of parallel workers (default: 1, automatically limited by CPU cores).
  mbox-to-json /Users/prakhar/downloads/random_file.mbox --workers 4
- Use --enable-parallel to force parallel processing regardless of file size or message count.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox --workers 4 --enable-parallel
- Use -o to specify the output file location. Include the file name with a .json (or .csv) extension.
  mbox-to-json /Users/prakhar/downloads/random_file.mbox -o /Users/prakhar/downloads/random_output.json
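The same commands can be driven from a Python script. The following is a minimal sketch, assuming mbox-to-json is installed and on the PATH. The helper names build_command and run_conversion are hypothetical and not part of the package; only flags listed above are used.

```python
import subprocess


def build_command(input_path, output_path=None, attachments=False,
                  skip_attachment_metadata=False, csv=False, workers=None):
    """Assemble an argument list from the flags documented above.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["mbox-to-json", str(input_path)]
    if attachments:
        cmd.append("-a")
        # The usage list above pairs this flag with -a, so only add it then.
        if skip_attachment_metadata:
            cmd.append("--skip-attachment-metadata")
    if csv:
        cmd.append("-c")
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    return cmd


def run_conversion(input_path, **options):
    """Run the CLI; subprocess raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(input_path, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without the tool.
    print(build_command("inbox.mbox", attachments=True, csv=True,
                        output_path="out.csv"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.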
For more examples, please refer to the Documentation
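To illustrate what splitting output with -s might look like, here is a small sketch of dividing a list of records into near-equal chunks. It assumes the number passed to -s is the number of output files, which this README does not spell out. The function split_into_chunks is hypothetical and is not taken from the package.

```python
def split_into_chunks(records, n_chunks):
    """Split records into n_chunks lists whose sizes differ by at most one.

    Assumption: the split count means "number of output files".
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(records), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks(list(range(10)), 3):
        print(chunk)
```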
Output Files
When using the -a flag, mbox-to-json creates several output files for complete attachment tracking:
- Main output file (JSON/CSV): Contains email data with attachment metadata (unless --skip-attachment-metadata is used)
- *_attachments_manifest.json: Complete inventory of all attachments with source email references
- attachments/ folder: Contains extracted attachment files
- Individual .metadata.json files: Detailed metadata for each extracted attachment
- extraction_map.json: Complete mapping of attachments to source emails
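After a run with -a, a short script can collect these files for inspection. This is only a sketch: it assumes all of the files land in or below the input file's directory, and it searches recursively because the exact locations of the .metadata.json and extraction_map.json files are not stated here. The function find_attachment_artifacts is a hypothetical helper.

```python
from pathlib import Path


def find_attachment_artifacts(output_dir):
    """Collect the attachment-related files listed above, if present.

    Assumption: everything is written in or below output_dir.
    """
    output_dir = Path(output_dir)
    attachments_dir = output_dir / "attachments"
    return {
        "manifests": sorted(output_dir.glob("*_attachments_manifest.json")),
        "attachments_dir": attachments_dir if attachments_dir.is_dir() else None,
        # Recursive search, since the exact folder is not documented here.
        "metadata_files": sorted(output_dir.rglob("*.metadata.json")),
        "extraction_maps": sorted(output_dir.rglob("extraction_map.json")),
    }


if __name__ == "__main__":
    for kind, found in find_attachment_artifacts(".").items():
        print(kind, found)
```

The JSON structure inside each file is not described in this README, so the sketch stops at locating the files.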
Memory Optimization for Large Files
For large MBOX files that may cause memory issues or recursion errors, you can adjust processing parameters:
# For very large files - increase payload limits and reduce batch size
mbox-to-json large_inbox.mbox --max-payload-size 50 --batch-size 500
# For systems with limited memory - reduce limits
mbox-to-json inbox.mbox --max-payload-size 5 --max-body-part-size 0.5 --batch-size 2000
# For deeply nested email threads - increase recursion depth
mbox-to-json complex_threads.mbox --max-recursion-depth 100
# Use parallel processing for faster performance (automatically uses available CPU cores)
mbox-to-json large_inbox.mbox --workers 4 --batch-size 500
# Force parallel processing for smaller files that don't meet automatic thresholds
mbox-to-json medium_inbox.mbox --workers 4 --enable-parallel
# Disable parallel processing entirely (use serial processing)
mbox-to-json any_inbox.mbox --workers 1
# Combine options for optimal performance with parallel processing
mbox-to-json inbox.mbox -a -c --workers 8 --enable-parallel --max-payload-size 20 --batch-size 250 -o output.csv
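To give a feel for what a per-part size limit such as --max-body-part-size means, the sketch below walks a message with Python's standard email library and sets aside any part whose decoded payload exceeds a limit. It is an illustration only, not the tool's implementation: how mbox-to-json treats oversized content is not documented here, and collect_body_parts is a hypothetical name. One MB is assumed to be 1024 * 1024 bytes.

```python
from email.message import EmailMessage

MB = 1024 * 1024  # assumption: the tool's "MB" may be defined differently


def collect_body_parts(message, max_body_part_mb=1.0):
    """Return (kept, skipped) summaries of the non-multipart parts.

    A part is skipped when its decoded payload is larger than the limit.
    """
    limit = int(max_body_part_mb * MB)
    kept, skipped = [], []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        entry = {"content_type": part.get_content_type(), "size": len(payload)}
        if len(payload) <= limit:
            kept.append(entry)
        else:
            skipped.append(entry)
    return kept, skipped


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("hello")
    msg.add_attachment(b"x" * (2 * MB), maintype="application",
                       subtype="octet-stream", filename="big.bin")
    kept, skipped = collect_body_parts(msg, max_body_part_mb=1.0)
    print("kept:", kept)
    print("skipped:", skipped)
```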
Performance Tips
- Intelligent Parallel Processing: Automatically enabled for files ≥200MB with ≥1000 messages
- Force Parallel Processing: Use --enable-parallel to override the automatic decision for any file size
- Worker Optimization: Set --workers to match your CPU core count for maximum performance
- Memory Management: Adjust --batch-size based on available RAM (lower for limited memory)
- Large Files: Increase --max-payload-size for files with large attachments
- Processing Mode: The tool logs why parallel or serial processing was chosen
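The decision rules above can be summarized in a few lines of Python. This is a sketch of the documented behavior, not the tool's actual code: the names choose_mode and effective_workers are hypothetical, the reason strings are illustrative rather than real log output, and 1 MB is assumed to be 1024 * 1024 bytes.

```python
import os

MB = 1024 * 1024            # assumption about how the tool counts megabytes
MIN_SIZE_BYTES = 200 * MB   # threshold stated above: files of at least 200MB
MIN_MESSAGES = 1000         # threshold stated above: at least 1000 messages


def effective_workers(requested, cpu_count=None):
    """Cap the requested worker count at the number of CPU cores."""
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(requested, cpus))


def choose_mode(file_size_bytes, message_count, workers=1,
                enable_parallel=False, cpu_count=None):
    """Return (mode, reason) following the rules described above."""
    n = effective_workers(workers, cpu_count)
    if n == 1:
        return "serial", "only one worker available"
    if enable_parallel:
        return "parallel", "forced by --enable-parallel"
    if file_size_bytes >= MIN_SIZE_BYTES and message_count >= MIN_MESSAGES:
        return "parallel", "file meets size and message thresholds"
    return "serial", "file below automatic thresholds"


if __name__ == "__main__":
    print(choose_mode(300 * MB, 5000, workers=4))
    print(choose_mode(50 * MB, 5000, workers=4, enable_parallel=True))
```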
Roadmap
- TBA
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
LinkedIn - Prakhar Sharma, Adrita Bhattacharya
GitHub - PS1607, adritabhattacharya
Google Developer - PS1607
File details
Details for the file mbox_to_json-2.0.0.tar.gz.
File metadata
- Download URL: mbox_to_json-2.0.0.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 947fc435c538cc03c4463695d6f0046ad1b379223b89f12b0aec96eca8064254 |
| MD5 | 85f93e0b55511a9b4833f17321a239ed |
| BLAKE2b-256 | 817b84cbf07a5064e4f7d7938f7dc7ae8551018308d84745860500c135df0e77 |
File details
Details for the file mbox_to_json-2.0.0-py3-none-any.whl.
File metadata
- Download URL: mbox_to_json-2.0.0-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 696c8c30fe91b1caa46c55e999584d607001bd18751acb39e0cdc058bdf2a06b |
| MD5 | 9f6d875602f80acbbbfe56a3c8678096 |
| BLAKE2b-256 | 7f77cbcf6baa03e9e9299a5ca368aaf40abed5d5b493655cf7e466c874cb7931 |