
MBOX to JSON Converter with attachment extraction and cross-referencing


MBOX to JSON

A command-line tool that converts MBOX files to JSON.
Explore the docs » (Currently NA)


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. License
  6. Contact

About The Project

A small package that converts MBOX files to JSON or CSV. It can also extract attachments and trace each one back to its source email.

✨ Key Features:

  • 📧 Convert MBOX to JSON or CSV format
  • 📎 Extract attachments with metadata tracking
  • 🔗 Cross-reference attachments to source emails
  • 📊 Split large outputs into manageable chunks
  • 🛡️ Robust error handling and logging
  • ⚡ Modern Python packaging with flexible dependencies
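
To make the core conversion concrete, here is a minimal sketch of turning an MBOX file into JSON-ready records using only Python's standard library (`mailbox`, `json`). It is not the package's implementation: the function name `mbox_to_records`, the `body` key, and the sample data are assumptions made for illustration, and the output shape of mbox-to-json itself may differ.

```python
import json
import mailbox
import tempfile
from pathlib import Path

# Hypothetical two-message mailbox used only for the demo below.
SAMPLE_MBOX = (
    "From alice@example.com Mon Jan  1 00:00:00 2024\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "First message body.\n"
    "\n"
    "From bob@example.com Mon Jan  1 00:05:00 2024\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: Re: Hello\n"
    "\n"
    "Second message body.\n"
)


def mbox_to_records(mbox_path):
    """Return one dict per message: its headers plus the first text/plain body."""
    records = []
    box = mailbox.mbox(str(mbox_path))
    try:
        for message in box:
            # Repeated headers (e.g. Received) collapse to the last value here.
            record = {name: str(value) for name, value in message.items()}
            body = ""
            for part in message.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    try:
                        body = payload.decode(charset, errors="replace")
                    except LookupError:  # unknown charset name in the email
                        body = payload.decode("utf-8", errors="replace")
                    break
            record["body"] = body
            records.append(record)
    finally:
        box.close()
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = Path(tmp) / "sample.mbox"
        sample_path.write_bytes(SAMPLE_MBOX.encode("utf-8"))
        print(json.dumps(mbox_to_records(sample_path), indent=2))
```

A flat dict per message keeps the records easy to dump as JSON or to write as CSV rows, at the cost of losing repeated headers.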


Getting Started

There are 2 ways to install this tool.

Prerequisites

Upgrade pip before installing.
All required dependencies are listed in requirements.txt and are installed automatically during setup.

pip install --upgrade pip

1. Install from PyPI

pip install mbox-to-json

2. Install from GitHub

  1. Download the repository as a ZIP archive and unzip it.

  2. Change into the repository folder with cd.

  3. Run this command:

    pip install .
    


Usage

  • Help Function

    mbox-to-json -h
    
  • Basic conversion from MBOX to JSON: provide only the input file path. The JSON output is written to the same directory as the input file.

    mbox-to-json /Users/prakhar/downloads/random_file.mbox
    
  • Use the -a flag to extract attachments. The extracted files are written to input_file_directory/attachments.

    mbox-to-json /Users/prakhar/downloads/random_file.mbox -a
    
  • Use -a together with --skip-attachment-metadata to extract attachments while leaving attachment metadata out of the JSON/CSV output

    mbox-to-json /Users/prakhar/downloads/random_file.mbox -a --skip-attachment-metadata
    
  • Use the -c flag to convert to CSV instead of JSON. The CSV output is written to the same directory as the input file.

    mbox-to-json /Users/prakhar/downloads/random_file.mbox -c
    
  • Use -s to split large output into multiple files

    mbox-to-json /Users/prakhar/downloads/random_file.mbox -s 3
    
  • Use --max-payload-size to set maximum email payload size in MB (default: 10MB)

    mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-payload-size 50
    
  • Use --max-body-part-size to set maximum body part size in MB (default: 1MB)

    mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-body-part-size 5
    
  • Use --max-recursion-depth to set maximum recursion depth for nested emails (default: 50)

    mbox-to-json /Users/prakhar/downloads/random_file.mbox --max-recursion-depth 100
    
  • Use --workers to set number of parallel workers (default: 1, automatically limited by CPU cores)

    mbox-to-json /Users/prakhar/downloads/random_file.mbox --workers 4
    
  • Use --enable-parallel to force parallel processing regardless of file size or message count

    mbox-to-json /Users/prakhar/downloads/random_file.mbox --workers 4 --enable-parallel
    
  • Use -o to specify the output file path. Include the file name and its extension (.json or .csv).

    mbox-to-json /Users/prakhar/downloads/random_file.mbox -o /Users/prakhar/downloads/random_output.json
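
When the conversion is part of a larger script, the flags listed above can be assembled programmatically. The sketch below builds the argument list in Python and only runs it if the mbox-to-json executable is found on PATH. The helper names `build_command` and `run_conversion` are hypothetical, and the use of `check=True` assumes the tool exits with a non-zero status on failure, which this README does not state.

```python
import shutil
import subprocess


def build_command(mbox_path, *, attachments=False, csv=False, output=None, workers=None):
    """Assemble an mbox-to-json argument list from flags documented above."""
    cmd = ["mbox-to-json", str(mbox_path)]
    if attachments:
        cmd.append("-a")  # extract attachments
    if csv:
        cmd.append("-c")  # CSV instead of JSON
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if output is not None:
        cmd += ["-o", str(output)]  # output path, including file name and extension
    return cmd


def run_conversion(mbox_path, **options):
    """Run the CLI; assumes a non-zero exit status signals failure."""
    if shutil.which("mbox-to-json") is None:
        raise RuntimeError("mbox-to-json is not on PATH; install it first")
    return subprocess.run(build_command(mbox_path, **options), check=True)


if __name__ == "__main__":
    print(build_command("inbox.mbox", attachments=True, csv=True, output="inbox.csv"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.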
    

For more examples, please refer to the Documentation

Output Files

When using the -a flag, mbox-to-json creates several output files for complete attachment tracking:

  • Main output file (JSON/CSV): Contains email data with attachment metadata (unless --skip-attachment-metadata is used)
  • *_attachments_manifest.json: Complete inventory of all attachments with source email references
  • attachments/ folder: Contains extracted attachment files
  • Individual .metadata.json files: Detailed metadata for each extracted attachment
  • extraction_map.json: Complete mapping of attachments to source emails
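
After a run with -a, a script may want to locate these files before reading them. The sketch below searches a directory for the file names listed above using `pathlib`. It assumes that all artifacts live somewhere under the input file's directory; the function name `find_attachment_artifacts` and the returned keys are illustrative, and no JSON schema is assumed for any of the files.

```python
import tempfile
from pathlib import Path


def find_attachment_artifacts(base_dir):
    """Locate attachment-tracking files under base_dir by their documented names."""
    base = Path(base_dir)
    attachments_dir = base / "attachments"
    return {
        "manifests": sorted(base.rglob("*_attachments_manifest.json")),
        "extraction_maps": sorted(base.rglob("extraction_map.json")),
        "metadata_files": sorted(base.rglob("*.metadata.json")),
        "attachments_dir": attachments_dir if attachments_dir.is_dir() else None,
    }


if __name__ == "__main__":
    # Build a hypothetical output layout just to exercise the search.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "attachments").mkdir()
        (base / "inbox_attachments_manifest.json").write_text("{}")
        (base / "attachments" / "report.pdf.metadata.json").write_text("{}")
        for kind, found in find_attachment_artifacts(base).items():
            print(kind, found)
```

Each located file can then be read with `json.load` and inspected before relying on any particular key.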

Memory Optimization for Large Files

For large MBOX files that may cause memory issues or recursion errors, you can adjust processing parameters:

# For very large files - increase payload limits and reduce batch size
mbox-to-json large_inbox.mbox --max-payload-size 50 --batch-size 500

# For systems with limited memory - reduce limits
mbox-to-json inbox.mbox --max-payload-size 5 --max-body-part-size 0.5 --batch-size 2000

# For deeply nested email threads - increase recursion depth
mbox-to-json complex_threads.mbox --max-recursion-depth 100

# Use parallel processing for faster performance (automatically uses available CPU cores)
mbox-to-json large_inbox.mbox --workers 4 --batch-size 500

# Force parallel processing for smaller files that don't meet automatic thresholds
mbox-to-json medium_inbox.mbox --workers 4 --enable-parallel

# Disable parallel processing entirely (use serial processing)
mbox-to-json any_inbox.mbox --workers 1

# Combine options for optimal performance with parallel processing
mbox-to-json inbox.mbox -a -c --workers 8 --enable-parallel --max-payload-size 20 --batch-size 250 -o output.csv
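
The --batch-size option is not described further in this README. Assuming it caps how many messages are held in memory at once, the general idea can be sketched as a generator that yields fixed-size batches from any iterable of messages. The name `batched` and the demo values are illustrative only and do not reflect the tool's internals.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for a stream of parsed messages.
    for batch in batched(range(5), 2):
        print(batch)
```

Under that assumption, a smaller batch lowers peak memory use while increasing the number of processing rounds.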

Performance Tips

  • Intelligent Parallel Processing: Automatically enabled for files ≥200MB with ≥1000 messages
  • Force Parallel Processing: Use --enable-parallel to override automatic decision for any file size
  • Worker Optimization: Set --workers to match your CPU core count for maximum performance
  • Memory Management: Adjust --batch-size based on available RAM (lower for limited memory)
  • Large Files: Increase --max-payload-size for files with large attachments
  • Processing Mode: Tool will log why parallel/serial processing was chosen
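
The selection rules in these tips can be written down as a small decision function. This is a sketch of the documented behaviour, not the tool's code: it assumes 1 MB means 1024 × 1024 bytes, that a single worker always means serial processing, and that the worker count is clamped to the CPU core count. The names `effective_workers` and `choose_mode` are hypothetical.

```python
import os

PARALLEL_MIN_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
PARALLEL_MIN_MESSAGES = 1000


def effective_workers(requested, cpu_count=None):
    """Clamp the requested worker count to the range 1..CPU cores."""
    cores = cpu_count or os.cpu_count() or 1
    return max(1, min(requested, cores))


def choose_mode(size_bytes, message_count, workers, force_parallel=False, cpu_count=None):
    """Return (mode, reason) following the thresholds listed above."""
    if effective_workers(workers, cpu_count) == 1:
        return "serial", "only one worker available"
    if force_parallel:
        return "parallel", "forced by --enable-parallel"
    if size_bytes >= PARALLEL_MIN_BYTES and message_count >= PARALLEL_MIN_MESSAGES:
        return "parallel", "file meets the automatic thresholds"
    return "serial", "file is below the automatic thresholds"


if __name__ == "__main__":
    print(choose_mode(300 * 1024 * 1024, 5000, workers=4))
    print(choose_mode(50 * 1024 * 1024, 5000, workers=4))
```

Returning the reason alongside the mode mirrors the tip that the tool logs why a mode was chosen.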


Roadmap

  • TBA


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

LinkedIn - Prakhar Sharma, Adrita Bhattacharya

Github - PS1607, adritabhattacharya

Google Developer - PS1607



