Extracts email metadata and text from a PDF file
pdf2mbox
A command-line utility and Python package for converting emails stored in a PDF file to MBOX format.
Installation
pip install pdf2mbox
Usage
# from the command line
% python -m pdf2mbox --help
usage: pdf2mbox.py [-h] [--version] [--overwrite] [--csv [CSV]]
                   pdf_file [mbox_file]

Generates an mbox from a PDF containing emails

positional arguments:
  pdf_file         PDF file provided as input
  mbox_file        Mbox file generated as output

optional arguments:
  -h, --help       show this help message and exit
  --version, -v    show program's version number and exit
  --overwrite, -o  overwrite MBOX file if it exists
  --csv [CSV]      generate CSV file output

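# from within python
from pdf2mbox import pdf2mbox

pdf_file = "emails.pdf"    # input PDF path (placeholder value)
mbox_file = "emails.mbox"  # output MBOX path (placeholder value)
pe = pdf2mbox(pdf_file, mbox_file)  # pe contains dict of emails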
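To inspect an MBOX file such as the one written to mbox_file, Python's standard-library mailbox module can read it back. The following is a minimal sketch, assuming the output is an ordinary mbox file. The helper name summarize_mbox is hypothetical and is not part of pdf2mbox.

```python
import mailbox


def summarize_mbox(path):
    """Return (sender, subject) pairs for every message in an mbox file.

    Hypothetical helper for illustration; not part of the pdf2mbox package.
    """
    # create=False raises an error instead of silently creating an empty file
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

Calling summarize_mbox("emails.mbox") on a generated file gives a quick check that each email in the PDF was picked up with the expected sender and subject.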
OS Dependencies
If you encounter errors installing pdf2mbox, check that the OS-level libraries required by the pdftotext and python-magic packages are installed, since pdf2mbox depends on both. The pdftotext package builds against the Poppler C++ libraries, and python-magic needs libmagic.
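A quick way to see whether the two Python dependencies are importable at all is sketched below. It assumes the usual import names (pdftotext for the pdftotext package, magic for python-magic). The helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names=("pdftotext", "magic")):
    """Return the names from `names` that cannot be located for import.

    Hypothetical helper for illustration; it only checks that the Python
    module can be found, not that the underlying OS library loads.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from missing_modules() means both modules were found. A module that is found can still fail at import time if its OS-level library is absent, so treat this as a first check only.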
Notes
- Assumes an email ends when a new email begins
- Works best with a standard email header (i.e., From:, To:, Sent:, Subject:)
- The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.
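The first two notes can be illustrated with a small sketch of header-based splitting. This is not pdf2mbox's actual implementation. It assumes that every email starts with a line beginning with "From:" and that header fields appear one per line. The names split_emails and parse_headers are hypothetical.

```python
import re

# Assumption: a line that starts with "From:" marks the beginning of a new email.
HEADER_START = re.compile(r"^From:[ \t]", re.MULTILINE)
# The standard header fields named in the notes above.
FIELD = re.compile(r"^(From|To|Sent|Subject):[ \t]*(.*)$", re.MULTILINE)


def split_emails(text):
    """Split extracted text into chunks, each beginning at a 'From:' line."""
    starts = [m.start() for m in HEADER_START.finditer(text)]
    # Each email ends where the next one begins; the last runs to end of text.
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def parse_headers(chunk):
    """Collect the first occurrence of each standard header field in a chunk."""
    headers = {}
    for name, value in FIELD.findall(chunk):
        headers.setdefault(name, value.strip())
    return headers
```

Under these assumptions, a body line that happens to start with "From:" would be mistaken for a new email, which is one reason a consistent header layout gives the best results.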
File details
Details for the file pdf2mbox-0.3.4.tar.gz.
File metadata
- Download URL: pdf2mbox-0.3.4.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a1696912a8678cea336f72b10c2721d207a2b98d7c4f4ac6649c9ba57749f32 |
| MD5 | 6f7fc0d42c4e516ad053ced081c49718 |
| BLAKE2b-256 | 076b49b3ee4e5eee49d56879c6e7687ca9d1f650891aa74e4222ebed9678f9f4 |
File details
Details for the file pdf2mbox-0.3.4-py3-none-any.whl.
File metadata
- Download URL: pdf2mbox-0.3.4-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1e998ce2d3e5838531f1853c3718d205abfb33d3120d4fb68ded152ce16fdc8 |
| MD5 | 037839215dc93a16f0fdaa5cb24b04d0 |
| BLAKE2b-256 | b4a5294179e2265fd488c6872a33bdfc3564b26c30fe6012e42516b0d3ec55aa |