Extracts email metadata and text from a PDF file
Project description
pdf2mbox
a command-line utility and Python package for converting PDF emails to MBOX format
Installation
pip install pdf2mbox
Usage
# from the command line
% python -m pdf2mbox --help
usage: pdf2mbox.py [-h] [--version] [--overwrite] [--csv [CSV]]
                   pdf_file [mbox_file]

Generates an mbox from a PDF containing emails

positional arguments:
  pdf_file         PDF file provided as input
  mbox_file        Mbox file generated as output

optional arguments:
  -h, --help       show this help message and exit
  --version, -v    show program's version number and exit
  --overwrite, -o  overwrite MBOX file if it exists
  --csv [CSV]      generate CSV file output

# from within python
from pdf2mbox import pdf2mbox
pe = pdf2mbox(pdf_file, mbox_file) # pe contains dict of emails
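Once an mbox file has been written, it can be inspected with Python's standard `mailbox` module, independently of pdf2mbox. The following is a minimal sketch under that assumption: `summarize_mbox` is a hypothetical helper name (it is not part of pdf2mbox), and nothing is assumed here about the structure of the `pe` dict.

```python
import mailbox


def summarize_mbox(mbox_path):
    """Return (From, Subject) header pairs for each message in an mbox file.

    Hypothetical helper for illustration only; it is not part of pdf2mbox.
    """
    # create=False raises mailbox.NoSuchMailboxError instead of silently
    # creating an empty file when the path does not exist.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Missing headers fall back to an empty string.
        return [(msg.get("From", ""), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```

Calling `summarize_mbox(mbox_file)` after a conversion gives a quick check that the expected number of messages, senders, and subjects ended up in the output.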
Notes
- The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.
Download files
Source Distribution
pdf2mbox-0.3.2.tar.gz (3.7 kB)
File details
Details for the file pdf2mbox-0.3.2.tar.gz.
File metadata
- Download URL: pdf2mbox-0.3.2.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8d754891f41a7e891fe6cf12e74fbef049fb6d9b752865363b5f17b088ddfae |
| MD5 | 279edaad96e75042f77f97f1e9633b9c |
| BLAKE2b-256 | 7dc22ae4c49a98516452141d8500d651766a711ba21de84e67a059b4e4eb4878 |
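
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The sketch below uses only the standard `hashlib` module; the function names and the chunk size are illustrative choices, not something prescribed by pdf2mbox or the package index.

```python
import hashlib

# SHA256 digest published above for pdf2mbox-0.3.2.tar.gz.
EXPECTED_SHA256 = "f8d754891f41a7e891fe6cf12e74fbef049fb6d9b752865363b5f17b088ddfae"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_digest("pdf2mbox-0.3.2.tar.gz")` should return `True` for an intact download of that release.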