Email ingestion for AI agents and data applications.
MailAtlas
MailAtlas turns email files and manually synced IMAP folders into cleaned text, HTML, assets, metadata, and exportable artifacts for applications.
MailAtlas has two input paths:
- ingest email files already on disk with ingest
- connect to a live mailbox with sync and fetch selected folders manually
An mbox file is a mailbox file on disk. It is not the same thing as IMAP sync.
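To make the distinction concrete, here is a minimal sketch that reads an mbox file from disk using only Python's standard-library mailbox module. It does not use MailAtlas and opens no network connection. The helper names list_mbox_subjects and write_demo_mbox are illustrative, and the demo file is synthetic.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def list_mbox_subjects(path: str) -> list[str]:
    """Return the Subject header of every message in an mbox file on disk."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg.get("Subject", "") for msg in box]
    finally:
        box.close()


def write_demo_mbox(path: str) -> None:
    """Create a tiny two-message mbox so the example is self-contained."""
    box = mailbox.mbox(path)
    try:
        for subject in ("First note", "Second note"):
            msg = EmailMessage()
            msg["From"] = "sender@example.com"
            msg["To"] = "reader@example.com"
            msg["Subject"] = subject
            msg.set_content("Synthetic body text.")
            box.add(msg)
        box.flush()
    finally:
        box.close()


demo_path = str(Path(tempfile.mkdtemp()) / "demo.mbox")
write_demo_mbox(demo_path)
subjects = list_mbox_subjects(demo_path)
print(subjects)
```

Everything here is plain file I/O. An IMAP sync, by contrast, needs a host, credentials, and a live connection.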
MailAtlas produces:
- cleaned body text
- normalized HTML snapshots when the message contains HTML
- extracted inline images and attachments
- document metadata and provenance
- JSON, Markdown, HTML, and PDF exports from stored documents
- manual, incremental IMAP sync into the same local store
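As a rough illustration of the raw material behind the outputs listed above, the sketch below splits a message into its text body, HTML body, and attachments using only the standard-library email package. It is not MailAtlas's parser. The function name summarize_parts and the synthetic message are assumptions made for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_parts(raw: bytes) -> dict:
    """Split a raw message into text body, HTML presence, and attachment names."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    return {
        "subject": str(msg["Subject"]),
        "text": text_part.get_content().strip() if text_part else None,
        "has_html": html_part is not None,
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a synthetic message with a text body, an HTML alternative, and one attachment.
demo = EmailMessage()
demo["From"] = "sender@example.com"
demo["To"] = "reader@example.com"
demo["Subject"] = "Quarterly chart"
demo.set_content("Plain text body.")
demo.add_alternative("<p>HTML body.</p>", subtype="html")
demo.add_attachment(
    b"fake-bytes",
    maintype="application",
    subtype="octet-stream",
    filename="chart.bin",
)

summary = summarize_parts(demo.as_bytes())
print(summary)
```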
MailAtlas is a library and CLI for parsing, storing, and exporting email for AI agents, retrieval systems, analytics pipelines, and archival systems.
Why MailAtlas
- Turn raw email into cleaned text, HTML, inline images, file attachments, and metadata.
- Preserve provenance, forwarded chains, inline images, and regular attachments.
- Apply configurable cleaning for boilerplate, wrappers, footer noise, and link-only lines.
- Export JSON, Markdown, HTML, and PDF artifacts from stored documents.
- Manually sync selected IMAP folders without storing mailbox credentials in the local store.
- Start with the built-in filesystem and SQLite store, then copy the resulting files and metadata into your own storage stack if needed.
Project Status
MailAtlas is currently alpha. Expect the CLI, stored schema, and release tooling to change as they improve. The repository is already set up for public contribution, with synthetic fixtures, CI, release artifacts, and package smoke checks.
Install
pip
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install mailatlas
mailatlas doctor
If you want the optional API extra from PyPI:
python -m pip install "mailatlas[api]"
uv
python3.12 -m pip install uv
uv tool install mailatlas
mailatlas doctor
brew
brew tap mailatlas/mailatlas
brew install mailatlas
mailatlas doctor
If Homebrew resolves a different formula named mailatlas, use:
brew install mailatlas/mailatlas/mailatlas
From source
Use a source checkout when you want to run the shipped fixtures, the demo API, or contribute to the project:
python3.12 -m venv .venv
source .venv/bin/activate
make bootstrap-python
mailatlas doctor
If you are changing the docs site too:
make bootstrap-docs
Run make help to see the full local command surface.
Verify The Install
mailatlas doctor
mailatlas doctor runs a temporary self-check that verifies ingest, storage, and JSON export. It
also checks PDF export when Chrome or Chromium is available, and reports a warning instead of
failing if the browser is missing.
Local Store
By default, MailAtlas writes to .mailatlas in the current directory:
- store.db
- raw/
- html/
- assets/
- exports/
Set MAILATLAS_HOME once if you want MailAtlas to reuse a different root automatically:
export MAILATLAS_HOME="$PWD/.mailatlas"
You can also override the root per command with --root.
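The lookup order can be sketched as follows, assuming --root takes precedence over MAILATLAS_HOME, which in turn takes precedence over .mailatlas in the current directory. The function resolve_root is hypothetical and not part of the MailAtlas API. It only illustrates the precedence described above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_root(
    cli_root: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Pick the store root: explicit --root, then MAILATLAS_HOME, then ./.mailatlas."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    if cli_root:
        # Assumed: a per-command override always wins.
        return Path(cli_root)
    if env.get("MAILATLAS_HOME"):
        return Path(env["MAILATLAS_HOME"])
    return cwd / ".mailatlas"


print(resolve_root(None, env={}, cwd=Path("/work")))
```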
Next Steps
- Use the Quickstart walkthrough for the file-based path with the shipped fixtures.
- Use the Manual IMAP sync guide when MailAtlas should connect to a live mailbox.
- Use the CLI overview for the full command surface.
Core Use Cases
- Build a retrieval corpus from mailbox exports.
- Feed agents cleaned email text without losing links to raw messages and attachments.
- Generate reviewable PDF artifacts from stored HTML, or from cleaned text as a fallback.
- Normalize inbound email for analytics, retention, or archival processing.
- Inspect and test parser behavior against known synthetic fixtures.
CLI Examples
Auto-detect and ingest an mbox archive:
mailatlas ingest data/fixtures/atlas-demo.mbox
Manual IMAP sync is incremental by folder and stores only non-secret cursor state:
export MAILATLAS_IMAP_HOST=imap.example.com
export MAILATLAS_IMAP_USERNAME=user@example.com
export MAILATLAS_IMAP_ACCESS_TOKEN=oauth-access-token
mailatlas sync \
--folder INBOX \
--folder Newsletters
MailAtlas consumes the access token you already have. It does not run a browser login flow or act as your OAuth client.
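One common way to make IMAP sync incremental per folder is to remember the folder's UIDVALIDITY and the highest UID already fetched, neither of which is a secret. The sketch below shows that technique in isolation. It is an assumption about how such a cursor could work, not a description of MailAtlas's actual cursor format. FolderCursor, next_search_criterion, and advance are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FolderCursor:
    """Hypothetical non-secret sync state for one folder."""

    folder: str
    uid_validity: int
    last_uid: int


def next_search_criterion(cursor: Optional[FolderCursor], server_uid_validity: int) -> str:
    """Return a UID range criterion for the next incremental fetch."""
    if cursor is None or cursor.uid_validity != server_uid_validity:
        # First sync, or the server reset its UIDs: start from the beginning.
        return "UID 1:*"
    return f"UID {cursor.last_uid + 1}:*"


def advance(
    previous: Optional[FolderCursor],
    folder: str,
    server_uid_validity: int,
    fetched_uids: list[int],
) -> FolderCursor:
    """Build the cursor to store after a fetch."""
    same_epoch = previous is not None and previous.uid_validity == server_uid_validity
    floor = previous.last_uid if same_epoch else 0
    # In IMAP, a range like "N:*" can still match the newest message even when
    # its UID is below N, so UIDs that were already seen are filtered out here.
    new_uids = [uid for uid in fetched_uids if uid > floor]
    return FolderCursor(folder, server_uid_validity, max(new_uids, default=floor))


cursor = advance(None, "INBOX", 7, [1, 2, 3])
print(next_search_criterion(cursor, 7))
```

The cursor holds only a folder name and two integers, so persisting it does not require storing a password or token.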
Parser cleanup is configurable:
mailatlas ingest data/fixtures/atlas-founder-forward.eml \
--no-strip-forwarded-headers \
--no-strip-boilerplate
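To give a sense of what one cleaning step might involve, here is a small sketch that drops link-only lines from body text. It is an illustration under stated assumptions, not MailAtlas's cleaning code. The regular expression and the function name strip_link_only_lines are made up for the example.

```python
import re

# A line counts as "link-only" if it holds a single http(s) URL and nothing else,
# optionally wrapped in angle brackets.
_LINK_ONLY = re.compile(r"^\s*<?https?://\S+>?\s*$")


def strip_link_only_lines(text: str) -> str:
    """Remove lines that contain nothing but a URL; keep URLs embedded in prose."""
    kept = [line for line in text.splitlines() if not _LINK_ONLY.match(line)]
    return "\n".join(kept)


sample = "\n".join(
    [
        "Read the report here:",
        "https://example.com/report",
        "See https://example.com inline.",
        "<https://example.com/unsub>",
    ]
)
cleaned = strip_link_only_lines(sample)
print(cleaned)
```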
Python API Example
from mailatlas import ImapSyncConfig, MailAtlas, ParserConfig

# Point MailAtlas at the default local store.
atlas = MailAtlas(
    db_path=".mailatlas/store.db",
    workspace_path=".mailatlas",
    parser_config=ParserConfig(strip_boilerplate=True, stop_at_footer=True),
)

# Parse a single .eml file.
parsed = atlas.parse_eml(
    "data/fixtures/atlas-founder-forward.eml",
)

# Ingest several .eml files.
refs = atlas.ingest_eml(
    ["data/fixtures/atlas-market-map.eml", "data/fixtures/atlas-inline-chart.eml"],
)

# Manually sync selected IMAP folders.
sync_result = atlas.sync_imap(
    ImapSyncConfig(
        host="imap.example.com",
        username="user@example.com",
        password="app-password",
        folders=("INBOX", "Newsletters"),
    )
)

# Export the first ingested document as a PDF.
pdf_path = atlas.export_document(
    refs[0].id,
    format="pdf",
)
Default Storage Layout
MailAtlas writes ordinary files to the filesystem and indexes them in SQLite by default:
- raw/ for original message bytes
- html/ for normalized HTML bodies when present
- assets/ for extracted inline and attached files
- exports/ for JSON, HTML, and PDF file exports
- store.db for the SQLite index and IMAP sync cursors
These are ordinary files and metadata rows. If you are embedding MailAtlas inside a service, you
can move them into your own blob store and database. PDF export uses headless Chrome or Chromium
against the stored HTML snapshot when one exists, and falls back to generated HTML from cleaned text otherwise.
Markdown export prints to stdout by default with absolute local asset paths, or writes a
document.md plus copied assets/ bundle when you pass --out <directory>.
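For readers curious what a headless-browser PDF export involves, the sketch below locates a Chrome or Chromium binary and builds a print-to-PDF command line. It is a generic illustration, not MailAtlas's implementation. The candidate binary names and the functions find_browser and build_pdf_command are assumptions. The command is only constructed here, not executed.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence

# Assumed binary names; actual names vary by platform and install method.
CANDIDATE_BROWSERS = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_browser(candidates: Sequence[str] = CANDIDATE_BROWSERS) -> Optional[str]:
    """Return the first browser binary found on PATH, or None if none is installed."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_pdf_command(browser: str, html_path: str, pdf_path: str) -> list[str]:
    """Build a headless print-to-PDF command for a local HTML file."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        Path(html_path).resolve().as_uri(),
    ]


browser = find_browser()
if browser is None:
    # Mirrors the documented behavior of warning instead of failing.
    print("warning: no Chrome or Chromium found; skipping PDF export")
else:
    print(build_pdf_command(browser, "snapshot.html", "snapshot.pdf"))
```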
MailAtlas vs Alternatives
| Option | What it does well | Where MailAtlas is stronger |
|---|---|---|
| Inbox connectors | Convenient ad hoc question answering | Repeatable ingestion, exported files, and traceable source records |
| Generic parsers | Basic MIME parsing | Cleaned text, HTML snapshots, assets, metadata conventions |
| One-off scripts | Fast for a narrow task | Better repeatability, packaging, examples, docs, and release path |
Docs And Examples
- Installation guide
- Quickstart walkthrough
- Manual IMAP sync
- CLI overview
- Workspace model
- Document schema
- Parser cleaning config
- Why not connectors?
- Support
- Security policy
- Changelog
- Releasing
- Code of Conduct
- Contributing
Development
make test
make docs
make smoke-release
make demo-cli
make demo-parser
make doctor