Email ingestion for AI agents and data applications.

MailAtlas

MailAtlas turns email files and manually synced IMAP folders into cleaned text, HTML, assets, metadata, and exportable artifacts for applications.

MailAtlas has two input paths:

  • ingest email files already on disk with ingest
  • connect to a live mailbox with sync and fetch selected folders manually

An mbox file is a mailbox file on disk. It is not the same thing as IMAP sync.
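
To make the distinction concrete, the sketch below reads an mbox file with Python's standard-library mailbox module. It only illustrates that an mbox is a plain file holding several messages; it is not MailAtlas code, and the helper name list_mbox_subjects and the demo file are made up for the example.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def list_mbox_subjects(path: Path) -> list[str]:
    """Return the Subject header of every message stored in an mbox file."""
    box = mailbox.mbox(str(path))
    try:
        return [str(message["subject"]) for message in box]
    finally:
        box.close()


# Build a tiny throwaway mbox so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    mbox_path = Path(tmp) / "demo.mbox"
    box = mailbox.mbox(str(mbox_path))
    for subject in ("First note", "Second note"):
        message = EmailMessage()
        message["From"] = "sender@example.com"
        message["To"] = "reader@example.com"
        message["Subject"] = subject
        message.set_content("Hello from a file on disk.")
        box.add(message)
    box.flush()
    box.close()

    # No network connection is involved: the messages come from the file.
    subjects = list_mbox_subjects(mbox_path)
```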

MailAtlas produces:

  • cleaned body text
  • normalized HTML snapshots when the message contains HTML
  • extracted inline images and attachments
  • document metadata and provenance
  • JSON, Markdown, HTML, and PDF exports from stored documents
  • manual, incremental IMAP sync into the same local store
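
As a rough illustration of what these outputs correspond to inside a message, the sketch below splits one email into body text, an HTML flag, inline images, and attachments using only Python's standard-library email package. It is an assumption-laden approximation, not MailAtlas's parser: the function name summarize_message and the returned dictionary keys are invented for this example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_message(raw: bytes) -> dict:
    """Split raw message bytes into the kinds of output listed above."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))

    inline_images, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        disposition = part.get_content_disposition()
        if disposition == "attachment":
            attachments.append(part.get_filename())
        elif disposition == "inline" and part.get_content_maintype() == "image":
            inline_images.append(part.get_filename())

    return {
        "subject": str(msg["subject"]),
        "text": text_part.get_content() if text_part else None,
        "has_html": html_part is not None,
        "inline_images": inline_images,
        "attachments": attachments,
    }


# Synthetic message: plain text, an HTML alternative, and one attachment.
demo = EmailMessage()
demo["Subject"] = "Quarterly chart"
demo["From"] = "sender@example.com"
demo["To"] = "reader@example.com"
demo.set_content("See the attached report.")
demo.add_alternative("<p>See the attached report.</p>", subtype="html")
demo.add_attachment(
    b"col_a,col_b\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

summary = summarize_message(demo.as_bytes())
```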

MailAtlas is a library and CLI for parsing, storing, and exporting email for AI agents, retrieval systems, analytics pipelines, and archival systems.

Why MailAtlas

  • Turn raw email into cleaned text, HTML, inline images, file attachments, and metadata.
  • Preserve provenance, forwarded chains, inline images, and regular attachments.
  • Apply configurable cleaning for boilerplate, wrappers, footer noise, and link-only lines.
  • Export JSON, Markdown, HTML, and PDF artifacts from stored documents.
  • Manually sync selected IMAP folders without storing mailbox credentials in the local store.
  • Start with the built-in filesystem and SQLite store, then copy the resulting files and metadata into your own storage stack if needed.

Project Status

MailAtlas is currently in alpha. The CLI, stored schema, and release tooling are still evolving and may change between releases. The repository is set up for public contribution, with synthetic fixtures, CI, release artifacts, and package smoke checks.

Install

pip

python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install mailatlas
mailatlas doctor

If you want the optional API extra from PyPI:

python -m pip install "mailatlas[api]"

uv

python3.12 -m pip install uv
uv tool install mailatlas
mailatlas doctor

brew

brew tap mailatlas/mailatlas
brew install mailatlas
mailatlas doctor

If Homebrew resolves a different formula named mailatlas, use:

brew install mailatlas/mailatlas/mailatlas

From source

Use a source checkout when you want to run the shipped fixtures, the demo API, or contribute to the project:

python3.12 -m venv .venv
source .venv/bin/activate
make bootstrap-python
mailatlas doctor

If you are changing the docs site too:

make bootstrap-docs

Run make help to see the full local command surface.

Verify The Install

mailatlas doctor

mailatlas doctor runs a temporary self-check that verifies ingest, storage, and JSON export. It also checks PDF export when Chrome or Chromium is available, and reports a warning instead of failing if the browser is missing.
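
The "warn instead of fail" behavior for the optional PDF check can be sketched as follows. This is not the actual doctor implementation: the executable names in CANDIDATE_BROWSERS are assumptions that vary by platform, and both function names are hypothetical.

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names; real installs differ by OS and package manager.
CANDIDATE_BROWSERS = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_browser(candidates: Sequence[str] = CANDIDATE_BROWSERS) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def pdf_check_status(browser_path: Optional[str]) -> str:
    """A missing browser downgrades the PDF check to a warning, not a failure."""
    return "ok" if browser_path else "warning"


status = pdf_check_status(find_browser())
```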

Local Store

By default, MailAtlas writes to .mailatlas in the current directory:

  • store.db
  • raw/
  • html/
  • assets/
  • exports/

Set MAILATLAS_HOME once if you want MailAtlas to reuse a different root automatically:

export MAILATLAS_HOME="$PWD/.mailatlas"

You can also override the root per command with --root.
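
One plausible reading of the precedence described above is: an explicit --root wins, then MAILATLAS_HOME, then .mailatlas in the current directory. The sketch below encodes that order. The function resolve_store_root is hypothetical, and the exact resolution logic inside MailAtlas may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_store_root(
    cli_root: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Path:
    """Pick the store root: --root value, then MAILATLAS_HOME, then ./.mailatlas."""
    if cli_root:
        return Path(cli_root)
    env_root = env.get("MAILATLAS_HOME")
    if env_root:
        return Path(env_root)
    return (cwd or Path.cwd()) / ".mailatlas"


# Assumed layout under the root, taken from the list above.
STORE_ENTRIES = ("store.db", "raw", "html", "assets", "exports")
```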

Core Use Cases

  • Build a retrieval corpus from mailbox exports.
  • Feed agents cleaned email text without losing links to raw messages and attachments.
  • Generate reviewable PDF artifacts from stored HTML or cleaned text fallback.
  • Normalize inbound email for analytics, retention, or archival processing.
  • Inspect and test parser behavior against known synthetic fixtures.

CLI Examples

Auto-detect and ingest an mbox archive:

mailatlas ingest data/fixtures/atlas-demo.mbox

Manual IMAP sync is incremental by folder and stores only non-secret cursor state:

export MAILATLAS_IMAP_HOST=imap.example.com
export MAILATLAS_IMAP_USERNAME=user@example.com
export MAILATLAS_IMAP_ACCESS_TOKEN=oauth-access-token

mailatlas sync \
  --folder INBOX \
  --folder Newsletters

MailAtlas consumes the access token you already have. It does not run a browser login flow or act as your OAuth client.
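
For context, this is roughly how an existing OAuth access token can be presented to an IMAP server with Python's standard-library imaplib, using the SASL XOAUTH2 mechanism that Gmail and Outlook document. Whether MailAtlas uses this exact mechanism internally is an assumption; the function names are illustrative only, and the environment variable names are the ones shown above.

```python
import imaplib
import os


def build_xoauth2_string(username: str, access_token: str) -> bytes:
    """Build the XOAUTH2 initial client response (imaplib base64-encodes it)."""
    return f"user={username}\x01auth=Bearer {access_token}\x01\x01".encode("utf-8")


def login_with_token(host: str, username: str, access_token: str) -> imaplib.IMAP4_SSL:
    """Open an IMAP-over-TLS connection and authenticate with a bearer token."""
    client = imaplib.IMAP4_SSL(host)
    client.authenticate(
        "XOAUTH2",
        lambda _challenge: build_xoauth2_string(username, access_token),
    )
    return client


def login_from_environment() -> imaplib.IMAP4_SSL:
    """Read connection settings from the variables exported above."""
    return login_with_token(
        os.environ["MAILATLAS_IMAP_HOST"],
        os.environ["MAILATLAS_IMAP_USERNAME"],
        os.environ["MAILATLAS_IMAP_ACCESS_TOKEN"],
    )
```

Obtaining and refreshing the token stays outside this code, which matches the division of responsibility described above.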

Parser cleanup is configurable:

mailatlas ingest data/fixtures/atlas-founder-forward.eml \
  --no-strip-forwarded-headers \
  --no-strip-boilerplate
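
To give a feel for what configurable cleanup means, here is a minimal sketch of two toggles: dropping link-only lines and stopping at a footer. It is not the MailAtlas parser. CleaningOptions, clean_body, the regular expression, and the footer markers are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CleaningOptions:
    """Hypothetical toggles, loosely mirroring the idea of parser options."""
    strip_link_only_lines: bool = True
    stop_at_footer: bool = True


# A line that contains nothing but one or more URLs.
LINK_ONLY = re.compile(r"^\s*(?:<?https?://\S+>?\s*)+$")
# Assumed footer markers; a real cleaner would need a richer set.
FOOTER_MARKERS = ("unsubscribe", "manage your preferences")


def clean_body(text: str, options: CleaningOptions = CleaningOptions()) -> str:
    """Return the body with footer and link-only noise removed per options."""
    kept = []
    for line in text.splitlines():
        lowered = line.strip().lower()
        if options.stop_at_footer and lowered.startswith(FOOTER_MARKERS):
            break  # everything after the footer marker is discarded
        if options.strip_link_only_lines and LINK_ONLY.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


sample = "Hi team,\nhttps://example.com/a\nNumbers are up.\nUnsubscribe here\nTracking pixel"
cleaned = clean_body(sample)
untouched = clean_body(sample, CleaningOptions(False, False))
```

Turning both toggles off returns the text unchanged, which is the same idea as passing the --no-strip-* flags on the command line.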

Python API Example

from mailatlas import ImapSyncConfig, MailAtlas, ParserConfig

# Point MailAtlas at a local store and choose parser cleanup options.
atlas = MailAtlas(
    db_path=".mailatlas/store.db",
    workspace_path=".mailatlas",
    parser_config=ParserConfig(strip_boilerplate=True, stop_at_footer=True),
)

# Parse a single .eml file.
parsed = atlas.parse_eml(
    "data/fixtures/atlas-founder-forward.eml",
)

# Ingest several .eml files; the returned refs identify the stored documents.
refs = atlas.ingest_eml(
    ["data/fixtures/atlas-market-map.eml", "data/fixtures/atlas-inline-chart.eml"],
)

# Manually sync selected IMAP folders.
sync_result = atlas.sync_imap(
    ImapSyncConfig(
        host="imap.example.com",
        username="user@example.com",
        password="app-password",
        folders=("INBOX", "Newsletters"),
    )
)

# Export the first ingested document as a PDF.
pdf_path = atlas.export_document(
    refs[0].id,
    format="pdf",
)

Default Storage Layout

MailAtlas writes ordinary files to the filesystem and indexes them in SQLite by default:

  • raw/ for original message bytes
  • html/ for normalized HTML bodies when present
  • assets/ for extracted inline and attached files
  • exports/ for JSON, HTML, and PDF file exports
  • store.db for the SQLite index and IMAP sync cursors
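
The IMAP sync cursors kept in store.db could look something like the sketch below: one row per folder holding the server's UIDVALIDITY and the highest UID already fetched, with no credentials. The table name, columns, and helper functions are hypothetical and are not the actual MailAtlas schema.

```python
import sqlite3

# Hypothetical cursor table: non-secret state only, one row per folder.
SCHEMA = """
CREATE TABLE IF NOT EXISTS imap_cursor (
    host TEXT NOT NULL,
    username TEXT NOT NULL,
    folder TEXT NOT NULL,
    uidvalidity INTEGER NOT NULL,
    last_uid INTEGER NOT NULL,
    PRIMARY KEY (host, username, folder)
)
"""


def next_uid_range(conn, host, username, folder, uidvalidity):
    """Return the IMAP UID range to fetch next for this folder."""
    row = conn.execute(
        "SELECT uidvalidity, last_uid FROM imap_cursor "
        "WHERE host = ? AND username = ? AND folder = ?",
        (host, username, folder),
    ).fetchone()
    # No cursor yet, or the server reset its UIDs: start from the beginning.
    if row is None or row[0] != uidvalidity:
        return "1:*"
    return f"{row[1] + 1}:*"


def save_cursor(conn, host, username, folder, uidvalidity, last_uid):
    """Record how far the folder has been synced."""
    conn.execute(
        "INSERT OR REPLACE INTO imap_cursor VALUES (?, ?, ?, ?, ?)",
        (host, username, folder, uidvalidity, last_uid),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
account = ("imap.example.com", "user@example.com", "INBOX")
first_range = next_uid_range(conn, *account, uidvalidity=7)
save_cursor(conn, *account, uidvalidity=7, last_uid=42)
resumed_range = next_uid_range(conn, *account, uidvalidity=7)
reset_range = next_uid_range(conn, *account, uidvalidity=8)
```

Comparing UIDVALIDITY before trusting the stored UID matters because an IMAP server may renumber a folder, at which point old UIDs no longer refer to the same messages.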

These are ordinary files and metadata rows. If you are embedding MailAtlas inside a service, you can move them into your own blob store and database.

PDF export uses headless Chrome or Chromium against the stored HTML snapshot when one exists, and falls back to generated HTML from cleaned text otherwise.

Markdown export prints to stdout by default with absolute local asset paths, or writes a document.md plus copied assets/ bundle when you pass --out <directory>.
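
The PDF export path described above (stored HTML snapshot first, generated HTML from cleaned text as the fallback, then headless Chrome or Chromium) can be sketched like this. The function names and the generated HTML wrapper are assumptions; --headless, --disable-gpu, and --print-to-pdf are real Chrome command-line switches, but the exact invocation MailAtlas uses is not documented here.

```python
import html
from pathlib import Path
from typing import Optional


def html_for_pdf(stored_html: Optional[str], cleaned_text: str) -> str:
    """Prefer the stored HTML snapshot; otherwise wrap the cleaned text."""
    if stored_html:
        return stored_html
    escaped = html.escape(cleaned_text)
    return f"<!doctype html><html><body><pre>{escaped}</pre></body></html>"


def chrome_pdf_command(browser: str, html_path: Path, pdf_path: Path) -> list[str]:
    """Build an argv list that asks headless Chrome to print a page to PDF."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        html_path.resolve().as_uri(),
    ]


fallback_page = html_for_pdf(None, "a < b")
command = chrome_pdf_command("chromium", Path("doc.html"), Path("out.pdf"))
```

The resulting list could be passed to subprocess.run once the HTML has been written to disk.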

MailAtlas vs Alternatives

Option           | What it does well                    | Where MailAtlas is stronger
Inbox connectors | Convenient ad hoc question answering | Repeatable ingestion, exported files, and traceable source records
Generic parsers  | Basic MIME parsing                   | Cleaned text, HTML snapshots, assets, metadata conventions
One-off scripts  | Fast for a narrow task               | Better repeatability, packaging, examples, docs, and release path

Development

make test
make docs
make smoke-release
make demo-cli
make demo-parser
make doctor

Download files

Source Distribution

mailatlas-0.2.0.tar.gz (39.1 kB)

Uploaded Source

Built Distribution

mailatlas-0.2.0-py3-none-any.whl (31.2 kB)

Uploaded Python 3

File details

Details for the file mailatlas-0.2.0.tar.gz.

File metadata

  • Download URL: mailatlas-0.2.0.tar.gz
  • Upload date:
  • Size: 39.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mailatlas-0.2.0.tar.gz
Algorithm Hash digest
SHA256 e32618ba52ab6be5d72e206e4baf2cb798f0183223d8ed4f1165c21d45b27103
MD5 5455f66b916ace7f2336421dd4cede80
BLAKE2b-256 d75c8be44b475b28131b90a3e73c40256c57906ca7f18302d5bd9461d13357ea
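
A downloaded file can be checked against the published SHA256 digest with a few lines of standard-library Python. This is a generic sketch: sha256_of and matches_published_hash are illustrative names, and the demo uses a throwaway file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo against a temporary file; for a real check, pass the downloaded
# archive and the SHA256 value listed above.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.bin"
    sample_path.write_bytes(b"mailatlas")
    expected = hashlib.sha256(b"mailatlas").hexdigest()
    demo_ok = matches_published_hash(sample_path, expected.upper())
    demo_bad = matches_published_hash(sample_path, "0" * 64)
```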

Provenance

The following attestation bundles were made for mailatlas-0.2.0.tar.gz:

Publisher: release.yml on mailatlas/mailatlas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mailatlas-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: mailatlas-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 31.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mailatlas-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 789397a1bb709fa0139e007390d7d5644a0e8bf0fd7968f44ae35e2adc845ae5
MD5 a2bfc74bbf4ccbd3ccc95bcf7a8194c6
BLAKE2b-256 156fd96f4501a822822de4af1e70cd1d5c68a118077bf0c26ffd6081a90edc2c

Provenance

The following attestation bundles were made for mailatlas-0.2.0-py3-none-any.whl:

Publisher: release.yml on mailatlas/mailatlas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
