
Webcam book scanner for macOS — capture, dewarp, binarize, OCR to searchable PDF/Markdown.

Project description

Release CI Issues Python 3.12 Platforms Made in France License: PolyForm Shield


Aglaïa

Turn a webcam and a stack of pages into clean, deskewed, dewarped, searchable PDFs — locally, on your machine.
Explore the docs »

Website · Download · Report Bug · Request Feature

[!WARNING] Alpha software — it will crash. Aglaïa is under active development and still in testing. It is fairly well tested on macOS, but not yet on Linux or Windows — expect bugs and rough edges there. Keep your originals; don't rely on it for anything irreplaceable. Bug reports welcome.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. How It Works
  5. Roadmap
  6. Contributing
  7. License
  8. Contact
  9. Acknowledgments

About The Project

The goal of this project is to provide a simple, comprehensive, extensible tool for end-to-end book scanning.

Book scanning is usually done with custom rigs, such as those documented on diybookscanner.org.

This approach has many advantages: you control the perspective, the geometry, the illumination, and so on. But it is not portable, and it requires a substantial investment of time and resources.

At the other end of the spectrum are ugly phone pictures and a myriad of "Free Scanner" apps on the iOS App Store, which all repackage the simple homography and detection primitives of the Apple Vision SDK: if you are not scanning flat sheets of paper, you are out of luck.

Aglaïa aims to provide a comprehensive solution to book scanning: use physical books and the tools you already have (a laptop and a phone, maybe a camera) wherever you are (at your desk, on a bench outside, at the library…) to produce high-quality digital materials suitable for:

  • archiving, indexing and printing: clean, OCRed PDFs
  • feeding knowledge bases and AI research tools: well-structured Markdown files

The slider section on the project website illustrates this with three different scanning situations.

Aglaïa's purpose is similar to Scantailor's, but it tries to reduce friction with better import (images, PDF or capture) and export (PDF with OCR, Markdown, ...) options, built-in OCR, extensibility through custom pipelines, exporters and plugins, and more modern algorithms.

What does the default pipeline do to a standard book?

In a few words, Aglaïa produces precisely cropped, binarized text pages with perspective and page-curvature correction. On a modern laptop it processes roughly one page per second.

To achieve this, it relies on:

  • a coarse scan-based deskewing followed by a robust and precise page extraction using lightweight ML text recognition models
  • a finer page based deskewing followed by a robust binarization which can tolerate very unequal illumination and handle border constraints
  • keystone and page curvature correction. This is the most computationally demanding part, and it is handed off to JAX/CUDA or JAX/MLX when available
  • a final replay to intelligently compose the coordinate and morphological transforms to avoid successive interpolation artifacts, especially severe on bilevel images
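To illustrate why a locally adaptive threshold copes with uneven illumination, here is a minimal pure-Python sketch of Sauvola thresholding, one of the methods named under Built With. It is not Aglaïa's implementation (the project relies on doxapy for this step); the function names, the 3×3 window and the values of `k` and `R` are illustrative assumptions.

```python
from math import sqrt


def sauvola_binarize(img, window=3, k=0.2, R=128.0):
    """Binarize a grayscale image (list of rows, values 0-255) with Sauvola's rule.

    A pixel becomes ink (0) when it is darker than T = m * (1 + k * (s / R - 1)),
    where m and s are the mean and standard deviation of its neighbourhood.
    """
    h, w = len(img), len(img[0])
    r = window // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood values, clipped at the image border.
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            s = sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / R - 1))
            row.append(0 if img[y][x] < t else 255)
        out.append(row)
    return out


def make_page():
    """Synthetic 5x10 page: background fades left to right, two dark strokes."""
    page = [[220 - 14 * x for x in range(10)] for _ in range(5)]
    page[2][2] -= 80  # stroke on the bright side (value 112)
    page[2][7] -= 80  # stroke on the dark side (value 42)
    return page


page = make_page()
binary = sauvola_binarize(page)
ink = [(y, x) for y, row in enumerate(binary) for x, v in enumerate(row) if v == 0]
print("Sauvola ink pixels:", ink)

# A single global threshold high enough to catch the bright-side stroke
# also swallows the dim background on the right.
print("Global threshold (115) ink count:",
      sum(v < 115 for row in page for v in row))
```

On this synthetic page the local rule isolates the two strokes, while the global threshold also marks the two darkest background columns as ink.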

Operator composition (Smart replay) prevents re-interpolation artifacts
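The idea behind composing operators can be sketched for the simplest case, projective transforms: instead of resampling the image after every geometric step, multiply the 3×3 matrices and resample once. This is a hedged pure-Python illustration; the matrices and helper names are made up for the example, and Aglaïa's actual replay also covers the non-linear dewarp and morphological steps, which are not shown here.

```python
from math import cos, sin, radians


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(h, x, y):
    """Map a point through a 3x3 homography, including the perspective divide."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w


def rotation(deg):
    c, s = cos(radians(deg)), sin(radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def keystone(px, py):
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [px, py, 1.0]]


def scale(f):
    return [[f, 0.0, 0.0], [0.0, f, 0.0], [0.0, 0.0, 1.0]]


# Illustrative steps, in the order they are applied to the page.
steps = [rotation(1.5), keystone(1e-4, 2e-4), scale(2.0)]

# Compose so that composed = steps[-1] * ... * steps[0].
composed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for step in steps:
    composed = matmul3(step, composed)

point = (120.0, 340.0)
sequential = point
for step in steps:
    sequential = apply(step, *sequential)
direct = apply(composed, *point)

print("step by step:", sequential)
print("composed    :", direct)
```

Both routes map a point to the same coordinates, but on an image the composed route interpolates pixels only once. With OpenCV, the composed matrix would typically be passed to a single `cv2.warpPerspective` call.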

[!NOTE] Cross-platform. Native GUI builds ship for macOS (signed/notarized DMG, Apple Silicon), Windows (installer) and Linux (AppImage), plus pip install aglaia on any platform. On macOS, Apple Vision powers page detection and on-device OCR; off macOS, Aglaïa falls back to EAST/DBnet for layout and to Surya / PaddleOCR-VL / Mistral for OCR. Voice control (Vosk) is offline and cross-platform.


How much does it cost?

It's free. Donations are appreciated to help cover development costs (signing and notarization, AI coding tools).

Built With

  • Python managed with uv
  • PySide6 — cross-platform desktop GUI
  • OpenCV · NumPy · SciPy · Pillow — image processing
  • page-dewarp + JAX / MLX — cubic-sheet page dewarp (MLX on Apple Silicon). The original project has been highly modified and extended
  • doxapy — binarization (Wolf / Sauvola)
  • pikepdf · pypdfium2 — PDF I/O
  • Apple Vision · Speech (pyobjc, macOS) — OCR, layout, with EAST/DBnet fallbacks
  • Surya · PaddleOCR-VL · Mistral Document AI — cross-platform OCR engines
  • Vosk — offline voice control (cross-platform)
  • SQLite (FTS5) — project + full-text store
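As a rough illustration of what an FTS5-backed full-text store makes possible, the sketch below indexes a few lines of page text in an in-memory SQLite database and searches them. The table and column names are hypothetical (Aglaïa's actual schema is not described here), and the example assumes the SQLite build bundled with your Python has FTS5 enabled.

```python
import sqlite3

# Hypothetical OCR output: (page number, recognized text).
PAGES = [
    (1, "Keystone correction removes perspective distortion."),
    (2, "Page curvature is corrected by the dewarp step."),
    (3, "Binarization tolerates uneven illumination."),
]


def build_index(pages):
    """Create an in-memory FTS5 index from (page_no, body) pairs."""
    db = sqlite3.connect(":memory:")
    # page_no is stored but not tokenized; body is full-text indexed.
    db.execute("CREATE VIRTUAL TABLE pages USING fts5(page_no UNINDEXED, body)")
    db.executemany("INSERT INTO pages (page_no, body) VALUES (?, ?)", pages)
    return db


def search(db, query):
    """Return matching page numbers, best match first."""
    rows = db.execute(
        "SELECT page_no FROM pages WHERE pages MATCH ? ORDER BY rank", (query,)
    )
    return [int(row[0]) for row in rows]


db = build_index(PAGES)
print(search(db, "curvature"))
print(search(db, "dewarp OR keystone"))
```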


Getting Started

Download

Grab the latest build for your platform — the release CI publishes fixed-name "latest" artifacts, so these links never go stale:

| Platform | Download | Notes |
| --- | --- | --- |
| macOS (Apple Silicon) | Aglaia-macos-arm64.dmg | Signed + notarized. Open and drag to Applications. |
| Windows (x64) | Aglaia-windows-x64-setup.exe | Installer; registers the .agl file type. Not code-signed: SmartScreen will warn; click More info → Run anyway. Verify with SHA256SUMS-windows.txt. |
| Linux (x86_64) | Aglaia-x86_64.AppImage | CPU dewarp. chmod +x, then run. Needs FUSE (fuse2). |
| Linux GPU (x86_64, NVIDIA) | Aglaia-x86_64-cuda.AppImage | GPU-accelerated dewarp; slim CUDA runtime bundled (no source/--extra cuda install needed). Needs an NVIDIA driver + FUSE (fuse2); falls back to CPU without a GPU. |
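Since the Windows installer is not code-signed, checking it against SHA256SUMS-windows.txt is worth doing. The sketch below shows one way to do that in Python. It assumes the checksum file uses the common `sha256sum` layout (`<hex digest>  <file name>` per line), which is not confirmed here; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large installers do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse '<hex digest>  <file name>' lines into {file name: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary mode with a leading '*' on the name.
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify(installer, sums_file):
    """True only if the installer is listed and its digest matches."""
    expected = parse_sums(Path(sums_file).read_text()).get(Path(installer).name)
    return expected is not None and expected == sha256_of(installer)
```

With both files downloaded to the same folder, `verify("Aglaia-windows-x64-setup.exe", "SHA256SUMS-windows.txt")` should return `True`; anything else means the download should not be run.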

Install from the command line

CLI-only or scripted setups (any platform) install via uv or pip:

# from PyPI — installs the `aglaia` command
pip install aglaia                  # lean base: headless pipeline, no Qt
pip install "aglaia[gui]"           # Windows / Linux GUI (Qt)
pip install "aglaia[gui,macos]"     # macOS GUI: Vision, Speech, MLX dewarp
# or build from source with the extras you want
git clone https://github.com/yb85/aglaia.git && cd aglaia
uv sync --extra gui --extra macos   # macOS GUI
uv sync --extra gui                 # Windows / Linux GUI
uv sync                             # headless: CLI pipeline, no Qt

Optional extras: --extra surya / --extra paddle (OCR engines), --extra voice (Vosk), --extra cloud (Mistral), --extra cuda (NVIDIA GPU dewarp on Linux).

First run (CLI-only installs): run the one-time setup to pick and download the offline models, seed the default pipelines, and bootstrap the config:

aglaia --setup        # interactive: choose models (DBnet, EAST, Surya…), download, configure

This is the terminal equivalent of the GUI's first-run wizard. A headless batch run refuses to start until the install is configured (--setup or the GUI). The GUI installs run the wizard automatically on first launch.

[!WARNING] Build with the right options. The --extra options are required to connect models and backends to Python. If you download the models or install CUDA drivers on your computer but forget to include the relevant extras, they won't be used.


Usage

# Capture GUI (webcam + processing chain + voice control)
uv run aglaia ~/scans/my-book        # or just `aglaia …` once installed

# Headless CLI batch — same chain, no Qt
uv run aglaia ~/scans/my-book.agl --headless -p aglaia/config/pipelines/book_curved_x2.yaml
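For scripted setups, the headless invocation above can be wrapped in a small Python driver that processes every project in a folder. This is a sketch only: it uses just the flags shown above, and the folder layout (one `.agl` project per book under a common root) and the function names are assumptions.

```python
import subprocess
from pathlib import Path

# Pipeline path taken from the usage example; adjust to your checkout.
PIPELINE = "aglaia/config/pipelines/book_curved_x2.yaml"


def headless_command(project, pipeline=PIPELINE):
    """Build the argument list for one headless run."""
    return ["uv", "run", "aglaia", str(project), "--headless", "-p", pipeline]


def run_batch(root, dry_run=True):
    """Run (or, with dry_run, only print) one headless job per .agl project."""
    projects = sorted(Path(root).expanduser().glob("*.agl"))
    commands = [headless_command(p) for p in projects]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the batch at the first failing project.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_batch("~/scans")` prints the commands without running anything; pass `dry_run=False` once the output looks right.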

Key flags: --setup (first-run config), -c/--config, -p/--pipeline, --workers, --export, --ocr, --input-dpi, --headless, --camera-id. The import panel accepts multiple images and PDFs (per-page extract or render). Page detection defaults to DBnet (auto resolves DBnet → Apple Vision on macOS → EAST); aglaia --setup and the in-app downloader fetch the models, or drop them into ./model/ / ./models/.
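The `auto` resolution order described above (DBnet, then Apple Vision on macOS, then EAST) amounts to a simple fallback chain. The sketch below shows the logic only; the identifiers and the way availability is represented are hypothetical, not Aglaïa's internal API.

```python
import sys

# Preference order for "auto", as described in the text.
AUTO_ORDER = ("dbnet", "apple_vision", "east")


def resolve_detector(requested, available, platform=sys.platform):
    """Pick a page detector.

    requested: "auto" or an explicit detector name.
    available: set of detector names whose models/backends are installed.
    """
    if requested != "auto":
        if requested not in available:
            raise LookupError(f"detector {requested!r} is not installed")
        return requested
    for name in AUTO_ORDER:
        # Apple Vision only exists on macOS, whatever 'available' claims.
        if name == "apple_vision" and platform != "darwin":
            continue
        if name in available:
            return name
    raise LookupError("no page detector available; run `aglaia --setup`")


print(resolve_detector("auto", {"east", "apple_vision"}, platform="linux"))
print(resolve_detector("auto", {"east", "apple_vision"}, platform="darwin"))
```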

For the full guide, see the documentation.


How It Works

capture → DPI fix → deskew → layout detect → keystone → dewarp → binarize → OCR → export

Every step is a pluggable processor defined in a YAML pipeline. Add your own by dropping aglaia/processors/<NewProc>.py (the registry auto-discovers it) — or, at runtime, drop a .py into <APP_DATA>/plugins/ and approve it in the trust prompt. See Architecture and Processors.
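To give a feel for the "pluggable processor" idea, here is a toy registry and pipeline runner. Everything in it is hypothetical: the decorator, the processor signature and the step dictionaries only mimic the general shape of a YAML-defined chain and are not Aglaïa's actual interfaces (see Architecture and Processors for those).

```python
from typing import Callable, Dict, List

# A processor takes the current page state and its parameters
# and returns the new page state.
Processor = Callable[[dict, dict], dict]
REGISTRY: Dict[str, Processor] = {}


def processor(name: str):
    """Register a function under a name that pipelines can refer to."""
    def decorator(fn: Processor) -> Processor:
        REGISTRY[name] = fn
        return fn
    return decorator


@processor("deskew")
def deskew(page: dict, params: dict) -> dict:
    # Placeholder: a real processor would rotate the image here.
    return {**page, "history": page["history"] + ["deskew"]}


@processor("binarize")
def binarize(page: dict, params: dict) -> dict:
    method = params.get("method", "sauvola")
    return {**page, "history": page["history"] + [f"binarize:{method}"]}


def run_pipeline(page: dict, steps: List[dict]) -> dict:
    """Apply each step in order, looking its processor up by name."""
    for step in steps:
        fn = REGISTRY[step["processor"]]
        page = fn(page, step.get("params", {}))
    return page


# What a parsed pipeline file might look like once loaded into Python.
PIPELINE = [
    {"processor": "deskew"},
    {"processor": "binarize", "params": {"method": "sauvola"}},
]

result = run_pipeline({"name": "p001", "history": []}, PIPELINE)
print(result["history"])
```

Adding a step then means registering one more function and naming it in the pipeline, which is the property the auto-discovering registry provides.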


Contributing

Work is tracked via GitHub issues + milestones — one issue per discrete unit of work. Before non-trivial work, open an issue. Branch names reference it (feat/123-slug); PRs close via Closes #N.

  1. Fork the project
  2. Create your branch (git checkout -b feat/123-amazing-feature)
  3. Make changes; keep ruff, mypy --strict, and pytest green
  4. Commit (git commit -m 'feat: add amazing feature')
  5. Push and open a Pull Request


License

Source-available under the PolyForm Shield License 1.0.0 — see LICENSE.

[!WARNING] This is not, strictly speaking, "open source"!

The reason is not to turn it into a commercial product one day, but to prevent trivial SaaS repackaging, which hurts the development of free apps.

You may use, modify, and redistribute the software for any purpose except building a product that competes with it. Otherwise free.

Repackaging it and removing the donation link is direct competition.


Contact

aglaia@bibli.cc

Project: github.com/yb85/aglaia · Website: aglaia.bibli.cc


Acknowledgments

See ABOUT page.


