Webcam book scanner for macOS — capture, dewarp, binarize, OCR to searchable PDF/Markdown.
Project description
Turn a webcam and a stack of pages into clean, deskewed, dewarped,
searchable PDFs — locally, on your machine.
Explore the docs » · Website · Download · Report Bug · Request Feature
[!WARNING] Alpha software — it will crash. Aglaïa is under active development and still in testing. It is fairly well tested on macOS, but not yet on Linux or Windows — expect bugs and rough edges there. Keep your originals; don't rely on it for anything irreplaceable. Bug reports welcome.
About The Project
The goal of this project is to provide a simple, comprehensive, extensible tool for end-to-end book scanning.
Book scanning is usually done with custom rigs, e.g. the custom rigs on diybookscanner.org.
This is a fine way of doing things and has many advantages: you control the perspective, the geometry, the illumination, and so on. But it is not portable and requires a substantial investment of time and resources.
At the other end of the spectrum are ugly phone pictures and a myriad of "Free Scanner" apps on the iOS App Store, which all repackage the simple homography and detection primitives of the Apple Vision SDK: if you are not scanning flat sheets of paper, you are out of luck.
Aglaïa aims to provide a comprehensive solution to book scanning, i.e. to let you use physical books and the tools you have (a laptop and a phone, maybe a camera) wherever you are (at your desk, on a bench outside, at the library…) to produce high-quality digital materials suitable for:
- archiving, indexing and printing: clean, OCRed PDFs
- feeding knowledge bases and AI research tools: well-structured Markdown files
The slider section on the project website demonstrates this purpose.
Aglaïa's purpose is similar to Scantailor's, but it tries to reduce friction with better import (images, PDF or capture), more export formats (PDF with OCR, Markdown, ...), built-in OCR, extensibility through custom pipelines, exporters and plugins, and more modern algorithms.
What does the default pipeline do to a standard book?
In a few words, Aglaïa produces precisely cropped, binarized text pages with perspective and page-curvature correction. On a modern laptop it can process roughly 1 page per second.
To achieve this, it relies on:
- a coarse scan-based deskewing followed by robust, precise page extraction using lightweight ML text recognition models
- a finer page-based deskewing followed by a robust binarization that tolerates very uneven illumination and handles border constraints
- keystone and page-curvature correction. This is the most computationally demanding part and is offloaded to JAX/CUDA or JAX/MLX libraries if available
- a final replay that intelligently composes the coordinate and morphological transforms to avoid successive interpolation artifacts, which are especially severe on bilevel images
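The idea behind the final replay, composing transforms so that pixels are resampled only once, can be illustrated with plain 3×3 homogeneous matrices. This is a minimal sketch and not Aglaïa's code: the helper names and the sample matrices are made up for illustration, and only the coordinate mapping is shown (no pixel interpolation). Non-linear steps such as curvature correction would need a composed coordinate map rather than a single matrix.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices (a @ b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(h: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map a point through a homography, including the perspective divide."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def rotation(deg: float) -> Matrix:
    """Deskew step: rotation about the origin."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical sample transforms: a 2 degree deskew and a mild keystone correction.
deskew = rotation(2.0)
keystone = [[1.0, 0.05, 3.0], [0.0, 1.1, -2.0], [0.0, 0.0002, 1.0]]

# Compose once: keystone is applied after deskew, so it goes on the left.
combined = matmul(keystone, deskew)

point = (120.0, 340.0)
step_by_step = apply(keystone, *apply(deskew, *point))
single_pass = apply(combined, *point)
```

Both routes send the point to the same place, up to floating-point rounding. The difference is that an image warped with `combined` is interpolated once instead of twice, which matters most once the page has been reduced to two levels.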
[!NOTE] Cross-platform. Native GUI builds ship for macOS (signed/notarized DMG, Apple Silicon), Windows (installer) and Linux (AppImage), plus `pip install aglaia` on any platform. On macOS, Apple Vision powers page detection and on-device OCR; off macOS, Aglaïa falls back to EAST/DBnet for layout and to Surya / PaddleOCR-VL / Mistral for OCR. Voice control (Vosk) is offline and cross-platform.
How much does it cost?
It's free. Donations are appreciated to help cover development costs (signing and notarization, AI coding tools).
Built With
managed with uv
- PySide6 — cross-platform desktop GUI
- OpenCV · NumPy · SciPy · Pillow — image processing
- page-dewarp + JAX / MLX — cubic-sheet page dewarp (MLX on Apple Silicon). The original project has been highly modified and extended
- doxapy — binarization (Wolf / Sauvola)
- pikepdf · pypdfium2 — PDF I/O
- Apple Vision · Speech (pyobjc, macOS) — OCR, layout, with EAST/DBnet fallbacks
- Surya · PaddleOCR-VL · Mistral Document AI — cross-platform OCR engines
- Vosk — offline voice control (cross-platform)
- SQLite (FTS5) — project + full-text store
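For readers unfamiliar with the Sauvola method named in the list above, here is a textbook-style sketch of its local threshold, T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of a window around the pixel. It is written in pure Python for clarity and is not doxapy's API or Aglaïa's implementation; the window radius and the `k` and `r` defaults are commonly used values chosen here as assumptions.

```python
import math


def sauvola_threshold(window: list[int], k: float = 0.2, r: float = 128.0) -> float:
    """Sauvola threshold T = m * (1 + k * (s / r - 1)) for one local window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in window) / n)
    return mean * (1.0 + k * (std / r - 1.0))


def binarize(image: list[list[int]], radius: int = 1, k: float = 0.2) -> list[list[int]]:
    """Return a 0/255 image; each pixel is compared with its own local threshold."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(0 if image[y][x] <= sauvola_threshold(window, k) else 255)
        out.append(row)
    return out
```

Because the threshold follows the local mean, a dark stroke is detected relative to its surroundings, which is why this family of methods copes with uneven lighting better than a single global cutoff.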
Getting Started
Download
Grab the latest build for your platform — the release CI publishes fixed-name "latest" artifacts, so these links never go stale:
| Platform | Download | Notes |
|---|---|---|
| macOS (Apple Silicon) | Aglaia-macos-arm64.dmg | Signed + notarized. Open and drag to Applications. |
| Windows (x64) | Aglaia-windows-x64-setup.exe | Installer; registers the .agl file type. Not code-signed: SmartScreen will warn; click More info → Run anyway. Verify with SHA256SUMS-windows.txt. |
| Linux (x86_64) | Aglaia-x86_64.AppImage | CPU dewarp. chmod +x, then run. Needs FUSE (fuse2). |
| Linux GPU (x86_64, NVIDIA) | Aglaia-x86_64-cuda.AppImage | GPU-accelerated dewarp with a slim CUDA runtime bundled (no source/--extra cuda install needed). Needs an NVIDIA driver + FUSE (fuse2); falls back to CPU without a GPU. |
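The Windows row mentions verifying the download against SHA256SUMS-windows.txt. One way to do that check with Python's standard library is sketched below. It assumes the checksum file uses the common `<hex digest>  <file name>` layout produced by `sha256sum`, which this page does not confirm; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: digest} (assumed layout)."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary mode with a leading '*' on the file name.
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify(installer: Path, sums_file: Path) -> bool:
    """True if the installer's digest matches its entry in the sums file."""
    expected = parse_sums(sums_file.read_text()).get(installer.name)
    return expected is not None and expected == sha256_of(installer)
```

With both files in the current directory, `verify(Path("Aglaia-windows-x64-setup.exe"), Path("SHA256SUMS-windows.txt"))` returns `True` only when the digests match.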
Install from the command line
CLI-only or scripted setups (any platform) install via uv or pip:
# from PyPI — installs the `aglaia` command
pip install aglaia # lean base: headless pipeline, no Qt
pip install "aglaia[gui]" # Windows / Linux GUI (Qt)
pip install "aglaia[gui,macos]" # macOS GUI: Vision, Speech, MLX dewarp
# or build from source with the extras you want
git clone https://github.com/yb85/aglaia.git && cd aglaia
uv sync --extra gui --extra macos # macOS GUI
uv sync --extra gui # Windows / Linux GUI
uv sync # headless: CLI pipeline, no Qt
Optional extras: --extra surya / --extra paddle (OCR engines),
--extra voice (Vosk), --extra cloud (Mistral), --extra cuda (NVIDIA
GPU dewarp on Linux).
First run (CLI-only installs): run the one-time setup to pick and download the offline models, seed the default pipelines, and bootstrap the config:
aglaia --setup # interactive: choose models (DBnet, EAST, Surya…), download, configure
This is the terminal equivalent of the GUI's first-run wizard. A headless batch
run refuses to start until the install is configured (--setup or the GUI).
The GUI installs run the wizard automatically on first launch.
[!WARNING] Build with the right options. The `--extra` options are mandatory to interface models and backends with Python. If you download the models or install CUDA drivers on your computer but forget to include the relevant extra options, they won't be used.
Usage
# Capture GUI (webcam + processing chain + voice control)
uv run aglaia ~/scans/my-book # or just `aglaia …` once installed
# Headless CLI batch — same chain, no Qt
uv run aglaia ~/scans/my-book.agl --headless -p aglaia/config/pipelines/book_curved_x2.yaml
Key flags: --setup (first-run config), -c/--config, -p/--pipeline,
--workers, --export, --ocr, --input-dpi, --headless,
--camera-id. The import panel accepts multiple images and PDFs (per-page
extract or render). Page detection defaults to DBnet (auto resolves
DBnet → Apple Vision on macOS → EAST); aglaia --setup and the in-app
downloader fetch the models, or drop them into ./model/ / ./models/.
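The `auto` resolution order described above (DBnet, then Apple Vision on macOS, then EAST) amounts to a first-match fallback. The sketch below shows that pattern only; the function name, the detector identifiers and the idea of passing a set of installed models are assumptions for illustration, not Aglaïa's actual API.

```python
import sys

# Preference order taken from the text: DBnet, then Apple Vision (macOS only), then EAST.
PREFERENCE = ("dbnet", "apple_vision", "east")


def resolve_detector(installed: set[str], platform: str = sys.platform) -> str:
    """Return the first usable detector in preference order (hypothetical helper)."""
    for name in PREFERENCE:
        if name == "apple_vision":
            # Assumed to need no downloaded model, only the macOS platform.
            if platform == "darwin":
                return name
            continue
        if name in installed:
            return name
    raise RuntimeError("no page detector available; run the setup step to download models")
```

For example, `resolve_detector({"east"}, "linux")` picks EAST, while the same call with `"darwin"` picks Apple Vision because it ranks higher in the order.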
For the full guide, see the documentation.
How It Works
capture → DPI fix → deskew → layout detect → keystone → dewarp → binarize → OCR → export
Every step is a pluggable processor defined in a YAML pipeline. Add your
own by dropping aglaia/processors/<NewProc>.py (the registry auto-discovers
it) — or, at runtime, drop a .py into <APP_DATA>/plugins/ and approve
it in the trust prompt. See Architecture and Processors.
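The "pluggable processor" idea can be pictured as a name-to-function registry that a pipeline walks in order. The sketch below is a generic illustration of that pattern, not Aglaïa's registry: the decorator, the dict-based page object and the two toy processors are all hypothetical, and the list of step names stands in for what a YAML pipeline file would supply.

```python
from typing import Callable

Processor = Callable[[dict], dict]
REGISTRY: dict[str, Processor] = {}


def register(name: str) -> Callable[[Processor], Processor]:
    """Decorator that records a processor under a name usable from a pipeline."""
    def wrap(func: Processor) -> Processor:
        REGISTRY[name] = func
        return func
    return wrap


@register("deskew")
def deskew(page: dict) -> dict:
    # Toy step: pretend the page was rotated upright.
    return {**page, "angle": 0.0, "history": page["history"] + ["deskew"]}


@register("binarize")
def binarize(page: dict) -> dict:
    # Toy step: pretend the page was reduced to two levels.
    return {**page, "bilevel": True, "history": page["history"] + ["binarize"]}


def run_pipeline(steps: list[str], page: dict) -> dict:
    """Apply the named processors in order, failing loudly on unknown names."""
    for name in steps:
        if name not in REGISTRY:
            raise KeyError(f"unknown processor: {name}")
        page = REGISTRY[name](page)
    return page


result = run_pipeline(["deskew", "binarize"], {"history": []})
```

Keeping processors behind names is what lets a pipeline file reorder, drop or add steps without touching the code that runs them.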
Contributing
Work is tracked via GitHub issues + milestones — one issue per discrete
unit of work. Before non-trivial work, open an issue. Branch names
reference it (feat/123-slug); PRs close via Closes #N.
- Fork the project
- Create your branch (git checkout -b feat/123-amazing-feature)
- Make changes; keep ruff, mypy --strict, and pytest green
- Commit (git commit -m 'feat: add amazing feature')
- Push and open a Pull Request
License
Source-available under the PolyForm Shield License 1.0.0; see LICENSE.
[!WARNING] This is not, strictly speaking, "open source"!
The reason is not to turn it into a commercial product one day, but to avoid trivial SaaS repackaging, which hurts the development of free apps.
You may use, modify, and redistribute the software for any purpose except building a product that competes with it. Otherwise free.
Repackaging it and removing the donation link is direct competition.
Contact
Project: github.com/yb85/aglaia · Website: aglaia.bibli.cc
Acknowledgments
See ABOUT page.
File details
Details for the file aglaia-0.1.0a6.tar.gz.
File metadata
- Download URL: aglaia-0.1.0a6.tar.gz
- Upload date:
- Size: 15.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a970e828d2a5da360284025293fd8cd9a2fb1ace30f0b03aa27bcd3cb81634bc |
| MD5 | 609928c3c2583b6f53016671b365b38d |
| BLAKE2b-256 | 34c468b765d153b5e9d60b22480690a9c7f7c6ef4f8abe891a2fa035b8020ce5 |
File details
Details for the file aglaia-0.1.0a6-py3-none-any.whl.
File metadata
- Download URL: aglaia-0.1.0a6-py3-none-any.whl
- Upload date:
- Size: 15.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.24 {"installer":{"name":"uv","version":"0.11.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69f57a6c7025570301c549531adc7eea0a0c87aaa58dd87c20455b8b7c76dc06 |
| MD5 | 759743f26060ae3b9b556e7ea57e4045 |
| BLAKE2b-256 | 69c6fbb1fb8f860e56e98b28816b7fbef02c276854eb504b50cbf8f9868eae55 |