EverAlgo parser: multimodal raw-file parsing (image / audio / document / video / url) into ParsedContent.
everalgo-parser
Multimodal parsing of raw inputs (image / audio / document / url; video is deferred) into ParsedContent. Used by everalgo-knowledge for file ingestion and by evermem step 1 for inline parsing.
See the umbrella project: EverAlgo monorepo and the architecture document at docs/concepts/architecture.md.
Quick start
import everalgo
from everalgo.llm.config import LLMConfig
from everalgo.llm.providers.openai_compat import OpenAICompatClient
from everalgo.parser import aparse, RawFile
# Configure an LLM once (process-wide). The parser uses OpenAI-compatible
# clients; OpenRouter is the reference deployment (Gemini multimodal via
# OpenRouter passthrough).
everalgo.configure(OpenAICompatClient(LLMConfig(
    model="google/gemini-3-flash-preview",
    api_key="sk-or-v1-...",
    base_url="https://openrouter.ai/api/v1",
)))
# Bytes-in: caller already hydrated the file. The awaits below must run
# inside an async function (or an asyncio-enabled REPL).
parsed = await aparse(RawFile(content=pdf_bytes, extension="pdf"))
print(parsed.text)
# URL-in: parser fetches over HTTP, then delegates to the HTML handler.
parsed = await aparse(RawFile(uri="https://example.com/article"))
print(parsed.metadata["title"], parsed.text[:500])
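Because the bytes-in path expects the caller to supply both the content and an extension, a small caller-side helper can prepare the two values. The following is a minimal sketch and not part of everalgo-parser: `hydrate` is a hypothetical name, and it assumes the extension should be lowercase and without the leading dot, as in the `extension="pdf"` call above.

```python
from pathlib import Path


def hydrate(path: str) -> tuple[bytes, str]:
    """Read a local file and derive a lowercase, dot-free extension.

    Hypothetical caller-side helper: the caller, not the parser,
    is the one reading from disk here.
    """
    p = Path(path)
    extension = p.suffix.lstrip(".").lower()
    return p.read_bytes(), extension
```

The result could then be passed on as `RawFile(content=content, extension=extension)`.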
Supported formats
| Modality | Extensions | Backend |
|---|---|---|
| PDF | pdf | Multimodal LLM (single call, full doc) |
| IMAGE | png / jpg / jpeg / webp / bmp / tiff / tif / svg | Multimodal LLM; BMP/TIFF transcoded to PNG via Pillow; SVG rasterised via cairosvg; tall screenshots split + merged |
| AUDIO | mp3 / wav / m4a / amr / aiff / aac / ogg / flac | Multimodal LLM ASR |
| HTML | html / htm | bs4 cleanup → LLM extraction |
| EMAIL | eml | stdlib email + inline-image OCR via the LLM |
| DOCUMENT | docx / pptx / xlsx / doc / ppt / xls / pages / key / numbers / odt / ods / odp / rtf | LibreOffice soffice --convert-to pdf → reuse PDF path |
| URL | (any http/https URI) | httpx fetch → HTML handler |
| DIRECT | txt / md / csv / tsv / vtt | UTF-8 decode, no LLM |
| VIDEO | (none) | Deferred (no upstream implementation; ADR pending) |
Installation
pip install everalgo-parser # core: pdf / image / audio / html / eml / direct / url
pip install 'everalgo-parser[svg]' # adds SVG support (cairosvg)
System dependency for Office documents
Office document parsing (docx / xlsx / pptx / …) shells out to LibreOffice, which is a system package, not a pip wheel. Install before parsing Office files:
# Ubuntu / Debian
sudo apt-get install -y libreoffice
# macOS
brew install --cask libreoffice
The parser detects soffice via shutil.which("soffice") and the canonical macOS Applications path. If neither is found, parsing an Office file raises a RuntimeError with install instructions; non-Office paths are unaffected.
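The detection order described above can be sketched as follows. This is a hedged illustration rather than the parser's own code: `find_soffice` and `convert_command` are hypothetical names, the macOS path is the standard LibreOffice install location, and the `--headless` and `--outdir` flags are real LibreOffice options that the parser may or may not pass.

```python
import shutil
from pathlib import Path

# Standard install location of the LibreOffice binary on macOS.
MACOS_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def find_soffice(which=shutil.which, exists=lambda p: Path(p).exists()) -> str:
    """Locate soffice on PATH first, then fall back to the macOS app bundle."""
    found = which("soffice")
    if found:
        return found
    if exists(MACOS_SOFFICE):
        return MACOS_SOFFICE
    raise RuntimeError(
        "LibreOffice (soffice) not found. Install it with "
        "'sudo apt-get install -y libreoffice' or "
        "'brew install --cask libreoffice'."
    )


def convert_command(soffice: str, source: str, outdir: str) -> list[str]:
    """Build the argv for an Office-to-PDF conversion (assumed flag set)."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", outdir, source]
```

The `which` and `exists` parameters exist only so the lookup can be exercised without LibreOffice installed.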
Conventions
- aparse(...) is async; parse(...) is the sync bridge via asgiref.async_to_sync.
- Prompts live as module-level string constants under prompts/{en,zh}/<operator>.py (AGENTS.md §5). Swap languages by re-binding the constant at startup.
- The library is stateless: it never reads the filesystem and never owns business state. HTTP I/O (LLM calls, URL fetching) is explicitly allowed.
- No retry / fallback / metrics inside operators: failures surface via LLMError, and the caller is expected to wrap them.
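Since operators do not retry, a caller that wants retries has to add them around the call. Below is a minimal sketch of such a wrapper with exponential backoff. `with_retries` is a hypothetical helper, and the `LLMError` class here is a local stand-in, because the library's import path for it is not shown in this README.

```python
import asyncio


class LLMError(Exception):
    """Local stand-in for the library's LLMError (real import path not shown)."""


async def with_retries(make_call, attempts=3, base_delay=0.5, retry_on=(LLMError,)):
    """Await make_call() up to `attempts` times, backing off exponentially.

    `make_call` is a zero-argument callable returning a fresh awaitable,
    since a coroutine object cannot be awaited twice.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure unchanged
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

With the real exception imported, usage might look like `parsed = await with_retries(lambda: aparse(RawFile(uri="https://example.com/article")))`.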
Reference
- Architecture (definitive): docs/concepts/architecture.md
- Schema source for PDF / image / audio / document / html / email: evermemos-multimodal (tag prod-20260306-0331-v1).
- Schema source for URL metadata extraction: evermemos-opensource/src/common_utils/url_extractor.py.