Universal asset memorizer: scrape URLs, memorize images/text/code/video as A-Z unlimited assets

aax 🗂️

Universal Asset Memorizer — scrape images, text, code, video, and data from any URL and store them as A–Z unlimited labeled assets.

pip install aax-vision-lib

What is aax?

aax is a Python library that scrapes a URL and stores ("memorizes") the assets it finds. The main asset types are:

Asset Type	What it captures
🖼️ IMAGE	PNG, JPEG, WebP, GIF, SVG, AVIF, BMP, TIFF …
📝 TEXT	Paragraphs, headings, lists, captions, blockquotes
💻 CODE	`<code>`, `<pre>`, inline snippets
🎥 VIDEO	`<video>`, iframes, embedded players
🎵 AUDIO	`<audio>`, podcast feeds
📊 DATA	JSON-LD, meta tags, HTML tables → structured JSON
🔗 LINK	All hyperlinks with anchor text
📄 DOC	PDF, DOCX, XLSX, PPTX linked files

Every asset gets a unique A–Z unlimited label (A, B, C … Z, AA, AB … ∞).
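
As a rough illustration of the extraction step described in the table above, here is a minimal sketch built only on the standard library's html.parser. It is not aax's extractor: the class name AssetCollector, the helper collect_assets, and the tag sets are assumptions made for this example, and it only covers text, code, link, and image assets.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the real library may use different ones.
TEXT_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote", "figcaption"}
CODE_TAGS = {"pre", "code"}


class AssetCollector(HTMLParser):
    """Collect (kind, value) pairs in document order. Hypothetical stand-in."""

    def __init__(self):
        super().__init__()
        self.assets = []        # list of (kind, value) tuples
        self._open_tag = None   # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.assets.append(("image", attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            self.assets.append(("link", attributes["href"]))
        elif self._open_tag is None and tag in TEXT_TAGS | CODE_TAGS:
            # Capture only the outermost text/code element (no nesting support).
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._open_tag:
            return
        raw = "".join(self._buffer)
        if tag in CODE_TAGS:
            kind, value = "code", raw.strip("\n")       # keep inner whitespace
        else:
            kind, value = "text", " ".join(raw.split())  # collapse whitespace
        if value:
            self.assets.append((kind, value))
        self._open_tag = None


def collect_assets(html):
    parser = AssetCollector()
    parser.feed(html)
    parser.close()
    return parser.assets


SAMPLE_HTML = (
    '<h1>Title</h1><p>Hello <a href="/x">world</a></p>'
    '<img src="a.png"><pre>print(1)</pre>'
)
for kind, value in collect_assets(SAMPLE_HTML):
    print(kind, value)
```

Note that a link inside a paragraph is recorded when its opening tag is seen, so it appears before the text of the paragraph that contains it.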

Architecture

Built on three pillars (vision, image, data), plus a core engine and a storage vault:

aax/
├── core/          ← Memorizer engine + AssetSession
│   ├── memorizer  ← scrapes URLs, extracts all assets
│   ├── session    ← A-Z labeled container for results
│   └── types      ← Asset dataclass, AssetKind enum, index_label()
│
├── vision/        ← URL Vision Checker (inspired by torchvision / vision-main)
│   └── checker    ← image size, dominant colors, webpage meta
│
├── image/         ← Image processing (inspired by image-rs / image-main)
│   └── processor  ← load/transform/save images from URL or file
│
├── data/          ← Structured data builder (inspired by serde_json / json-master)
│   └── builder    ← serialize to JSON/JSONL/CSV/SQLite, full-text search
│
└── storage/       ← Persistent A-Z vault
    └── vault      ← disk-backed long-term asset memory

Quick Start

1. Memorize a URL

import aax

# Scrape everything from the Indonesian Wikipedia main page
session = aax.memorize("https://id.wikipedia.org/wiki/Halaman_Utama")

print(session.summary())
# ━━━ aax AssetSession ━━━
#   URL   : https://id.wikipedia.org/wiki/Halaman_Utama
#   Total : 847 assets (labeled A–AFO)
#   text  : 312
#   link  : 289
#   image : 143
#   data  : 78
#   ...

# Access by label
first_asset = session["A"]      # first scraped asset; not necessarily an image
print(first_asset.kind)         # AssetKind.TEXT / IMAGE / etc.

# Typed views
for img in session.images:
    print(img.label, img.src)

for text in session.texts:
    print(text.label, text.content[:80])

# Search content
results = session.search("Indonesia")
print(f"{len(results)} assets mention 'Indonesia'")

# Save everything to disk
session.save("./wiki_assets", download_images=True)
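
To make the access patterns above concrete (lookup by label, typed views, search), here is a small sketch of a label-addressable container. It is an assumption-laden stand-in, not the AssetSession class itself: Item, MiniSession, of_kind, and the sample data are all hypothetical names invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record: a label, a kind name, and some content."""
    label: str
    kind: str
    content: str


class MiniSession:
    """Label-addressable container; a sketch, not aax's AssetSession."""

    def __init__(self, items):
        self._by_label = {item.label: item for item in items}

    def __getitem__(self, label):
        return self._by_label[label]

    def __len__(self):
        return len(self._by_label)

    def of_kind(self, kind):
        # Typed view: all items of one kind, in insertion order.
        return [i for i in self._by_label.values() if i.kind == kind]

    def search(self, term):
        # Case-insensitive substring match over the content.
        needle = term.casefold()
        return [i for i in self._by_label.values() if needle in i.content.casefold()]

    def counts(self):
        return Counter(i.kind for i in self._by_label.values())


session = MiniSession([
    Item("A", "text", "Indonesia is an archipelago"),
    Item("B", "image", "https://example.com/flag.png"),
    Item("C", "text", "Jakarta is the capital of indonesia"),
])
print(session["A"].kind)
print([i.label for i in session.search("Indonesia")])
print(dict(session.counts()))
```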

2. Check URL Vision

import aax

# What's in this URL?
v = aax.vision("https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Flag_of_Indonesia.svg/320px-Flag_of_Indonesia.svg.png")
print(v.describe())
# [aax.vision] https://upload.wikimedia.org/...
#   Type     : image/png
#   Kind     : Image (PNG)
#   Size     : 320×213 px  (ratio 1.5023)
#   FileSize : 3.2 KB
#   Colors   : #ce1126, #ffffff, #f5f5f5, #d4d4d4, #e8e8e8

print(v.dominant_colors)   # ['#ce1126', '#ffffff', ...]
print(v.size)              # (320, 213)
print(v.is_image)          # True

# Webpage vision
vw = aax.vision("https://id.wikipedia.org/wiki/Pemerintahan_Nasional_Pertama")
print(vw.describe())
# [aax.vision] https://id.wikipedia.org/...
#   Type     : text/html
#   Kind     : Webpage (HTML)
#   Title    : Pemerintahan Nasional Pertama – Wikipedia ...
#   Desc     : Pemerintahan Nasional Pertama adalah...
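
For readers curious how an image's pixel size can be read without decoding the whole file, here is a hedged sketch that parses the width and height from a PNG header using only the standard library. It shows the general technique, not how aax.vision works internally; png_size and aspect_ratio are names made up for this example, and the header bytes are built by hand rather than downloaded.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def aspect_ratio(width: int, height: int) -> float:
    return round(width / height, 4)


# Hand-built header for a 320x213 image (no pixel data needed for this check).
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 320, 213)
size = png_size(header)
print(size, aspect_ratio(*size))
```

Because only the first 24 bytes are needed, a checker built this way could in principle request just the start of the file instead of the full download.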

3. Process Images

from aax.image import ImageProcessor

ip = ImageProcessor()

# Load from URL → transform → save
(ip.from_url("https://upload.wikimedia.org/wikipedia/commons/thumb/...")
   .resize(640, 480)
   .grayscale()
   .blur(1.5)
   .save("processed.jpg"))

# Batch download
handles = ip.batch_from_urls([
    "https://example.com/img1.jpg",
    "https://example.com/img2.png",
])
for h in handles:
    h.thumbnail(256).save(f"thumb_{h.source.split('/')[-1]}")

# Get image info
h = ip.from_url("https://...")
print(h.info())
# {'format': 'JPEG', 'size': (1920, 1080), 'mode': 'RGB', ...}
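
The thumbnail(256) call above presumably bounds the longer side at 256 px while keeping the aspect ratio, which is how Pillow's Image.thumbnail behaves; that reading of aax's method is an assumption. The arithmetic can be sketched in plain Python (thumbnail_size is a hypothetical helper, and its rounding may differ slightly from Pillow's):

```python
def thumbnail_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Fit (width, height) inside a max_side square without enlarging."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    if max(width, height) <= max_side:
        return width, height  # already small enough; never upscale
    if width >= height:
        return max_side, max(1, round(height * max_side / width))
    return max(1, round(width * max_side / height)), max_side


print(thumbnail_size(1920, 1080, 256))  # (256, 144)
print(thumbnail_size(100, 50, 256))     # (100, 50)
```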

4. Build Structured Data

from aax.data import DataBuilder
import aax

session = aax.memorize("https://id.wikipedia.org/wiki/Pemerintahan_Nasional_Pertama")

db = DataBuilder()
db.ingest(session)

# Export formats
db.to_json("assets.json")                      # full JSON
db.to_jsonl("assets.jsonl")                    # one record per line
db.to_csv("texts.csv", kind="text")            # CSV of text assets
db.to_sqlite("assets.db")                      # SQLite with FTS

# Search
results = db.query("kabinet")
for r in results:
    print(r["label"], r["content_text"][:60])

# Reload from disk
db2 = DataBuilder.from_json("assets.json")
db3 = DataBuilder.from_sqlite("assets.db")
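
The export formats above can be approximated with the standard library alone. The sketch below writes records as JSONL and loads them into an in-memory SQLite table with a plain LIKE search; it does not use SQLite's full-text search and is not DataBuilder's implementation. The field names label, kind, and content_text follow the snippet above, while to_jsonl, load_sqlite, query, and the sample records are assumptions for this example.

```python
import json
import sqlite3

# Hypothetical records shaped like the rows used in the search loop above.
records = [
    {"label": "A", "kind": "text", "content_text": "Kabinet pertama dibentuk pada 1945"},
    {"label": "B", "kind": "link", "content_text": "https://example.com/"},
    {"label": "C", "kind": "text", "content_text": "Daftar menteri dalam kabinet"},
]


def to_jsonl(rows):
    """One JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


def load_sqlite(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (label TEXT PRIMARY KEY, kind TEXT, content_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO assets VALUES (:label, :kind, :content_text)", rows
    )
    return conn


def query(conn, term):
    # LIKE is case-insensitive for ASCII letters in SQLite by default.
    cursor = conn.execute(
        "SELECT label FROM assets WHERE content_text LIKE ? ORDER BY label",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor]


jsonl = to_jsonl(records)
conn = load_sqlite(records)
print(query(conn, "kabinet"))
```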

5. Persistent Vault

from aax.storage import AssetVault
import aax

vault = AssetVault("./my_vault")

# Store sessions from multiple URLs
urls = [
    "https://id.wikipedia.org/wiki/Halaman_Utama",
    "https://id.wikipedia.org/wiki/Pemerintahan_Nasional_Pertama",
]
for url in urls:
    session = aax.memorize(url)
    stored = vault.store(session)
    print(f"Stored {stored} assets from {url}")

# Retrieve
asset = vault.get("A")
content = vault.get_content("B")

# Query
images = vault.list_by_kind("image")
indonesia_assets = vault.search("Indonesia")

# Stats
print(vault.stats())
# {'total': 1694, 'labels': 'A … BMD', 'disk_bytes': 4_200_000, ...}
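
One way a disk-backed vault could keep labels unique across several stored sessions is to continue the label sequence from the current asset count. The sketch below illustrates that idea with one JSON file per label; it is a guess at the general approach, not AssetVault's actual on-disk format, and MiniVault, label_for, and the sample records are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def label_for(index: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet-style labels)."""
    label, n = "", index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


class MiniVault:
    """One JSON file per label inside a directory. Sketch only."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def __len__(self):
        return sum(1 for _ in self.root.glob("*.json"))

    def store(self, records):
        start = len(self)  # continue labeling after what is already stored
        for offset, record in enumerate(records):
            label = label_for(start + offset)
            path = self.root / f"{label}.json"
            path.write_text(json.dumps({"label": label, **record}), encoding="utf-8")
        return len(records)

    def get(self, label):
        return json.loads((self.root / f"{label}.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    vault = MiniVault(tmp)
    first = vault.store([
        {"kind": "text", "content": "Halaman Utama"},
        {"kind": "image", "content": "https://example.com/a.png"},
    ])
    second = vault.store([{"kind": "text", "content": "Kabinet"}])
    total = len(vault)
    third_record = vault.get("C")

print(first, second, total, third_record["label"])
```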

CLI Usage

# Memorize a URL
aax memorize https://id.wikipedia.org/wiki/Halaman_Utama --out ./assets

# Only scrape images and text
aax memorize https://id.wikipedia.org/wiki/Halaman_Utama --kinds IMAGE,TEXT

# Download images too
aax memorize https://example.com --download-images

# Follow internal links (depth 2)
aax memorize https://example.com --follow-links --depth 2

# Vision check
aax vision https://example.com/image.png
aax vision https://id.wikipedia.org/wiki/Halaman_Utama --json

# Vault management
aax vault ./my_vault stats
aax vault ./my_vault list --kind image --limit 20
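
The --follow-links --depth 2 option above suggests a depth-limited crawl restricted to internal links. A minimal sketch of that idea is a breadth-first traversal that stops expanding pages at the depth limit. How aax actually counts depth or decides what is "internal" is not stated here, so this assumes the start page is depth 0 and that internal means same host. The PAGES dictionary stands in for the network, and crawl is a hypothetical helper.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Stand-in for the network: page URL -> hrefs found on that page.
PAGES = {
    "https://example.com/": ["/a", "/b", "https://other.example.org/x"],
    "https://example.com/a": ["/c"],
    "https://example.com/b": ["/a"],
    "https://example.com/c": ["/d"],
    "https://example.com/d": [],
}


def crawl(start, max_depth, get_links=lambda url: PAGES.get(url, [])):
    """Return same-host URLs reachable within max_depth link hops, in BFS order."""
    host = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # visit this page but do not follow its links
        for href in get_links(url):
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc != host or target in seen:
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return order


for url in crawl("https://example.com/", max_depth=2):
    print(url)
```

The seen set prevents revisiting pages that link to each other, which matters on sites where most pages link back to the home page.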

Asset Labels: A–Z Unlimited

Assets are labeled like Excel columns, so the label sequence never runs out:

A, B, C, … Z,
AA, AB, AC, … AZ,
BA, BB, … ZZ,
AAA, AAB, … ∞

from aax.core.types import index_label

index_label(0)    # 'A'
index_label(25)   # 'Z'
index_label(26)   # 'AA'
index_label(701)  # 'ZZ'
index_label(702)  # 'AAA'
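
The mapping shown above is a bijective base-26 numbering. The sketch below reproduces the listed values and adds the inverse direction; it is an independent illustration, not the source of aax.core.types.index_label, and the names label_from_index and index_from_label are assumptions.

```python
def label_from_index(index: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', 701 -> 'ZZ', 702 -> 'AAA'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    label, n = "", index + 1
    while n > 0:
        # Subtract 1 before dividing: there is no zero digit in this system.
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def index_from_label(label: str) -> int:
    """Inverse of label_from_index."""
    if not label:
        raise ValueError("label must not be empty")
    n = 0
    for char in label:
        if not "A" <= char <= "Z":
            raise ValueError(f"invalid label character: {char!r}")
        n = n * 26 + (ord(char) - ord("A") + 1)
    return n - 1


for i in (0, 25, 26, 701, 702):
    print(i, label_from_index(i))
```

With this scheme the 847th asset (index 846) gets the label AFO.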

Filtering Asset Kinds

import aax
from aax.core.types import AssetKind

url = "https://id.wikipedia.org/wiki/Halaman_Utama"
session = aax.memorize(url, kinds=[AssetKind.IMAGE, AssetKind.TEXT])

Available kinds: IMAGE, TEXT, CODE, VIDEO, AUDIO, DATA, LINK, DOC, FONT, STYLE, SCRIPT, ICON, IFRAME, UNKNOWN
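
A comma-separated value such as the CLI's --kinds IMAGE,TEXT has to be turned into enum members at some point. The sketch below shows one way to do that with a plain Enum. The Kind enum here is a hypothetical subset defined for the example (it is not aax's AssetKind), and parse_kinds and keep are made-up helpers.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical subset of kinds, for illustration only."""
    IMAGE = "image"
    TEXT = "text"
    CODE = "code"
    LINK = "link"
    UNKNOWN = "unknown"


def parse_kinds(spec: str) -> list[Kind]:
    """Turn 'IMAGE,TEXT' into [Kind.IMAGE, Kind.TEXT], ignoring case and repeats."""
    kinds = []
    for name in spec.split(","):
        name = name.strip().upper()
        if not name:
            continue
        try:
            kind = Kind[name]
        except KeyError:
            raise ValueError(f"unknown kind: {name!r}") from None
        if kind not in kinds:
            kinds.append(kind)
    return kinds


def keep(assets, kinds):
    """Filter (kind, value) pairs down to the wanted kinds."""
    wanted = set(kinds)
    return [asset for asset in assets if asset[0] in wanted]


assets = [(Kind.TEXT, "hello"), (Kind.LINK, "/x"), (Kind.IMAGE, "a.png")]
print(keep(assets, parse_kinds("IMAGE,TEXT")))
```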

Advanced: Multi-URL Scrape

import aax
from aax.data import DataBuilder
from aax.storage import AssetVault

urls = [
    "https://id.wikipedia.org/wiki/Halaman_Utama",
    "https://id.wikipedia.org/wiki/Pemerintahan_Nasional_Pertama",
]

vault = AssetVault("./vault")
db    = DataBuilder()

for url in urls:
    print(f"Memorizing {url}")
    session = aax.memorize(url, verbose=True)
    vault.store(session)
    db.ingest(session)
    print(session.summary())

# Export everything
db.to_sqlite("all_assets.db")
print(f"\nVault total: {len(vault)} assets")
print(vault.stats())
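
The loop above stops at the first URL that raises. When scraping many URLs it can help to isolate failures so one unreachable page does not abort the batch. The sketch below shows that pattern with a thread pool and a stand-in worker; fake_memorize and memorize_all are hypothetical, and whether aax.memorize itself is safe to call from several threads is not something this example establishes.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_memorize(url):
    """Stand-in for a scrape call; fails for one URL to simulate an error."""
    if "broken" in url:
        raise ConnectionError(f"cannot reach {url}")
    return [f"asset from {url}"]


def memorize_all(urls, worker=fake_memorize, max_workers=4):
    """Run worker on every URL; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(worker, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; record the failure
                errors[url] = exc
    return results, errors


results, errors = memorize_all([
    "https://example.com/one",
    "https://example.com/broken",
    "https://example.com/two",
])
print(sorted(results))
print({url: type(exc).__name__ for url, exc in errors.items()})
```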

Dependencies

Core (always installed):

requests, aiohttp, httpx — HTTP
beautifulsoup4, lxml — HTML parsing
Pillow — image processing
rich, click, tqdm — CLI/output

Optional (pip install aax-vision-lib[vision]):

torch, torchvision — deep vision models
transformers — image captioning, classification
opencv-python — advanced image ops

Full (pip install aax-vision-lib[full]):

All vision deps + yt-dlp, pytesseract, pdf2image
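
To see which of the optional dependencies listed above are importable in the current environment, a small check with importlib can help. This is a generic sketch and not part of aax. The grouping into "vision" and "full_only" mirrors the lists above, and the import names (cv2 for opencv-python, yt_dlp for yt-dlp) are the usual ones for those packages.

```python
from importlib.util import find_spec

# Import names for the optional packages listed above.
OPTIONAL_MODULES = {
    "vision": ["torch", "torchvision", "transformers", "cv2"],
    "full_only": ["yt_dlp", "pytesseract", "pdf2image"],
}


def has_module(name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return find_spec(name) is not None


def missing(group: str) -> list[str]:
    return [name for name in OPTIONAL_MODULES[group] if not has_module(name)]


for group in OPTIONAL_MODULES:
    print(group, "missing:", missing(group))
```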

Inspired By

Library	Role in aax
`serde_json` (json-master)	Structured data serialization, JSON A-Z asset records
`image` (image-main)	Image format support, decoding/encoding pipeline
`torchvision` (vision-main)	URL-based image loading, transform pipelines, vision checking

License

MIT © aax

This version

1.0.0

May 26, 2026

Download files

Source Distribution

aax_vision_lib-1.0.0.tar.gz (32.5 kB view details)

Uploaded May 26, 2026 Source

Built Distribution

aax_vision_lib-1.0.0-py3-none-any.whl (31.3 kB view details)

Uploaded May 26, 2026 Python 3

File details

Details for the file aax_vision_lib-1.0.0.tar.gz.

File metadata

Download URL: aax_vision_lib-1.0.0.tar.gz
Upload date: May 26, 2026
Size: 32.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for aax_vision_lib-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c552416bfef109aba28b06a6675169c224d85f8317c286556a30f0f26a14883e`
MD5	`41619100432a024f211694d041e63b05`
BLAKE2b-256	`faf49876de2d71ee6707e0832efe96a09ffcb23de58cebf7bb6ad50fe1147306`

File details

Details for the file aax_vision_lib-1.0.0-py3-none-any.whl.

File metadata

Download URL: aax_vision_lib-1.0.0-py3-none-any.whl
Upload date: May 26, 2026
Size: 31.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for aax_vision_lib-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91e7c46b1b641ded4320b46794784a8bf540f01c638cac81702b856243dfd3e9`
MD5	`a2d0994d847647c520fba07b4cc8ba1f`
BLAKE2b-256	`063323186d8d2ad28ca00f528461bca04f872b3a1c67671cfa4aa6c1b8d171e4`
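
The digests listed above can be checked against a downloaded file with the standard hashlib module. The sketch below is a generic verification helper, not something shipped with the package; file_digest and verify are made-up names, and the demo hashes a small temporary file rather than the real archive.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hex digest of a file, read in chunks. 'blake2b-256' means a 32-byte BLAKE2b."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demo on a temporary file containing b"abc" (well-known SHA-256 test vector).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
sample_path = tmp.name
ok = verify(
    sample_path,
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
bad = verify(sample_path, "00" * 32)
os.remove(sample_path)
print(ok, bad)

# For the real archive, pass the SHA256 value from the table above, e.g.:
# verify("aax_vision_lib-1.0.0.tar.gz",
#        "c552416bfef109aba28b06a6675169c224d85f8317c286556a30f0f26a14883e")
```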

aax-vision-lib 1.0.0
