CLI-first pipeline for discovering and cataloging coloring-style SVG and PNG assets.
Image-Scrapling
Public repository for discovering, evaluating, converting, and cataloging coloring-style image assets.
Repository name: Image-Scrapling
Published package and import names:
- package: svg-scrapling
- import: svg_scrapling
Bootstrap
Run the standard project commands from the repository root:
uv sync --group dev
uv run ruff check .
uv run ruff format --check .
uv run mypy src apps
uv run pytest
First Real CLI Run
The CLI now assembles a default runtime without manual dependency wiring in code.
Install dependencies first:
uv sync --group dev
If you want PNG-to-SVG conversion through VTracer, install the optional conversion extra:
uv sync --group dev --extra conversion
See the available commands:
uv run assets --help
Start with a real search run:
uv run assets find \
--query "tygryski do kolorowania" \
--count 10 \
--preferred-format svg \
--fallback-format png \
--convert-to svg \
--mode provenance_only \
--provider duckduckgo_html \
--run-id demo-run \
--output ./data/runs
This writes a deterministic run directory under ./data/runs/demo-run.
Useful operational flags:
- --provider duckduckgo_html or --provider bing_html selects the preferred live discovery provider.
  - non-disabled providers are tried in ordered fallback after the preferred provider
- --disable-provider ... explicitly blocks a provider for one run.
- --run-id ... resumes or reuses a stable run directory.
- --skip-existing-downloads is enabled by default and reuses deterministic asset paths when possible.
- --fetch-strategy static_first|dynamic_on_failure|dynamic_only controls fetch escalation.
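The provider fallback described in the flags above can be pictured with a small sketch. This is not the project's implementation: the function name provider_order, the order of the KNOWN_PROVIDERS tuple, and the behavior when the preferred provider is itself disabled are all assumptions made for illustration.

```python
from collections.abc import Iterable

# Provider names come from the README; the order of this tuple is an assumption.
KNOWN_PROVIDERS = ("duckduckgo_html", "bing_html")


def provider_order(
    preferred: str,
    disabled: Iterable[str] = (),
    known: Iterable[str] = KNOWN_PROVIDERS,
) -> list[str]:
    """Return providers in the order they would be tried (hypothetical helper)."""
    blocked = set(disabled)
    # Preferred provider first, then the remaining known providers in listed order.
    ordered = [preferred, *(name for name in known if name != preferred)]
    # Assumption: a disabled provider is skipped even if it was the preferred one.
    return [name for name in ordered if name not in blocked]


if __name__ == "__main__":
    print(provider_order("bing_html"))
    print(provider_order("bing_html", disabled=["duckduckgo_html"]))
```

With Bing preferred and DuckDuckGo disabled, only bing_html remains in the list, which mirrors the Bing example run in this README.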
Example with Bing as the preferred provider:
uv run assets find \
--query "tiger coloring page" \
--count 5 \
--preferred-format png \
--provider bing_html \
--disable-provider duckduckgo_html \
--output ./data/runs/live-bing
Example resume run:
uv run assets find \
--query "tiger coloring page" \
--count 5 \
--preferred-format png \
--run-id demo-run \
--output ./data/runs
Inspect a manifest after the run:
uv run assets inspect-manifest ./data/runs/demo-run/manifests/manifest.jsonl
Export CSV and Markdown reports:
uv run assets export-report \
./data/runs/demo-run/manifests/manifest.jsonl \
--csv-output ./data/runs/demo-run/manifests/report.csv \
--markdown-output ./data/runs/demo-run/manifests/report.md
Successful runs now write:
- manifests/manifest.jsonl as the canonical machine-readable output
- manifests/summary.json and manifests/summary.txt for operator-facing summaries
- manifests/rejected_candidates.jsonl for fetch, extraction, policy, and download rejections
- logs/pipeline.log for stage-level execution details
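Because the manifest is JSON Lines, it can also be read without the CLI. The sketch below assumes only that each non-empty line is one JSON value; it does not assume any particular field names, since the manifest schema is not described here. The helper names load_manifest and field_counts are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_manifest(path):
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated run is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


def field_counts(records):
    """Count how often each top-level key appears across object records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    manifest = load_manifest("./data/runs/demo-run/manifests/manifest.jsonl")
    print(len(manifest), "records")
    print(field_counts(manifest).most_common())
```

Counting top-level keys is a quick way to discover which fields a run actually wrote before building anything on top of them.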
Dynamic Fetching
Static fetching is the default and should be preferred for normal runs.
If you have a Lightpanda-compatible wrapper, expose it through:
export SVG_SCRAPLING_LIGHTPANDA_CMD="/path/to/lightpanda-wrapper"
The wrapper must support:
<wrapper> fetch <url> <timeout_seconds>
and print JSON to stdout:
{"html":"<html>...</html>","final_url":"https://example.com/final"}
Then you can enable dynamic fallback:
uv run assets find \
--query "tiger coloring page" \
--count 5 \
--fetch-strategy dynamic_on_failure \
--output ./data/runs/dynamic-demo
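To sanity-check a wrapper before a real run, the contract above (a fetch subcommand, a URL, a timeout in seconds, JSON on stdout) can be exercised from Python. This is a minimal sketch and not the package's own client: the function names are hypothetical, the environment variable is assumed to hold a single executable path with no extra arguments, and the extra five seconds of subprocess timeout is an arbitrary grace period.

```python
import json
import os
import subprocess

ENV_VAR = "SVG_SCRAPLING_LIGHTPANDA_CMD"


def build_wrapper_command(wrapper, url, timeout_seconds):
    """Build argv for: <wrapper> fetch <url> <timeout_seconds>."""
    return [wrapper, "fetch", url, str(timeout_seconds)]


def parse_wrapper_output(stdout):
    """Validate the wrapper's JSON and return (html, final_url)."""
    payload = json.loads(stdout)
    if not isinstance(payload, dict):
        raise ValueError("wrapper output must be a JSON object")
    missing = [key for key in ("html", "final_url") if key not in payload]
    if missing:
        raise ValueError(f"wrapper output is missing keys: {missing}")
    return payload["html"], payload["final_url"]


def fetch_with_wrapper(url, timeout_seconds=30):
    """Run the configured wrapper once (hypothetical helper)."""
    wrapper = os.environ.get(ENV_VAR)
    if not wrapper:
        raise RuntimeError(f"{ENV_VAR} is not set")
    completed = subprocess.run(
        build_wrapper_command(wrapper, url, timeout_seconds),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
        timeout=timeout_seconds + 5,  # assumption: small grace period
    )
    return parse_wrapper_output(completed.stdout)
```

Keeping the command construction and output parsing in separate functions lets both be checked without a wrapper installed.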
Library Usage
Supported library entrypoints are exposed from stable package surfaces:
- svg_scrapling
- svg_scrapling.config
- svg_scrapling.pipeline
- svg_scrapling.runtime
Example:
from svg_scrapling import (
    FetchStrategy,
    FindAssetsConfig,
    LicenseMode,
    OutputFormat,
    build_default_pipeline_dependencies,
    run_find_assets,
)

config = FindAssetsConfig(
    query="tiger coloring page",
    count=5,
    preferred_format=OutputFormat.SVG,
    fallback_format=OutputFormat.PNG,
    mode=LicenseMode.PROVENANCE_ONLY,
    fetch_strategy=FetchStrategy.STATIC_FIRST,
)

result = run_find_assets(
    config,
    dependencies=build_default_pipeline_dependencies(config),
)
print(result.manifest_path)
Deep internal module imports outside those entrypoints should be treated as unstable.
Versioning
- distribution version comes from project.version in pyproject.toml
- runtime version is read from the installed package metadata
- release tags should use the format vX.Y.Z
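The versioning rules above can be checked with the standard library. The sketch below is an illustration rather than the project's release tooling: the helper names are hypothetical, and it assumes that X, Y, and Z are plain non-negative integers with no pre-release suffix.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: X, Y, and Z are decimal integers and nothing follows them.
RELEASE_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def is_release_tag(tag):
    """Return True when the tag has the form vX.Y.Z."""
    return RELEASE_TAG.fullmatch(tag) is not None


def installed_version(distribution="svg-scrapling"):
    """Read the version from installed package metadata, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def tag_matches_installed(tag, distribution="svg-scrapling"):
    """Check that a release tag corresponds to the installed distribution."""
    current = installed_version(distribution)
    return current is not None and is_release_tag(tag) and tag == f"v{current}"


if __name__ == "__main__":
    print("installed:", installed_version())
    print("v0.1.0 is a release tag:", is_release_tag("v0.1.0"))
```

A check like tag_matches_installed could run in a release job to catch a tag that does not match the version in pyproject.toml.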
Current Limitations
- Live discovery currently uses duckduckgo_html and bing_html.
- Static asset downloading now uses conservative provenance-aware request headers, but some hosts may still block media retrieval.
- Dynamic fetching still fails loudly when no Lightpanda-compatible client is configured.
- License handling stays conservative: licensed_only requires an explicit allowlist and provenance_only preserves uncertain cases rather than silently allowing reuse.
- The VTracer conversion backend is currently supported on Python >=3.10,<3.14.
- Raster-to-SVG conversion is optional and requires installing the conversion extra.
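A script that depends on conversion may want to check the interpreter against the VTracer range listed above before requesting it. This is a small sketch under that assumption; vtracer_supported is a hypothetical helper and not part of the package, and it checks only the Python version, not whether the conversion extra is installed.

```python
import sys

# Bounds copied from the limitation above: Python >=3.10,<3.14.
VTRACER_MIN = (3, 10)
VTRACER_MAX_EXCLUSIVE = (3, 14)


def vtracer_supported(version_info=None):
    """Return True when the (major, minor) version falls in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    return VTRACER_MIN <= current < VTRACER_MAX_EXCLUSIVE


if __name__ == "__main__":
    if vtracer_supported():
        print("This interpreter is within the VTracer-supported range.")
    else:
        print("PNG-to-SVG conversion through VTracer is not supported here.")
```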
Current Runtime Note
The Python 3.14 compatibility follow-up is tracked in GitHub issue #20.
Reproduction details for the current Python 3.14 blocker live in docs/vtracer-python-314.md.
Developer Workflow
Repository workflow, validation expectations, and run output conventions are documented in docs/developer-workflow.md.
File details
Details for the file svg_scrapling-0.1.0.tar.gz.
File metadata
- Download URL: svg_scrapling-0.1.0.tar.gz
- Upload date:
- Size: 111.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f48ccab3d81ad838d2870cf1169e5af0a5fea5b996e9c023798fa8bf06b22c8 |
| MD5 | 1e12cc71af9569dd327033568f742116 |
| BLAKE2b-256 | 28f04507a95c8346eef91120031e5ff30bb8c40285b939ddb2a2fae342aad742 |
File details
Details for the file svg_scrapling-0.1.0-py3-none-any.whl.
File metadata
- Download URL: svg_scrapling-0.1.0-py3-none-any.whl
- Upload date:
- Size: 56.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92ece5fa84e32bf7f1a2cbe7a245cb9c0fcc528b17271e4253d39f254d5c449d |
| MD5 | 7977492246605f9df8ffb09271de819b |
| BLAKE2b-256 | c5f8d8492ac12d19cd41fd8f5a377c50370b84987dee1cf866363ea90660849d |