Python-dsl code converter to html parser for web scraping
ssc-codegen
Code generator for web scraping parsers. Describe HTML extraction rules in a declarative KDL 2.0 DSL, then generate ready-to-use parser code for multiple languages and libraries.
.kdl schema --> [kdl parser] --> AST --> [linter] --> [converter] --> output code
Features
- Declarative DSL based on KDL 2.0 syntax
- Static type checking and linting before code generation
- Multiple output targets: Python (bs4, lxml, parsel, selectolax), JavaScript (DOM API)
- Struct types: item, list, dict, table, flat
- LLM-friendly: system prompt + linter loop for AI-assisted schema generation
Install
uv tool install ssc_codegen
Quick example
books.kdl:
struct Book type=list {
    @split-doc { css-all ".product-card" }
    title { css ".title"; text }
    price { css ".price"; text; re #"(\d+\.\d+)"#; to-float }
    url { css "a[href]"; attr "href"; fallback #null }
}
Generate Python parser:
ssc-gen generate books.kdl -t py-bs4 -o ./output
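To make the `price` pipeline above more concrete, here is a minimal hand-written sketch of what the steps `text; re #"(\d+\.\d+)"#; to-float` amount to once the element text has been selected. It is not the code that ssc-gen emits; the function name `extract_price` and the choice to return `None` when the pattern does not match are assumptions made for illustration only.

```python
import re
from typing import Optional

# Same pattern as the `re` step of the `price` field in books.kdl.
PRICE_RE = re.compile(r"(\d+\.\d+)")


def extract_price(text: str) -> Optional[float]:
    """Illustrative stand-in for the `re` and `to-float` steps of `price`.

    Assumption: a missing match yields None. The generated parser may
    behave differently (for example, raise an error).
    """
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # `to-float`: convert the first captured group to a float.
    return float(match.group(1))


if __name__ == "__main__":
    print(extract_price("£51.77"))
```

Each step in the DSL takes the previous step's output as its input, which is why the regex operates on plain text and the float conversion on the captured group.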
Usage
Generate code
# single file
ssc-gen generate schema.kdl -t py-bs4 -o ./output
# all .kdl files in a directory
ssc-gen generate examples/ -t js-pure -o ./output
# with custom package name (for Go and other targets)
ssc-gen generate schema.kdl -t go-goquery -o ./parsers --package scraper
Targets: py-bs4, py-lxml, py-parsel, py-slax, js-pure
Lint schemas
# human-readable output
ssc-gen check schema.kdl
# JSON output (for LLM pipelines)
ssc-gen check schema.kdl -f json
# check all files in a directory
ssc-gen check examples/
Test schema against HTML
# from file
ssc-gen run examples/booksToScrape.kdl:MainCatalogue -t py-bs4 -i page.html
# from stdin
curl https://books.toscrape.com/ | ssc-gen run examples/booksToScrape.kdl:MainCatalogue -t py-bs4
Health check (verify selectors match elements)
# from file
ssc-gen health examples/booksToScrape.kdl:MainCatalogue -i page.html
# from stdin
curl https://books.toscrape.com/ | ssc-gen health examples/booksToScrape.kdl:MainCatalogue
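The `run` and `health` commands above can also be driven from a Python script, for example in a test harness. The sketch below only uses the arguments shown in this README (`schema.kdl:StructName`, `-t`, `-i`, and HTML on stdin); the helper names `build_argv` and `run_on_html` are hypothetical, and nothing is assumed about the format of the command's output.

```python
import subprocess
from typing import List, Optional


def build_argv(
    command: str,
    schema: str,
    struct: str,
    target: Optional[str] = None,
    html_file: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `ssc-gen run` or `ssc-gen health`."""
    argv = ["ssc-gen", command, f"{schema}:{struct}"]
    if target is not None:
        argv += ["-t", target]  # e.g. py-bs4
    if html_file is not None:
        argv += ["-i", html_file]  # omit to read HTML from stdin
    return argv


def run_on_html(argv: List[str], html: str) -> subprocess.CompletedProcess:
    """Pipe an HTML string to the command, like `curl ... | ssc-gen ...`."""
    return subprocess.run(
        argv, input=html, text=True, capture_output=True, check=False
    )


if __name__ == "__main__":
    argv = build_argv(
        "run", "examples/booksToScrape.kdl", "MainCatalogue", target="py-bs4"
    )
    print(" ".join(argv))
```

Passing the command as a list avoids shell quoting issues when the HTML or paths contain special characters.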
Documentation
- Quick start
- Syntax and file structure
- Type system
- Pipeline operations
- Predicates and logic
- JSON schemas and jsonify
- Transforms and dsl blocks
- LLM-compact reference -- full DSL spec in one file for LLM context
- Examples
LLM integration
LLM agents can generate and validate .kdl schemas automatically using the linter feedback loop.
In chats (ChatGPT, Claude, etc.)
Use SYSTEM_PROMPT.md as the system prompt. After the LLM generates a schema, run ssc-gen check -f json on it and send any reported errors back to the LLM for correction.
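The generate, check, and correct cycle described here can be automated. The following is a rough sketch under stated assumptions: `ask_llm` is a hypothetical callable that you supply (it receives the previous linter report, or `None` on the first round, and returns schema text), and a non-zero exit code from `ssc-gen check` is assumed to mean the linter found errors. The JSON report is passed through as opaque text because its structure is not described here.

```python
import subprocess
from typing import Callable, Optional, Tuple


def check_schema(path: str) -> Tuple[bool, str]:
    """Run `ssc-gen check <path> -f json` and return (ok, raw report)."""
    proc = subprocess.run(
        ["ssc-gen", "check", path, "-f", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: exit code 0 means no lint errors.
    return proc.returncode == 0, proc.stdout


def feedback_loop(
    path: str,
    ask_llm: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]] = check_schema,
    max_rounds: int = 3,
) -> bool:
    """Regenerate the schema until the linter passes or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        schema = ask_llm(feedback)  # hypothetical LLM call
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(schema)
        ok, report = check(path)
        if ok:
            return True
        feedback = report  # send the linter output back on the next round
    return False
```

Bounding the loop with `max_rounds` keeps a schema that never converges from consuming LLM calls indefinitely.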
In AI-powered IDEs (Claude Code, Cursor, etc.)
Use the kdl-schema-dsl skill for automatic generation, validation, and iteration.
Development
uv sync # install dependencies
uv build --wheel # build wheel
uv run pytest # run tests
uv run ruff check ssc_codegen/ # lint