
Python DSL code converter that generates HTML parsers for web scraping


Selector Schema codegen

Experimental PoC implementation of a code generator driven by a DSL with KDL 2.0 syntax.

install

From git (requires C/C++ compiler)

# Clone with submodules
git clone --recursive https://github.com/vypivshiy/selector_schema_codegen
cd selector_schema_codegen

# Install with pip (builds tree-sitter-kdl extension)
pip install .

Requirements:

  • Linux/macOS: gcc or clang
  • Windows: MSVC (Visual Studio Build Tools or Visual Studio)

Via uv tool

uv tool install git+https://github.com/vypivshiy/selector_schema_codegen@features-kdl

usage

generate modules

ssc-gen generate examples/ -t js-pure -o .

lint syntax

ssc-gen check examples/

test a schema against an HTML document

from file:

python main.py run .\examples\booksToScrape.kdl:MainCatalogue -t py-bs4 -i index.html

from stdin:

curl https://books.toscrape.com/ | python main.py run .\examples\booksToScrape.kdl:MainCatalogue -t py-bs4
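
The same stdin pipeline can also be driven from Python, which may help where curl is unavailable. The sketch below is an illustration, not part of ssc-gen: `build_run_command` and `run_schema_on_html` are hypothetical helper names, and it assumes that `main.py run` reads the HTML document from standard input when `-i` is omitted, as the curl example suggests.

```python
import subprocess
import sys
from typing import List, Optional


def build_run_command(
    schema_ref: str, target: str, input_file: Optional[str] = None
) -> List[str]:
    """Build the argv for `python main.py run`, mirroring the commands above.

    schema_ref is "<path to .kdl>:<SchemaName>"; target is the -t value.
    """
    cmd = [sys.executable, "main.py", "run", schema_ref, "-t", target]
    if input_file is not None:
        # Read HTML from a file instead of stdin.
        cmd += ["-i", input_file]
    return cmd


def run_schema_on_html(html: str, schema_ref: str, target: str = "py-bs4") -> str:
    """Pipe HTML text to the run command on stdin and return its stdout.

    Assumption: the command reads stdin when -i is not given.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        build_run_command(schema_ref, target),
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

For example, HTML fetched with `urllib.request.urlopen("https://books.toscrape.com/").read().decode("utf-8")` could be passed to `run_schema_on_html(html, "examples/booksToScrape.kdl:MainCatalogue")` from the repository root.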

test selectors:

from file

python main.py health .\examples\booksToScrape.kdl:MainCatalogue -i index.html

from stdin

curl https://books.toscrape.com/catalogue/page-2.html | python main.py health .\examples\booksToScrape.kdl:MainCatalogue

syntax

See the docs and examples for how to use the syntax.

LLM-generated DSL config (experimental, not ready)

prompt

Use SYSTEM_PROMPT in API pipelines or chats. Before generating, run the linter with ssc-gen check [FILES...] -f json and send its error output to the model, if there is any.
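
The lint step could be wired into a pipeline roughly as follows. This is a sketch under assumptions: `build_check_command` and `lint_feedback` are hypothetical helpers, the structure of the JSON report is not documented here so the output is forwarded as raw text, and treating a non-zero exit code as "errors exist" is a guess about the CLI's behavior.

```python
import subprocess
from typing import Callable, List, Optional


def build_check_command(files: List[str]) -> List[str]:
    """Build the argv for `ssc-gen check [FILES...] -f json`."""
    return ["ssc-gen", "check", *files, "-f", "json"]


def lint_feedback(
    files: List[str], runner: Callable = subprocess.run
) -> Optional[str]:
    """Run the linter and return its raw report, or None if it passed.

    Assumption: a non-zero exit code means the linter found errors.
    The report is returned unparsed because its JSON shape is not
    specified in this README.
    """
    proc = runner(build_check_command(files), capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    # Prefer stdout (where the JSON report is presumably written),
    # falling back to stderr if stdout is empty.
    return proc.stdout or proc.stderr
```

The returned text, when not `None`, can be appended to the next message sent alongside SYSTEM_PROMPT so the model can correct the generated config. The `runner` parameter exists only to make the helper easy to exercise without the CLI installed.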

skill

Use the kdl-schema-dsl skill to generate a config.


Download files

Source Distribution

ssc_codegen-0.17.0a0.tar.gz (120.0 kB)

Built Distribution

ssc_codegen-0.17.0a0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (124.3 kB; Python 3, manylinux: glibc 2.17+ x86-64)
