Intelligent web page parsing code generator - automatically generates web parsing code with AI
web2json-agent
Let AI generate web parsing code for you. Skip hand-written XPath and CSS selectors and get structured data directly.
💡 Project Introduction
web2json-agent is a data parsing tool that analyzes web page structure, generates Python parser code, and runs that code to extract structured data. The project states that this saves about 80% of development time, turning hours of parser writing into minutes.
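To make "parser code" concrete, here is a minimal hand-written sketch of the kind of HTML-to-JSON parser such a tool is meant to save you from writing. It is not output from web2json-agent: the class names `product-title` and `product-price`, the `ProductParser` class, and the sample HTML are all hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser
import json


class ProductParser(HTMLParser):
    """Collect text from elements whose class attribute names a target field."""

    # Hypothetical mapping: CSS class on the page -> key in the JSON output.
    FIELD_CLASSES = {"product-title": "title", "product-price": "price"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current_field = None  # field waiting for its text, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELD_CLASSES:
                self._current_field = self.FIELD_CLASSES[cls]

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching start tag.
        if self._current_field and data.strip():
            self.record[self._current_field] = data.strip()
            self._current_field = None


def parse_page(html: str) -> dict:
    """Parse one HTML page into a flat dict of extracted fields."""
    parser = ProductParser()
    parser.feed(html)
    return parser.record


# Made-up sample page, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <h1 class="product-title">Blue Widget</h1>
  <span class="product-price">$19.99</span>
</body></html>
"""

if __name__ == "__main__":
    print(json.dumps(parse_page(SAMPLE_HTML), indent=2))
```

Every new site layout needs its own field mapping and selectors, which is the manual step the project aims to automate.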
📋 Video Demo
https://github.com/user-attachments/assets/772fb610-808e-431d-93b3-d16ca0775b3f
📊 SWDE Benchmark Results
Evaluated on the SWDE dataset (8 verticals, 80 websites, 124,291 pages):
| Metric | Score |
|---|---|
| Average Precision | 91.50% |
| Average Recall | 90.46% |
| Average F1 Score | 89.93% |
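The average F1 (89.93%) is lower than both the average precision and the average recall. That is impossible for a single harmonic mean, which for these two values would be about 90.98%, but it is expected if F1 is computed per website and then averaged. The source does not state the averaging method, so the sketch below only illustrates that assumption with made-up per-site scores, not SWDE data.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up (precision, recall) pairs for three hypothetical sites.
sites = [(1.00, 0.80), (0.80, 1.00), (0.95, 0.95)]

avg_p = sum(p for p, _ in sites) / len(sites)
avg_r = sum(r for _, r in sites) / len(sites)

# Macro average: compute F1 per site first, then average the F1 values.
macro_f1 = sum(f1(p, r) for p, r in sites) / len(sites)

# F1 of the averaged precision and recall, for comparison.
f1_of_averages = f1(avg_p, avg_r)

if __name__ == "__main__":
    print(f"average precision: {avg_p:.4f}")
    print(f"average recall:    {avg_r:.4f}")
    print(f"macro F1:          {macro_f1:.4f}")
    print(f"F1 of averages:    {f1_of_averages:.4f}")
```

Sites where precision and recall diverge pull the per-site F1 down, so the macro F1 can fall below both averages.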
🚀 Quick Start
Install via pip
# 1. Install package
pip install web2json-agent
# 2. Initialize configuration
web2json setup
# Mode 1: Auto mode (auto) - for quick exploration when you are unsure which fields to extract
web2json -d html_samples/ -o output/result
# Mode 2: Predefined mode (predefined) - for when you know exactly which fields to extract and need precise control over the output
web2json -d html_samples/ -o output/result --interactive-schema
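If you want to drive the CLI from a script, one option is a thin `subprocess` wrapper. This is a minimal sketch that assumes only the flags shown above (`-d`, `-o`, `--interactive-schema`); the helper names `build_command` and `run_web2json` are made up for illustration, and the package's own Python API, if any, is not used here.

```python
import subprocess
from pathlib import Path


def build_command(html_dir: str, output: str, interactive_schema: bool = False) -> list[str]:
    """Build the web2json argument list from the flags shown in Quick Start."""
    cmd = ["web2json", "-d", html_dir, "-o", output]
    if interactive_schema:
        # Predefined mode: the CLI asks for the schema interactively.
        cmd.append("--interactive-schema")
    return cmd


def run_web2json(html_dir: str, output: str, interactive_schema: bool = False) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    if not Path(html_dir).is_dir():
        raise FileNotFoundError(f"HTML sample directory not found: {html_dir}")
    # stdin/stdout are inherited, so interactive prompts still reach the terminal.
    return subprocess.run(build_command(html_dir, output, interactive_schema), check=True)


if __name__ == "__main__":
    # Equivalent to: web2json -d html_samples/ -o output/result
    run_web2json("html_samples/", "output/result")
```

Passing the arguments as a list avoids shell quoting issues when directory names contain spaces.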
🎨 Web UI
The project also provides a Web UI so you can run it from a browser.
Installation and Launch
# Enter frontend directory
cd web2json_ui/
# Install dependencies
npm install
# Start development server
npm run dev
# Or build production version
npm run build
📄 License
MIT License
File details
Details for the file web2json_agent-1.1.1.tar.gz.
File metadata
- Download URL: web2json_agent-1.1.1.tar.gz
- Upload date:
- Size: 82.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74754fc5bc8cac993a47b2e22ab140f56c4173c438a46bcde5e8ec6ddb6f88bb |
| MD5 | 0c99a2205751bae6b8db006eeb46cc3a |
| BLAKE2b-256 | 856ef1f0378ed9d6486aaf36ef232a1a5a6712dacf6d6c9c6ca1ef043214ec94 |
File details
Details for the file web2json_agent-1.1.1-py3-none-any.whl.
File metadata
- Download URL: web2json_agent-1.1.1-py3-none-any.whl
- Upload date:
- Size: 102.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ab1185d9870d9394af647b631ec0fe4ec7f023d74e5c77d6a235e8bc8bf1b18 |
| MD5 | ffd7ade7086c5350f9c46fc26545b81e |
| BLAKE2b-256 | 0c923a308e6761a5311b6150c29bf1e41edcb63f532f3a032cbae1f05b76fd28 |