Dead-simple web data extraction. a = weweb(url), then a.images, a.links, a.text...

weweb 🕷️

Dead-simple web data extraction.
No boilerplate. Just point at a URL and pull what you need.

pip install weweb

Usage

from weweb import weweb

a = weweb("https://example.com")

a.images     # all images        → [{src, alt}, ...]
a.links      # all links         → [{url, text}, ...]
a.text       # all text blocks   → [{text}, ...]
a.headings   # h1–h6 headings    → [{level, text}, ...]
a.tables     # html tables       → [{col: value}, ...]
a.meta       # title, og tags    → [{title, description, ...}]
a.emails     # emails on page    → [{email}, ...]
a.phones     # phone numbers     → [{phone}, ...]
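The shapes above suggest that each attribute behaves like a list of dictionaries. Assuming that holds (the README does not state the exact return type), results can be filtered with ordinary Python. The sketch below uses hand-written sample data in the documented `{url, text}` shape instead of a live fetch, and `external_links` is a hypothetical helper, not part of weweb.

```python
from urllib.parse import urlparse

# Hand-written sample in the documented {url, text} shape (not real weweb output).
sample_links = [
    {"url": "https://example.com/about", "text": "About"},
    {"url": "https://www.iana.org/domains/example", "text": "More information"},
    {"url": "/contact", "text": "Contact"},
]


def external_links(links, base_url):
    """Return the links whose host differs from the host of base_url."""
    base_host = urlparse(base_url).netloc
    result = []
    for link in links:
        host = urlparse(link["url"]).netloc
        # Relative URLs have an empty host and are treated as internal.
        if host and host != base_host:
            result.append(link)
    return result


print(external_links(sample_links, "https://example.com"))
```

If `a.links` really is list-like, the same helper should accept it directly in place of `sample_links`.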

Export in one line

a.images.to_csv("images.csv")
a.links.to_json("links.json")
a.tables.to_db("data.db", table="products")
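The README does not say which database backend `to_db` uses. Assuming the `.db` file is SQLite, a roughly equivalent export can be sketched with the standard library's `sqlite3`. `rows_to_sqlite` and `quote_ident` are hypothetical names, the sample rows are invented for illustration, and this is not weweb's actual implementation.

```python
import sqlite3

# Hand-written sample rows in the documented [{col: value}, ...] shape.
sample_rows = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.50"},
]


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def rows_to_sqlite(rows, conn, table):
    """Create `table` from the keys of the first row, then insert every row."""
    if not rows:
        return 0
    columns = list(rows[0].keys())
    col_sql = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    tbl = quote_ident(table)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {tbl} ({col_sql})")
    # Values go through bound parameters; keys missing from a row become NULL.
    conn.executemany(
        f"INSERT INTO {tbl} ({col_sql}) VALUES ({placeholders})",
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path such as "data.db" to persist
inserted = rows_to_sqlite(sample_rows, conn, "products")
stored = conn.execute(
    'SELECT name, price FROM "products" ORDER BY rowid'
).fetchall()
print(inserted, stored)
```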

Custom selector

a.find("article h2")
a.find(".product-card", attrs=["class", "data-id"])
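The README does not document what `find` returns. One plausible reading of the `attrs` argument is "for each matching element, return these attributes". The sketch below illustrates that idea for the single-class selector case only, using the standard library's `html.parser` on a hand-written HTML snippet. `ClassAttrCollector` is a hypothetical name, and a real CSS selector engine handles far more than this.

```python
from html.parser import HTMLParser


class ClassAttrCollector(HTMLParser):
    """Collect selected attributes from every element carrying a given class."""

    def __init__(self, class_name, attrs):
        super().__init__()
        self.class_name = class_name
        self.attrs = attrs
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class attribute is a space-separated list of class names.
        classes = (attr_map.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append({name: attr_map.get(name) for name in self.attrs})


# Invented markup for illustration only.
sample_html = """
<div class="product-card featured" data-id="17">Widget</div>
<div class="banner">Sale</div>
<div class="product-card" data-id="42">Gadget</div>
"""

collector = ClassAttrCollector("product-card", ["class", "data-id"])
collector.feed(sample_html)
print(collector.matches)
```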

Chain exports

a.links.to_csv("links.csv").to_json("links.json")
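Chaining like this works when each export method returns the collection it was called on. The sketch below shows that pattern with a small `list` subclass. `Exportable` is a hypothetical class written for illustration, and weweb's own result type may be built differently.

```python
import csv
import json
import os
import tempfile


class Exportable(list):
    """Hypothetical list of dicts whose export methods return self."""

    def to_csv(self, path):
        # Use the union of keys, in first-seen order, as the header row.
        fieldnames = list(dict.fromkeys(k for row in self for k in row))
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self)
        return self  # returning self is what makes chaining possible

    def to_json(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(list(self), f, indent=2)
        return self


links = Exportable([{"url": "https://example.com", "text": "Home"}])
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "links.csv")
    json_path = os.path.join(tmp, "links.json")
    result = links.to_csv(csv_path).to_json(json_path)
    with open(json_path, encoding="utf-8") as f:
        round_trip = json.load(f)
    with open(csv_path, newline="", encoding="utf-8") as f:
        csv_rows = list(csv.DictReader(f))
```

Because both methods return `self`, the order of the calls does not matter and further exports can be appended to the chain.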

License

MIT

Download files

Source Distribution

weweb-0.1.0.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

weweb-0.1.0-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file weweb-0.1.0.tar.gz.

File metadata

  • Download URL: weweb-0.1.0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for weweb-0.1.0.tar.gz
Algorithm Hash digest
SHA256 05a999c56debe728c7c06febb660a3e5c5993ea8a8881c5bb7645b4a07b0af8e
MD5 5408dfe8c3b0c482ccaff786e9a8f5ff
BLAKE2b-256 86530776614e703b136bb637fe7b83ec2961148cec6b14666759dac77fb94345
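A downloaded file can be checked against the SHA256 digest listed above using the standard library's `hashlib`. The sketch below is a minimal example. `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the self-check runs on a throwaway file so that no download is needed.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Self-check on a temporary file so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"hello")
    demo_digest = sha256_of_file(demo_path)
    demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"hello").hexdigest())
```

For the source distribution, the call would be `matches_published_hash("weweb-0.1.0.tar.gz", "<SHA256 digest listed above>")`, assuming the archive sits in the current directory.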

File details

Details for the file weweb-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: weweb-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for weweb-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 872a56736a0dcbb5e336641aa5297299075242d486564fe08a70fe0b696f48b2
MD5 1062acbc6b9db3b5491725e7c91864cf
BLAKE2b-256 5757186ea817825b6d88c0e11429488e04409345fa6c259ca562daa422ad1d0c
