A web crawler that converts web pages to markdown and prepares them for LLM consumption

Project description

TezzCrawler

TezzCrawler is a command-line tool for crawling entire websites and converting HTML files to Markdown. It’s designed for developers who need to feed structured content from a website into a language model or process it for other analytical tasks.

Features

  • Site-wide Crawling: Crawl all pages listed in a sitemap.
  • Single-page Scraping: Scrape and convert individual pages.
  • Markdown Conversion: Convert HTML pages to Markdown for easy ingestion by LLMs.
  • Proxy Support: Route crawl requests through a proxy.
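
As an illustration of the site-wide crawling feature, the sketch below shows one way the page URLs could be read out of a sitemap before each page is fetched and converted. It is not TezzCrawler's own code: the function name `sitemap_urls` and the `SAMPLE` document are hypothetical, and the only assumption is that the sitemap follows the standard sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap XML document.

    Hypothetical helper for illustration only; not part of TezzCrawler.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this also picks up <loc> entries
    # nested inside a sitemap index file.
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# A tiny made-up sitemap used to exercise the helper.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

for url in sitemap_urls(SAMPLE):
    print(url)
```

Each URL returned this way would then be downloaded and its HTML converted to Markdown; those steps are left out here because the source does not describe how the tool performs them.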

Download files

Source Distribution

tezzcrawler-0.2.0.tar.gz (3.2 kB view details)

Uploaded Source

Built Distribution

TezzCrawler-0.2.0-py3-none-any.whl (3.6 kB view details)

Uploaded Python 3

File details

Details for the file tezzcrawler-0.2.0.tar.gz.

File metadata

  • Download URL: tezzcrawler-0.2.0.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for tezzcrawler-0.2.0.tar.gz
  • SHA256: 9ba75f3524665493cc0aa2cd4319f86760b5a67822585f3af832ac91c3e30762
  • MD5: 447ee5d884603bde5f9e8beb7c3a2829
  • BLAKE2b-256: 4eefeb1d976041611b76e80d5b3a61d56c33423d0b846a8eac894e0f3b2e6d07

File details

Details for the file TezzCrawler-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: TezzCrawler-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 3.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for TezzCrawler-0.2.0-py3-none-any.whl
  • SHA256: af32150064db6ea4b327eaab0879490be5dafef7091f158c8aef384b99d02ead
  • MD5: 876c3fe40f19c1ac0270ecf74c99f38e
  • BLAKE2b-256: 64812f477282512f2486ce540ff1da2541806aa842cda975c1d14cd40a9ae511
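
The SHA256 digests listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_sha256` are hypothetical and chosen for this example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so large downloads need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string.

    The comparison ignores case and surrounding whitespace in the
    expected value, since digests are often pasted from a web page.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("tezzcrawler-0.2.0.tar.gz", "<SHA256 value from the listing>")` should return `True` for an unmodified copy of the source distribution, assuming the file sits in the current directory.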
