
A web crawler that converts web pages to markdown and prepares them for LLM consumption

Project description

TezzCrawler

TezzCrawler is a command-line tool for crawling entire websites and converting HTML files to Markdown. It’s designed for developers who need to feed structured content from a website into a language model or process it for other analytical tasks.
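This page does not document TezzCrawler's commands or internals, so what follows is not its implementation. It is a minimal, standard-library-only sketch of the general HTML-to-Markdown step, and it assumes only a few common tags (headings, paragraphs, links, emphasis, list items, line breaks) matter. `MarkdownConverter` and `html_to_markdown` are hypothetical names used for illustration.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter covering a handful of common tags."""

    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
    SKIP = {"script", "style"}  # content of these tags is dropped entirely

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag] + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth -= 1
        elif tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse runs of whitespace the way a browser would.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    # Trim each line, then squeeze 3+ consecutive newlines into one blank line.
    lines = [line.strip() for line in "".join(converter.parts).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `html_to_markdown("<p>a<br>b</p>")` returns `"a\nb"`. Real pages (tables, nested lists, code blocks) need many more cases than this sketch handles.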

Features

  • Site-wide Crawling: Crawl all pages listed in a sitemap.
  • Single-page Scraping: Scrape and convert individual pages.
  • Markdown Conversion: Convert HTML pages to Markdown for easy ingestion by LLMs.
  • Proxy Support: Route crawl requests through a proxy, e.g. to reach sites that are not directly accessible.
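The sitemap and proxy features above can be illustrated with a standard-library sketch. This is an assumption-laden illustration of the general technique, not TezzCrawler's API: `parse_sitemap`, `make_opener`, and `crawl_sitemap` are hypothetical names, only plain `<urlset>` sitemaps are handled (sitemap index files are not followed), and there is no retry or error handling.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# XML namespace of the standard sitemap 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_bytes: bytes) -> List[str]:
    """Return the <loc> URLs listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def make_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener; if `proxy` is given, send HTTP and HTTPS requests through it."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def crawl_sitemap(sitemap_url: str, proxy: Optional[str] = None,
                  timeout: float = 10.0) -> Dict[str, str]:
    """Fetch every page listed in the sitemap and return a {url: html} mapping."""
    opener = make_opener(proxy)
    with opener.open(sitemap_url, timeout=timeout) as resp:
        urls = parse_sitemap(resp.read())
    pages = {}
    for url in urls:
        with opener.open(url, timeout=timeout) as resp:
            # Decode using the charset from Content-Type, falling back to UTF-8.
            charset = resp.headers.get_content_charset() or "utf-8"
            pages[url] = resp.read().decode(charset, errors="replace")
    return pages
```

A call such as `crawl_sitemap("https://example.com/sitemap.xml", proxy="http://127.0.0.1:8080")` (placeholder URLs) would fetch each listed page through the proxy; scraping a single page is the same `opener.open` call on one URL.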

Download files

Download the file for your platform.

Source Distribution

tezzcrawler-0.1.0.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

TezzCrawler-0.1.0-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file tezzcrawler-0.1.0.tar.gz.

File metadata

  • Download URL: tezzcrawler-0.1.0.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for tezzcrawler-0.1.0.tar.gz
  • SHA256: 51c3b9b6a2b84280f38c816a7b7f019de03b19330302c97a89d387e61acb4748
  • MD5: d840da526495528ef1aaa24ae3219810
  • BLAKE2b-256: f09b302351c6a72331f9ee05cdca10522afc87651c823fa6df7183c212994bda

File details

Details for the file TezzCrawler-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: TezzCrawler-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for TezzCrawler-0.1.0-py3-none-any.whl
  • SHA256: a1dcc44c1448fb88dc3997d4ed439e886d5498ddd1177f43472ac29de840ed13
  • MD5: 29b0858f9126a5b62464a05dd1268335
  • BLAKE2b-256: be726ba86eed3b2549f81cf67206a1a6b3f1a398a3a6ad51095211f7d50f20f7
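A downloaded release file can be checked against the SHA256 digests listed on this page. The sketch below uses only `hashlib` and `pathlib`; `EXPECTED_SHA256`, `sha256_of`, and `verify_download` are hypothetical helper names, and the digests are copied from the hash lists for the two 0.1.0 files.

```python
import hashlib
from pathlib import Path

# SHA256 digests published on this page for the TezzCrawler 0.1.0 release files.
EXPECTED_SHA256 = {
    "tezzcrawler-0.1.0.tar.gz": "51c3b9b6a2b84280f38c816a7b7f019de03b19330302c97a89d387e61acb4748",
    "TezzCrawler-0.1.0-py3-none-any.whl": "a1dcc44c1448fb88dc3997d4ed439e886d5498ddd1177f43472ac29de840ed13",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    file_path = Path(path)
    expected = EXPECTED_SHA256.get(file_path.name)
    if expected is None:
        raise KeyError(f"no published digest for {file_path.name}")
    return sha256_of(file_path) == expected
```

pip can enforce the same check on install through its hash-checking mode: list `tezzcrawler==0.1.0 --hash=sha256:<digest>` in a requirements file (one `--hash` per distribution pip might pick) and run `pip install --require-hashes -r requirements.txt`.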
