Convert HTML to Markdown using Regex, BeautifulSoup4, and filter useless content with Jina Embeddings.

Reason this release was yanked:

Unsure whether this release is stable. Version 0.1.2 is *definitely* stable and is more efficient than the versions before it.

Project description

Convert and Format HTML to Markdown

Purpose

This module converts HTML to Markdown and formats a dataset of HTML content into structured Markdown. It can also process text embeddings to identify and remove redundant content.
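To make the two stages concrete, here is a minimal sketch of the general idea, not this package's actual API. It uses only the standard library: a few regular expressions stand in for the Regex and BeautifulSoup4 conversion step, and plain lists of floats stand in for Jina embeddings. The names `html_to_markdown`, `cosine`, `drop_redundant`, and the `threshold` parameter are all hypothetical and chosen for illustration.

```python
import math
import re


def html_to_markdown(html: str) -> str:
    """Convert a small subset of HTML (headings, bold, links, paragraphs)."""
    # Drop script and style blocks entirely; they never carry readable text.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    # <h1>..<h6> become Markdown headings with a matching number of '#'.
    for level in range(1, 7):
        text = re.sub(
            rf"(?is)<h{level}\b[^>]*>(.*?)</h{level}>",
            lambda m, n=level: "\n\n" + "#" * n + " " + m.group(1).strip() + "\n\n",
            text,
        )
    # Links and bold text map directly onto Markdown inline syntax.
    text = re.sub(r'(?is)<a\b[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", text)
    text = re.sub(r"(?is)<(strong|b)\b[^>]*>(.*?)</\1>", r"**\2**", text)
    # Paragraph ends and line breaks become newlines; remaining tags are removed.
    text = re.sub(r"(?is)</p>|<br\s*/?>", "\n", text)
    text = re.sub(r"(?s)<[^>]+>", "", text)
    # Collapse runs of blank lines left behind by the substitutions.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_redundant(chunks, embeddings, threshold=0.9):
    """Keep a chunk only if it is not too similar to any chunk already kept.

    `embeddings` are assumed to be precomputed vectors, one per chunk;
    in practice they would come from an embedding model.
    """
    kept, kept_vecs = [], []
    for chunk, vec in zip(chunks, embeddings):
        if all(cosine(vec, seen) < threshold for seen in kept_vecs):
            kept.append(chunk)
            kept_vecs.append(vec)
    return kept
```

Comparing each chunk against everything already kept is quadratic in the number of chunks, which is acceptable for a single page but would need batching or an approximate index for a large dataset. The similarity threshold is a tuning choice: a lower value removes more near-duplicates at the risk of discarding distinct content.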

Installation & Setup

  • No API keys required

Download files

Source Distribution

conv_html_to_markdown-0.1.0.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

conv_html_to_markdown-0.1.0-py3-none-any.whl (7.5 kB)

Uploaded Python 3
