Convert HTML to Markdown using regex and BeautifulSoup4, and filter out useless content with Jina Embeddings.
Reason this release was yanked:
Unsure if this release is stable. Version 0.1.2 is *definitely* stable and is more efficient than the versions before it.
Project description
Convert and Format HTML to Markdown
Purpose
This module converts HTML to Markdown and formats a dataset of HTML content into structured Markdown. It also processes text embeddings to identify and remove redundant content.
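The pipeline described above (parse HTML, emit Markdown, drop near-duplicate blocks by embedding similarity) can be sketched roughly as follows. This is not the package's API, which is not shown on this page. To keep the sketch self-contained, the standard library's `html.parser` stands in for BeautifulSoup4 and a bag-of-words vector stands in for Jina Embeddings. All names (`MarkdownSketch`, `embed`, `drop_redundant`, `html_to_markdown`) and the `0.9` threshold are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, list items only."""

    # Markdown prefix emitted for each supported block-level tag.
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished Markdown blocks, in document order
        self._prefix = None  # prefix of the block being read, or None
        self._text = []      # text fragments of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX:
            self._prefix = self.PREFIX[tag]
            self._text = []

    def handle_data(self, data):
        if self._prefix is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.PREFIX and self._prefix is not None:
            # Collapse runs of whitespace with a regex before emitting.
            text = re.sub(r"\s+", " ", "".join(self._text)).strip()
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None


def embed(text):
    """Assumption: bag-of-words counts as a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_redundant(blocks, threshold=0.9):
    """Keep a block only if it is not too similar to any block already kept."""
    kept, vectors = [], []
    for block in blocks:
        vec = embed(block)
        if all(cosine(vec, seen) < threshold for seen in vectors):
            kept.append(block)
            vectors.append(vec)
    return kept


def html_to_markdown(html, threshold=0.9):
    """Convert HTML to Markdown, then filter near-duplicate blocks."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(drop_redundant(parser.blocks, threshold))
```

Calling `html_to_markdown(page_html)` on a page with a repeated footer paragraph would keep only the first copy of that paragraph. A real embedding model would also catch paraphrased duplicates, which word counts cannot.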
Installation & Setup
- No API keys required
Hashes for conv_html_to_markdown-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 2dfd7ade350d5dce7e7e179b316b8da7d0a1a7d75eb2dbd9cb37c0330b06cba5
MD5 | 747bdbdd9be30cf33a3d175705486fcc
BLAKE2b-256 | 77c1f210a4f63e39bc91d02d7054a9a8de853de49c90dcef5fe2719cc22a6a9e
Hashes for conv_html_to_markdown-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8bf4c3264b32e23198cb3ad493cbf4744c40394ca48066bf8a5aa437c48d4d61
MD5 | 472063a8a2f05d5adfe59177864ef417
BLAKE2b-256 | f920eeb060a4f589a16d18e3e04f16e4b8b646159b6b879fa83f42ec88310f6a