Run URLs through Trafilatura to get plain text

llm-fragments-site-text

A fragment loader for LLM that converts web pages into plaintext Markdown using Trafilatura.

Installation

Install this plugin in the same environment as LLM:

llm install llm-fragments-site-text

Usage

Use -f 'site:URL' to fetch and convert a webpage to plaintext markdown with metadata. This plugin uses Trafilatura to extract clean text content and metadata from websites, and formats it as markdown.

Example:

llm -f 'site:https://example.com/article' "What is this article about?"

The output includes:

  • Site name (if available)
  • Title (if available)
  • Author (if available)
  • Date (if available)
  • Description (if available)
  • Main content (as markdown with links preserved)
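As a rough illustration of how such output could be assembled, here is a minimal sketch that renders optional metadata fields followed by the main content. The function name `format_fragment`, the metadata keys, and the `**Label:** value` layout are assumptions made for this example; the plugin's actual formatting may differ.

```python
from typing import Mapping, Optional

# Hypothetical field order mirroring the list above; the plugin's real
# labels and layout are not documented here and may differ.
FIELDS = (
    ("sitename", "Site"),
    ("title", "Title"),
    ("author", "Author"),
    ("date", "Date"),
    ("description", "Description"),
)


def format_fragment(metadata: Mapping[str, Optional[str]], content: str) -> str:
    """Render metadata and main content as one Markdown string."""
    lines = []
    for key, label in FIELDS:
        value = metadata.get(key)
        if value:  # "(if available)": skip missing or empty fields
            lines.append(f"**{label}:** {value}")
    header = "\n".join(lines)
    # Separate the metadata header from the body with a blank line
    return f"{header}\n\n{content}" if header else content


example = format_fragment(
    {"title": "Example Article", "author": None, "date": "2025-01-01"},
    "Body text with a [link](https://example.com).",
)
print(example)
```

Fields that are missing or empty are simply left out, so a page with no metadata yields only its main content.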

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-fragments-site-text
python -m venv venv
source venv/bin/activate

Install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

Dependencies

  • httpx: For making HTTP requests
  • trafilatura: For extracting clean text and metadata from websites
  • llm: The LLM CLI tool this plugin extends

Project details

Version: 0.1

Publisher: publish.yml on daturkel/llm-fragments-site-text
