langchain-pull-md is a Python package that extends LangChain with a loader that converts web pages to Markdown from their URLs. It targets JavaScript-rendered pages, such as those built with React, Angular, or Vue.js, by delegating rendering and conversion to the external pull.md service, so rendering consumes no local server resources.

Project description

langchain-pull-md

Deprecated: This package is deprecated because the pull.md service was switched off on 25 December 2025.

langchain-pull-md is a Python package that extends LangChain with a loader that fetches a URL through the pull.md service and returns its content as Markdown. The service returns the fully rendered page, which is especially useful for web pages built with JavaScript frameworks such as React, Angular, and Vue.js.


Key Features

  • Convert URLs to Markdown directly, including pages rendered with JavaScript frameworks.
  • Fetch Markdown without consuming local server resources: rendering is handled by the external pull.md service.

Installation

To install the package, use:

pip install langchain-pull-md

Usage

Here’s how you can use the PullMdLoader from langchain-pull-md:

Basic Example

from langchain_pull_md import PullMdLoader

# Initialize the loader with the URL to convert
loader = PullMdLoader(url="http://example.com")

# Fetch the page through the pull.md service and print the result
documents = loader.load()
print(documents)

Parameters

PullMdLoader Constructor

Parameter   Type   Default   Description
url         str    None      The URL to fetch and convert to Markdown.

Testing

To run the tests:

  1. Clone the repository:

    git clone https://github.com/chigwell/langchain-pull-md
    cd langchain-pull-md
    
  2. Install development dependencies:

    pip install -r requirements.txt
    
  3. Run the tests:

    pytest tests/test_markdown_loader.py
    

Contributing

Contributions are welcome! If you have ideas for new features or spot a bug, feel free to:

  • Open an issue on GitHub.
  • Submit a pull request.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.


Acknowledgements

  • LangChain for providing the base integration framework.
  • pull.md for enabling efficient Markdown extraction from dynamic web pages.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_pull_md-0.1.2.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

langchain_pull_md-0.1.2-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file langchain_pull_md-0.1.2.tar.gz.

File metadata

  • Download URL: langchain_pull_md-0.1.2.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.14

File hashes

Hashes for langchain_pull_md-0.1.2.tar.gz
Algorithm Hash digest
SHA256 0b95b615799e33a2f9aa0e8435bcd0bd7ed46b6dc20b146ca33d4b6c3d6c2b19
MD5 e9070e2b2d8e2e75d802767635641356
BLAKE2b-256 e20de7292f794ebf7538e7999e6c1c395d7fdacf88fb03b7624574d2b802adb3
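
The digests above can be used to check that a downloaded archive is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_hex and matches_sha256 are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_hex(path) == expected_hex.strip().lower()


# SHA256 listed above for langchain_pull_md-0.1.2.tar.gz
EXPECTED = "0b95b615799e33a2f9aa0e8435bcd0bd7ed46b6dc20b146ca33d4b6c3d6c2b19"

# Assumed location of the downloaded source distribution
archive = Path("langchain_pull_md-0.1.2.tar.gz")
if archive.exists():
    print("hash OK" if matches_sha256(archive, EXPECTED) else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

pip can also enforce digests at install time through its hash-checking mode (the --require-hashes option together with --hash entries in a requirements file).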

See more details on using hashes here.

File details

Details for the file langchain_pull_md-0.1.2-py3-none-any.whl.

File hashes

Hashes for langchain_pull_md-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c859e944255afe7ff00512c5913662d24c9eccfc9272b67e4fbeb39918e775dd
MD5 cc5d2c3890a23e3081c53b81e943bdfd
BLAKE2b-256 4a68dbdab45780ab16c567f798d2557d84b026f6a9fa086279f1941b52ee9c20
