langchain-pull-md is a Python package that extends LangChain with a loader that converts URLs into Markdown. It targets JavaScript-rendered pages, such as those built with React, Angular, or Vue.js, by delegating the rendering to the pull.md service. Because the conversion runs on that external service, no local server resources are spent on rendering the page.

langchain-pull-md

langchain-pull-md is a Python package that extends LangChain with a loader that fetches URLs as Markdown through the pull.md service. The loader returns fully rendered Markdown content, which is especially useful for web pages built with JavaScript frameworks such as React, Angular, and Vue.js.


Key Features

  • Convert URLs to Markdown directly, supporting pages rendered with JavaScript frameworks.
  • Fetch Markdown through the external pull.md service, so no local server resources are spent on rendering.
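
To make the remote-rendering idea concrete, here is a minimal standard-library sketch. It is not the package's implementation: the endpoint shape (the target URL appended to `https://pull.md/`) is an assumption not confirmed by this README, and `build_request_url`, `http_get_text`, and `fetch_markdown` are hypothetical names used only for illustration.

```python
from typing import Callable
from urllib.request import urlopen

# Assumed endpoint shape; this README does not document the service URL.
PULL_MD_BASE = "https://pull.md/"


def build_request_url(target_url: str, base: str = PULL_MD_BASE) -> str:
    """Prefix the target URL with the service base URL."""
    return base.rstrip("/") + "/" + target_url


def http_get_text(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and decode the response body as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_markdown(
    target_url: str,
    fetch: Callable[[str], str] = http_get_text,
) -> str:
    """Ask the remote service for the page and return its Markdown."""
    return fetch(build_request_url(target_url))
```

The `fetch` parameter is injectable so the function can be exercised without network access, for example by passing a stub that returns a fixed string.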

Installation

To install the package, use:

pip install langchain-pull-md

Usage

Here’s how you can use the PullMdLoader from langchain-pull-md:

Basic Example

from langchain_pull_md import PullMdLoader

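# Initialize the loader with the URL of the page to convert
loader = PullMdLoader(url="http://example.com")

# load() returns a list of LangChain Document objects holding the Markdown
documents = loader.load()
print(documents)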
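
When several pages need converting, one failing URL should not abort the whole batch. The helper below is a sketch under that assumption. `load_many` and `make_loader` are hypothetical names, not part of langchain-pull-md, and the helper only relies on each loader exposing a `load()` method that returns a list.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def load_many(
    urls: Iterable[str],
    make_loader: Callable[[str], Any],
) -> Tuple[List[Any], Dict[str, str]]:
    """Load each URL and return (documents, errors keyed by URL)."""
    documents: List[Any] = []
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            # Build a fresh loader per URL and collect whatever it returns.
            documents.extend(make_loader(url).load())
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            errors[url] = str(exc)
    return documents, errors
```

With the package installed, the call would look like `load_many(urls, lambda u: PullMdLoader(url=u))`. Passing the loader factory as an argument also lets the helper be tested with a stub loader instead of the real service.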

Parameters

PullMdLoader Constructor

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | str | None | The URL to fetch and convert to Markdown. |
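
Since `url` is the loader's only input, it can be worth checking the value before constructing the loader. Whether the package validates URLs itself is not stated here, so the following is a hypothetical pre-check (`is_fetchable_url` is an illustrative name) built on the standard library's `urlparse`.

```python
from urllib.parse import urlparse


def is_fetchable_url(url: str) -> bool:
    """Return True for absolute http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A caller could skip or report any value for which this returns False, such as a bare `example.com` with no scheme, instead of sending it to the service.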

Testing

To run the tests:

  1. Clone the repository:

    git clone https://github.com/chigwell/langchain-pull-md
    cd langchain-pull-md
    
  2. Install development dependencies:

    pip install -r requirements.txt
    
  3. Run the tests:

    pytest tests/test_markdown_loader.py
    

Contributing

Contributions are welcome! If you have ideas for new features or spot a bug, feel free to:

  • Open an issue on GitHub.
  • Submit a pull request.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.


Acknowledgements

  • LangChain for providing the base integration framework.
  • pull.md for enabling efficient Markdown extraction from dynamic web pages.

Download files

Source Distribution

langchain_pull_md-0.1.1.tar.gz (6.6 kB)

Uploaded Source

Built Distribution

langchain_pull_md-0.1.1-py3-none-any.whl (7.4 kB)

Uploaded Python 3

File details

Details for the file langchain_pull_md-0.1.1.tar.gz.

File metadata

  • Download URL: langchain_pull_md-0.1.1.tar.gz
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.10 Darwin/23.6.0

File hashes

Hashes for langchain_pull_md-0.1.1.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 923a41567f23c3eb09c43d87974b9e17cdc61029264772a32963c304b31f1d69 |
| MD5 | 05f4ffc52c3e7dc4726908ff4985a346 |
| BLAKE2b-256 | c85478dc4fddc2b506917991cc0648419d25281656465ced1c956f2337cf95e1 |

File details

Details for the file langchain_pull_md-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: langchain_pull_md-0.1.1-py3-none-any.whl
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.10 Darwin/23.6.0

File hashes

Hashes for langchain_pull_md-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | b4d3a1db962f461e8d50dcadd6cc74e236583f2b0865be91da1b5f12a7f85adb |
| MD5 | a5ff710f6ac2e79e1abba5b5f51fc9a9 |
| BLAKE2b-256 | 48dd1de19e8d9bc85710c3d1a95bd66a6cdb4d4c6d03b65d8fb9cc04bed9ede6 |
