langchain-sourcey

langchain-sourcey is a LangChain retriever for Sourcey-generated documentation sites.

It works against Sourcey's public build artefacts instead of a private hosted API:

  • search-index.json for candidate discovery
  • llms-full.txt for full-page content hydration
  • canonical page URLs for citations
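The two-stage flow in the list above (discover candidates from the index, then hydrate them with full-page text) can be sketched roughly as follows. This is an illustration only, not the package's implementation: the IndexEntry fields, the keyword-overlap scoring, and the find_candidates and hydrate helpers are all hypothetical, and the real search-index.json schema may look quite different.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical search-index entry; the real schema may differ."""
    url: str
    title: str
    text: str


def find_candidates(index, query, top_k=4):
    """Rank entries by how many query terms appear in the title or text."""
    terms = [t for t in query.lower().split() if t]
    scored = []
    for entry in index:
        haystack = f"{entry.title} {entry.text}".lower()
        score = sum(1 for term in terms if term in haystack)
        if score > 0:
            scored.append((score, entry))
    # Stable sort: entries with equal scores keep their index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def hydrate(candidates, full_pages):
    """Use full page text when available, else keep the index snippet."""
    return [
        {"source": c.url, "page_content": full_pages.get(c.url, c.text)}
        for c in candidates
    ]
```

Here full_pages stands in for whatever per-page lookup can be built from llms-full.txt; how that file is actually split into pages is not covered by this sketch.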

Install

pip install langchain-sourcey

Usage

from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://docs.example.com/reference",
    top_k=4,
)

docs = retriever.invoke("How does search work?")

for doc in docs:
    print(doc.metadata["source"])
    print(doc.page_content[:160])
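
Because each returned document carries its canonical URL in metadata["source"], the results can be turned into a numbered, cited context block for a prompt. The helper below is a minimal sketch; format_with_citations is a hypothetical name, not part of langchain-sourcey, and it relies only on the page_content and metadata["source"] attributes used in the example above.

```python
def format_with_citations(docs, max_chars=160):
    """Render documents as numbered excerpts, each followed by its source URL."""
    lines = []
    for number, doc in enumerate(docs, start=1):
        # Collapse runs of whitespace so each excerpt stays on one line.
        excerpt = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{number}] {excerpt}\n    Source: {doc.metadata['source']}")
    return "\n".join(lines)
```

With the retriever above, print(format_with_citations(docs)) would list each excerpt next to the page it came from.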

The site_url should point at the root of a published Sourcey docs build. The retriever fetches search-index.json and llms-full.txt from that root.
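
To make "from that root" concrete, here is one way the two artefact URLs could be derived from site_url using only the standard library. The artefact_urls helper is hypothetical, and whether the package normalises a missing trailing slash in this way is an assumption. The slash matters because urllib.parse.urljoin would otherwise replace the last path segment (reference in the example above) instead of appending to it.

```python
from urllib.parse import urljoin


def artefact_urls(site_url):
    """Return the URLs of the two build artefacts under the site root."""
    # Ensure a trailing slash so urljoin appends to the path
    # rather than replacing its final segment.
    root = site_url if site_url.endswith("/") else site_url + "/"
    return {
        "search_index": urljoin(root, "search-index.json"),
        "llms_full": urljoin(root, "llms-full.txt"),
    }
```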

Output requirements

For best results, the Sourcey site should:

  • publish search-index.json
  • publish llms-full.txt
  • set siteUrl in sourcey.config.ts so citations are canonical

If llms-full.txt is not available, the retriever falls back to extracting plain text from the matched HTML page.
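
For a sense of what such an HTML fallback involves, the sketch below extracts visible text with the standard library's html.parser. It is not the package's actual extraction code; the html_to_text helper and the choice to skip only script and style elements are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Text recovered this way usually still includes navigation and footer strings, which is one reason publishing llms-full.txt gives cleaner results.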

Scope

This package currently ships only SourceyRetriever. A document loader is intentionally out of scope until the retriever has proven itself in real use.
