LangChain retriever for Sourcey-generated documentation sites.

langchain-sourcey

langchain-sourcey is a LangChain retriever for Sourcey-generated documentation sites.

It works against Sourcey's public build artefacts instead of a private hosted API:

  • search-index.json for candidate discovery
  • llms-full.txt for full-page content hydration
  • canonical page URLs for citations
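The three roles above can be pictured with a small, self-contained sketch. It is not the package's implementation: the index entry fields (`url`, `title`, `content`), the URL-keyed `full_pages` mapping, and the helper names `find_candidates` and `hydrate` are all assumptions made for illustration, and the real `search-index.json` and `llms-full.txt` formats may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    title: str
    score: int


def find_candidates(index, query, top_k=4):
    """Rank (hypothetical) index entries by how many query terms they contain."""
    terms = re.findall(r"\w+", query.lower())
    hits = []
    for entry in index:
        haystack = f"{entry['title']} {entry['content']}".lower()
        score = sum(1 for term in terms if term in haystack)
        if score > 0:
            hits.append(Hit(entry["url"], entry["title"], score))
    # Highest-scoring entries first; keep only the top_k candidates.
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:top_k]


def hydrate(hits, full_pages):
    """Attach full page text (keyed by URL) and keep the URL as the citation."""
    return [
        {
            "page_content": full_pages.get(hit.url, ""),
            "metadata": {"source": hit.url, "title": hit.title},
        }
        for hit in hits
    ]


# Assumed, simplified stand-ins for the two build artefacts.
index = [
    {"url": "https://docs.example.com/reference/search",
     "title": "Search", "content": "How search works"},
    {"url": "https://docs.example.com/reference/install",
     "title": "Install", "content": "Setup steps"},
]
full_pages = {
    "https://docs.example.com/reference/search":
        "Search\n\nFull page text about search.",
}

docs = hydrate(find_candidates(index, "How does search work?"), full_pages)
```

The point of the split is that the small index is cheap to scan for candidates, while the larger full-text file is consulted only for the pages that matched.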

Install

pip install langchain-sourcey

Usage

from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://docs.example.com/reference",
    top_k=4,
)

docs = retriever.invoke("How does search work?")

for doc in docs:
    print(doc.metadata["source"])
    print(doc.page_content[:160])

The site_url should point at the root of a published Sourcey docs build. The retriever fetches search-index.json and llms-full.txt from that root.
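As a rough illustration of what "from that root" means, the sketch below resolves both artefact URLs relative to a site root using only the standard library. The helper name `artifact_urls` is hypothetical, and the package may build these URLs differently.

```python
from urllib.parse import urljoin


def artifact_urls(site_url):
    """Resolve the two build artefacts relative to the site root."""
    # Ensure a trailing slash; without it, urljoin would replace the
    # last path segment ("reference") instead of appending to it.
    root = site_url.rstrip("/") + "/"
    return {
        "search_index": urljoin(root, "search-index.json"),
        "llms_full": urljoin(root, "llms-full.txt"),
    }


urls = artifact_urls("https://docs.example.com/reference")
print(urls["search_index"])
# https://docs.example.com/reference/search-index.json
```

If either URL returns a 404 in a browser, site_url is probably pointing at a page rather than at the root of the build.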

Output requirements

For best results, the Sourcey site should:

  • publish search-index.json
  • publish llms-full.txt
  • set siteUrl in sourcey.config.ts so citations are canonical

If llms-full.txt is not available, the retriever falls back to extracting plain text from the matched HTML page.
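The fallback can be sketched with the standard library's `html.parser`. This is only an approximation of the idea, not the package's own extraction logic: the names `TextExtractor`, `html_to_text`, and `page_text` are hypothetical, and `fetch_html` stands in for whatever HTTP client is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def page_text(url, full_pages, fetch_html):
    """Prefer hydrated full text; otherwise extract text from the HTML page."""
    text = full_pages.get(url)
    if text is not None:
        return text
    return html_to_text(fetch_html(url))
```

Text extracted this way tends to include navigation and footer text, which is one reason publishing llms-full.txt gives cleaner results.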

Scope

This package currently ships SourceyRetriever only. A document loader is intentionally out of scope until the retriever has proven itself in real-world use.

Download files

Source Distribution

langchain_sourcey-0.1.0.tar.gz (30.4 kB)

Uploaded Source

Built Distribution

langchain_sourcey-0.1.0-py3-none-any.whl (30.0 kB)

Uploaded Python 3

File details

Details for the file langchain_sourcey-0.1.0.tar.gz.

File metadata

  • Download URL: langchain_sourcey-0.1.0.tar.gz
  • Upload date:
  • Size: 30.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for langchain_sourcey-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       fceb23206006002f74c26da5152dfd458b030ba243aac6f62546ff12dc17f8b1
MD5          87aa21b63a0d07e24e1cba238252f319
BLAKE2b-256  952bd9a29cf4e2f83843fcec916f439a86ebb89dd9a2de30dff0f196194e6d39

File details

Details for the file langchain_sourcey-0.1.0-py3-none-any.whl.

File hashes

Hashes for langchain_sourcey-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       aaef1d2e2a15ea5aa50dc80e055d152900da41bb6b4130a00b2ff82010b1ff18
MD5          4e347ae713dac666adee45a6758342d1
BLAKE2b-256  3df6677e06691adf2b64cecbc8a2f1b274e9802639ac71f51b459acd9e2966a4
