Native LangChain retriever for Sourcey-generated documentation sites.
langchain-sourcey
Your docs retriever should not depend on somebody else's SaaS either.
langchain-sourcey reads a published Sourcey docs site directly.
Sourcey already ships the files a retriever needs. This package uses them:
- search-index.json for candidate discovery
- llms-full.txt for full-page hydration
- canonical page URLs for citations
If llms-full.txt is missing, it falls back to the matched page HTML.
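The lookup order described above can be sketched roughly as follows. This is an illustration only, not the package's implementation: `hydrate_page`, the `full_text_by_url` mapping, and the `fetch_html_text` callable are hypothetical names, and the sketch also assumes that a page missing from llms-full.txt is treated the same way as a missing file.

```python
from typing import Callable, Mapping, Optional


def hydrate_page(
    page_url: str,
    full_text_by_url: Optional[Mapping[str, str]],
    fetch_html_text: Callable[[str], str],
) -> str:
    """Return page text, preferring llms-full.txt content over HTML.

    full_text_by_url: hypothetical mapping built from llms-full.txt,
        or None when the site does not publish that file.
    fetch_html_text: hypothetical callable that downloads a page and
        reduces its HTML to plain text.
    """
    if full_text_by_url is not None:
        text = full_text_by_url.get(page_url)
        if text:
            return text
    # llms-full.txt is missing (or, by assumption, has no entry for this
    # page): fall back to text derived from the matched page's HTML.
    return fetch_html_text(page_url)
```

Passing the HTML fetcher in as a callable keeps the fallback path easy to exercise in tests without network access.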
Install
pip install -U langchain-sourcey
Point site_url at the root of a published Sourcey build:
- https://sourcey.com/docs
- https://sourcey.com/cheesestore
- https://cheesestore.github.io
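Whether site_url ends in a slash matters when file URLs are derived from it with the standard library. The sketch below shows one way to join a file name onto the docs root. It assumes the published files sit directly under the site root, and `site_file_url` is a hypothetical helper, not part of this package.

```python
from urllib.parse import urljoin


def site_file_url(site_url: str, filename: str) -> str:
    """Join a file name onto the docs root, tolerating a missing trailing slash."""
    # Without the trailing slash, urljoin would replace the last path
    # segment ("docs") instead of appending to it.
    root = site_url if site_url.endswith("/") else site_url + "/"
    return urljoin(root, filename)


print(site_file_url("https://sourcey.com/docs", "search-index.json"))
# -> https://sourcey.com/docs/search-index.json
print(site_file_url("https://cheesestore.github.io/", "llms-full.txt"))
# -> https://cheesestore.github.io/llms-full.txt
```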
Quickstart
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://sourcey.com/docs",
    top_k=3,
)

docs = retriever.invoke("mcp integration")
for doc in docs:
    print(doc.metadata["title"])
    print(doc.metadata["source"])
    print(doc.page_content[:160])
    print()
For a runnable script, see examples/live_quickstart.py.
Sourcey guide: https://sourcey.com/docs/guides/guide-langchain-retriever.html
Use In A LangChain Chain
Install a chat model integration of your choice. This example uses OpenAI:
pip install -U langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(site_url="https://sourcey.com/docs", top_k=3)

prompt = ChatPromptTemplate.from_template(
    """Answer the question using the documentation context below.

{context}

Question: {question}"""
)

chain = (
    RunnablePassthrough.assign(context=(lambda x: x["question"]) | retriever)
    | prompt
    | ChatOpenAI(model="gpt-4.1-mini")
    | StrOutputParser()
)

answer = chain.invoke({"question": "How does Sourcey document MCP servers?"})
print(answer)
For a fuller example, see examples/rag_chain.py.
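In the chain above, the retriever's list of Document objects is placed into {context} as-is. A common LangChain pattern is to render the documents to plain text first. The sketch below is one possible formatter, not something this package ships: `format_docs` is a hypothetical helper, and the SimpleNamespace object is only a stand-in for a retrieved Document so the snippet runs without a live site.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Render retrieved documents as titled text blocks with their source URL."""
    blocks = []
    for doc in docs:
        title = doc.metadata.get("title", "Untitled")
        source = doc.metadata.get("source", "")
        blocks.append(f"# {title}\nSource: {source}\n\n{doc.page_content}")
    # Separate pages with a rule so the model can tell them apart.
    return "\n\n---\n\n".join(blocks)


# Stand-in for a retrieved Document (placeholder values, not real output).
sample = SimpleNamespace(
    page_content="Example page text.",
    metadata={"title": "Example page", "source": "https://example.com/docs/page.html"},
)
print(format_docs([sample]))
```

To use it in the chain, append it to the retrieval step: `context=(lambda x: x["question"]) | retriever | format_docs`. LangChain coerces the plain function into a runnable when it is piped.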
Sourcey Contract
This package expects the Sourcey site to:
- publish search-index.json
- publish llms-full.txt
- set siteUrl in sourcey.config.ts so citations are canonical
search-index.json is required. llms-full.txt is strongly recommended because
it gives the retriever full page content instead of HTML-derived fallback text.
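Before wiring a site into a retriever, it can help to confirm that both files are reachable. The following is a minimal preflight sketch under stated assumptions: the files are served directly under the site root, the server answers HEAD requests, and `url_exists` and `check_contract` are hypothetical helpers that are not part of langchain-sourcey.

```python
from typing import Callable, Dict
from urllib.request import Request, urlopen

CONTRACT_FILES = ("search-index.json", "llms-full.txt")


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url gets a 2xx response."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def check_contract(
    site_url: str, exists: Callable[[str], bool] = url_exists
) -> Dict[str, bool]:
    """Report which contract files are reachable under site_url."""
    root = site_url.rstrip("/") + "/"
    return {name: exists(root + name) for name in CONTRACT_FILES}
```

Calling `check_contract("https://sourcey.com/docs")` returns a dict keyed by file name. A False value for search-index.json means the site cannot be used, while False for llms-full.txt only means retrieval will rely on the HTML fallback.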
Returned Metadata
Each returned Document includes:
- source: canonical page URL used for citations
- matched_url: original matched URL, including anchors when relevant
- matched_title: matched search entry title
- title: hydrated page title
- path: Sourcey output path such as guides/search.html
- anchor: matched fragment, if any
- tab: Sourcey tab label
- category: Sourcey search category
- site_url: docs root used for retrieval
- score: retriever ranking score
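Because source is the canonical page URL while matched_url may carry an anchor, source is the natural key for building a citation list. The sketch below collapses results to one citation per page. It assumes several results can share the same source, which may not happen in practice, and `unique_citations` is a hypothetical helper. The SimpleNamespace objects stand in for retrieved Documents and hold placeholder values.

```python
from types import SimpleNamespace


def unique_citations(docs):
    """Return (title, source) pairs in ranking order, one per canonical page."""
    seen = set()
    citations = []
    for doc in docs:
        source = doc.metadata["source"]
        if source in seen:
            continue  # this page is already cited by a higher-ranked result
        seen.add(source)
        citations.append((doc.metadata["title"], source))
    return citations


# Placeholder results: two matches on the same page, one on another.
docs = [
    SimpleNamespace(metadata={"title": "Page A", "source": "https://example.com/docs/a.html"}),
    SimpleNamespace(metadata={"title": "Page A", "source": "https://example.com/docs/a.html"}),
    SimpleNamespace(metadata={"title": "Page B", "source": "https://example.com/docs/b.html"}),
]
for number, (title, source) in enumerate(unique_citations(docs), start=1):
    print(f"[{number}] {title} - {source}")
```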
Development
python -m pip install -e .[dev] build twine
PYTHONPATH=src pytest -q
SOURCEY_TEST_SITE_URL=https://sourcey.com/docs PYTHONPATH=src pytest tests/integration_tests/test_live_retriever.py -q
python -m build
python -m twine check dist/*
See CONTRIBUTING.md for the release and verification flow.
LangChain Submission Assets
This repo includes draft docs ready to turn into a LangChain docs PR.
Scope
This package ships SourceyRetriever only. No loader yet.