LangChain retriever for Ask AI over published Sourcey docs sites.
langchain-sourcey
Build your own Ask AI on top of a published Sourcey docs site.
langchain-sourcey is the retrieval layer behind that feature.
Sourcey already emits the files a retriever needs:
- search-index.json for candidate discovery
- llms-full.txt for full-page hydration
- canonical page URLs for citations
No hosted index is required. Point site_url at the docs root and use it.
Install
pip install -U langchain-sourcey
Point site_url at the root of a published Sourcey build:
- https://sourcey.com/docs
- https://sourcey.com/cheesestore
- https://cheesestore.github.io
Quickstart
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(
    site_url="https://sourcey.com/docs",
    top_k=3,
)

docs = retriever.invoke("mcp integration")
for doc in docs:
    print(doc.metadata["title"])
    print(doc.metadata["source"])
    print(doc.page_content[:160])
    print()
For a runnable script, see examples/live_quickstart.py.
More context: https://sourcey.com/docs/guides/guide-langchain-retriever
Implement Ask AI
Install a chat model package. This example uses OpenAI:
pip install -U langchain-openai
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_sourcey import SourceyRetriever

retriever = SourceyRetriever(site_url="https://sourcey.com/docs", top_k=3)

prompt = ChatPromptTemplate.from_template(
    """Answer the question using the documentation context below.
{context}
Question: {question}"""
)

chain = (
    RunnablePassthrough.assign(context=(lambda x: x["question"]) | retriever)
    | prompt
    | ChatOpenAI(model="gpt-4.1-mini")
    | StrOutputParser()
)

answer = chain.invoke({"question": "How does Sourcey document MCP servers?"})
print(answer)
For a fuller example, see examples/rag_chain.py.
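The chain above hands the retrieved Document list to the prompt as-is, so the template most likely receives the list's string representation. A minimal sketch of a formatter that turns the documents into numbered, source-labelled blocks, assuming each document exposes page_content plus the title and source metadata keys described under Returned Metadata. The names format_docs and max_chars are illustrative and not part of langchain-sourcey.

```python
from types import SimpleNamespace


def format_docs(docs, max_chars=1500):
    """Render retrieved documents as numbered, source-labelled context blocks."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        title = doc.metadata.get("title", "Untitled")
        source = doc.metadata.get("source", "")
        # Truncate long pages so the prompt stays within a rough size budget.
        body = doc.page_content[:max_chars]
        blocks.append(f"[{i}] {title}\nSource: {source}\n{body}")
    return "\n\n".join(blocks)


# Stand-in object with the same attributes as a LangChain Document,
# so the sketch runs without any network access or extra packages.
sample = [
    SimpleNamespace(
        page_content="Example page text.",
        metadata={"title": "MCP", "source": "https://example.com/docs/mcp.html"},
    )
]
print(format_docs(sample))
```

In the chain, the formatter could be appended to the retrieval step, for example context=(lambda x: x["question"]) | retriever | format_docs, since LangChain coerces plain callables into runnables when they are piped.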
What Has To Exist
For clean retrieval, the published Sourcey site should expose:
- publish search-index.json
- publish llms-full.txt
- set siteUrl in sourcey.config.ts so citations are canonical
search-index.json is required.
llms-full.txt is strongly recommended. If it is missing, the retriever falls
back to the matched page HTML.
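Before wiring up a retriever, it can help to confirm that a published site exposes these files. The following is a rough preflight sketch, not something shipped with the package: check_site and http_exists are hypothetical helpers, and the sketch assumes both files are served directly under site_url. Some hosts reject HEAD requests, in which case a GET-based checker would be needed.

```python
from urllib.request import Request, urlopen

REQUIRED = ["search-index.json"]
RECOMMENDED = ["llms-full.txt"]


def http_exists(url, timeout=10):
    """Return True if a HEAD request to url succeeds with a 2xx status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False


def check_site(site_url, exists=http_exists):
    """Report which retrieval files a site exposes (assumed at <site_url>/<name>)."""
    root = site_url.rstrip("/")
    report = {name: exists(f"{root}/{name}") for name in REQUIRED + RECOMMENDED}
    missing_required = [name for name in REQUIRED if not report[name]]
    return report, missing_required


# Offline demonstration with a stubbed checker; omit `exists` to use real HTTP.
report, missing = check_site(
    "https://example.com/docs/",
    exists=lambda url: url.endswith("search-index.json"),
)
print(report)
print(missing)
```

An empty missing list means retrieval can proceed. A False entry for llms-full.txt only signals that the retriever would fall back to page HTML.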
Returned Metadata
Each returned Document includes:
- source: canonical page URL used for citations
- matched_url: original matched URL, including anchors when relevant
- matched_title: matched search entry title
- title: hydrated page title
- path: Sourcey output path such as guides/search.html
- anchor: matched fragment, if any
- tab: Sourcey tab label
- category: Sourcey search category
- site_url: docs root used for retrieval
- score: retriever ranking score
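Several hits can resolve to the same page, so an answer UI usually wants one citation per canonical source. The sketch below groups results by source and keeps any anchored matched_url values as section links. build_citations is an illustrative helper rather than a package API, and it assumes anchor is empty or absent when the match has no fragment.

```python
from types import SimpleNamespace


def build_citations(docs):
    """Collapse retrieved documents into one citation per canonical page."""
    citations = {}  # dicts preserve insertion order, so retriever order is kept
    for doc in docs:
        meta = doc.metadata
        source = meta["source"]
        entry = citations.setdefault(
            source,
            {"title": meta.get("title", source), "url": source, "sections": []},
        )
        matched = meta.get("matched_url")
        # Keep anchored links so a reader can jump to the matched section.
        if meta.get("anchor") and matched and matched not in entry["sections"]:
            entry["sections"].append(matched)
    return list(citations.values())


# Stand-ins shaped like the returned Document objects.
page = "https://example.com/docs/guides/search.html"
hits = [
    SimpleNamespace(metadata={"source": page, "title": "Search",
                              "matched_url": page + "#ranking", "anchor": "ranking"}),
    SimpleNamespace(metadata={"source": page, "title": "Search",
                              "matched_url": page, "anchor": None}),
    SimpleNamespace(metadata={"source": "https://example.com/docs/index.html",
                              "title": "Home"}),
]
for citation in build_citations(hits):
    print(citation["title"], citation["url"], citation["sections"])
```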
Development
python -m pip install -e ".[dev]" build twine
PYTHONPATH=src pytest -q
SOURCEY_TEST_SITE_URL=https://sourcey.com/docs PYTHONPATH=src pytest tests/integration_tests/test_live_retriever.py -q
python -m build
python -m twine check dist/*
See CONTRIBUTING.md for the release and verification flow.
LangChain Submission Assets
This repo includes draft docs ready to turn into a LangChain docs PR:
JavaScript Package
This repo also contains the JavaScript package in js.
- npm package: langchain-sourcey
- draft JS docs: docs/langchain-js/provider-sourcey.mdx
- draft JS docs: docs/langchain-js/retriever-sourcey.mdx
Scope
This package ships SourceyRetriever only. No loader yet.