Agentic RAG over consent-gated Web3 documentation: search and answer developer questions from Ethereum, Solidity, and ecosystem docs.
agentic-web3-rag
Semantic search and AI-assisted answers over consent-gated Web3 documentation.
Ask natural-language questions about Ethereum, Solidity, Geth, and the broader Web3 ecosystem, and get structured answers with cited sources, powered by a local embedding model and Qdrant vector search. Every source ingested requires explicit maintainer consent.
Installation · Quickstart · API Reference · Architecture · Configuration · Contributing
Features
- Semantic search over Web3 docs using `fastembed` + Qdrant (no GPU required)
- AI-assisted answers with structured output and cited sources
- Consent-first ingestion: only indexes domains with explicit maintainer approval
- Display policy enforcement: respects license terms (link-only / snippet / fulltext) per domain
- FastAPI backend with OpenAPI docs at `/docs`
- Next.js web UI for interactive search
- CLI entry points: `web3rag-api` and `web3rag-ingest`
- Docker Compose stack for one-command local setup
Screenshots
Web UI: search interface
Live search result for eth_getBalance
OpenAPI interactive docs (/docs)
Architecture
┌─────────────────────────────────────────────────────────┐
│                      Web3 Sources                       │
│          (ethereum.org, geth.ethereum.org, …)           │
└────────────────────────────┬────────────────────────────┘
                             │  consent gate (consents.yaml)
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Ingest Pipeline                     │
│     ingest → preprocess → embed (fastembed) → index     │
│                            ↓                            │
│             data/processed/   data/vectors/             │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
                      ┌─────────────┐
                      │   Qdrant    │  vector store
                      │    :6333    │  (Docker)
                      └──────┬──────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      FastAPI :8080                      │
│  GET  /search  → dense vector search + policy filter    │
│  POST /assist  → retrieval + structured answer          │
│  GET  /health  → liveness check                         │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
                  ┌─────────────────────┐
                  │   Next.js Web UI    │
                  │        :3000        │
                  └─────────────────────┘
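To make the "dense vector search" box concrete, here is a toy, in-memory sketch of top-k retrieval. In the real stack Qdrant performs this step; the cosine metric, the `top_k` helper, and the document shape (`url`, `vector`) are assumptions made for illustration and are not taken from the project's code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=5):
    """Rank docs (dicts with "url" and "vector") by similarity, highest first.

    Hypothetical helper: a stand-in for what the vector store does.
    """
    scored = [(cosine(query_vec, d["vector"]), d["url"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Each returned pair is `(score, url)`, which loosely mirrors the `score` and `url` fields in the `/search` response.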
Installation
Requirements: Python 3.11+, Docker
pip install agentic-web3-rag
With optional OpenAI-powered answers:
pip install "agentic-web3-rag[openai]"
For local development:
git clone https://github.com/VinitaSilaparasetty/agentic-web3-rag.git
cd agentic-web3-rag
pip install -e ".[dev]"
Quickstart
1. Configure environment
cp .env.example .env
# Edit .env and fill in your keys:
# OPENAI_API_KEY=... (optional; only needed for LLM-assisted answers)
# GITHUB_TOKEN=... (optional; raises GitHub API rate limits for discovery)
2. Start Qdrant
docker compose up -d qdrant
3. Run the ingest pipeline
# Ingest → chunk → embed → index (all four steps)
web3rag-ingest --sources data/sources.yaml
python -m pipelines.preprocess
python -m pipelines.embed
python -m pipelines.index
Or use Make:
make ingest # runs ingest step
make dev # creates venv + installs deps
make up # starts Docker stack
make api # starts API server
make test # runs test suite
make eval # runs retrieval smoke eval
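If you prefer to drive the four pipeline commands from step 3 out of a Python script, a minimal sketch might look like the following. The `run_pipeline` helper and its injectable `runner` argument are illustrative and not part of the package; `sys.executable` is used in place of `python` so the steps run in the current interpreter.

```python
import subprocess
import sys

# The four commands from step 3, in order.
PIPELINE_STEPS = [
    ["web3rag-ingest", "--sources", "data/sources.yaml"],
    [sys.executable, "-m", "pipelines.preprocess"],
    [sys.executable, "-m", "pipelines.embed"],
    [sys.executable, "-m", "pipelines.index"],
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    `runner` is injectable so the function can be exercised without
    actually launching the pipeline.
    """
    completed = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {' '.join(cmd)}")
        completed.append(cmd)
    return completed
```

Calling `run_pipeline()` with no arguments runs the real commands, so Qdrant should already be up (step 2).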
4. Start the API
web3rag-api
# → http://localhost:8080
# → http://localhost:8080/docs (OpenAPI)
5. (Optional) Start the Web UI
cd webui
npm install
npm run dev
# → http://localhost:3000
API Reference
GET /health
Liveness check.
curl http://localhost:8080/health
# {"ok": true}
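A script that starts the API and then queries it may want to wait until `/health` responds. The sketch below polls the endpoint with the standard library; `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the retry count and delay are arbitrary choices.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:8080"):
    """Return the decoded JSON body of GET /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(fetch=fetch_health, attempts=10, delay=1.0):
    """Poll until the API reports {"ok": true}, or give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch().get("ok") is True:
                return True
        except (OSError, ValueError):
            pass  # server not reachable yet, or body was not valid JSON
        time.sleep(delay)
    return False
```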
GET /search
Dense vector search over indexed docs.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `q` | string | none | Required. Natural-language query |
| `k` | int | 5 | Number of results to return (max 10) |
| `project` | string | none | Filter by project (e.g. `ethereum`, `geth`) |
| `collection` | string | none | Override Qdrant collection name |
| `offset` | int | 0 | Pagination offset |
curl "http://localhost:8080/search?q=how+do+I+call+eth_getBalance&k=3&project=geth"
{
  "results": [
    {
      "url": "https://geth.ethereum.org/docs/interacting-with-geth/rpc",
      "title": "Rpc",
      "snippet": "JSON-RPC Server - Interacting with Geth requires sending requests...",
      "score": 0.82,
      "project": "geth",
      "source": "geth.ethereum.org"
    }
  ]
}
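The same request can be issued from Python using only the standard library. This is a sketch, not an official client: `build_search_url` and `search` are hypothetical helpers, and clamping `k` on the client side is a choice made here based on the documented maximum of 10.

```python
import json
import urllib.parse
import urllib.request

MAX_K = 10  # the parameter table caps k at 10


def build_search_url(q, k=5, project=None, offset=0,
                     base_url="http://localhost:8080"):
    """Build a GET /search URL from the documented query parameters."""
    params = {"q": q, "k": min(k, MAX_K), "offset": offset}
    if project:
        params["project"] = project
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"


def search(q, **kwargs):
    """Call /search and return the list under the "results" key."""
    url = build_search_url(q, **kwargs)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["results"]
```

For example, `search("how do I call eth_getBalance", k=3, project="geth")` sends the same query as the curl command above, with an explicit `offset=0`.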
POST /assist
Retrieval-augmented answer with cited sources.
curl -X POST http://localhost:8080/assist \
-H "Content-Type: application/json" \
-d '{"q": "how do I call eth_getBalance in geth", "k": 3}'
Body parameters:
| Field | Type | Default | Description |
|---|---|---|---|
| `q` | string | none | Required. Developer question |
| `k` | int | 5 | Docs to retrieve |
| `project` | string | none | Project filter (`ethereum`, `geth`) |
| `collection` | string | none | Override Qdrant collection |
| `offset` | int | 0 | Pagination offset |
{
  "query": "how do I call eth_getBalance in geth",
  "answer": "### Enable JSON-RPC in geth\n...\n**References**\n- Rpc (geth.ethereum.org): https://...",
  "results": [...]
}
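A Python equivalent of the curl call can be sketched with `urllib.request`. The helper names `build_assist_request` and `assist` are made up for this example; only the endpoint, body fields, and response keys come from the reference above.

```python
import json
import urllib.request


def build_assist_request(q, k=5, project=None,
                         base_url="http://localhost:8080"):
    """Build (but do not send) a POST /assist request with a JSON body."""
    payload = {"q": q, "k": k}
    if project:
        payload["project"] = project
    return urllib.request.Request(
        f"{base_url}/assist",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def assist(q, **kwargs):
    """Send the request and return the decoded JSON response."""
    req = build_assist_request(q, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The returned dict should expose the `query`, `answer`, and `results` keys shown in the sample response.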
Configuration
All settings are read from environment variables (or .env). Copy .env.example to get started.
| Variable | Default | Description |
|---|---|---|
| `QDRANT_URL` | `http://localhost:6333` | Qdrant server URL |
| `QDRANT_API_KEY` | none | Qdrant API key (for Qdrant Cloud) |
| `QDRANT_ALIAS_ACTIVE` | `web3_docs_active` | Active collection alias queried by the API |
| `QDRANT_COLLECTION_STAGING` | `web3_docs_staging` | Staging collection written to by the pipeline |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | fastembed model used for indexing and querying |
| `OPENAI_API_KEY` | none | Enables LLM-assisted answers in `/assist` |
| `ASSIST_USE_OPENAI` | `false` | Set to `true` to enable OpenAI answers |
| `ASSIST_OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model for assisted answers |
| `GITHUB_TOKEN` | none | Raises GitHub API rate limit for source discovery |
| `USER_AGENT` | `web3-rag-bot/0.1` | HTTP user-agent used during ingestion |
| `CACHE_POLICY_DEFAULT` | `link-only` | Default display policy for unknown domains |
| `SNIPPET_CHARS` | `320` | Max characters in returned snippets |
| `API_HOST` | `0.0.0.0` | API bind address |
| `API_PORT` | `8080` | API port |
| `JWT_SECRET` | `dev-secret-change-me` | Secret for JWT smoke tokens (change in production) |
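To show how these variables and defaults fit together, here is a small sketch that reads a subset of them into a typed object. The `Settings` class and `load_settings` function are illustrative and are not the package's actual configuration loader; parsing `ASSIST_USE_OPENAI` as a case-insensitive `"true"` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    alias_active: str
    embedding_model: str
    assist_use_openai: bool
    snippet_chars: int
    api_port: int


def load_settings(env=None):
    """Read a subset of the documented variables, falling back to table defaults."""
    env = os.environ if env is None else env
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        alias_active=env.get("QDRANT_ALIAS_ACTIVE", "web3_docs_active"),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        # Assumption: only the literal string "true" (any case) enables OpenAI.
        assist_use_openai=env.get("ASSIST_USE_OPENAI", "false").strip().lower() == "true",
        snippet_chars=int(env.get("SNIPPET_CHARS", "320")),
        api_port=int(env.get("API_PORT", "8080")),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to try out overrides, for example `load_settings({"API_PORT": "9000"})`.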
Web3 Maintainers: Opt In
If you maintain Web3 documentation and want it indexed, click the button below. It takes 2 minutes and you can revoke at any time.
By submitting the form you agree to the Consent to Index terms. Your GitHub account identity and submission timestamp are recorded as the consent record. You can revoke at any time by commenting "REVOKE" on your issue or emailing info@aevoxis.de; all indexed content is removed within 48 hours.
Adding Your Own Sources
1. Get consent from the doc maintainer
Ask the maintainer to submit the opt-in form above, or raise an issue on their repo pointing them to it. Save the link to their consent issue as proof.
2. Add the domain to data/consents.yaml
consents:
  - status: approved
    domain: yourdocs.example.com
    project: yourproject
    proof: "https://github.com/yourorg/yourrepo/issues/123"
    scope:
      include_paths:
        - /docs/
      exclude_paths: []
3. Add the URL to data/sources.yaml
sources:
  - kind: website
    id: yourproject-docs
    project: yourproject
    url: https://yourdocs.example.com/docs/
    consent_proof: "https://github.com/yourorg/yourrepo/issues/123"
4. Re-run the pipeline
web3rag-ingest --sources data/sources.yaml
python -m pipelines.preprocess
python -m pipelines.embed
python -m pipelines.index
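To illustrate how a consent entry could gate a URL before ingestion, the sketch below checks a URL against entries shaped like the `data/consents.yaml` example in step 2. It is an assumption-laden illustration: `is_allowed` is a hypothetical function, paths are matched by simple prefix, and an empty `include_paths` is treated as "all paths"; the project's real matching rules may differ.

```python
from urllib.parse import urlparse

# Mirrors the example entry from data/consents.yaml, as a plain dict.
CONSENTS = [
    {
        "status": "approved",
        "domain": "yourdocs.example.com",
        "project": "yourproject",
        "proof": "https://github.com/yourorg/yourrepo/issues/123",
        "scope": {"include_paths": ["/docs/"], "exclude_paths": []},
    }
]


def is_allowed(url, consents=CONSENTS):
    """Deny by default: allow only approved domains with proof, within scope."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    for entry in consents:
        if entry.get("domain") != parsed.hostname:
            continue
        if entry.get("status") != "approved" or not entry.get("proof"):
            return False
        scope = entry.get("scope", {})
        # Assumption: exclusions win over inclusions, both by path prefix.
        if any(path.startswith(p) for p in scope.get("exclude_paths", [])):
            return False
        include = scope.get("include_paths", [])
        return not include or any(path.startswith(p) for p in include)
    return False  # domain not listed at all
```

With the example entry, a URL under `/docs/` on `yourdocs.example.com` passes, while other paths and unlisted domains are rejected.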
Docker
A full Docker Compose stack is included:
docker compose up -d # starts Qdrant (+ Postgres)
docker compose down -v # stops and removes volumes
To build and run the API in Docker:
docker build -f infra/docker/api/Dockerfile -t web3rag-api .
docker run -p 8080:8080 --env-file .env web3rag-api
Testing
pip install -e ".[dev]"
pytest
Run the retrieval smoke eval (requires a running Qdrant with indexed data):
python -m pipelines.eval_retrieval
Consent, Governance & Compliance
This project operates on a deny-by-default consent model:
- Only domains listed as `approved` in `data/consents.yaml` are ever ingested
- Each entry requires a `proof` link (GitHub issue, email, PR) from the maintainer
- Display policy per domain is enforced at query time (`link-only` / `snippet` / `fulltext`)
- Takedown requests are honoured within 48 hours; see LEGAL.md
- Full policy details in GOVERNANCE.md
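As a rough illustration of query-time display policy enforcement, the function below decides how much text may be shown for a result. The three policy names and the 320-character default come from this README; the `apply_display_policy` function itself, and the choice to treat unknown policies like `link-only`, are assumptions for the sketch.

```python
SNIPPET_CHARS = 320  # documented default for SNIPPET_CHARS


def apply_display_policy(text, policy="link-only", snippet_chars=SNIPPET_CHARS):
    """Return the portion of `text` that `policy` permits showing."""
    if policy == "fulltext":
        return text
    if policy == "snippet":
        return text[:snippet_chars]
    # "link-only" and any unrecognised value: show no body text.
    return ""
```

Falling back to the most restrictive behaviour for unknown values keeps the sketch consistent with the deny-by-default model described above.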
EU compliance
| Regulation | How it is addressed |
|---|---|
| GDPR (2016/679) | PRIVACY.md: privacy notice, data subject rights, retention policy, third-country transfer disclosure |
| EU AI Act (2024/1689) Art. 50 | `/assist` responses carry `"ai_generated": true` and an `X-AI-Generated: true` header; integrators must surface this to end users |
| DSM Copyright Directive (2019/790) Art. 4 | Consent model is opt-in, which exceeds the opt-out minimum; robots.txt + `X-Robots-Tag: noai` + TDM reservation headers respected |
| eIDAS (910/2014) Art. 25 | GitHub issue consent = Simple Electronic Signature; legally admissible as evidence; see CONSENT.md §9 |
| DSA (2022/2065) | Micro-enterprise exemption applies; no algorithmic content ranking or advertising |
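Integrators need to surface the AI-generated marker to end users. One way to do that, sketched under the assumption that your client has the response headers as a mapping and the body as a decoded dict, is shown below; `is_ai_generated`, `render_answer`, and the `[AI-generated]` label are hypothetical.

```python
def is_ai_generated(headers, body):
    """True if either documented marker is present on an /assist response."""
    header_value = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-ai-generated":
            header_value = str(value)
    return header_value.strip().lower() == "true" or body.get("ai_generated") is True


def render_answer(headers, body):
    """Prefix the answer with a disclosure label when it is AI-generated."""
    answer = body.get("answer", "")
    if is_ai_generated(headers, body):
        return "[AI-generated] " + answer
    return answer
```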
Contributing
Contributions are welcome. Please open an issue before submitting a large PR.
git clone https://github.com/VinitaSilaparasetty/agentic-web3-rag.git
cd agentic-web3-rag
pip install -e ".[dev]"
pytest
- Bug reports: open an issue
- Feature requests: open an issue
Commercial Licensing
This software is licensed under AGPL-3.0. For commercial use, enterprise deployment, or white-label licensing:
info@aevoxis.de
License
Copyright © 2025 Vinita Silaparasetty, Aevoxis Solutions. Licensed under the GNU Affero General Public License v3.0.