
Agentic RAG over consent-gated Web3 documentation: search and answer developer questions from Ethereum, Solidity, and ecosystem docs.


🔍 agentic-web3-rag

Semantic search and AI-assisted answers over consent-gated Web3 documentation.



Ask natural-language questions about Ethereum, Solidity, Geth, and the broader Web3 ecosystem and get structured answers with cited sources, powered by a local embedding model and Qdrant vector search. Every ingested source requires explicit maintainer consent.

Installation · Quickstart · API Reference · Architecture · Configuration · Contributing


✨ Features

  • Semantic search over Web3 docs using fastembed + Qdrant (no GPU required)
  • AI-assisted answers with structured output and cited sources
  • Consent-first ingestion: only indexes domains with explicit maintainer approval
  • Display policy enforcement: respects license terms (link-only / snippet / fulltext) per domain
  • FastAPI backend with OpenAPI docs at /docs
  • Next.js web UI for interactive search
  • CLI entry points: web3rag-api and web3rag-ingest
  • Docker Compose stack for one-command local setup

📸 Screenshots

[Screenshot: Web UI search interface]

[Screenshot: live search result for eth_getBalance]

[Screenshot: OpenAPI interactive docs at /docs]


🏗 Architecture

┌──────────────────────────────────────────────────────────┐
│                      Web3 Sources                        │
│         (ethereum.org, geth.ethereum.org, …)             │
└────────────────────────┬─────────────────────────────────┘
                         │  consent gate (consents.yaml)
                         ▼
┌──────────────────────────────────────────────────────────┐
│                   Ingest Pipeline                        │
│  ingest → preprocess → embed (fastembed) → index         │
│                        │                                 │
│              data/processed/   data/vectors/             │
└────────────────────────┬─────────────────────────────────┘
                         │
                         ▼
                  ┌─────────────┐
                  │   Qdrant    │  vector store
                  │  :6333      │  (Docker)
                  └──────┬──────┘
                         │
                         ▼
┌──────────────────────────────────────────────────────────┐
│                   FastAPI  :8080                         │
│   GET  /search   - dense vector search + policy filter   │
│   POST /assist   - retrieval + structured answer         │
│   GET  /health   - liveness check                        │
└────────────────────────┬─────────────────────────────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │   Next.js Web UI     │
              │      :3000           │
              └──────────────────────┘
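Conceptually, the search path in the diagram embeds the query and ranks stored chunk vectors by similarity. As a rough illustration only, here is a dependency-free sketch of that ranking step, assuming cosine similarity over tiny hand-made vectors. In the real stack fastembed produces the vectors and Qdrant does the ranking; the distance metric the collection uses is not stated in this README, and cosine_similarity, top_k, and the sample index are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" standing in for fastembed output.
index = {
    "geth-rpc": [0.9, 0.1, 0.0],
    "solidity-types": [0.0, 0.8, 0.6],
    "ethereum-accounts": [0.7, 0.3, 0.2],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))
```

A vector database replaces the linear scan above with an approximate nearest-neighbour index, which is why Qdrant sits between the pipeline and the API.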

📦 Installation

Requirements: Python 3.11+, Docker

pip install agentic-web3-rag

With optional OpenAI-powered answers:

pip install "agentic-web3-rag[openai]"

For local development:

git clone https://github.com/VinitaSilaparasetty/agentic-web3-rag.git
cd agentic-web3-rag
pip install -e ".[dev]"

🚀 Quickstart

1. Configure environment

cp .env.example .env
# Edit .env and fill in your keys:
#   OPENAI_API_KEY=...   (optional; only needed for LLM-assisted answers, which also require ASSIST_USE_OPENAI=true)
#   GITHUB_TOKEN=...     (optional; raises GitHub API rate limits for discovery)

2. Start Qdrant

docker compose up -d qdrant

3. Run the ingest pipeline

# All four steps: ingest → preprocess (chunk) → embed → index
web3rag-ingest --sources data/sources.yaml
python -m pipelines.preprocess
python -m pipelines.embed
python -m pipelines.index

Or use Make:

make ingest   # runs ingest step
make dev      # creates venv + installs deps
make up       # starts Docker stack
make api      # starts API server
make test     # runs test suite
make eval     # runs retrieval smoke eval
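The preprocess step splits ingested pages into chunks before they are embedded. This README does not document the chunking strategy, so the following is only a minimal sketch assuming fixed-size character windows with overlap; chunk_text, size, and overlap are hypothetical names and values, not the project's actual settings.

```python
from typing import List


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters, so text near a window
    boundary appears in both neighbouring chunks.
    (Hypothetical defaults; the real pipeline's values are not documented.)
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():            # skip whitespace-only windows
            chunks.append(chunk)
        if start + size >= len(text):
            break                    # this window already reached the end
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))
```

Overlap trades a little extra storage for fewer answers that get split across two chunks; real pipelines often split on headings or sentences instead of raw characters.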

4. Start the API

web3rag-api
# → http://localhost:8080
# → http://localhost:8080/docs  (OpenAPI)

5. (Optional) Start the Web UI

cd webui
npm install
npm run dev
# → http://localhost:3000

🔌 API Reference

GET /health

Liveness check.

curl http://localhost:8080/health
# {"ok": true}
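Qdrant and the API can take a few seconds to start, so scripts often poll the health endpoint before sending queries. A minimal standard-library sketch; wait_for_health and its retry parameters are hypothetical, and the only assumption about the endpoint is the {"ok": true} body shown above. The fetch function is injectable so the retry logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    url: str = "http://localhost:8080/health",
    retries: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Return True once the endpoint reports {"ok": true}, False after `retries` attempts."""
    for attempt in range(retries):
        try:
            if fetch(url).get("ok") is True:
                return True
        except (OSError, ValueError):
            pass  # connection refused, timeout, or a non-JSON body: not ready yet
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

For example, call wait_for_health() after web3rag-api starts and before running queries or the retrieval eval.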

GET /search

Dense vector search over indexed docs.

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | string | (none) | Required. Natural-language query |
| k | int | 5 | Number of results to return (max 10) |
| project | string | (none) | Filter by project (e.g. ethereum,geth) |
| collection | string | (none) | Override Qdrant collection name |
| offset | int | 0 | Pagination offset |

curl "http://localhost:8080/search?q=how+do+I+call+eth_getBalance&k=3&project=geth"
{
  "results": [
    {
      "url": "https://geth.ethereum.org/docs/interacting-with-geth/rpc",
      "title": "Rpc",
      "snippet": "JSON-RPC Server - Interacting with Geth requires sending requests...",
      "score": 0.82,
      "project": "geth",
      "source": "geth.ethereum.org"
    }
  ]
}
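A small Python client for GET /search using only the standard library. It is a sketch: build_search_url and search are hypothetical helper names. The parameter names, the cap of 10 on k, and the "results" field come from the table and example above; rejecting k below 1 and validating on the client side at all are assumptions, not documented server behaviour.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"


def build_search_url(q: str, k: int = 5, project: Optional[str] = None,
                     collection: Optional[str] = None, offset: int = 0,
                     base_url: str = BASE_URL) -> str:
    """Build a /search URL, omitting optional filters that are not set."""
    if not q:
        raise ValueError("q is required")
    if not 1 <= k <= 10:  # documented max is 10; the lower bound of 1 is an assumption
        raise ValueError("k must be between 1 and 10")
    params = {"q": q, "k": k, "offset": offset}
    if project:
        params["project"] = project
    if collection:
        params["collection"] = collection
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"


def search(q: str, **kwargs) -> list:
    """Call /search and return the list stored under "results"."""
    with urllib.request.urlopen(build_search_url(q, **kwargs), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["results"]
```

With the API running, search("how do I call eth_getBalance", k=3, project="geth") would issue the same request as the curl example and return the list of result objects.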

POST /assist

Retrieval-augmented answer with cited sources.

curl -X POST http://localhost:8080/assist \
  -H "Content-Type: application/json" \
  -d '{"q": "how do I call eth_getBalance in geth", "k": 3}'

Body parameters:

| Field | Type | Default | Description |
|---|---|---|---|
| q | string | (none) | Required. Developer question |
| k | int | 5 | Number of docs to retrieve |
| project | string | (none) | Project filter (ethereum, geth) |
| collection | string | (none) | Override Qdrant collection |
| offset | int | 0 | Pagination offset |

{
  "query": "how do I call eth_getBalance in geth",
  "answer": "### Enable JSON-RPC in geth\n...\n**References**\n- Rpc (geth.ethereum.org) → https://...",
  "results": [...]
}
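The same /assist call as the curl example, expressed with the Python standard library. build_assist_request and assist are hypothetical helpers; building the request separately from sending it lets you inspect the payload. Returning the X-AI-Generated header relies on the behaviour described in the EU compliance table of this README.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_assist_request(q: str, k: int = 5, project: Optional[str] = None,
                         base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a POST /assist request with a JSON body."""
    body = {"q": q, "k": k}
    if project:
        body["project"] = project
    return urllib.request.Request(
        f"{base_url}/assist",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def assist(q: str, **kwargs) -> Tuple[dict, Optional[str]]:
    """Send the request; return the decoded JSON and the X-AI-Generated header value."""
    req = build_assist_request(q, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.loads(resp.read().decode("utf-8"))
        return body, resp.headers.get("X-AI-Generated")
```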

⚙️ Configuration

All settings are read from environment variables (or .env). Copy .env.example to get started.

| Variable | Default | Description |
|---|---|---|
| QDRANT_URL | http://localhost:6333 | Qdrant server URL |
| QDRANT_API_KEY | (none) | Qdrant API key (for Qdrant Cloud) |
| QDRANT_ALIAS_ACTIVE | web3_docs_active | Active collection alias queried by the API |
| QDRANT_COLLECTION_STAGING | web3_docs_staging | Staging collection written to by the pipeline |
| EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | fastembed model used for indexing and queries |
| OPENAI_API_KEY | (none) | API key for LLM-assisted answers in /assist (used when ASSIST_USE_OPENAI is true) |
| ASSIST_USE_OPENAI | false | Set to true to enable OpenAI answers |
| ASSIST_OPENAI_MODEL | gpt-4o-mini | OpenAI model for assisted answers |
| GITHUB_TOKEN | (none) | Raises the GitHub API rate limit for source discovery |
| USER_AGENT | web3-rag-bot/0.1 | HTTP user agent used during ingestion |
| CACHE_POLICY_DEFAULT | link-only | Default display policy for unknown domains |
| SNIPPET_CHARS | 320 | Maximum characters in returned snippets |
| API_HOST | 0.0.0.0 | API bind address |
| API_PORT | 8080 | API port |
| JWT_SECRET | dev-secret-change-me | Secret for JWT smoke tokens (change in production) |
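Because every setting is an environment variable, loading configuration reduces to reading os.environ with the defaults from the table above. A minimal sketch covering a subset of the variables; the Settings dataclass and load_settings are hypothetical, not the package's actual config module, and parsing ASSIST_USE_OPENAI as a case-insensitive "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    qdrant_api_key: Optional[str]
    alias_active: str
    embedding_model: str
    assist_use_openai: bool
    snippet_chars: int
    api_host: str
    api_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to the documented defaults."""
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        qdrant_api_key=env.get("QDRANT_API_KEY") or None,  # unset or empty means no key
        alias_active=env.get("QDRANT_ALIAS_ACTIVE", "web3_docs_active"),
        embedding_model=env.get("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"),
        # Assumption: only the string "true" (any case) enables OpenAI answers.
        assist_use_openai=env.get("ASSIST_USE_OPENAI", "false").strip().lower() == "true",
        snippet_chars=int(env.get("SNIPPET_CHARS", "320")),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8080")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to test in isolation.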

🙋 Web3 Maintainers: Opt In

If you maintain Web3 documentation and want it indexed, click the button below. It takes about two minutes, and you can revoke consent at any time.

Opt in to indexing

By submitting the form, you agree to the Consent to Index terms. Your GitHub account identity and submission timestamp are recorded as the consent record. You can revoke at any time by commenting "REVOKE" on your issue or emailing info@aevoxis.de; all indexed content is removed within 48 hours.


📋 Adding Your Own Sources

1. Get consent from the doc maintainer

Ask the maintainer to submit the opt-in form above, or raise an issue on their repo pointing them to it. Save the link to their consent issue as proof.

2. Add the domain to data/consents.yaml

consents:
  - status: approved
    domain: yourdocs.example.com
    project: yourproject
    proof: "https://github.com/yourorg/yourrepo/issues/123"
    scope:
      include_paths:
        - /docs/
      exclude_paths: []

3. Add the URL to data/sources.yaml

sources:
  - kind: website
    id: yourproject-docs
    project: yourproject
    url: https://yourdocs.example.com/docs/
    consent_proof: "https://github.com/yourorg/yourrepo/issues/123"
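The consent gate is deny-by-default: a source URL should be ingested only if an approved entry in data/consents.yaml covers its domain and path. Here is a minimal sketch of such a check, operating on the YAML already parsed into Python dicts (for example with PyYAML's yaml.safe_load). is_consented is hypothetical, and matching paths by prefix, letting exclude_paths win over include_paths, and treating a missing include_paths list as covering nothing are assumptions rather than the pipeline's documented behaviour.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def is_consented(url: str, consents: List[Dict[str, Any]]) -> bool:
    """Deny-by-default check: True only if an approved, proven consent covers the URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path or "/"
    for entry in consents:
        if entry.get("status") != "approved":
            continue                      # any status other than approved grants nothing
        if str(entry.get("domain", "")).lower() != host:
            continue
        if not entry.get("proof"):
            continue                      # every approval needs a proof link
        scope = entry.get("scope") or {}
        # Assumption: prefix matching, and an exclusion beats an inclusion.
        if any(path.startswith(p) for p in scope.get("exclude_paths") or []):
            return False
        if any(path.startswith(p) for p in scope.get("include_paths") or []):
            return True
    return False                          # nothing matched: deny
```

In a pipeline you would run a check like this on each url in data/sources.yaml before fetching it, and skip or log anything that returns False.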

4. Re-run the pipeline

web3rag-ingest --sources data/sources.yaml
python -m pipelines.preprocess
python -m pipelines.embed
python -m pipelines.index

🐳 Docker

A full Docker Compose stack is included:

docker compose up -d          # starts Qdrant (+ Postgres)
docker compose down -v        # stops and removes volumes

To build and run the API in Docker:

docker build -f infra/docker/api/Dockerfile -t web3rag-api .
docker run -p 8080:8080 --env-file .env web3rag-api

🧪 Testing

pip install -e ".[dev]"
pytest

Run the retrieval smoke eval (requires a running Qdrant with indexed data):

python -m pipelines.eval_retrieval
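This README does not say which metric pipelines.eval_retrieval reports. As an illustration only, a common smoke check for retrieval is hit rate at k: the fraction of labelled queries whose expected URL appears among the top-k results. hit_rate_at_k and the sample data are hypothetical, and the second URL is a placeholder rather than a real page.

```python
from typing import Dict, List


def hit_rate_at_k(ranked: Dict[str, List[str]], expected: Dict[str, str], k: int = 5) -> float:
    """Fraction of labelled queries whose expected URL appears in the top k results."""
    if not expected:
        raise ValueError("need at least one labelled query")
    hits = sum(1 for query, url in expected.items() if url in ranked.get(query, [])[:k])
    return hits / len(expected)


# Hypothetical labelled queries and the URLs a search run returned, best first.
expected = {
    "how do I call eth_getBalance": "https://geth.ethereum.org/docs/interacting-with-geth/rpc",
    "how do I add my docs": "https://yourdocs.example.com/docs/intro",
}
ranked = {
    "how do I call eth_getBalance": ["https://geth.ethereum.org/docs/interacting-with-geth/rpc"],
    "how do I add my docs": ["https://yourdocs.example.com/docs/faq"],
}
print(hit_rate_at_k(ranked, expected, k=3))  # → 0.5
```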

📜 Consent, Governance & Compliance

This project operates on a deny-by-default consent model:

  • Only domains listed as approved in data/consents.yaml are ever ingested
  • Each entry requires a proof link (GitHub issue, email, PR) from the maintainer
  • Display policy per domain is enforced at query time (link-only / snippet / fulltext)
  • Takedown requests are honoured within 48 hours (see LEGAL.md)
  • Full policy details in GOVERNANCE.md
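Display policy is enforced per domain at query time. Here is a minimal sketch of what the three policies could mean for a single search result, assuming link-only keeps only the citation fields, snippet truncates the text to SNIPPET_CHARS (320 by default), and fulltext returns the stored text unchanged. apply_display_policy and the result's text field are hypothetical. Unknown policies fall back to link-only, mirroring the CACHE_POLICY_DEFAULT setting.

```python
from typing import Any, Dict

SNIPPET_CHARS = 320  # default from the Configuration table
CITATION_FIELDS = ("url", "title", "project", "source", "score")


def apply_display_policy(result: Dict[str, Any], policy: str,
                         snippet_chars: int = SNIPPET_CHARS) -> Dict[str, Any]:
    """Return a copy of a result trimmed according to its domain's display policy."""
    citation = {key: result[key] for key in CITATION_FIELDS if key in result}
    text = result.get("text", "")  # hypothetical field holding the stored chunk text
    if policy == "link-only":
        return citation                                   # no document text at all
    if policy == "snippet":
        return {**citation, "snippet": text[:snippet_chars]}
    if policy == "fulltext":
        return {**citation, "text": text}
    # Unknown policy: fall back to the most restrictive option.
    return citation
```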

EU compliance

| Regulation | How it is addressed |
|---|---|
| GDPR (2016/679) | PRIVACY.md: privacy notice, data subject rights, retention policy, third-country transfer disclosure |
| EU AI Act (2024/1689) Art. 50 | /assist responses carry "ai_generated": true and an X-AI-Generated: true header; integrators must surface this to end users |
| DSM Copyright Directive (2019/790) Art. 4 | Consent model is opt-in, which exceeds the opt-out minimum; robots.txt, X-Robots-Tag: noai, and TDM reservation headers are respected |
| eIDAS (910/2014) Art. 25 | GitHub issue consent counts as a simple electronic signature and is legally admissible as evidence (see CONSENT.md §9) |
| DSA (2022/2065) | Micro-enterprise exemption applies; no algorithmic content ranking or advertising |
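Integrators are expected to surface the AI-generated marker to end users. Here is a minimal client-side sketch that treats an /assist response as AI-generated if either the ai_generated body field or the X-AI-Generated header is true. is_ai_generated and label_answer are hypothetical, and treating either signal alone as sufficient is an assumption.

```python
from typing import Any, Mapping


def is_ai_generated(body: Mapping[str, Any], headers: Mapping[str, str]) -> bool:
    """True if the /assist body field or response header marks the answer as AI-generated."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    header = next((value for name, value in headers.items() if name.lower() == "x-ai-generated"), "")
    return body.get("ai_generated") is True or header.strip().lower() == "true"


def label_answer(body: Mapping[str, Any], headers: Mapping[str, str]) -> str:
    """Return the answer text with a visible disclosure prefix when it is AI-generated."""
    answer = str(body.get("answer", ""))
    if is_ai_generated(body, headers):
        return "[AI-generated] " + answer
    return answer
```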

🤝 Contributing

Contributions are welcome. Please open an issue before submitting a large PR.

git clone https://github.com/VinitaSilaparasetty/agentic-web3-rag.git
cd agentic-web3-rag
pip install -e ".[dev]"
pytest

💼 Commercial Licensing

This software is licensed under AGPL-3.0. For commercial use, enterprise deployment, or white-label licensing:

📧 info@aevoxis.de


📄 License

Copyright © 2025 Vinita Silaparasetty, Aevoxis Solutions. Licensed under the GNU Affero General Public License v3.0.



Download files


Source Distribution

agentic_web3_rag-0.1.0.tar.gz (41.5 kB)

Uploaded Source

Built Distribution


agentic_web3_rag-0.1.0-py3-none-any.whl (48.0 kB)

Uploaded Python 3

File details

Details for the file agentic_web3_rag-0.1.0.tar.gz.

File metadata

  • Download URL: agentic_web3_rag-0.1.0.tar.gz
  • Upload date:
  • Size: 41.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentic_web3_rag-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 562777104ed63376bb31548a76b1c737ccce65af2c2fede272fef6034360159f |
| MD5 | ed50c4056f73f967c8c176035dbb6d80 |
| BLAKE2b-256 | d825fdcd5fec2b16dbb2775632a47086723e525d570cb2684a5d80ad22a2e594 |


Provenance

The following attestation bundles were made for agentic_web3_rag-0.1.0.tar.gz:

Publisher: release.yml on VinitaSilaparasetty/agentic-web3-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentic_web3_rag-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agentic_web3_rag-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | dde56ff6512fa731eb46377ae11d448bf447fcb0aab4104e6e6d44108a5164de |
| MD5 | 4c1be77b26c73280998243636281f574 |
| BLAKE2b-256 | 992f64c3393e016c5175c4cb6be62408964d217e8002f7ab7eac71792c8bca8d |


Provenance

The following attestation bundles were made for agentic_web3_rag-0.1.0-py3-none-any.whl:

Publisher: release.yml on VinitaSilaparasetty/agentic-web3-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
