BitSearch Intelligence Engine — real-time, citation-backed web search & extraction for AI apps. Built on Bitscrape.

BIE — BitSearch Intelligence Engine

A fast, simple way to give any LLM, RAG pipeline, or AI agent real-time, citation-backed web search and extraction.

BIE crawls the live web (powered by Bitscrape, our high-performance async crawler), builds a hybrid BM25 + semantic vector index in memory, and returns ranked, source-attributed results — all from a single Python call, REST endpoint, CLI command, or MCP tool.

import bie

results = bie.search(
    "latest semiconductor export rules 2026",
    urls=["https://www.reuters.com/technology/"],
)

for r in results:
    print(r.title, "—", r.url, f"(score={r.score:.3f})")

Why BIE?

  • 🚀 Zero infra — no Elasticsearch, no Milvus, no Kafka. Pure Python, in-memory hybrid index. Scale up later if you need to.
  • 🧠 Hybrid retrieval out of the box — BM25 lexical search fused with sentence-transformer embeddings via Reciprocal Rank Fusion.
  • 🤖 MCP-ready — drop-in tool for Claude Desktop, Claude Code, and any MCP-compatible AI app.
  • Powered by Bitscrape — async, polite (robots.txt-aware), and fast crawling/extraction under the hood.
  • 🔌 Use anywhere — Python library, REST API, CLI, or MCP server.
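
The Reciprocal Rank Fusion step named in the list above can be sketched in a few lines. This is a generic illustration of the technique, not BIE's internal code: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the way the two weights are applied are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(bm25_ranked, vector_ranked, bm25_weight=0.5, vector_weight=0.5, k=60):
    """Fuse two best-first lists of document IDs with weighted RRF.

    A document's fused score is the weighted sum of 1 / (k + rank) over
    every list it appears in, where rank starts at 1.
    """
    scores = defaultdict(float)
    for weight, ranking in ((bm25_weight, bm25_ranked), (vector_weight, vector_ranked)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc-a", "doc-b", "doc-c"]    # hypothetical lexical ranking
vector_hits = ["doc-b", "doc-d", "doc-a"]  # hypothetical semantic ranking
fused = rrf_fuse(bm25_hits, vector_hits)
print([doc_id for doc_id, _ in fused])
```

Because RRF only looks at ranks, the raw BM25 and cosine scores never need to be normalized against each other, which is one reason it is a popular choice for hybrid retrieval.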

Install

pip install bits-bie

Note: the PyPI distribution is named bits-bie because bie was too similar to an existing PyPI project. You still import bie and run the bie CLI command, and the API is the same as shown below.

Optional extras:

pip install "bits-bie[embeddings]"  # semantic/vector search (sentence-transformers)
pip install "bits-bie[server]"      # FastAPI + Uvicorn REST server
pip install "bits-bie[mcp]"         # Model Context Protocol server
pip install "bits-bie[all]"         # everything

BIE depends on bitscrape, our proprietary async crawling & extraction framework, which is installed automatically.


Usage

1. One-shot search (Python)

import bie

results = bie.search("AI regulation news", urls=["https://example.com/news"], top_k=5)
for r in results:
    print(r)
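
Since the results carry a title, URL, and score, they can be turned into a numbered source list for an LLM prompt. The sketch below is one possible way to do that. The `Result` dataclass is a stand-in that mirrors only the three attributes used in the example above, and `format_sources` is a hypothetical helper, not part of BIE.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for a BIE search result; only title, url and score are used."""
    title: str
    url: str
    score: float


def format_sources(results):
    """Return a numbered source list that can be appended to an LLM prompt."""
    lines = []
    for number, r in enumerate(results, start=1):
        lines.append(f"[{number}] {r.title} ({r.url}, score={r.score:.3f})")
    return "\n".join(lines)


sample = [
    Result("Example headline", "https://example.com/news/1", 0.912),
    Result("Another story", "https://example.com/news/2", 0.5),
]
print(format_sources(sample))
```

In real use, the list returned by bie.search(...) would take the place of `sample`.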

2. Build a reusable index

from bie import BIE

engine = BIE()
engine.crawl(["https://example.com/blog", "https://another-site.com"])

print(engine.search("quarterly earnings"))
print(engine.search("product launch"))  # reuses the same index

3. Index your own text (no crawling)

from bie import BIE

engine = BIE()  # or reuse the engine from the previous step
engine.add_text(
    url="internal://doc-1",
    title="Q2 Strategy Memo",
    text="...",
    trust_score=1.0,
)

4. CLI

# Crawl + search in one command
bie search "global markets today" --url https://www.bbc.com/news --top-k 5

# Just crawl & dump extracted pages
bie crawl https://example.com --max-pages 20 --out docs.jsonl

# Run the REST API
bie serve --port 8000

# Run as an MCP server (stdio)
bie mcp

5. REST API

# Terminal 1: start the server (this command blocks)
bie serve --port 8000

# Terminal 2: crawl and index a page
curl -X POST http://localhost:8000/crawl/url \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/news"]}'

# Terminal 2: run a search
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "latest news", "top_k": 5}'

See the full endpoint contract in docs/API.md.
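
The same search call can be made from Python with only the standard library. The sketch below builds the request without sending it, so it runs without a server. The endpoint path and JSON body are taken from the curl example above, and the Bearer header follows the api_key setting described under Configuration. `build_search_request` is a hypothetical helper name, and the response is assumed to be JSON.

```python
import json
import urllib.request


def build_search_request(base_url, query, top_k=5, api_key=None):
    """Build (but do not send) a POST request for the /search endpoint."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the server was started with BIE_API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search", data=body, headers=headers, method="POST"
    )


request = build_search_request("http://localhost:8000", "latest news", api_key="secret")
print(request.get_method(), request.full_url)

# To actually send it (requires a running `bie serve`):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```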

6. MCP (Model Context Protocol)

Add BIE as a tool in your MCP client (e.g. claude_desktop_config.json):

{
  "mcpServers": {
    "bie": {
      "command": "bie",
      "args": ["mcp"]
    }
  }
}

This exposes three tools to your AI assistant:

  • bie_search(query, urls, top_k, max_pages) — crawl + search in one call
  • bie_crawl(urls, max_pages) — crawl & index into a session-persistent store
  • bie_index_search(query, top_k) — search the session index
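
If the client config already lists other MCP servers, the bie entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows. `add_bie_server` is a hypothetical helper and the `other-tool` entry is made up for illustration. Reading and writing the actual config file is left out because its location depends on the client.

```python
import json


def add_bie_server(config):
    """Return a copy of an MCP client config with the `bie` server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["bie"] = {"command": "bie", "args": ["mcp"]}
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other", "args": []}}}
print(json.dumps(add_bie_server(existing), indent=2))
```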

Configuration

All settings can be set via environment variables prefixed with BIE_, or passed directly:

from bie import BIE, BIESettings

engine = BIE(BIESettings(
    max_pages=20,
    max_depth=1,
    use_embeddings=True,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    bm25_weight=0.6,
    vector_weight=0.4,
))

| Setting | Env var | Default | Description |
| --- | --- | --- | --- |
| max_pages | BIE_MAX_PAGES | 40 | Max pages crawled per seed URL |
| max_depth | BIE_MAX_DEPTH | 2 | Max link-follow depth |
| concurrent_requests | BIE_CONCURRENT_REQUESTS | 16 | Crawl concurrency |
| robotstxt_obey | BIE_ROBOTSTXT_OBEY | true | Respect robots.txt |
| use_embeddings | BIE_USE_EMBEDDINGS | true | Enable semantic search |
| chunk_size | BIE_CHUNK_SIZE | 800 | Chars per chunk |
| bm25_weight / vector_weight | BIE_BM25_WEIGHT / BIE_VECTOR_WEIGHT | 0.5 / 0.5 | Fusion weights |
| api_key | BIE_API_KEY | None | If set, requires Authorization: Bearer <key> |
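
To make the BIE_ prefix convention concrete, the sketch below shows one way prefixed environment variables could be mapped onto typed settings with defaults. It is an illustration only: `SketchSettings` and `settings_from_env` are hypothetical names covering four of the documented fields, and BIESettings may parse its environment differently.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class SketchSettings:
    """Hypothetical stand-in for BIESettings with a few documented fields."""
    max_pages: int = 40
    max_depth: int = 2
    use_embeddings: bool = True
    bm25_weight: float = 0.5


def settings_from_env(environ=os.environ, prefix="BIE_"):
    """Build settings from prefixed variables, falling back to the defaults."""
    values = {}
    for field in fields(SketchSettings):
        raw = environ.get(prefix + field.name.upper())
        if raw is None:
            continue  # keep the default
        if isinstance(field.default, bool):
            values[field.name] = raw.strip().lower() in {"1", "true", "yes"}
        else:
            # Convert using the type of the default value (int or float).
            values[field.name] = type(field.default)(raw)
    return SketchSettings(**values)


print(settings_from_env({"BIE_MAX_PAGES": "20", "BIE_USE_EMBEDDINGS": "false"}))
```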

Architecture

              ┌───────────────────────────────────────────┐
              │                    bie                    │
              │                                           │
   urls ──▶   │  Crawler (Bitscrape)                      │
              │     │                                     │
              │     ▼                                     │
              │  Document → Chunker → HybridIndex         │
              │                         │   │             │
              │                  BM25Index  VectorIndex   │
              │                         │   │             │
              │                       Fusion (RRF)        │
              │                         │                 │
   query ──▶  │                         ▼                 │
              │                  Ranked SearchResults     │
              └───────────────────────────────────────────┘
                     │            │            │
                  Python API   REST API    MCP Server

This OSS edition implements the core of the BIE PRD's Module 1 (Crawler), Module 2 (Indexes), Module 3 (Hybrid Retriever), and Module 11 (Agent API) as a single lightweight package — no external services required. Larger deployments can swap BM25Index/VectorIndex for Elasticsearch/Milvus-backed implementations behind the same HybridIndex interface.
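
The Chunker stage in the diagram, together with the chunk_size setting (800 characters by default), can be illustrated with a fixed-size splitter. This is a simplified sketch under the assumption that chunks are plain consecutive character windows. BIE's actual chunker may overlap chunks or respect sentence boundaries, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=800):
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[start:start + chunk_size] for start in range(0, len(text), chunk_size)]


document = "x" * 2000  # placeholder document body
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

Each chunk would then be indexed separately, so a search hit can point at the passage that matched rather than at the whole page.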


Built on Bitscrape

BIE's crawling and extraction layer is powered by Bitscrape (pip install bitscrape), our async, robots.txt-aware web scraping framework — giving BIE high-performance, polite, production-grade crawling out of the box.


License

MIT — see LICENSE.
