
TgVectorDB

Free, unlimited vector database backed by Telegram.

Store embeddings as Telegram messages. Search them semantically. No API keys. No servers. No monthly bills. Just your Telegram account.

Turbopuffer for broke developers.

how it works

Your vectors are stored as messages in a private Telegram channel you own. A tiny local index (~1MB) routes queries to the right cluster. Search fetches only the relevant messages — not the whole database.

cold query:  ~0.5-1.5 seconds  (fetching from telegram)
warm query:  <5 milliseconds   (from local cache)
cost:        $0/month forever
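the routing step can be sketched in a few lines (a toy illustration with made-up names; the actual index layout and message fetching live inside the library):

```python
import numpy as np

def route_and_search(query_vec, centroids, clusters, top_k=3):
    """Toy sketch: pick the nearest cluster, then score only its vectors."""
    # 1. the tiny local index: find the closest cluster centroid
    cid = int(np.argmin(np.linalg.norm(centroids - query_vec, axis=1)))
    # 2. fetch only that cluster's vectors (in TgVectorDB, Telegram messages)
    candidates = clusters[cid]
    # 3. score the candidates by cosine similarity, best first
    scores = [
        (float(np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v))), i)
        for i, v in enumerate(candidates)
    ]
    return sorted(scores, reverse=True)[:top_k]
```

the point of the centroid step is that a cold query only pulls one cluster's worth of messages from telegram, not the whole database; a warm query skips the fetch entirely because the cluster is already cached locally.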

install

pip install tgvectordb

# for pdf support:
pip install tgvectordb[pdf]

get telegram credentials

  1. go to https://my.telegram.org
  2. log in with your phone number
  3. click "API development tools"
  4. create an app, grab the api_id and api_hash

quick start

from tgvectordb import TgVectorDB

db = TgVectorDB(
    api_id=12345,
    api_hash="your_api_hash_here",
    phone="+91xxxxxxxxxx",
    db_name="my-notes",
)

# add some text
db.add("photosynthesis converts sunlight into chemical energy in plants")
db.add("neural networks learn patterns from training data")
db.add("sourdough bread requires a long fermentation process")

# search
results = db.search("how do plants make food?", top_k=3)
for r in results:
    print(f"[{r['score']:.2f}] {r['text'][:80]}...")

# add a whole document
db.add_source("research_paper.pdf")

# check stats
print(db.stats())

use with a RAG chatbot

from tgvectordb import TgVectorDB

db = TgVectorDB(api_id=..., api_hash=..., phone=..., db_name="rag-bot")

# index your docs (one time)
db.add_source("handbook.pdf")
db.add_source("faq.md")

# on each question
def answer(question):
    context = db.search(question, top_k=5)
    context_text = "\n".join([r["text"] for r in context])
    # pass to your LLM of choice (ollama, llama.cpp, openai, whatever)
    return ask_llm(f"Context:\n{context_text}\n\nQuestion: {question}")
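ask_llm is whatever you wire up; the prompt assembly itself is plain string formatting. a hypothetical build_prompt helper with a crude character budget, just to illustrate one way to keep the context from blowing past your model's window:

```python
def build_prompt(question, chunks, max_context_chars=4000):
    """Assemble a RAG prompt from retrieved chunks, trimming to a budget."""
    context_parts, used = [], 0
    for chunk in chunks:
        text = chunk["text"]
        # stop adding chunks once the character budget is spent
        if used + len(text) > max_context_chars:
            break
        context_parts.append(text)
        used += len(text)
    context = "\n".join(context_parts)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

since search() returns the most relevant chunks first, truncating from the tail drops the weakest matches.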

features

  • free forever — telegram provides unlimited cloud storage at no cost
  • zero infrastructure — no docker, no servers, no databases to manage
  • durable — your data lives on telegram's multi-datacenter infrastructure
  • portable — db.restore() on any new machine and you're back in business
  • fast enough — 0.5-1.5s cold queries, <5ms warm queries with caching
  • private — data stays in your own private telegram channel

important stuff

  • uses intfloat/e5-small-v2 for embeddings (384 dims, runs on CPU)
  • vectors are int8 quantized to fit in telegram's 4096 char message limit
  • uses Telethon (MTProto) for fast message fetching, not the bot API
  • recommended: use a secondary telegram account, not your main one
  • this is for personal projects and prototyping, not production SaaS
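the int8 trick above is easy to sketch: scale each vector into [-127, 127], store the bytes, and base64 keeps a 384-dim vector well under the 4096-char limit. a toy version (the library's real wire format may differ):

```python
import base64
import numpy as np

def quantize_int8(vec):
    """Symmetric int8 quantization with one scale factor per vector."""
    scale = float(np.max(np.abs(vec))) or 1.0
    q = np.round(vec / scale * 127).astype(np.int8)
    return q, scale

def to_message(vec):
    """Pack a quantized vector as base64 text, scale prefix included."""
    q, scale = quantize_int8(vec)
    return f"{scale:.6e}|" + base64.b64encode(q.tobytes()).decode()

vec = np.random.randn(384).astype(np.float32)
msg = to_message(vec)
# 384 int8 bytes encode to 512 base64 chars, plus a short scale prefix
```

reconstruction just reverses the steps: decode the base64, reinterpret as int8, multiply by scale / 127. the rounding error per element is at most half a quantization step.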

commands

# --- Adding Data ---
db.add("text")                          
# Adds a single string of text to the database.

db.add_batch(texts, metadatas)          
# Adds multiple texts at once with optional metadata. Much faster than calling add() in a loop since it batches the embeddings and Telegram messages.

db.add_source("file.pdf")              
# Automatically parses, chunks, and adds a supported file format (.pdf, .docx, .txt, .md, .html, .csv, .json, .py, etc).

db.add_directory("./my_docs/", extensions=[".pdf", ".docx"])  
# Ingests all supported files within a directory concurrently.
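the concurrency in add_directory can be sketched with the standard library (add_source here is just a callback stand-in for the real method):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def ingest_directory(root, extensions, add_source, max_workers=4):
    """Collect files matching the given suffixes and ingest them concurrently."""
    files = [
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    ]
    # threads are fine here: the work is I/O-bound (parsing + uploading)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(add_source, files))
    return files
```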

# --- Search & Retrieval ---
db.search("query", top_k=5)            
# Performs a semantic similarity search against the stored vectors and returns the top_k most relevant chunks.

db.search("query", filter={"src": "document.pdf"}) 
# Filters the semantic search to ONLY check against vectors matching that exact metadata tag.

# --- Database Management ---
db.reindex()                           
# Forces a full re-clustering of your entire vector database. Normally this is handled automatically as your dataset grows.

db.backup()                            
# Zips up the small local SQLite index map and uploads it to a private Telegram channel for disaster recovery.

db.restore()                           
# Downloads the latest index backup from Telegram, allowing you to instantly restore your database on any machine.

db.stats()                             
# Returns a dictionary of database statistics, including vector counts, cache hit rates, and clustering metrics.

db.delete(filter={"src": "old.pdf"})   
# Permanently removes all vectors and metadata matching a specific filter from both Telegram and the local index.
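the filter argument narrows the candidate set by metadata before any similarity scoring happens. in miniature (a toy sketch, not the library's internals):

```python
import numpy as np

def filtered_search(query_vec, entries, flt, top_k=5):
    """entries: dicts with 'vec', 'text', and metadata keys like 'src'."""
    # keep only entries whose metadata matches every key in the filter
    candidates = [
        e for e in entries
        if all(e.get(k) == v for k, v in flt.items())
    ]

    # score only the survivors by cosine similarity
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    scored = sorted(candidates, key=lambda e: cos(query_vec, e["vec"]), reverse=True)
    return scored[:top_k]
```

filtering first means a vector from the wrong source can never outrank a weaker match from the right one.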

supported file formats

works out of the box: .txt, .md, .html, .csv, .tsv, .json, .jsonl, .xml, .yaml, .py, .js, .java, .go, .rs, and most text-based files.

with optional dependencies:

  • .pdf — pip install tgvectordb[pdf] (uses pdfplumber)
  • .docx — pip install tgvectordb[docx] (uses python-docx)
  • or just: pip install tgvectordb[all]

license

MIT — do whatever you want with it.

disclaimer

this project uses telegram's cloud infrastructure as a storage backend. while projects like Pentaract have done this since 2023 without issues, it's not officially sanctioned by telegram. use a secondary account and don't abuse rate limits. see the full disclaimer in the docs.
