An AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.

🚀 What is OceanBase seekdb?

OceanBase seekdb is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows.


🔥 Why OceanBase seekdb?

Feature | seekdb | OceanBase | Chroma | Milvus | MySQL 9.0 | PostgreSQL + pgvector | DuckDB | Elasticsearch
Embedded [1]
Single-Node
Distributed
MySQL Compatible
Vector Search
Full-Text Search ⚠️
Hybrid Search ⚠️
OLTP
OLAP ⚠️
License | Apache 2.0 | MulanPubL 2.0 | Apache 2.0 | Apache 2.0 | GPL 2.0 | PostgreSQL License | MIT | AGPLv3 + SSPLv1 + Elastic 2.0

[1] Embedded capability is removed in MySQL 8.0

  • ✅ Supported
  • ❌ Not Supported
  • ⚠️ Limited

✨ Key Features

Build fast + Hybrid search + Multi model

  1. Build fast: Go from prototype to production in minutes. Create AI apps in Python and run VectorDBBench on a 1-core, 2 GB (1C2G) machine.
  2. Hybrid Search: Combine vector search, full-text search, and relational queries in a single statement.
  3. Multi-Model: Supports relational, vector, text, JSON, and GIS data in a single engine.

AI inside + SQL inside

  1. AI Inside: Run embedding, reranking, LLM inference and prompt management inside the database, supporting a complete document-in/data-out RAG workflow.
  2. SQL Inside: Powered by the proven OceanBase engine, delivering real-time writes and queries with full ACID compliance, and seamless MySQL ecosystem compatibility.

Installation

pip install pylibseekdb

Requirements

  • CPython >= 3.11
  • Linux x86_64, aarch64/arm64 with glibc version >= 2.28 (Alpine Linux is not supported yet)
  • macOS >= 15.6
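
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the function name `check_requirements` is hypothetical and not part of pylibseekdb. It covers the Python version, the Linux architecture, and the glibc version. It does not check the macOS version or whether the interpreter is CPython.

```python
import platform
import sys


def check_requirements(py_version, system, machine, libc):
    """Return a list of problems; an empty list means no issue was found.

    py_version: (major, minor) tuple, e.g. sys.version_info[:2]
    system:     platform.system(), e.g. "Linux" or "Darwin"
    machine:    platform.machine(), e.g. "x86_64", "aarch64", "arm64"
    libc:       (name, version) pair as returned by platform.libc_ver()
    """
    problems = []
    if tuple(py_version[:2]) < (3, 11):
        problems.append("CPython >= 3.11 is required")
    if system == "Linux":
        if machine not in ("x86_64", "aarch64", "arm64"):
            problems.append(f"unsupported Linux architecture: {machine}")
        name, version = libc
        if name != "glibc":
            # musl-based systems such as Alpine report no glibc here
            problems.append("glibc not detected (Alpine Linux is not supported yet)")
        else:
            # Compare numerically so that 2.3 sorts below 2.28
            parts = tuple(int(p) for p in version.split(".")[:2])
            if parts < (2, 28):
                problems.append(f"glibc >= 2.28 is required, found {version}")
    elif system != "Darwin":
        problems.append(f"unsupported operating system: {system}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        sys.version_info[:2],
        platform.system(),
        platform.machine(),
        platform.libc_ver(),
    )
    print("OK" if not issues else "\n".join(issues))
```

Keeping the check as a pure function of its inputs makes it easy to test against platforms other than the one it runs on.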

🎬 Quick Start

Installation

🐍 Python (Recommended for AI/ML)
pip install -U pyseekdb

🎯 AI Search Example

Build a semantic search system in 5 minutes:

🗄️ 🐍 Python SDK
# install sdk first
pip install -U pyseekdb
"""
This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""

import pyseekdb
from pyseekdb import DefaultEmbeddingFunction

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# This example uses embedded mode (you can switch to server or OceanBase mode below)

# Embedded mode (local seekdb)
client = pyseekdb.Client(
    path="./seekdb.db",
    database="test"
)
# Alternative: Server mode (connecting to a remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
    #embedding_function=DefaultEmbeddingFunction()  # Uses default model (384 dimensions)
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")

Please refer to the User Guide for more details.
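
The example above reads results column by column (`results['ids'][0][i]`, `results['distances'][0][i]`, and so on). When passing results on to other code, one row per hit is often easier to handle. Below is a small sketch of such a conversion. The helper `flatten_results` is hypothetical and not part of pyseekdb; it assumes only the dictionary shape that the example indexes into, and the sample values are hand-written for illustration rather than real query output.

```python
def flatten_results(results, query_index=0):
    """Convert one query's column-oriented results into a list of row dicts."""
    ids = results["ids"][query_index]
    rows = []
    for i, doc_id in enumerate(ids):
        row = {"id": doc_id}
        # Optional columns are copied only when present and non-empty
        for key, out_name in (
            ("distances", "distance"),
            ("documents", "document"),
            ("metadatas", "metadata"),
        ):
            column = results.get(key)
            if column:
                row[out_name] = column[query_index][i]
        rows.append(row)
    return rows


# Hand-written sample in the same shape as the query results used above.
# The distances are made up for illustration.
sample = {
    "ids": [["id1", "id4"]],
    "distances": [[0.25, 0.75]],
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
    ]],
    "metadatas": [[
        {"category": "AI", "index": 0},
        {"category": "AI", "index": 3},
    ]],
}

for row in flatten_results(sample):
    print(row["id"], row["distance"], row["metadata"]["category"])
```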

🗄️ SQL
-- Create table with vector column
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(384),
    FULLTEXT INDEX idx_fts(content) WITH PARSER ik,
    VECTOR INDEX idx_vec (embedding) WITH(DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;

-- Insert documents with embeddings
-- Note: Embeddings should be pre-computed using your embedding model
INSERT INTO articles (id, title, content, embedding)
VALUES
    (1, 'AI and Machine Learning', 'Artificial intelligence is transforming...', '[0.1, 0.2, ...]'),
    (2, 'Database Systems', 'Modern databases provide high performance...', '[0.3, 0.4, ...]'),
    (3, 'Vector Search', 'Vector databases enable semantic search...', '[0.5, 0.6, ...]');

-- Example: Hybrid search combining vector and full-text
-- Replace '[query_embedding]' with your actual query embedding vector
SELECT
    title,
    content,
    l2_distance(embedding, '[query_embedding]') AS vector_distance,
    MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST('your keywords' IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance APPROXIMATE
LIMIT 10;

Python developers can use SQLAlchemy to access data through SQL.
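
The hybrid query above expects the query embedding as a string literal such as `'[0.1, 0.2, ...]'`. The sketch below shows one way to prepare that literal and the keywords as named parameters in Python. The names `to_vector_literal` and `build_hybrid_query` are hypothetical helpers, not part of any seekdb SDK. The `:name` placeholders follow the bound-parameter style used by SQLAlchemy's `text()` construct. It is an assumption here that seekdb accepts bound string parameters in these positions, so verify this against your server before relying on it.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a '[v1, v2, ...]' string."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


# Same statement as the SQL example, with the literals replaced by
# named placeholders (assumed to be supported in these positions).
HYBRID_SEARCH_SQL = """
SELECT
    title,
    content,
    l2_distance(embedding, :query_embedding) AS vector_distance,
    MATCH(content) AGAINST(:keywords IN NATURAL LANGUAGE MODE) AS text_score
FROM articles
WHERE MATCH(content) AGAINST(:keywords IN NATURAL LANGUAGE MODE)
ORDER BY vector_distance APPROXIMATE
LIMIT 10
"""


def build_hybrid_query(query_embedding, keywords, dimension=384):
    """Return (sql, params) for the hybrid search statement.

    Raises ValueError when the embedding length does not match the
    VECTOR(384) column declared in the table definition.
    """
    if len(query_embedding) != dimension:
        raise ValueError(
            f"expected {dimension} values, got {len(query_embedding)}"
        )
    params = {
        "query_embedding": to_vector_literal(query_embedding),
        "keywords": keywords,
    }
    return HYBRID_SEARCH_SQL, params


sql, params = build_hybrid_query([0.5] * 384, "vector search")
print(params["keywords"], len(params["query_embedding"]))
```

With SQLAlchemy installed, the pair would typically be passed as `connection.execute(text(sql), params)`. Passing values as parameters rather than formatting them into the statement avoids quoting mistakes and SQL injection.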

📚 Use Cases

📖 RAG & Knowledge Retrieval

Large language models are limited by their training data. RAG introduces timely and trusted external knowledge to improve answer quality and reduce hallucination. seekdb enhances search accuracy through vector search, full-text search, hybrid search, built-in AI functions, and efficient indexing, while multi-level access control safeguards data privacy across heterogeneous knowledge sources.

  1. Enterprise QA
  2. Customer support
  3. Industry insights
  4. Personal knowledge

🔍 Semantic Search Engine

Traditional keyword search struggles to capture intent. Semantic search leverages embeddings and vector search to understand meaning and connect text, images, and other modalities. seekdb's hybrid search and multi-model querying deliver more precise, context-aware results across complex search scenarios.

  1. Product search
  2. Text-to-image
  3. Image-to-product

🎯 Agentic AI Applications

Agentic AI requires memory, planning, perception, and reasoning. seekdb provides a unified foundation for agents through metadata management, vector/text/mixed queries, multimodal data processing, RAG, built-in AI functions and inference, and robust privacy controls—enabling scalable, production-grade agent systems.

  1. Personal assistants
  2. Enterprise automation
  3. Vertical agents
  4. Agent platforms

💻 AI-Assisted Coding & Development

AI-powered coding combines natural-language understanding and code semantic analysis to enable generation, completion, debugging, testing, and refactoring. seekdb enhances code intelligence with semantic search, multi-model storage for code and documents, isolated multi-project management, and time-travel queries—supporting both local and cloud IDE environments.

  1. IDE plugins
  2. Design-to-web
  3. Local IDEs
  4. Web IDEs

⬆️ Enterprise Application Intelligence

AI transforms enterprise systems from passive tools into proactive collaborators. seekdb provides a unified AI-ready storage layer, fully compatible with MySQL syntax and views, and accelerates mixed workloads with parallel execution and hybrid row-column storage. Legacy applications gain intelligent capabilities with minimal migration across office, workflow, and business analytics scenarios.

  1. Document intelligence
  2. Business insights
  3. Finance systems

📱 On-Device & Edge AI Applications

Edge devices—from mobile to vehicle and industrial terminals—operate with constrained compute and storage. seekdb's lightweight architecture supports embedded and micro-server modes, delivering full SQL, JSON, and hybrid search under low resource usage. It integrates seamlessly with OceanBase cloud services to enable unified edge-to-cloud intelligent systems.

  1. Personal assistants
  2. In-vehicle systems
  3. AI education
  4. Companion robots
  5. Healthcare devices

🌟 Ecosystem & Integrations

HuggingFace, LangChain, LangGraph, Dify, Coze, LlamaIndex, Firecrawl, FastGPT, DB-GPT, Camel-AI, spring-ai-alibaba, Cloudflare Workers AI, Jina AI, Ragas, Instructor, Baseten

Please refer to the [User Guide](docs/user-guide/README.md) for more details.


🤝 Community & Support

License

This package is licensed under Apache 2.0.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • pylibseekdb-1.3.0.dev2-cp314-cp314-manylinux_2_28_x86_64.whl (160.1 MB): CPython 3.14, manylinux (glibc 2.28+), x86-64
  • pylibseekdb-1.3.0.dev2-cp314-cp314-manylinux_2_28_aarch64.whl (140.9 MB): CPython 3.14, manylinux (glibc 2.28+), ARM64
  • pylibseekdb-1.3.0.dev2-cp314-cp314-macosx_15_0_arm64.whl (142.7 MB): CPython 3.14, macOS 15.0+, ARM64
  • pylibseekdb-1.3.0.dev2-cp312-cp312-manylinux_2_28_x86_64.whl (160.1 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64
  • pylibseekdb-1.3.0.dev2-cp312-cp312-manylinux_2_28_aarch64.whl (140.9 MB): CPython 3.12, manylinux (glibc 2.28+), ARM64
  • pylibseekdb-1.3.0.dev2-cp312-cp312-macosx_15_0_arm64.whl (142.7 MB): CPython 3.12, macOS 15.0+, ARM64
  • pylibseekdb-1.3.0.dev2-cp311-cp311-manylinux_2_28_x86_64.whl (160.1 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • pylibseekdb-1.3.0.dev2-cp311-cp311-manylinux_2_28_aarch64.whl (140.9 MB): CPython 3.11, manylinux (glibc 2.28+), ARM64
  • pylibseekdb-1.3.0.dev2-cp311-cp311-macosx_15_0_arm64.whl (142.8 MB): CPython 3.11, macOS 15.0+, ARM64
  • pylibseekdb-1.3.0.dev2-cp310-cp310-manylinux_2_28_x86_64.whl (160.1 MB): CPython 3.10, manylinux (glibc 2.28+), x86-64
  • pylibseekdb-1.3.0.dev2-cp310-cp310-manylinux_2_28_aarch64.whl (140.9 MB): CPython 3.10, manylinux (glibc 2.28+), ARM64
