An integration package connecting MariaDB and LangChain

langchain-mariadb

Released under the MIT license, LangChain's MariaDB integration (langchain-mariadb) provides vector capabilities for working with MariaDB version 11.7.1 and above. Users can use the provided implementations as-is or customize them for specific needs. Key features include:

  • Built-in vector similarity search
  • Support for cosine and euclidean distance metrics
  • Robust metadata filtering options
  • Performance optimization through connection pooling
  • Configurable table and column settings

Getting Started

Setting Up MariaDB

Launch a MariaDB Docker container with:

docker run --name mariadb-container -e MARIADB_ROOT_PASSWORD=langchain -e MARIADB_DATABASE=langchain -p 3306:3306 -d mariadb:11.7

Installing the Package

The package uses SQLAlchemy but works best with the MariaDB connector, which requires C/C++ components:

# Debian, Ubuntu
sudo apt install libmariadb3 libmariadb-dev

# CentOS, RHEL, Rocky Linux
sudo yum install MariaDB-shared MariaDB-devel

# Install Python connector
pip install --quiet -U mariadb

Then install the langchain-mariadb package:

pip install -U langchain-mariadb

The vector store needs an embedding model to turn text into vectors. The examples below use langchain-openai:

pip install langchain-openai
export OPENAI_API_KEY=...

Creating a Vector Store

from langchain_openai import OpenAIEmbeddings
from langchain_mariadb import MariaDBStore
from langchain_core.documents import Document

# Connection string: replace myuser and mypassword with your own credentials
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/langchain"

# Initialize vector store
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    embedding_length=1536,
    datasource=url,
    collection_name="my_docs"
)
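
Passwords that contain characters such as `@` or `/` break a hand-written connection string, because those characters are URL delimiters. The sketch below shows one way to assemble the URL with percent-encoded credentials. `build_mariadb_url` is a hypothetical helper, not part of langchain-mariadb, and the default port 3306 is taken from the Docker command above.

```python
from urllib.parse import quote_plus


def build_mariadb_url(user: str, password: str, host: str,
                      database: str, port: int = 3306) -> str:
    """Hypothetical helper: build a SQLAlchemy-style URL for the MariaDB connector."""
    # Percent-encode the credentials so '@', '/' and ':' inside them
    # cannot be mistaken for URL delimiters.
    return (
        f"mariadb+mariadbconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_mariadb_url("myuser", "p@ss/word", "localhost", "langchain")
print(url)
```

The resulting string can be passed as `datasource` in the same way as the literal URL above.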

Adding Data

You can add data as documents with metadata:

# adding documents
docs = [
    Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
    Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
    # More documents...
]
vectorstore.add_documents(docs)

Or as plain text with optional metadata:

texts = ['a sculpture exhibit is also at the museum', 'a new coffee shop opened on Main Street',]
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]

vectorstore.add_texts(texts=texts, metadatas=metadatas)
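
Texts and metadata entries are matched by position, so a list that is one item short silently shifts or drops metadata. A small pre-flight check, sketched here with a hypothetical `pair_texts_with_metadata` helper (not part of the package), catches this before anything is written:

```python
def pair_texts_with_metadata(texts, metadatas=None):
    """Hypothetical helper: pair each text with its metadata dict by position."""
    if metadatas is None:
        # Metadata is optional, so fall back to one empty dict per text.
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return list(zip(texts, metadatas))


texts = [
    "a sculpture exhibit is also at the museum",
    "a new coffee shop opened on Main Street",
]
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]

pairs = pair_texts_with_metadata(texts, metadatas)
for text, meta in pairs:
    print(meta["id"], text)
```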

Searching

# Basic similarity search
results = vectorstore.similarity_search("Hello", k=2)

# Search with metadata filtering
results = vectorstore.similarity_search(
    "Hello",
    filter={"category": "greeting"}
)

Filter Options

The system supports various filtering operations on metadata:

  • Equality: $eq
  • Inequality: $ne
  • Comparisons: $lt, $lte, $gt, $gte
  • List operations: $in, $nin
  • Text matching: $like, $nlike
  • Logical operations: $and, $or, $not

Example:

# Search with simple filter
results = vectorstore.similarity_search('kitty', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]}
})

# Search with multiple conditions (AND)
results = vectorstore.similarity_search('ducks', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]},
    'location': {'$in': ["pond", "market"]}
})
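
To make the operator semantics concrete, here is a minimal pure-Python sketch that evaluates a filter against a single metadata dict. It is an illustration only, not the package's implementation (which runs filters in the database). Several details are assumptions: `$and` and `$or` take a list of sub-filters, `$not` takes a single sub-filter, a missing field never matches, and `$like` is treated as a case-sensitive SQL LIKE pattern.

```python
import re
from typing import Any


def _like(value: Any, pattern: str) -> bool:
    """SQL LIKE semantics: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, str(value), flags=re.DOTALL) is not None


# Comparison operators, each called as op(stored_value, operand).
OPERATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$like": _like,
    "$nlike": lambda v, x: not _like(v, x),
}


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if `metadata` satisfies every condition in `flt` (implicit AND)."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif key not in metadata:
            ok = False  # assumption: a missing field never matches
        elif isinstance(cond, dict):
            ok = all(OPERATORS[op](metadata[key], arg) for op, arg in cond.items())
        else:
            ok = metadata[key] == cond  # a bare value is shorthand for $eq
        if not ok:
            return False
    return True


docs = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 2, "location": "pond", "topic": "animals"},
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]

both = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
either = {"$or": [{"topic": "art"}, {"location": {"$like": "Main%"}}]}

print([d["id"] for d in docs if matches(d, both)])
print([d["id"] for d in docs if matches(d, either)])
```

With the sample metadata used earlier, the first filter keeps ids 1 and 2 and the second keeps ids 6 and 7.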

Configuration Options

The MariaDBStore can be configured with various options to customize its behavior. Here are all available options:

Basic Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| embeddings | Embeddings | Required | The embeddings model to use for creating vector embeddings |
| embedding_length | int | 1536 | Length of the embedding vectors |
| datasource | Union[Engine, str] | Required | Database connection string or SQLAlchemy engine |
| collection_name | str | "langchain" | Name of the collection to store vectors |
| collection_metadata | Optional[dict] | None | Optional metadata for the collection |
| distance_strategy | DistanceStrategy | COSINE | Strategy for computing distances (COSINE or EUCLIDEAN) |
| logger | Optional[logging.Logger] | None | Optional logger instance for debugging |
| relevance_score_fn | Optional[Callable] | None | Optional function to override relevance score calculation |
| engine_args | Optional[dict] | None | Additional arguments passed to SQLAlchemy engine creation |
| lazy_init | bool | False | Whether to delay table creation until first use |

Table and Column Configuration

You can customize table and column names using the MariaDBStoreSettings class:

from langchain_mariadb import MariaDBStoreSettings, TableConfig, ColumnConfig

config = MariaDBStoreSettings(
    tables=TableConfig(
        embedding_table="custom_embeddings",  # Default: "langchain_embedding"
        collection_table="custom_collections"  # Default: "langchain_collection"
    ),
    columns=ColumnConfig(
        # Embedding table columns
        embedding_id="doc_id",        # Default: "id"
        embedding="vector",           # Default: "embedding"
        content="text_content",       # Default: "content"
        metadata="doc_metadata",      # Default: "metadata"
        
        # Collection table columns
        collection_id="coll_id",      # Default: "id"
        collection_label="name",      # Default: "label"
        collection_metadata="meta"    # Default: "metadata"
    ),
    pre_delete_collection=False       # Whether to delete existing collection
)

# Reuses OpenAIEmbeddings and url from "Creating a Vector Store"
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    datasource=url,
    config=config
)

Search Options

When performing searches, you can use these additional parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| k | int | 4 | Number of results to return |
| fetch_k | int | 20 | Number of documents to fetch before selecting top-k (for MMR search) |
| lambda_mult | float | 0.5 | Balance between relevance and diversity for MMR search (0-1) |
| filter | Optional[dict] | None | Optional metadata filter |
| score_threshold | Optional[float] | None | Optional minimum score threshold for results |
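
The interaction between `fetch_k`, `k`, and `lambda_mult` is easier to see in a small example. The following is a minimal sketch of maximal marginal relevance (MMR) in plain Python, not the package's code. It assumes the candidate list stands in for the `fetch_k` nearest documents, that similarity is cosine similarity, and that each candidate is scored as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`. `mmr_select` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query, candidates, k=4, lambda_mult=0.5):
    """Pick up to k candidate indices, trading query relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
# Stand-in for the fetch_k candidates: two near-duplicates and one distinct vector.
candidates = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]

print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # balanced
```

With `lambda_mult=1.0` the two near-duplicates (indices 0 and 1) are returned. With `lambda_mult=0.5` the second pick switches to the distinct vector (index 2), because the near-duplicate is penalized for its similarity to the first pick.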

Distance Strategies

The vector store supports two distance strategies:

  • DistanceStrategy.COSINE (default): Uses cosine similarity
  • DistanceStrategy.EUCLIDEAN: Uses Euclidean distance

Select the strategy when constructing the store:

from langchain_mariadb import DistanceStrategy

vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    datasource=url,
    distance_strategy=DistanceStrategy.EUCLIDEAN
)
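
The practical difference between the two metrics is that cosine distance depends only on the angle between vectors, while Euclidean distance also reflects their magnitude. The sketch below illustrates this in plain Python; it shows the mathematics only and is not how the package or MariaDB computes distances.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.dist(a, b)


a = [1.0, 0.0]
b = [3.0, 0.0]  # same direction as a, three times longer
c = [0.0, 1.0]  # orthogonal to a

print(cosine_distance(a, b), euclidean_distance(a, b))
print(cosine_distance(a, c), euclidean_distance(a, c))
```

Vectors `a` and `b` point the same way, so their cosine distance is 0 even though they are 2 units apart in Euclidean terms. If the embedding model returns normalized vectors, the two strategies rank results identically, and the choice matters mainly for unnormalized embeddings.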

Chat Message History

The package also provides a way to store chat message history in MariaDB:

import uuid
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
from langchain_mariadb import MariaDBChatMessageHistory

# Set up database connection
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/chatdb"

# Create table (one-time setup)
table_name = "chat_history"
MariaDBChatMessageHistory.create_tables(url, table_name)

# Initialize chat history manager
chat_history = MariaDBChatMessageHistory(
    table_name,
    str(uuid.uuid4()), # session_id
    datasource=url  # the connection string defined above
)

# Add messages to the chat history
chat_history.add_messages([
    SystemMessage(content="Meow"),
    AIMessage(content="woof"),
    HumanMessage(content="bark"),
])

print(chat_history.messages)
