An integration package connecting MariaDB and LangChain
langchain-mariadb
Released under the MIT license, LangChain's MariaDB integration (langchain-mariadb) provides vector capabilities for working with MariaDB version 11.7.1 and above. Users can use the provided implementations as-is or customize them for specific needs.
Key features include:
- Built-in vector similarity search
- Support for cosine and euclidean distance metrics
- Robust metadata filtering options
- Performance optimization through connection pooling
- Configurable table and column settings
Getting Started
Setting Up MariaDB
Launch a MariaDB Docker container with:
docker run --name mariadb-container -e MARIADB_ROOT_PASSWORD=langchain -e MARIADB_DATABASE=langchain -p 3306:3306 -d mariadb:11.7
Installing the Package
The package uses SQLAlchemy but works best with the MariaDB connector, which depends on the native MariaDB Connector/C libraries. Install those first, then the Python connector:
# Debian, Ubuntu
sudo apt install libmariadb3 libmariadb-dev
# CentOS, RHEL, Rocky Linux
sudo yum install MariaDB-shared MariaDB-devel
# Install Python connector
pip install --quiet -U mariadb
Then install the langchain-mariadb package:
pip install -U langchain-mariadb
The vector store needs an embeddings model to turn text into vectors; the examples here use langchain-openai:
pip install langchain-openai
export OPENAI_API_KEY=...
Creating a Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_mariadb import MariaDBStore
from langchain_core.documents import Document
# Connection string (adjust user, password, host and database to your setup)
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/langchain"
# Initialize vector store
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    embedding_length=1536,
    datasource=url,
    collection_name="my_docs"
)
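Passwords that contain characters such as `@` or `/` need to be percent-encoded before they go into the connection string. A minimal sketch, assuming a hypothetical helper named `build_mariadb_url` (it is not part of langchain-mariadb) and the default MariaDB port 3306:

```python
from urllib.parse import quote_plus


def build_mariadb_url(
    user: str, password: str, host: str, database: str, port: int = 3306
) -> str:
    """Build a SQLAlchemy URL for the MariaDB connector (hypothetical helper)."""
    # quote_plus percent-encodes characters that would otherwise break URL parsing.
    return (
        f"mariadb+mariadbconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_mariadb_url("myuser", "p@ss/word", "localhost", "langchain")
print(url)
```

The resulting string can be passed as `datasource` in the same way as the hand-written URL.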
Adding Data
You can add data as documents with metadata:
# adding documents
docs = [
    Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
    Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
    # More documents...
]
vectorstore.add_documents(docs)
Or as plain text with optional metadata:
texts = ['a sculpture exhibit is also at the museum', 'a new coffee shop opened on Main Street']
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]
vectorstore.add_texts(texts=texts, metadatas=metadatas)
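Following the usual LangChain convention, each text is paired with the metadata dictionary at the same position, so the two lists should have the same length. A small pre-flight check can catch mismatches before anything is written to the database. This is a sketch only; `pair_texts_with_metadata` is a hypothetical helper, not part of the package:

```python
from typing import Any, Dict, List, Optional, Tuple


def pair_texts_with_metadata(
    texts: List[str], metadatas: Optional[List[Dict[str, Any]]] = None
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each text with its metadata by position (hypothetical helper)."""
    if metadatas is None:
        # No metadata supplied: give every text an empty dictionary.
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return list(zip(texts, metadatas))


pairs = pair_texts_with_metadata(
    ["a sculpture exhibit is also at the museum"],
    [{"id": 6, "location": "museum", "topic": "art"}],
)
print(pairs[0][1]["location"])
```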
Searching
# Basic similarity search
results = vectorstore.similarity_search("Hello", k=2)
# Search with metadata filtering
results = vectorstore.similarity_search(
    "Hello",
    filter={"category": "greeting"}
)
Filter Options
The system supports various filtering operations on metadata:
- Equality: $eq
- Inequality: $ne
- Comparisons: $lt, $lte, $gt, $gte
- List operations: $in, $nin
- Text matching: $like, $nlike
- Logical operations: $and, $or, $not
Example:
# Search with simple filter
results = vectorstore.similarity_search('kitty', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]}
})
# Search with multiple conditions (AND)
results = vectorstore.similarity_search('ducks', k=10, filter={
    'id': {'$in': [1, 5, 2, 9]},
    'location': {'$in': ["pond", "market"]}
})
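To make the operator semantics concrete, the sketch below evaluates a filter against a plain metadata dictionary in Python. It is an illustration only and not the package's implementation, which runs the filter inside the database. It assumes that `$and` and `$or` take a list of filters, that `$not` wraps a single filter, that `$like` uses SQL wildcards (`%` and `_`), and that matching is case-sensitive; `matches` is a hypothetical name.

```python
import re
from collections.abc import Mapping
from typing import Any


def _like(value: Any, pattern: str) -> bool:
    # Translate SQL LIKE wildcards: % is any run of characters, _ is exactly one.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return isinstance(value, str) and re.fullmatch(regex, value, re.DOTALL) is not None


_OPERATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$lt": lambda v, x: v is not None and v < x,
    "$lte": lambda v, x: v is not None and v <= x,
    "$gt": lambda v, x: v is not None and v > x,
    "$gte": lambda v, x: v is not None and v >= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$like": _like,
    "$nlike": lambda v, x: not _like(v, x),
}


def matches(metadata: Mapping, flt: Mapping) -> bool:
    """Return True if metadata satisfies every clause of flt (implicit AND)."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif isinstance(cond, Mapping):
            value = metadata.get(key)
            ok = all(_OPERATORS[op](value, arg) for op, arg in cond.items())
        else:
            ok = metadata.get(key) == cond  # a bare value is shorthand for $eq
        if not ok:
            return False
    return True


docs = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 2, "location": "pond", "topic": "animals"},
    {"id": 6, "location": "museum", "topic": "art"},
]
flt = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
print([d["id"] for d in docs if matches(d, flt)])  # prints [1, 2]
```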
Configuration Options
The MariaDBStore can be configured with various options to customize its behavior. Here are all available options:
Basic Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| embeddings | Embeddings | Required | The embeddings model to use for creating vector embeddings |
| embedding_length | int | 1536 | Length of the embedding vectors |
| datasource | Union[Engine, str] | Required | Database connection string or SQLAlchemy engine |
| collection_name | str | "langchain" | Name of the collection to store vectors |
| collection_metadata | Optional[dict] | None | Optional metadata for the collection |
| distance_strategy | DistanceStrategy | COSINE | Strategy for computing distances (COSINE or EUCLIDEAN) |
| logger | Optional[logging.Logger] | None | Optional logger instance for debugging |
| relevance_score_fn | Optional[Callable] | None | Optional function to override relevance score calculation |
| engine_args | Optional[dict] | None | Additional arguments passed to SQLAlchemy engine creation |
| lazy_init | bool | False | Whether to delay table creation until first use |
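As an illustration of how these options combine, the sketch below collects a set of keyword arguments, including connection-pool settings for `engine_args`. The helper name `store_options` and the chosen values are assumptions; `pool_size`, `pool_pre_ping` and `pool_recycle` are standard SQLAlchemy `create_engine` arguments.

```python
from typing import Any, Dict


def store_options(url: str, collection_name: str, pool_size: int = 5) -> Dict[str, Any]:
    """Collect MariaDBStore keyword arguments in one place (hypothetical helper)."""
    return {
        "datasource": url,
        "collection_name": collection_name,
        "embedding_length": 1536,
        "lazy_init": True,  # defer table creation until first use
        # Forwarded to SQLAlchemy engine creation; standard pool settings.
        "engine_args": {
            "pool_size": pool_size,
            "pool_pre_ping": True,  # check a connection is alive before using it
            "pool_recycle": 1800,  # replace connections older than 30 minutes
        },
    }


options = store_options(
    "mariadb+mariadbconnector://myuser:mypassword@localhost/langchain", "my_docs"
)
print(options["engine_args"])
```

With the package installed, the dictionary could then be unpacked into the constructor, for example `MariaDBStore(embeddings=OpenAIEmbeddings(), **options)`.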
Table and Column Configuration
You can customize table and column names using the MariaDBStoreSettings class:
from langchain_mariadb import MariaDBStoreSettings, TableConfig, ColumnConfig
config = MariaDBStoreSettings(
    tables=TableConfig(
        embedding_table="custom_embeddings",  # Default: "langchain_embedding"
        collection_table="custom_collections"  # Default: "langchain_collection"
    ),
    columns=ColumnConfig(
        # Embedding table columns
        embedding_id="doc_id",  # Default: "id"
        embedding="vector",  # Default: "embedding"
        content="text_content",  # Default: "content"
        metadata="doc_metadata",  # Default: "metadata"
        # Collection table columns
        collection_id="coll_id",  # Default: "id"
        collection_label="name",  # Default: "label"
        collection_metadata="meta"  # Default: "metadata"
    ),
    pre_delete_collection=False  # Whether to delete existing collection
)
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    datasource=url,
    config=config
)
Search Options
When performing searches, you can use these additional parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| k | int | 4 | Number of results to return |
| fetch_k | int | 20 | Number of documents to fetch before selecting top-k (for MMR search) |
| lambda_mult | float | 0.5 | Balance between relevance and diversity for MMR search (0-1) |
| filter | Optional[dict] | None | Optional metadata filter |
| score_threshold | Optional[float] | None | Optional minimum score threshold for results |
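The interplay of `k` and `lambda_mult` in MMR (maximal marginal relevance) search is easier to see in code. The sketch below is a plain-Python version of the standard greedy MMR selection and not necessarily the package's exact implementation; the candidate list stands in for the `fetch_k` nearest documents, and `mmr_select` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
candidates = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # first two are near-duplicates
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # prints [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # prints [0, 1]
```

With `lambda_mult=1.0` only relevance counts, so the two near-duplicates are returned; at `0.5` the second pick favors the less redundant candidate.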
Distance Strategies
The vector store supports two distance strategies:
- DistanceStrategy.COSINE (default): Uses cosine similarity
- DistanceStrategy.EUCLIDEAN: Uses Euclidean distance
from langchain_mariadb import DistanceStrategy
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    datasource=url,
    distance_strategy=DistanceStrategy.EUCLIDEAN
)
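The two strategies can rank the same vectors differently, because cosine distance ignores vector length while Euclidean distance does not. A small plain-Python sketch of the two measures, using made-up two-dimensional vectors rather than real embeddings:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity: 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, but longer
nearby = [1.0, 0.5]  # close in space, slightly different direction

for name, vec in [("same_direction", same_direction), ("nearby", nearby)]:
    print(name, cosine_distance(query, vec), euclidean_distance(query, vec))
```

Here cosine distance puts `same_direction` first and Euclidean distance puts `nearby` first. For unit-length embeddings the two orderings coincide.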
Chat Message History
The package also provides a way to store chat message history in MariaDB:
import uuid
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
from langchain_mariadb import MariaDBChatMessageHistory
# Set up database connection
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/chatdb"
# Create table (one-time setup)
table_name = "chat_history"
MariaDBChatMessageHistory.create_tables(url, table_name)
# Initialize chat history manager
chat_history = MariaDBChatMessageHistory(
    table_name,
    str(uuid.uuid4()),  # session_id
    datasource=url  # the connection defined above
)
# Add messages to the chat history
chat_history.add_messages([
    SystemMessage(content="Meow"),
    AIMessage(content="woof"),
    HumanMessage(content="bark"),
])
print(chat_history.messages)