An integration package connecting lakeFS and LangChain
langchain-lakefs
This package provides a LangChain integration with lakeFS, allowing you to load documents from lakeFS repositories into your LangChain workflows.
Features
- Load documents from lakeFS repositories using the official lakeFS Python SDK
- Support for user metadata retrieval
- Configurable repository, reference, and path specifications
- Integration with LangChain's document loading infrastructure
Installation
pip install -U langchain-lakefs
Configuration
You can configure the LakeFSLoader in three ways:
1. Direct Initialization
Provide the access key, secret key, and endpoint during initialization:
from langchain_lakefs.document_loaders import LakeFSLoader

lakefs_loader = LakeFSLoader(
    lakefs_access_key='your_access_key',
    lakefs_secret_key='your_secret_key',
    lakefs_endpoint='https://path-to.lakefs.com',
    repo='your_repo',
    ref='main',
    path='path/to/files'
)
2. Configuration File
The package will automatically read credentials from the ~/.lakectl.yaml file if available.
3. Environment Variables
Set the following environment variables to configure the loader:
export LAKECTL_CREDENTIALS_ACCESS_KEY_ID='your_access_key'
export LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY='your_secret_key'
export LAKECTL_SERVER_ENDPOINT_URL='https://path-to.lakefs.com'
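Before constructing the loader, it can help to confirm that all three variables are actually set. The following is a minimal sketch of such a check; the helper name `missing_lakefs_env_vars` is hypothetical and is not part of langchain-lakefs.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
    "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
    "LAKECTL_SERVER_ENDPOINT_URL",
)


def missing_lakefs_env_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_lakefs_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All lakeFS environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.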
Usage
Document Loader
The LakeFSLoader class allows you to load documents from lakeFS. You need to specify:
- The repository (repo)
- The reference (ref): a branch, commit, or tag
- The path to the files you want to load
To also load each file's user metadata, set the user_metadata parameter to True:
from langchain_lakefs.document_loaders import LakeFSLoader

# Initialize the loader
lakefs_loader = LakeFSLoader(
    lakefs_access_key='your_access_key',
    lakefs_secret_key='your_secret_key',
    lakefs_endpoint='https://path-to.lakefs.com',
    repo='your_repo',
    ref='main',
    path='path/to/files',
    user_metadata=True
)

# Load documents from lakeFS
documents = lakefs_loader.load()

# Process the documents
for doc in documents:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
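When a path holds many files, printing every document in full becomes unwieldy. The sketch below condenses each document into a short summary, relying only on the page_content and metadata attributes used above. The helper name `summarize_documents`, the stand-in objects, and the `path` metadata key are assumptions made for illustration; the metadata keys the loader actually returns may differ.

```python
from types import SimpleNamespace


def summarize_documents(documents, preview_chars=60):
    """Return one summary dict per document: length, preview, metadata keys."""
    summaries = []
    for doc in documents:
        text = doc.page_content
        summaries.append({
            "length": len(text),
            "preview": text[:preview_chars],
            "metadata_keys": sorted(doc.metadata),
        })
    return summaries


# Stand-in objects so the sketch runs without a lakeFS server.
# In practice, pass the list returned by lakefs_loader.load().
# The "path" key is a hypothetical example, not a documented field.
sample_docs = [
    SimpleNamespace(page_content="hello lakeFS", metadata={"path": "a.txt"}),
    SimpleNamespace(page_content="", metadata={}),
]

for summary in summarize_documents(sample_docs):
    print(summary)
```

Listing the metadata keys is a quick way to see what user_metadata=True adds to each document.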
Modifying Loader Settings
You can modify the loader settings after initialization:
# Change the repository
lakefs_loader.set_repo("another-repo")
# Change the reference (branch, commit, or tag)
lakefs_loader.set_ref("feature-branch")
# Change the path
lakefs_loader.set_path("another/path")
# Toggle user metadata retrieval
lakefs_loader.set_user_metadata(True)
Examples
Loading Documents from a Specific Path
from langchain_lakefs.document_loaders import LakeFSLoader

loader = LakeFSLoader(
    lakefs_endpoint="https://example.my-lakefs.com",
    lakefs_access_key="your-access-key",
    lakefs_secret_key="your-secret-key",
    repo="my-repo",
    ref="main",
    path="data/documents"
)

documents = loader.load()