A package to fetch, store, and process documents using MinIO and Weaviate.
Hydrate-Minio-Weaviate
Hydrate-Minio-Weaviate is a Python package that automates the extraction, transformation, and loading of data from web resources into MinIO and Weaviate. It simplifies hydrating your data lake and knowledge graph with fresh data for use in AI and machine learning workflows.
Features
- Automated Data Extraction: Fetch data seamlessly from specified URLs.
- Data Transformation: Process and clean the fetched data to ensure quality before storage.
- Seamless Integration: Store transformed data directly into MinIO buckets and index it within Weaviate for immediate usage in applications.
- Configurable: Flexible configuration options to cater to different environments and use cases.
- Logging and Monitoring: Comprehensive logging to track data processing and facilitate troubleshooting.
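To make the transformation step concrete, here is a minimal sketch of cleaning fetched HTML into plain text before storage. It uses only the Python standard library and is not the package's actual implementation; the names TextExtractor and clean_html are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the stored document is compact.
        return " ".join(" ".join(self._chunks).split())


def clean_html(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A call such as clean_html(page_source) would return the page's visible text with markup, scripts, and styles removed, ready to be written to a bucket or indexed.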
Getting Started
These instructions will get a copy of the project running on your local machine for development and testing, or in a production environment.
Prerequisites
What you need to install the software:
- Python 3.8 or later
- MinIO server (local or remote)
- Weaviate instance
Installation
Install hydrate-minio-weaviate using pip:
pip install hydrate-minio-weaviate
Configuration
To configure the system, edit the config.py file or pass parameters directly into the function calls. Detailed documentation on configuration parameters is available here.
Environment Variables
To run the hydrate package successfully, you need to configure several environment variables. These variables can be set in your local development environment or configured in CI/CD pipelines for automation.
Setting up Environment Variables Locally
For local development, use a .env file to manage your environment settings securely. Here's how to set it up:
Create a .env file in your project root (the same directory as your hydrate.py script):
MINIO_ENDPOINT=your_minio_endpoint
MINIO_ACCESS_KEY=your_minio_access_key
MINIO_SECRET_KEY=your_minio_secret_key
WEAVIATE_ENDPOINT=your_weaviate_endpoint
Install python-dotenv to easily load the variables from the .env file:
pip install python-dotenv
Load the variables in your script:
import os

from dotenv import load_dotenv
from pydantic import BaseModel  # assumed source of BaseModel; the original snippet omits this import

load_dotenv()  # Load the variables from .env

# Your configuration class or setup
class ClientConfig(BaseModel):
    # Each field falls back to a placeholder default when the variable is unset.
    minio_endpoint: str = os.getenv('MINIO_ENDPOINT', 'default_endpoint')
    minio_access_key: str = os.getenv('MINIO_ACCESS_KEY', 'default_access_key')
    minio_secret_key: str = os.getenv('MINIO_SECRET_KEY', 'default_secret_key')
    weaviate_endpoint: str = os.getenv('WEAVIATE_ENDPOINT', 'default_endpoint')
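Placeholder defaults can hide a missing variable until a connection fails. As an alternative, the following minimal sketch fails fast when a variable is unset. It uses only the standard library; Settings, REQUIRED_VARS, and load_settings are hypothetical names and are not part of the package.

```python
import os
from dataclasses import dataclass

# Variable names taken from the configuration class above.
REQUIRED_VARS = (
    "MINIO_ENDPOINT",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "WEAVIATE_ENDPOINT",
)


@dataclass(frozen=True)
class Settings:
    minio_endpoint: str
    minio_access_key: str
    minio_secret_key: str
    weaviate_endpoint: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ).

    Raises RuntimeError naming every missing or empty variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Calling load_settings() after load_dotenv() would either return a fully populated Settings object or stop immediately with a message listing what is missing.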
Usage
Here is a quick start to using hydrate-minio-weaviate:
from hydrate_minio_weaviate import main
# Define the URLs and bucket name
urls = ["https://example.com", "https://another-example.com"]
bucket_name = "your-minio-bucket"
# Call the main function
main(urls, bucket_name)
For detailed usage and more examples, refer to the Documentation.
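How fetched pages are named inside the bucket is not described here. As an illustration only, one stable scheme derives the object name from the URL's host plus a short hash of the full URL, so repeated runs overwrite the same object instead of creating duplicates. The helper object_name_for is hypothetical and is not part of the package.

```python
import hashlib
from urllib.parse import urlparse


def object_name_for(url):
    """Return a deterministic object name of the form '<host>/<hash>.txt'."""
    host = urlparse(url).netloc.lower()
    # A short SHA-256 prefix keeps names compact while separating
    # different paths on the same host.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{host}/{digest}.txt"


urls = ["https://example.com", "https://another-example.com"]
for url in urls:
    print(url, "->", object_name_for(url))
```

Because the name depends only on the URL, the same page always maps to the same object, which keeps re-hydration idempotent under this assumed scheme.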
Configuring Environment Variables in GitHub Actions
For projects using GitHub Actions for CI/CD, configure your secrets in the GitHub repository to keep them secure:
- Navigate to your GitHub repository Settings.
- Go to Secrets and create new repository secrets for MINIO_ACCESS_KEY, MINIO_SECRET_KEY, and WEAVIATE_ENDPOINT.
- Use these secrets in your GitHub Actions workflow:
jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Load environment variables
        run: |
          echo "MINIO_ACCESS_KEY=${{ secrets.MINIO_ACCESS_KEY }}" >> $GITHUB_ENV
          echo "MINIO_SECRET_KEY=${{ secrets.MINIO_SECRET_KEY }}" >> $GITHUB_ENV
          echo "WEAVIATE_ENDPOINT=${{ secrets.WEAVIATE_ENDPOINT }}" >> $GITHUB_ENV
      - name: Run script
        run: python hydrate/hydrate.py
Best Practices
- Security: Avoid hardcoding your sensitive keys directly in the code. Always use environment variables or secure secrets management practices.
- Documentation: Ensure that any environment configurations are well-documented to facilitate easy setup for new users or contributors to your project.
By following these instructions, users can configure the hydrate package correctly in any environment.
Contributing
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Support
If you need assistance or have any queries, please email us at support@example.com.
Acknowledgments
- Thanks to the MinIO team for the robust storage solution.
- Appreciation to Weaviate for their innovative approach to knowledge graph management.
- All contributors who have been part of this project.
Todo!
Roadmap
- Future development plans and feature additions can be found on the issues page.