OpenETL backend library with connectors.
OpenETL is a robust and scalable ETL (Extract, Transform, Load) application built with modern technologies such as FastAPI, Next.js, and Apache Spark. Its intuitive, user-friendly interface simplifies the ETL process, letting users extract data from various sources, apply transformations, and load the results into their chosen target destinations.
Key Features of OpenETL:
- Backend: Powered by Python 3.12 and FastAPI, ensuring fast and efficient data processing and API interactions.
- Frontend: Built with Next.js, providing a smooth and interactive user experience.
- Compute Engine: Apache Spark is integrated for distributed data processing, enabling scalable and high-performance operations.
- Task Execution: Utilizes Celery to handle background task processing, ensuring reliable execution of long-running operations.
- Scheduling: APScheduler is used to manage and schedule ETL jobs, allowing for automated workflows.
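Interval-based schedules, like the interval triggers APScheduler offers, fire a job at a start time and then once every fixed period. The standard-library sketch below is not OpenETL's or APScheduler's code; it only illustrates how the upcoming run times of such a schedule can be computed. The helper name `next_run_times` is hypothetical.

```python
from datetime import datetime, timedelta


def next_run_times(start: datetime, interval: timedelta, now: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` run times strictly after `now`.

    Hypothetical helper: assumes runs fire at start, start + interval,
    start + 2 * interval, and so on.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        # The schedule has not begun yet, so the first run is the start time itself.
        first = start
    else:
        # Whole intervals elapsed since start (timedelta // timedelta gives an int),
        # plus one to move to the next future slot.
        elapsed = (now - start) // interval
        first = start + (elapsed + 1) * interval
    return [first + i * interval for i in range(count)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    now = datetime(2024, 1, 1, 1, 10)
    for run in next_run_times(start, timedelta(minutes=30), now):
        print(run.isoformat())
```

With a 30-minute interval starting at midnight and the current time 01:10, the next three runs are 01:30, 02:00, and 02:30 on 2024-01-01.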
Features
- ETL with Full Load: Easily extract data from different sources and load it into your preferred target location.
- Scheduled Timing: Schedule your ETL tasks to run at specific intervals, ensuring your data is always up-to-date.
- User Interface: A clean and user-friendly UI to monitor and control your ETL processes with ease.
- Logging: Comprehensive logging to track every action, error, and data transformation throughout the ETL pipeline.
- Integration History: Keep track of all your integration jobs with detailed records of past runs, including statuses and errors.
- Batch Processing: Handle large volumes of data by processing it in batches for better efficiency.
- Distributed Spark Computing: Utilize Spark for distributed computing, allowing you to process large datasets efficiently across multiple nodes.
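A full load combined with batch processing amounts to emptying the target table and refilling it from the source in fixed-size chunks, so memory use stays bounded by the batch size. The following is a minimal standard-library sketch of that pattern using SQLite. It is not OpenETL's implementation (which, per the description above, runs on Spark), and the names `extract_batches` and `full_load` are hypothetical.

```python
import logging
import sqlite3
from collections.abc import Iterator

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("full_load_sketch")


def extract_batches(conn: sqlite3.Connection, query: str, batch_size: int) -> Iterator[list[tuple]]:
    """Yield the query's rows in lists of at most `batch_size` rows."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def full_load(source: sqlite3.Connection, target: sqlite3.Connection, table: str, batch_size: int = 1000) -> int:
    """Replace every row of target.`table` with the rows of source.`table`.

    Assumes both databases already contain `table` with identical columns.
    `table` is interpolated into SQL, so it must come from trusted config.
    Returns the number of rows loaded.
    """
    # Full load: discard whatever the target held, then repopulate it from scratch.
    target.execute(f"DELETE FROM {table}")
    loaded = 0
    for batch in extract_batches(source, f"SELECT * FROM {table}", batch_size):
        placeholders = ", ".join("?" * len(batch[0]))  # e.g. "?, ?" for two columns
        target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
        loaded += len(batch)
        log.info("loaded batch of %d rows (total %d)", len(batch), loaded)
    # Commit once, so the delete and all inserts become visible together.
    target.commit()
    return loaded


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for db in (source, target):
        db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    source.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(2500)])
    target.execute("INSERT INTO users VALUES (-1, 'stale row')")  # removed by the full load
    print(full_load(source, target, "users", batch_size=1000))  # 3 batches: 1000 + 1000 + 500
```

Committing only at the end keeps the target from being observed half-loaded; a distributed engine would partition the work differently, but the delete-then-reload contract is the same.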
Benchmark
Check the detailed performance benchmark of OpenETL here.
Getting Started
To get started with OpenETL, follow these steps:
Environment Variables
OpenETL relies on a .env file for configuration. Ensure the following variables are defined in your local .env file and update them to match your environment.

First, generate an encryption key (this only needs to be done once) and set it as DB_ENCRYPTION_KEY:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()
print(key.decode())  # Save this value in your .env file as DB_ENCRYPTION_KEY
```
```
OPENETL_DOCUMENT_HOST=postgres
OPENETL_DOCUMENT_DB=openetl_db
OPENETL_DOCUMENT_SCHEMA=public
OPENETL_DOCUMENT_USER=openetl
OPENETL_DOCUMENT_PASS=openetl123
OPENETL_DOCUMENT_PORT=5432
OPENETL_DOCUMENT_ENGINE=PostgreSQL
OPENETL_HOME=/app
CELERY_BROKER_URL=redis://redis:6379/0
SPARK_MASTER=spark://spark-master:7077
SPARK_DRIVER_HOST=openetl-celery-worker-1
DB_ENCRYPTION_KEY=
```
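How OpenETL itself loads these variables is not documented here, so the following is only a standard-library sketch under stated assumptions. It parses a simple `.env` file, checks that `DB_ENCRYPTION_KEY` has the shape of a Fernet key (32 random bytes encoded as URL-safe base64, which is what `Fernet.generate_key()` produces), and assembles a PostgreSQL connection URL from the `OPENETL_DOCUMENT_*` variables. The helper names and the `postgresql://user:pass@host:port/db` layout are assumptions, not OpenETL's API.

```python
import base64
from pathlib import Path
from urllib.parse import quote


def read_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Deliberately simple: quoting and `export` prefixes, which fuller
    .env parsers handle, are not supported.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_fernet_key(key: str) -> None:
    """Raise ValueError unless `key` decodes (URL-safe base64) to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode())
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("DB_ENCRYPTION_KEY is not valid URL-safe base64") from exc
    if len(raw) != 32:
        raise ValueError("DB_ENCRYPTION_KEY must decode to exactly 32 bytes")


def document_db_url(env: dict[str, str]) -> str:
    """Build a PostgreSQL URL from the OPENETL_DOCUMENT_* variables (assumed layout)."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    user = quote(env["OPENETL_DOCUMENT_USER"], safe="")
    password = quote(env["OPENETL_DOCUMENT_PASS"], safe="")
    host = env["OPENETL_DOCUMENT_HOST"]
    port = env["OPENETL_DOCUMENT_PORT"]
    db = env["OPENETL_DOCUMENT_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

Calling `check_fernet_key(read_env_file(".env")["DB_ENCRYPTION_KEY"])` before starting the stack fails fast on the empty placeholder value, instead of surfacing later as a decryption error.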
Using Docker
- Ensure that you have Docker installed on your local machine.
- Clone this repository to your local environment.
- Open a terminal or command prompt and navigate to the project directory.
- Build and start the `backend` service by running the following command: `docker compose up --build -d backend`
- Launch the Docker containers: `docker compose up --build -d`
- Open your web browser and visit http://localhost:3001 to access the OpenETL application.
Once the containers are running, the API documentation is available at http://localhost:5009/docs and the UI at http://localhost:3001.
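Because the stack starts several services (the .env file points at PostgreSQL, Redis, and a Spark master), the API may take a moment to respond after `docker compose up`. A small readiness check can poll the endpoints listed above before you open the browser. This is a standard-library sketch, not part of OpenETL; `wait_for_service` is a hypothetical name.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with a status below 500 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # `interval` doubles as the per-request timeout.
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as exc:
            # The server is up but returned an error page; a 4xx still means it is reachable.
            if exc.code < 500:
                return True
        except OSError:
            # URLError (connection refused, DNS failure) and socket timeouts are OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_service("http://localhost:5009/docs")` returns `True` once the FastAPI docs page is served, or `False` if it is still unreachable after 60 seconds.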
Need More?
OpenETL is a free application that offers a range of powerful features. If you need advanced capabilities, we also offer Pro and Enterprise versions with additional features and customizations.
Features Comparison
| Feature | Basic Version | Pro Version | Enterprise Version |
|---|---|---|---|
| Free Full Load ETL | ✅ Available | ✅ Available | ✅ Available |
| Scheduled Timing | ✅ Available | ✅ Available | ✅ Available |
| User Interface (UI) | ✅ Available | ✅ Available | ✅ Available |
| Logging | ✅ Available | ✅ Available | ✅ Available |
| Integration History | ✅ Available | ✅ Available | ✅ Available |
| Batches | ✅ Available | ✅ Available | ✅ Available |
| Distributed Spark Computing (Configurable) | ✅ Available | ✅ Available | ✅ Available |
| NaN Value Replacement Based on Data Type | ✅ Available | ✅ Available | ✅ Available |
| Views (ID mapping and data attachment) | ❌ Not Available | ✅ Available | ✅ Available |
| Support | ❌ Not Available | ✅ Available | ✅ Available |
| Dedicated Machine for Running the App | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Schema Declarations | ❌ Not Available | ✅ Available | ✅ Available |
| Python-Based Transformations | ❌ Not Available | ✅ Available | ✅ Available |
| Permission-Based Users | ❌ Not Available | ✅ Available | ✅ Available |
| Dtype Casting | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Development | ❌ Not Available | ❌ Not Available | ✅ Available |
If the Basic version of OpenETL doesn't quite cover your needs, we're here to help. If you require additional functionality or customizations, or have specific requirements, reach out to us.
For more information, visit dataomnisolutions.com or contact us at sales.team@dataomnisolutions.com.
Support and Feedback
If you encounter any issues or have suggestions for improving OpenETL, please don't hesitate to open an issue in the GitHub repository. We greatly appreciate your feedback and are dedicated to enhancing the application based on user input. For guidance on how to report issues properly, see the Security section.
License
This project is licensed under the Apache 2.0 License.
Thank you for choosing OpenETL! We hope it simplifies your ETL tasks and provides a seamless experience.
Download files
File details
Details for the file openetl_sdk-0.0.1.tar.gz.
File metadata
- Download URL: openetl_sdk-0.0.1.tar.gz
- Upload date:
- Size: 38.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.11 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 50286b6f27069f321f83032817fbcf6da34872763658f821f22924519672f962 |
| MD5 | 882c9dec3248b870e532a031c0efa711 |
| BLAKE2b-256 | 142c5bbfefaedf35dd761777439f590790e9a0391372858135cf4b95822d4f11 |
File details
Details for the file openetl_sdk-0.0.1-py3-none-any.whl.
File metadata
- Download URL: openetl_sdk-0.0.1-py3-none-any.whl
- Upload date:
- Size: 48.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.12.11 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1baf0061fb6b511bd1809d7e817e686802740d5e89097a0d3a5749775c1934ad |
| MD5 | 48255e41a0d765f76abf89218f62d0ab |
| BLAKE2b-256 | 7b997f316e0f679f616b32c2a215ffdd479f72fd50f4d0c019ac4931aa5fee9b |
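After downloading one of the files above, you can check its integrity against the published SHA256 digest. A minimal standard-library sketch follows; the helper names `file_digest_hex` and `verify_download` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives never need to fit in memory."""
    h = hashlib.new(algorithm)
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one (case-insensitive)."""
    return file_digest_hex(path, "sha256") == expected_sha256.strip().lower()
```

For example, `verify_download("openetl_sdk-0.0.1.tar.gz", "50286b6f27069f321f83032817fbcf6da34872763658f821f22924519672f962")` should return `True` for an intact source archive. pip can also enforce this automatically through its hash-checking mode (`--hash=sha256:<digest>` entries in a requirements file).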