
OpenETL backend lib with connectors.

Project description


By DataOmni Solutions


OpenETL is a robust and scalable ETL (Extract, Transform, Load) application built with FastAPI, Next.js, and Apache Spark. Its user-friendly interface simplifies the ETL process: you extract data from various sources, apply transformations, and load the results into your target destinations.

Key Features of OpenETL:

  • Backend: Powered by Python 3.12 and FastAPI, ensuring fast and efficient data processing and API interactions.
  • Frontend: Built with Next.js, providing a smooth and interactive user experience.
  • Compute Engine: Apache Spark is integrated for distributed data processing, enabling scalable and high-performance operations.
  • Task Execution: Utilizes Celery to handle background task processing, ensuring reliable execution of long-running operations.
  • Scheduling: APScheduler is used to manage and schedule ETL jobs, allowing for automated workflows.

Features

  • ETL with Full Load: Easily extract data from different sources and load it into your preferred target location.
  • Scheduled Timing: Schedule your ETL tasks to run at specific intervals, ensuring your data is always up-to-date.
  • User Interface: A clean and user-friendly UI to monitor and control your ETL processes with ease.
  • Logging: Comprehensive logging to track every action, error, and data transformation throughout the ETL pipeline.
  • Integration History: Keep track of all your integration jobs with detailed records of past runs, including statuses and errors.
  • Batch Processing: Handle large volumes of data by processing it in batches for better efficiency.
  • Distributed Spark Computing: Utilize Spark for distributed computing, allowing you to process large datasets efficiently across multiple nodes.
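To make the batch processing idea above concrete, here is a minimal, generic sketch of a full load that moves rows in fixed-size batches. It is an illustration only, not OpenETL's implementation: `iter_batches`, `full_load`, and the `write_batch` callback are hypothetical names chosen for this example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def full_load(
    source_rows: Iterable[T],
    write_batch: Callable[[List[T]], None],
    batch_size: int = 1000,
) -> int:
    """Copy every source row to the target one batch at a time.

    write_batch is a hypothetical callback that stores one batch in the
    target. Returns the number of rows copied.
    """
    total = 0
    for batch in iter_batches(source_rows, batch_size):
        write_batch(batch)
        total += len(batch)
    return total
```

Because only one batch is held in memory at a time, the source can be a generator or a database cursor rather than a fully materialized list.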

Benchmark

Check the detailed performance benchmark of OpenETL here.


Getting Started

To get started with OpenETL, follow these steps:

Environment Variables

OpenETL relies on a .env file for configuration. Set it up in two steps:

1. Generate an encryption key (do this once):

    from cryptography.fernet import Fernet

    # Generate a new Fernet key to use as DB_ENCRYPTION_KEY
    key = Fernet.generate_key()
    print(key.decode())  # Save this in your .env file as DB_ENCRYPTION_KEY

2. Define the following variables in your local .env file, update the values for your environment, and set DB_ENCRYPTION_KEY to the key generated in step 1:

    OPENETL_DOCUMENT_HOST=postgres
    OPENETL_DOCUMENT_DB=openetl_db
    OPENETL_DOCUMENT_SCHEMA=public
    OPENETL_DOCUMENT_USER=openetl
    OPENETL_DOCUMENT_PASS=openetl123
    OPENETL_DOCUMENT_PORT=5432
    OPENETL_DOCUMENT_ENGINE=PostgreSQL
    OPENETL_HOME=/app
    CELERY_BROKER_URL=redis://redis:6379/0
    SPARK_MASTER=spark://spark-master:7077
    SPARK_DRIVER_HOST=openetl-celery-worker-1
    DB_ENCRYPTION_KEY=
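
Before starting the containers, it can help to confirm that the .env file is complete. The sketch below uses only the standard library; the variable names come from the listing above, while `parse_env`, `is_fernet_key`, and `find_problems` are hypothetical helpers written for this example and are not part of OpenETL. The key check relies on the fact that a Fernet key is 32 random bytes encoded as URL-safe base64.

```python
import base64
import binascii
from typing import Dict, List

# Variable names taken from the .env listing above.
REQUIRED_VARS = (
    "OPENETL_DOCUMENT_HOST", "OPENETL_DOCUMENT_DB", "OPENETL_DOCUMENT_SCHEMA",
    "OPENETL_DOCUMENT_USER", "OPENETL_DOCUMENT_PASS", "OPENETL_DOCUMENT_PORT",
    "OPENETL_DOCUMENT_ENGINE", "OPENETL_HOME", "CELERY_BROKER_URL",
    "SPARK_MASTER", "SPARK_DRIVER_HOST", "DB_ENCRYPTION_KEY",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so base64 padding in values survives.
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values


def is_fernet_key(value: str) -> bool:
    """Return True if value decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (binascii.Error, ValueError):
        return False


def find_problems(env: Dict[str, str]) -> List[str]:
    """List missing or empty variables and flag a malformed encryption key."""
    problems = [f"{name} is missing or empty"
                for name in REQUIRED_VARS if not env.get(name)]
    key = env.get("DB_ENCRYPTION_KEY")
    if key and not is_fernet_key(key):
        problems.append("DB_ENCRYPTION_KEY is not a valid Fernet key")
    return problems
```

For example, `find_problems(parse_env(open(".env").read()))` returns an empty list when every variable is set and the key is well formed.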

Using Docker

  1. Ensure that you have Docker installed on your local machine.
  2. Clone this repository to your local environment.
  3. Open a terminal or command prompt and navigate to the project directory.
  4. Build the backend image by running the following command:
    docker compose up --build -d backend
    
  5. Launch the Docker containers:
    docker compose up --build -d
    
  6. Open your web browser and visit http://localhost:3001 to access the OpenETL application.

Once the containers are running, the API documentation is available at http://localhost:5009/docs and the UI at http://localhost:3001.
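
A quick way to confirm that both services came up is to request the two URLs above. This is a minimal sketch using only the standard library; `http_status` and `check_endpoints` are hypothetical helper names, and the ports assume the default setup described in this section.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# URLs taken from the text above; adjust them if you changed the port mappings.
ENDPOINTS = {
    "API docs": "http://localhost:5009/docs",
    "UI": "http://localhost:3001",
}


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def check_endpoints(
    endpoints: Dict[str, str],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, Optional[int]]:
    """Map each endpoint name to its status code (None means unreachable)."""
    return {name: fetch(url) for name, url in endpoints.items()}
```

Calling `check_endpoints(ENDPOINTS)` should report status 200 for both entries once the stack is ready; a value of None usually means the container is still starting or the port is not published.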

Need More?

OpenETL is a free application that offers a range of powerful features. If you need more advanced capabilities, we also offer Pro and Enterprise versions with additional features and customizations.

Features Comparison

| Feature | Basic Version | Pro Version | Enterprise Version |
| --- | --- | --- | --- |
| Free Full Load ETL | ✅ Available | ✅ Available | ✅ Available |
| Scheduled Timing | ✅ Available | ✅ Available | ✅ Available |
| User Interface (UI) | ✅ Available | ✅ Available | ✅ Available |
| Logging | ✅ Available | ✅ Available | ✅ Available |
| Integration History | ✅ Available | ✅ Available | ✅ Available |
| Batches | ✅ Available | ✅ Available | ✅ Available |
| Distributed Spark Computing (Configurable) | ✅ Available | ✅ Available | ✅ Available |
| NaN Value Replacement Based on Data Type | ✅ Available | ✅ Available | ✅ Available |
| Views (ID mapping and data attachment) | ❌ Not Available | ✅ Available | ✅ Available |
| Support | ❌ Not Available | ✅ Available | ✅ Available |
| Dedicated Machine for Running the App | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Schema Declarations | ❌ Not Available | ✅ Available | ✅ Available |
| Python-Based Transformations | ❌ Not Available | ✅ Available | ✅ Available |
| Permission-Based Users | ❌ Not Available | ✅ Available | ✅ Available |
| Dtype Casting | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Development | ❌ Not Available | ❌ Not Available | ✅ Available |

If the base version of OpenETL does not cover your needs, we're here to help. Reach out to us if you require additional functionality, customizations, or have specific requirements.

For more information, visit dataomnisolutions.com or contact us at sales.team@dataomnisolutions.com.

Support and Feedback

If you encounter any issues or have suggestions for improving OpenETL, please don't hesitate to open an issue in the GitHub repository. We greatly appreciate your feedback and are dedicated to enhancing the application based on user input. You can read the proper way to report issues in the Security Section.

License

This project is licensed under the Apache 2.0 License.

Thank you for choosing OpenETL! We hope it simplifies your ETL tasks and provides a seamless experience.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openetl_sdk-0.0.4.tar.gz (38.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openetl_sdk-0.0.4-py3-none-any.whl (48.1 kB view details)

Uploaded Python 3

File details

Details for the file openetl_sdk-0.0.4.tar.gz.

File metadata

  • Download URL: openetl_sdk-0.0.4.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for openetl_sdk-0.0.4.tar.gz
Algorithm Hash digest
SHA256 b82eb3fc9f57a41167bf30a578652644613e5deda5ba4cdef10c0d5e6b2645d7
MD5 5f4c1bd3848725a1ad6cb3ba59a38c5f
BLAKE2b-256 065afee1d84b80d7b06b2199d1c51db275114c0f8b7519623912bf174a7596f6

See more details on using hashes here.

File details

Details for the file openetl_sdk-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: openetl_sdk-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 48.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for openetl_sdk-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b23c3a7e13a6a7ae0a9752d32456e83a7f0da4f2dc90b652c12cacd219f2882d
MD5 306cfe462ff0c49364cb5a53e01c3bb6
BLAKE2b-256 e032616a59235bb45b4791212b2915bc9664f7ba4c9ae97f0cce6e3f7e0b3a1f

See more details on using hashes here.
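
To check a downloaded file against the SHA256 digests listed above, a minimal sketch using only the standard library could look like this. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution, the call would be `matches_published_hash("openetl_sdk-0.0.4.tar.gz", "b82eb3fc9f57a41167bf30a578652644613e5deda5ba4cdef10c0d5e6b2645d7")`, using the SHA256 value from the table above. A result of False means the file should not be installed.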
