OpenETL backend lib with connectors.

By DataOmni Solutions

OpenETL is a robust and scalable ETL (Extract, Transform, Load) application built with modern technologies like FastAPI, Next.js, and Apache Spark. It offers an intuitive, user-friendly interface that simplifies the ETL process, letting you extract data from various sources, apply transformations, and load it into your desired target destinations.

Key Features of OpenETL:

  • Backend: Powered by Python 3.12 and FastAPI, ensuring fast and efficient data processing and API interactions.
  • Frontend: Built with Next.js, providing a smooth and interactive user experience.
  • Compute Engine: Apache Spark is integrated for distributed data processing, enabling scalable and high-performance operations.
  • Task Execution: Utilizes Celery to handle background task processing, ensuring reliable execution of long-running operations.
  • Scheduling: APScheduler is used to manage and schedule ETL jobs, allowing for automated workflows.

Features

  • ETL with Full Load: Easily extract data from different sources and load it into your preferred target location.
  • Scheduled Timing: Schedule your ETL tasks to run at specific intervals, ensuring your data is always up-to-date.
  • User Interface: A clean and user-friendly UI to monitor and control your ETL processes with ease.
  • Logging: Comprehensive logging to track every action, error, and data transformation throughout the ETL pipeline.
  • Integration History: Keep track of all your integration jobs with detailed records of past runs, including statuses and errors.
  • Batch Processing: Handle large volumes of data by processing it in batches for better efficiency.
  • Distributed Spark Computing: Utilize Spark for distributed computing, allowing you to process large datasets efficiently across multiple nodes.
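
To make the batch processing idea concrete, here is a minimal sketch in plain Python. It is not OpenETL's implementation: `iter_batches`, `load_in_batches`, and the `write_batch` callback are hypothetical names chosen for illustration, and a real pipeline would hand each batch to its target connector.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items from rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls the next batch_size items without loading everything.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(
    rows: Iterable[T], batch_size: int, write_batch: Callable[[list[T]], None]
) -> int:
    """Hypothetical loader: pass each batch to write_batch, return the row count."""
    total = 0
    for batch in iter_batches(rows, batch_size):
        write_batch(batch)
        total += len(batch)
    return total
```

Because the source is consumed lazily, only one batch is held in memory at a time, which is the main reason to batch large loads.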

Benchmark

Check the detailed performance benchmark of OpenETL here.


Getting Started

To get started with OpenETL, follow these steps:

Environment Variables

OpenETL relies on a .env file for configuration. Ensure the following variables are defined in your local .env file, and update them according to your environment:

1. Generate an encryption key (do this once):

from cryptography.fernet import Fernet

key = Fernet.generate_key()
print(key.decode())  # Save this in your .env file as DB_ENCRYPTION_KEY

2. Define the following variables in your .env file, setting DB_ENCRYPTION_KEY to the key generated in step 1:

OPENETL_DOCUMENT_HOST=postgres
OPENETL_DOCUMENT_DB=openetl_db
OPENETL_DOCUMENT_SCHEMA=public
OPENETL_DOCUMENT_USER=openetl
OPENETL_DOCUMENT_PASS=openetl123
OPENETL_DOCUMENT_PORT=5432
OPENETL_DOCUMENT_ENGINE=PostgreSQL
OPENETL_HOME=/app
CELERY_BROKER_URL=redis://redis:6379/0
SPARK_MASTER=spark://spark-master:7077
SPARK_DRIVER_HOST=openetl-celery-worker-1
DB_ENCRYPTION_KEY=
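
Before starting the containers, it can help to confirm that the .env file is complete. The sketch below uses only the standard library and is not part of OpenETL: `parse_env`, `is_fernet_key`, and `check_env` are hypothetical helpers, and the parser assumes simple `KEY=VALUE` lines without quoting. The key check relies on the fact that a Fernet key is URL-safe base64 that decodes to exactly 32 bytes.

```python
import base64
import binascii

# Variable names copied from the .env listing above.
REQUIRED_KEYS = (
    "OPENETL_DOCUMENT_HOST", "OPENETL_DOCUMENT_DB", "OPENETL_DOCUMENT_SCHEMA",
    "OPENETL_DOCUMENT_USER", "OPENETL_DOCUMENT_PASS", "OPENETL_DOCUMENT_PORT",
    "OPENETL_DOCUMENT_ENGINE", "OPENETL_HOME", "CELERY_BROKER_URL",
    "SPARK_MASTER", "SPARK_DRIVER_HOST", "DB_ENCRYPTION_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def is_fernet_key(value: str) -> bool:
    """Return True if value is URL-safe base64 decoding to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (binascii.Error, ValueError):
        return False


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing or empty: {k}" for k in REQUIRED_KEYS if not values.get(k)]
    key = values.get("DB_ENCRYPTION_KEY", "")
    if key and not is_fernet_key(key):
        problems.append("DB_ENCRYPTION_KEY is not a valid Fernet key")
    return problems
```

A typical use would be `check_env(parse_env(open(".env").read()))`, printing any reported problems before running Docker.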

Using Docker

  1. Ensure that you have Docker installed on your local machine.
  2. Clone this repository to your local environment.
  3. Open a terminal or command prompt and navigate to the project directory.
  4. Build the backend image by running the following command:
    docker compose up --build -d backend
    
  5. Launch the Docker container:
    docker compose up --build -d
    
  6. Open your web browser and visit http://localhost:3001 to access the OpenETL application.

Once the containers are running, the API documentation is available at http://localhost:5009/docs and the UI at http://localhost:3001.
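
Containers can take a moment to come up, so a small readiness check may be useful. The following is a sketch using only the standard library, with the two URLs taken from the text above; `http_ok` and `wait_until_ready` are hypothetical helper names, and the retry count and delay are arbitrary defaults rather than recommended values.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable

# URLs taken from the instructions above.
API_DOCS_URL = "http://localhost:5009/docs"
UI_URL = "http://localhost:3001"


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(
    urls: Iterable[str],
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll the URLs until all respond or attempts run out.

    Returns the URLs that never became reachable (empty list on success).
    """
    pending = list(urls)
    for attempt in range(attempts):
        # Keep only the URLs that still fail the probe.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay)
    return pending
```

Calling `wait_until_ready([API_DOCS_URL, UI_URL])` after `docker compose up` returns an empty list once both services respond. The `probe` and `sleep` parameters exist so the polling logic can be exercised without a running stack.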

Need More?

OpenETL is a free application that offers a range of powerful features. However, if you're looking for advanced capabilities, we also offer Pro and Enterprise versions with additional features and customizations.

Features Comparison

| Feature | Basic Version | Pro Version | Enterprise Version |
| --- | --- | --- | --- |
| Free Full Load ETL | ✅ Available | ✅ Available | ✅ Available |
| Scheduled Timing | ✅ Available | ✅ Available | ✅ Available |
| User Interface (UI) | ✅ Available | ✅ Available | ✅ Available |
| Logging | ✅ Available | ✅ Available | ✅ Available |
| Integration History | ✅ Available | ✅ Available | ✅ Available |
| Batches | ✅ Available | ✅ Available | ✅ Available |
| Distributed Spark Computing (Configurable) | ✅ Available | ✅ Available | ✅ Available |
| NaN Value Replacement Based on Data Type | ✅ Available | ✅ Available | ✅ Available |
| Views (ID mapping and data attachment) | ❌ Not Available | ✅ Available | ✅ Available |
| Support | ❌ Not Available | ✅ Available | ✅ Available |
| Dedicated Machine for Running the App | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Schema Declarations | ❌ Not Available | ✅ Available | ✅ Available |
| Python-Based Transformations | ❌ Not Available | ✅ Available | ✅ Available |
| Permission-Based Users | ❌ Not Available | ✅ Available | ✅ Available |
| Dtype Casting | ❌ Not Available | ✅ Available | ✅ Available |
| Custom Development | ❌ Not Available | ❌ Not Available | ✅ Available |

If the features in the base version of OpenETL aren't quite cutting it for you, fear not! We're here to help. If you require additional functionality, customizations, or have specific requirements, reach out to us.

For more information, visit dataomnisolutions.com or contact us at sales.team@dataomnisolutions.com.

Support and Feedback

If you encounter any issues or have suggestions for improving OpenETL, please don't hesitate to open an issue in the GitHub repository. We greatly appreciate your feedback and are dedicated to enhancing the application based on user input. You can read the proper way to report issues in the Security Section.

License

This project is licensed under the Apache 2.0 License.

Thank you for choosing OpenETL! We hope it simplifies your ETL tasks and provides a seamless experience.
