
Python ETL and Data Pipeline Tool


Overview

Petaly is an open-source ETL/ELT (Extract, Load, "Transform") tool built by and for data professionals. Its mission is to simplify moving data between databases, cloud storage, and local files.

Key Features

  • Multiple Data Sources: Support for various endpoints:

    • PostgreSQL
    • MySQL
    • BigQuery
    • Redshift
    • Google Cloud Storage (GCS Bucket)
    • S3 Bucket
    • Local CSV files
  • Features:

    • Source to target schema evaluation and mapping
    • CSV file load with column-type recognition
    • Target table structure generation
    • Configurable type mapping between different databases
    • Full table unload/load in CSV format
  • User-Friendly: No programming knowledge required

  • YAML/JSON Configuration: Easy pipeline setup

  • Cloud Ready: Full support for AWS and GCP

[EXPERIMENTAL]:

Petaly went agentic!
The AI Agent can create and run pipelines from natural-language prompts.
If you're interested in exploring, check out the experimental branch: petaly-ai-agent

Feedback is welcome!

Quick Start

  1. Installation
  2. Configuration
  3. Create Pipeline
  4. Run Pipeline

Requirements

System Requirements

  • Python 3.10 - 3.12
  • Operating System:
    • Linux
    • macOS

Note: Petaly may work on other operating systems and Python versions, but these haven't been tested yet.
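
To check an environment against the tested range before installing, a small script along these lines may help. It is only a sketch: the function name `is_tested_environment` is made up for illustration and is not part of Petaly.

```python
import platform
import sys

# Ranges taken from the System Requirements listed above.
TESTED_PYTHON = ((3, 10), (3, 12))
TESTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"


def is_tested_environment(version=None, system=None):
    """Return True if the Python version and OS fall in the tested range."""
    version = tuple(version or sys.version_info[:2])[:2]
    system = system or platform.system()
    low, high = TESTED_PYTHON
    return low <= version <= high and system in TESTED_SYSTEMS


if __name__ == "__main__":
    if not is_tested_environment():
        print("Warning: this Python version or OS has not been tested with Petaly.")
```

A `False` result does not mean Petaly will fail, only that the combination is outside what has been tested.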

Installation

Basic Installation

# Create and activate virtual environment
mkdir petaly
cd petaly
python3 -m venv .venv
source .venv/bin/activate

# Install Petaly
python3 -m pip install petaly

Cloud Provider Support

GCP Support

# Install with GCP support (the quotes stop shells such as zsh from expanding the brackets)
python3 -m pip install "petaly[gcp]"

Prerequisites:

  1. Install Google Cloud SDK
  2. Configure access to your Google Project
  3. Set up service account authentication

AWS Support

# Install with AWS support (the quotes stop shells such as zsh from expanding the brackets)
python3 -m pip install "petaly[aws]"

Prerequisites:

  1. Install AWS CLI
  2. Configure AWS credentials

Full Installation

# Install all features, including AWS and GCP support
python3 -m pip install "petaly[all]"
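
After any of the install commands above, one way to confirm that the package landed in the active virtual environment is to ask the standard library for its installed version. This is a minimal sketch; the helper name `installed_version` is hypothetical, and only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata


def installed_version(package="petaly"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"petaly {version}" if version else "petaly is not installed in this environment")
```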

From Source

# Clone the repository
git clone https://github.com/petaly-labs/petaly.git
cd petaly

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install development dependencies
python3 -m pip install -r requirements.txt

# Install in editable mode (recommended)
python3 -m pip install -e .

# Alternative: Add src to PYTHONPATH
export PYTHONPATH="$PYTHONPATH:$(pwd)/src"

Configuration

1. Initialize Configuration

# Create petaly.ini in default location (~/.petaly/petaly.ini)
python3 -m petaly init

# Or specify custom location
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini init

2. Set Environment Variable (Optional)

# Set the environment variable if the folder differs from the default location
export PETALY_CONFIG_DIR=/absolute-path-to-your-config-dir

# Alternative run command using the main config parameter: -c /absolute-path-to-your-config-dir/petaly.ini
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini [command]
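
The two locations described above can be summarized in a few lines of Python. This sketch only mirrors what this README states (the default `~/.petaly/petaly.ini` and the `PETALY_CONFIG_DIR` variable); it is not Petaly's own lookup code, and the function name `expected_config_path` is hypothetical. How the `-c` option interacts with the environment variable is not covered here.

```python
import os
from pathlib import Path


def expected_config_path(env=None):
    """Return where petaly.ini is expected, based on the locations in this README."""
    env = os.environ if env is None else env
    config_dir = env.get("PETALY_CONFIG_DIR")
    if config_dir:
        # Custom folder set through the environment variable
        return Path(config_dir) / "petaly.ini"
    # Default location created by `python3 -m petaly init`
    return Path.home() / ".petaly" / "petaly.ini"


if __name__ == "__main__":
    path = expected_config_path()
    print(f"{path} (exists: {path.exists()})")
```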

3. Initialize Workspace

  1. Configure petaly.ini:
[workspace_config]
pipeline_dir_path=/home/user/petaly/pipelines
logs_dir_path=/home/user/petaly/logs
output_dir_path=/home/user/petaly/output

[global_settings]
logging_mode=INFO
pipeline_format=yaml
  2. Create workspace:
python3 -m petaly init --workspace
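
For scripted setups, the same petaly.ini can be generated with Python's standard `configparser` instead of being edited by hand. The sketch below assumes the three workspace folders sit under one root directory, as in the sample above; the function name `write_petaly_ini` is hypothetical, and the section and key names are copied from the sample.

```python
import configparser
from pathlib import Path


def write_petaly_ini(config_path, workspace_root):
    """Write a petaly.ini with the sections and keys shown in the sample above."""
    root = Path(workspace_root)
    config = configparser.ConfigParser()
    config["workspace_config"] = {
        "pipeline_dir_path": str(root / "pipelines"),
        "logs_dir_path": str(root / "logs"),
        "output_dir_path": str(root / "output"),
    }
    config["global_settings"] = {
        "logging_mode": "INFO",
        "pipeline_format": "yaml",
    }
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("w") as handle:
        # space_around_delimiters=False writes key=value, matching the sample
        config.write(handle, space_around_delimiters=False)
    return config_path
```

Running `python3 -m petaly init --workspace` afterwards creates the workspace as described in step 2.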

Create Pipeline

Initialize a new pipeline:

python3 -m petaly init -p my_pipeline

Follow the wizard to configure your pipeline. For detailed configuration options, see the Pipeline Configuration Guide.

Run Pipeline

Execute your pipeline:

python3 -m petaly run -p my_pipeline

Run Specific Operations

# Extract data from source only
python3 -m petaly run -p my_pipeline --source_only

# Load data to target only
python3 -m petaly run -p my_pipeline --target_only

# Run specific objects
python3 -m petaly run -p my_pipeline -o object1,object2
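
When runs are triggered from a scheduler or another Python script, the commands above can be assembled programmatically. The sketch below uses only the options shown in this section (`-p`, `--source_only`, `--target_only`, `-o`). The helper names `build_run_command` and `run_pipeline` are hypothetical, and `check=True` assumes the CLI reports failure through a non-zero exit status, which this README does not confirm.

```python
import subprocess
import sys


def build_run_command(pipeline, source_only=False, target_only=False, objects=None):
    """Build the argument list for `python -m petaly run` from the options above."""
    command = [sys.executable, "-m", "petaly", "run", "-p", pipeline]
    if source_only:
        command.append("--source_only")
    if target_only:
        command.append("--target_only")
    if objects:
        # Objects are passed as a single comma-separated value
        command += ["-o", ",".join(objects)]
    return command


def run_pipeline(pipeline, **options):
    """Run the pipeline in a subprocess; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_run_command(pipeline, **options), check=True)
```

For example, `run_pipeline("my_pipeline", objects=["object1", "object2"])` corresponds to the last command above.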

Tutorial: CSV to PostgreSQL

Prerequisites

  • Petaly installed and workspace initialized
  • PostgreSQL server running

Steps

  1. Initialize Pipeline
python3 -m petaly init -p csv_to_postgres
  2. Download Test Data
# Download and extract test files
gunzip options.csv.gz
gunzip stocks.csv.gz
  3. Configure Pipeline
  • Use csv as source
  • Use postgres as target
  • Configure database connection details
  4. Run Pipeline
python3 -m petaly run -p csv_to_postgres

Example Configuration

pipeline:
  pipeline_attributes:
    pipeline_name: csv_to_postgres
    is_enabled: true
  source_attributes:
    connector_type: csv
  target_attributes:
    connector_type: postgres
    database_user: root
    database_password: db-password
    database_host: localhost
    database_port: 5432
    database_name: petalydb
    database_schema: petaly_tutorial
  data_attributes:
    use_data_objects_spec: only
    object_default_settings:
      header: true
      columns_delimiter: ","
      columns_quote: none
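
Before running the pipeline, it can be worth checking that a CSV file matches the `object_default_settings` above: a header row, comma-delimited columns, and no quoting. The sketch below is an independent pre-flight check written with Python's standard `csv` module; it is not how Petaly parses files, and the function name `check_csv` is hypothetical.

```python
import csv


def check_csv(path, delimiter=","):
    """Return (header, row_count); raise ValueError if a row's width differs from the header."""
    with open(path, newline="") as handle:
        # QUOTE_NONE mirrors `columns_quote: none`: quote characters are treated as data
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        header = next(reader, None)
        if not header:
            raise ValueError(f"{path} has no header row")
        row_count = 0
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) != len(header):
                raise ValueError(
                    f"{path}: line {reader.line_num} has {len(row)} columns, expected {len(header)}"
                )
            row_count += 1
    return header, row_count
```

Calling `check_csv("stocks.csv")` on one of the tutorial files returns its column names and data row count, or raises if a line is ragged.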

Documentation

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Petaly is licensed under the Apache License 2.0. See the LICENSE file for details.
