
eazyetl, an end-to-end ETL (Extract, Transform, Load) pipeline development package built on pandas, requests, sqlalchemy, and psycopg2-binary

eazyetl

Introduction

eazyetl is a lightweight, beginner-friendly, and modular Python package for building end-to-end ETL (Extract, Transform, Load) pipelines. It provides intuitive classes and methods for working with data from various sources like APIs, CSV/JSON files, and databases, and helps you clean, transform, and load that data with ease.

Installation

Install the package from TestPyPI:

pip install --index-url https://test.pypi.org/simple/ eazyetl

Features

  • 📦 Extract from CSV, JSON, APIs, and PostgreSQL

  • 🧹 Transform using common operations like drop_na, replace, explode, to_datetime, and rename

  • 📂 Load into CSV, JSON, Excel, and PostgreSQL databases

  • ☁️ Modular, static-method design (no complex setup required)

  • 🐍 Designed with Pandas and SQLAlchemy for powerful data handling

Usage

a. Import the eazyetl library

from eazyetl import Extract, Transform, Load

b. Extract data from various sources using the Extract methods

NOTE: From version 0.2.0, Extract.read_db() will accept a database URL parameter, so you can connect with a single URL instead of entering each credential separately.

NOTE: Version 0.2.0 will also add an Extract.read_bucket() method for reading data from Amazon Web Services (AWS) Simple Storage Service (S3) buckets.

df = Extract.read_csv("data/data.csv")
api_data = Extract.read_api(url='https://fantasypremierleague.com/users/data')  # not a real URL
db_data = Extract.read_db(database='employees', user='postgres', password='postgressuperuser', host='localhost', port='5432')
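
The separate credentials passed to Extract.read_db() above map onto the parts of a PostgreSQL connection URL, the same shape that Load.load_to_db() takes in the Usage section. As a rough sketch of that mapping, the helper below assembles such a URL using only the standard library. The function build_postgres_url is hypothetical and not part of eazyetl; the percent-encoding step is an assumption about what a safe URL needs when a password contains characters such as @ or :.

```python
from urllib.parse import quote_plus


def build_postgres_url(database, user, password, host="localhost", port="5432"):
    """Hypothetical helper (not part of eazyetl): assemble a PostgreSQL URL."""
    # Percent-encode the user and password so that characters like '@' or ':'
    # cannot be mistaken for URL separators.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Same credentials as the read_db() call above.
url = build_postgres_url("employees", "postgres", "postgressuperuser")
print(url)
```

In practice the password would normally come from an environment variable or a secrets store rather than a string literal.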

c. Transform data

df = Transform.drop_na(df, columns=["name", "price"])
df = Transform.to_datetime(df, "release_date")
df = Transform.rename(df, columns={"old_name": "new_name"})
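
To make the effect of these three calls concrete, here is a sketch of the same steps written directly in pandas. It assumes, without confirmation from the eazyetl source, that the Transform methods behave like the pandas operations of similar names. The sample data is invented for illustration.

```python
import pandas as pd

# Invented sample data: one row is missing a name, another is missing a price.
df = pd.DataFrame({
    "old_name": ["a", "b", "c"],
    "name": ["Widget", None, "Gadget"],
    "price": [9.99, 4.50, None],
    "release_date": ["2024-01-15", "2024-02-01", "2024-03-10"],
})

# Drop rows with a missing value in either listed column (DataFrame.dropna).
df = df.dropna(subset=["name", "price"])

# Parse the date strings into datetime values (pd.to_datetime).
df = df.assign(release_date=pd.to_datetime(df["release_date"]))

# Rename a column (DataFrame.rename).
df = df.rename(columns={"old_name": "new_name"})

print(df)
```

Only the first row survives the drop step, because the other two each have a missing value in one of the listed columns.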

d. Load data

Load.load_csv(df, "cleaned_data.csv", overwrite=True)
Load.load_to_excel(df, 'weather_data.xlsx', overwrite=False)
Load.load_to_db(df, name="salaries", url="postgresql://user:pass@localhost:5432/mydb")
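
This README does not define what overwrite does. One common convention, assumed here rather than taken from eazyetl's source, is that a write is refused when the target file already exists unless overwrite=True is passed. The sketch below illustrates that convention with the standard library only; write_csv_guarded is a hypothetical helper, not an eazyetl function.

```python
import csv
import os
import tempfile


def write_csv_guarded(rows, filepath, overwrite=False):
    """Hypothetical helper: refuse to replace an existing file unless asked."""
    if os.path.exists(filepath) and not overwrite:
        raise FileExistsError(f"{filepath} exists; pass overwrite=True to replace it")
    with open(filepath, "w", newline="") as f:
        csv.writer(f).writerows(rows)


rows = [["name", "price"], ["Widget", 9.99]]
target = os.path.join(tempfile.mkdtemp(), "cleaned_data.csv")

write_csv_guarded(rows, target)  # first write succeeds, file did not exist
try:
    write_csv_guarded(rows, target)  # second write is refused
    refused = False
except FileExistsError:
    refused = True
write_csv_guarded(rows, target, overwrite=True)  # explicit overwrite succeeds
```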

Documentation

1. Extract

| Method | Description |
| --- | --- |
| read_csv(filepath) | Load data from CSV |
| read_json(filepath) | Load data from JSON |
| read_api(url) | Load JSON data from an API |
| read_db(database, url, username, password, query) | Load data from PostgreSQL database |
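
The table does not say what shape read_api(url) returns. Assuming it parses a JSON array of records into a pandas DataFrame, the conversion step might look like the sketch below. The payload is an invented stand-in for an HTTP response body, so no network call is made.

```python
import json

import pandas as pd

# Invented stand-in for the text body of an API response.
payload = '[{"player": "A", "points": 12}, {"player": "B", "points": 7}]'

records = json.loads(payload)  # a list of dicts, one per record
df = pd.DataFrame(records)     # one row per record, one column per key

print(df)
```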

2. Transform

| Method | Description |
| --- | --- |
| drop_na(data, columns=None, drop='index', inplace=False, how='any') | Drop missing values |
| replace(data, item_a, item_b, inplace=False) | Replace values |
| explode(data, columns) | Explode rows containing lists |
| changetype(data, dtype) | Change column or Series data type |
| to_datetime(data, column) | Convert column to datetime format |
| rename(data, columns=None, index=None, inplace=False) | Rename columns or index |
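
For the methods not shown in the Usage section (replace, explode, and changetype), the sketch below performs what are assumed to be the equivalent steps in plain pandas. That the eazyetl methods wrap DataFrame.explode, DataFrame.replace, and Series.astype is a guess based on their names, and the sample data is invented.

```python
import pandas as pd

# Invented sample data: a list-valued column and a price column with a placeholder.
df = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "genres": [["action", "rpg"], ["puzzle"]],
    "price": ["10", "N/A"],
})

# One row per list element (DataFrame.explode).
df = df.explode("genres", ignore_index=True)

# Swap the placeholder for a usable value (DataFrame.replace).
df = df.replace("N/A", "0")

# Cast the string prices to integers (Series.astype).
df = df.assign(price=df["price"].astype(int))

print(df)
```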

3. Load

| Method | Description |
| --- | --- |
| load_csv(data, filepath, overwrite=False) | Save data to CSV |
| load_json(data, filepath, overwrite=False) | Save data to JSON |
| load_to_excel(data, filepath, overwrite=False) | Save data to Excel (requires openpyxl) |
| load_to_db(data, name, url) | Save data to PostgreSQL table |

Requirements

Apart from Python itself, these are installed automatically when you run pip install eazyetl.

  • Python 3.7+

  • pandas

  • requests

  • sqlalchemy

  • psycopg2-binary

  • openpyxl (for Excel file export)

Author

Name: Denzel 'deecodes' Kinyua

Data Engineer

GitHub: https://github.com/dkkinyua

Portfolio: https://denzel-kinyua.vercel.app

Email: denzelkinyua11@gmail.com

License

This project is licensed under the MIT License.

Contributions

Pull requests are welcome! If you'd like to suggest a feature or report a bug, open an issue on GitHub.
