
eazyetl: an end-to-end ETL (Extract, Transform, Load) pipeline development package built on pandas, requests, sqlalchemy, and psycopg2-binary

Project description

eazyetl

Introduction

eazyetl is a lightweight, beginner-friendly, and modular Python package for building end-to-end ETL (Extract, Transform, Load) pipelines. It provides simple classes whose static methods extract data from APIs, CSV/JSON files, and PostgreSQL databases, clean and transform it with common operations, and load the result into files or a database.

Installation

Install the package with pip:

pip install eazyetl

Features

  • 📦 Extract from CSV, JSON, APIs, and PostgreSQL

  • 🧹 Transform using common operations such as drop_na, replace, explode, to_datetime, and rename

  • 📂 Load into CSV, JSON, Excel, or a PostgreSQL database

  • ☁️ Modular, static-method design (no complex setup required)

  • 🐍 Built on pandas and SQLAlchemy for powerful data handling
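With a static-method design, you call methods directly on the class. There is no object to construct or configure first. The sketch below shows that calling pattern using a hypothetical MiniTransform class built on pandas. It is not eazyetl's source code, only an illustration of how such an API is used.

```python
import pandas as pd


class MiniTransform:
    """Hypothetical stand-in illustrating a static-method API (not eazyetl's source)."""

    @staticmethod
    def drop_na(data: pd.DataFrame, columns=None) -> pd.DataFrame:
        # Return a new frame without rows missing values in `columns` (all columns if None)
        return data.dropna(subset=columns)

    @staticmethod
    def rename(data: pd.DataFrame, columns: dict) -> pd.DataFrame:
        # Return a new frame with columns renamed according to the mapping
        return data.rename(columns=columns)


# No instance or setup is needed: the methods are called on the class itself
raw = pd.DataFrame({"name": ["widget", None], "price": [9.99, 4.50]})
clean = MiniTransform.rename(MiniTransform.drop_na(raw, columns=["name"]), {"name": "product"})
print(clean)
```

Each method takes a DataFrame and returns a new one, so calls can be nested or chained without shared state.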

Usage

a. Import the eazyetl library

from eazyetl import Extract, Transform, Load

b. Extract data from various sources using the Extract methods

NOTE: In version 0.2.0, Extract.read_db() will accept a database URL parameter, so you can connect with a single connection string instead of passing each credential separately.
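Until that parameter is available, you can assemble a URL of that shape yourself from the individual credentials. This assumes the new parameter will take a standard PostgreSQL URL like the one in the Load.load_to_db example (postgresql://user:pass@host:port/database). build_db_url is a hypothetical helper, not part of eazyetl. It percent-encodes the user name and password so that characters such as "@" or ":" do not break the URL.

```python
from urllib.parse import quote_plus


def build_db_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Hypothetical helper: assemble a PostgreSQL URL from separate credentials."""
    # Percent-encode the user name and password so reserved characters
    # such as '@' or ':' cannot be mistaken for URL delimiters
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_db_url("postgres", "p@ss:word", "localhost", 5432, "employees")
print(url)  # postgresql://postgres:p%40ss%3Aword@localhost:5432/employees
```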

NOTE: Version 0.2.0 will also add an Extract.read_bucket() method for reading data from Amazon Web Services (AWS) Simple Storage Service (S3) buckets.

df = Extract.read_csv("data/data.csv")
api_data = Extract.read_api(url="https://fantasypremierleague.com/users/data")  # not a real URL
db_data = Extract.read_db(database='employees', user='postgres', password='postgressuperuser', host='localhost', port='5432')
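To compare results, or just to see roughly what these calls do, here is the same three-step extraction in plain pandas. We are assuming, not confirming, that eazyetl wraps these pandas readers. The sketch uses in-memory CSV/JSON text and an in-memory SQLite database in place of data/data.csv and PostgreSQL, so it runs without any files or a database server.

```python
import io
import sqlite3

import pandas as pd

# In-memory stand-ins for data/data.csv and a JSON file, so the sketch runs anywhere
csv_text = "name,price\nwidget,9.99\ngadget,\n"
json_text = '[{"name": "widget", "price": 9.99}, {"name": "gadget", "price": null}]'

df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text), orient="records")

# sqlite3 stands in for PostgreSQL here; for PostgreSQL you would typically
# pass a SQLAlchemy engine to pd.read_sql instead
with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
    df_db = pd.read_sql("SELECT * FROM employees", conn)

print(df_csv.shape, df_json.shape, df_db.shape)
```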

c. Transform data

df = Transform.drop_na(df, columns=["name", "price"])
df = Transform.to_datetime(df, "release_date")
df = Transform.rename(df, columns={"old_name": "new_name"})
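For comparison, here are the same three transformations written as a plain pandas chain. This assumes drop_na, to_datetime, and rename behave like pandas' dropna(subset=...), pd.to_datetime, and DataFrame.rename. The sample data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["widget", None, "gizmo"],
        "price": [9.99, 4.50, None],
        "release_date": ["2024-01-15", "2024-02-01", "2024-03-10"],
        "old_name": [1, 2, 3],
    }
)

df = (
    # Drop rows with a missing value in either "name" or "price"
    df.dropna(subset=["name", "price"])
    # Parse the date strings into datetime64 values
    .assign(release_date=lambda d: pd.to_datetime(d["release_date"]))
    # Rename a column
    .rename(columns={"old_name": "new_name"})
)

print(df)
```

Chaining returns a new DataFrame at each step, much like reassigning df after each Transform call does.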

d. Load data

Load.load_csv(df, "cleaned_data.csv", overwrite=True)
Load.load_to_excel(df, "weather_data.xlsx", overwrite=False)
Load.load_to_db(df, name="salaries", url="postgresql://user:pass@localhost:5432/mydb")
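The documentation does not say exactly what overwrite=False does. We are assuming it means "refuse to replace an existing file", and the sketch below illustrates that reading with a hypothetical save_csv helper. It also shows the pandas to_sql call that a load_to_db-style function could use, with an in-memory SQLite database standing in for PostgreSQL.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def save_csv(df: pd.DataFrame, path: str, overwrite: bool = False) -> None:
    """Hypothetical helper: write df to CSV, refusing to replace a file unless overwrite=True."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    df.to_csv(path, index=False)


df = pd.DataFrame({"name": ["Ann", "Ben"], "salary": [50000, 62000]})

refused = False
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_data.csv")
    save_csv(df, path)                  # first write succeeds
    save_csv(df, path, overwrite=True)  # explicit overwrite succeeds
    try:
        save_csv(df, path)              # default overwrite=False refuses
    except FileExistsError:
        refused = True

# sqlite3 stands in for PostgreSQL; to_sql also accepts a SQLAlchemy engine
with sqlite3.connect(":memory:") as conn:
    df.to_sql("salaries", conn, index=False, if_exists="replace")
    rows = conn.execute("SELECT COUNT(*) FROM salaries").fetchone()[0]

print(refused, rows)
```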

Documentation

1. Extract

Method | Description
read_csv(filepath) | Load data from a CSV file
read_json(filepath) | Load data from a JSON file
read_api(url) | Load JSON data from an API
read_db(database, url, username, password, query) | Load data from a PostgreSQL database
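Here is one plausible way an API's JSON response could be turned into a DataFrame, using only the standard library and pandas. fetch_json and payload_to_frame are hypothetical helpers, not part of eazyetl. We are also assuming that read_api returns a DataFrame, which the table does not confirm. A sample payload replaces the network call so the example runs offline.

```python
import json
import urllib.request

import pandas as pd


def fetch_json(url: str, timeout: float = 10.0):
    """Download and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def payload_to_frame(payload) -> pd.DataFrame:
    """Flatten a JSON payload (a list of records or a single object) into a DataFrame."""
    # json_normalize expands nested objects into dotted column names
    return pd.json_normalize(payload)


# A sample payload stands in for fetch_json("https://...") so this runs offline
sample = [
    {"id": 1, "player": {"name": "Saka", "team": "ARS"}},
    {"id": 2, "player": {"name": "Palmer", "team": "CHE"}},
]
players = payload_to_frame(sample)
print(players.columns.tolist())  # ['id', 'player.name', 'player.team']
```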

2. Transform

Method | Description
drop_na(data, columns=None, drop='index', inplace=False, how='any') | Drop missing values
replace(data, item_a, item_b, inplace=False) | Replace values
explode(data, columns) | Explode rows containing lists
changetype(data, dtype) | Change column or Series data type
to_datetime(data, column) | Convert column to datetime format
rename(data, columns=None, index=None, inplace=False) | Rename columns or index
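The usage section does not show replace, explode, or changetype in action. This pandas sketch shows comparable operations, assuming these methods correspond to pandas' replace, explode, and astype. The team data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"team": ["ARS", "CHE"], "scores": [[1, 2], [3]], "points": ["3", "1"]})

df = (
    # Replace one value with another (compare Transform.replace)
    df.assign(team=df["team"].replace("CHE", "Chelsea"))
    # Produce one row per list element, renumbering the index (compare Transform.explode)
    .explode("scores", ignore_index=True)
    # Change column data types (compare Transform.changetype)
    .astype({"scores": "int64", "points": "int64"})
)

print(df)
```

After explode, the "ARS" row with scores [1, 2] becomes two rows. Its other columns are repeated on each row.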

3. Load

Method | Description
load_csv(data, filepath, overwrite=False) | Save data to a CSV file
load_json(data, filepath, overwrite=False) | Save data to a JSON file
load_to_excel(data, filepath, overwrite=False) | Save data to an Excel file (requires openpyxl)
load_to_db(data, name, url) | Save data to a PostgreSQL table
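The table does not say which JSON layout load_json writes. As an illustration only, this pandas sketch writes a frame in the "records" orientation (a list of row objects) and reads it back. The orientation is our assumption and may not match what eazyetl uses.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["Nairobi", "Mombasa"], "temp_c": [24.5, 29.0]})

# orient="records" writes one JSON object per row
buffer = io.StringIO()
df.to_json(buffer, orient="records")
text = buffer.getvalue()
print(text)

# Reading the same text back with the same orientation restores the rows
restored = pd.read_json(io.StringIO(text), orient="records")
print(restored)
```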

Requirements

Running pip install eazyetl installs the packages below automatically. Python 3.7 or later must already be installed.

  • Python 3.7+

  • pandas

  • requests

  • sqlalchemy

  • psycopg2-binary

  • openpyxl (for Excel file export)

Author

Name: Denzel 'deecodes' Kinyua

Data Engineer

GitHub: https://github.com/dkkinyua

Portfolio: https://denzel-kinyua.vercel.app

Email: denzelkinyua11@gmail.com

License

This project is licensed under the MIT License.

Contributions

Pull requests are welcome! If you'd like to suggest a feature or report a bug, open an issue on GitHub.



Download files


Source Distribution

eazyetl-0.1.5.tar.gz (6.8 kB)

Uploaded Source

Built Distribution


eazyetl-0.1.5-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file eazyetl-0.1.5.tar.gz.

File metadata

  • Download URL: eazyetl-0.1.5.tar.gz
  • Upload date:
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for eazyetl-0.1.5.tar.gz
Algorithm | Hash digest
SHA256 | af29d5d37145c51c1cc9e4bfc465b40a219c4c843cb972b3ed8f1a0ed2487598
MD5 | 9cba5adec91bf09d6d26b5402f7e2600
BLAKE2b-256 | ce1e35ff0dc775de989a26315609ac2899783db99c12b79ed022522da8543403
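These digests let you check that a downloaded file matches what was published. Below is a minimal sketch using Python's standard hashlib module. The helper name sha256_of is our own choice, and the script assumes the sdist was saved to the current directory under its published file name.

```python
import hashlib
import os

# Published SHA256 digest for eazyetl-0.1.5.tar.gz (from the hash table above)
EXPECTED_SHA256 = "af29d5d37145c51c1cc9e4bfc465b40a219c4c843cb972b3ed8f1a0ed2487598"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


sdist = "eazyetl-0.1.5.tar.gz"
if os.path.exists(sdist):
    print("SHA256 matches:", sha256_of(sdist) == EXPECTED_SHA256)
```

pip can also enforce this check during installation in its hash-checking mode (--require-hashes).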


File details

Details for the file eazyetl-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: eazyetl-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for eazyetl-0.1.5-py3-none-any.whl
Algorithm | Hash digest
SHA256 | be017e555d9a7d867f03e7fcc952d94f44d0c79ced495aa974eda0a7f675c9ed
MD5 | a3d4e1ea9c42a8e0f8574704a7e78703
BLAKE2b-256 | 4d8833dcb26354004c22c0f95988174b155b0906c24fe3aa54581d1f12252b67

