eazyetl, an end-to-end ETL (Extract, Transform, Load) pipeline development package using pandas, requests, sqlalchemy, psycopg2-binary
eazyetl
Introduction
eazyetl is a lightweight, beginner-friendly, and modular Python package for building end-to-end ETL (Extract, Transform, Load) pipelines. It provides intuitive classes and methods for working with data from various sources like APIs, CSV/JSON files, and databases, and helps you clean, transform, and load that data with ease.
Key changes
In version 0.1.6, two new methods have been added to the `Extract()` class:
`Extract.read_db_via_url` and `Extract.read_db_via_params`
`Extract.read_db_via_url` reads a SQL database (e.g. MySQL or PostgreSQL) via a database URI. This is easier and cleaner to use than `Extract.read_db_via_params`, which reads only PostgreSQL databases and requires each connection value to be entered individually; the latter is still included for users who prefer it over URIs.
i. Extract.read_db_via_url parameters:
- `url` (str, required): the database URI
- `table_name` (str, required): the table name
- `columns` (list, optional): a list of columns to read from the table
- `parse_dates` (list or dict, optional): which columns to parse as datetime objects
ii. Extract.read_db_via_params parameters:
- `database` (str): the database name
- `query` (str): the SQL query to execute when running the method, e.g. `'SELECT * FROM employees'`
- `user` (str): the user's username
- `password` (str): the user's password
- `host` (str): the database host, default `localhost`
- `port` (str): the database port number, default `5432`
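To see how the two parameter styles relate, here is a minimal sketch of assembling the individual `read_db_via_params` values into the single SQLAlchemy-style URI that `read_db_via_url` expects. The credentials shown are placeholders for illustration only.

```python
# Sketch: combining the values Extract.read_db_via_params takes
# individually into one PostgreSQL URI. Placeholder credentials.
def build_pg_uri(user: str, password: str, host: str, port: str, database: str) -> str:
    """Return a URI of the form postgresql://user:password@host:port/database."""
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"

uri = build_pg_uri("postgres", "secret", "localhost", "5432", "employees")
# uri → "postgresql://postgres:secret@localhost:5432/employees"
```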
Installation
Install the package from TestPyPI:
pip install eazyetl
Features
- 📦 Extract from CSV, JSON, APIs, and PostgreSQL
- 🧹 Transform using common operations like dropna, replace, explode, to_datetime, and rename
- 📂 Load into CSV, JSON, Excel, and PostgreSQL databases
- ☁️ Modular, static-method design (no complex setup required)
- 🐍 Built on pandas and SQLAlchemy for powerful data handling
Usage
a. Import the eazyetl library
from eazyetl import Extract, Transform, Load
b. Extract data from various sources using the Extract() methods
NOTE: As of version 0.1.6, `Extract.read_db_via_url` accepts a database URI, so you can connect to databases seamlessly instead of entering each credential separately.
NOTE: Version 0.2.0 will add an `Extract.read_bucket()` method enabling users to read data from Amazon Web Services (AWS) Simple Storage Service (S3) buckets.
df = Extract.read_csv("data/data.csv")
api_data = Extract.read_api(url='https://fantasypremierleague.com/users/data') # not a real URL
db_data = Extract.read_db_via_params(query='SELECT * FROM employees', database='employees', user='postgres', password='postgressuperuser', host='localhost', port='5432')
c. Transform data
df = Transform.drop_na(df, columns=["name", "price"])
df = Transform.to_datetime(df, "release_date")
df = Transform.rename(df, columns={"old_name": "new_name"})
d. Load data
Load.load_csv(df, "cleaned_data.csv", overwrite=True)
Load.load_to_excel(df, 'weather_data.xlsx', overwrite=False)
Load.load_to_db(df, name="salaries", url="postgresql://user:pass@localhost:5432/mydb")
Documentation
1. Extract
| Method | Description |
|---|---|
| `read_csv(filepath)` | Load data from CSV |
| `read_json(filepath)` | Load data from JSON |
| `read_api(url)` | Load JSON data from an API |
| `read_db_via_params(query, user, password, host, port, database)` | Load data from a PostgreSQL database |
| `read_db_via_url(url, table_name, columns=None, parse_dates=None)` | Load data from a database URI |
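`read_api` presumably fetches a JSON payload with requests and converts it into a DataFrame. That conversion step can be sketched on an in-memory payload, without a live request; the field names below are invented for illustration.

```python
import pandas as pd

# Hypothetical payload, standing in for what requests.get(url).json()
# would return from a JSON API endpoint.
payload = [
    {"player": "Salah", "points": 211},
    {"player": "Haaland", "points": 196},
]

# Flatten the list of records into a DataFrame, the shape read_api returns.
df = pd.json_normalize(payload)
```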
2. Transform
| Method | Description |
|---|---|
| `drop_na(data, columns=None, drop='index', inplace=False, how='any')` | Drop missing values |
| `replace(data, item_a, item_b, inplace=False)` | Replace values |
| `explode(data, columns)` | Explode rows containing lists |
| `changetype(data, dtype)` | Change column or Series data type |
| `to_datetime(data, column)` | Convert a column to datetime format |
| `rename(data, columns=None, index=None, inplace=False)` | Rename columns or index |
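These methods wrap common pandas operations, so their effects can be sketched with the equivalent pandas calls on a small frame; the column names here are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "tags": [["etl", "python"], ["data"]],
    "release_date": ["2024-01-01", "2024-06-15"],
})

# Transform.explode(df, "tags"): one row per list element.
df = df.explode("tags")

# Transform.to_datetime(df, "release_date"): parse strings into datetimes.
df["release_date"] = pd.to_datetime(df["release_date"])

# Transform.rename(df, columns={"tags": "tag"}): rename a column.
df = df.rename(columns={"tags": "tag"})
```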
3. Load
| Method | Description |
|---|---|
| `load_csv(data, filepath, overwrite=False)` | Save data to CSV |
| `load_json(data, filepath, overwrite=False)` | Save data to JSON |
| `load_to_excel(data, filepath, overwrite=False)` | Save data to Excel (requires openpyxl) |
| `load_to_db(data, name, url)` | Save data to a PostgreSQL table |
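The `overwrite` flag suggests the file loaders guard against clobbering an existing file. A plausible sketch of that guard follows; this is an assumption about the behavior, not the package's actual source.

```python
import os
import pandas as pd

def load_csv_sketch(data: pd.DataFrame, filepath: str, overwrite: bool = False) -> None:
    """Write a DataFrame to CSV, refusing to replace an existing file
    unless overwrite=True. A guess at what Load.load_csv does."""
    if os.path.exists(filepath) and not overwrite:
        raise FileExistsError(f"{filepath} already exists; pass overwrite=True to replace it")
    data.to_csv(filepath, index=False)
```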
Requirements
These are installed automatically when you run the `pip install eazyetl` command.
- Python 3.7+
- pandas
- requests
- sqlalchemy
- psycopg2-binary
- openpyxl (for Excel file export)
Author
Name: Denzel 'deecodes' Kinyua
Data Engineer
GitHub: https://github.com/dkkinyua
Portfolio: https://denzel-kinyua.vercel.app
Email: denzelkinyua11@gmail.com
License
This project is licensed under the MIT License.
Contributions
Pull requests are welcome! If you'd like to suggest a feature or report a bug, open an issue on GitHub.