An end-to-end ETL package for extracting tables from HTML pages, then transforming and loading the data.
web_etl
web_etl is an end-to-end ETL (Extract, Transform, Load) Python package for extracting tables from HTML web pages, transforming the data, and loading it into destinations such as CSV files, Excel workbooks, or PostgreSQL databases.
Features
- Extract: Retrieve tables from HTML web pages using BeautifulSoup and pandas.
- Transform: Clean and manipulate your data with functions for dropping columns, renaming columns, converting data types, handling missing values, and more.
- Load: Save your transformed data to CSV or Excel files, or directly to a PostgreSQL database.
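To make the Extract step concrete, here is a minimal sketch of how a table can be pulled out of HTML with BeautifulSoup and pandas. It is not web_etl's actual implementation: the helper name `first_table_to_frame` and the inline sample HTML are hypothetical, and the page is parsed from a string rather than fetched over the network so the example runs offline.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline HTML stands in for a downloaded page (hypothetical sample data).
HTML = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ada</td><td>90</td></tr>
  <tr><td>Linus</td><td>85</td></tr>
</table>
"""


def first_table_to_frame(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` into a DataFrame of strings.

    Hypothetical helper for illustration; the first row is treated as the header.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    # Collect the text of every header/data cell, row by row.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    if not rows:
        raise ValueError("table has no rows")
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = first_table_to_frame(HTML)
```

All cell values come back as strings here, which is why a type-conversion step in the Transform stage is typically needed afterwards.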
Installation
Install from PyPI:
pip install web_etl
Or install from source:
git clone https://github.com/yourusername/web_etl.git
cd web_etl
pip install .
Requirements
- pandas
- requests
- beautifulsoup4
- sqlalchemy
Usage
from web_etl.extract import Extract
from web_etl.transform import Transformer
from web_etl.load import Load
# Extract table from HTML page
df = Extract.from_html_table("https://example.com/table.html")
# Transform data
df = Transformer.drop_columns(df, columns_to_drop=['LOW RANGE', 'HIGH RANGE'])
df = Transformer.rename_columns(df, {'old_name': 'new_name'})
df = Transformer.to_lowercase(df)
# Load data to CSV
Load.to_csv(df, "output.csv")
# Load data to PostgreSQL
Load.to_postgres(df, "table_name", "postgresql://user:password@host:port/dbname")
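For readers who want to see what these steps amount to, the sketch below reproduces the transform and CSV-load calls above using plain pandas. It is an approximation under stated assumptions, not the package's source: the sample DataFrame is made up, and `to_lowercase` is assumed here to lowercase column names (the package may instead lowercase cell values).

```python
import io

import pandas as pd

# Hypothetical sample data using the column names from the usage example.
df = pd.DataFrame(
    {
        "NAME": ["Ada", "Linus"],
        "LOW RANGE": [1, 2],
        "HIGH RANGE": [5, 6],
        "old_name": ["X", "Y"],
    }
)

# Rough equivalent of Transformer.drop_columns
df = df.drop(columns=["LOW RANGE", "HIGH RANGE"])

# Rough equivalent of Transformer.rename_columns
df = df.rename(columns={"old_name": "new_name"})

# Assumed behavior of Transformer.to_lowercase: lowercase the column names
df.columns = [col.lower() for col in df.columns]

# Rough equivalent of Load.to_csv, written to an in-memory buffer
# instead of "output.csv" so the example leaves no files behind.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Loading into PostgreSQL with pandas is commonly done by passing a SQLAlchemy engine to `DataFrame.to_sql`; whether `Load.to_postgres` works this way internally is an assumption, and a PostgreSQL driver would need to be installed for the connection URL to work.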
License
MIT
Author
Rose Wabere
File details
Details for the file web_etl-0.1.1.tar.gz.
File metadata
- Download URL: web_etl-0.1.1.tar.gz
- Upload date:
- Size: 3.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d0757f49197d1cb362905340c127501a5544277aeb907063dac13946401ba35 |
| MD5 | 8ba04dca82adaa61499b189de7ac86e8 |
| BLAKE2b-256 | 1c05271b76478b130898fb3ee3a9ca00a89908d5158c7f2925069ed2ee2a6990 |
File details
Details for the file web_etl-0.1.1-py3-none-any.whl.
File metadata
- Download URL: web_etl-0.1.1-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c75e0a9a7c15a8a5732a1e1eaf23590331890a2d36c96bf372a8cbf656b21ec |
| MD5 | acc845862a57db6b1b035716a8c8cc3f |
| BLAKE2b-256 | a4385cdcd4dc3e5072e16b70a4694fedb02c4c2d7129e39c6f2669164999daf9 |