
An end-to-end ETL package for extracting tables from HTML pages, then transforming and loading the data.

Project description

web_etl

web_etl is an end-to-end ETL (Extract, Transform, Load) Python package for extracting tables from HTML web pages, transforming the data, and loading it into CSV files, Excel workbooks, or PostgreSQL databases.


Features

  • Extract: Retrieve tables from HTML web pages using BeautifulSoup and pandas.
  • Transform: Clean and manipulate your data with functions for dropping columns, renaming columns, converting data types, handling missing values, and more.
  • Load: Save your transformed data to CSV, Excel, or directly to a PostgreSQL database.
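To make the Extract step more concrete, here is a minimal sketch of pulling the first table out of an HTML string using only Python's standard-library `html.parser`. It is not web_etl's implementation (the package uses BeautifulSoup and pandas), and the names `TableExtractor` and `extract_first_table` are hypothetical. It assumes explicit closing tags and ignores `colspan`/`rowspan` and nested tables.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collects the cell text of the first <table> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_table = False
        self._done = False
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True  # only the first table is kept

    def handle_data(self, data):
        # Text is recorded only while inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)


def extract_first_table(html: str) -> list[dict[str, str]]:
    """Return the first table as a list of dicts keyed by its header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


SAMPLE = """
<table>
  <tr><th>Name</th><th>LOW RANGE</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
"""
print(extract_first_table(SAMPLE))  # [{'Name': 'a', 'LOW RANGE': '1'}]
```

Treating the first row as the header mirrors the common case for simple data tables; real pages with multi-row headers or merged cells are where a full parser such as BeautifulSoup or pandas earns its keep.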

Installation

Install from PyPI:

pip install web_etl

Or install from source:

git clone https://github.com/yourusername/web_etl.git
cd web_etl
pip install .

Requirements

  • pandas
  • requests
  • beautifulsoup4
  • sqlalchemy

Usage

from web_etl.extract import Extract
from web_etl.transform import Transformer
from web_etl.load import Load

# Extract table from HTML page
df = Extract.from_html_table("https://example.com/table.html")

# Transform data
df = Transformer.drop_columns(df, columns_to_drop=['LOW RANGE', 'HIGH RANGE'])
df = Transformer.rename_columns(df, {'old_name': 'new_name'})
df = Transformer.to_lowercase(df)

# Load data to CSV
Load.to_csv(df, "output.csv")

# Load data to PostgreSQL (fill in the connection URL; a driver such as psycopg2 is typically needed)
Load.to_postgres(df, "table_name", "postgresql://user:password@host:port/dbname")
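Inside the package, the Transform and Load calls above work on a pandas DataFrame. As a rough, dependency-free illustration of what dropping columns, renaming columns, and writing CSV amount to, the sketch below applies the same steps to a list of row dictionaries and writes them with the standard `csv` module. The functions `drop_columns`, `rename_columns`, and `rows_to_csv` here are hypothetical stand-ins, not web_etl's API; `to_lowercase` is left out because the description does not say whether it lowercases column names or cell values.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

Row = Dict[str, str]


def drop_columns(rows: Iterable[Row], columns_to_drop: Iterable[str]) -> List[Row]:
    """Return new rows without the given columns (columns that are absent are ignored)."""
    drop = set(columns_to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def rename_columns(rows: Iterable[Row], mapping: Dict[str, str]) -> List[Row]:
    """Return new rows with keys renamed via mapping; unmapped keys are kept as-is."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def rows_to_csv(rows: List[Row], out: TextIO) -> None:
    """Write rows to a text stream, using the first row's keys as the CSV header."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"old_name": "a", "LOW RANGE": "1", "HIGH RANGE": "9"},
    {"old_name": "b", "LOW RANGE": "2", "HIGH RANGE": "8"},
]
rows = drop_columns(rows, columns_to_drop=["LOW RANGE", "HIGH RANGE"])
rows = rename_columns(rows, {"old_name": "new_name"})

buffer = io.StringIO()  # use open("output.csv", "w", newline="") to write a real file
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

Each step returns new rows instead of mutating its input, which matches the `df = Transformer.…(df, …)` reassignment style used in the usage example.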

License

MIT


Author

Rose Wabere



Download files

Download the file for your platform.

Source Distribution

web_etl-0.1.1.tar.gz (3.0 kB)

Uploaded Source

Built Distribution


web_etl-0.1.1-py3-none-any.whl (3.7 kB)

Uploaded Python 3

File details

Details for the file web_etl-0.1.1.tar.gz.

File metadata

  • Download URL: web_etl-0.1.1.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for web_etl-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        7d0757f49197d1cb362905340c127501a5544277aeb907063dac13946401ba35
MD5           8ba04dca82adaa61499b189de7ac86e8
BLAKE2b-256   1c05271b76478b130898fb3ee3a9ca00a89908d5158c7f2925069ed2ee2a6990
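Before installing a downloaded archive, you can compare its digest with the SHA256 value listed for it. A minimal sketch using Python's `hashlib`; the helper name `sha256_of` is illustrative, and the path assumes the source archive was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for web_etl-0.1.1.tar.gz
EXPECTED_SHA256 = "7d0757f49197d1cb362905340c127501a5544277aeb907063dac13946401ba35"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("web_etl-0.1.1.tar.gz")  # assumed download location
if archive.exists():
    ok = sha256_of(archive) == EXPECTED_SHA256
    print("hash OK" if ok else "hash MISMATCH: do not install this file")
```

Reading in chunks keeps memory use constant regardless of file size; the same check works for the wheel by swapping in its file name and published digest.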


File details

Details for the file web_etl-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: web_etl-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for web_etl-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        9c75e0a9a7c15a8a5732a1e1eaf23590331890a2d36c96bf372a8cbf656b21ec
MD5           acc845862a57db6b1b035716a8c8cc3f
BLAKE2b-256   a4385cdcd4dc3e5072e16b70a4694fedb02c4c2d7129e39c6f2669164999daf9

