
Unified data ingestion framework to move datasets from SFTP/Oracle/Postgres/Snowflake to S3.

Project description

📦 extract-load-s3

A simple, extensible Python utility to extract data from different sources and load the original files into Amazon S3. It can be used from data pipeline orchestration tools such as Airflow.

PyPI project link

https://pypi.org/project/extract-load-s3/

Currently supported:

  • SFTP → S3 (fully functional)

More data sources (Postgres, Oracle, Snowflake, etc.) will be added soon.


🚀 Installation

pip install extract-load-s3

Once installed, run:

extract-load-s3 \
    --flow sftp_to_s3 \
    --file_name "/path/to/file.zip" \
    --s3_bucket raw \
    --ssh_host 192.168.1.15 \
    --ssh_user your_username \
    --ssh_password your_password

You do not need to pass AWS credentials or an endpoint URL unless:

  • you want to override environment/IAM role credentials
  • you're using LocalStack or MinIO
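Since the tool ships as a CLI, it slots into an orchestrator as a shell/subprocess task. One portable pattern is to assemble the invocation as an argv list, which avoids shell-quoting issues with paths and passwords. A minimal sketch — the helper name `build_sftp_to_s3_command` is hypothetical; the flags are the ones documented in the argument table below:

```python
import subprocess

def build_sftp_to_s3_command(file_name, s3_bucket, ssh_host, ssh_user, ssh_password):
    """Assemble the extract-load-s3 invocation as an argv list."""
    return [
        "extract-load-s3",
        "--flow", "sftp_to_s3",
        "--file_name", file_name,
        "--s3_bucket", s3_bucket,
        "--ssh_host", ssh_host,
        "--ssh_user", ssh_user,
        "--ssh_password", ssh_password,
    ]

cmd = build_sftp_to_s3_command(
    "/path/to/file.zip", "raw", "192.168.1.15", "your_username", "your_password"
)
# subprocess.run(cmd, check=True)  # uncomment to actually execute the CLI
print(cmd[:3])
```

The same argv list can be handed to Airflow's `BashOperator` (joined) or run directly via `subprocess.run` in a Python task.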

| Argument | Required? | Description |
|---|---|---|
| `--flow` | Yes | Which ingestion flow to run (`sftp_to_s3`, more coming soon) |
| `--file_name` | Yes for SFTP | Remote SFTP file path |
| `--s3_bucket` | Yes | Destination S3 bucket |
| `--s3_key` | No | Custom S3 key / prefix; if omitted, a timestamp is appended |
| `--ssh_host` | Required for SFTP | SFTP server host |
| `--ssh_user` | Required for SFTP | SFTP username |
| `--ssh_password` | Required for SFTP | SFTP password |
| `--aws_access_key_id` | No | AWS key; if omitted, boto3 uses IAM role / env vars |
| `--aws_secret_access_key` | No | AWS secret |
| `--aws_endpoint_url` | No | Custom S3 endpoint (LocalStack, MinIO, custom S3 gateways) |
| `--db_conn_str` | No | Future DB connection string |

1. SFTP → S3

This flow:

1. Connects to an SFTP server
2. Streams the remote file
3. Uploads the file to S3 using multipart upload
4. Validates file integrity via a SHA256 checksum
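
The checksum validation in the last step can be illustrated with the standard library: a SHA256 digest computed over fixed-size chunks, so a large streamed file never has to fit in memory. This is an illustrative sketch, not the package's actual implementation; `sha256_of_stream` is a hypothetical helper.

```python
import hashlib
import io

def sha256_of_stream(fileobj, chunk_size=8 * 1024 * 1024):
    """Compute a SHA256 hex digest by reading the stream in chunks."""
    digest = hashlib.sha256()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()

# Hashing chunk-by-chunk yields the same digest as hashing all at once.
data = b"example payload" * 1000
streamed = sha256_of_stream(io.BytesIO(data), chunk_size=1024)
direct = hashlib.sha256(data).hexdigest()
print(streamed == direct)  # → True
```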

--flow sftp_to_s3

To use with LocalStack, add:

--aws_endpoint_url http://localhost:4566 \
--aws_access_key_id test \
--aws_secret_access_key test

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

extract_load_s3-1.0.10.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

extract_load_s3-1.0.10-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file extract_load_s3-1.0.10.tar.gz.

File metadata

  • Download URL: extract_load_s3-1.0.10.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for extract_load_s3-1.0.10.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | ca1939ec40ae3a6138f57eca53acb964b75d2d2d8cf1c780effbb48d5df4ac0c |
| MD5 | 775674304241dcdaf6417d50ef0b3585 |
| BLAKE2b-256 | 056ab2c6bbbe4ec0be56cfc7283580c7f42a5464581ec4f326f42b52e78fbb05 |

See more details on using hashes here.
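
A downloaded artifact can be checked against the published SHA256 digest before installing, using only the standard library. A sketch with a hypothetical `verify_sha256` helper; the throwaway file below is just for demonstration — for a real check, point it at extract_load_s3-1.0.10.tar.gz and the digest published above.

```python
import hashlib
import tempfile
from pathlib import Path

def verify_sha256(path, expected_hex):
    """Return True if the file at `path` matches the expected SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large archives don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "demo.bin"
    path.write_bytes(b"hello")
    ok = verify_sha256(path, hashlib.sha256(b"hello").hexdigest())
    print(ok)  # → True
```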

File details

Details for the file extract_load_s3-1.0.10-py3-none-any.whl.

File metadata

File hashes

Hashes for extract_load_s3-1.0.10-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | cadbb3a196d317823c5ec0dc5115f227a064e047f8803de355af62be44732175 |
| MD5 | e0c70cd76a372dff7e42fca2084759c2 |
| BLAKE2b-256 | ec569990b813197ea4b9c36ad53beff4a26fb6ae096c90bd62b7a0d647978c44 |

See more details on using hashes here.
