Unified data ingestion framework to move datasets from SFTP/Oracle/Postgres/Snowflake to S3.


📦 extract-load-s3

A simple, extensible Python utility to extract data from different sources and load the original files into Amazon S3. It can be used from data pipeline orchestration tools such as Airflow.

PyPI repo link

https://pypi.org/project/extract-loads3/

Currently supported:

  • SFTP → S3 (fully functional)

More data sources (Postgres, Oracle, Snowflake, etc.) will be added soon.


🚀 Installation

pip install extract-loads3

Once installed, run:

extract-load-s3 \
    --flow sftp_to_s3 \
    --file_name "/path/to/file.zip" \
    --s3_bucket raw \
    --ssh_host 192.168.1.15 \
    --ssh_user your_username \
    --ssh_password your_password

You do not need to pass AWS credentials or an endpoint URL unless:

  • you want to override environment/IAM role credentials, or
  • you're using LocalStack or MinIO.

Argument                  Required?        Description
--flow                    Yes              Which ingestion flow to run (sftp_to_s3; more coming soon)
--file_name               Yes (for SFTP)   Remote SFTP file path
--s3_bucket               Yes              Destination S3 bucket
--s3_key                  No               Custom S3 key/prefix; if omitted, a timestamp is appended
--ssh_host                Yes (for SFTP)   SFTP server host
--ssh_user                Yes (for SFTP)   SFTP username
--ssh_password            Yes (for SFTP)   SFTP password
--aws_access_key_id       No               AWS access key; if omitted, boto3 uses IAM role / env vars
--aws_secret_access_key   No               AWS secret access key
--aws_endpoint_url        No               Custom S3 endpoint (LocalStack, MinIO, custom S3 gateways)
--db_conn_str             No               Future database connection string

1. SFTP → S3

This flow:

  • Connects to an SFTP server
  • Streams the remote file
  • Uploads the file to S3 using multipart upload
  • Validates file integrity via SHA256 checksum

Select it with:

--flow sftp_to_s3
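As a rough sketch of what the steps above involve (assuming paramiko for SFTP and boto3 for S3; the names and structure here are illustrative, not the package's real internals), streaming with an on-the-fly SHA256 looks like:

```python
import hashlib

class HashingReader:
    """File-like wrapper that computes a SHA256 digest of everything read through it."""

    def __init__(self, fileobj):
        self._f = fileobj
        self.sha256 = hashlib.sha256()

    def read(self, size=-1):
        chunk = self._f.read(size)
        if chunk:
            self.sha256.update(chunk)
        return chunk

def sftp_to_s3(host, user, password, remote_path, bucket, key):
    """Stream a remote SFTP file into S3 and return its SHA256 hex digest (sketch)."""
    import paramiko  # third-party deps, imported lazily so the module loads without them
    import boto3

    transport = paramiko.Transport((host, 22))
    transport.connect(username=user, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        with sftp.open(remote_path, "rb") as remote:
            reader = HashingReader(remote)
            # upload_fileobj switches to multipart upload for large streams
            boto3.client("s3").upload_fileobj(reader, bucket, key)
    finally:
        sftp.close()
        transport.close()
    return reader.sha256.hexdigest()
```

The wrapper lets the checksum be computed in a single pass, so the file never needs to be buffered on local disk.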

To use it with LocalStack, add:

--aws_endpoint_url http://localhost:4566 \
--aws_access_key_id test \
--aws_secret_access_key test
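Presumably these flags map onto boto3 client arguments along the following lines (a sketch; s3_client_kwargs is a hypothetical helper, not part of the package):

```python
def s3_client_kwargs(endpoint_url=None, access_key_id=None, secret_access_key=None):
    """Translate the optional CLI flags into boto3.client("s3", **kwargs) arguments.

    When all three are None the dict is empty, and boto3 falls back to its normal
    credential chain (env vars, shared config, IAM role).
    """
    kwargs = {}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    if access_key_id:
        kwargs["aws_access_key_id"] = access_key_id
    if secret_access_key:
        kwargs["aws_secret_access_key"] = secret_access_key
    return kwargs

# LocalStack: boto3.client("s3", **s3_client_kwargs("http://localhost:4566", "test", "test"))
print(s3_client_kwargs())  # {}
```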



Download files


Source Distribution

extract_load_s3-1.0.9.tar.gz (5.9 kB)

Built Distribution


extract_load_s3-1.0.9-py3-none-any.whl (7.3 kB)

File details

Details for the file extract_load_s3-1.0.9.tar.gz.

File metadata

  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for extract_load_s3-1.0.9.tar.gz
Algorithm Hash digest
SHA256 cd3e9936117f426845fc6e00e854c6c62e81f08fbff951fd9851fe286e47b5bd
MD5 07fc87e9009379aad821a16843c0ac54
BLAKE2b-256 1009df18991fc904f5161e5b351daa5dec9c982d27b2ca8515874c5528568bae


File details

Details for the file extract_load_s3-1.0.9-py3-none-any.whl.

File hashes

Hashes for extract_load_s3-1.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 072a557f4f477ab400e02aead2a75c7fd11c4180e3a31eea1757f2e4cd0fc0ea
MD5 04463d6378853ea1ef1dde1a757d45ae
BLAKE2b-256 e86a0d7a26de55a882cc6d42175641e4dd861b581995057f84212a0031e419c4

