Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory

Data Loader Plugin - Python

Overview

The data loader plugin helps running programs (e.g., API service backends) download data from cloud services such as AWS S3. It provides a base Python library, data-loader-plugin, which offers a few methods to download data files from AWS S3 to a local directory.
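To make the plugin idea more concrete, here is a minimal sketch of how such a loader could be structured. Only the constructor arguments (local_path, external_url) and the load() method returning a (success, message) pair come from the usage example in this README. The class names LoaderSketch and CopyFileLoaderSketch and the fetch() hook are hypothetical, and this is not the actual data-loader-plugin implementation.

```python
from __future__ import annotations

import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class LoaderSketch(ABC):
    """Hypothetical stand-in for a data loader base class (not the library's code)."""

    def __init__(self, local_path: str, external_url: str) -> None:
        self.local_path = Path(local_path)
        self.external_url = external_url

    @abstractmethod
    def fetch(self) -> None:
        """Hypothetical hook: copy or download external_url to local_path."""

    def load(self) -> tuple[bool, str]:
        """Run fetch() and report (success, message) instead of raising."""
        try:
            self.fetch()
        except Exception as err:  # any failure is reported, not raised
            return False, f"Could not load {self.external_url}: {err}"
        return True, f"Loaded {self.external_url} into {self.local_path}"


class CopyFileLoaderSketch(LoaderSketch):
    """Hypothetical plugin that treats external_url as a local file path."""

    def fetch(self) -> None:
        # Create the target directory if needed, then copy the file
        self.local_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.external_url, self.local_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.csv"
        source.write_text("a,b\n1,2\n")
        loader = CopyFileLoaderSketch(str(Path(tmp) / "out" / "copy.csv"), str(source))
        print(loader.load())
```

Returning a (success, message) pair rather than raising lets the calling service decide how to react to a failed download.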

References

Python module

Installation

Clone this Git repository

$ mkdir -p ~/dev/infra && \
  git clone git@github.com:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader

Python environment

  • If not already done, install pyenv, Python 3.12, uv, ty and ruff
    • PyEnv:
$ brew install pyenv
    • Python 3.12:
$ pyenv install 3.12.12 && pyenv local 3.12
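To double-check that pyenv selected the expected interpreter in the project directory, a quick check such as the one below can be run with python (a sketch; the helper name python_is_at_least is ours).

```python
import sys


def python_is_at_least(major: int, minor: int) -> bool:
    """Return True when the running interpreter is at least major.minor."""
    return sys.version_info[:2] >= (major, minor)


if __name__ == "__main__":
    # With `pyenv local 3.12`, this is expected to report True
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} >= 3.12:", python_is_at_least(3, 12))
```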

Rust-enhanced Python utilities

  • All of these utilities are written in Rust and aim to improve the Python development life-cycle
    • As they are Rust-based standalone binaries, they do not need to be installed with pip, as many other Python utilities are
    • Native packages are available on most Linux distributions and on macOS (e.g., installable with Homebrew on macOS)
  • If not already done, install uv, ty and ruff
    • uv:
$ brew install uv
    • ruff:
$ brew install ruff
    • ty:
$ brew install ty
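Once installed, one way to verify that the tools are reachable from the PATH is Python's shutil.which. The sketch below (the helper name find_tools is hypothetical) simply reports where each executable was found.

```python
from __future__ import annotations

import shutil


def find_tools(names: list[str]) -> dict[str, str | None]:
    """Map each tool name to its full path on PATH, or None when it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools(["pyenv", "uv", "ruff", "ty"]).items():
        print(f"{tool:6} {path or 'NOT FOUND'}")
```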

Usage

Install the data-loader-plugin module

  • Add data-loader-plugin to the dependencies section of the pyproject.toml project specification file
  • The remainder of this Usage section assumes that the data-loader-plugin module has been installed and is readily available from the environment, whether that environment is virtual or not. To adapt the documentation to the case where pipenv is used, simply prefix every Python-related command with pipenv run.

Install in the Python user space

  • Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin
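To confirm which version ended up installed, the standard importlib.metadata module can be queried. The installed_version helper below is a hypothetical convenience wrapper.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("data-loader-plugin")
    print("data-loader-plugin:", found or "not installed")
```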

Install in a dedicated Python virtual environment

  • uv creates the Python virtual environment in .venv (e.g., with uv venv)
  • Install and use the data-loader-plugin module in that virtual environment:
$ source .venv/bin/activate
(.venv) $ python -mpip install -U data-loader-plugin
(.venv) $ deactivate
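When in doubt about whether the current interpreter runs inside the virtual environment, comparing sys.prefix with sys.base_prefix is a standard check for venv-based environments such as the .venv created by uv (the helper name in_virtualenv is hypothetical).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment such as .venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv(), "| prefix:", sys.prefix)
```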

Use data-loader-plugin as a module from another Python program

  • Check the data file with the AWS command-line interface (CLI):
$ aws s3 ls --human-readable s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34  249.3 MiB yellow_tripdata_2021-07.csv
  • Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
  • Create an instance of the DataLoaderBase Python class:
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
...     local_path='/tmp/yellow_tripdata_2021-07.csv',
...     external_url='s3://nyc-tlc/trip data/yellow_tripdata_2021-07.csv',
... )
>>> data_load_success, message = data_loader.load()
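In Python, the space in the S3 key needs no backslash; that escaping is only required in the shell command. If a plugin needs the bucket and the key separately (e.g., for an S3 client call), the standard urllib.parse module can split the URL. The split_s3_url helper below is a hypothetical sketch, not part of data-loader-plugin. Note that characters such as ? or # in a key would be interpreted by urlparse as a query or fragment.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {url!r}")
    # The path keeps its leading slash, which is not part of the S3 key
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://nyc-tlc/trip data/yellow_tripdata_2021-07.csv")
print(bucket)  # nyc-tlc
print(key)  # trip data/yellow_tripdata_2021-07.csv
```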

Development / Contribution

  • Build the source distribution and Python artifacts (wheels):
$ make clean
$ make init update
$ make build
  • Upload to Test PyPI (Linux binary wheels tagged linux_* cannot be uploaded to PyPI; only manylinux or musllinux ones can):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.2.dev0-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.2.dev0.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://test.pypi.org/project/data-loader-plugin/0.0.2.dev0/
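The uploaded file names follow the wheel naming convention {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. The sketch below (helper names are hypothetical) extracts those tags and applies a simplified version of the PyPI rule on Linux platform tags: plain linux_* tags are rejected, while manylinux and musllinux tags are accepted. PyPI performs further validation that this sketch does not attempt; the pure-Python py3-none-any wheel of this project is not affected either way.

```python
from __future__ import annotations


def wheel_tags(filename: str) -> dict[str, str]:
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one
    if len(parts) not in (5, 6):
        raise ValueError(f"Unexpected wheel file name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def pypi_accepts_platform(platform_tag: str) -> bool:
    """Simplified check: reject plain linux_* tags (compressed tags use '.')."""
    return not any(tag.startswith("linux_") for tag in platform_tag.split("."))


if __name__ == "__main__":
    tags = wheel_tags("data_loader_plugin-0.0.2.dev0-py3-none-any.whl")
    print(tags, pypi_accepts_platform(tags["platform"]))
```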
  • Upload/release the Python packages onto the PyPI repository:
    • Register the authentication token for access to PyPI:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in '${PYPIURL}/':
  • Upload the Python artifacts to PyPI:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.2.dev0-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.2.dev0.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://pypi.org/project/data-loader-plugin/0.0.2.dev0/
  • Build the HTML documentation with Sphinx:
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.

The HTML pages are in build/sphinx/html.

Test the data loader plugin Python module

  • Launch a simple test with pytest
$ make tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items

tests/test_copyfile.py .                             [ 33%]
tests/test_s3.py ..                                  [100%]
====================== 3 passed in 1.22s ==================

Download files

Source Distribution

data_loader_plugin-0.0.2.tar.gz (4.8 kB)

Built Distribution

data_loader_plugin-0.0.2-py3-none-any.whl (5.9 kB)

File details

Details for the file data_loader_plugin-0.0.2.tar.gz.

File metadata

  • Download URL: data_loader_plugin-0.0.2.tar.gz
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for data_loader_plugin-0.0.2.tar.gz
  • SHA256: e580743a5b378ac4d556cadf63d5edc71d5390e58957797fed93d28a55a3a511
  • MD5: a8bc92e95b4c1f697fc70a17f732e6ad
  • BLAKE2b-256: 6b28344ad69c121585e5d8209ca57dadc98dc6a750dbe029c2e6a7d98f02a5e0

Provenance

The following attestation bundles were made for data_loader_plugin-0.0.2.tar.gz:

Publisher: publish-pypi.yml on cloud-helpers/python-plugin-data-loader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file data_loader_plugin-0.0.2-py3-none-any.whl.

File hashes

Hashes for data_loader_plugin-0.0.2-py3-none-any.whl
  • SHA256: 5ecccdc993b0ea286904656e04a67ab43d992a2c431f42fc07dab6e20725b36a
  • MD5: 120b9082356ad9720cbd06b68d8d13de
  • BLAKE2b-256: 7cd0b3e304a98eba3735fc07d74e558ef69ffddfe0ae0da4724cba59b06bbd83

Provenance

The following attestation bundles were made for data_loader_plugin-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on cloud-helpers/python-plugin-data-loader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
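To check a downloaded distribution file against the digests listed above, the three algorithms can be recomputed with hashlib (BLAKE2b-256 being BLAKE2b with a 32-byte digest). In this sketch, the helper name file_digests and the dist/ location of the downloaded archive are assumptions.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(path: str | Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    # Read in chunks so that large archives do not need to fit in memory
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    expected = "e580743a5b378ac4d556cadf63d5edc71d5390e58957797fed93d28a55a3a511"
    archive = Path("dist/data_loader_plugin-0.0.2.tar.gz")  # assumed location
    if archive.exists():
        print("SHA256 matches:", file_digests(archive)["SHA256"] == expected)
    else:
        print(f"{archive} not found")
```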
