Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory
Data Loader Plugin - Python
Overview
The data loader plugin aims to support running programs (e.g., API service backends) that need to download data from cloud services such as AWS S3. It provides a base Python library, namely data-loader-plugin, offering a few methods to download data files from AWS S3.
References
Python module
- GitHub: https://github.com/cloud-helpers/python-plugin-data-loader/tree/main/src/data_loader_plugin
- Read the Docs (RTD): https://readthedocs.org/projects/data-loader-plugin/
Installation
Clone this Git repository
$ mkdir -p ~/dev/infra && \
git clone git@github.com:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader
Python environment
- If not already done so, install pyenv, Python 3.12, uv, ty and ruff
  - PyEnv:
$ brew install pyenv
  - Python 3.12:
$ pyenv install 3.12.12 && pyenv local 3.12
Rust-enhanced Python utilities
- All of those utilities are written in Rust and aim to improve the Python development life cycle
  - As they are Rust-based, they are installed here from native packages rather than with pip, unlike many other Python utilities
  - On most Linux distributions and on macOS, native packages are available (e.g., installable with Homebrew on macOS)
- If not already done so, install uv, ty and ruff
  - uv:
$ brew install uv
  - ruff:
$ brew install ruff
  - ty:
$ brew install ty
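Once the tools above are installed, a quick check that they are visible on the PATH can save some debugging later. The following is a minimal sketch using only the standard library; the helper name missing_tools is an illustrative choice and is not part of this project.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from the installation steps of this README.
    missing = missing_tools(["pyenv", "uv", "ruff", "ty"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools are available")
```

Any tool reported as missing can then be installed with the corresponding brew command, or with the native package manager of the platform.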
Usage
Install the data-loader-plugin module
- Just add data-loader-plugin to the dependencies section of the pyproject.toml project specification file
  - Example of pyproject.toml specification file for this project
  - uv will then install it in the virtual environment as needed (e.g., with the uv lock and uv sync commands)
- The remainder of this Usage section assumes that the data-loader-plugin module has been installed and is readily available from the environment, whether that environment is virtual or not. In other words, to adapt the documentation to the case where pipenv is used, just add pipenv run in front of every Python-related command.
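Whether the module is actually available in the current environment can be checked from Python itself. The sketch below relies on the standard importlib.metadata module; the helper name installed_version is hypothetical, and the distribution name data-loader-plugin is the one used throughout this README.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("data-loader-plugin")
    if found is None:
        print("data-loader-plugin is not installed in this environment")
    else:
        print(f"data-loader-plugin {found} is installed")
```

Running it inside and outside of the virtual environment shows which of the two actually holds the package.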
Install in the Python user space
- Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin
Installation in a dedicated Python virtual environment
- uv creates a Python virtual environment, located in .venv
- Install and use the data-loader-plugin module in a virtual environment:
$ source .venv/bin/activate
(.venv) ✔ python -mpip install -U data-loader-plugin
(.venv) ✔ deactivate
Use data-loader-plugin as a module from another Python program
- Check the data file with the AWS command-line (CLI):
$ aws s3 ls --human s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34 249.3 MiB yellow_tripdata_2021-07.csv
- Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
- Create an instance of the DataLoaderBase Python class:
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
...     local_path='/tmp/yellow_tripdata_2021-07.csv',
...     external_url='s3://nyc-tlc/trip data/yellow_tripdata_2021-07.csv',
... )
>>> data_load_success, message = data_loader.load()
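The steps above can be wrapped into a small helper that picks the plugin module from the URL scheme and raises when the download reports a failure. This is only a sketch under stated assumptions: the data_loader_plugin.copyfile module and the (success, message) return value come from the example above, whereas the data_loader_plugin.s3 module name, the scheme mapping and both helper names are guesses made for illustration.

```python
import importlib
from urllib.parse import urlparse

# Assumption: only "data_loader_plugin.copyfile" appears in this README;
# "data_loader_plugin.s3" is a guessed module name used for illustration.
SCHEME_TO_MODULE = {
    "s3": "data_loader_plugin.s3",
    "file": "data_loader_plugin.copyfile",
    "": "data_loader_plugin.copyfile",  # plain local paths have no scheme
}


def plugin_module_name(external_url: str) -> str:
    """Return the dotted module name to import for the given URL."""
    scheme = urlparse(external_url).scheme.lower()
    if scheme not in SCHEME_TO_MODULE:
        raise ValueError(f"No plugin registered for scheme {scheme!r}")
    return SCHEME_TO_MODULE[scheme]


def load_or_raise(external_url: str, local_path: str) -> str:
    """Import the plugin, run load() and raise if it reports a failure."""
    plugin = importlib.import_module(plugin_module_name(external_url))
    loader = plugin.DataLoader(local_path=local_path, external_url=external_url)
    success, message = loader.load()
    if not success:
        raise RuntimeError(f"Download failed: {message}")
    return message
```

Raising an exception rather than returning the flag keeps the calling code short when a missing data file should stop the program; callers that prefer to continue can keep using the tuple returned by load() directly.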
Development / Contribution
- Build the source distribution and Python artifacts (wheels):
$ make clean
$ make init update
$ make build
- Upload to Test PyPI (no Linux binary wheel can be uploaded to PyPI):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.2.dev0-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.2.dev0.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]
View at:
https://test.pypi.org/project/data-loader-plugin/0.0.2.dev0/
- Upload/release the Python packages onto the PyPI repository:
  - Register the authentication token for access to PyPI:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in '${PYPIURL}/':
  - Upload the Python packages to PyPI:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.2.dev0-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.2.dev0.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]
View at:
https://pypi.org/project/data-loader-plugin/0.0.2.dev0/
- Note that the documentation is built automatically by ReadTheDocs (RTD)
  - The documentation is available from https://data-loader-plugin.readthedocs.io/en/latest/
  - The RTD project is set up on https://readthedocs.org/projects/data-loader-plugin/
- Build the documentation manually (with Sphinx):
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.
The HTML pages are in build/sphinx/html.
Test the data loader plugin Python module
- Launch a simple test with pytest:
$ make tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items
tests/test_copyfile.py . [ 33%]
tests/test_s3.py .. [100%]
====================== 3 passed in 1.22s ==================
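For contributors adding a new loader, a test can follow the same (success, message) contract shown in the Usage section. The sketch below is not taken from the project test suite: FakeCopyLoader is a hypothetical stand-in written here so that the example runs without the package installed, and the test names are illustrative.

```python
import shutil
import tempfile
from pathlib import Path


class FakeCopyLoader:
    """Hypothetical stand-in, NOT the project's DataLoader class."""

    def __init__(self, local_path: str, external_url: str) -> None:
        self.local_path = local_path
        self.external_url = external_url

    def load(self) -> tuple[bool, str]:
        # Copy a local file and report the outcome as (success, message).
        try:
            shutil.copyfile(self.external_url, self.local_path)
        except OSError as error:
            return False, str(error)
        return True, f"Copied to {self.local_path}"


def test_load_copies_file() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.csv"
        target = Path(tmp) / "target.csv"
        source.write_text("a,b\n1,2\n")
        success, _ = FakeCopyLoader(str(target), str(source)).load()
        assert success
        assert target.read_text() == "a,b\n1,2\n"


def test_load_reports_missing_source() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        missing = Path(tmp) / "missing.csv"
        target = Path(tmp) / "target.csv"
        success, message = FakeCopyLoader(str(target), str(missing)).load()
        assert not success
        assert message
```

Saved under the tests directory, functions named test_* like these are collected by pytest automatically; replacing the stand-in with a real loader class would be the next step.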