Planetary Data Service Data Delivery Manager
PDS Data Upload Manager
The PDS Data Upload Manager provides the client application and server interface for managing data deliveries and retrievals between Data Providers and the Planetary Data Cloud.
Prerequisites
The PDS Data Upload Manager has the following prerequisites:
- python3 for running the client application and unit tests (Python 3.13 or later)
- terraform for creating and deploying DUM server components to AWS
- Docker for Terraform to build and package Lambda functions
  - Minimum version: Docker Engine 20.10+ or Docker Desktop 4.x+
  - Architecture: If deploying from an ARM64 machine (for example, Apple M1/M2/M3) to x86_64 Lambdas, ensure Docker is configured to support linux/amd64 builds
- Node.js version 18.x for the Authorizer Lambda. While the Terraform build process manages this via Docker (using node:18-slim), local development should align with this version to ensure dependency consistency.
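The prerequisites above can be checked before installing or deploying. The following is a minimal sketch, not part of the DUM package: the function name check_prerequisites is hypothetical, and it only tests that the Python version is recent enough and that the terraform, docker, and node executables are on the PATH. It does not verify the Docker or Node.js versions.

```python
import shutil
import sys

# Assumed from the prerequisites list: Python 3.13+ plus three command-line tools.
REQUIRED_PYTHON = (3, 13)
REQUIRED_TOOLS = ("terraform", "docker", "node")


def check_prerequisites():
    """Return a mapping of prerequisite name to True/False (hypothetical helper)."""
    results = {"python": sys.version_info >= REQUIRED_PYTHON}
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on the PATH
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```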
User Quickstart
Install with:
pip install pds-data-upload-manager
To deploy the service components to an AWS environment:
cd terraform/
terraform init
terraform apply
To execute the client, run:
pds-ingress-client -c <config path> -n <PDS node ID> -- <ingress path> [<ingress path> ...]
To see a listing of all available arguments for the client:
pds-ingress-client --help
Data Upload Manager Client Workflow
When using the DUM client script (pds-ingress-client), the following workflow is executed:
1. Index the requested input files and paths to determine the full input file set
2. Generate a manifest file containing information, including MD5 checksums, for each file to be ingested
3. Submit batch ingress requests for the input file set to the DUM Ingress Service in AWS
4. Upload the input file set to AWS S3 in batches
5. Create an ingress report
Determination of the input file set occurs in Step 1 by resolving the paths provided on the command line to the DUM client. Any directories provided are traversed recursively to determine the full set of files within them. Any file paths provided are included as-is in the input file set. By default, symbolic links are followed during path resolution. To avoid uploading duplicate data when files are symlinked into multiple locations, use the --skip-symlinks flag to skip symbolic links during traversal.
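The path resolution described above can be illustrated with a short sketch. This is not the DUM client's actual implementation; the function name resolve_input_files and its skip_symlinks parameter are hypothetical stand-ins for the behavior of the --skip-symlinks flag.

```python
import os
from pathlib import Path


def resolve_input_files(paths, skip_symlinks=False):
    """Expand files and directories into a sorted, de-duplicated file list."""
    resolved = set()
    for raw in paths:
        path = Path(raw)
        if skip_symlinks and path.is_symlink():
            continue  # ignore symlinks that were passed in directly
        if path.is_file():
            resolved.add(str(path))  # file paths are included as-is
        elif path.is_dir():
            # followlinks=True descends into symlinked directories (the default behavior)
            for root, _dirs, files in os.walk(path, followlinks=not skip_symlinks):
                for name in files:
                    candidate = Path(root) / name
                    if skip_symlinks and candidate.is_symlink():
                        continue  # skip symlinked files found during traversal
                    if candidate.is_file():
                        resolved.add(str(candidate))
    return sorted(resolved)
```

With skip_symlinks=True, a file that is symlinked into several directories is counted only at its real location, which is the duplicate-avoidance case the flag is meant for.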
Depending on the size of the input file set, manifest file creation in Step 2 can become time-consuming because each file in the input file set must be hashed. To save time, use the --manifest-path command-line option to write the manifest contents to local disk. Specifying the same path via --manifest-path on subsequent executions of the DUM client causes the existing manifest to be read from disk. Any files within the input set that are referenced in the existing manifest will reuse the precomputed values, reducing upfront time before upload to S3 begins. The manifest is then rewritten to the path specified by --manifest-path to include any newly encountered files. In this way, a manifest file can expand across DUM executions and serve as a cache for file information.
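The caching behavior of the manifest can be sketched as follows. The manifest layout used here (a JSON object keyed by file path with "md5" and "size" entries) is an assumption for illustration only, as are the function names md5_of and update_manifest; the real manifest schema is not described in this document. The sketch also does not detect files that changed after they were cached.

```python
import hashlib
import json
import os


def md5_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_manifest(files, manifest_path):
    """Reuse cached entries, hash only new files, then rewrite the manifest."""
    manifest = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as handle:
            manifest = json.load(handle)  # existing manifest acts as a cache
    for path in files:
        if path not in manifest:  # only newly encountered files are hashed
            manifest[path] = {"md5": md5_of(path), "size": os.path.getsize(path)}
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest
```

On a second call with the same manifest path, only files missing from the manifest are hashed, which is where the time savings come from.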
The batch size used by Steps 3 and 4 can be configured in the INI configuration provided to the DUM client. The number of batches processed in parallel can be controlled with the --num-threads command-line argument.
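The relationship between batch size and thread count can be sketched like this. The helper names make_batches and process_batches are hypothetical, and the handler argument stands in for whatever the client does with each batch (requesting ingress, then uploading); this is an illustration of the general pattern, not the client's code.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batches(items, batch_size, num_threads, handler):
    """Run handler on each batch, with up to num_threads batches in flight."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves batch order in the returned results
        return list(pool.map(handler, batches))
```

For example, 200 files with a batch size of 3 yield 67 batches (66 full batches plus one batch of 2 files).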
By default, upon completion of an ingress request (Step 5), the DUM client provides a summary of the transfer results:
Ingress Summary Report for 2025-02-25 11:41:29.507022
-----------------------------------------------------
Uploaded: 200 file(s)
Skipped: 0 file(s)
Failed: 0 file(s)
Unprocessed: 0 file(s)
Total: 200 files(s)
Time elapsed: 3019.00 seconds
Bytes transferred: 3087368895
A more detailed JSON-format report, containing full listings of all uploaded, skipped, and failed paths, can be written to disk via the --report-path command-line argument:
{
"Arguments": "Namespace(config_path='mcp.test.ingress.config.ini', node='sbn', prefix='/PDS/SBN/', force_overwrite=True, num_threads=4, log_path='/tmp/dum_log.txt', manifest_path='/tmp/dum_manifest.json', report_path='/tmp/dum_report.json', dry_run=False, log_level='info', ingress_paths=['/PDS/SBN/gbo.ast.catalina.survey/'])",
"Batch Size": 3,
"Total Batches": 67,
"Start Time": "2025-02-25 18:51:10.507562+00:00",
"Finish Time": "2025-02-25 19:41:29.504806+00:00",
"Uploaded": [
"gbo.ast.catalina.survey/data_calibrated/703/2020/20Apr02/703_20200402_2B_F48FC1_01_0001.arch.fz",
"...",
"gbo.ast.catalina.survey/data_calibrated/703/2020/20Apr02/703_20200402_2B_N02055_01_0001.arch.xml"
],
"Total Uploaded": 200,
"Skipped": [],
"Total Skipped": 0,
"Failed": [],
"Total Failed": 0,
"Unprocessed": [],
"Total Unprocessed": 0,
"Bytes Transferred": 3087368895,
"Total Files": 200
}
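A report written via --report-path can be post-processed with a few lines of Python. The sketch below relies only on keys visible in the example report above; the function names load_report and summarize are hypothetical, and treating "Failed" plus "Unprocessed" as the set of paths to retry is an assumption rather than documented behavior.

```python
import json
from datetime import datetime


def load_report(report_path):
    """Read a JSON ingress report from disk."""
    with open(report_path) as handle:
        return json.load(handle)


def summarize(report):
    """Return per-category counts, elapsed seconds, and paths to retry."""
    counts = {
        key: report[f"Total {key}"]
        for key in ("Uploaded", "Skipped", "Failed", "Unprocessed")
    }
    start = datetime.fromisoformat(report["Start Time"])
    finish = datetime.fromisoformat(report["Finish Time"])
    elapsed = (finish - start).total_seconds()
    # Assumption: anything failed or unprocessed is a candidate for a rerun
    retry = list(report["Failed"]) + list(report["Unprocessed"])
    return counts, elapsed, retry
```

Applied to the example report, the elapsed time works out to about 3019 seconds, matching the summary shown earlier.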
Lastly, a detailed log file containing trace statements for each uploaded file and batch can be written to disk via the --log-path command-line argument. The log file path may also be specified in the INI configuration.
Code of Conduct
All users and developers of NASA-PDS software are expected to abide by our Code of Conduct. Please read it to ensure you understand the expectations of our community.
Development
To develop this project, use your favorite text editor or an integrated development environment with Python support, such as PyCharm.
Contributing
For information on how to contribute to NASA-PDS codebases, please see our Contributing guidelines.
Installation
Install in editable mode with extra developer dependencies into your virtual environment of choice:
pip install --editable '.[dev]'
Configure the pre-commit hooks:
pre-commit install && pre-commit install -t pre-push
Packaging
To isolate and reproduce the environment for this package, use a Python virtual environment. To do so, run:
python -m venv venv
source venv/bin/activate # Substitute with `source venv/bin/activate.csh` for csh/tcsh users
If you have tox installed and would like it to create your environment and install dependencies for you, run:
tox --devenv <name you'd like for env> -e dev
Dependencies for development are specified as the dev extra in setup.cfg; they are installed into the virtual environment as follows:
pip install --editable '.[dev]'
Tooling
The dev extra included in this repository installs black, flake8 (plus some plugins), and mypy, along with default configuration for all of them. You can run all of these, and more, with:
tox -e lint
Tests
A complete build, including test execution, linting (mypy, black, flake8, and more), and documentation generation, is executed via:
tox
Unit tests
Our unit tests are launched with:
pytest
Documentation
You can build this project's documentation with:
sphinx-build -b html docs/source docs/build
You can access the build files in the following directory relative to the project root:
docs/build/
Migration Steps for Existing Deployments
If you are migrating from an existing deployment, follow these steps in order to transition to the new Python 3.13 and Docker-based build system:
1. Install and Start Docker: Install Docker Desktop or Docker Engine and ensure the daemon is running. Docker is now a mandatory dependency for compiling Linux-compatible binaries.
2. Clear Local Build Artifacts: Remove existing temporary files to prevent version conflicts between Python 3.11, 3.12, and 3.13:
rm -rf terraform/modules/lambda/service/files/
rm -rf terraform/modules/lambda/authorizer/files/
3. Initialize and Refresh Terraform: Update providers and synchronize the state with the new null_resource logic:
cd terraform/
terraform init -upgrade
terraform refresh
4. Deploy Infrastructure: Execute the deployment to build the new Python 3.13 layers and Node.js 18 authorizer:
terraform apply
5. Verify Runtime and Authorizer: Confirm that:
- Lambda functions are using Python 3.13
- The authorizer is running on Node.js 18
- Files can be uploaded using the DUM client