Skip to main content

Pam Python Library

Project description

pam-python-data-plugin-framework

This repository provides the pam CLI and runtime framework to build Data Plugin services for PAM Real CDP. It generates a ready-to-run project, standardizes service lifecycle, and handles common tasks like input handling, temp storage, uploads, and service monitoring.

This README is a practical, step-by-step guide you can follow to create and run a real service.

What you get

  • CLI to initialize a project and scaffold services
  • Service lifecycle contract (start, data input, upload, exit)
  • Temp file and SQLite helpers
  • A monitoring loop for service timeouts and periodic cleanup

Table of Contents

  1. Prerequisites
  2. Install
  3. Initialize a Project
  4. Create a Service
  5. Understand the Lifecycle
  6. Using Temp Files Correctly
  7. Running the Server
  8. Testing a Service
  9. Configuration
  10. Project Structure
  11. Troubleshooting

Prerequisites

  • Python 3.8+ recommended
  • pip and a working virtual environment

Install Create a project folder and a virtual environment.

mkdir my_data_plugin
cd my_data_plugin
python3 -m venv venv
source venv/bin/activate

Install the framework:

pip install pam-python

Initialize a Project This creates a runnable project with templates (including AGENT.md).

pam init

When requirements.txt already exists, you will be prompted to choose how to proceed:

  • overwrite
  • keep
  • merge

Create a Service Generate a service scaffold. Do not hand-create service templates.

pam new service rfm_segment

This creates a new folder (e.g. rfm_segment/) with:

  • a service class (RfmSegmentSvc.py)
  • functions.py for your logic
  • service.yaml for registration
  • a test file

Understand the Lifecycle The runtime calls your service in two main phases.

  1. on_start
  • Called once at the beginning
  • Read parameters from self.request.runtime_parameters
  • Should return quickly (start a thread for long work)
  1. on_data_input
  • Called when CDP sends input files
  • req.input_files contains ordered CSV files
  • Should also return quickly (use a thread if needed)

When your service is done:

  • Call self._upload_result(...) or self._upload_report(...)
  • Call self._exit() to signal completion

Using Temp Files Correctly Temp storage is managed by the framework. Do not delete temp files manually.

Standard helpers:

  • TempfileUtils.get_temp_path_for_service(self, self.service_name)
  • TempfileUtils.get_temp_file_name_for_service(self, self.service_name, prefix, extension)

Notes:

  • get_temp_path_for_service(...) returns a directory path without a trailing slash.
  • The temp path includes date/service/token in this structure: TEMP_DATASOURCE_PATH/YYYY_MM_DD/<service>/<token>

Uploading Results in Batches If your service produces too many rows (or too few per event), use the batch uploader to handle chunking and flushing automatically.

Recommended usage:

from pam.result_batch_uploader import ResultBatchUploader

batch_uploader = ResultBatchUploader(self, batch_size=50000)
batch_uploader.upload(df, "data-name")
batch_uploader.flush()
status = batch_uploader.get_status()

Notes:

  • name separates different result streams (A/B) to avoid schema conflicts.
  • flush() uploads any remaining rows that are below the batch size.

Running the Server The generated main.py runs the Flask server.

python main.py

By default it binds to 0.0.0.0:8000. You can override with:

export SERVER_HOST=0.0.0.0
export SERVER_PORT=8000

Testing a Service Run unit tests for a service:

pam test rfm_segment

If you write custom tests, place them in the service folder and name them test_<service>.py.


Configuration Environment variables you can set:

  • SERVER_HOST
  • SERVER_PORT
  • TEMP_BASE_PATH (default /app/data)
  • TEMP_DATASOURCE_PATH (default /app/data/data_sources)
  • TEMP_CLEAN_DAYS (default 10)
  • TEMP_CLEAN_INTERVAL_HOURS (default 6, set empty to disable periodic cleanup)

Project Structure After pam init and one service:

.
├── main.py
├── AGENT.md
├── Dockerfile
├── rfm_segment/
│   ├── RfmSegmentSvc.py
│   ├── functions.py
│   ├── service.yaml
│   └── test_rfm_segment.py
├── requirements.txt
└── run_unit_test.sh

Troubleshooting

  • If pam command is missing, ensure your virtualenv is activated.
  • If pam new service fails, confirm the service name is provided.
  • If temp cleanup is too frequent or too slow, adjust TEMP_CLEAN_INTERVAL_HOURS and TEMP_CLEAN_DAYS.

Next Steps

  • Implement your logic in functions.py.
  • Wire it into on_start and on_data_input in your service class.
  • Use the temp utilities to write intermediate files.
  • Use _upload_result to return output to CDP.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pam_python-0.1.42.tar.gz (33.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pam_python-0.1.42-py3-none-any.whl (38.6 kB view details)

Uploaded Python 3

File details

Details for the file pam_python-0.1.42.tar.gz.

File metadata

  • Download URL: pam_python-0.1.42.tar.gz
  • Upload date:
  • Size: 33.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.14

File hashes

Hashes for pam_python-0.1.42.tar.gz
Algorithm Hash digest
SHA256 300a148819526484db27c87a32be8b6eebd2a9c0fd91a98f75edad1bbe93070a
MD5 5ab4de3a1589de3e422e65128351fb10
BLAKE2b-256 6265702eb22ce7dc79abb2ec6ef95e9aa59841bdb4d54071dffaa14b8987bc6a

See more details on using hashes here.

File details

Details for the file pam_python-0.1.42-py3-none-any.whl.

File metadata

  • Download URL: pam_python-0.1.42-py3-none-any.whl
  • Upload date:
  • Size: 38.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.14

File hashes

Hashes for pam_python-0.1.42-py3-none-any.whl
Algorithm Hash digest
SHA256 9c05e3d0655e641d91c04a0f807fa76da3d24253eb3940e04be0afea7570b856
MD5 a40b7611ee840e1ea086a0cee6f3d491
BLAKE2b-256 96fe7f6ce7361fca67bdc007a48cf835b2a192a3fc96fe06da025c766ae4a5cf

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page