Pam Python Library

pam-python-data-plugin-framework

This repository provides the pam CLI and runtime framework to build Data Plugin services for PAM Real CDP. It generates a ready-to-run project, standardizes service lifecycle, and handles common tasks like input handling, temp storage, uploads, and service monitoring.

This README is a practical, step-by-step guide you can follow to create and run a real service.

What you get

  • CLI to initialize a project and scaffold services
  • Service lifecycle contract (start, data input, upload, exit)
  • Temp file and SQLite helpers
  • A monitoring loop for service timeouts and periodic cleanup

Table of Contents

  1. Prerequisites
  2. Install
  3. Initialize a Project
  4. Create a Service
  5. Understand the Lifecycle
  6. Using Temp Files Correctly
  7. Uploading Results in Batches
  8. Running the Server
  9. Testing a Service
  10. Configuration
  11. Project Structure
  12. Troubleshooting

Prerequisites

  • Python 3.8+ recommended
  • pip and a working virtual environment

Install

Create a project folder and a virtual environment:

mkdir my_data_plugin
cd my_data_plugin
python3 -m venv venv
source venv/bin/activate

Install the framework:

pip install pam-python

Initialize a Project

This creates a runnable project with templates (including AGENT.md).

pam init

When requirements.txt already exists, you will be prompted to choose how to proceed:

  • overwrite
  • keep
  • merge

pam init writes a baseline requirements.txt for the project. It pins pam-python to the scaffold version and includes the core runtime and test dependencies used by the generated project. It does not copy the current machine's pip freeze output.


Create a Service

Generate a service scaffold. Do not hand-create service templates.

pam new service rfm_segment

This creates a new folder (e.g. rfm_segment/) with:

  • a service class (RfmSegmentSvc.py)
  • functions.py for your logic
  • service.yaml for registration
  • a test file

Understand the Lifecycle

The runtime calls your service in two main phases.

  1. on_start
     • Called once at the beginning
     • Read parameters from self.request.runtime_parameters
     • Should return quickly (start a thread for long work)
  2. on_data_input
     • Called when CDP sends input files
     • req.input_files contains ordered CSV files
     • Should also return quickly (use a thread if needed)

When your service is done:

  • Call self._upload_result(...) or self._upload_report(...)
  • Call self._exit() to signal completion
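
The lifecycle above can be sketched as follows. This is a self-contained illustration, not the framework's actual base class: FakeRequest, FakeServiceBase, the window_days parameter, and the uploaded/exited attributes are hypothetical stand-ins so the example runs without pam-python installed. Only the names on_start, on_data_input, runtime_parameters, input_files, _upload_result, and _exit come from this README, and their real signatures may differ.

```python
import threading


class FakeRequest:
    """Hypothetical stand-in for the request object the runtime passes in."""

    def __init__(self, runtime_parameters=None, input_files=None):
        self.runtime_parameters = runtime_parameters or {}
        self.input_files = input_files or []


class FakeServiceBase:
    """Hypothetical stand-in for the framework base class (assumption)."""

    def __init__(self, request):
        self.request = request
        self.uploaded = []
        self.exited = threading.Event()

    def _upload_result(self, payload):
        # The real method sends data to CDP; here we only record it.
        self.uploaded.append(payload)

    def _exit(self):
        # Signal completion.
        self.exited.set()


class RfmSegmentSvc(FakeServiceBase):
    def on_start(self):
        # Read parameters, then return immediately.
        params = self.request.runtime_parameters
        self.window_days = int(params.get("window_days", 30))

    def on_data_input(self, req):
        # Hand the long-running work to a thread so this call returns quickly.
        files = list(req.input_files)
        worker = threading.Thread(target=self._process, args=(files,))
        worker.start()
        # Returning the thread is only for this demo, so the caller can join it.
        return worker

    def _process(self, input_files):
        # Placeholder for real logic from functions.py.
        result = {"files": input_files, "window_days": self.window_days}
        self._upload_result(result)
        self._exit()


svc = RfmSegmentSvc(FakeRequest(runtime_parameters={"window_days": "90"}))
svc.on_start()
svc.on_data_input(FakeRequest(input_files=["a.csv", "b.csv"])).join()
```

The design point is that both callbacks return right away, while the upload and the exit signal happen at the end of the background work.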

Using Temp Files Correctly

Temp storage is managed by the framework. Do not delete temp files manually.

Standard helpers:

  • TempfileUtils.get_temp_path_for_service(self, self.service_name)
  • TempfileUtils.get_temp_file_name_for_service(self, self.service_name, prefix, extension)

Notes:

  • get_temp_path_for_service(...) returns a directory path without a trailing slash.
  • The temp path includes date/service/token in this structure: TEMP_DATASOURCE_PATH/YYYY_MM_DD/<service>/<token>
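
To make the directory layout concrete, here is a minimal sketch that builds a path of the same shape. It is not the TempfileUtils implementation: sketch_temp_path is a hypothetical function, and the token is passed in explicitly because this README does not say how the framework derives it.

```python
import os
from datetime import date


def sketch_temp_path(base, service_name, token, today=None):
    """Illustrative only: build <base>/YYYY_MM_DD/<service>/<token>."""
    today = today or date.today()
    # os.path.join does not append a trailing separator for non-empty parts.
    return os.path.join(base, today.strftime("%Y_%m_%d"), service_name, token)


path = sketch_temp_path(
    "/app/data/data_sources", "rfm_segment", "abc123", date(2024, 1, 31)
)
# Because the directory has no trailing slash, join file names onto it
# instead of concatenating strings.
file_path = os.path.join(path, "result.csv")
```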

Uploading Results in Batches

If your service produces too many rows for a single upload (or only a few rows per event), use the batch uploader, which handles chunking and flushing automatically.

Recommended usage:

from pam.result_batch_uploader import ResultBatchUploader

batch_uploader = ResultBatchUploader(self, batch_size=50000)
batch_uploader.upload(df, "data-name")
batch_uploader.flush()
status = batch_uploader.get_status()

Notes:

  • The name argument ("data-name" in the example) separates different result streams (for example A and B) to avoid schema conflicts.
  • flush() uploads any remaining rows that are below the batch size.
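
The chunk-and-flush behavior described in these notes can be illustrated with a toy buffer. SketchBatcher is a hypothetical class written for this example, not the ResultBatchUploader implementation. It works on plain Python sequences instead of DataFrames, and the assumption that each name gets its own buffer is inferred from the note about result streams.

```python
class SketchBatcher:
    """Illustrative buffer, not the ResultBatchUploader implementation."""

    def __init__(self, send, batch_size):
        self.send = send              # callable that receives (name, rows)
        self.batch_size = batch_size
        self.buffers = {}             # one buffer per stream name

    def upload(self, rows, name):
        buf = self.buffers.setdefault(name, [])
        buf.extend(rows)
        # Emit full batches only; keep the remainder buffered.
        while len(buf) >= self.batch_size:
            self.send(name, buf[: self.batch_size])
            del buf[: self.batch_size]

    def flush(self):
        # Send whatever is left, even if smaller than batch_size.
        for name, buf in self.buffers.items():
            if buf:
                self.send(name, list(buf))
                buf.clear()


sent = []
batcher = SketchBatcher(
    lambda name, rows: sent.append((name, len(rows))), batch_size=3
)
batcher.upload(range(7), "stream-a")  # sends two batches of 3, keeps 1
batcher.upload(range(2), "stream-b")  # below batch size, nothing sent yet
batcher.flush()                       # sends the leftovers of both streams
```

Without the final flush() call, the last row of stream-a and both rows of stream-b would never be sent, which is why the recommended usage ends with flush().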

Running the Server

The generated main.py runs the Flask server.

python main.py

By default it binds to 0.0.0.0:8000. You can override with:

export SERVER_HOST=0.0.0.0
export SERVER_PORT=8000

Testing a Service

Run unit tests for a service:

pam test rfm_segment

If you write custom tests, place them in the service folder and name them test_<service>.py.
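
A custom test file might look like the sketch below. It assumes the tests are discoverable by the standard unittest module. This README does not state which runner pam test uses, so check the generated test file for the expected style. recency_bucket is a hypothetical helper that would normally live in functions.py; it is defined inline here only to keep the example self-contained.

```python
import unittest


def recency_bucket(days_since_purchase):
    """Hypothetical helper that would normally live in functions.py."""
    if days_since_purchase < 0:
        raise ValueError("days_since_purchase must be non-negative")
    if days_since_purchase <= 30:
        return "recent"
    if days_since_purchase <= 180:
        return "lapsing"
    return "lost"


class TestRecencyBucket(unittest.TestCase):
    def test_boundaries(self):
        # Check values on both sides of each threshold.
        self.assertEqual(recency_bucket(0), "recent")
        self.assertEqual(recency_bucket(30), "recent")
        self.assertEqual(recency_bucket(31), "lapsing")
        self.assertEqual(recency_bucket(181), "lost")

    def test_negative_input_is_rejected(self):
        with self.assertRaises(ValueError):
            recency_bucket(-1)
```

Keeping the logic in plain functions, separate from the service class, lets tests like these run without starting the server or a service thread.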


Configuration

Environment variables you can set:

  • SERVER_HOST
  • SERVER_PORT
  • TEMP_BASE_PATH (default /app/data)
  • TEMP_DATASOURCE_PATH (default /app/data/data_sources)
  • TEMP_CLEAN_DAYS (default 10)
  • TEMP_CLEAN_INTERVAL_HOURS (default 6, set empty to disable periodic cleanup)
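
As a rough illustration of how these variables and their defaults fit together, the sketch below parses them into a dictionary. load_settings is a hypothetical function, not the framework's code, and the chosen types (integer days, float hours, None for "disabled") are assumptions. The defaults match those listed in this README.

```python
import os


def load_settings(env=None):
    """Illustrative parser for the variables listed above."""
    env = os.environ if env is None else env
    interval_raw = env.get("TEMP_CLEAN_INTERVAL_HOURS", "6")
    return {
        "host": env.get("SERVER_HOST", "0.0.0.0"),
        "port": int(env.get("SERVER_PORT", "8000")),
        "temp_base_path": env.get("TEMP_BASE_PATH", "/app/data"),
        "temp_datasource_path": env.get(
            "TEMP_DATASOURCE_PATH", "/app/data/data_sources"
        ),
        "clean_days": int(env.get("TEMP_CLEAN_DAYS", "10")),
        # An empty value disables periodic cleanup, represented here as None.
        "clean_interval_hours": (
            float(interval_raw) if interval_raw.strip() else None
        ),
    }


defaults = load_settings({})
disabled = load_settings({"TEMP_CLEAN_INTERVAL_HOURS": ""})
```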

Project Structure

After pam init and one service:

.
├── main.py
├── AGENT.md
├── Dockerfile
├── rfm_segment/
│   ├── RfmSegmentSvc.py
│   ├── functions.py
│   ├── service.yaml
│   └── test_rfm_segment.py
├── requirements.txt
└── run_unit_test.sh

Troubleshooting

  • If the pam command is not found, make sure your virtualenv is activated.
  • If pam new service fails, confirm the service name is provided.
  • If temp cleanup is too frequent or too slow, adjust TEMP_CLEAN_INTERVAL_HOURS and TEMP_CLEAN_DAYS.

Next Steps

  • Implement your logic in functions.py.
  • Wire it into on_start and on_data_input in your service class.
  • Use the temp utilities to write intermediate files.
  • Use _upload_result to return output to CDP.
