Data Cloud Custom Code SDK (BETA)
This package provides a development kit for creating custom data transformations in Data Cloud. It allows you to write your own data processing logic in Python while leveraging Data Cloud's infrastructure for data access and transformation execution, mapping your code onto Data Cloud data structures such as Data Model Objects (DMOs) and Data Lake Objects (DLOs).
More specifically, this codebase gives you the ability to test code locally before pushing it to Data Cloud's remote execution engine, greatly reducing development time.
Use of this project with Salesforce is subject to the TERMS OF USE
Prerequisites
- Python 3.11 only (currently supported version - if your system version is different, we recommend using pyenv to configure 3.11)
- Azul Zulu OpenJDK 17.x
- Docker support like Docker Desktop
- A Salesforce org with some DLOs or DMOs containing data, and with this feature enabled (it is not yet GA)
- A connected app
Installation
The SDK can be downloaded directly from PyPI with pip:
pip install salesforce-data-customcode
You can verify it was properly installed via CLI:
datacustomcode version
Quick start
Ensure you have all the prerequisites prepared on your machine.
To get started, create a directory and initialize a new project with the CLI:
mkdir datacloud && cd datacloud
python3.11 -m venv .venv
source .venv/bin/activate
pip install salesforce-data-customcode
datacustomcode init my_package
This will yield all necessary files to get started:
.
├── Dockerfile
├── README.md
├── requirements-dev.txt
├── payload
│   ├── config.json
│   └── entrypoint.py
├── jupyterlab.sh
└── requirements.txt
- Dockerfile (Do not update) – Development container emulating the remote execution environment.
- requirements-dev.txt (Do not update) – Dependencies for the development environment.
- jupyterlab.sh (Do not update) – Helper script for setting up Jupyter.
- requirements.txt – Define the requirements your script needs here.
- payload – This folder will be compressed and deployed to the remote execution environment.
  - config.json – Defines permissions on the backend; it can be generated programmatically with the scan CLI command.
  - entrypoint.py – The script that defines the data transformation logic.
A functional entrypoint.py is provided so you can run it once you've configured your connected app:
cd my_package
datacustomcode configure
datacustomcode run ./payload/entrypoint.py
[!IMPORTANT] The example entrypoint.py requires an Account_Home__dll DLO to be present. To deploy the script (next step), the output DLO (Account_Home_copy__dll in the example entrypoint.py) must also exist and be in the same data space as Account_Home__dll.
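For orientation, the general shape of such an entrypoint is sketched below using the Client API described later in this README. The actual generated entrypoint.py may differ, and the 'overwrite' write mode is an assumption:

```python
from datacustomcode import Client

client = Client()

# Read the input DLO into a Spark DataFrame
sdf = client.read_dlo('Account_Home__dll')

# Apply your Spark transformations here; this pass-through copy is only a sketch
client.write_to_dlo('Account_Home_copy__dll', sdf, 'overwrite')
```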
After modifying entrypoint.py as needed, using any dependencies you add in the .venv virtual environment, you can run this script in Data Cloud.
To add new dependencies:
- Make sure your virtual environment is activated
- Add dependencies to requirements.txt
- Run pip install -r requirements.txt
- The SDK automatically packages all dependencies when you run datacustomcode zip
Then scan your entrypoint and deploy the package:
datacustomcode scan ./payload/entrypoint.py
datacustomcode deploy --path ./payload --name my_custom_script --cpu-size CPU_L
[!TIP] The deploy process can take several minutes. If you'd like more feedback on the underlying process, you can add --debug to the command, like datacustomcode --debug deploy --path ./payload --name my_custom_script
[!NOTE] CPU Size: Choose the appropriate CPU/compute size based on your workload requirements:
- CPU_L / CPU_XL / CPU_2XL / CPU_4XL: Large, X-Large, 2X-Large, and 4X-Large CPU instances for data processing
- The default is CPU_2XL, which provides a good balance of performance and cost for most use cases
You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the Run Now button to run it.
Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.
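You can also spot-check the output locally with the same Client API. A minimal sketch, assuming the Account_Home_copy__dll output DLO from the example above:

```python
from datacustomcode import Client

client = Client()

# Read back the DLO the transform wrote to and inspect a few records
out = client.read_dlo('Account_Home_copy__dll')
print(out.count())  # total number of records
out.show(5)         # preview the first five rows
```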
Dependency Management
The SDK automatically handles all dependency packaging for Data Cloud deployment. Here's how it works:
- Add dependencies to requirements.txt – List any Python packages your script needs
- Install locally – Use pip install -r requirements.txt in your virtual environment
- Automatic packaging – When you run datacustomcode zip, the SDK automatically:
  - Packages all dependencies from requirements.txt
  - Uses the correct platform and architecture for Data Cloud
No need to worry about platform compatibility - the SDK handles this automatically through the Docker-based packaging process.
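As an illustration, suppose you've added pandas to requirements.txt and installed it locally; your entrypoint can then import it like any other package. The DLO name and the use of toPandas() below are only for the sketch:

```python
import pandas as pd  # hypothetical dependency listed in requirements.txt

from datacustomcode import Client

client = Client()
sdf = client.read_dlo('my_DLO')

# Pull a small sample into pandas for quick inspection
pdf: pd.DataFrame = sdf.limit(100).toPandas()
print(pdf.describe())
```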
API
Your entrypoint script will define logic using the Client object, which wraps the data access layers.
You should only need the following methods:
- read_dlo(name) – Read from a Data Lake Object by name
- read_dmo(name) – Read from a Data Model Object by name
- write_to_dlo(name, spark_dataframe, write_mode) – Write to a Data Lake Object by name with a Spark DataFrame
- write_to_dmo(name, spark_dataframe, write_mode) – Write to a Data Model Object by name with a Spark DataFrame
For example:
from datacustomcode import Client
client = Client()
sdf = client.read_dlo('my_DLO')
# some transformations
# ...
client.write_to_dlo('output_DLO', sdf, 'overwrite')  # pass the DataFrame and a write mode ('overwrite' is an assumed Spark-style mode)
[!WARNING] Currently, we only support reading from DMOs and writing to DMOs, or reading from DLOs and writing to DLOs; the two cannot be mixed.
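For example, the equivalent DMO-to-DMO flow keeps both ends on DMO methods (the DMO names here are placeholders):

```python
from datacustomcode import Client

client = Client()
sdf = client.read_dmo('my_DMO')
# some transformations
# ...
client.write_to_dmo('output_DMO', sdf, 'overwrite')  # 'overwrite' is an assumed write mode
```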
CLI
The Data Cloud Custom Code SDK provides a command-line interface (CLI) with the following commands:
Global Options
--debug: Enable debug-level logging
Commands
datacustomcode version
Display the current version of the package.
datacustomcode configure
Configure credentials for connecting to Data Cloud.
Options:
--profile TEXT: Credential profile name (default: "default")
--username TEXT: Salesforce username
--password TEXT: Salesforce password
--client-id TEXT: Connected App Client ID
--client-secret TEXT: Connected App Client Secret
--login-url TEXT: Salesforce login URL
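For example, a fully non-interactive invocation might look like the following (all values are placeholders):
datacustomcode configure --profile default --username you@example.com --password YOUR_PASSWORD --client-id YOUR_CLIENT_ID --client-secret YOUR_CLIENT_SECRET --login-url https://login.salesforce.com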
datacustomcode init
Initialize a new development environment with a template.
Argument:
DIRECTORY: Directory to create project in (default: ".")
datacustomcode scan
Scan a Python file to generate a Data Cloud configuration.
Argument:
FILENAME: Python file to scan
Options:
--config TEXT: Path to save the configuration file (default: same directory as FILENAME)
--dry-run: Preview the configuration without saving to a file
datacustomcode run
Run an entrypoint file locally for testing.
Argument:
ENTRYPOINT: Path to entrypoint Python file
Options:
--config-file TEXT: Path to configuration file
--dependencies TEXT: Additional dependencies (can be specified multiple times)
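For example, to run against an explicit configuration file (paths follow the project layout above):
datacustomcode run ./payload/entrypoint.py --config-file ./payload/config.json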
datacustomcode zip
Zip a transformation job in preparation for uploading to Data Cloud.
Options:
--path TEXT: Path to the code directory (default: ".")
datacustomcode deploy
Deploy a transformation job to Data Cloud.
Options:
--profile TEXT: Credential profile name (default: "default")
--path TEXT: Path to the code directory (default: ".")
--name TEXT: Name of the transformation job [required]
--version TEXT: Version of the transformation job (default: "0.0.1")
--description TEXT: Description of the transformation job (default: "")
--cpu-size TEXT: CPU size for the deployment (default: "CPU_XL"). Available options: CPU_L (Large), CPU_XL (Extra Large), CPU_2XL (2X Large), CPU_4XL (4X Large)
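For example, a fully specified deployment might look like the following (name, version, and description are placeholders):
datacustomcode deploy --profile default --path ./payload --name my_custom_script --version 0.0.2 --description "Copy Account_Home__dll" --cpu-size CPU_XL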
Docker usage
The SDK provides Docker-based development options that allow you to test your code in an environment that closely resembles Data Cloud's execution environment.
How Docker Works with the SDK
When you initialize a project with datacustomcode init my_package, a Dockerfile is created automatically. This Dockerfile:
- Isn't used during local development with virtual environments
- Becomes active during packaging when you run datacustomcode zip or deploy
- Ensures compatibility by using the same base image as Data Cloud
- Handles dependencies automatically regardless of platform differences
VS Code Dev Containers
Within your initialized package, you will find a .devcontainer folder, which allows you to run a Docker container while developing inside of it.
Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
Setup Instructions
- Install the VS Code extension "Dev Containers" by Microsoft.
- Open your package folder in VS Code, ensuring that the .devcontainer folder is at the root of the File Explorer
- Bring up the Command Palette (on Mac: Cmd + Shift + P) and select "Dev Containers: Rebuild and Reopen in Container"
- Allow the Docker image to be built, then you're ready to develop
Development Workflow
Once inside the Dev Container:
- Terminal access: Open a terminal within the container
- Run your code: Execute datacustomcode run ./payload/entrypoint.py
- Environment consistency: Your code will run inside a Docker container that more closely resembles Data Cloud compute than your machine
[!TIP] IDE Configuration: Use Cmd+Shift+P (or Ctrl+Shift+P on Windows/Linux), then select "Python: Select Interpreter" to configure the correct Python interpreter
[!IMPORTANT] Dev Containers get their own temporary file storage, so you'll need to re-run datacustomcode configure every time you "Rebuild and Reopen in Container".
JupyterLab
Within your initialized package, you will find a jupyterlab.sh file that can open a Jupyter notebook for you. Jupyter notebooks, in combination with Data Cloud's Query Editor and Data Explorer, can be extremely helpful for data exploration. Instead of running an entire script, you can run one code cell at a time as you discover and experiment with the DLO or DMO data.
You can read more about Jupyter Notebooks here: https://jupyter.org/
- Within the root of your package folder, run ./jupyterlab.sh start
- Double-click on the "account.ipynb" file, which provides a starting point for a notebook
- Use Shift+Enter to execute each cell within the notebook. Add, edit, or delete cells of code as needed for your data exploration.
- Don't forget to run ./jupyterlab.sh stop to stop the Docker container
[!IMPORTANT] JupyterLab uses its own temporary file storage, so you'll need to re-run datacustomcode configure each time you run ./jupyterlab.sh start.
Prerequisite details
Creating a connected app
- Log in to Salesforce as an admin. In the top right corner, click on the gear icon and go to Setup
- On the left-hand side, search for "App Manager" and select the App Manager underneath Apps
- Click on New Connected App in the upper right
- Fill in the required fields within the Basic Information section
- Under the API (Enable OAuth Settings) section:
  - Click on the checkbox to Enable OAuth Settings
  - Provide a callback URL like http://localhost:55555/callback
  - In the Selected OAuth Scopes, make sure that refresh_token, api, cdp_query_api, and cdp_profile_api are selected
  - Click on Save to save the connected app
- From the detail page that opens up afterwards, click the "Manage Consumer Details" button to find your client ID and client secret
- Go back to Setup, then OAuth and OpenID Connect Settings, and enable the "Allow OAuth Username-Password Flows" option
You now have all fields necessary for the datacustomcode configure command.
Other docs
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file salesforce_data_customcode-0.1.11.tar.gz.
File metadata
- Download URL: salesforce_data_customcode-0.1.11.tar.gz
- Upload date:
- Size: 34.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0900daf4627784ddb76edeac59a755970990c427ed6de832e6ddffd0d2a168c |
| MD5 | d0789df4bd509b9ca1270dfd957d0043 |
| BLAKE2b-256 | 1783badf86b0c9159d2af662aaee4258ea6e5b96077a74eadb7428b19b8204d8 |
File details
Details for the file salesforce_data_customcode-0.1.11-py3-none-any.whl.
File metadata
- Download URL: salesforce_data_customcode-0.1.11-py3-none-any.whl
- Upload date:
- Size: 47.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aec8a75d5e37732a4ba9adb7134e91d4fec4c09be4c4f3d54ae35e488335f739 |
| MD5 | 6086685f8e3be6906d786176817ce39a |
| BLAKE2b-256 | 88e369cb280a1f64b405d01ea7dab372c6217e312273d449699cdf2257911fb3 |