Salesforce Data Cloud Python connector (V3 driver, beta)

Project description

Salesforce Data Cloud Python Connector

The official Salesforce Data Cloud Python connector — a DB-API 2.0 compliant driver for querying Salesforce Data Cloud using the Query API. Designed for use from Jupyter notebooks, pandas pipelines, ETL scripts, and any other Python data tooling.

This package (salesforce-datacloud-connector) supersedes the salesforce-cdp-connector package. New projects should adopt this package; existing salesforce-cdp-connector users should plan to migrate.

Important: this is a beta release

salesforce-datacloud-connector is currently published as a beta on PyPI (2.0.0b1). The public API surface may change before the GA release.

Install with the --pre flag — pip will not pick up pre-releases otherwise:
```
pip install --pre salesforce-datacloud-connector
```
Pin your version explicitly in production code and your dependency manifests:
```
salesforce-datacloud-connector==2.0.0b1
```
This protects you from accidental upgrades to a later beta or release candidate that may include breaking changes.

Features

DB-API 2.0 compliant — standard Python database interface.
Three OAuth flows — Username/Password, JWT Bearer Token, Refresh Token.
Pandas integration — works with pandas.read_sql() and cursor.fetch_df() (under the [pandas] extra).
Notebook-ready — interactive exploration and visualization in Jupyter.
Parameterized queries — safe SQL execution with named parameters (:param).
Efficient pagination — chunked, low-memory fetching for large result sets.
Type conversion — Data Cloud types map to native Python types (str, int, Decimal, float, bool, datetime.date, datetime.datetime).
Comprehensive error hierarchy — full DB-API 2.0 exception tree.

Installation

# Basic installation (always include --pre during the beta)
pip install --pre salesforce-datacloud-connector

# With pandas + pyarrow support (recommended for analytics use)
pip install --pre "salesforce-datacloud-connector[pandas]"

Requirements

Python 3.8 or newer.
requests >= 2.31.0
cryptography >= 41.0.0
pyjwt >= 2.8.0
python-dateutil >= 2.8.0

Optional (under [pandas] extra):

pandas >= 2.0.0
pyarrow >= 14.0.0

Quickstart

import salesforce_datacloud_connector as sfdc

# Connect to Salesforce Data Cloud (refresh token flow shown — see below for
# all three flows)
conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="refresh_token",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    refresh_token="YOUR_REFRESH_TOKEN",
)

cursor = conn.cursor()
cursor.execute("SELECT Id, Name FROM Account LIMIT 10")

for row in cursor:
    print(row)

cursor.close()
conn.close()

The connector also supports the standard Python context-manager idioms — both the connection and the cursor close cleanly when their with blocks exit:

import salesforce_datacloud_connector as sfdc

with sfdc.connect(...) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM Account")
        (count,) = cursor.fetchone()
        print(f"Total accounts: {count}")

Authentication

The connector ships three OAuth flows, all driven through sfdc.connect(...) via the auth_type keyword.

Username / Password

import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="username_password",
    username="user@example.com",
    password="your_password",
    client_id="your_connected_app_client_id",
    client_secret="your_connected_app_client_secret",
)

JWT Bearer Token

import salesforce_datacloud_connector as sfdc

with open("private_key.pem", "r") as f:
    private_key = f.read()

conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="jwt",
    username="user@example.com",
    client_id="your_connected_app_client_id",
    jwt_private_key=private_key,
)

Refresh Token (recommended for long-running services)

import os

import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(
    login_url=os.environ["SFDC_LOGIN_URL"],
    auth_type="refresh_token",
    client_id=os.environ["SFDC_CLIENT_ID"],
    client_secret=os.environ["SFDC_CLIENT_SECRET"],
    refresh_token=os.environ["SFDC_REFRESH_TOKEN"],
)

Credential hygiene

Never hardcode credentials in source. Load them from environment variables (local development) or your platform's secret manager (production). Rotate credentials regularly and avoid using production credentials for development.

Usage

Parameterized queries

cursor.execute(
    """
    SELECT Id, Name, Email
    FROM Contact
    WHERE Status = :status AND CreatedDate > :min_date
    LIMIT :limit
    """,
    {
        "status": "Active",
        "min_date": "2024-01-01",
        "limit": 100,
    },
)

Pandas integration

import pandas as pd
import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(...)

# Method 1 — pandas.read_sql
df = pd.read_sql("SELECT * FROM Account LIMIT 1000", conn)

# Method 2 — cursor.fetch_df
cursor = conn.cursor()
cursor.execute("SELECT Id, Name, Industry FROM Account")
df = cursor.fetch_df()

print(df.describe())
print(df.groupby("Industry").size())

cursor.fetch_df() lazily imports pandas. Calling it without the [pandas] extra installed raises ImportError with installation guidance.

Fetch methods

row = cursor.fetchone()       # one row
rows = cursor.fetchmany(100)  # multiple rows
all_rows = cursor.fetchall()  # all remaining rows

for row in cursor:            # iteration
    process(row)

Configuration: dataspace and workload

conn = sfdc.connect(
    ...,
    dataspace="custom_dataspace",  # optional; defaults to the org default
    workload="my_application",     # optional; surfaced in observability
)

Cursor array size

cursor = conn.cursor()
cursor.arraysize = 1000  # default fetch size for fetchmany()

Type mapping

Data Cloud type	Python type	Notes
Varchar	`str`
Numeric	`int`, `Decimal`	`int` when scale is 0; `Decimal` otherwise
Integer, BigInt	`int`
Float, Double	`float`
Boolean	`bool`
Date	`datetime.date`
TimestampTZ	`datetime.datetime`	timezone-aware

API reference

DB-API 2.0 surface

This connector follows PEP 249 — Python Database API Specification v2.0.

API level: 2.0
Thread safety: 1 (threads may share the module, but not connections)
Parameter style: named (:param)

Connection

cursor() — create a new cursor.
close() — close the connection.
commit() — no-op (read-only driver).
rollback() — no-op (read-only driver).

Cursor

execute(operation, parameters=None) — execute a query.
executemany(operation, seq_of_parameters) — execute the same query multiple times.
fetchone() — fetch the next row.
fetchmany(size=cursor.arraysize) — fetch up to size rows.
fetchall() — fetch all remaining rows.
close() — close the cursor.
cancel() — cancel a running query (extension).
fetch_df() — fetch results as a pandas DataFrame (extension; requires the [pandas] extra).

Cursor attributes

description — column metadata.
rowcount — number of rows affected (-1 for SELECT until exhausted).
arraysize — default fetch size for fetchmany().

Exception hierarchy

Exception
└── Warning
└── Error
    ├── InterfaceError
    └── DatabaseError
        ├── DataError
        ├── OperationalError      (auth failures, network errors)
        ├── IntegrityError
        ├── InternalError         (server errors)
        ├── ProgrammingError      (SQL syntax errors)
        └── NotSupportedError     (unsupported operations)

import salesforce_datacloud_connector as sfdc

try:
    cursor.execute("SELECT * FROM NonexistentTable")
except sfdc.ProgrammingError as e:
    print(f"SQL error: {e}")

try:
    cursor.execute("INSERT INTO Account VALUES (...)")
except sfdc.NotSupportedError as e:
    print(f"Operation not supported: {e}")

`fetch_df()` extension

cursor.fetch_df() returns a pandas.DataFrame containing all remaining rows of the most recent query. It is available only when the [pandas] extra is installed:

pip install --pre "salesforce-datacloud-connector[pandas]"

cursor.execute("SELECT Id, Name, Industry FROM Account LIMIT 5000")
df = cursor.fetch_df()

If pandas is missing, fetch_df() raises an ImportError with the install command above.

Beta-period limitations

2.0.0b1 is intentionally scoped:

Read-only — no INSERT, UPDATE, DELETE, or DDL operations.
No streaming — results buffer in memory (chunked pagination keeps memory bounded but does not stream row-by-row).
Synchronous API only — async client is on the V2 roadmap.
No connection pooling — create new connections as needed.
No prepared statements — parameters bind on each execution.
No SQLAlchemy dialect — direct DB-API usage only.
In-memory token cache — tokens are not persisted across runs.

These constraints will be reconsidered as the package moves from beta to GA.

Development

The repository uses uv for dependency management and hatchling as the build backend.

Setup

# From the repo root
cd salesforce_datacloud_connector

# Install everything (runtime + pandas extra + dev dependency group)
uv sync --all-extras --dev

Run tests

# Unit + integration tests (skips live-org E2E)
uv run pytest -m "not e2e"

# Include the live-org E2E suite (requires real Salesforce credentials —
# see tests/test_e2e_real_datacloud.py for the env vars it expects)
uv run pytest

Lint

uv run ruff check salesforce_datacloud_connector tests --statistics

Build

uv build
# produces dist/salesforce_datacloud_connector-<version>-py3-none-any.whl
# and  dist/salesforce_datacloud_connector-<version>.tar.gz

Repo layout

The repository publishes two PyPI packages from one branch — see the repo-root README for the full layout. This package's source lives under salesforce_datacloud_connector/salesforce_datacloud_connector/, its tests under salesforce_datacloud_connector/tests/, and its build config under salesforce_datacloud_connector/pyproject.toml.

Contributing

Contributions are welcome. Please open an issue before starting on a large change. PRs should include tests for new functionality and pass ruff check and pytest -m "not e2e" locally.

See CONTRIBUTING.md at the repo root for the full contribution guide.

License

BSD-3-Clause. See LICENSE.txt at the repo root.

Project details

Release history Release notifications | RSS feed

This version

2.0.0b1 pre-release

Jun 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

salesforce_datacloud_connector-2.0.0b1.tar.gz (173.5 kB view details)

Uploaded Jun 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl (30.1 kB view details)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file salesforce_datacloud_connector-2.0.0b1.tar.gz.

File metadata

Download URL: salesforce_datacloud_connector-2.0.0b1.tar.gz
Upload date: Jun 2, 2026
Size: 173.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for salesforce_datacloud_connector-2.0.0b1.tar.gz
Algorithm	Hash digest
SHA256	`6bb5bdeed9861fc5c32bf8a3d62b537e6bc581ab371c71162e3e4f152908bc4e`
MD5	`055b773273ae1b49202fc364ab73d63d`
BLAKE2b-256	`4cbb0e0cb0347ac607707d5ca7111ce295ae81ed74a54040ef4f26707b4f3815`

See more details on using hashes here.

Provenance

The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1.tar.gz:

Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: salesforce_datacloud_connector-2.0.0b1.tar.gz
- Subject digest: 6bb5bdeed9861fc5c32bf8a3d62b537e6bc581ab371c71162e3e4f152908bc4e
- Sigstore transparency entry: 1705909236
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: forcedotcom/salesforce-cdp-connector@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Branch / Tag: refs/tags/salesforce-datacloud-connector-2.0.0b1
- Owner: https://github.com/forcedotcom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: v2-publish.yml@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Trigger Event: release

File details

Details for the file salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl.

File metadata

Download URL: salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 30.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`24f0b88a68074cb29b3e8f41df4360f42c901160dca0f16930b86182fccec603`
MD5	`85c245c5bb216898daa72127292927e0`
BLAKE2b-256	`20bdbc42dd75ecb38dcbdedab807bb72a8fee73938987e05994c9cc36f4daf13`

See more details on using hashes here.

Provenance

The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl:

Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
- Subject digest: 24f0b88a68074cb29b3e8f41df4360f42c901160dca0f16930b86182fccec603
- Sigstore transparency entry: 1705909408
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: forcedotcom/salesforce-cdp-connector@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Branch / Tag: refs/tags/salesforce-datacloud-connector-2.0.0b1
- Owner: https://github.com/forcedotcom
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: v2-publish.yml@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Trigger Event: release

salesforce-datacloud-connector 2.0.0b1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Salesforce Data Cloud Python Connector

Important: this is a beta release

Features

Installation

Requirements

Quickstart

Authentication

Username / Password

JWT Bearer Token

Refresh Token (recommended for long-running services)

Credential hygiene

Usage

Parameterized queries

Pandas integration

Fetch methods

Configuration: dataspace and workload

Cursor array size

Type mapping

API reference

DB-API 2.0 surface

Connection

Cursor

Cursor attributes

Exception hierarchy

fetch_df() extension

Beta-period limitations

Development

Setup

Run tests

Lint

Build

Repo layout

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`fetch_df()` extension