Salesforce Data Cloud Python connector (V3 driver, beta)
Salesforce Data Cloud Python Connector
The official Salesforce Data Cloud Python connector — a DB-API 2.0 compliant driver for querying Salesforce Data Cloud using the Query API. Designed for use from Jupyter notebooks, pandas pipelines, ETL scripts, and any other Python data tooling.
This package (salesforce-datacloud-connector) supersedes the
salesforce-cdp-connector package. New projects should adopt this package;
existing salesforce-cdp-connector users should plan to migrate.
Important: this is a beta release
salesforce-datacloud-connector is currently published as a beta on PyPI
(2.0.0b1). The public API surface may change before the GA release.
- Install with the --pre flag; pip will not pick up pre-releases otherwise:
  pip install --pre salesforce-datacloud-connector
- Pin your version explicitly in production code and your dependency manifests:
  salesforce-datacloud-connector==2.0.0b1
  This protects you from accidental upgrades to a later beta or release candidate that may include breaking changes.
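If you want the pin enforced at runtime as well as in the manifest, a startup check can compare the installed distribution against the expected version. This is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8, the minimum supported version); the helper names installed_version and assert_pinned are illustrative, not part of the connector.

```python
from importlib import metadata
from typing import Optional

# Distribution name and pinned version from the manifest line above.
DIST_NAME = "salesforce-datacloud-connector"
PINNED_VERSION = "2.0.0b1"


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def assert_pinned(dist_name: str = DIST_NAME, expected: str = PINNED_VERSION) -> None:
    """Fail fast at startup if the installed version differs from the pin."""
    found = installed_version(dist_name)
    if found != expected:
        raise RuntimeError(
            f"Expected {dist_name}=={expected}, found {found or 'nothing installed'}"
        )
```

Call assert_pinned() once during application startup, before the first sfdc.connect(...).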
Features
- DB-API 2.0 compliant: standard Python database interface.
- Three OAuth flows: Username/Password, JWT Bearer Token, Refresh Token.
- Pandas integration: works with pandas.read_sql() and cursor.fetch_df() (under the [pandas] extra).
- Notebook-ready: interactive exploration and visualization in Jupyter.
- Parameterized queries: safe SQL execution with named parameters (:param).
- Efficient pagination: chunked, low-memory fetching for large result sets.
- Type conversion: Data Cloud types map to native Python types (str, int, Decimal, float, bool, datetime.date, datetime.datetime).
- Comprehensive error hierarchy: full DB-API 2.0 exception tree.
Installation
# Basic installation (always include --pre during the beta)
pip install --pre salesforce-datacloud-connector
# With pandas + pyarrow support (recommended for analytics use)
pip install --pre "salesforce-datacloud-connector[pandas]"
Requirements
- Python 3.8 or newer.
- requests >= 2.31.0
- cryptography >= 41.0.0
- pyjwt >= 2.8.0
- python-dateutil >= 2.8.0
Optional (under the [pandas] extra):
- pandas >= 2.0.0
- pyarrow >= 14.0.0
Quickstart
import salesforce_datacloud_connector as sfdc
# Connect to Salesforce Data Cloud (refresh token flow shown — see below for
# all three flows)
conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="refresh_token",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    refresh_token="YOUR_REFRESH_TOKEN",
)
cursor = conn.cursor()
cursor.execute("SELECT Id, Name FROM Account LIMIT 10")
for row in cursor:
    print(row)
cursor.close()
conn.close()
The connector also supports the standard Python context-manager idioms — both
the connection and the cursor close cleanly when their with blocks exit:
import salesforce_datacloud_connector as sfdc
with sfdc.connect(...) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM Account")
        (count,) = cursor.fetchone()
        print(f"Total accounts: {count}")
Authentication
The connector ships three OAuth flows, all driven through sfdc.connect(...)
via the auth_type keyword.
Username / Password
import salesforce_datacloud_connector as sfdc
conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="username_password",
    username="user@example.com",
    password="your_password",
    client_id="your_connected_app_client_id",
    client_secret="your_connected_app_client_secret",
)
JWT Bearer Token
import salesforce_datacloud_connector as sfdc
with open("private_key.pem", "r") as f:
    private_key = f.read()
conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="jwt",
    username="user@example.com",
    client_id="your_connected_app_client_id",
    jwt_private_key=private_key,
)
Refresh Token (recommended for long-running services)
import os
import salesforce_datacloud_connector as sfdc
conn = sfdc.connect(
    login_url=os.environ["SFDC_LOGIN_URL"],
    auth_type="refresh_token",
    client_id=os.environ["SFDC_CLIENT_ID"],
    client_secret=os.environ["SFDC_CLIENT_SECRET"],
    refresh_token=os.environ["SFDC_REFRESH_TOKEN"],
)
Credential hygiene
Never hardcode credentials in source. Load them from environment variables (local development) or your platform's secret manager (production). Rotate credentials regularly and avoid using production credentials for development.
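One way to apply this is to read every credential from the environment in one place and fail with a clear message listing whatever is missing, rather than hitting a bare KeyError in the middle of connection setup. A minimal sketch, reusing the SFDC_* variable names from the refresh-token example; the helper name load_refresh_token_settings is illustrative, and the returned keys are the sfdc.connect(...) keywords shown in that example.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable -> sfdc.connect(...) keyword, as in the refresh-token example.
ENV_TO_KWARG = {
    "SFDC_LOGIN_URL": "login_url",
    "SFDC_CLIENT_ID": "client_id",
    "SFDC_CLIENT_SECRET": "client_secret",
    "SFDC_REFRESH_TOKEN": "refresh_token",
}


def load_refresh_token_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build connect() keyword arguments from environment variables.

    Raises RuntimeError naming every missing or empty variable at once.
    Only variable names appear in the message, never secret values.
    """
    source = os.environ if env is None else env
    missing = [name for name in ENV_TO_KWARG if not source.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    settings = {kwarg: source[name] for name, kwarg in ENV_TO_KWARG.items()}
    settings["auth_type"] = "refresh_token"
    return settings
```

Usage: conn = sfdc.connect(**load_refresh_token_settings()).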
Usage
Parameterized queries
cursor.execute(
    """
    SELECT Id, Name, Email
    FROM Contact
    WHERE Status = :status AND CreatedDate > :min_date
    LIMIT :limit
    """,
    {
        "status": "Active",
        "min_date": "2024-01-01",
        "limit": 100,
    },
)
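Named parameters keep values out of the SQL text, so a value containing quote characters is matched literally instead of changing the query. The standard library's sqlite3 module also accepts the named (:param) style with a dict, so this sketch uses an in-memory SQLite table as a stand-in for a Data Cloud org; it shows DB-API binding semantics, not how this connector binds parameters internally. The Contact rows and the contacts_by_status helper are made up for illustration.

```python
import sqlite3
from typing import Any, List, Tuple

# In-memory SQLite stands in for a Data Cloud connection (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Contact (Id TEXT, Name TEXT, Status TEXT)")
cur.executemany(
    "INSERT INTO Contact VALUES (:id, :name, :status)",
    [
        {"id": "003A", "name": "Ada", "status": "Active"},
        {"id": "003B", "name": "Grace", "status": "Inactive"},
    ],
)


def contacts_by_status(cursor: Any, status: str, limit: int) -> List[Tuple[Any, ...]]:
    """Run a named-parameter query; values are bound, never spliced into SQL."""
    cursor.execute(
        "SELECT Id, Name FROM Contact WHERE Status = :status LIMIT :limit",
        {"status": status, "limit": limit},
    )
    return cursor.fetchall()


active = contacts_by_status(cur, "Active", 10)
# The quotes are part of the bound value, so this matches no rows at all.
injected = contacts_by_status(cur, "Active' OR '1'='1", 10)
```

Had the status been pasted into the SQL string with formatting instead, the second call would have returned every row.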
Pandas integration
import pandas as pd
import salesforce_datacloud_connector as sfdc
conn = sfdc.connect(...)
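# Method 1: pandas.read_sql (pandas 2.x emits a UserWarning for DB-API
# connections other than SQLAlchemy connectables or sqlite3)
df = pd.read_sql("SELECT * FROM Account LIMIT 1000", conn)
# Method 2: cursor.fetch_df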
cursor = conn.cursor()
cursor.execute("SELECT Id, Name, Industry FROM Account")
df = cursor.fetch_df()
print(df.describe())
print(df.groupby("Industry").size())
cursor.fetch_df() lazily imports pandas. Calling it without the [pandas]
extra installed raises ImportError with installation guidance.
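Lazy importing is a common pattern for optional extras: import the dependency only inside the function that needs it, and re-raise a missing module as ImportError carrying the install command. The sketch below shows that pattern; it is not the connector's actual code, require_optional and rows_to_dataframe are illustrative names, and the column names come from the PEP 249 cursor.description attribute.

```python
import importlib
from types import ModuleType
from typing import Any, Sequence

# Install command from this README, surfaced when pandas is missing.
PANDAS_HINT = 'pip install --pre "salesforce-datacloud-connector[pandas]"'


def require_optional(module_name: str, hint: str = PANDAS_HINT) -> ModuleType:
    """Import an optional dependency on first use, or fail with guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"{module_name} is required for this feature. Install it with: {hint}"
        ) from exc


def rows_to_dataframe(
    description: Sequence[Sequence[Any]], rows: Sequence[Sequence[Any]]
) -> Any:
    """Build a DataFrame from DB-API rows; pandas is imported only here."""
    pd = require_optional("pandas")
    columns = [col[0] for col in description]  # first item is the column name
    return pd.DataFrame.from_records(rows, columns=columns)
```

Because pandas is imported inside the function, code that never builds a DataFrame keeps working without the [pandas] extra.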
Fetch methods
row = cursor.fetchone() # one row
rows = cursor.fetchmany(100) # multiple rows
all_rows = cursor.fetchall() # all remaining rows
for row in cursor:            # iteration
    process(row)
Configuration: dataspace and workload
conn = sfdc.connect(
    ...,
    dataspace="custom_dataspace",  # optional; defaults to the org default
    workload="my_application",     # optional; surfaced in observability
)
Cursor array size
cursor = conn.cursor()
cursor.arraysize = 1000 # default fetch size for fetchmany()
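Per PEP 249, fetchmany() called with no argument returns up to arraysize rows, so a short generator can process a large result in bounded batches instead of one fetchall() list. A minimal sketch that works with any DB-API 2.0 cursor; iter_batches is illustrative, not a connector API, and sqlite3 stands in for a Data Cloud connection in the runnable part. This bounds how many rows your code handles at once; per the beta limitations, it does not change how the driver itself buffers results.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence


def iter_batches(cursor: Any) -> Iterator[List[Sequence[Any]]]:
    """Yield lists of at most cursor.arraysize rows until the result is exhausted."""
    while True:
        batch = cursor.fetchmany()  # no size argument: uses cursor.arraysize
        if not batch:
            break
        yield batch


# Stand-in data: 25 rows in an in-memory SQLite table (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])

cur.arraysize = 10
cur.execute("SELECT n FROM t ORDER BY n")
sizes = [len(batch) for batch in iter_batches(cur)]
```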
Type mapping
| Data Cloud type | Python type | Notes |
|---|---|---|
| Varchar | str | |
| Numeric | int, Decimal | int when scale is 0; Decimal otherwise |
| Integer, BigInt | int | |
| Float, Double | float | |
| Boolean | bool | |
| Date | datetime.date | |
| TimestampTZ | datetime.datetime | timezone-aware |
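The Numeric row is the one mapping with a branch in it. The sketch below restates that rule, plus the timezone-awareness check implied by TimestampTZ, as plain Python; it illustrates the table, not the connector's internal conversion code, and assumes the raw numeric value arrives as a decimal string.

```python
from datetime import datetime
from decimal import Decimal
from typing import Union


def convert_numeric(raw: str, scale: int) -> Union[int, Decimal]:
    """Apply the Numeric rule from the table: int at scale 0, else Decimal."""
    # Decimal keeps the exact digits (e.g. currency amounts); float would round them.
    return int(raw) if scale == 0 else Decimal(raw)


def is_timezone_aware(value: datetime) -> bool:
    """True when a datetime carries a usable UTC offset, as TimestampTZ values do."""
    return value.tzinfo is not None and value.utcoffset() is not None
```

Two consequences worth planning for: arithmetic that mixes a Decimal value with a float raises TypeError, so convert one side explicitly; and ordering comparisons between timezone-aware TimestampTZ values and naive datetime objects also raise TypeError, so build comparison bounds with an explicit tzinfo such as datetime.timezone.utc.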
API reference
DB-API 2.0 surface
This connector follows PEP 249 — Python Database API Specification v2.0.
- API level: 2.0
- Thread safety: 1 (threads may share the module, but not connections)
- Parameter style: named (:param)
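Thread safety level 1 means each thread needs its own connection. One way to follow that without opening a connection per query is to keep one connection per thread in threading.local storage. This is a hypothetical helper (PerThreadConnection is not part of the connector) and not a connection pool: nothing is shared across threads, and each worker thread should call close() before it exits.

```python
import threading
from typing import Any, Callable


class PerThreadConnection:
    """Lazily open and hold one DB-API connection per thread.

    Suited to drivers with threadsafety == 1: threads share the module,
    but each thread gets its own connection from the supplied factory.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> Any:
        """Return the calling thread's connection, opening it on first use."""
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._factory()
            self._local.conn = conn
        return conn

    def close(self) -> None:
        """Close the calling thread's connection, if this thread opened one."""
        conn = getattr(self._local, "conn", None)
        if conn is not None:
            conn.close()
            self._local.conn = None
```

Usage: holder = PerThreadConnection(lambda: sfdc.connect(**settings)), where settings is a hypothetical dict of your own connect keywords; each thread then calls holder.get().cursor().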
Connection
- cursor(): create a new cursor.
- close(): close the connection.
- commit(): no-op (read-only driver).
- rollback(): no-op (read-only driver).
Cursor
- execute(operation, parameters=None): execute a query.
- executemany(operation, seq_of_parameters): execute the same query multiple times.
- fetchone(): fetch the next row.
- fetchmany(size=cursor.arraysize): fetch up to size rows.
- fetchall(): fetch all remaining rows.
- close(): close the cursor.
- cancel(): cancel a running query (extension).
- fetch_df(): fetch results as a pandas DataFrame (extension; requires the [pandas] extra).
Cursor attributes
- description: column metadata.
- rowcount: number of rows affected (-1 for SELECT until exhausted).
- arraysize: default fetch size for fetchmany().
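Because description follows PEP 249 (a sequence of 7-item sequences whose first item is the column name), tuple rows can be turned into dictionaries keyed by column name without any connector-specific API. A minimal sketch that works with any DB-API 2.0 cursor; rows_as_dicts is an illustrative helper, not part of the connector.

```python
from typing import Any, Dict, List


def rows_as_dicts(cursor: Any) -> List[Dict[str, Any]]:
    """Pair each remaining row with the column names from cursor.description.

    Per PEP 249, description is None before any execute() call and for
    operations that return no rows; in that case this returns an empty list.
    """
    if cursor.description is None:
        return []
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

For example, after cursor.execute("SELECT Id, Name FROM Account LIMIT 10"), rows_as_dicts(cursor) returns a list of {"Id": ..., "Name": ...} dictionaries.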
Exception hierarchy
Exception
├── Warning
└── Error
    ├── InterfaceError
    └── DatabaseError
        ├── DataError
        ├── OperationalError (auth failures, network errors)
        ├── IntegrityError
        ├── InternalError (server errors)
        ├── ProgrammingError (SQL syntax errors)
        └── NotSupportedError (unsupported operations)
import salesforce_datacloud_connector as sfdc
try:
    cursor.execute("SELECT * FROM NonexistentTable")
except sfdc.ProgrammingError as e:
    print(f"SQL error: {e}")

try:
    cursor.execute("INSERT INTO Account VALUES (...)")
except sfdc.NotSupportedError as e:
    print(f"Operation not supported: {e}")
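OperationalError covers both transient network errors and authentication failures. If you want to retry the transient cases, a small wrapper with exponential backoff is one option. This is a hypothetical helper (with_retries is not part of the connector); keep attempts low, because a bad credential raises the same exception class and will not succeed on retry.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts, and
    re-raises the last exception once every attempt has been used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, pass a small function that calls cursor.execute(...) and returns cursor.fetchall(), with retry_on=(sfdc.OperationalError,). ProgrammingError and NotSupportedError are deliberately left out: re-running an invalid or unsupported query fails the same way every time.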
fetch_df() extension
cursor.fetch_df() returns a pandas.DataFrame containing all remaining rows
of the most recent query. It is available only when the [pandas] extra is
installed:
pip install --pre "salesforce-datacloud-connector[pandas]"
cursor.execute("SELECT Id, Name, Industry FROM Account LIMIT 5000")
df = cursor.fetch_df()
If pandas is missing, fetch_df() raises an ImportError with the install
command above.
Beta-period limitations
2.0.0b1 is intentionally scoped:
- Read-only: no INSERT, UPDATE, DELETE, or DDL operations.
- No streaming: results buffer in memory (chunked pagination keeps memory bounded but does not stream row-by-row).
- Synchronous API only: async client is on the V2 roadmap.
- No connection pooling: create new connections as needed.
- No prepared statements: parameters bind on each execution.
- No SQLAlchemy dialect: direct DB-API usage only.
- In-memory token cache: tokens are not persisted across runs.
These constraints will be reconsidered as the package moves from beta to GA.
Development
The repository uses uv for dependency
management and hatchling as the build backend.
Setup
# From the repo root
cd salesforce_datacloud_connector
# Install everything (runtime + pandas extra + dev dependency group)
uv sync --all-extras --dev
Run tests
# Unit + integration tests (skips live-org E2E)
uv run pytest -m "not e2e"
# Include the live-org E2E suite (requires real Salesforce credentials —
# see tests/test_e2e_real_datacloud.py for the env vars it expects)
uv run pytest
Lint
uv run ruff check salesforce_datacloud_connector tests --statistics
Build
uv build
# produces dist/salesforce_datacloud_connector-<version>-py3-none-any.whl
# and dist/salesforce_datacloud_connector-<version>.tar.gz
Repo layout
The repository publishes two PyPI packages from one branch — see the
repo-root README for the full layout. This package's source
lives under salesforce_datacloud_connector/salesforce_datacloud_connector/,
its tests under salesforce_datacloud_connector/tests/, and its build config
under salesforce_datacloud_connector/pyproject.toml.
Contributing
Contributions are welcome. Please open an issue before starting on a large
change. PRs should include tests for new functionality and pass ruff check
and pytest -m "not e2e" locally.
See CONTRIBUTING.md at the repo root for the full
contribution guide.
License
BSD-3-Clause. See LICENSE.txt at the repo root.
Download files
Source Distribution
Built Distribution
File details
Details for the file salesforce_datacloud_connector-2.0.0b1.tar.gz.
File metadata
- Download URL: salesforce_datacloud_connector-2.0.0b1.tar.gz
- Upload date:
- Size: 173.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bb5bdeed9861fc5c32bf8a3d62b537e6bc581ab371c71162e3e4f152908bc4e |
| MD5 | 055b773273ae1b49202fc364ab73d63d |
| BLAKE2b-256 | 4cbb0e0cb0347ac607707d5ca7111ce295ae81ed74a54040ef4f26707b4f3815 |
Provenance
The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1.tar.gz:
Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: salesforce_datacloud_connector-2.0.0b1.tar.gz
- Subject digest: 6bb5bdeed9861fc5c32bf8a3d62b537e6bc581ab371c71162e3e4f152908bc4e
- Sigstore transparency entry: 1705909236
- Sigstore integration time:
- Permalink: forcedotcom/salesforce-cdp-connector@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Branch / Tag: refs/tags/salesforce-datacloud-connector-2.0.0b1
- Owner: https://github.com/forcedotcom
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: v2-publish.yml@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Trigger Event: release
File details
Details for the file salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl.
File metadata
- Download URL: salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
- Upload date:
- Size: 30.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24f0b88a68074cb29b3e8f41df4360f42c901160dca0f16930b86182fccec603 |
| MD5 | 85c245c5bb216898daa72127292927e0 |
| BLAKE2b-256 | 20bdbc42dd75ecb38dcbdedab807bb72a8fee73938987e05994c9cc36f4daf13 |
Provenance
The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl:
Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
- Subject digest: 24f0b88a68074cb29b3e8f41df4360f42c901160dca0f16930b86182fccec603
- Sigstore transparency entry: 1705909408
- Sigstore integration time:
- Permalink: forcedotcom/salesforce-cdp-connector@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Branch / Tag: refs/tags/salesforce-datacloud-connector-2.0.0b1
- Owner: https://github.com/forcedotcom
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: v2-publish.yml@bbd9e7a77e3f50be6798c852845528b1be4b60eb
- Trigger Event: release