Skip to main content

Salesforce Data Cloud Python connector (V3 driver, beta)

Project description

Salesforce Data Cloud Python Connector

The official Salesforce Data Cloud Python connector — a DB-API 2.0 compliant driver for querying Salesforce Data Cloud using the Query API. Designed for use from Jupyter notebooks, pandas pipelines, ETL scripts, and any other Python data tooling.

This package (salesforce-datacloud-connector) supersedes the salesforce-cdp-connector package. New projects should adopt this package; existing salesforce-cdp-connector users should plan to migrate.

Python Version License

Important: this is a beta release

salesforce-datacloud-connector is currently published as a beta on PyPI (2.0.0b1). The public API surface may change before the GA release.

  • Install with the --pre flag — pip will not pick up pre-releases otherwise:

    pip install --pre salesforce-datacloud-connector
    
  • Pin your version explicitly in production code and your dependency manifests:

    salesforce-datacloud-connector==2.0.0b1
    

    This protects you from accidental upgrades to a later beta or release candidate that may include breaking changes.

Features

  • DB-API 2.0 compliant — standard Python database interface.
  • Three OAuth flows — Username/Password, JWT Bearer Token, Refresh Token.
  • Pandas integration — works with pandas.read_sql() and cursor.fetch_df() (under the [pandas] extra).
  • Notebook-ready — interactive exploration and visualization in Jupyter.
  • Parameterized queries — safe SQL execution with named parameters (:param).
  • Efficient pagination — chunked, low-memory fetching for large result sets.
  • Type conversion — Data Cloud types map to native Python types (str, int, Decimal, float, bool, datetime.date, datetime.datetime).
  • Comprehensive error hierarchy — full DB-API 2.0 exception tree.

Installation

# Basic installation (always include --pre during the beta)
pip install --pre salesforce-datacloud-connector

# With pandas + pyarrow support (recommended for analytics use)
pip install --pre "salesforce-datacloud-connector[pandas]"

Requirements

  • Python 3.8 or newer.
  • requests >= 2.31.0
  • cryptography >= 41.0.0
  • pyjwt >= 2.8.0
  • python-dateutil >= 2.8.0

Optional (under [pandas] extra):

  • pandas >= 2.0.0
  • pyarrow >= 14.0.0

Quickstart

import salesforce_datacloud_connector as sfdc

# Connect to Salesforce Data Cloud (refresh token flow shown — see below for
# all three flows)
conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="refresh_token",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    refresh_token="YOUR_REFRESH_TOKEN",
)

cursor = conn.cursor()
cursor.execute("SELECT Id, Name FROM Account LIMIT 10")

for row in cursor:
    print(row)

cursor.close()
conn.close()

The connector also supports the standard Python context-manager idioms — both the connection and the cursor close cleanly when their with blocks exit:

import salesforce_datacloud_connector as sfdc

with sfdc.connect(...) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM Account")
        (count,) = cursor.fetchone()
        print(f"Total accounts: {count}")

Authentication

The connector ships three OAuth flows, all driven through sfdc.connect(...) via the auth_type keyword.

Username / Password

import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="username_password",
    username="user@example.com",
    password="your_password",
    client_id="your_connected_app_client_id",
    client_secret="your_connected_app_client_secret",
)

JWT Bearer Token

import salesforce_datacloud_connector as sfdc

with open("private_key.pem", "r") as f:
    private_key = f.read()

conn = sfdc.connect(
    login_url="https://login.salesforce.com",
    auth_type="jwt",
    username="user@example.com",
    client_id="your_connected_app_client_id",
    jwt_private_key=private_key,
)

Refresh Token (recommended for long-running services)

import os

import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(
    login_url=os.environ["SFDC_LOGIN_URL"],
    auth_type="refresh_token",
    client_id=os.environ["SFDC_CLIENT_ID"],
    client_secret=os.environ["SFDC_CLIENT_SECRET"],
    refresh_token=os.environ["SFDC_REFRESH_TOKEN"],
)

Credential hygiene

Never hardcode credentials in source. Load them from environment variables (local development) or your platform's secret manager (production). Rotate credentials regularly and avoid using production credentials for development.

Usage

Parameterized queries

cursor.execute(
    """
    SELECT Id, Name, Email
    FROM Contact
    WHERE Status = :status AND CreatedDate > :min_date
    LIMIT :limit
    """,
    {
        "status": "Active",
        "min_date": "2024-01-01",
        "limit": 100,
    },
)

Pandas integration

import pandas as pd
import salesforce_datacloud_connector as sfdc

conn = sfdc.connect(...)

# Method 1 — pandas.read_sql
df = pd.read_sql("SELECT * FROM Account LIMIT 1000", conn)

# Method 2 — cursor.fetch_df
cursor = conn.cursor()
cursor.execute("SELECT Id, Name, Industry FROM Account")
df = cursor.fetch_df()

print(df.describe())
print(df.groupby("Industry").size())

cursor.fetch_df() lazily imports pandas. Calling it without the [pandas] extra installed raises ImportError with installation guidance.

Fetch methods

row = cursor.fetchone()       # one row
rows = cursor.fetchmany(100)  # multiple rows
all_rows = cursor.fetchall()  # all remaining rows

for row in cursor:            # iteration
    process(row)

Configuration: dataspace and workload

conn = sfdc.connect(
    ...,
    dataspace="custom_dataspace",  # optional; defaults to the org default
    workload="my_application",     # optional; surfaced in observability
)

Cursor array size

cursor = conn.cursor()
cursor.arraysize = 1000  # default fetch size for fetchmany()

Type mapping

Data Cloud type Python type Notes
Varchar str
Numeric int, Decimal int when scale is 0; Decimal otherwise
Integer, BigInt int
Float, Double float
Boolean bool
Date datetime.date
TimestampTZ datetime.datetime timezone-aware

API reference

DB-API 2.0 surface

This connector follows PEP 249 — Python Database API Specification v2.0.

  • API level: 2.0
  • Thread safety: 1 (threads may share the module, but not connections)
  • Parameter style: named (:param)

Connection

  • cursor() — create a new cursor.
  • close() — close the connection.
  • commit() — no-op (read-only driver).
  • rollback() — no-op (read-only driver).

Cursor

  • execute(operation, parameters=None) — execute a query.
  • executemany(operation, seq_of_parameters) — execute the same query multiple times.
  • fetchone() — fetch the next row.
  • fetchmany(size=cursor.arraysize) — fetch up to size rows.
  • fetchall() — fetch all remaining rows.
  • close() — close the cursor.
  • cancel() — cancel a running query (extension).
  • fetch_df() — fetch results as a pandas DataFrame (extension; requires the [pandas] extra).

Cursor attributes

  • description — column metadata.
  • rowcount — number of rows affected (-1 for SELECT until exhausted).
  • arraysize — default fetch size for fetchmany().

Exception hierarchy

Exception
└── Warning
└── Error
    ├── InterfaceError
    └── DatabaseError
        ├── DataError
        ├── OperationalError      (auth failures, network errors)
        ├── IntegrityError
        ├── InternalError         (server errors)
        ├── ProgrammingError      (SQL syntax errors)
        └── NotSupportedError     (unsupported operations)
import salesforce_datacloud_connector as sfdc

try:
    cursor.execute("SELECT * FROM NonexistentTable")
except sfdc.ProgrammingError as e:
    print(f"SQL error: {e}")

try:
    cursor.execute("INSERT INTO Account VALUES (...)")
except sfdc.NotSupportedError as e:
    print(f"Operation not supported: {e}")

fetch_df() extension

cursor.fetch_df() returns a pandas.DataFrame containing all remaining rows of the most recent query. It is available only when the [pandas] extra is installed:

pip install --pre "salesforce-datacloud-connector[pandas]"
cursor.execute("SELECT Id, Name, Industry FROM Account LIMIT 5000")
df = cursor.fetch_df()

If pandas is missing, fetch_df() raises an ImportError with the install command above.

Beta-period limitations

2.0.0b1 is intentionally scoped:

  1. Read-only — no INSERT, UPDATE, DELETE, or DDL operations.
  2. No streaming — results buffer in memory (chunked pagination keeps memory bounded but does not stream row-by-row).
  3. Synchronous API only — async client is on the V2 roadmap.
  4. No connection pooling — create new connections as needed.
  5. No prepared statements — parameters bind on each execution.
  6. No SQLAlchemy dialect — direct DB-API usage only.
  7. In-memory token cache — tokens are not persisted across runs.

These constraints will be reconsidered as the package moves from beta to GA.

Development

The repository uses uv for dependency management and hatchling as the build backend.

Setup

# From the repo root
cd salesforce_datacloud_connector

# Install everything (runtime + pandas extra + dev dependency group)
uv sync --all-extras --dev

Run tests

# Unit + integration tests (skips live-org E2E)
uv run pytest -m "not e2e"

# Include the live-org E2E suite (requires real Salesforce credentials —
# see tests/test_e2e_real_datacloud.py for the env vars it expects)
uv run pytest

Lint

uv run ruff check salesforce_datacloud_connector tests --statistics

Build

uv build
# produces dist/salesforce_datacloud_connector-<version>-py3-none-any.whl
# and  dist/salesforce_datacloud_connector-<version>.tar.gz

Repo layout

The repository publishes two PyPI packages from one branch — see the repo-root README for the full layout. This package's source lives under salesforce_datacloud_connector/salesforce_datacloud_connector/, its tests under salesforce_datacloud_connector/tests/, and its build config under salesforce_datacloud_connector/pyproject.toml.

Contributing

Contributions are welcome. Please open an issue before starting on a large change. PRs should include tests for new functionality and pass ruff check and pytest -m "not e2e" locally.

See CONTRIBUTING.md at the repo root for the full contribution guide.

License

BSD-3-Clause. See LICENSE.txt at the repo root.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

salesforce_datacloud_connector-2.0.0b1.tar.gz (173.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file salesforce_datacloud_connector-2.0.0b1.tar.gz.

File metadata

File hashes

Hashes for salesforce_datacloud_connector-2.0.0b1.tar.gz
Algorithm Hash digest
SHA256 6bb5bdeed9861fc5c32bf8a3d62b537e6bc581ab371c71162e3e4f152908bc4e
MD5 055b773273ae1b49202fc364ab73d63d
BLAKE2b-256 4cbb0e0cb0347ac607707d5ca7111ce295ae81ed74a54040ef4f26707b4f3815

See more details on using hashes here.

Provenance

The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1.tar.gz:

Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl.

File metadata

File hashes

Hashes for salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 24f0b88a68074cb29b3e8f41df4360f42c901160dca0f16930b86182fccec603
MD5 85c245c5bb216898daa72127292927e0
BLAKE2b-256 20bdbc42dd75ecb38dcbdedab807bb72a8fee73938987e05994c9cc36f4daf13

See more details on using hashes here.

Provenance

The following attestation bundles were made for salesforce_datacloud_connector-2.0.0b1-py3-none-any.whl:

Publisher: v2-publish.yml on forcedotcom/salesforce-cdp-connector

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page