
Python interface to the Salesforce.com Bulk API.

Project description

DataOps Salesforce Bulk

This library was forked from the salesforce-bulk library and adds support for Salesforce PK chunking. Credit goes to the author of the original salesforce-bulk library (https://pypi.org/project/salesforce-bulk/).

Python client library for accessing the asynchronous Salesforce.com Bulk API.

Installation

pip install dataops-salesforce-bulk

Authentication

To access the Bulk API you need to authenticate a user with Salesforce. The easiest way is to supply a username, password, and security_token. This library uses the simple-salesforce package to handle password-based authentication.

::

from salesforce_bulk import SalesforceBulk

bulk = SalesforceBulk(username=username, password=password, security_token=security_token)
...

Alternatively, if you have access to a session ID and instance_url, you can use those directly:

::

from urllib.parse import urlparse
from salesforce_bulk import SalesforceBulk

bulk = SalesforceBulk(sessionId=sessionId, host=urlparse(instance_url).hostname)
...
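
If you already authenticate with simple-salesforce elsewhere in your code, you can reuse its session for the Bulk API. A minimal sketch, assuming simple-salesforce's documented session_id and sf_instance attributes:

::

from urllib.parse import urlparse

from simple_salesforce import Salesforce
from salesforce_bulk import SalesforceBulk

# Authenticate once with simple-salesforce, then hand its session to SalesforceBulk.
sf = Salesforce(username=username, password=password, security_token=security_token)
bulk = SalesforceBulk(sessionId=sf.session_id,
                      host=urlparse('https://' + sf.sf_instance).hostname)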

Operations

The basic sequence for driving the Bulk API is:

  1. Create a new job
  2. Add one or more batches to the job
  3. Close the job
  4. Wait for each batch to finish

Bulk Query

bulk.create_query_job(object_name, contentType='JSON')

Using API v39.0 or higher, you can also use the queryAll operation, which also returns deleted and archived records:

bulk.create_queryall_job(object_name, contentType='JSON')

Example

::

from time import sleep
from salesforce_bulk.util import IteratorBytesIO
import json

job = bulk.create_query_job("Contact", contentType='JSON')
batch = bulk.query(job, "select Id,LastName from Contact")
bulk.close_job(job)
while not bulk.is_batch_done(batch):
    sleep(10)

for result in bulk.get_all_results_for_query_batch(batch):
    result = json.load(IteratorBytesIO(result))
    for row in result:
        print(row)  # dictionary rows

Same example but for CSV:

::

from time import sleep

import unicodecsv

job = bulk.create_query_job("Contact", contentType='CSV')
batch = bulk.query(job, "select Id,LastName from Contact")
bulk.close_job(job)
while not bulk.is_batch_done(batch):
    sleep(10)

for result in bulk.get_all_results_for_query_batch(batch):
    reader = unicodecsv.DictReader(result, encoding='utf-8')
    for row in reader:
        print(row)  # dictionary rows

Note that while CSV is the default for historical reasons, JSON should be preferred, since CSV has some drawbacks, including its handling of NULL vs. empty string.

PK Chunk Header

If you are querying a large number of records you probably want to turn on PK Chunking (https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm):

bulk.create_query_job(object_name, contentType='CSV', pk_chunking=True)

That will use the default setting for chunk size. You can use a different chunk size by providing a number of records per chunk:

bulk.create_query_job(object_name, contentType='CSV', pk_chunking=100000)

Additionally if you want to do something more sophisticated you can provide a header value:

bulk.create_query_job(object_name, contentType='CSV', pk_chunking='chunkSize=50000; startRow=00130000000xEftMGH')
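
When PK chunking is enabled, Salesforce splits the query into multiple batches and leaves the original batch unprocessed, so results must be collected from every batch in the job. A sketch of that loop, assuming a get_batch_list method that returns one info dict (with 'id' and 'state' keys) per batch; check this library's source for the exact helper and signatures it exposes:

::

from time import sleep

job = bulk.create_query_job("Contact", contentType='CSV', pk_chunking=True)
bulk.query(job, "select Id,LastName from Contact")

# Assumption: get_batch_list(job) returns one info dict per batch Salesforce
# created for the job, including the chunked-out batches.
for info in bulk.get_batch_list(job):
    if info['state'] == 'Not Processed':
        continue  # the original batch stays unprocessed when PK chunking is on
    while not bulk.is_batch_done(info['id']):
        sleep(10)
    for result in bulk.get_all_results_for_query_batch(info['id']):
        pass  # consume each chunk's results as in the examples above

bulk.close_job(job)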

Bulk Insert, Update, Delete

All bulk upload operations work the same way: you set the operation when you create the job, then submit one or more documents that specify records with the columns to insert, update, or delete. When deleting, you should submit only the Id for each record.

For efficiency you should use the post_batch method to post each batch of data. (Note that a batch can have a maximum of 10,000 records and be at most 1GB in size.) You pass a generator or iterator into this function and it will stream data via POST to Salesforce. For help sending CSV-formatted data you can use the salesforce_bulk.CsvDictsAdapter class, which takes an iterator returning dictionaries and returns an iterator producing CSV data.

Full example:

::

from salesforce_bulk import CsvDictsAdapter

job = bulk.create_insert_job("Account", contentType='CSV')
accounts = [dict(Name="Account%d" % idx) for idx in range(5)]
csv_iter = CsvDictsAdapter(iter(accounts))
batch = bulk.post_batch(job, csv_iter)
bulk.wait_for_batch(job, batch)
bulk.close_job(job)
print("Done. Accounts uploaded.")
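
Delete jobs follow the same pattern, submitting only the Id column. A minimal sketch, assuming create_delete_job behaves like the other job constructors and that ids_to_delete holds record Ids you obtained elsewhere:

::

from salesforce_bulk import CsvDictsAdapter

job = bulk.create_delete_job("Account", contentType='CSV')
# For deletes, each row needs only the record's Id.
rows = [dict(Id=record_id) for record_id in ids_to_delete]
batch = bulk.post_batch(job, CsvDictsAdapter(iter(rows)))
bulk.wait_for_batch(job, batch)
bulk.close_job(job)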

Concurrency mode

When creating the job, pass concurrency='Serial' or concurrency='Parallel' to set the concurrency mode for the job.
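
For example, to force batches to be processed one at a time:

::

job = bulk.create_insert_job("Account", contentType='CSV', concurrency='Serial')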

Download files

Download the file for your platform.

Source Distribution

dataops-salesforce-bulk-0.0.7.tar.gz (8.4 kB)


Built Distribution

dataops_salesforce_bulk-0.0.7-py3-none-any.whl (10.4 kB)


File details

Details for the file dataops-salesforce-bulk-0.0.7.tar.gz.

File metadata

  • Download URL: dataops-salesforce-bulk-0.0.7.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for dataops-salesforce-bulk-0.0.7.tar.gz:

  • SHA256: db1f22a5b4b9c68a65fe53cf77cf663b836b22a23b1dbf8afc5aa1436b89be2b
  • MD5: 84debb0347ac7171ec041c8688eb497b
  • BLAKE2b-256: 503afa1616150d426ea19d96a2a548fbf2c6a63dba2a229385a327f98d8e269d

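If you want to verify a download before installing it, you can check the SHA256 digest above with Python's standard hashlib (adjust the path to wherever you saved the archive):

::

import hashlib

expected = "db1f22a5b4b9c68a65fe53cf77cf663b836b22a23b1dbf8afc5aa1436b89be2b"
with open("dataops-salesforce-bulk-0.0.7.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, "hash mismatch; do not install this file"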

File details

Details for the file dataops_salesforce_bulk-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: dataops_salesforce_bulk-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for dataops_salesforce_bulk-0.0.7-py3-none-any.whl:

  • SHA256: 6f8e6a6e139bf731c892ffbeb0814970db3e79ad37a9b281f6df7d5a5ca57758
  • MD5: 9ea53b63e427360b2a8c7920f2dacf57
  • BLAKE2b-256: a5ab642e315a5d1795392bbea9da0ab99336ba227a5c6585f3bab78c0a2dba5d

