Python interface to the Salesforce.com Bulk API.
Salesforce Bulk
Python client library for accessing the asynchronous Salesforce.com Bulk API.
Installation
pip install salesforce-bulk-2-7
Authentication
To access the Bulk API, you need to authenticate a user into Salesforce. The easiest way is to supply a username, password, and security_token. This library uses the simple-salesforce package to handle password-based authentication.
from salesforce_bulk import SalesforceBulk
bulk = SalesforceBulk(username=username, password=password, security_token=security_token)
...
Alternatively, if you have access to a session ID and instance_url, you can use those directly:
from urlparse import urlparse  # Python 2; on Python 3 use: from urllib.parse import urlparse
from salesforce_bulk import SalesforceBulk
bulk = SalesforceBulk(sessionId=sessionId, host=urlparse(instance_url).hostname)
...
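The host argument takes a bare hostname rather than a full URL. A minimal sketch of that extraction, assuming Python 3 (where urlparse lives in urllib.parse); the instance URL is a made-up placeholder:

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

# Hypothetical instance URL, for illustration only.
instance_url = "https://example.my.salesforce.com/services/data"

# .hostname drops the scheme, port, and path, leaving only the host.
host = urlparse(instance_url).hostname
print(host)
```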
Operations
The basic sequence for driving the Bulk API is:
Create a new job
Add one or more batches to the job
Close the job
Wait for each batch to finish
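The four steps above can be sketched as a single helper for a query job. This is only an illustration: run_query and StubBulk are hypothetical names, and the stub stands in for a real SalesforceBulk instance so the sketch runs without a Salesforce org. The bulk methods called are the ones used elsewhere in this README.

```python
import time


def run_query(bulk, object_name, soql, poll_seconds=10):
    """Run one query batch through the four steps; return the raw results."""
    job = bulk.create_query_job(object_name, contentType='JSON')  # 1. create the job
    batch = bulk.query(job, soql)          # 2. add a batch to the job
    bulk.close_job(job)                    # 3. close the job
    while not bulk.is_batch_done(batch):   # 4. wait for the batch to finish
        time.sleep(poll_seconds)
    return list(bulk.get_all_results_for_query_batch(batch))


class StubBulk(object):
    """Hypothetical stand-in so the sketch runs without a Salesforce org."""

    def __init__(self):
        self.calls = []

    def create_query_job(self, object_name, contentType='CSV'):
        self.calls.append('create_query_job')
        return 'job-1'

    def query(self, job, soql):
        self.calls.append('query')
        return 'batch-1'

    def close_job(self, job):
        self.calls.append('close_job')

    def is_batch_done(self, batch):
        self.calls.append('is_batch_done')
        return True

    def get_all_results_for_query_batch(self, batch):
        return iter(['result-1'])


stub = StubBulk()
results = run_query(stub, "Contact", "select Id,LastName from Contact", poll_seconds=0)
print(stub.calls)  # order in which the steps were issued
```

With a real client, pass the SalesforceBulk instance in place of the stub.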
Bulk Query
bulk.create_query_job(object_name, contentType='JSON')
Using API v39.0 or higher, you can also use the queryAll operation:
bulk.create_queryall_job(object_name, contentType='JSON')
Example
import json
from time import sleep
from salesforce_bulk.util import IteratorBytesIO
job = bulk.create_query_job("Contact", contentType='JSON')
batch = bulk.query(job, "select Id,LastName from Contact")
bulk.close_job(job)
while not bulk.is_batch_done(batch):
    sleep(10)
for result in bulk.get_all_results_for_query_batch(batch):
    result = json.load(IteratorBytesIO(result))
    for row in result:
        print(row) # dictionary rows
Same example but for CSV:
import unicodecsv
from time import sleep
job = bulk.create_query_job("Contact", contentType='CSV')
batch = bulk.query(job, "select Id,LastName from Contact")
bulk.close_job(job)
while not bulk.is_batch_done(batch):
    sleep(10)
for result in bulk.get_all_results_for_query_batch(batch):
    reader = unicodecsv.DictReader(result, encoding='utf-8')
    for row in reader:
        print(row) # dictionary rows
Note that while CSV is the default for historical reasons, JSON should be preferred, since CSV has some drawbacks, including its handling of NULL vs. empty string.
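The NULL vs. empty string drawback can be seen with the standard library parsers alone. The payloads below are made-up sample data, and the sketch assumes a null field arrives as an empty CSV column:

```python
import csv
import io
import json

# One record whose LastName is null, serialized both ways (made-up sample data).
json_payload = '[{"Id": "001", "LastName": null}]'
csv_payload = 'Id,LastName\r\n001,\r\n'

json_row = json.loads(json_payload)[0]
csv_row = next(csv.DictReader(io.StringIO(csv_payload)))

print(json_row["LastName"])       # None: the null is preserved
print(repr(csv_row["LastName"]))  # '': cannot be told apart from an empty string
```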
PK Chunk Header
If you are querying a large number of records you probably want to turn on PK Chunking:
bulk.create_query_job(object_name, contentType='CSV', pk_chunking=True)
That will use the default setting for chunk size. You can use a different chunk size by providing a number of records per chunk:
bulk.create_query_job(object_name, contentType='CSV', pk_chunking=100000)
If you want to do something more sophisticated, you can provide a header value:
bulk.create_query_job(object_name, contentType='CSV', pk_chunking='chunkSize=50000; startRow=00130000000xEftMGH')
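If the header value is assembled at runtime, a small helper keeps the formatting consistent. This is a sketch only: pk_chunking_value is a hypothetical helper, not part of the library, and it covers just the two keys shown in the example above.

```python
def pk_chunking_value(chunk_size=None, start_row=None):
    """Build a 'key=value; key=value' string for the pk_chunking argument."""
    parts = []
    if chunk_size is not None:
        parts.append("chunkSize=%d" % chunk_size)
    if start_row is not None:
        parts.append("startRow=%s" % start_row)
    return "; ".join(parts)


value = pk_chunking_value(chunk_size=50000, start_row="00130000000xEftMGH")
print(value)  # chunkSize=50000; startRow=00130000000xEftMGH
```

The resulting string can then be passed as pk_chunking=value when creating the query job.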
If you want to set an HTTP header yourself, you can pass a dictionary of custom header values that will be added to the create-job Salesforce Bulk API call:
bulk.create_query_job(object_name, contentType='CSV', pk_chunking='chunkSize=50000; startRow=00130000000xEftMGH', extra_headers={'Sforce-Disable-Batch-Retry':'TRUE'})
Bulk Insert, Update, Delete
All Bulk upload operations work the same way. You set the operation when you create the job. Then you submit one or more documents that specify records with columns to insert/update/delete. When deleting, you should submit only the Id for each record.
For efficiency, use the post_batch method to post each batch of data. (Note that a batch can contain at most 10,000 records and be at most 1GB in size.) You pass a generator or iterator into this function, and it streams the data via POST to Salesforce. For help sending CSV-formatted data, you can use the salesforce_bulk.CsvDictsAdapter class. It takes an iterator returning dictionaries and returns an iterator that produces CSV data.
Full example:
from salesforce_bulk import CsvDictsAdapter
job = bulk.create_insert_job("Account", contentType='CSV')
accounts = [dict(Name="Account%d" % idx) for idx in range(5)]
csv_iter = CsvDictsAdapter(iter(accounts))
batch = bulk.post_batch(job, csv_iter)
bulk.wait_for_batch(job, batch)
bulk.close_job(job)
print("Done. Accounts uploaded.")
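For uploads larger than the 10,000-record batch limit mentioned above, the records have to be split across several batches. A minimal sketch of that splitting, using only the standard library; chunked and MAX_BATCH_RECORDS are hypothetical names, not part of the library:

```python
from itertools import islice

MAX_BATCH_RECORDS = 10000  # per-batch record limit mentioned above


def chunked(records, size=MAX_BATCH_RECORDS):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 25,000 made-up account records, generated lazily.
accounts = (dict(Name="Account%d" % idx) for idx in range(25000))
sizes = [len(chunk) for chunk in chunked(accounts)]
print(sizes)  # [10000, 10000, 5000]
```

Each chunk could then be wrapped as CsvDictsAdapter(iter(chunk)) and handed to bulk.post_batch(job, ...), as in the full example.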
Concurrency mode
When creating the job, pass concurrency='Serial' or concurrency='Parallel' to set the concurrency mode for the job.