Skip to main content

Microsoft SQL Server bcp (Bulk Copy) wrapper with Azure Synapse Blob alternative

Project description

bcpyaz

What is it?

This package is a wrapper for Microsoft's SQL Server bcp utility. Current database drivers available in Python are not fast enough for transferring millions of records (yes, I have tried pyodbc fast_execute_many). Despite the IO hits, the fastest option by far is saving the data to a CSV file in file system (preferably /dev/shm tmpfs) and using the bcp utility to transfer the CSV file to SQL Server.

Azure Synapse / Blob extensions

If the following env vars are set (with examples):

  AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=myblogstorageacct;AccountKey=jhadsfas235876dha90/17891hfh25125/12h2udfjs347512==;EndpointSuffix=core.windows.net"
  AZURE_TEMP_STORAGE_CONTAINER="tempcontainer"

Then instead of using the bcp command line utility, the CSV file will be copied to the Azure Storage Blob temporarily and COPY will be used to transfer it from there into the Synapse Database.

You may need to

GRANT ADMINISTER DATABASE BULK OPERATIONS TO <user>

on Synapse to the user that is connecting.

The sqlcmd utility will still be required.

How Can I Install It?

Make sure your computeer has the requirements.

Install locally from a git clone:

pip install -e .

or via requirements.txt:

-e git+https://git@github.com/Arcturus-io/bcpyaz.git#egg=bcpyaz

Examples

Following examples show you how to load (1) flat files and (2) DataFrame objects to SQL Server using this package.

Flat File

Following example assumes that you have a comma separated file with no qualifier in path 'tests/data1.csv'. The code below sends the the file to SQL Server.

import bcpy


sql_config = {
    'server': 'sql_server_hostname',
    'database': 'database_name',
    'username': 'test_user',
    'password': 'test_user_password1234'
}
sql_table_name = 'test_data1'
csv_file_path = 'tests/data1.csv'
flat_file = bcpy.FlatFile(qualifier='', path=csv_file_path)
sql_table = bcpy.SqlTable(sql_config, table=sql_table_name)
flat_file.to_sql(sql_table)

DataFrame

The following example creates a DataFrame with 100 rows and 4 columns populated with random data and then it sends it to SQL Server.

import bcpy
import numpy as np
import pandas as pd


sql_config = {
    'server': 'sql_server_hostname',
    'database': 'database_name',
    'username': 'test_user',
    'password': 'test_user_password1234'
}
table_name = 'test_dataframe'
df = pd.DataFrame(np.random.randint(-100, 100, size=(100, 4)),
                  columns=list('ABCD'))
bdf = bcpy.DataFrame(df)
sql_table = bcpy.SqlTable(sql_config, table=table_name)
bdf.to_sql(sql_table)

Requirements

You need a working version of Microsoft bcp installed in your system. Your PATH environment variable should contain the directory of the bcp utility. Following are the installation tutorials for different operating systems.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bcpyaz-0.2.0.tar.gz (11.5 kB view details)

Uploaded Source

Built Distribution

bcpyaz-0.2.0-py3-none-any.whl (12.7 kB view details)

Uploaded Python 3

File details

Details for the file bcpyaz-0.2.0.tar.gz.

File metadata

  • Download URL: bcpyaz-0.2.0.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6

File hashes

Hashes for bcpyaz-0.2.0.tar.gz
Algorithm Hash digest
SHA256 bc2a93d7f387e8b26162b2fe9cc5dc70d90c453db2971705fe59ce78234c3300
MD5 cee89eb13762b02071df66f7f80b7eda
BLAKE2b-256 2573bdf14576db83317778d1a54babcfe043714191b79f52f93da928e8912ce9

See more details on using hashes here.

File details

Details for the file bcpyaz-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: bcpyaz-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6

File hashes

Hashes for bcpyaz-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 47c70c5bc24a40da78e0fd5ad01c7722ed2b4bc0272906e52cfab0159b56d5c5
MD5 f07ed8628cf16f36f7ebb3aab6a72080
BLAKE2b-256 85f6c9f3fd0764f351c8eb6be16c8b29296efadae66fbe0019e6c4b85c44af49

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page