Skip to main content

A tiny library that uses .Net SqlBulkCopy to enable fast data loading to Microsoft SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net Native Library AOT compilation is used to generate the native DLL.

Project description

GitHub Latest Tag GitHub Actions Workflow Status PyPI Python Versions

arrowsqlbcpy

A tiny library that uses .Net SqlBulkCopy to enable fast data loading to Microsoft SQL Server. Apache Arrow is used to serialise data between Python and the native DLL. .Net Native Library AOT compilation is used to generate the native DLL.

This library is much faster than any other Python solution, including bcpandas, pyodbc and pymssql. See the benchmark results below.

Performance plot

Installation

Binary wheels are available from PyPi and can be installed using your preferred package manager:

pip install arrowsqlbcpy

or

uv add arrowsqlbcpy

Usage

Connection strings for .Net are documented here

import pandas as pd
from arrowsqlbcpy import bulkcopy_from_pandas

# Create a connection string
cn = r"Server=myServerAddress;Database=myDataBase;Trusted_Connection=True;"
# The table to load into must exist and have the same column names and types as the pandas df
tablename = "test"

df = pd.DataFrame({"a":[1]*10000, "b":[2]*10000, "c":[3]*10000})

bulkcopy_from_pandas(df, cn, tablename)

When testing it can be useful to have pandas create the table for you, see tests/test_load.py for an example.

Requirements

Wheels are available for the latest versions of Windows 64 bit, MacOS ARM 64bit and Ubuntu 64 bit.

Wheels are available for Python 3.9-3.13.

Linux support

The Ubuntu wheels may work on other Linux distros. Building C# native libaries and then packaging appropriately for multiple Linux distros is not straightforward. The simplest solution for most Linux distros is to simply pull the source from Github and build locally. These are the high-level steps:

  1. Install .net https://learn.microsoft.com/en-us/dotnet/core/install/linux
  2. Clone the source

    git clone https://github.com/RusselWebber/arrowsqlbcpy

  3. Install uv https://docs.astral.sh/uv/getting-started/installation/
  4. Build the wheel locally

    uv build --wheel

  5. Install the wheel

    pip install dist/wheel_file.whl

Benchmarks

The benchmarks were run using the richbench package. Tests were run repeatedly to get stable benchmarks.

richbench ./benchmarks

The benchmarks load a 3m row parquet file of New York taxi data. Times are recorded for loading 1000 rows, 10 000 rows, 100 000 rows, 1 000 000 rows and finally all 3 000 000 rows.

The benchmarks have a baseline of using pandas to_sql() and SQLAlchemy with pyodbc and pymssql. This is a common solution for loading pandas dataframes into SQL Server. A batch size of 10 000 rows was used in the benchmarks.

The benchmarks show the time taken to load using various alternative strategies:

Label Description
fast_executemany=True Use pandas to_sql(), SQLAlchemy, pyodbc, pymssql with the fast_executemany=True option as discussed here
bcpandas Use the bcpandas package to load the dataframes. The package writes temp files and spawns bcp processes to load them
arrowsqlbcpy This package using .Net SqlBulkCopy

The richbench tables show the min, max and mean time in seconds for the baseline in the left three columns; then the min, max, mean time in seconds for the alternative strategy.

For example this row:

Benchmark Min Max Mean Min (+) Max (+) Mean (+)
1 000 rows - fast_executemany=True 1.0 1.0 1.0 0.5 (2x) 0.5 (2x) 0.5 (2x)

should be interpreted as: the strategy of setting fast_executemany=True resulted in a 2x speedup over the baseline when loading 1000 rows, so fast_executemany=True reduced the average time in seconds to load 1000 rows from 1.0 to 0.5, a 2x speedup.

Windows 11 (local db)

Summary results

1000 10000 10000 1000000 3000000
df.to_sql() 0.055 0.495 4.601 46.648 198.57
arrowsqlbcpy 0.106 (-1.9x) 0.101 (4.9x) 0.933 (4.9x) 8.864 (5.3x) 26.048 (7.6x)
bcpandas 0.156 (-3.0x) 0.336 (1.5x) 2.567 (1.8x) 24.627 (1.9x) 72.353 (2.7x)
fast_executemany=True 0.035 (2.4x) 0.235 (2.3x) 2.246 (2.3x) 22.044 (2.1x) 65.344 (3.0x)

Detailed richbench results

Benchmark Min Max Mean Min (+) Max (+) Mean (+)
1 000 - arrowsqlbcp 0.053 0.056 0.055 0.015 (3.6x) 0.198 (-3.5x) 0.106 (-1.9x)
10 000 rows - arrowsqlbcp 0.489 0.502 0.495 0.099 (4.9x) 0.103 (4.9x) 0.101 (4.9x)
100 000 rows - arrowsqlbcp 4.587 4.616 4.601 0.922 (5.0x) 0.944 (4.9x) 0.933 (4.9x)
1 000 000 rows - arrowsqlbcp 46.558 46.738 46.648 8.842 (5.3x) 8.886 (5.3x) 8.864 (5.3x)
3 000 000 rows - arrowsqlbcp 198.464 198.676 198.570 26.016 (7.6x) 26.079 (7.6x) 26.048 (7.6x)
1 000 - bcpandas 0.051 0.052 0.052 0.121 (-2.4x) 0.190 (-3.6x) 0.156 (-3.0x)
10 000 rows - bcpandas 0.499 0.500 0.500 0.333 (1.5x) 0.339 (1.5x) 0.336 (1.5x)
100 000 rows - bcpandas 4.543 4.547 4.545 2.565 (1.8x) 2.570 (1.8x) 2.567 (1.8x)
1 000 000 rows - bcpandas 45.298 46.443 45.871 24.581 (1.8x) 24.674 (1.9x) 24.627 (1.9x)
3 000 000 rows - bcpandas 197.292 197.699 197.496 72.301 (2.7x) 72.405 (2.7x) 72.353 (2.7x)
1 000 - fast_executemany=True 0.052 0.116 0.084 0.030 (1.7x) 0.041 (2.9x) 0.035 (2.4x)
10 000 rows - fast_executemany=True 0.513 0.550 0.531 0.233 (2.2x) 0.237 (2.3x) 0.235 (2.3x)
100 000 rows - fast_executemany=True 5.018 5.374 5.196 2.239 (2.2x) 2.253 (2.4x) 2.246 (2.3x)
1 000 000 rows - fast_executemany=True 45.470 45.582 45.526 22.036 (2.1x) 22.051 (2.1x) 22.044 (2.1x)
3 000 000 rows - fast_executemany=True 194.152 194.523 194.337 65.153 (3.0x) 65.534 (3.0x) 65.344 (3.0x)

Ubuntu (WSL2) (local db in docker container)

Summary results

1000 10000 10000 1000000 3000000
df.to_sql() 0.070 0.506 5.074 50.089 208.811
arrowsqlbcpy 0.154 (-2.2x) 0.120 (4.2x) 1.070 (4.7x) 10.572 (4.7x) 30.673 (6.8x)
bcpandas 0.158 (-2.4x) 0.438 (1.2x) 3.383 (1.5x) 32.774 (1.5x) 95.200 (2.2x)
fast_executemany=True 0.059 (1.6x) 0.323 (1.7x) 3.039 (1.6x) 29.810 (1.7x) 87.419 (2.4x)

Detailed richbench results

Benchmark Min Max Mean Min (+) Max (+) Mean (+)
1 000 - arrowsqlbcp 0.069 0.071 0.070 0.028 (2.4x) 0.280 (-3.9x) 0.154 (-2.2x)
10 000 rows - arrowsqlbcp 0.503 0.510 0.506 0.115 (4.4x) 0.126 (4.0x) 0.120 (4.2x)
100 000 rows - arrowsqlbcp 5.062 5.085 5.074 1.064 (4.8x) 1.076 (4.7x) 1.070 (4.7x)
1 000 000 rows - arrowsqlbcp 49.746 50.433 50.089 10.566 (4.7x) 10.578 (4.8x) 10.572 (4.7x)
3 000 000 rows - arrowsqlbcp 208.669 208.953 208.811 30.364 (6.9x) 30.982 (6.7x) 30.673 (6.8x)
1 000 - bcpandas 0.066 0.068 0.067 0.149 (-2.2x) 0.167 (-2.5x) 0.158 (-2.4x)
10 000 rows - bcpandas 0.500 0.508 0.504 0.431 (1.2x) 0.444 (1.1x) 0.438 (1.2x)
100 000 rows - bcpandas 5.016 5.028 5.022 3.369 (1.5x) 3.397 (1.5x) 3.383 (1.5x)
1 000 000 rows - bcpandas 49.771 50.535 50.153 32.603 (1.5x) 32.945 (1.5x) 32.774 (1.5x)
3 000 000 rows - bcpandas 208.104 208.350 208.227 95.057 (2.2x) 95.343 (2.2x) 95.200 (2.2x)
1 000 - fast_executemany=True 0.068 0.116 0.092 0.049 (1.4x) 0.069 (1.7x) 0.059 (1.6x)
10 000 rows - fast_executemany=True 0.514 0.557 0.535 0.322 (1.6x) 0.324 (1.7x) 0.323 (1.7x)
100 000 rows - fast_executemany=True 4.934 4.961 4.948 3.023 (1.6x) 3.056 (1.6x) 3.039 (1.6x)
1 000 000 rows - fast_executemany=True 49.298 50.658 49.978 29.783 (1.7x) 29.836 (1.7x) 29.810 (1.7x)
3 000 000 rows - fast_executemany=True 207.245 213.096 210.171 87.219 (2.4x) 87.620 (2.4x) 87.419 (2.4x)

Benchmarks for the typical case of a remote DB still need to be added.

Limitations

bulkcopy_from_pandas() will establish its own database connection to load the data, reusing existing connections and transactions are not supported.

Only basic MacOS testing has been done.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

arrowsqlbcpy-0.2.1-cp313-cp313-win_amd64.whl (13.0 MB view details)

Uploaded CPython 3.13Windows x86-64

arrowsqlbcpy-0.2.1-cp313-cp313-manylinux_2_17_x86_64.whl (13.7 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

arrowsqlbcpy-0.2.1-cp313-cp313-macosx_14_0_arm64.whl (21.1 MB view details)

Uploaded CPython 3.13macOS 14.0+ ARM64

arrowsqlbcpy-0.2.1-cp312-cp312-win_amd64.whl (13.0 MB view details)

Uploaded CPython 3.12Windows x86-64

arrowsqlbcpy-0.2.1-cp312-cp312-manylinux_2_17_x86_64.whl (13.7 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

arrowsqlbcpy-0.2.1-cp312-cp312-macosx_14_0_arm64.whl (21.1 MB view details)

Uploaded CPython 3.12macOS 14.0+ ARM64

arrowsqlbcpy-0.2.1-cp311-cp311-win_amd64.whl (13.0 MB view details)

Uploaded CPython 3.11Windows x86-64

arrowsqlbcpy-0.2.1-cp311-cp311-manylinux_2_17_x86_64.whl (13.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

arrowsqlbcpy-0.2.1-cp311-cp311-macosx_11_0_universal2.whl (21.1 MB view details)

Uploaded CPython 3.11macOS 11.0+ universal2 (ARM64, x86-64)

arrowsqlbcpy-0.2.1-cp310-cp310-win_amd64.whl (13.0 MB view details)

Uploaded CPython 3.10Windows x86-64

arrowsqlbcpy-0.2.1-cp310-cp310-manylinux_2_17_x86_64.whl (13.7 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

arrowsqlbcpy-0.2.1-cp39-cp39-win_amd64.whl (13.0 MB view details)

Uploaded CPython 3.9Windows x86-64

arrowsqlbcpy-0.2.1-cp39-cp39-manylinux_2_17_x86_64.whl (13.7 MB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

File details

Details for the file arrowsqlbcpy-0.2.1-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 d6c729417ff42eb7e579d196771faf0278ed4542577c3fb588bfc510b492130c
MD5 77dac3a7d9f038defa2790aeb6a51009
BLAKE2b-256 cb5104422a0657eee0eaf2be1fa01065c74a860928f5ef71f3dc0963b7e6fe55

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp313-cp313-manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp313-cp313-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 39db17c5f29193ace0b48d831221514a65c5242239ead58f11e953095754b9bc
MD5 f89e65f781be129ebe79c5c3f71c4430
BLAKE2b-256 caab36116ab8a0c3ac2da44cfb97fc3c725164986ee7fc72088269a0ef1aaca1

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp313-cp313-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp313-cp313-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 1e6cbab8f34bd30c52ef95c07a84b60356ddd298c852283de83fddaab7dd485d
MD5 fd61ac7819854b42db55956f005bd7c6
BLAKE2b-256 5f8bdc389adeb20cec0bd2b6f246f0ab8520b17f96a1dcbc0e5c959d88a9afa9

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 63d4eda58bb76d716fb04daf5ba0e5bd5924118a6a29c94e20376f9f825693e4
MD5 f37d2f67c8e52491cae2b44787a895b7
BLAKE2b-256 23d49d278506ab498949821ad4afbcb39ba1b6a4067669b32638f9e9b7454edb

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp312-cp312-manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp312-cp312-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 c3f826b61bd21dce18cdefe80f5b4079cc9e125b750c242fb919d771a504e6c1
MD5 300d0546fb7a6f8a233caa85e9f1810c
BLAKE2b-256 690d3c19d5f6781a4f37ffb5d8ebb26a85876747ea081b13d332bb711582bab6

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp312-cp312-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 d59e71ec40456237fbf5ef5c1ad958adf023556873e6fcdd633423658f67e8ab
MD5 682ba30a0fd371fc59de728865e1e50d
BLAKE2b-256 d63216420acf5410bb293ad9854522e9fe5ce759e03900cd0b6d59fc4665a135

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a7537ead33a314e2275a2f36610679ea26ac32deba15370f52883af0ebef5fd9
MD5 35c6c4bf03e069b55adf1856727847f8
BLAKE2b-256 8782c41e3d781cd61aac73feb9321f851f687f611763c29f62462727356d2ae1

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp311-cp311-manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp311-cp311-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 2460190ee640243e76ddd81ac9f6e6a343013d45690b1f6ba4be84cdd70bb99c
MD5 2c45a6399a67e7c20a39f907c113d267
BLAKE2b-256 8dea77819ac8e0b138210655107257aa891ac3aed9316b14f6d56ac8e0902fab

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp311-cp311-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp311-cp311-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 895fad6cbc5e4743c48a0f89e726d8300fcf00d4ce09d1f0bfe8691419f25b68
MD5 587ca536ce02923851dae01b33c2c746
BLAKE2b-256 225f8032eed28c5db281f62fe8b5c5e2b6bf049d38d47af21255f47f299d9f0f

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 61fc20e0b206804e5d97243a7871d3c6d5bb3946a7b439f7dd4e12666c806ac4
MD5 742600cad20f09a210999feadabb1ce5
BLAKE2b-256 47a2e898d861d7f1cdb7bcc5166738acbd439e91dba2c5a2a257bbda84708e72

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp310-cp310-manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp310-cp310-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 e90867784861c0f06d8e1bfbc0f0b4309651989f45aa8ed89cd23377667b4fa5
MD5 b6201b91dff4c36e940e0c081857c66f
BLAKE2b-256 fdb3d5ca6f22a31f03580fc13910125bccac11dd973c9ba7ed36cb4fd60566ab

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 0b5f5277219834bcc46bd6f1d0e0cca21f88d4d6bdba32b38fa7de7c581de82c
MD5 80d1b747bbed911751febc2535752fbe
BLAKE2b-256 da9d56e80f2ef0d69a63db86ae226d347795a90ec80b2323d9dcc10afc50c61a

See more details on using hashes here.

File details

Details for the file arrowsqlbcpy-0.2.1-cp39-cp39-manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for arrowsqlbcpy-0.2.1-cp39-cp39-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 d2492bc51f801af6d52a8d78d3c2eb0a30dfd0644114bea5b009294b1de88c11
MD5 a75b5e151da1a610ffc0b1067ae2b777
BLAKE2b-256 cc3d19693f4bd3d517e66953637a748b113ed3a3edf2d8a3d50a133e20be9dc5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page