bcpandas
High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No knowledge of BCP required!!
:warning: :construction: This library is still under active development and should be considered "alpha". Use at your own risk. :construction: :warning:
(That said, the source code is very small and easy to understand, so you should feel comfortable pretty quickly.)
Quickstart
In [1]: import pandas as pd
   ...: import numpy as np
   ...:
   ...: from bcpandas import SqlCreds, to_sql, read_sql

In [2]: creds = SqlCreds(
   ...:     'my_server',
   ...:     'my_db',
   ...:     'my_username',
   ...:     'my_password'
   ...: )

In [3]: df = pd.DataFrame(
   ...:     data=np.ndarray(shape=(10, 6), dtype=int),
   ...:     columns=[f"col_{x}" for x in range(6)]
   ...: )

In [4]: df
Out[4]:
     col_0    col_1    col_2    col_3    col_4    col_5
0  4128860  6029375  3801155  5570652  6619251  7536754
1  4849756  7536751  4456552  7143529  7471201  7012467
2  6029433  6881357  6881390  7274595  6553710  3342433
3  6619228  7733358  6029427  6488162  6357104  6553710
4  7536737  7077980  6422633  7536732  7602281  2949221
5  6357104  7012451  6750305  7536741  7340124  7274610
6  7340141  6226036  7274612  7077999  6881387  6029428
7  6619243  6226041  6881378  6553710  7209065  6029415
8  6881378  6553710  7209065  7536743  7274588  6619248
9  6226030  7209065  6619231  6881380  7274612  3014770

In [5]: to_sql(df, 'my_test_table', creds, index=False, if_exists='replace')

In [6]: df2 = read_sql('my_test_table', creds)

In [7]: df2
Out[7]:
     col_0    col_1    col_2    col_3    col_4    col_5
0  4128860  6029375  3801155  5570652  6619251  7536754
1  4849756  7536751  4456552  7143529  7471201  7012467
2  6029433  6881357  6881390  7274595  6553710  3342433
3  6619228  7733358  6029427  6488162  6357104  6553710
4  7536737  7077980  6422633  7536732  7602281  2949221
5  6357104  7012451  6750305  7536741  7340124  7274610
6  7340141  6226036  7274612  7077999  6881387  6029428
7  6619243  6226041  6881378  6553710  7209065  6029415
8  6881378  6553710  7209065  7536743  7274588  6619248
9  6226030  7209065  6619231  6881380  7274612  3014770
Requirements
Motivations and Design
Overview
Reading and writing data from pandas DataFrames to/from a SQL database is very slow using the built-in read_sql and to_sql methods, even with the newly introduced execute_many option. For Microsoft SQL Server, a far faster method is to use the BCP utility provided by Microsoft. This utility is a command line tool that transfers data between the database and flat text files.
This package is a wrapper for seamlessly using the bcp utility from Python using a pandas DataFrame. Despite the IO hits, the fastest option by far is saving the data to a CSV file in the file system and using the bcp utility to transfer the CSV file to SQL Server. Best of all, you don't need to know anything about using BCP at all!
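The CSV-then-bcp approach described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual internals of bcpandas: the helper names write_rows_to_csv and build_bcp_in_command are hypothetical, and the sketch assumes no field contains a comma, since bcp's character mode does not interpret CSV quoting. With a DataFrame, the file would normally come from df.to_csv instead.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_to_csv(header, rows, directory):
    """Write a header plus data rows to a comma-delimited file; return its path."""
    # Hypothetical helper; a DataFrame would typically use df.to_csv here.
    path = Path(directory) / "bcp_upload.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def build_bcp_in_command(table, csv_path, server, database, username, password):
    """Build the argument list for a bcp bulk import of a comma-delimited file."""
    # Hypothetical helper; it only assembles the command and does not run it.
    return [
        "bcp", table, "in", str(csv_path),
        "-S", server,
        "-d", database,
        "-U", username,
        "-P", password,
        "-c",        # character (text) mode
        "-t,",       # field terminator: comma
        "-F", "2",   # first row to load is row 2, skipping the header line
    ]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_rows_to_csv(["col_0", "col_1"], [[1, 2], [3, 4]], tmp)
    csv_text = csv_path.read_text(encoding="utf-8")
    cmd = build_bcp_in_command(
        "my_test_table", csv_path, "my_server", "my_db", "my_username", "my_password"
    )
    # With bcp installed and a reachable server, the transfer itself would be:
    # subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with passwords and file paths.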
Existing Solutions
Name | GitHub | PyPI |
---|---|---|
bcpy | https://github.com/titan550/bcpy | https://pypi.org/project/bcpy |
magical-sqlserver | https://github.com/brennoflavio/magical-sqlserver | https://pypi.org/project/magical-sqlserver/ |
bcpy
bcpy has several flaws:
- No support for reading from SQL, only writing to SQL
- A convoluted, overly class-based internal design
- Scope a bit too broad - deals with pandas as well as flat files
This repository aims to fix and improve on bcpy and the above issues by making the design choices described below.
Note, much credit is due to bcpy for the original idea and for some of the code that was adopted and changed.
magical-sqlserver
magical-sqlserver is a library to make working with Python and SQL Server very easy. But it doesn't fit what I'm trying to do:
- No built in support for pandas DataFrame
- Larger codebase, I'm not fully comfortable with the dependency on the very heavy pymssql
Design and Scope
The only scope of bcpandas is to read and write between a pandas DataFrame and a Microsoft SQL Server database. That's it. We do not concern ourselves with transferring existing flat files to/from SQL - that introduces way too much complexity in trying to parse and decode the various parts of the file, like delimiters, quote characters, and line endings. Instead, to read/write an existing flat file, just import it via pandas into a DataFrame, and then use bcpandas.
The big benefit of this is that we get to precisely control all the finicky parts of the text file when we write/read it to a local file and then in the BCP utility. This lets us set library-wide defaults (maybe configurable in the future) and work with those.
For now, we are using the non-XML BCP format file type. In the future, XML format files may be added.
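To make the non-XML format file concrete, here is a rough sketch of generating one for a delimited text file. It follows the documented layout of such files (a version line, a column count, then one row per field), but the function name build_format_file, the version string, the zero data lengths, and the collation are all assumptions chosen for illustration, not values taken from bcpandas itself.

```python
def build_format_file(columns, delimiter=",", row_terminator="\\n",
                      version="9.0", collation="SQL_Latin1_General_CP1_CI_AS"):
    """Return the text of a non-XML bcp format file for character data."""
    # Hypothetical helper; version, lengths, and collation are assumed defaults.
    lines = [version, str(len(columns))]
    for position, name in enumerate(columns, start=1):
        # The last field ends with the row terminator instead of the delimiter.
        terminator = row_terminator if position == len(columns) else delimiter
        fields = [
            str(position),        # host file field order
            "SQLCHAR",            # host file data type (character data)
            "0",                  # prefix length
            "0",                  # host file data length (assumed placeholder)
            f'"{terminator}"',    # field terminator, quoted
            str(position),        # server column order
            name,                 # server column name
            collation,            # column collation
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


format_text = build_format_file(["col_0", "col_1", "col_2"])
print(format_text)
```

Column names containing spaces or special characters would need extra handling that this sketch leaves out.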
Currently, this is being built with only Windows in mind. Linux support should be straightforward to add; it's just not in the immediate scope of the project yet. PRs are welcome.
Finally, the SQL Server databases supported are both the on-prem and Azure versions.
Benchmarks
# TODO
Installation
You can download and install this package from PyPI
pip install bcpandas
or from conda (coming soon)
conda install -c conda-forge bcpandas
Contributing
All contributions are very welcome!
I will attempt to follow the pandas code style.