Fast db insert with postgresql binary copy
Project description
pgcopy is a small system for very fast bulk insertion of data into a PostgreSQL database table using binary copy.
Installation
To install:
$ pip install pgcopy
pgcopy requires pytz and the psycopg2 db adapter. nose is required to run the tests.
Use
pgcopy provides facility for copying data from an iterable of tuple-like objects using a CopyManager, which must be instantiated with a psycopg2 db connection, the table name, and an iterable containing the names of the columns to be inserted in the order in which they will be provided. pgcopy inspects the database to determine the datatypes of the columns.
For example:
from datetime import datetime from pgcopy import CopyManager import psycopg2 cols = ('id', 'timestamp', 'location', 'temperature') now = datetime.now() records = [ (0, now, 'Jerusalem', 72.2), (1, now, 'New York', 75.6), (2, now, 'Moscow', 54.3), ] conn = psycopg2.connect(database='weather_db') mgr = CopyManager(conn, 'measurements_table', cols) mgr.copy(records)
By default, a temporary file on disk is used. If there’s enough memory, you can get a slight performance benefit with in-memory storage:
from cStringIO import StringIO mgr.copy(records, StringIO)
A db schema can be specified in the table name using dot notation:
mgr = CopyManager(conn, 'myschema.measurements', cols)
Supported datatypes
Currently the following PostgreSQL datatypes are supported:
bool
smallint
integer
bigint
real
double precision
char
varchar
text
bytea
date
timestamp
timestamp with time zone
numeric (data must be decimal.Decimal)
json
jsonb
uuid
Testing
For a fast test run using current environment, use nose:
$ nosetests
For more thorough testing, Tox configuration will run tests on python versions 2.7 and 3.3 - 3.6:
$ tox
Additionally, test can be run with no local requirements other than the ubiquitous docker:
$ docker-compose up pgcopy
Benchmarks
Below are simple benchmarks for 100000 records. This gives a general idea of the kind of speedup available with pgcopy:
$ nosetests -c tests/benchmark.cfg ExecuteManyBenchmark: 7.75s PGCopyBenchmark: 0.54s ---------------------------------------------------------------------- Ran 2 tests in 9.101s
Replacing a Table
When possible, faster insertion may be realized by inserting into an empty table with no indices or constraints. In a case where the entire contents of the table can be reinserted, the Replace context manager automates the process. On entry, it creates a new table like the original, with a temporary name. Default column values are included. It provides the temporary name for populating the table within the context. On exit, it recreates the constraints, indices, triggers, and views on the new table, then replaces the old table with the new. It can be used so:
from pgcopy import CopyManager, Replace with Replace(conn, 'mytable') as temp_name: mgr = CopyManager(conn, temp_name, cols) mgr.copy(records)
Replace renames new db objects like the old, where possible. Names of foreign key and check constraints will be mangled. As of v0.6 there is also pgcopy.util.RenameReplace, which instead of dropping the original objects renames them using a transformation function.
Note that on PostgreSQL 9.1 and earlier, concurrent queries on the table will fail once the table is dropped.
See Also
cpgcopy, a Cython implementation, about twice as fast.
Changelog
1.2.0
- date:
25 Mar, 2017
Support db schema (Thanks Marcin Gozdalik)
1.1.0
- date:
26 Jan, 2017
Support uuid, json, and jsonb types (Thanks Igor Mastak)
Integrate Travis CI
Add docker test strategy
1.0.0
- date:
19 Jan, 2017
Run tests with tox
Support Python 3
Initial release on PyPi
0.7
- date:
19 Jan, 2017
Add support for serializing Python decimal.Decimal to PostgreSQL numeric.
0.6
- date:
21 Oct, 2014
RenameReplace variant
0.5
- date:
14 Jul, 2014
Support default values and sequences
0.4
- date:
14 Jul, 2014
Fix Replace utility class bugs
Add view support to Replace
0.3
- date:
8 Jul, 2014
Move Cython optimization to separate project
Add Replace utility class
0.2
- date:
7 Jul, 2014
Cython optimization
0.1
- date:
29 Jun, 2014
Initial version
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for pgcopy-1.2.0-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1b7e8b780a95b9524a4cd2836be02080e340c7380d2347ffbcb31d2db97f1b9b |
|
MD5 | 6477f2a766ca48cb591d108f29acd200 |
|
BLAKE2b-256 | ac00fb00549a610f0b31fee04a3d9bbc76fc921f31c9957e638bde2501c1754c |