Skip to main content

Fast db insert with postgresql binary copy

Project description

pgcopy is a small system for very fast bulk insertion of data into a PostgreSQL database table using binary copy.

Installation

To install:

pip install pgcopy

pgcopy requires pytz and the psycopg2 db adapter. nose is required to run the tests.

Use

pgcopy provides facility for copying data from an iterable of tuple-like objects using a CopyManager, which must be instantiated with a psycopg2 db connection, the table name, and an iterable containing the names of the columns to be inserted in the order in which they will be provided. pgcopy inspects the database to determine the datatypes of the columns.

For example:

from datetime import datetime
from pgcopy import CopyManager
import psycopg2
cols = ('id', 'timestamp', 'location', 'temperature')
now = datetime.now()
records = [
        (0, now, 'Jerusalem', 72.2),
        (1, now, 'New York', 75.6),
        (2, now, 'Moscow', 54.3),
    ]
conn = psycopg2.connect(database='weather_db')
mgr = CopyManager(conn, 'measurements_table', cols)
mgr.copy(records)

By default, a temporary file on disk is used. If there’s enough memory, you can get a slight performance benefit with in-memory storage:

from cStringIO import StringIO
mgr.copy(records, StringIO)

Supported datatypes

Currently the following PostgreSQL datatypes are supported:

  • bool

  • smallint

  • integer

  • bigint

  • real

  • double precision

  • char

  • varchar

  • text

  • bytea

  • date

  • timestamp

  • timestamp with time zone

  • numeric (data must be decimal.Decimal)

Benchmarks

Below are simple benchmarks for 100000 records. This gives a general idea of the kind of speedup available with pgcopy:

$ nosetests -c tests/benchmark.cfg
          ExecuteManyBenchmark:   7.75s
               PGCopyBenchmark:   0.54s
----------------------------------------------------------------------
Ran 2 tests in 9.101s

Replacing a Table

When possible, faster insertion may be realized by inserting into an empty table with no indices or constraints. In a case where the entire contents of the table can be reinserted, the Replace context manager automates the process. On entry, it creates a new table like the original, with a temporary name. Default column values are included. It provides the temporary name for populating the table within the context. On exit, it recreates the constraints, indices, triggers, and views on the new table, then replaces the old table with the new. It can be used so:

from pgcopy import CopyManager, Replace
with Replace(conn, 'mytable') as temp_name:
    mgr = CopyManager(conn, temp_name, cols)
    mgr.copy(records)

Replace renames new db objects like the old, where possible. Names of foreign key and check constraints will be mangled. As of v0.6 there is also pgcopy.util.RenameReplace, which instead of dropping the original objects renames them using a transformation function.

Note that on PostgreSQL 9.1 and earlier, concurrent queries on the table will fail once the table is dropped.

See Also

cpgcopy, a Cython implementation, about twice as fast.

Changelog

1.0.0

date:

19 Jan, 2017

  • Run tests with tox

  • Support Python 3

  • Initial release on PyPi

0.7

date:

19 Jan, 2017

  • Add support for serializing Python decimal.Decimal to PostgreSQL numeric.

0.6

date:

21 Oct, 2014

  • RenameReplace variant

0.5

date:

14 Jul, 2014

  • Support default values and sequences

0.4

date:

14 Jul, 2014

  • Fix Replace utility class bugs

  • Add view support to Replace

0.3

date:

8 Jul, 2014

  • Move Cython optimization to separate project

  • Add Replace utility class

0.2

date:

7 Jul, 2014

  • Cython optimization

0.1

date:

29 Jun, 2014

  • Initial version

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pgcopy-1.0.0.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pgcopy-1.0.0-py2.py3-none-any.whl (11.4 kB view details)

Uploaded Python 2Python 3

File details

Details for the file pgcopy-1.0.0.tar.gz.

File metadata

  • Download URL: pgcopy-1.0.0.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pgcopy-1.0.0.tar.gz
Algorithm Hash digest
SHA256 24353ee668b9689c1a12259ea5500478a2ae3fdc4d101f551b598e1462097440
MD5 4b53fcd573c6265df4005cb566d14234
BLAKE2b-256 ffa90fdf80f7961e9fbe2b773ce2a4f3de6fedc84dba682acf3740daa9148c8f

See more details on using hashes here.

File details

Details for the file pgcopy-1.0.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for pgcopy-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 3c4fe40b6b36a0d3e2cc59517787743f8b5602528d10e603d54cf14a8b14f9bd
MD5 99ddf47e19be10e2dcd11c71f2520cac
BLAKE2b-256 63f561d8cf9b6748c9b1b94c34fd9e2bd1928f3bd4de893cead16d96083d19f6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page