
Apache Beam Python I/O connector for Amazon DynamoDB


dynamodb_pyio


Amazon DynamoDB is a serverless, NoSQL database service that allows you to develop modern applications at any scale. The Apache Beam Python I/O connector for Amazon DynamoDB (dynamodb_pyio) aims to integrate with the database service by supporting source and sink connectors. Currently, only the sink connector is available.

Installation

The connector can be installed from PyPI.

$ pip install dynamodb_pyio

Usage

Sink Connector

The main composite transform, WriteToDynamoDB, expects each PCollection element to be a list or a tuple. If the element is a tuple, the tuple's first element is taken. If the elements are not of an accepted type (for example, individual records), apply the GroupIntoBatches or BatchElements transform beforehand. The records of each element are then written to a DynamoDB table with the help of the batch_writer of the boto3 package. The batch writer automatically buffers items and sends them in batches, and it also automatically handles any unprocessed items and resends them as needed.
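The element handling described above can be sketched in plain Python. This is a minimal, hypothetical model, not the connector's implementation: extract_records and write_element are illustrative names, and table is assumed to behave like a boto3 DynamoDB Table resource (for example boto3.resource("dynamodb").Table("my-table")), whose batch_writer(overwrite_by_pkeys=...) context manager exposes put_item(Item=...).

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

Record = Dict[str, Any]
Element = Union[List[Record], Tuple[Any, ...]]


def extract_records(element: Element) -> List[Record]:
    """Normalize one PCollection element into a list of records (hypothetical helper)."""
    if isinstance(element, tuple):
        # Per the description above, a tuple contributes its first element.
        element = element[0]
    if not isinstance(element, list):
        raise TypeError(
            f"Expected a list or tuple element, got {type(element).__name__}; "
            "apply BatchElements or GroupIntoBatches first."
        )
    return element


def write_element(
    table: Any, element: Element, dedup_pkeys: Optional[Sequence[str]] = None
) -> int:
    """Write one element's records through the table's batch writer.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    Returns the number of records handed to the writer.
    """
    records = extract_records(element)
    # boto3's batch_writer buffers put requests, sends them in batches and
    # resends unprocessed items; overwrite_by_pkeys de-duplicates the buffer.
    pkeys = list(dedup_pkeys) if dedup_pkeys else None
    with table.batch_writer(overwrite_by_pkeys=pkeys) as writer:
        for record in records:
            writer.put_item(Item=record)
    return len(records)
```

Injecting the table object keeps the sketch testable without AWS: any stand-in with the same batch_writer/put_item shape can replace the boto3 resource.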

The transform also has an option that handles duplicate records.

  • dedup_pkeys - List of keys used to deduplicate items in the buffer.
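Deduplication matters because DynamoDB's BatchWriteItem rejects a request that contains two items with the same primary key. The sketch below is a simplified, stdlib-only model of buffer-level deduplication, assuming boto3's documented overwrite_by_pkeys behaviour (a new item replaces a buffered item with matching key values) and a flush threshold of 25 items, the maximum BatchWriteItem accepts per request. simulate_batches is a hypothetical name; it only returns the batches that would be sent and never contacts DynamoDB.

```python
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]

# Assumed flush threshold: BatchWriteItem accepts at most 25 put requests.
FLUSH_AMOUNT = 25


def simulate_batches(
    records: Sequence[Record],
    dedup_pkeys: Optional[Sequence[str]] = None,
    flush_amount: int = FLUSH_AMOUNT,
) -> List[List[Record]]:
    """Model how a buffering batch writer groups and optionally de-duplicates records."""
    sent: List[List[Record]] = []
    buffer: List[Record] = []
    for record in records:
        if dedup_pkeys:
            key = tuple(record[k] for k in dedup_pkeys)
            # Last write wins: drop a buffered item whose key values match.
            buffer = [b for b in buffer if tuple(b[k] for k in dedup_pkeys) != key]
        buffer.append(record)
        if len(buffer) >= flush_amount:
            sent.append(buffer)  # buffer is full, so "send" it as one request
            buffer = []
    if buffer:
        sent.append(buffer)  # remaining items are flushed at the end
    return sent


duplicates = [{"pk": str(1), "sk": 1} for _ in range(20)]
print(len(simulate_batches(duplicates)))  # 1 (a single batch with 20 identical keys)
print(simulate_batches(duplicates, ["pk", "sk"]))  # [[{'pk': '1', 'sk': 1}]]
```

Because deduplication only looks at the unflushed buffer, two duplicates separated by a flush would end up in different requests rather than being merged.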

Sink Connector Example

The transform can write a large number of records from a single element, thanks to the batch writer.

import apache_beam as beam
from dynamodb_pyio.io import WriteToDynamoDB

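# Hypothetical table name; the table is assumed to use pk (partition key) and sk (sort key)
table_name = "my-table"
records = [{"pk": str(i), "sk": i} for i in range(500)]

with beam.Pipeline() as p:
    (
        p
        # A single element holding the whole list of 500 records
        | beam.Create([records])
        | WriteToDynamoDB(table_name=table_name)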
    )

Duplicate records can be handled using the dedup_pkeys option.

import apache_beam as beam
from dynamodb_pyio.io import WriteToDynamoDB

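table_name = "my-table"  # hypothetical table name; replace with your own
# 20 copies of the same item, i.e. the same primary key values
records = [{"pk": str(1), "sk": 1} for _ in range(20)]

with beam.Pipeline() as p:
    (
        p
        | beam.Create([records])
        # Duplicates sharing pk and sk are collapsed in the writer's buffer
        | WriteToDynamoDB(table_name=table_name, dedup_pkeys=["pk", "sk"])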
    )

The size of each batch can be controlled further with the BatchElements or GroupIntoBatches transform.

import apache_beam as beam
from apache_beam.transforms.util import BatchElements
from dynamodb_pyio.io import WriteToDynamoDB

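table_name = "my-table"  # hypothetical table name; replace with your own
records = [{"pk": str(i), "sk": i} for i in range(100)]

with beam.Pipeline() as p:
    (
        p
        # One element per record, so the records must be batched before the sink
        | beam.Create(records)
        | BatchElements(min_batch_size=50, max_batch_size=50)
        | WriteToDynamoDB(table_name=table_name)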
    )
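With min_batch_size and max_batch_size both set to 50, BatchElements turns the 100 individual records into list elements of 50 records each, which is the element type WriteToDynamoDB accepts. The stdlib sketch below models only that fixed-size case; fixed_size_batches is a hypothetical helper, and the real transform batches within bundles, so it may emit a smaller batch at a bundle boundary depending on the runner.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_size_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, like BatchElements with min == max."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch, as at the end of a bundle


records = [{"pk": str(i), "sk": i} for i in range(100)]
print([len(b) for b in fixed_size_batches(records, 50)])  # [50, 50]
```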

See Introduction to DynamoDB PyIO Sink Connector for more examples.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

dynamodb_pyio was created as part of the Apache Beam Python I/O Connectors project. It is licensed under the terms of the Apache License 2.0.

Credits

dynamodb_pyio was created with cookiecutter and the pyio-cookiecutter template.

Download files

Source Distribution

dynamodb_pyio-0.1.1.tar.gz (5.3 kB)

Built Distribution

dynamodb_pyio-0.1.1-py3-none-any.whl (7.6 kB)

File details

Details for the file dynamodb_pyio-0.1.1.tar.gz.

File metadata

  • Download URL: dynamodb_pyio-0.1.1.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.8.10 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for dynamodb_pyio-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       e509e95711e6eb4f8cd5a04ba633c6da379d38c23246bf36457ec832c5b69385
MD5          1abf858fefbfec39b624e30f27cb3e34
BLAKE2b-256  b8ac9aca4179a357ce4b4d2e42d8e61b47f987445d964d126113d8571e8134c1

File details

Details for the file dynamodb_pyio-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: dynamodb_pyio-0.1.1-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.8.10 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for dynamodb_pyio-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       f61ee0610b2098fac3bb1225005d7cda0ba71b8b8658ea9c75cdc91fa4410e1f
MD5          85bf5381578cb64222b8ddff73add7c0
BLAKE2b-256  27e4e982b206e1646d41faeeb28b26461e07b3885df5bfa4250b99698f381ea1