dynamodb-dev-importer

Easily load data from CSV to test out your DynamoDB table design.

When working with DynamoDB, it is common practice to minimise the number of tables used, ideally down to just one.

Techniques such as sparse indexes and GSI overloading allow a lot of flexibility and efficiency.

Designing a good schema that supports your query patterns can be challenging. Often it is nice to try things out with a small amount of data. I personally find it convenient to enter data into a spreadsheet and play around with it there.

When ready to try out an approach with DynamoDB, it's a hassle to then create items in a table through the AWS Console or CLI, so this script:

  • reads a CSV file (exported from your spreadsheet) and imports it into a DynamoDB table
  • columns 0 and 1 are used for the key: partition key pk: S and sort key sk: S - your target table needs these keys defined
  • column 2, if not an empty string, is set to data: S
  • all other columns are added as non-key attributes

Your CSV should contain columns for:

  • pk
  • sk
  • data (optional)
  • anything after those three can contain arbitrary attributes of the form attribute_name: value, e.g. city: Edinburgh

Example row:

PERSON-1,sales-Q1-2019,Alex,jan: 12012,feb: 1927

Will yield an item like this:

{
    pk: 'PERSON-1',
    sk: 'sales-Q1-2019',
    data: 'Alex',
    jan: 12012,
    feb: 1927
}
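The column rules above can be sketched in a few lines of Python. This is an illustration only, not the actual ddbimp source: the helper names row_to_item and parse_value are hypothetical, and converting numeric-looking values to int is an assumption based on the example item above.

```python
import csv
import io


def parse_value(text):
    """Return an int when the text looks like one, otherwise the stripped string."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return text


def row_to_item(row):
    """Map one CSV row onto an item dict using the column rules listed above."""
    # Columns 0 and 1 are the partition key and sort key.
    item = {"pk": row[0], "sk": row[1]}
    # Column 2 becomes `data` only when it is not an empty string.
    if len(row) > 2 and row[2] != "":
        item["data"] = row[2]
    # Remaining columns carry "attribute_name: value" pairs.
    for cell in row[3:]:
        if not cell.strip():
            continue  # assumption: blank cells are ignored
        name, _, value = cell.partition(":")
        item[name.strip()] = parse_value(value)
    return item


rows = csv.reader(io.StringIO("PERSON-1,sales-Q1-2019,Alex,jan: 12012,feb: 1927\n"))
item = row_to_item(next(rows))
print(item)
```

Leaving column 2 empty means the item gets no data attribute at all, which matters for any index keyed on data.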

Usage

This assumes a DynamoDB table example(pk, sk) is set up and you're in a virtual environment. If you already have boto3 installed, you don't need to install any further packages.

$ pip install ddbimp
$ ddbimp --table example --skip 1 example.csv
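If the example table does not exist yet, something like the following boto3 sketch could create it. The function names table_definition and create_table are hypothetical, and on-demand billing is an assumption made to keep the definition short; the script itself only needs the pk and sk string keys.

```python
def table_definition(name):
    """Build create_table arguments for a table keyed on pk (partition) and sk (sort)."""
    return {
        "TableName": name,
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},   # partition key
            {"AttributeName": "sk", "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: on-demand billing, so no capacity figures are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table(name):
    """Create the table with boto3, using the default profile and region."""
    import boto3  # imported lazily so the definition can be inspected without AWS access

    client = boto3.client("dynamodb")
    client.create_table(**table_definition(name))
    # Block until the table is ready to accept writes.
    client.get_waiter("table_exists").wait(TableName=name)
```

Calling create_table("example") once, with credentials and a region configured, should leave a table that the ddbimp command above can write to.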

Key ideas

  • Table consists of partition key pk: S and sort key sk: S - their meaning varies depending on the item
  • A secondary index swaps the sort and partition keys, so the partition key is sk: S and sort key pk: S
  • A final secondary index uses sk: S and data: S, where data is an arbitrary value you might want to search for. The meaning of data depends on the item it is part of
  • Group items through a shared partition key and store sub-items with a sort key, e.g.
    • pk:PERSON-1, sk:sales-Q1-2019, jan:12012, feb:1927
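To see how the three key layouts group the same items, here is a small in-memory illustration. It makes no DynamoDB calls; build_index is a hypothetical helper, and every item other than the PERSON-1 Q1 row is made-up sample data. It relies on the fact that DynamoDB only places an item in a secondary index when the item has that index's key attributes, so items without data are absent from the sk/data index.

```python
from collections import defaultdict

# Sample items; only the first one comes from the example above.
items = [
    {"pk": "PERSON-1", "sk": "sales-Q1-2019", "data": "Alex", "jan": 12012, "feb": 1927},
    {"pk": "PERSON-1", "sk": "sales-Q2-2019", "apr": 8410},
    {"pk": "PERSON-2", "sk": "sales-Q1-2019", "data": "Sam", "jan": 9050},
]


def build_index(items, partition, sort):
    """Group items by `partition`, ordered by `sort`, skipping items that lack either key."""
    index = defaultdict(list)
    for item in items:
        if partition in item and sort in item:
            index[item[partition]].append(item)
    return {key: sorted(group, key=lambda i: i[sort]) for key, group in index.items()}


table = build_index(items, "pk", "sk")      # base table: pk then sk
inverted = build_index(items, "sk", "pk")   # first secondary index: sk then pk
by_data = build_index(items, "sk", "data")  # second secondary index: sk then data

print([i["sk"] for i in table["PERSON-1"]])           # everything stored for one person
print([i["pk"] for i in inverted["sales-Q1-2019"]])   # everyone with a Q1 sales item
print([i["data"] for i in by_data["sales-Q1-2019"]])  # Q1 items ordered by data
print("sales-Q2-2019" in by_data)                     # the Q2 item has no data attribute
```

The last line prints False: the Q2 item never enters the sk/data index, which is the sparse index behaviour mentioned earlier.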

See example.csv for an example input file.

AWS recently released a preview build of a tool called NoSQL Workbench. It builds on the above ideas. I've tried it out and it seems pretty good, but I am a luddite and am faster working in a spreadsheet right now. I'd certainly recommend giving it a try.

Useful resources

Caveats, TODO

  • Uses your default AWS profile
  • Region needs to be set
  • Make it work directly with Google Sheets via the Sheets API
