Easily load data from CSV to test out your DynamoDB table design
dynamodb-dev-importer
When working with DynamoDB, it is common practice to minimise the number of tables used, ideally down to just one.
Techniques such as sparse indexes and GSI overloading allow a lot of flexibility and efficiency.
Designing a good schema that supports your query patterns can be challenging. Often it is nice to try things out with a small amount of data. I personally find it convenient to enter data into a spreadsheet and play around with it there.
When you're ready to try out an approach with DynamoDB, it's a hassle to create items in a table through the AWS Console or CLI, so this script:
- reads a CSV file (exported from your spreadsheet) and imports it into a DynamoDB table
- columns 0 and 1 are used for the key: partition key pk: S and sort key sk: S
  - your target table needs these keys defined
- column 2, if not an empty string, is set to data: S
- all other columns are added as non-key attributes
Your CSV should contain columns for:
- pk
- sk
- data (optional)
- anything after those three can contain arbitrary attributes of the form attribute_name: value, e.g. city: Edinburgh
Example row:
PERSON-1,sales-Q1-2019,Alex,jan: 12012,feb: 1927
Will yield an item like this:
{
pk: 'PERSON-1',
sk: 'sales-Q1-2019',
data: 'Alex',
jan: 12012,
feb: 1927
}
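The column mapping described above can be sketched roughly as follows. This is a minimal illustration and not the package's actual code: the names row_to_item and parse_value are hypothetical, and turning whole numbers into int (as the unquoted jan and feb values in the example suggest) is an assumption about how values are typed.

```python
import csv


def parse_value(text):
    """Return an int when the text is a whole number, else the stripped string.

    Assumption: the real tool may infer types differently.
    """
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return text


def row_to_item(row):
    """Map one CSV row onto an item dict using the column rules above."""
    pk, sk, *rest = row
    item = {"pk": pk, "sk": sk}          # columns 0 and 1 form the key
    if rest and rest[0] != "":
        item["data"] = rest[0]           # column 2 is optional
    for cell in rest[1:]:
        name, sep, value = cell.partition(":")
        if not sep:
            continue                     # skip cells without "name: value"
        item[name.strip()] = parse_value(value)
    return item


# Parse the example row from above.
row = next(csv.reader(["PERSON-1,sales-Q1-2019,Alex,jan: 12012,feb: 1927"]))
print(row_to_item(row))
```

Splitting on the first colon only means values that themselves contain a colon (a URL or a timestamp, say) stay intact.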
Usage
Assuming a DynamoDB table named example with keys (pk, sk) is set up and you're in a virtual environment. If you already have boto3 installed, you don't need to install any packages.
$ pip install ddbimp
$ ddbimp --table example --skip 1 example.csv
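For a feel of what a command like this might do under the hood, here is a rough sketch. It assumes that --skip N means "ignore the first N rows" (for instance a header row), which the description above doesn't spell out, and the helper names read_rows and write_items are made up for illustration. The only boto3 calls relied on are Table.batch_writer() and put_item(Item=...) on the batch object.

```python
import csv
from itertools import islice


def read_rows(lines, skip=0):
    """Yield CSV rows, ignoring the first `skip` rows and any blank rows."""
    reader = csv.reader(lines)
    for row in islice(reader, skip, None):
        if row:
            yield row


def write_items(table, items):
    """Write items through a boto3 Table resource and return how many were sent.

    The batch writer buffers the puts and sends them in batches.
    """
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count


# Hypothetical usage, needs boto3 plus AWS credentials and a region:
#   import boto3
#   table = boto3.resource("dynamodb").Table("example")
#   with open("example.csv", newline="") as f:
#       items = ({"pk": r[0], "sk": r[1]} for r in read_rows(f, skip=1))
#       write_items(table, items)
```

Passing the table in as an argument keeps the reading and writing steps easy to try out without touching AWS.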
Key ideas
- Table consists of partition key pk: S and sort key sk: S
  - their meaning varies depending on the item
- A secondary index swaps the sort and partition keys, so the partition key is sk: S and sort key pk: S
- A final secondary index uses sk: S and data: S, where data is an arbitrary value you might want to search for
  - the meaning of data depends on the item it is part of
- Group items through a shared partition key, store sub items with a sort key
  - e.g. pk:PERSON-1, sk:sales-Q1-2019, jan:12012, feb:1927
See example.csv for an example input file.
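If you want to create a table along these lines yourself, the parameters could look something like the sketch below. The function name table_definition, the index names gsi_1 and gsi_2, the on-demand billing mode and the ALL projection are my own choices for illustration, not something this tool requires. The resulting dict is shaped for the DynamoDB client's create_table call.

```python
def key_schema(partition_key, sort_key):
    """Build a KeySchema list from a partition (HASH) and sort (RANGE) key."""
    return [
        {"AttributeName": partition_key, "KeyType": "HASH"},
        {"AttributeName": sort_key, "KeyType": "RANGE"},
    ]


def table_definition(table_name):
    """Return create_table parameters for the pk/sk/data layout described above."""
    project_all = {"ProjectionType": "ALL"}
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand is fine for experiments
        # Every attribute used in any key schema must be declared; all are strings (S).
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"}
            for name in ("pk", "sk", "data")
        ],
        "KeySchema": key_schema("pk", "sk"),
        "GlobalSecondaryIndexes": [
            # Inverted index: sk becomes the partition key, pk the sort key.
            {"IndexName": "gsi_1", "KeySchema": key_schema("sk", "pk"), "Projection": project_all},
            # Search by data within an sk.
            {"IndexName": "gsi_2", "KeySchema": key_schema("sk", "data"), "Projection": project_all},
        ],
    }


# Hypothetical usage, needs boto3 plus AWS credentials and a region:
#   import boto3
#   boto3.client("dynamodb").create_table(**table_definition("example"))
```

Items without a data attribute simply don't appear in the second index, which is the sparse index behaviour mentioned earlier.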
AWS recently released a preview build of a tool called NoSQL Workbench. It builds on the above ideas. I've tried it out and it seems pretty good, but I am a luddite and am faster working in a spreadsheet right now. I'd certainly recommend giving it a try.
Useful resources
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes.html
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/workbench.html
- https://www.youtube.com/watch?v=6yqfmXiZTlM
Caveats, TODO
- Uses your default AWS profile
- Region needs to be set
- Make it work directly with Google Sheets via the Sheets API