
Geohash and DynamoDB Utility Package


GeoDDB - Geohash in DynamoDB

GeoDDB is a simple Python module that helps you store and query your location data in DynamoDB using just the partition key, without requiring any changes to your existing table or indexes.

Getting Started

  • GeoDDB does not require a new or separate table; create one only if you don't already have one
  • GeoDDB does not create or require local secondary indexes (LSIs) or global secondary indexes (GSIs)
    • You can certainly use LSIs and/or GSIs, but this module doesn't require them
  • GeoDDB does not require a sort/range key; just tell it the name of your partition key
    • This avoids interfering with your ability to use composite keys to satisfy other access patterns

Installation

This package comes with its own Geohash implementation, so the only dependency is boto3.

pip install geoddb

Examples

Adding an Item

import boto3
from geoddb import GeoDDB

ddb = boto3.resource('dynamodb')
table = ddb.Table('FooTable')

gddb = GeoDDB(table, pk_name='PK', precision=5)

lat, lon = 33.63195443030888, -117.93583128993387

data = {
    'SK': 'coffee#daydream',  # SK is my sort key; no partition key here, GeoDDB sets it from the geohash
    'Name': 'Daydream',
    'EntityType': 'Coffee/Surf Shop',
    'Address': '1588 Monrovia Ave, Newport Beach, CA 92663'
}

gddb.put_item(lat, lon, data)

Here we add a location with a geohash length (precision) of 5, so each cell is roughly 5km x 5km (about 3mi x 3mi).
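GeoDDB ships its own geohash implementation, which isn't shown here. Assuming it follows the standard geohash algorithm, the sketch below shows how a coordinate becomes the short string used as the partition key: the longitude and latitude ranges are bisected alternately (longitude first), and every 5 bits become one base32 character. `encode` and `BASE32` are illustrative names, not part of GeoDDB's API.

```python
# Minimal sketch of the standard geohash encoding algorithm (not GeoDDB's own code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o


def encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    even = True  # even bit positions bisect longitude, odd ones latitude

    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(encode(42.6, -5.6, 5))  # ezs42
print(encode(33.63195443030888, -117.93583128993387, 5))  # the coffee shop's 5-character cell
```

Because each extra character only refines the earlier bisections, a 6-character geohash always starts with the 5-character geohash of the same point. A different length still produces a different partition key value, which is why stored and queried data must use the same precision.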

Searching Items

import boto3
from boto3.dynamodb.conditions import Key
from geoddb import GeoDDB

ddb = boto3.resource('dynamodb')
table = ddb.Table('FooTable')

# use the same settings here as when you added the location
gddb = GeoDDB(table, pk_name='PK', precision=5)

myLat, myLon = 33.66677439489231, -118.01282517173841

results = gddb.query(myLat, myLon, ddb_kwargs={
    'KeyConditionExpression': Key('SK').begins_with('coffee#'),
})

Here we search for coffee around a point of interest (for example, my current location). The query must use the same settings (partition key name, precision, and prefix) that were used when the data was stored. Settings can differ between collections of data, but they must be consistent for storing and querying within the same collection.

Options

DynamoDB Arguments

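from decimal import Decimal
from boto3.dynamodb.conditions import Attr, Key

gddb.query(myLat, myLon, ddb_kwargs={
    'Limit': 10,
    'KeyConditionExpression': Key('SK').begins_with('coffee#'),
    'FilterExpression': Attr('Rating').gt(Decimal('4.5')),  # boto3 rejects Python floats
})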

GeoDDB's put_item and query accept a ddb_kwargs argument for extra DynamoDB-specific arguments. Don't include a condition on your partition key; GeoDDB handles that for you.

Geohash Prefix

gddb = GeoDDB(table, pk_name='PK', precision=5, prefix='loc#')

GeoDDB uses the geohash of a location as the partition key value for your item. You can prefix this string if needed, for example with loc# or geohash#, which results in the prefix followed by the geohash, e.g. loc#7mup6. This can be useful in single-table design, where key blending is often necessary.
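In a single-table design you may also need to go the other way and recover the bare geohash from a stored key. A minimal sketch, assuming the key is exactly the prefix followed by the geohash as described above; `make_pk` and `split_pk` are hypothetical helpers, not GeoDDB functions.

```python
def make_pk(geohash: str, prefix: str = "") -> str:
    """Blend an optional prefix with a geohash to form the partition key value."""
    return f"{prefix}{geohash}"


def split_pk(pk_value: str, prefix: str = "") -> str:
    """Return the bare geohash from a (possibly prefixed) partition key value."""
    if prefix and not pk_value.startswith(prefix):
        raise ValueError(f"{pk_value!r} does not start with prefix {prefix!r}")
    return pk_value[len(prefix):]


pk = make_pk("7mup6", prefix="loc#")
print(pk)                           # loc#7mup6
print(split_pk(pk, prefix="loc#"))  # 7mup6
```

Raising on a mismatched prefix makes it obvious when an item from a different entity type ends up in a geohash code path.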

Neighboring Cells

By default, all neighbors of your input geohash are queried. This avoids missing nearby results when the query location is near the edge of a cell and those results fall in the adjacent cell. At most 9 cells are ever queried: the cell containing the query point plus its 8 neighbors. Depending on your use case, you can turn this off:

gddb.query(myLat, myLon, include_neighbors=False)
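One common way to find that 3x3 block is to decode a cell's bounding box, step one cell width or height from its center in each direction, and re-encode. The sketch below does this under the assumption of standard geohashes; it is an illustration of the idea, not GeoDDB's implementation, and `bounds` and `cells_to_query` are our own names.

```python
# Sketch: a geohash cell plus its neighbors (not GeoDDB's implementation).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat: float, lon: float, precision: int) -> str:
    """Standard geohash encoding (alternate longitude/latitude bisection)."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, count, even = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if value >= mid else 0
        rng[1 - bit] = mid  # bit 1 keeps the upper half, bit 0 the lower half
        bits, count, even = (bits << 1) | bit, count + 1, not even
        if count == 5:
            chars.append(BASE32[bits])
            bits, count = 0, 0
    return "".join(chars)


def bounds(geohash: str) -> tuple:
    """Return (lat_min, lat_max, lon_min, lon_max) for a geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (idx >> shift) & 1
            rng = lon_rng if even else lat_rng
            rng[1 - bit] = (rng[0] + rng[1]) / 2
            even = not even
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]


def cells_to_query(geohash: str) -> list:
    """The cell itself plus its neighbors: at most 9 distinct cells."""
    lat_min, lat_max, lon_min, lon_max = bounds(geohash)
    dlat, dlon = lat_max - lat_min, lon_max - lon_min
    clat, clon = (lat_min + lat_max) / 2, (lon_min + lon_max) / 2
    cells = []
    for i in (1, 0, -1):          # north row, middle row, south row
        lat = clat + i * dlat
        if not -90.0 < lat < 90.0:
            continue  # there are no cells beyond the poles
        for j in (-1, 0, 1):      # west, center, east
            lon = (clon + j * dlon + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            cell = encode(lat, lon, len(geohash))
            if cell not in cells:
                cells.append(cell)
    return cells


print(cells_to_query("ezs42"))
```

Near a pole the northern or southern row does not exist, so fewer than 9 cells come back; longitudes wrap around the antimeridian instead of being dropped.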

Walk All Pages

By default, GeoDDB walks all pages of results and returns a complete list of items. Depending on your use case and geohash length, this can hold a large number of items in memory. You can turn this off:

gddb.query(myLat, myLon, include_all_pages=False)
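DynamoDB paginates query results: a response includes LastEvaluatedKey when more items remain, and passing it back as ExclusiveStartKey fetches the next page. The sketch below shows that loop in two forms, a generator that yields one page at a time and a helper that collects everything. `iter_pages`, `query_all_pages`, and `fake_query` are illustrative names; this is not GeoDDB's internal code.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(query_fn: Callable[..., Dict[str, Any]], **kwargs: Any) -> Iterator[List[dict]]:
    """Yield one page of items at a time from a DynamoDB-style query call."""
    while True:
        response = query_fn(**kwargs)
        yield response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return  # no LastEvaluatedKey means this was the final page
        kwargs = {**kwargs, "ExclusiveStartKey": last_key}


def query_all_pages(query_fn: Callable[..., Dict[str, Any]], **kwargs: Any) -> List[dict]:
    """Collect every page into one list; simple, but every item is held in memory."""
    items: List[dict] = []
    for page in iter_pages(query_fn, **kwargs):
        items.extend(page)
    return items


# A fake stand-in for boto3's Table.query that returns two pages.
def fake_query(**kwargs: Any) -> Dict[str, Any]:
    if "ExclusiveStartKey" not in kwargs:
        return {"Items": [{"SK": "coffee#a"}], "LastEvaluatedKey": {"SK": "coffee#a"}}
    return {"Items": [{"SK": "coffee#b"}]}


print(query_all_pages(fake_query))  # [{'SK': 'coffee#a'}, {'SK': 'coffee#b'}]
```

With boto3 you would pass `table.query` as `query_fn` along with your `KeyConditionExpression`. Iterating over `iter_pages` lets you process and discard one page at a time instead of materializing the full result list.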

Limitations

Bring Your Own Table

GeoDDB does not require, nor will it create, a separate table or additional indexes for you. This was the biggest motivation for this project. Most of the time, a table already exists with appropriate indexes to satisfy a set of access patterns. This is especially true in a single-table design, where composite keys are usually required and you need the sort key to filter item collections within a partition. I don't want to create a new table with local secondary indexes or use up a precious global secondary index when the whole benefit of geohashing is the ability to do a single lookup! You can certainly add a GSI if your application needs one for another access pattern, but the minimum needed for geohash queries is a partition key.

Radius Filtering

GeoDDB can filter results by distance, using the Haversine formula to compute great-circle distance. Use query_radius to get all items within a given radius (in kilometers), sorted nearest first:

from geoddb import GeoDDB, GeoItem

results = gddb.query_radius(myLat, myLon, radius_km=2.0)

for item in results:
    print(item.data['Name'], item.distance_km, 'km away')

Results are returned as GeoItem dataclass instances sorted by distance, nearest first. Each GeoItem has the following fields:

Field         Type    Description
lat           float   Item latitude
lon           float   Item longitude
distance_km   float   Distance from the query point in km
geohash       str     Geohash of the item
data          dict    The raw DynamoDB item

By default, query_radius expects your stored items to have lat and lon attributes. If your items use different attribute names, specify them:

results = gddb.query_radius(myLat, myLon, radius_km=5.0, lat_attr='latitude', lon_attr='longitude')

query_radius also accepts ddb_kwargs and include_all_pages just like query.
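To make the distance step concrete, here is a minimal sketch of Haversine filtering applied to raw items that a geohash query might return. `haversine_km`, `filter_by_radius`, and this `GeoItem` dataclass are illustrative stand-ins mirroring the fields above, not GeoDDB's code; the 6371 km mean Earth radius and recovering the geohash by stripping a key prefix are assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; GeoDDB's constant may differ)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


@dataclass
class GeoItem:
    lat: float
    lon: float
    distance_km: float
    geohash: str
    data: dict


def filter_by_radius(items, lat, lon, radius_km, lat_attr="lat", lon_attr="lon",
                     pk_name="PK", prefix=""):
    """Keep raw items within radius_km of (lat, lon), sorted nearest first."""
    results = []
    for item in items:
        # boto3's resource API returns numbers as Decimal, so convert to float
        item_lat, item_lon = float(item[lat_attr]), float(item[lon_attr])
        distance = haversine_km(lat, lon, item_lat, item_lon)
        if distance <= radius_km:
            geohash = str(item[pk_name])[len(prefix):]  # strip the optional key prefix
            results.append(GeoItem(item_lat, item_lon, distance, geohash, item))
    results.sort(key=lambda g: g.distance_km)
    return results


items = [
    {"PK": "loc#s0000", "lat": 0.0, "lon": 0.01, "Name": "near"},
    {"PK": "loc#s0000", "lat": 0.0, "lon": 0.005, "Name": "nearest"},
    {"PK": "loc#s0104", "lat": 0.0, "lon": 1.5, "Name": "far"},
]
for g in filter_by_radius(items, 0.0, 0.0, radius_km=2.0, prefix="loc#"):
    print(g.data["Name"], round(g.distance_km, 3), "km away")
```

Filtering happens client side after the cells are fetched, so the geohash query decides which items are candidates and the Haversine check decides which ones are actually inside the circle.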

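Note that at most 9 geohash cells are queried, so your search radius shouldn't be larger than the shorter side of a single cell: the query point can be anywhere inside the center cell, so the surrounding 3x3 block is only guaranteed to extend one cell width (and one cell height) beyond it. Choose your geohash precision so that the cell dimensions cover your desired search radius. See the Geohash Cell Dimensions table below for geohash lengths and approximate cell sizes.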

You can also use different geohash lengths for different types of location data. For example, a 5-character geohash is probably fine for coffee shop searches, but for airports 3-4 characters might be more appropriate.
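Picking a precision for a given radius can be automated. The sketch below assumes a spherical approximation (about 110.574 km per degree of latitude, and 111.320 km per degree of longitude at the equator, shrinking with cos(latitude)). Because the query point can sit anywhere in the center cell, the 3x3 block is only guaranteed to reach one full cell beyond it, so the sketch conservatively requires the radius to fit within a single cell's shorter side. `cell_size_km` and `max_precision_for_radius` are hypothetical helpers, not part of GeoDDB.

```python
import math

KM_PER_DEG_LAT = 110.574          # approximate
KM_PER_DEG_LON_EQUATOR = 111.320  # approximate; shrinks with cos(latitude)


def cell_size_km(precision: int, lat: float = 0.0) -> tuple:
    """Approximate (width_km, height_km) of a geohash cell at the given latitude."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = bits // 2
    width = 360.0 / 2 ** lon_bits * KM_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat))
    height = 180.0 / 2 ** lat_bits * KM_PER_DEG_LAT
    return width, height


def max_precision_for_radius(radius_km: float, lat: float = 0.0):
    """Longest geohash whose single cell is at least radius_km on its shorter side.

    Returns None if even a 1-character cell is too small for the radius.
    """
    best = None
    for precision in range(1, 13):
        if min(cell_size_km(precision, lat)) >= radius_km:
            best = precision
        else:
            break  # cells only get smaller as precision grows
    return best


print(max_precision_for_radius(2.0))   # 5
print(max_precision_for_radius(20.0))  # 3
```

For the 2 km coffee search this gives precision 5; a 20 km search needs precision 3, since a precision-4 cell is only about 19.5 km tall.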

Deleting an Item

To delete an item, provide the latitude, longitude, sort key name, and sort key value:

gddb.delete_item(lat, lon, sk_name='SK', sk_value='coffee#daydream')

GeoDDB computes the geohash from the coordinates to find the correct partition, then deletes the item matching the given sort key. delete_item also accepts ddb_kwargs for passing extra arguments like ConditionExpression.
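At the DynamoDB level, deleting an item means sending its full primary key (partition key plus sort key). The sketch below assembles keyword arguments for boto3's `Table.delete_item`, passing extra arguments such as ConditionExpression through unchanged. `build_delete_kwargs` is a hypothetical helper, not GeoDDB's code, and `loc#7mup6` is only a placeholder key value.

```python
from typing import Any, Dict, Optional


def build_delete_kwargs(pk_name: str, pk_value: str, sk_name: str, sk_value: str,
                        ddb_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's Table.delete_item.

    Extra arguments are copied through, but the Key is always built from the
    partition and sort key values so it can't be overridden by accident.
    """
    kwargs = dict(ddb_kwargs or {})
    kwargs["Key"] = {pk_name: pk_value, sk_name: sk_value}
    return kwargs


request = build_delete_kwargs(
    "PK", "loc#7mup6", "SK", "coffee#daydream",
    ddb_kwargs={"ConditionExpression": "attribute_exists(SK)"},  # fail if the item is missing
)
print(request)
```

With a boto3 Table you would then call `table.delete_item(**request)`.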

Updating a Location

This should be an infrequent operation. Because the geohash (and therefore the partition key) is generated from the item's latitude and longitude, changing the coordinates generally changes the partition key. DynamoDB doesn't let you change an item's key attributes, so you must delete the record and create a new one. You can use delete_item followed by put_item to accomplish this.
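A minimal sketch of that delete-then-put sequence, using only the delete_item and put_item calls documented above. `move_location` and `RecordingGeoDDB` (a fake that records calls so the example runs without AWS) are hypothetical names, not part of GeoDDB.

```python
from typing import Any, Dict


def move_location(gddb: Any, old_lat: float, old_lon: float,
                  new_lat: float, new_lon: float,
                  data: Dict[str, Any], sk_name: str = "SK") -> None:
    """Move an item to new coordinates: delete the old record, then write a new one.

    Delete first: if both coordinates fall in the same cell, the old and new
    items share a key, so writing first and deleting second would remove the
    freshly written item.
    """
    gddb.delete_item(old_lat, old_lon, sk_name=sk_name, sk_value=data[sk_name])
    gddb.put_item(new_lat, new_lon, data)


class RecordingGeoDDB:
    """Fake stand-in that records calls instead of talking to DynamoDB."""

    def __init__(self) -> None:
        self.calls = []

    def delete_item(self, lat, lon, sk_name, sk_value, ddb_kwargs=None):
        self.calls.append(("delete", lat, lon, sk_value))

    def put_item(self, lat, lon, data, ddb_kwargs=None):
        self.calls.append(("put", lat, lon, data["SK"]))


fake = RecordingGeoDDB()
move_location(fake, 33.63195443030888, -117.93583128993387, 33.6320, -117.9360,
              {"SK": "coffee#daydream", "Name": "Daydream"})
print(fake.calls)
```

The two calls are not atomic: if put_item fails after delete_item succeeds, the item is gone, so you may want to keep a copy of the data and retry the write.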

Geohash Cell Dimensions

Cell dimensions vary with latitude; these values are approximate.

Length   Width x Height
1        5,009km x 4,992km
2        1,252km x 624km
3        156km x 156km
4        39.1km x 19.5km
5        4.9km x 4.9km
6        1.2km x 609.4m
7        152.9m x 152.4m
8        38.2m x 19m
9        4.8m x 4.8m
10       1.2m x 59.5cm
11       14.9cm x 14.9cm
12       3.7cm x 1.9cm

Bugs?!

Maybe... Probably, I don't have any tests yet :/



Download files


Source Distribution

geoddb-1.0.0.tar.gz (11.3 kB)

Uploaded Source

Built Distribution


geoddb-1.0.0-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file geoddb-1.0.0.tar.gz.

File metadata

  • Download URL: geoddb-1.0.0.tar.gz
  • Upload date:
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for geoddb-1.0.0.tar.gz
Algorithm Hash digest
SHA256 f6b7d04a5bc9f72f7afa4e2662edd1a1acea7f930441852675c620763527e900
MD5 cd775ddc6ed23ffbb39259888451c381
BLAKE2b-256 ce6c68f51f540f7dc3ef8dbce16d479fa28deaeed41c3cc589cb348e6c1943bb


File details

Details for the file geoddb-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: geoddb-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for geoddb-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c0e3655d16002df6decafd9f534ceeceac6b67dd35faa25f1bed87203cb70e8a
MD5 cd437cba69117e772351891988be2292
BLAKE2b-256 20406bbef470ea9fac8633195e405621a982c2c8d5dbf620fc3b3e356829a238

