A flexible schema framework for geospatial data.
Grout
Grout is a flexible-schema framework for geospatial data, powered by Django and PostgreSQL. Think: NoSQL database server, but with schema validation and PostGIS support.
Introduction
Grout combines the flexibility of NoSQL databases with the geospatial muscle of PostGIS, allowing you to make migration-free edits to your database schema while still having access to powerful geospatial queries.
Grout will help you:
- Define, edit, and validate schemas for records in your application
- Keep track of changes to schemas using a built-in versioning system
- Perform fast filtering of user-defined fields
- Run complex geospatial queries, even on records stored with unstructured data
Grout is the core library of the Grout suite, a toolkit for easily building flexible-schema apps on top of Grout. You can use Grout by installing it as an app in a Django project, or you can deploy it as a standalone API server with an optional admin backend.
Ready for more? To get started using Grout with Django, see Getting started. To get started using Grout with another stack, see Non-Django applications. For more background on how Grout works, see Concepts.
Getting started
Django
If you're developing a Django project, you can install Grout as a Django app and use it in your project.
Requirements
Grout supports the following versions of Python and Django:
- Python: 2.7, 3.4, 3.5, 3.6, 3.7
- Django: 1.11, 2.0
Certain versions of Django only support certain versions of Python. To ensure that your Python and Django versions work together, see the Django FAQ: What Python version can I use with Django?
Installation
Install the Grout library from PyPI using pip.
$ pip install grout
To use the development version of Grout, install it from GitHub.
$ git clone git@github.com:azavea/grout.git
Make sure Grout is included in INSTALLED_APPS in your project's settings.py.
# settings.py
INSTALLED_APPS = (
...
'grout',
)
To use Grout as an API server, you need to incorporate the API views into your urls.py file. The following example will include Grout views under the /grout endpoint.
# urls.py
from django.conf.urls import include, url

urlpatterns = [
    # Mount all Grout API views under the /grout/ prefix.
    url(r'^grout/', include('grout.urls')),
]
Note that Grout automatically nests views under the /api/ endpoint, meaning that the setting above would create URLs like hostname.com/grout/api/records.
If you'd prefer Grout views to live under a top-level /api/ endpoint (like hostname.com/api/records), you can import the Grout urlpatterns directly.
# urls.py
from grout import urlpatterns as grout_urlpatterns
urlpatterns = grout_urlpatterns
Configuration
Grout requires that the GROUT configuration variable be defined in your settings.py file in order to work properly. The GROUT variable is a dictionary of configuration directives for the app.
Currently, 'SRID' is the only required key in the GROUT dictionary. 'SRID' is an integer corresponding to the spatial reference identifier that Grout should use to store geometries. 4326 is the most common SRID, and is a good default for projects.
Here's an example configuration for a development project:
# settings.py
# The projection for geometries stored in Grout.
GROUT = { 'SRID': 4326 }
Note that Grout uses Django REST Framework under the hood to provide API endpoints. To configure DRF-specific settings like authentication, see the DRF docs.
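As a rough illustration of the DRF configuration mentioned above, a settings.py might carry a REST_FRAMEWORK dictionary alongside GROUT. The authentication and permission classes below are standard DRF options picked for the example; they are an assumption, not something Grout requires.

```python
# settings.py
# Illustrative DRF settings. The specific classes are assumptions chosen for
# this example; pick whatever authentication scheme suits your project.
REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework.authentication.SessionAuthentication',
        'rest_framework.authentication.BasicAuthentication',
    ),
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.IsAuthenticated',
    ),
}

# The projection for geometries stored in Grout.
GROUT = {'SRID': 4326}
```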
More examples
Grout Server is a simple deployment of a Grout API server designed to be used as a standalone app. It also serves as a good example of how to incorporate Grout into a Django project, and includes a preconfigured authentication module to boot. If you're having trouble installing or configuring Grout in your project, Grout Server is a good resource for troubleshooting.
Non-Django applications
If you're not a Django developer, you can still use Grout as a standalone API server using the Grout Server project. See the Grout Server docs for details on how to install a Grout Server instance.
Concepts
Data model
Grout is centered around Records, which are just entities in your database. A Record can be any type of thing or event in the world, although Grout is most useful when your Records have some geospatial and temporal component.
Every Record contains a reference to a RecordSchema, which catalogs the versioned schema of the Record that points to it. This schema is stored as JSONSchema, a specification for describing data models in JSON.
Finally, each RecordSchema contains a reference to a RecordType, which is a simple container for organizing Records. The RecordType exposes a way to reliably access a set of Records that represent the same type of thing, even if they have different schemas. As we’ll see shortly, RecordTypes are useful access points to Records because RecordSchemas can change at any moment.
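The chain of references described above (Record to RecordSchema to RecordType) can be sketched with plain Python dataclasses. These classes are hypothetical stand-ins for illustration only; Grout's real models are Django models in models.py and carry more fields than shown here.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


# Hypothetical stand-ins, not Grout's actual Django models.
@dataclass
class RecordType:
    label: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class RecordSchema:
    record_type: RecordType          # every schema belongs to one RecordType
    version: int
    schema: dict                     # the JSONSchema definition
    next_version: Optional[int] = None


@dataclass
class Record:
    schema: RecordSchema             # every Record points at one schema version
    data: dict

    @property
    def record_type(self) -> RecordType:
        # A Record reaches its RecordType through its RecordSchema.
        return self.schema.record_type


cat = RecordType(label="cat")
schema_v1 = RecordSchema(record_type=cat, version=1, schema={"type": "object"})
felix = Record(schema=schema_v1, data={"catDetails": {"Name": "Felix"}})
```

Because the RecordType sits at the end of the chain, it stays stable even when a Record's schema version changes.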
Versioned schemas
In Grout, RecordSchemas are append-only, meaning that they cannot be deleted. Instead, when you want to change the schema of a Record, you create a new RecordSchema and update the version attribute.
For a quick example, say that we have a RecordSchema describing data stored on a cat RecordType. The RecordSchema might look something like this:
{
"version": 1,
"next_version": null,
"schema": {
"type": "object",
"title": "Initial Schema",
"$schema": "http://json-schema.org/draft-04/schema#",
"properties": {
"catDetails": {
"$ref": "#/definitions/catDetails"
}
},
"definitions": {
"catDetails": {
"type": "object",
"title": "Cat Details",
"properties": {
"Name": {
"type": "string",
"fieldType": "text",
"isSearchable": true,
"propertyOrder": 1
},
"Age": {
"type": "integer",
"fieldType": "integer",
"minimum": 0,
"maximum": 100,
"isSearchable": true,
"propertyOrder": 2
},
"Color": {
"type": "string",
"fieldType": "text",
"isSearchable": true,
"propertyOrder": 3
},
"Breed": {
"type": "string",
"fieldType": "selectlist",
"enum": [
"Tabby",
"Bobtail",
"Abyssinian"
],
"isSearchable": true,
"propertyOrder": 4
}
}
}
}
}
}
A few things to note about this RecordSchema object:
- This is the first version of the schema (its version is 1)
- There is no more recent version than this one (its next_version is null)
- The schema definition itself is stored as a JSONSchema object on the schema attribute
- All of the available fields are namespaced by the catDetails attribute, which we sometimes refer to as a form or related content
Now say we want to change the Age field to a Date of Birth field. Instead of changing the schema directly, we'll create a new schema. Grout will automatically set version: 2 and next_version: null for this updated schema:
{
"version": 2,
"next_version": null,
"schema": {
"type": "object",
"title": "Initial Schema",
"$schema": "http://json-schema.org/draft-04/schema#",
"properties": {
"catDetails": {
"$ref": "#/definitions/catDetails"
}
},
"definitions": {
"catDetails": {
"type": "object",
"title": "Cat Details",
"properties": {
"Name": {
"type": "string",
"fieldType": "text",
"isSearchable": true,
"propertyOrder": 1
},
"Age": {
"type": "integer",
"fieldType": "integer",
"minimum": 0,
"maximum": 100,
"isSearchable": true,
"propertyOrder": 2
},
"Date of Birth": {
"type": "string",
"format": "datetime",
"fieldType": "text",
"isSearchable": true,
"propertyOrder": 3
},
"Color": {
"type": "string",
"fieldType": "text",
"isSearchable": true,
"propertyOrder": 4
},
"Breed": {
"type": "string",
"fieldType": "selectlist",
"enum": [
"Tabby",
"Bobtail",
"Abyssinian"
],
"isSearchable": true,
"propertyOrder": 5
}
}
}
}
}
}
In addition, Grout will update the initial schema to set next_version: 2:
{
"version": 1,
"next_version": 2,
"schema": {
...
}
}
Now, when a user searches for Records in the cat RecordType, Grout can find the most recent schema by looking for the RecordSchema where next_version: null. This preserves a full audit trail of the RecordSchema, allowing us to inspect how the schema has changed over time.
For a closer look at the Grout data model, see the models.py file in the Grout library.
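The append-only bookkeeping described in this section can be sketched in a few lines of plain Python. This is a simplified model operating on dictionaries shaped like the examples above, not Grout's actual implementation; the function names are hypothetical.

```python
def current_schema(history):
    """Return the newest schema: the one whose next_version is None."""
    return next((s for s in history if s["next_version"] is None), None)


def add_schema_version(history, new_definition):
    """Append a new schema version; older definitions are never edited or deleted."""
    previous = current_schema(history)
    version = 1 if previous is None else previous["version"] + 1
    entry = {"version": version, "next_version": None, "schema": new_definition}
    if previous is not None:
        # Only the pointer on the previous version changes.
        previous["next_version"] = version
    history.append(entry)
    return entry


history = []
add_schema_version(history, {"title": "Initial Schema"})
add_schema_version(history, {"title": "Schema with Date of Birth"})
```

After the second call, the first entry has next_version set to 2 and the lookup for next_version of None returns the second entry, mirroring the walkthrough above.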
API documentation
Request and response formats
Communication with the API generally follows the principles of RESTful API design. API paths correspond to resources, GET requests are used to retrieve objects, POST requests are used to create new objects, and PATCH requests are used to update existing objects. This pattern is followed in nearly all cases; any exceptions will be noted in the documentation.
Responses from the API are exclusively JSON.
Endpoint behavior can be configured using query parameters for GET requests, while POST requests require a payload in JSON format.
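To make the GET/POST conventions concrete, here is a small sketch using only the Python standard library. It builds the requests without sending them. The host, port, and the POST payload fields are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a Grout API served locally under the top-level /api/ prefix.
BASE_URL = "http://localhost:8000/api"


def build_get(resource, **params):
    """Build (but do not send) a GET request; options go in the query string."""
    url = "{}/{}/".format(BASE_URL, resource)
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="GET")


def build_post(resource, payload):
    """Build (but do not send) a POST request carrying a JSON payload."""
    return Request(
        "{}/{}/".format(BASE_URL, resource),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


list_request = build_get("recordtypes", active="True")
# Illustrative payload; a real RecordType may need more fields.
create_request = build_post("recordtypes", {"label": "cat", "plural_label": "cats"})
```

Either request object can be passed to urllib.request.urlopen to actually send it, with whatever authentication headers your deployment expects.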
Pagination
All API endpoints that return lists of resources are paginated. The pagination takes the following format:
{
"count": 57624,
"next": "http://localhost:8000/api/records/?offset=20",
"previous": "http://localhost:8000/api/records/",
"results": [
...
]
}
In a real response, the domain and port for the next and previous fields will be that of the server responding to the request.
This format applies to the API endpoints below and will not be repeated in the documentation for each individual endpoint.
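A client can walk a paginated list by following the next field until it is null. The sketch below assumes only the response shape shown above; the helper names are hypothetical, and the fetch function is injectable so the loop can be exercised without a server.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch one page from the API and decode it as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_results(first_url, fetch_page=fetch_json):
    """Yield every item from a paginated endpoint by following `next` links."""
    url = first_url
    while url:
        page = fetch_page(url)
        for item in page["results"]:
            yield item
        url = page.get("next")  # None on the last page ends the loop
```

For example, `for record in iter_results("http://localhost:8000/api/records/")` would iterate over all Records on a local server, one page at a time.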
Resources
RecordTypes
Because the RecordSchema for a set of Records can change at any time, the RecordType API endpoint provides a consistent access point for retrieving a set of Records. Use the RecordType endpoints to discover the most recent RecordSchema for the Records you are interested in before performing further queries.
Paths:
- List: /api/recordtypes/
- Detail: /api/recordtypes/{uuid}/
Query parameters:
- active: Boolean
  - Filter for only RecordTypes with an active value of True. Generally, you will want to limit yourself to active RecordTypes.
Results fields:
Field name | Type | Description |
---|---|---|
uuid | UUID | Unique identifier for this RecordType. |
current_schema | UUID | The most recent RecordSchema for this RecordType. |
created | Timestamp | The date and time when this RecordType was created. |
modified | Timestamp | The date and time when this RecordType was last modified. |
label | String | The name of this RecordType. |
plural_label | String | The plural version of the name of this RecordType. |
description | String | A short description of this RecordType. |
active | Boolean | Whether or not this RecordType is active. This field allows RecordTypes to be deactivated rather than deleted. |
geometry_type | String | The geometry type supported for Records of this RecordType. One of point, polygon, multipolygon, linestring, or none. |
temporal | Boolean | Whether or not Records of this RecordType should store datetime data in the occurred_from and occurred_to fields. |
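As a sketch of the discovery step described at the top of this section, the helper below takes the results list from /api/recordtypes/ and returns the detail path of the current RecordSchema for a given label. The helper name and the sample UUIDs are made up for illustration.

```python
def current_schema_path(record_types, label):
    """Return the RecordSchema detail path for the active RecordType with this label."""
    for record_type in record_types:
        if record_type["active"] and record_type["label"] == label:
            return "/api/recordschemas/{}/".format(record_type["current_schema"])
    return None


# Sample data shaped like the results fields above; the UUIDs are invented.
sample_results = [
    {"uuid": "11111111-0000-0000-0000-000000000000", "label": "dog",
     "active": False, "current_schema": "aaaaaaaa-0000-0000-0000-000000000000"},
    {"uuid": "22222222-0000-0000-0000-000000000000", "label": "cat",
     "active": True, "current_schema": "bbbbbbbb-0000-0000-0000-000000000000"},
]
path = current_schema_path(sample_results, "cat")
```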
RecordSchemas
The RecordSchema API endpoint can help you discover the fields that should be available on a given Record. This can be useful for automatically generating filters based on a Record's fields, or for running custom validation on a Record's schema.
Paths:
- List: /api/recordschemas/
- Detail: /api/recordschemas/{uuid}/
Results fields:
Field name | Type | Description |
---|---|---|
uuid | UUID | Unique identifier for this RecordSchema. |
created | Timestamp | The date and time when this RecordSchema was created. |
modified | Timestamp | The date and time when this RecordSchema was last modified. |
version | Integer | A sequential number indicating what version of the RecordType's schema this is. Starts at 1. |
next_version | UUID | Unique identifier of the RecordSchema with the next-highest version number for this schema's RecordType. If this is the most recent version of the schema, this field will be null. |
record_type | UUID | Unique identifier of the RecordType that this RecordSchema refers to. |
schema | Object | A JSONSchema object that should validate Records that refer to this RecordSchema. |
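One way to generate filters from a schema, as suggested above, is to walk each form, resolve its local $ref, and collect the fields marked isSearchable. This is a minimal sketch that assumes schemas shaped like the cat example in Versioned schemas; it only handles local references of the form #/definitions/..., and the function names are hypothetical.

```python
def resolve_ref(schema, ref):
    """Resolve a local JSON pointer such as '#/definitions/catDetails'."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = schema
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def searchable_fields(schema):
    """Map each form name to its searchable field names, in propertyOrder."""
    result = {}
    for form_name, form in schema.get("properties", {}).items():
        if "$ref" in form:
            form = resolve_ref(schema, form["$ref"])
        fields = sorted(
            form.get("properties", {}).items(),
            key=lambda item: item[1].get("propertyOrder", 0),
        )
        result[form_name] = [name for name, spec in fields if spec.get("isSearchable")]
    return result


cat_schema = {
    "type": "object",
    "properties": {"catDetails": {"$ref": "#/definitions/catDetails"}},
    "definitions": {
        "catDetails": {
            "type": "object",
            "properties": {
                "Age": {"type": "integer", "isSearchable": True, "propertyOrder": 2},
                "Name": {"type": "string", "isSearchable": True, "propertyOrder": 1},
                "Notes": {"type": "string", "isSearchable": False, "propertyOrder": 3},
            },
        }
    },
}
fields = searchable_fields(cat_schema)
```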
Records
Records are the heart of a Grout project: the entities in your database. The Records API endpoint provides a way of retrieving these objects for analysis or display to an end user.
Paths:
- List: /api/records/
- Detail: /api/records/{uuid}/
Query Parameters:
- archived: Boolean
  - Records can be "archived" to denote that they are no longer current, as an alternative to deletion. Pass True (case-sensitive) to this parameter to return archived Records only, and pass False (case-sensitive) to return current Records only. Omitting this parameter returns both types.
- details_only: Boolean
  - In the Grout Schema Editor, every Record is automatically generated with a <record_type>Details form which is intended to contain a basic summary of information about the Record. Passing True (case-sensitive) to this parameter will omit any other forms which may exist on the Record. This is useful for limiting the size of the payload returned when only a summary view is needed.
- record_type: UUID
  - Limit the response to Records matching the passed RecordType UUID. This is optional in theory, but for most applications it is a good idea to include this parameter by default. It is considered rare that it will be useful to return two different types of Records in a single request. It is usually a better idea to make a separate request for each RecordType.
- jsonb: Object
  - Query the data fields of the object and filter on the result.
  - Keys in this object mimic the search paths to filter on a particular object field. However, in place of values, a filter rule definition is used. Example:
    { "accidentDetails": { "Main+cause": { "_rule_type": "containment", "contains": [ "Vehicle+defect", "Road+defect", ["Vehicle+defect"], ["Road+defect"] ] }, "Num+driver+casualties": { "_rule_type": "intrange", "min": 1, "max": 3 } }}
    This query defines the following two filters:
    - accidentDetails -> "Main cause" == "Vehicle defect" OR accidentDetails -> "Main cause" == "Road defect"
    - accidentDetails -> "Num driver casualties" >= 1 AND accidentDetails -> "Num driver casualties" <= 3
  - There is a third filter rule type available: containment_multiple. This is used when searching a form of which there can be several on a single Record. Here's an example:
    {"person":{"Injury":{"_rule_type":"containment_multiple","contains":["Fatal"]}}}
- occurred_min: Timestamp
  - Filter to Records occurring after this date.
- occurred_max: Timestamp
  - Filter to Records occurring before this date.
- polygon_id: UUID
  - Filter to Records which occurred within the Polygon identified by the UUID. The value must refer to a Boundary in the database.
- polygon: GeoJSON
  - Filter to Records which occurred within the bounds of a valid GeoJSON object.
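The query parameters above can be assembled into a request URL along these lines. This sketch assumes the jsonb filter is sent as a JSON string in the query string and relies on urlencode to turn spaces into + characters, as in the example filter; the helper name and the placeholder UUID are made up.

```python
import json
from urllib.parse import urlencode


def records_query(record_type, jsonb=None, **filters):
    """Build a /api/records/ path with a RecordType, an optional jsonb filter, and extras."""
    params = {"record_type": record_type}
    if jsonb is not None:
        # Assumption: the filter object is serialized to JSON, then URL-encoded.
        params["jsonb"] = json.dumps(jsonb)
    params.update(filters)
    return "/api/records/?" + urlencode(params)


casualty_filter = {
    "accidentDetails": {
        "Num driver casualties": {"_rule_type": "intrange", "min": 1, "max": 3}
    }
}
url = records_query(
    "00000000-0000-0000-0000-000000000000",  # placeholder RecordType UUID
    jsonb=casualty_filter,
    archived="False",  # Boolean parameters are case-sensitive strings
)
```

Passing record_type on every call follows the recommendation above to request one RecordType at a time.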
Results fields:
Field name | Type | Description |
---|---|---|
uuid | UUID | Unique identifier for this Record. |
created | Timestamp | The date and time when this Record was created. |
modified | Timestamp | The date and time when this Record was last modified. |
occurred_from | Timestamp | The earliest time at which this Record might have occurred. |
occurred_to | Timestamp | The latest time at which this Record might have occurred. Note that this field is mandatory for temporal Records: if a Record only occurred at one moment in time, the occurred_from field and the occurred_to field will have the same value. |
geom | GeoJSON | Geometry representing the location associated with this Record. |
location_text | String | A description of the location where this Record occurred, typically an address. |
archived | Boolean | A way of hiding records without deleting them completely. True indicates the Record is archived. |
schema | UUID | References the RecordSchema which was used to create this Record. |
data | Object | A JSON object representing the flexible data fields associated with this Record. It is always true that the object stored in data conforms to the RecordSchema referenced by the schema UUID. |
Boundaries
Boundaries provide a quick way of storing Shapefile data in Grout without having to create separate RecordTypes. Using a Boundary, you can upload and retrieve Shapefile data for things like administrative borders and focus areas in your application.
Paths:
- List: /api/boundaries/
- Detail: /api/boundaries/{uuid}/
Results fields:
Field name | Type | Description |
---|---|---|
uuid | UUID | Unique identifier for this Boundary. |
created | Timestamp | The date and time when this Boundary was created. |
modified | Timestamp | The date and time when this Boundary was last modified. |
label | String | Label of this Boundary, for display. |
color | String | Color preference to use for rendering this Boundary. |
display_field | String | Which field of the imported Shapefile to use for display. |
data_fields | Array | List of the names of the fields contained in the imported Shapefile. |
errors | Array | A possible list of errors raised when importing the Shapefile. |
status | String | Import status of the Shapefile. |
source_file | String | URI of the Shapefile that was originally used to generate this Boundary. |
Notes:
Creating a new Boundary and its BoundaryPolygon correctly is a two-step process.
- POST to /api/boundaries/ with a zipped Shapefile attached; you will need to include the label as form data. You should receive a 201 response which contains a fully-fledged Boundary object, including a list of available data fields in data_fields.
- The response from the previous request will have a blank display_field. Select one of the fields in data_fields and make a PATCH request to /api/boundaries/{uuid}/ with that value in display_field. You are now ready to use this Boundary and its associated BoundaryPolygon.
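The two steps above could be scripted roughly as follows. The sketch expects a session object that behaves like requests.Session (post and patch methods returning responses with raise_for_status and json). The upload field name source_file is an assumption based on the results table, and the function name is hypothetical.

```python
def create_boundary(session, base_url, zip_bytes, label, filename="boundary.zip"):
    """Upload a zipped Shapefile, then set display_field on the new Boundary."""
    # Step 1: POST the Shapefile with the label as form data.
    # Assumption: the file is uploaded under the "source_file" form field.
    response = session.post(
        base_url + "/api/boundaries/",
        data={"label": label},
        files={"source_file": (filename, zip_bytes)},
    )
    response.raise_for_status()
    boundary = response.json()

    # Step 2: pick one of the imported fields (here simply the first one)
    # and PATCH it into display_field.
    display_field = boundary["data_fields"][0]
    response = session.patch(
        "{}/api/boundaries/{}/".format(base_url, boundary["uuid"]),
        json={"display_field": display_field},
    )
    response.raise_for_status()
    return response.json()
```

With the requests library installed, this might be called as `create_boundary(requests.Session(), "http://localhost:8000", open("areas.zip", "rb").read(), "Areas")`, after configuring whatever authentication the server requires on the session.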
BoundaryPolygons
BoundaryPolygons store the Shapefile data associated with a Boundary, including geometry and metadata.
Paths:
- List: /api/boundarypolygons/
- Detail: /api/boundarypolygons/{uuid}/
Query Parameters:
- boundary: UUID
  - Filter to Polygons associated with this parent Boundary.
- nogeom: Boolean
  - When passed with any value, causes the geometry field to be replaced with a bbox field. This reduces the response size and is sufficient for many purposes.
Results fields:
Field name | Type | Description |
---|---|---|
uuid | UUID | Unique identifier for this BoundaryPolygon. |
created | Timestamp | The date and time when this BoundaryPolygon was created. |
modified | Timestamp | The date and time when this BoundaryPolygon was last modified. |
data | Object | Each key in this Object will correspond to one of the data_fields in the parent Boundary, and will store the value for that field for this Polygon. |
boundary | UUID | Unique identifier of the parent Boundary for this BoundaryPolygon. |
bbox | Array | Minimum bounding box containing this Polygon's geometry, as an Array of lat/lon points. This field is optional -- see the nogeom parameter above for more details. |
geometry | GeoJSON | GeoJSON representation of this Polygon. This field is optional -- see the nogeom parameter above for more details. |
Developing
These instructions will help you set up a development version of Grout and contribute changes back upstream.
Requirements
The Grout development environment is containerized with Docker to ensure similar environments across platforms. In order to develop with Docker, you need the following dependencies:
- Docker CE Engine >= 1.13.0 (must be compatible with Docker Compose file v3 syntax)
- Docker Compose
Installation
Clone the repo with git.
$ git clone git@github.com:azavea/grout.git
$ cd grout
Run the update script to set up your development environment.
$ ./scripts/update
Running tests
Once your environment is up to date, you can use the scripts/test script to run the Grout unit test suite.
$ ./scripts/test
This command will run a matrix of tests for every supported version of Python and Django in the project. If you're developing locally and you just want to run a subset of the tests, you can specify the version of Python that you want to use to run tests:
# Only run tests for Python 2.7 (this will test Django 1.11).
$ ./scripts/test app py27
# Only run tests for Python 3.7 (this will test Django 2.0).
$ ./scripts/test app py37
For a list of available Python versions, see the envlist directive in the tox.ini file.
Cleaning up
Tox creates a new virtualenv for every combination of Python and Django versions used by the test suite. In order to clean up stopped containers and remove these virtualenvs, use the clean script:
$ ./scripts/clean
Note that clean will remove all dangling images, stopped containers, and unused volumes on your machine. If you don't want to remove these artifacts, view the clean script and run only the command that interests you.
Making migrations
If you edit the data model in grout/models.py, you'll need to create a new migration for the app. You can use the django-admin script in the scripts directory to automatically generate the migration:
$ ./scripts/django-admin makemigrations
Make sure to register the new migrations file with Git:
$ git add grout/migrations
Resources
The following resources provide helpful tips for deploying and using Grout.
Grout suite
- Grout Server: An easily-deployable standalone instance of a Grout API server.
- Grout Schema Editor: A purely static app that can read and write flexible schemas from a Grout API.
- Demo app: A demo project providing an example of incorporating the Grout suite into a Vue.js app.
Historical documents
- Concept map: An early description of the Grout suite (formerly known as Ashlar) from an Open Source Fellow working on it during the summer of 2018. Describes the conceptual architecture of the suite, and summarizes ideas for future directions.
- Renaming the package to Grout: An ADR documenting the decision to rename the package from "Ashlar" to "Grout".
- Evaluating Record-to-Record references: An ADR documenting the reasons and requirements for implementing a Record-to-Record foreign key field. See also the pull request thread for further discussion.
- Evaluating alternate backends: An ADR presenting research into possible NoSQL backends and service providers for Grout.
- Grout 2018 Fellowship: A project management repo for working on Grout during Azavea's Summer 2018 Open Source Fellowship. Useful for documentation around the motivation and trajectory of the project.
Roadmap
Want to know where Grout is headed? See the Roadmap to get a picture of future development.