Combines DynamicAnnotationDB and PyChunkedGraph
Materialization Engine
A product of the CAVE (Connectome Annotation Versioning Engine) infrastructure
This is a microservice for creating materialized versions of an analysis database. It merges spatially bound annotations with segmentation data stored in a chunkedgraph, frozen at a point in time. The data is stored in a PostgreSQL database, where the spatial annotations use PostGIS point types. The materialization engine can create new time-locked versions periodically on a defined schedule, as well as one-off versions for specific use cases.
Present functionality:
- A Flask microservice exposing REST API endpoints for creating and querying the materialized databases.
- A backend powered by workflows running on Celery workers. Celery is a task queue used to execute work asynchronously.
Installation
This service is intended to be deployed to a Kubernetes cluster as a series of pods. Local deployment is currently best done with Docker. A docker-compose file is included that installs all the required packages and creates a local PostgreSQL database and a Redis broker, which the Celery workers use for running tasks.
Docker compose example:
$ docker-compose build
$ docker-compose up
Alternatively, one can set up a Docker container running a PostgreSQL database and a separate Redis container, then create a Python virtual environment and run the following commands:
Set up a Redis instance:
$ docker run -p 6379:6379 redis
Set up a Postgres database (with PostGIS):
$ docker run --name db -v /my/own/datadir:/var/lib/postgresql/data -e POSTGRES_PASSWORD=materialize postgis/postgis
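Before starting the Flask service, it can help to confirm that both containers accept connections. The following is a minimal sketch using only the Python standard library. The helper name port_open is hypothetical, and the addresses are assumptions: Redis on localhost:6379 (as published by the command above) and PostgreSQL on its default port 5432, which the docker run command above does not publish explicitly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumed addresses; adjust to match how the containers were started.
    for name, port in (("redis", 6379), ("postgres", 5432)):
        status = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {status}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that PostGIS is enabled.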
Set up the Flask microservice:
$ cd materializationengine
$ python3 -m venv mat_engine
$ source mat_engine/bin/activate
(mat_engine) $ python setup.py install
(mat_engine) $ python run.py
In another terminal, start a Celery worker for processing tasks:
$ source mat_engine/bin/activate
(mat_engine) $ celery worker --app=run.celery --pool=prefork --hostname=worker.process@%h --queues=processcelery --concurrency=4 --loglevel=INFO -Ofair
Workflow Overview
The materialization engine runs Celery workflows that create snapshots of spatial annotation data, in which each spatial point is linked to the segmentation id that is valid at a specific point in time.
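The idea of linking a point to the segmentation id valid at a given time can be illustrated with a small, self-contained sketch. It is not the engine's implementation: the RootIdHistory class, its fields, and the sample ids are hypothetical, and the real service resolves ids by querying the chunkedgraph.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RootIdHistory:
    """Hypothetical record of the root ids one annotated point has had."""

    # Parallel lists sorted by time: root_ids[i] became valid at valid_from[i].
    valid_from: List[datetime] = field(default_factory=list)
    root_ids: List[int] = field(default_factory=list)

    def root_id_at(self, timestamp: datetime) -> Optional[int]:
        """Return the root id valid at `timestamp`, or None if none existed yet."""
        idx = bisect_right(self.valid_from, timestamp) - 1
        return self.root_ids[idx] if idx >= 0 else None


# Made-up history: the point's root id changed on 2021-03-01.
history = RootIdHistory(
    valid_from=[datetime(2021, 1, 1), datetime(2021, 3, 1)],
    root_ids=[100, 200],
)

# A snapshot frozen on 2021-02-01 resolves to the earlier root id.
snapshot_root = history.root_id_at(datetime(2021, 2, 1))
```

A snapshot taken after the change would resolve to the later id for the same point, which is why each materialized version is tied to a single timestamp.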
There are a few workflows that make up the materialization engine:
- Bulk Upload (Load large spatial and segmentation datasets into a PostgreSQL database)
- Ingest New Annotations (Query and insert underlying segmentation data on spatial points with missing segmentation data)
- Update Root Ids (Query and update expired root ids from the chunkedgraph between a time delta)
- Create Frozen Database (Create a time-locked database for all tables)
- Complete Workflow (Combines the Ingest New Annotations, Update Root Ids, and Create Frozen Database workflows into one, run in series)
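As a rough illustration of how the Complete Workflow runs its three stages in series, the sketch below chains plain Python functions. The function names, the row dictionaries, and the lookup tables are all hypothetical stand-ins. The actual engine runs these stages as Celery workflows against PostgreSQL and the chunkedgraph.

```python
from copy import deepcopy
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[int]]


def ingest_new_annotations(rows: List[Row], lookup: Dict[int, int]) -> List[Row]:
    """Fill in a root id for every row that is missing one (stubbed lookup)."""
    return [
        dict(row, root_id=lookup[row["id"]]) if row["root_id"] is None else row
        for row in rows
    ]


def update_root_ids(rows: List[Row], replacements: Dict[int, int]) -> List[Row]:
    """Swap each expired root id for its replacement; keep the others as-is."""
    return [
        dict(row, root_id=replacements.get(row["root_id"], row["root_id"]))
        for row in rows
    ]


def create_frozen_database(rows: List[Row], timestamp: datetime) -> Dict[str, Any]:
    """Return an independent copy of the rows, tagged with the freeze time."""
    return {"timestamp": timestamp, "rows": deepcopy(rows)}


def complete_workflow(
    rows: List[Row],
    lookup: Dict[int, int],
    replacements: Dict[int, int],
    timestamp: datetime,
) -> Dict[str, Any]:
    """Run the three stages one after another, each consuming the previous result."""
    rows = ingest_new_annotations(rows, lookup)
    rows = update_root_ids(rows, replacements)
    return create_frozen_database(rows, timestamp)


# Made-up data: row 2 has no root id yet, and root id 100 has expired.
live = [{"id": 1, "root_id": 100}, {"id": 2, "root_id": None}]
frozen = complete_workflow(
    live, lookup={2: 300}, replacements={100: 200}, timestamp=datetime(2021, 2, 1)
)
```

Each stage returns new row dictionaries rather than mutating its input, so the live rows stay untouched while the frozen copy reflects the state at the chosen timestamp.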
Meta
Distributed under the MIT license. See LICENSE for more information.
Contributing
- Fork it (https://github.com/seung-lab/MaterializationEngine/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request