Skip to main content

Tools to query data from the dynamic annotation framework

Project description

AnalysisDataLink

This repostitory facilitates easy access to the materialized (SQL) database tables. Have a look at https://github.com/seung-lab/AnnotationPipelineOverview to get a better overview of the system.

The database can be accessed directly as described in below. However, it is recommended to use this repository as it not only helps with querying the database but also sets datatypes and converts the geometry coordinates which are stored in a postgis string format.

Access through the DataLink

The DataLink has three hierarchy levels:

  • low level: datalink_base.py
  • query level: datalink.py
  • abstract level: datalink_ext.py

We anticpate that most users operate on the highest level where queries to the different table schemas are predefined for convenient access. However, these functions might be too limited in some cases and require more low level access. We hope that users contribute to this repo by formulating their currently unsupported queries with the means of the lower level modules and adding them to datalink_ext.py.

Example

Accessing synapses from all pyramidal cells onto all other cells:

from analysisdatalink import datalink_ext as de
adle = de.AnalysisDataLinkExt("pinky100", 50, sqlalchemy_database_uri)

# Read all pyramidal cell ids
pyc_ids = adle.query_cell_types("soma_valence", cell_type_include_filter=["e"], return_only_ids=True, exclude_zero_root_ids=True)

# Read synapses restricted to pyramidal cells (takes ~11s and returns 17571 synapses)
synapse_df = adle.query_synapses("pni_synapses_i3", pre_ids=pyc_ids)

See below for how to build the sqlalchemy_database_uri. For convenience, one can define DATABASE_URI as global system variable which will be read if sqlalchemy_database_uri is undefined.

Accessing the SQL databases directly

The SQL database can be accessed in many ways, sqlAlchemy and pandas are a good place to start. Adminer is a good tool to view the database content.

Table naming

All tables are called following a convention:

{dataset_name}_{table_name}_v{materialization_version}

For instance, a synapse table might be called: pinky100_pni_synapses_i3_v38.

Pandas examples

Getting all the cell segment ids (also called root ids):

import pandas as pd
sql_query = "SELECT * FROM pinky100_cellsegment_v38"
df = pd.read_sql(sql_query, database_uri, index_col="id")

where database_uri is build as follows:

database_uri = "postgresql://{user_name}:{password}@{database_ip}/postgres"

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

AnalysisDataLink-0.2.5.tar.gz (7.8 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page