Tools to query data from the dynamic annotation framework

AnalysisDataLink

This repository facilitates easy access to the materialized (SQL) database tables. Have a look at https://github.com/seung-lab/AnnotationPipelineOverview for a better overview of the system.

The database can also be accessed directly, as described below. However, we recommend using this repository: it not only helps with querying the database but also sets datatypes and converts the geometry coordinates, which are stored in a PostGIS string format.
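To illustrate the kind of geometry conversion mentioned above, here is a minimal sketch. It assumes the coordinates arrive as WKT text such as "POINT Z (10 20 30)"; the actual storage may instead be hex-encoded WKB, in which case a geometry library would be needed. The function name parse_wkt_point is hypothetical and not part of analysisdatalink.

```python
import re

# Matches "POINT (x y)", "POINT Z (x y z)" and "POINTZ(x y z)".
# Assumption: the geometry is plain WKT text, not hex-encoded WKB.
_POINT_RE = re.compile(
    r"^\s*POINT\s*(?:Z\s*)?\(\s*([^()]*?)\s*\)\s*$", re.IGNORECASE
)


def parse_wkt_point(wkt):
    """Convert a WKT point string into a list of float coordinates."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return [float(value) for value in match.group(1).split()]


print(parse_wkt_point("POINT Z (10 20 30)"))
```

When querying through the DataLink this step should not be necessary, since the conversion is described as being handled for you.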

Access through the DataLink

The DataLink has three hierarchy levels:

low level: datalink_base.py
query level: datalink.py
abstract level: datalink_ext.py

We anticipate that most users will work at the highest level, where queries to the different table schemas are predefined for convenient access. However, these functions may be too limited in some cases, which then require lower-level access. We hope that users contribute to this repo by formulating their currently unsupported queries using the lower-level modules and adding them to datalink_ext.py.

Example

Accessing synapses from all pyramidal cells onto all other cells:

from analysisdatalink import datalink_ext as de
adle = de.AnalysisDataLinkExt("pinky100", 50, sqlalchemy_database_uri)

# Read all pyramidal cell ids
pyc_ids = adle.query_cell_types("soma_valence", cell_type_include_filter=["e"], return_only_ids=True, exclude_zero_root_ids=True)

# Read synapses restricted to pyramidal cells (takes ~11s and returns 17571 synapses)
synapse_df = adle.query_synapses("pni_synapses_i3", pre_ids=pyc_ids)

See below for how to build the sqlalchemy_database_uri. For convenience, one can define DATABASE_URI as a global system variable, which will be read if sqlalchemy_database_uri is undefined.
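The fallback behaviour can be pictured roughly as follows. This is a sketch only: it assumes that "global system variable" means an environment variable, and resolve_database_uri is a hypothetical helper, not the library's own lookup code.

```python
import os


def resolve_database_uri(sqlalchemy_database_uri=None):
    """Return the explicit URI if given, otherwise fall back to DATABASE_URI.

    Assumption: DATABASE_URI is read from the process environment.
    """
    if sqlalchemy_database_uri is not None:
        return sqlalchemy_database_uri
    try:
        return os.environ["DATABASE_URI"]
    except KeyError:
        raise ValueError(
            "No database URI given and DATABASE_URI is not set"
        ) from None
```

With DATABASE_URI exported in the shell before Python starts, calling resolve_database_uri() without arguments would return the exported value.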

Accessing the SQL databases directly

The SQL database can be accessed in many ways; SQLAlchemy and pandas are good places to start. Adminer is a good tool for viewing the database content.

Table naming

All tables are named according to the following convention:

{dataset_name}_{table_name}_v{materialization_version}

For instance, a synapse table might be called: pinky100_pni_synapses_i3_v38.
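The convention is easy to capture in a small helper. The sketch below is for illustration only; build_table_name and split_version are hypothetical names and are not part of analysisdatalink.

```python
import re


def build_table_name(dataset_name, table_name, materialization_version):
    """Compose a full table name following the convention above."""
    return f"{dataset_name}_{table_name}_v{int(materialization_version)}"


def split_version(full_table_name):
    """Split a full table name into (name without version, version).

    The dataset and table names are not separated further, because
    both may themselves contain underscores.
    """
    match = re.fullmatch(r"(?P<prefix>.+)_v(?P<version>\d+)", full_table_name)
    if match is None:
        raise ValueError(f"unexpected table name: {full_table_name!r}")
    return match.group("prefix"), int(match.group("version"))


print(build_table_name("pinky100", "pni_synapses_i3", 38))
# prints: pinky100_pni_synapses_i3_v38
```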

Pandas examples

Getting all the cell segment ids (also called root ids):

import pandas as pd
sql_query = "SELECT * FROM pinky100_cellsegment_v38"
df = pd.read_sql(sql_query, database_uri, index_col="id")

where database_uri is built as follows:

database_uri = "postgresql://{user_name}:{password}@{database_ip}/postgres"
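If the user name or password contains characters such as "@" or "/", they have to be URL-encoded before being placed in the URI. A minimal sketch using only the standard library follows; build_database_uri and the credential values are hypothetical placeholders.

```python
from urllib.parse import quote_plus


def build_database_uri(user_name, password, database_ip, database="postgres"):
    """Assemble a PostgreSQL URI, URL-encoding the credentials."""
    return (
        f"postgresql://{quote_plus(user_name)}:{quote_plus(password)}"
        f"@{database_ip}/{database}"
    )


# Placeholder credentials for illustration only.
database_uri = build_database_uri("analysis_user", "p@ss/word", "127.0.0.1")
print(database_uri)
# prints: postgresql://analysis_user:p%40ss%2Fword@127.0.0.1/postgres
```

The resulting string can be passed as database_uri to pd.read_sql in the example above.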

Release history

0.4.1 (Aug 23, 2019), this version
0.4.0 (Aug 20, 2019)
0.3.0 (Aug 19, 2019)
0.2.5 (Aug 19, 2019)
0.2.0 (May 6, 2019)
0.1.4 (Dec 4, 2018)
0.1.3 (Dec 3, 2018)
0.1.2 (Dec 3, 2018)
0.1.1 (Nov 16, 2018)
0.1.0 (Nov 16, 2018)
