Clinical Linked Data: High-level Python classes to load, model and reshape tabular data imported into Neo4j database

tab2neo - backend classes

High-level Python classes to load, model and reshape tabular data imported into a Neo4j database
IMPORTANT NOTE: tested on Neo4j versions 4.3.6 and 4.4.11
Python version: 3.8

Installation

pip install tab2neo

Modules

DATA LOADERS - modules for reading data from various formats and writing it to Neo4j

  • FileDataLoader - Loads data into Neo4j, with support for the following input formats: sas7bdat, xpt, rda, xls, xlsx, csv

MODEL APPLIERS

  • ModelApplier - Class to restructure data in a Neo4j database using a Class-Relationship model (which also resides in Neo4j)

DATA PROVIDERS

  • DataProvider - Class to fetch data already in the database: in particular, data as structured after transformation with ModelApplier (mode='schema_PROPERTY'), or any linked data in Neo4j (mode='noschema')

MODEL MANAGERS

  • ModelManager - Class to manage metadata nodes (Class-Relationship model)

QUERY BUILDERS

  • QueryBuilder - Class to support creation of Cypher queries to work with data in Neo4j

End-to-end data loading and reshaping example

The example code below walks through a use case that combines the data loader, model manager, model applier and data provider.

Importing the data

The call to FileDataLoader() in the code below connects to the database using environment variables for your host and login credentials. Alternatively, you can connect with FileDataLoader(host="bolt://...", credentials=("username", "password")). The call to clean_slate() empties the database, and load_file() then reads your data from the specified file path. The example data used here is Record.csv in examples/data/.

from model_managers import ModelManager
from data_loaders import FileDataLoader
from model_appliers import ModelApplier
from data_providers import DataProvider

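# Connect using the host and credentials taken from environment variables
fdl = FileDataLoader()
# Empty the database before loading
fdl.clean_slate()
# Read the file and write its contents to Neo4j
fdl.load_file(
    folder='examples/data/',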
    filename='Record.csv'
)
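
The two connection styles described above can be tied together with a small helper. This is only a sketch: the environment variable names NEO4J_HOST, NEO4J_USER and NEO4J_PASSWORD and the helper connection_kwargs_from_env are assumptions made for this example, not names confirmed by tab2neo. Only the host=... and credentials=(...) keyword format comes from the text above.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names, chosen for this sketch only.
HOST_VAR = "NEO4J_HOST"
USER_VAR = "NEO4J_USER"
PASSWORD_VAR = "NEO4J_PASSWORD"


def connection_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments in the host/credentials format shown above."""
    env = os.environ if env is None else env
    # Fail early with a readable message if any setting is absent.
    missing = [name for name in (HOST_VAR, USER_VAR, PASSWORD_VAR) if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env[HOST_VAR],
        "credentials": (env[USER_VAR], env[PASSWORD_VAR]),
    }
```

Under these assumptions, the result could be passed on as FileDataLoader(**connection_kwargs_from_env()).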

Now the data is in your database, but it is not yet connected in a meaningful way. We have `Source Data Column` nodes containing information about the columns of our data (Study, Subject, Age and Sex) and `Source Data Row` nodes containing information about the two rows. The following code creates a trivial model (Classes and Relationships) from the data using ModelManager.

mm = ModelManager()
mm.create_model_from_data()

Here we can see that we now have class and relationship nodes, illustrating the connections between Study, Subject, Age and Sex. In the image below, the red nodes are relationship nodes and the blue nodes are class nodes.

[Image: modelmanager example]

Now, using the class-relationship model we built with ModelManager, we can refactor our data and extract entities of the defined classes into separate nodes. In the code below we use refactor_all to do this, and we can see in the image that age values of 40 and 50 have been extracted into their own nodes.

ma = ModelApplier()
ma.refactor_all()

[Image: modelapplier example]

Now that we have our data set up, we can retrieve it in tabular form using DataProvider. Here we request the Subject, Record and Age classes.

Note: The Record class connects the graph between Subject and Age, and so is required for this call, despite not appearing in the output.

As we have not specified any relationships, we must set infer_rels=True. The argument return_propname = False ensures that we see the label name in our output, and return_nodeid = False removes the id values generated for each unique node in Neo4j from your output.

dp = DataProvider()
dp.get_data_generic(["Subject", "Record", "Age"], infer_rels=True, return_propname=False, return_nodeid=False)
            Subject  Age 
    0            S001      30
    1            S002      40
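
To see why the Record class has to be listed even though it does not appear in the output, it can help to picture the join in plain Python. The sketch below does not use tab2neo or Neo4j. The edge list and the tabulate helper are hypothetical stand-ins for the graph and the query, and the Subject and Age values are the ones from the output shown above. Each row is found by starting at a Record node and collecting its Subject and Age neighbours.

```python
from collections import defaultdict

# Hypothetical edges between (label, value) nodes; Record is the only link
# between a Subject and its Age.
EDGES = [
    (("Subject", "S001"), ("Record", 1)),
    (("Record", 1), ("Age", 30)),
    (("Subject", "S002"), ("Record", 2)),
    (("Record", 2), ("Age", 40)),
]


def tabulate(edges, via, columns):
    """Return one row per `via` node, holding its neighbouring `columns` values."""
    neighbours = defaultdict(dict)
    for a, b in edges:
        # Edges are treated as undirected for this illustration.
        for hub, other in ((a, b), (b, a)):
            if hub[0] == via and other[0] in columns:
                neighbours[hub][other[0]] = other[1]
    # Keep only hubs that reach every requested column.
    return [
        {col: found[col] for col in columns}
        for hub, found in sorted(neighbours.items())
        if all(col in found for col in columns)
    ]


rows = tabulate(EDGES, via="Record", columns=["Subject", "Age"])
print(rows)
```

If the hub label is one that no edge touches, the helper returns an empty list. That mirrors the point of the note above: without Record there is no path from Subject to Age.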

Note: ModelManager also allows you to create additional schema classes with the following functions; however, the tab2neo package does not at this stage allow those new classes to be populated with derived data. This functionality will become available in later releases.

We can create some additional classes using create_class from ModelManager:

mm.create_class([
    {'label': 'Parameter', 'short_label': 'PARAM'}, 
    {'label': 'Analysis Value (C)', 'short_label': 'AVALC'}, 
    {'label': 'Analysis Value', 'short_label': 'AVAL'}, 
    {'label': 'Record', 'short_label': 'RECORD'}
    ])

And we can also create related classes, where for each triplet, the required classes and the relationships between them will be created:

mm.create_related_classes_from_list([
    ['Subject', 'Record', 'Record'],
    ['Record', 'Parameter', 'Parameter'],
    ['Record', 'Analysis Value', 'Analysis Value'],
    ['Record', 'Analysis Value (C)', 'Analysis Value (C)'],
]
)
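
As a rough illustration of what each triplet implies, the sketch below expands a list of triplets into the classes and relationships that would be needed. It assumes each triplet reads as [source class, target class, relationship type], which is an interpretation of the example above rather than documented behaviour. The expand_triplets helper is hypothetical and does not touch Neo4j.

```python
from typing import List, Sequence, Tuple


def expand_triplets(
    triplets: Sequence[Sequence[str]],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """List the unique classes and the (source, type, target) relationships."""
    classes: List[str] = []
    relationships: List[Tuple[str, str, str]] = []
    for source, target, rel_type in triplets:
        # Record each class once, in order of first appearance.
        for label in (source, target):
            if label not in classes:
                classes.append(label)
        relationships.append((source, rel_type, target))
    return classes, relationships


classes, relationships = expand_triplets([
    ['Subject', 'Record', 'Record'],
    ['Record', 'Parameter', 'Parameter'],
    ['Record', 'Analysis Value', 'Analysis Value'],
    ['Record', 'Analysis Value (C)', 'Analysis Value (C)'],
])
print(classes)
print(relationships)
```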

And finally we can create Term nodes using create_ct. Here the class with label 'Parameter' is being linked with [:HAS_CONTROLLED_TERM] relationships to 'Age' and 'Sex' Term nodes.

mm.create_ct(
    {
    'Parameter': [{'rdfs:label': 'Age'}, {'rdfs:label': 'Sex'}],               
    }
)
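
The dictionary passed to create_ct can be read as a list of class-to-term links. The sketch below spells those links out in plain Python. The controlled_term_links helper is hypothetical and only mirrors the input shape and the HAS_CONTROLLED_TERM relationship named above. It is not how tab2neo itself is implemented.

```python
from typing import Dict, List, Tuple


def controlled_term_links(
    ct: Dict[str, List[Dict[str, str]]],
) -> List[Tuple[str, str, str]]:
    """Return (class label, relationship type, term label) for every term."""
    return [
        (class_label, "HAS_CONTROLLED_TERM", term["rdfs:label"])
        for class_label, terms in ct.items()
        for term in terms
    ]


links = controlled_term_links(
    {'Parameter': [{'rdfs:label': 'Age'}, {'rdfs:label': 'Sex'}]}
)
print(links)
```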
