Library for converting relational data into graph data (neo4j)

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database. It relieves you of the cumbersome manual work of writing the conversion code and lets you focus on the conversion schema and data processing.

The library is built specifically for converting data into a neo4j graph (minimum version 5.2). It also supports extensive customization to clean and remodel data. To talk to neo4j, it uses the native neo4j Python client.

This library has been developed at the Chair of Systems Design at ETH Zürich.

Installation

pip install data2neo

The Data2Neo library supports Python 3.8+.

Quick Start

A quick example for converting data in a Pandas dataframe into a graph. The full example code can be found under examples. For more details, please check out the full documentation. We first define a conversion schema in a YAML-style config file. In this config file, we specify which entities are converted into which nodes and which relationships.

schema.yaml
ENTITY("Flower"):
    NODE("Flower") flower:
        - sepal_length = Flower.sepal_length
        - sepal_width = Flower.sepal_width
        - petal_length = Flower.petal_length
        - petal_width = append(Flower.petal_width, " milimeters")
    NODE("Species", "BioEntity") species:
        + Name = Flower.species
    RELATIONSHIP(flower, "is", species):
    
ENTITY("Person"):
    NODE("Person") person:
        + ID = Person.ID
        - FirstName = Person.FirstName
        - LastName = Person.LastName
    RELATIONSHIP(person, "likes", MATCH("Species", Name=Person.FavoriteFlower)):
        - Since = "4ever"
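
To make the effect of such a schema more concrete, the following sketch roughly mimics in plain Python what the Flower rule describes for each row: one Flower node, one Species node, and an "is" relationship between them. It is an illustration only, not Data2Neo's implementation. The function name convert_flower and the dict layout are hypothetical, and the sketch assumes that attributes marked with + act as merge keys, so that rows sharing a species end up pointing at a single Species node.

```python
def convert_flower(row, species_nodes):
    """Mimic the Flower rule of the schema for one row (illustration only)."""
    flower = {
        "labels": ["Flower"],
        "sepal_length": row["sepal_length"],
        "sepal_width": row["sepal_width"],
        "petal_length": row["petal_length"],
        # Stand-in for the custom append() function referenced in the schema
        "petal_width": str(row["petal_width"]) + " milimeters",
    }
    # Assumption: "+ Name" is a merge key, so each species name maps to one node
    species = species_nodes.setdefault(
        row["species"],
        {"labels": ["Species", "BioEntity"], "Name": row["species"]},
    )
    relationship = (flower, "is", species)
    return flower, relationship


# Two made-up rows that share the same species
species_nodes = {}
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
]
results = [convert_flower(row, species_nodes) for row in rows]
```

Both rows produce their own Flower node, while species_nodes holds only one entry that both relationships point to.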

The library itself has two basic elements that are required for the conversion: the Converter, which handles the conversion itself, and an Iterator, which iterates over the relational data. The iterator can be implemented for arbitrary data in relational format. Data2Neo currently provides preimplemented iterators under:

  • data2neo.relational_modules.sqlite for SQLite databases
  • data2neo.relational_modules.pandas for Pandas dataframes
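
Conceptually, an iterator over relational data hands over each row together with the name of the table (entity type) it belongs to, which is what the ENTITY("...") rules in the schema refer to. The sketch below illustrates this idea using only Python's standard sqlite3 module. The helper iter_typed_rows and the example data are hypothetical; this is not the API of the preimplemented SQLite module, just a picture of the concept.

```python
import sqlite3


def iter_typed_rows(connection, tables):
    """Yield (table_name, row_dict) pairs for every row of the given tables."""
    connection.row_factory = sqlite3.Row
    for table in tables:
        # Table names come from a trusted list here; never interpolate user input
        for row in connection.execute(f'SELECT * FROM "{table}"'):
            yield table, dict(row)


# Build a small in-memory database to iterate over (made-up example data)
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Person (ID INTEGER, FirstName TEXT, LastName TEXT, FavoriteFlower TEXT)"
)
connection.execute("INSERT INTO Person VALUES (1, 'Ada', 'Lovelace', 'setosa')")

typed_rows = list(iter_typed_rows(connection, ["Person"]))
```

Each element of typed_rows pairs the type name "Person" with a dict of column values, which is the information a rule such as ENTITY("Person") needs in order to pick the right rows.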

We will use the PandasDataFrameIterator from data2neo.relational_modules.pandas. Further, we will use the IteratorIterator, which can wrap multiple iterators, to handle multiple dataframes. Since a pandas dataframe has no type/table name associated with it, we need to specify the name when creating a PandasDataFrameIterator. We also define a custom function append that can be referred to in the schema file and that appends a string to the attribute value. For an entity with Flower["petal_width"] = 5, the output node will have the attribute petal_width = "5 milimeters".

import neo4j
import pandas as pd 
from data2neo.relational_modules.pandas import PandasDataFrameIterator 
from data2neo import IteratorIterator, Converter, Attribute, register_attribute_postprocessor
from data2neo.utils import load_file

# Set up the neo4j URI and credentials
uri = "bolt://localhost:7687"
auth = neo4j.basic_auth("neo4j", "password")

people = ... # a dataframe with people's data (ID, FirstName, LastName, FavoriteFlower)
people_iterator = PandasDataFrameIterator(people, "Person")
iris = ... # a dataframe with the iris dataset
iris_iterator = PandasDataFrameIterator(iris, "Flower")

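# Register a custom data processing function that can be used in the schema
@register_attribute_postprocessor
def append(attribute, append_string):
    # Convert the value to a string first so that numeric values can be concatenated
    new_attribute = Attribute(attribute.key, str(attribute.value) + append_string)
    return new_attribute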

# Create an IteratorIterator that wraps both dataframe iterators
iterator = IteratorIterator([people_iterator, iris_iterator])

# Create converter instance with schema, the final iterator and the graph
converter = Converter(load_file("schema.yaml"), iterator, uri, auth)
# Start the conversion
converter()
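
Because a postprocessor like append is an ordinary Python function, its logic can be tried out without a running database. The sketch below replaces the library's Attribute with a minimal stand-in dataclass (an assumption for illustration; the real class may carry more than a key and a value) and leaves out the @register_attribute_postprocessor decorator. It converts the value to a string before concatenating, since adding a string to a numeric value such as 5 would raise a TypeError.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Attribute:
    """Minimal stand-in for data2neo's Attribute (hypothetical, illustration only)."""
    key: str
    value: Any


def append(attribute, append_string):
    # Convert to str so that numeric values such as 5 can be concatenated
    return Attribute(attribute.key, str(attribute.value) + append_string)


result = append(Attribute("petal_width", 5), " milimeters")
print(result)
```

The returned object keeps the original key and carries the extended string as its value, matching the petal_width example described above.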

Known issues

If you encounter a bug or an unexplainable behavior, please check the known issues list. If your issue is not found, submit a new one.
