A Python library that contains tools for data discovery, data model generation and ingestion for the Neo4j graph database.

Neo4j Runway

Neo4j Runway is a Python library that simplifies the process of migrating your relational data into a graph. It provides tools that abstract communication with OpenAI to run discovery on your data and generate a data model, as well as tools to generate ingestion code and load your data into a Neo4j instance.

Key Features

  • Data Discovery: Harness OpenAI LLMs to provide valuable insights from your data
  • Graph Data Modeling: Utilize OpenAI and the Instructor Python library to create valid graph data models
  • Code Generation: Generate ingestion code for your preferred method of loading data
  • Data Ingestion: Load your data using Runway's built-in implementation of PyIngest, Neo4j's popular ingestion tool

Requirements

Runway uses Graphviz to visualize data models. To use this feature, install Graphviz.
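
A quick way to confirm the requirement above is met is to look for the Graphviz command-line tool on the PATH. The sketch below assumes that visualization ultimately depends on the `dot` executable that ships with Graphviz. The helper name `graphviz_available` is hypothetical and not part of Runway.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable is found on the PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz found; model visualization should work.")
    else:
        print("Graphviz 'dot' not found on PATH; install it to visualize models.")
```

If the check fails, the rest of the workflow should still be usable; only the visualization step is affected.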

You'll need a Neo4j instance to fully utilize Runway. Start up a free cloud hosted Aura instance or download the Neo4j Desktop app.

Get Running in Minutes

Follow the steps below or check out the Neo4j Runway end-to-end examples.

pip install neo4j-runway
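
To confirm that the install succeeded, the standard library can report the installed distribution version. This is a minimal sketch; `installed_version` is a hypothetical helper, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("neo4j-runway")
    print(found if found else "neo4j-runway is not installed in this environment")
```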

Now let's walk through a basic example.

Here we import the modules we'll be using.

import pandas as pd

from neo4j_runway import Discovery, GraphDataModeler, PyIngest, UserInput
from neo4j_runway.code_generation import PyIngestConfigGenerator
from neo4j_runway.llm.openai import OpenAIDiscoveryLLM, OpenAIDataModelingLLM

Discovery

Now we...

  • Define a general description of our data
  • Provide brief descriptions of the columns of interest
  • Provide any use cases we'd like our data model to address
  • Load the data with pandas

USER_GENERATED_INPUT = UserInput(general_description='This is data on different countries.',
    column_descriptions={
        'id': 'unique id for a country.',
        'name': 'the country name.',
        'phone_code': 'country area code.',
        'capital': 'the capital of the country.',
        'currency_name': "name of the country's currency.",
        'region': 'primary region of the country.',
        'subregion': 'subregion location of the country.',
        'timezones': 'timezones contained within the country borders.',
        'latitude': 'the latitude coordinate of the country center.',
        'longitude': 'the longitude coordinate of the country center.'
    },
    use_cases=[
        "Which region contains the most subregions?", 
        "What currencies are most popular?", 
        "Which countries share timezones?"
    ]
)

data = pd.read_csv("data/csv/countries.csv")
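
Because the column descriptions are keyed by column name, a misspelled key would describe a column that does not exist in the CSV. The sketch below shows one way to catch that before running discovery. It uses only the standard library and a small in-memory sample in place of `data/csv/countries.csv`; the helper `missing_columns` is hypothetical and not part of Runway.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, described: Iterable[str]) -> List[str]:
    """Return the described column names that are absent from the CSV header."""
    # The first row of the CSV is treated as the header; an empty file has none.
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in described if name not in present]


# Hypothetical sample standing in for the real countries file.
sample = io.StringIO("id,name,capital\n1,Example,Example City\n")
print(missing_columns(sample, ["id", "name", "region"]))  # prints ['region']
```

With the DataFrame already loaded, the same comparison can be made against `data.columns` instead of re-reading the file.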

We then initialize our discovery LLM. By default it uses GPT-4o and reads the OpenAI API key from an environment variable.

disc_llm = OpenAIDiscoveryLLM()
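
Since the key comes from the environment, it can be useful to fail early with a clear message when it is missing. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is the name the OpenAI Python client reads by default; the text above does not name the variable, so treat this as an assumption. `require_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_API_KEY")  # assumed variable name
        print("API key found in the environment.")
    except RuntimeError as err:
        print(err)
```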

And we run discovery on our data.

disc = Discovery(llm=disc_llm, user_input=USER_GENERATED_INPUT, data=data)
disc.run()

Data Modeling

We can now pass our Discovery object to a GraphDataModeler to generate our initial data model. A Discovery object isn't required here, but it provides rich context to the LLM to achieve the best results.

modeling_llm = OpenAIDataModelingLLM()
gdm = GraphDataModeler(llm=modeling_llm, discovery=disc)
gdm.create_initial_model()

If we have graphviz installed, we can take a look at our model.

gdm.current_model.visualize()

Figure: countries-first-model.png

Let's make some corrections to our model and view the results.

gdm.iterate_model(user_corrections="""
Make Region node have a HAS_SUBREGION relationship with Subregion node.
Remove the relationship between Country and Region.
""")
gdm.current_model.visualize()

Figure: countries-second-model.png

Code Generation

We can now use our data model to generate some ingestion code.

gen = PyIngestConfigGenerator(data_model=gdm.current_model,
                              username="neo4j", password="password",
                              uri="bolt://localhost:7687", database="neo4j",
                              csv_dir="data/csv/", csv_name="countries.csv")

pyingest_yaml = gen.generate_config_string()
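
The generated config embeds the connection URI, so it can help to confirm that the Neo4j instance is reachable before ingesting. The sketch below only opens a TCP connection to the host and port in the URI; it does not authenticate or speak the Bolt protocol. The helpers `bolt_host_port` and `is_reachable` are hypothetical, and 7687 is used as the fallback because it is the default Bolt port.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def bolt_host_port(uri: str, default_port: int = 7687) -> Tuple[str, int]:
    """Extract (host, port) from a URI such as bolt://localhost:7687."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"No host found in URI: {uri!r}")
    return parsed.hostname, parsed.port or default_port


def is_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    host, port = bolt_host_port(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    uri = "bolt://localhost:7687"  # same URI as in the generator call above
    print("reachable" if is_reachable(uri) else "not reachable")
```

A successful connection only shows that something is listening on that port; wrong credentials or a wrong database name would still surface during ingestion.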

Ingestion

We will use the generated PyIngest YAML config to ingest our CSV into our Neo4j instance.

PyIngest(config=pyingest_yaml, dataframe=data)

We can also save this as a .yaml file and use it with the original PyIngest.

gen.generate_config_yaml(file_name="countries.yaml")

Here's a snapshot of our new graph!

Figure: countries-graph.png

Limitations

The current project is in beta and has the following limitations:

  • Single CSV input only for data model generation
  • Nodes may only have a single label
  • Only uniqueness and node / relationship key constraints are supported
  • CSV columns that refer to the same node property are not supported in model generation
  • Only OpenAI models may be used at this time
  • The modified PyIngest function included with Runway only supports loading a local Pandas DataFrame or CSVs

Download files

Source Distribution

neo4j_runway-0.11.0.tar.gz (60.2 kB)

Built Distribution

neo4j_runway-0.11.0-py3-none-any.whl (89.4 kB)
