
Dataset-agnostic knowledge graph builder with optional LLM and Neo4j integrations.


Auto Graph Builder Framework

Transform any dataset into a Knowledge Graph with just a few lines of code


🎯 Vision

A framework that automatically converts structured data (JSON, CSV, Excel) into optimized Neo4j graph databases using LLM-powered entity extraction and optional embeddings.

Goal: Enable developers to build knowledge graphs without graph expertise.


💡 The Problem

Building a graph database today requires:

# Current approach: 100+ lines of code + expertise

1. Manually design graph schema
   - Which nodes? Which relationships?
   - What properties to store?
   
2. Write entity extraction logic
   - Parse text fields
   - Extract organizations, people, locations
   - Handle edge cases
   
3. Create database schema
   - Write Cypher for constraints
   - Create indexes for performance
   - Test and debug
   
4. Insert data
   - Write complex Cypher queries
   - Handle batching
   - Error handling
   
5. Add embeddings (optional)
   - Generate vectors
   - Create vector indexes
   - Sync with graph

6. Optimize & maintain
   - Monitor performance
   - Add new indexes
   - Update schema

Result: takes days or weeks, requires Neo4j expertise, and is error-prone.
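For concreteness, steps 3 and 4 of the manual route might look like this. The sketch is illustrative only: the `Article`/`Source` schema and field names are hypothetical, though the Cypher syntax (constraints, indexes, `UNWIND` batching) is standard Neo4j 5.

```python
# Illustrative sketch of the hand-written workflow the framework replaces.
# Schema (Article/Source) and field names are hypothetical examples.

def manual_setup_statements():
    """Cypher a developer would normally write by hand: constraints,
    an index, and a batched insert query."""
    return [
        # Step 3: schema constraints and indexes
        "CREATE CONSTRAINT article_url IF NOT EXISTS "
        "FOR (a:Article) REQUIRE a.url IS UNIQUE",
        "CREATE INDEX article_title IF NOT EXISTS "
        "FOR (a:Article) ON (a.title)",
        # Step 4: batched insert via UNWIND
        "UNWIND $rows AS row "
        "MERGE (a:Article {url: row.url}) "
        "SET a.title = row.title "
        "MERGE (s:Source {name: row.source}) "
        "MERGE (a)-[:PUBLISHED_BY]->(s)",
    ]

# With the official neo4j Python driver, each statement would be run as
#   driver.execute_query(stmt, rows=batch)
# ...and that is before entity extraction, error handling, or embeddings.
```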


✨ Our Solution

# Auto Graph approach: 3 lines

from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",           # OpenAI or Anthropic
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True              # Optional
)

builder.load_data("data.json").build_graph()

Result: Working knowledge graph in minutes, zero graph expertise needed.

What Happens Automatically

1. 🤖 LLM analyzes your data structure
   → Designs optimal graph schema
   → Suggests node types & relationships

2. 🧠 LLM extracts entities from text
   → Organizations, people, locations, etc.
   → Creates connections automatically

3. ⚡ Framework optimizes database
   → Creates constraints for uniqueness
   → Adds indexes for performance
   → Batch processing for efficiency

4. 📊 Optional: Generates embeddings
   → Vector representations of text
   → Enables semantic search
   → Hybrid graph + vector queries
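The four steps above can be sketched as a tiny pipeline. The LLM calls are stubbed out here, and all function names are ours for illustration, not the library's actual internals:

```python
# Minimal sketch of the four automatic steps, with the LLM calls
# replaced by stubs. Function names are illustrative only.

def analyze_schema(sample_records):
    # Step 1 (stubbed): an LLM would design this from the sample.
    return {"node_types": ["Article", "Source"],
            "relationships": [("Article", "PUBLISHED_BY", "Source")]}

def extract_entities(text):
    # Step 2 (stubbed): an LLM would return real entities here.
    return {"organizations": [], "people": []}

def build_pipeline(records):
    schema = analyze_schema(records[:5])                # Step 1: schema design
    entities = [extract_entities(r.get("content", ""))  # Step 2: extraction
                for r in records]
    # Step 3 would write nodes/constraints to Neo4j;
    # Step 4 would optionally embed the text fields.
    return {"schema": schema, "entities": entities}
```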

🌟 Core Capabilities

1. Zero Configuration

  • No manual schema design
  • No Cypher knowledge required
  • Works out of the box

2. LLM-Powered Intelligence

  • Automatic schema detection
  • Smart entity extraction
  • Relationship inference

3. Multi-Format Support

  • JSON files
  • CSV spreadsheets
  • Excel workbooks
  • Pandas DataFrames

4. Optional Semantic Search

  • Toggle embeddings on/off
  • Automatic vector indexing
  • Hybrid graph + semantic queries

5. Production Ready

  • Auto-optimization
  • Batch processing
  • Error handling
  • Performance monitoring

๐Ÿ—๏ธ How It Works

┌──────────────────────────────────────────────────────┐
│              YOUR DATA (JSON/CSV/Excel)              │
└───────────────────────────┬──────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────┐
│                  AUTO GRAPH BUILDER                  │
│                                                      │
│  Step 1: Analyze Data                                │
│  ├─ LLM examines structure                           │
│  └─ Designs graph schema                             │
│                                                      │
│  Step 2: Extract Entities                            │
│  ├─ LLM reads text fields                            │
│  └─ Identifies entities & relationships              │
│                                                      │
│  Step 3: Build Graph                                 │
│  ├─ Create nodes & relationships                     │
│  ├─ Add constraints & indexes                        │
│  └─ Optimize for queries                             │
│                                                      │
│  Step 4: Optional Embeddings                         │
│  ├─ Generate vectors                                 │
│  └─ Create vector index                              │
└───────────────────────────┬──────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────┐
│            NEO4J KNOWLEDGE GRAPH (Ready!)            │
└──────────────────────────────────────────────────────┘

Key Components

  1. Data Loaders - Understand your data format
  2. Schema Analyzer - LLM designs graph structure
  3. Entity Extractor - LLM finds entities in text
  4. Graph Manager - Builds optimized Neo4j database
  5. Embedding Generator - Optional semantic search
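One plausible way to picture these five components is as Python protocols; this is our mental model of the architecture, and the real package may structure its internals differently:

```python
# Hypothetical interfaces for the five components listed above.
# These are a sketch of the architecture, not the package's API.
from typing import Protocol

class DataLoader(Protocol):
    def load(self, path: str) -> list[dict]: ...

class SchemaAnalyzer(Protocol):
    def analyze(self, sample: list[dict]) -> dict: ...

class EntityExtractor(Protocol):
    def extract(self, text: str) -> dict[str, list[str]]: ...

class GraphManager(Protocol):
    def write(self, schema: dict, records: list[dict]) -> None: ...

class EmbeddingGenerator(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...
```

Splitting the pipeline along these seams is what lets embeddings stay optional: the `GraphManager` works the same whether or not an `EmbeddingGenerator` is plugged in.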

🚀 Quick Start

Basic Usage

from autograph import GraphBuilder

# Initialize
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password"
)

# Build graph automatically
builder.load_data("data.json").build_graph()

# Search
results = builder.search("AI regulation")

With Semantic Search

builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True  # Enable vector search
)

builder.load_data("data.json").build_graph()

# Semantic search automatically enabled
results = builder.search("government AI policy")

Different Data Formats

# JSON
builder.load_data("news.json")

# CSV
builder.load_data("products.csv")

# Excel
builder.load_data("customers.xlsx")

# All trigger automatic processing
builder.build_graph()
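A minimal sketch of how extension-based dispatch can normalize these formats into one record shape. It uses only the standard library; the package's actual loader (and its Excel support) may work differently, and `load_records` is our name, not the package API:

```python
# Sketch: dispatch on file extension, always return a list of dicts.
# Standard library only; Excel is omitted (it needs openpyxl).
import csv
import json
import tempfile
from pathlib import Path

def load_records(path):
    """Return a list of dicts regardless of the source format."""
    p = Path(path)
    if p.suffix == ".json":
        data = json.loads(p.read_text())
        return data if isinstance(data, list) else [data]
    if p.suffix == ".csv":
        with p.open(newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported format: {p.suffix}")

# Both formats come back as uniform lists of dicts:
tmp = Path(tempfile.mkdtemp())
(tmp / "news.json").write_text('[{"title": "AI regulation"}]')
(tmp / "products.csv").write_text("sku,name\n42,Widget\n")
json_rows = load_records(tmp / "news.json")
csv_rows = load_records(tmp / "products.csv")
```

Note that CSV values arrive as strings; downstream type inference would be the schema analyzer's job.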

🎨 Example Use Cases

News & Media

Transform RSS feeds and articles into searchable knowledge graphs with automatic entity extraction and topic clustering.

E-Commerce

Build product catalogs with automatic category hierarchies, brand relationships, and customer review connections.

Research & Academia

Create citation networks from paper databases with author affiliations and topic relationships.

Social Networks

Map user interactions, followers, and content sharing patterns automatically from platform exports.

Business Intelligence

Convert CRM data into relationship graphs showing customer journeys, sales patterns, and market segments.


โš™๏ธ Configuration

Minimal Setup

builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password"
)

Common Options

builder = GraphBuilder(
    # Required
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    
    # Optional
    use_embeddings=True,        # Enable semantic search
    llm_provider="openai",      # or "anthropic"
    verbose=True                # Show progress
)

🤖 How the LLM Powers the Framework

Automatic Schema Detection

The framework sends a sample of your data to the LLM:

Input: Sample of your data

[
  {
    "title": "White House considers AI regulation",
    "source": "NYT",
    "content": "The White House is discussing..."
  }
]

LLM Analyzes and Returns:

{
  "node_types": [
    {"label": "Article", "id_field": "url"},
    {"label": "Source", "id_field": "name"},
    {"label": "Entity", "id_field": "name"}
  ],
  "relationships": [
    {"type": "PUBLISHED_BY", "from": "Article", "to": "Source"},
    {"type": "MENTIONS", "from": "Article", "to": "Entity"}
  ]
}

The framework then creates the graph automatically from this schema.
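A sketch of how such a schema response can be turned into uniqueness constraints, using standard Neo4j 5 Cypher syntax; the framework's exact statements and constraint names may differ:

```python
# Sketch: translate the LLM's schema JSON into CREATE CONSTRAINT
# statements (one uniqueness constraint per node type's id_field).

def constraint_statements(schema):
    stmts = []
    for node in schema["node_types"]:
        label, key = node["label"], node["id_field"]
        stmts.append(
            f"CREATE CONSTRAINT {label.lower()}_{key} IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"
        )
    return stmts

schema = {"node_types": [{"label": "Article", "id_field": "url"},
                         {"label": "Source", "id_field": "name"}]}
# constraint_statements(schema)[0]
# → 'CREATE CONSTRAINT article_url IF NOT EXISTS FOR (n:Article) REQUIRE n.url IS UNIQUE'
```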


Automatic Entity Extraction

For each record, the LLM extracts entities:

Input Text:

"White House considers vetting AI models. 
The Biden administration is exploring..."

LLM Extracts:

{
  "organizations": ["White House", "Biden administration"],
  "technologies": ["AI models"],
  "people": []
}

The framework creates nodes and relationships for these entities.
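A sketch of how the extracted entities might become parameterized MERGE statements, mirroring the MENTIONS relationship from the schema example above; the helper name and the example URL are ours:

```python
# Sketch: one parameterized MERGE per extracted entity, linking the
# article to it via MENTIONS (as in the schema example above).

def mention_statements(article_url, entities):
    """Yield (cypher, params) pairs, one per extracted entity."""
    cypher = (
        "MATCH (a:Article {url: $url}) "
        "MERGE (e:Entity {name: $name}) SET e.kind = $kind "
        "MERGE (a)-[:MENTIONS]->(e)"
    )
    for kind, names in entities.items():
        for name in names:
            yield cypher, {"url": article_url, "name": name, "kind": kind}

ents = {"organizations": ["White House", "Biden administration"],
        "technologies": ["AI models"], "people": []}
pairs = list(mention_statements("https://example.com/a1", ents))
# Three statements: one per extracted entity.
```

Using MERGE rather than CREATE is what deduplicates entities, so "White House" mentioned in ten articles becomes one node with ten relationships.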


⚡ Performance & Costs

Processing Time Estimates

Dataset Size     Processing Time   With Embeddings
100 records      ~2 minutes        ~5 minutes
1,000 records    ~15 minutes       ~40 minutes
10,000 records   ~2 hours          ~6 hours

Estimates assume GPT-4 and standard embedding models.

Cost Estimates (OpenAI)

For 1,000 records:

  • Schema analysis: ~$0.01
  • Entity extraction: ~$2.00
  • Embeddings (optional): ~$0.02
  • Total: ~$2.03
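The arithmetic behind this total can be reproduced with per-record unit costs backed out of the figures above; these rates are assumptions for illustration, not published pricing:

```python
# Back-of-envelope reproduction of the ~$2.03 estimate for 1,000
# records. Unit costs are assumptions consistent with the figures
# above, not official pricing.

def estimate_cost(n_records,
                  schema_cost=0.01,             # one-time schema analysis
                  extraction_per_record=0.002,  # LLM entity extraction
                  embedding_per_record=0.00002, # optional embeddings
                  use_embeddings=True):
    total = schema_cost + n_records * extraction_per_record
    if use_embeddings:
        total += n_records * embedding_per_record
    return round(total, 2)

estimate_cost(1000)  # → 2.03
```

Note that entity extraction dominates: it scales linearly with record count, while schema analysis is a fixed one-time cost.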

Neo4j Requirements

  • Free tier: 200MB (suitable for ~10-50k articles)
  • Production: Aura Pro $65/month (8GB+)

🎯 Target Users

Primary:

  • Data Scientists who know Python but not Neo4j
  • Backend Developers building features quickly
  • ML Engineers needing knowledge graphs for RAG

Secondary:

  • Researchers creating quick prototypes
  • Startups building MVPs fast
  • Students learning about graphs

💎 Value Proposition

For Developers

  • ✅ 90% less code required
  • ✅ Zero graph expertise needed
  • ✅ Production-ready in minutes

For Businesses

  • ✅ Faster time to market
  • ✅ Lower development costs
  • ✅ Easier maintenance

For Projects

  • ✅ Rapid prototyping
  • ✅ Easy experimentation
  • ✅ Scalable foundation

🚀 Getting Started

# Install (when available)
pip install auto-graph

# Use
from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password"
)

builder.load_data("your_data.json").build_graph()

๐Ÿ“ Summary

Auto Graph Builder transforms any dataset into a knowledge graph with minimal code.

  • Zero configuration - LLM handles schema design
  • Smart extraction - Entities detected automatically
  • Production ready - Optimized and scalable
  • Developer friendly - Works like familiar Python libraries

Built for developers who want graphs, not graph expertise.
