Dataset-agnostic knowledge graph builder with optional LLM and Neo4j integrations.
# Auto Graph Builder Framework

*Transform any dataset into a knowledge graph with just a few lines of code.*
## Vision

A framework that automatically converts structured data (JSON, CSV, Excel) into optimized Neo4j graph databases using LLM-powered entity extraction and optional embeddings.

**Goal:** enable developers to build knowledge graphs without graph expertise.
## The Problem

Building a graph database today takes the current approach: 100+ lines of code plus expertise.

1. **Manually design the graph schema**
   - Which nodes? Which relationships?
   - What properties to store?
2. **Write entity extraction logic**
   - Parse text fields
   - Extract organizations, people, and locations
   - Handle edge cases
3. **Create the database schema**
   - Write Cypher for constraints
   - Create indexes for performance
   - Test and debug
4. **Insert the data**
   - Write complex Cypher queries
   - Handle batching
   - Handle errors
5. **Add embeddings (optional)**
   - Generate vectors
   - Create vector indexes
   - Keep them in sync with the graph
6. **Optimize and maintain**
   - Monitor performance
   - Add new indexes
   - Update the schema

**Result:** days to weeks of work, Neo4j expertise required, error-prone.
## Our Solution

The Auto Graph approach takes three lines:

```python
from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",              # OpenAI or Anthropic
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True,               # optional
)
builder.load_data("data.json").build_graph()
```

**Result:** a working knowledge graph in minutes, with zero graph expertise needed.
### What Happens Automatically

1. **LLM analyzes your data structure**
   - Designs an optimal graph schema
   - Suggests node types and relationships
2. **LLM extracts entities from text**
   - Organizations, people, locations, etc.
   - Creates connections automatically
3. **Framework optimizes the database**
   - Creates uniqueness constraints
   - Adds indexes for performance
   - Batches writes for efficiency
4. **Optional: generates embeddings**
   - Vector representations of text
   - Enables semantic search
   - Hybrid graph + vector queries
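As a rough illustration of step 3, the uniqueness constraints the framework creates might look like the sketch below. This is a hypothetical reconstruction, not the package's actual internals; the `constraint_statements` helper and its naming scheme are assumptions, though the Cypher DDL shape is standard Neo4j syntax (a uniqueness constraint also creates a backing index).

```python
# Hypothetical sketch: given the node types the LLM proposed, emit the
# Cypher DDL that enforces uniqueness on each node type's id field.
def constraint_statements(node_types):
    """node_types: list of {"label": ..., "id_field": ...} dicts."""
    statements = []
    for nt in node_types:
        label, key = nt["label"], nt["id_field"]
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{key}_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"
        )
    return statements

stmts = constraint_statements([
    {"label": "Article", "id_field": "url"},
    {"label": "Source", "id_field": "name"},
])
```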
## Core Capabilities

1. **Zero Configuration**
   - No manual schema design
   - No Cypher knowledge required
   - Works out of the box
2. **LLM-Powered Intelligence**
   - Automatic schema detection
   - Smart entity extraction
   - Relationship inference
3. **Multi-Format Support**
   - JSON files
   - CSV spreadsheets
   - Excel workbooks
   - Pandas DataFrames
4. **Optional Semantic Search**
   - Toggle embeddings on or off
   - Automatic vector indexing
   - Hybrid graph + semantic queries
5. **Production Ready**
   - Auto-optimization
   - Batch processing
   - Error handling
   - Performance monitoring
## How It Works

```text
┌────────────────────────────────────────────┐
│         YOUR DATA (JSON/CSV/Excel)         │
└─────────────────────┬──────────────────────┘
                      ▼
┌────────────────────────────────────────────┐
│             AUTO GRAPH BUILDER             │
│                                            │
│  Step 1: Analyze data                      │
│    ├─ LLM examines structure               │
│    └─ Designs graph schema                 │
│                                            │
│  Step 2: Extract entities                  │
│    ├─ LLM reads text fields                │
│    └─ Identifies entities & relationships  │
│                                            │
│  Step 3: Build graph                       │
│    ├─ Create nodes & relationships         │
│    ├─ Add constraints & indexes            │
│    └─ Optimize for queries                 │
│                                            │
│  Step 4: Optional embeddings               │
│    ├─ Generate vectors                     │
│    └─ Create vector index                  │
└─────────────────────┬──────────────────────┘
                      ▼
┌────────────────────────────────────────────┐
│       NEO4J KNOWLEDGE GRAPH (Ready!)       │
└────────────────────────────────────────────┘
```
### Key Components

- **Data Loaders** - understand your data format
- **Schema Analyzer** - LLM designs the graph structure
- **Entity Extractor** - LLM finds entities in text
- **Graph Manager** - builds the optimized Neo4j database
- **Embedding Generator** - optional semantic search
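The Graph Manager's batch processing can be pictured as a simple chunking step: rather than one write per record, records are grouped so each Neo4j query is a single `UNWIND` over a batch. The helper below is an illustrative sketch under that assumption (`batched`, `MERGE_QUERY`, and the batch size of 500 are hypothetical, not the package's real internals).

```python
# Hypothetical sketch of the batching step: split records into
# fixed-size chunks so each Neo4j write handles many rows at once.
def batched(records, batch_size=500):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# One parameterized query per batch, instead of one query per record.
MERGE_QUERY = """
UNWIND $rows AS row
MERGE (a:Article {url: row.url})
SET a.title = row.title
"""

rows = [{"url": f"https://example.com/{i}", "title": f"Article {i}"}
        for i in range(1200)]
batches = list(batched(rows, batch_size=500))
# 1200 rows at 500 per batch -> batches of 500, 500, and 200
```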
## Quick Start

### Basic Usage

```python
from autograph import GraphBuilder

# Initialize
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)

# Build the graph automatically
builder.load_data("data.json").build_graph()

# Search
results = builder.search("AI regulation")
```
### With Semantic Search

```python
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True,  # enable vector search
)
builder.load_data("data.json").build_graph()

# Semantic search is automatically enabled
results = builder.search("government AI policy")
```
Different Data Formats
# JSON
builder.load_data("news.json")
# CSV
builder.load_data("products.csv")
# Excel
builder.load_data("customers.xlsx")
# All trigger automatic processing
builder.build_graph()
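One plausible way `load_data` supports all of these formats is dispatch on file extension. The sketch below illustrates that idea only; the `LOADERS` registry and `pick_loader` function are hypothetical names, not the package's documented API.

```python
# Hypothetical sketch of extension-based loader dispatch.
from pathlib import Path

LOADERS = {
    ".json": "JSONLoader",
    ".csv": "CSVLoader",
    ".xlsx": "ExcelLoader",
}

def pick_loader(path):
    """Return the loader name registered for the file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported format: {suffix}")
    return LOADERS[suffix]

loader = pick_loader("news.json")  # selects the JSON loader
```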
## Example Use Cases

### News & Media

Transform RSS feeds and articles into searchable knowledge graphs with automatic entity extraction and topic clustering.

### E-Commerce

Build product catalogs with automatic category hierarchies, brand relationships, and customer review connections.

### Research & Academia

Create citation networks from paper databases with author affiliations and topic relationships.

### Social Networks

Map user interactions, followers, and content-sharing patterns automatically from platform exports.

### Business Intelligence

Convert CRM data into relationship graphs showing customer journeys, sales patterns, and market segments.
## Configuration

### Minimal Setup

```python
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)
```

### Common Options

```python
builder = GraphBuilder(
    # Required
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    # Optional
    use_embeddings=True,    # enable semantic search
    llm_provider="openai",  # or "anthropic"
    verbose=True,           # show progress
)
```
## How the LLM Powers the Framework

### Automatic Schema Detection

The framework sends a sample of your data to the LLM:

```json
[
  {
    "title": "White House considers AI regulation",
    "source": "NYT",
    "content": "The White House is discussing..."
  }
]
```

The LLM analyzes it and returns a proposed schema:

```json
{
  "node_types": [
    {"label": "Article", "id_field": "url"},
    {"label": "Source", "id_field": "name"},
    {"label": "Entity", "id_field": "name"}
  ],
  "relationships": [
    {"type": "PUBLISHED_BY", "from": "Article", "to": "Source"},
    {"type": "MENTIONS", "from": "Article", "to": "Entity"}
  ]
}
```

The framework then creates the graph automatically from this schema.
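To make the schema-to-graph step concrete, here is a minimal sketch of how such a schema could be turned into parameterized Cypher `MERGE` patterns. The `relationship_cypher` helper and the `$from_id`/`$to_id` parameter names are illustrative assumptions, not the package's actual code.

```python
# Hypothetical sketch: build one MERGE pattern per relationship in the
# LLM-proposed schema, matching endpoints by each node type's id field.
def relationship_cypher(schema):
    id_fields = {nt["label"]: nt["id_field"] for nt in schema["node_types"]}
    queries = []
    for rel in schema["relationships"]:
        src, dst = rel["from"], rel["to"]
        queries.append(
            f"MATCH (a:{src} {{{id_fields[src]}: $from_id}}), "
            f"(b:{dst} {{{id_fields[dst]}: $to_id}}) "
            f"MERGE (a)-[:{rel['type']}]->(b)"
        )
    return queries

schema = {
    "node_types": [
        {"label": "Article", "id_field": "url"},
        {"label": "Source", "id_field": "name"},
    ],
    "relationships": [
        {"type": "PUBLISHED_BY", "from": "Article", "to": "Source"},
    ],
}
queries = relationship_cypher(schema)
```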
### Automatic Entity Extraction

For each record, the LLM extracts entities.

Input text:

```text
"White House considers vetting AI models.
The Biden administration is exploring..."
```

The LLM extracts:

```json
{
  "organizations": ["White House", "Biden administration"],
  "technologies": ["AI models"],
  "people": []
}
```

The framework creates nodes and relationships for these entities.
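A sketch of that last step, under assumed internals: one `Entity` node per extracted name plus a `MENTIONS` edge back to the article. The `entities_to_graph` helper and its tuple shapes are hypothetical, purely to show how the extraction dict maps onto graph elements.

```python
# Hypothetical sketch: flatten the LLM's extraction dict into node and
# edge tuples that a graph writer could then MERGE into Neo4j.
def entities_to_graph(article_url, extracted):
    nodes, edges = [], []
    for category, names in extracted.items():
        for name in names:
            nodes.append(("Entity", {"name": name, "category": category}))
            edges.append((article_url, "MENTIONS", name))
    return nodes, edges

extracted = {
    "organizations": ["White House", "Biden administration"],
    "technologies": ["AI models"],
    "people": [],
}
nodes, edges = entities_to_graph("https://example.com/a1", extracted)
# 3 entity nodes and 3 MENTIONS edges
```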
## Performance & Costs

### Processing Time Estimates

| Dataset Size | Processing Time | With Embeddings |
|---|---|---|
| 100 records | ~2 minutes | ~5 minutes |
| 1,000 records | ~15 minutes | ~40 minutes |
| 10,000 records | ~2 hours | ~6 hours |

*Estimates assume GPT-4 and standard embedding models.*
### Cost Estimates (OpenAI)

For 1,000 records:

- Schema analysis: ~$0.01
- Entity extraction: ~$2.00
- Embeddings (optional): ~$0.02
- **Total: ~$2.03**
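A quick sanity check on these figures: entity extraction dominates at roughly $0.002 per record, and assuming token usage is uniform, the per-record cost extrapolates linearly to larger datasets. The arithmetic below is only that assumption worked through, not a published price.

```python
# Sanity-check the cost estimates above and extrapolate linearly.
schema_analysis = 0.01    # one-time, per dataset
entity_extraction = 2.00  # for 1,000 records
embeddings = 0.02         # optional, for 1,000 records

total_1k = schema_analysis + entity_extraction + embeddings
per_record = entity_extraction / 1_000
# Linear extrapolation (assumes uniform token usage per record):
total_10k = schema_analysis + per_record * 10_000 + embeddings * 10
```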
### Neo4j Requirements

- Free tier: 200 MB (suitable for roughly 10-50k articles)
- Production: Aura Pro, $65/month (8 GB+)
## Target Users

**Primary:**

- Data scientists who know Python but not Neo4j
- Backend developers building features quickly
- ML engineers needing knowledge graphs for RAG

**Secondary:**

- Researchers creating quick prototypes
- Startups building MVPs fast
- Students learning about graphs
## Value Proposition

**For developers:**

- 90% less code required
- Zero graph expertise needed
- Production-ready in minutes

**For businesses:**

- Faster time to market
- Lower development costs
- Easier maintenance

**For projects:**

- Rapid prototyping
- Easy experimentation
- A scalable foundation
## Getting Started

```bash
# Install (when available)
pip install auto-graph
```

```python
from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)
builder.load_data("your_data.json").build_graph()
```
## Summary

Auto Graph Builder transforms any dataset into a knowledge graph with minimal code.

- **Zero configuration** - the LLM handles schema design
- **Smart extraction** - entities are detected automatically
- **Production ready** - optimized and scalable
- **Developer friendly** - works like familiar Python libraries

Built for developers who want graphs, not graph expertise.