Dataset-agnostic knowledge graph builder with optional LLM and Neo4j integrations.
# Auto Graph Builder Framework

*Transform any dataset into a knowledge graph with just a few lines of code.*
## Vision

A framework that automatically converts structured data (JSON, CSV, Excel) into optimized Neo4j graph databases using LLM-powered entity extraction and optional embeddings.

**Goal:** enable developers to build knowledge graphs without graph expertise.
## The Problem

Building a graph database today takes the current approach: 100+ lines of code plus expertise.

1. **Manually design the graph schema**
   - Which nodes? Which relationships?
   - What properties to store?
2. **Write entity extraction logic**
   - Parse text fields
   - Extract organizations, people, and locations
   - Handle edge cases
3. **Create the database schema**
   - Write Cypher for constraints
   - Create indexes for performance
   - Test and debug
4. **Insert the data**
   - Write complex Cypher queries
   - Handle batching
   - Handle errors
5. **Add embeddings (optional)**
   - Generate vectors
   - Create vector indexes
   - Keep them in sync with the graph
6. **Optimize and maintain**
   - Monitor performance
   - Add new indexes
   - Update the schema

**Result:** days to weeks of work, Neo4j expertise required, error-prone.
## Our Solution

The Auto Graph approach takes three lines:

```python
from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",              # OpenAI or Anthropic
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True,               # optional
)
builder.load_data("data.json").build_graph()
```

**Result:** a working knowledge graph in minutes, with zero graph expertise needed.
### What Happens Automatically

1. **LLM analyzes your data structure**
   - Designs an optimal graph schema
   - Suggests node types and relationships
2. **LLM extracts entities from text**
   - Organizations, people, locations, etc.
   - Creates connections automatically
3. **Framework optimizes the database**
   - Creates uniqueness constraints
   - Adds indexes for performance
   - Batches writes for efficiency
4. **Optional: generates embeddings**
   - Vector representations of text
   - Enables semantic search
   - Hybrid graph + vector queries
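As a rough illustration of step 3, the uniqueness constraints the framework creates might look like the sketch below. This is a hypothetical reconstruction, not the package's actual internals; the `constraint_statements` helper and its naming scheme are assumptions, though the Cypher DDL shape is standard Neo4j syntax (a uniqueness constraint also creates a backing index).

```python
# Hypothetical sketch: given the node types the LLM proposed, emit the
# Cypher DDL that enforces uniqueness on each node type's id field.
def constraint_statements(node_types):
    """node_types: list of {"label": ..., "id_field": ...} dicts."""
    statements = []
    for nt in node_types:
        label, key = nt["label"], nt["id_field"]
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{key}_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"
        )
    return statements

stmts = constraint_statements([
    {"label": "Article", "id_field": "url"},
    {"label": "Source", "id_field": "name"},
])
```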
## Core Capabilities

1. **Zero Configuration**
   - No manual schema design
   - No Cypher knowledge required
   - Works out of the box
2. **LLM-Powered Intelligence**
   - Automatic schema detection
   - Smart entity extraction
   - Relationship inference
3. **Multi-Format Support**
   - JSON files
   - CSV spreadsheets
   - Excel workbooks
   - Pandas DataFrames
4. **Optional Semantic Search**
   - Toggle embeddings on or off
   - Automatic vector indexing
   - Hybrid graph + semantic queries
5. **Production Ready**
   - Auto-optimization
   - Batch processing
   - Error handling
   - Performance monitoring
## How It Works

```text
┌────────────────────────────────────────────┐
│         YOUR DATA (JSON/CSV/Excel)         │
└─────────────────────┬──────────────────────┘
                      ▼
┌────────────────────────────────────────────┐
│             AUTO GRAPH BUILDER             │
│                                            │
│  Step 1: Analyze data                      │
│    ├─ LLM examines structure               │
│    └─ Designs graph schema                 │
│                                            │
│  Step 2: Extract entities                  │
│    ├─ LLM reads text fields                │
│    └─ Identifies entities & relationships  │
│                                            │
│  Step 3: Build graph                       │
│    ├─ Create nodes & relationships         │
│    ├─ Add constraints & indexes            │
│    └─ Optimize for queries                 │
│                                            │
│  Step 4: Optional embeddings               │
│    ├─ Generate vectors                     │
│    └─ Create vector index                  │
└─────────────────────┬──────────────────────┘
                      ▼
┌────────────────────────────────────────────┐
│       NEO4J KNOWLEDGE GRAPH (Ready!)       │
└────────────────────────────────────────────┘
```
### Key Components

- **Data Loaders** - understand your data format
- **Schema Analyzer** - LLM designs the graph structure
- **Entity Extractor** - LLM finds entities in text
- **Graph Manager** - builds the optimized Neo4j database
- **Embedding Generator** - optional semantic search
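The Graph Manager's batch processing can be pictured as a simple chunking step: rather than one write per record, records are grouped so each Neo4j query is a single `UNWIND` over a batch. The helper below is an illustrative sketch under that assumption (`batched`, `MERGE_QUERY`, and the batch size of 500 are hypothetical, not the package's real internals).

```python
# Hypothetical sketch of the batching step: split records into
# fixed-size chunks so each Neo4j write handles many rows at once.
def batched(records, batch_size=500):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# One parameterized query per batch, instead of one query per record.
MERGE_QUERY = """
UNWIND $rows AS row
MERGE (a:Article {url: row.url})
SET a.title = row.title
"""

rows = [{"url": f"https://example.com/{i}", "title": f"Article {i}"}
        for i in range(1200)]
batches = list(batched(rows, batch_size=500))
# 1200 rows at 500 per batch -> batches of 500, 500, and 200
```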
## Quick Start

### Basic Usage

```python
from autograph import GraphBuilder

# Initialize
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)

# Build the graph automatically
builder.load_data("data.json").build_graph()

# Search
results = builder.search("AI regulation")
```
### With Semantic Search

```python
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    use_embeddings=True,  # enable vector search
)
builder.load_data("data.json").build_graph()

# Semantic search is automatically enabled
results = builder.search("government AI policy")
```
Different Data Formats
# JSON
builder.load_data("news.json")
# CSV
builder.load_data("products.csv")
# Excel
builder.load_data("customers.xlsx")
# All trigger automatic processing
builder.build_graph()
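One plausible way `load_data` supports all of these formats is dispatch on file extension. The sketch below illustrates that idea only; the `LOADERS` registry and `pick_loader` function are hypothetical names, not the package's documented API.

```python
# Hypothetical sketch of extension-based loader dispatch.
from pathlib import Path

LOADERS = {
    ".json": "JSONLoader",
    ".csv": "CSVLoader",
    ".xlsx": "ExcelLoader",
}

def pick_loader(path):
    """Return the loader name registered for the file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported format: {suffix}")
    return LOADERS[suffix]

loader = pick_loader("news.json")  # selects the JSON loader
```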
## Example Use Cases

### News & Media

Transform RSS feeds and articles into searchable knowledge graphs with automatic entity extraction and topic clustering.

### E-Commerce

Build product catalogs with automatic category hierarchies, brand relationships, and customer review connections.

### Research & Academia

Create citation networks from paper databases with author affiliations and topic relationships.

### Social Networks

Map user interactions, followers, and content-sharing patterns automatically from platform exports.

### Business Intelligence

Convert CRM data into relationship graphs showing customer journeys, sales patterns, and market segments.
## Configuration

### Minimal Setup

```python
builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)
```

### Common Options

```python
builder = GraphBuilder(
    # Required
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
    # Optional
    use_embeddings=True,    # enable semantic search
    llm_provider="openai",  # or "anthropic"
    verbose=True,           # show progress
)
```
## How the LLM Powers the Framework

### Automatic Schema Detection

The framework sends a sample of your data to the LLM:

```json
[
  {
    "title": "White House considers AI regulation",
    "source": "NYT",
    "content": "The White House is discussing..."
  }
]
```

The LLM analyzes it and returns a proposed schema:

```json
{
  "node_types": [
    {"label": "Article", "id_field": "url"},
    {"label": "Source", "id_field": "name"},
    {"label": "Entity", "id_field": "name"}
  ],
  "relationships": [
    {"type": "PUBLISHED_BY", "from": "Article", "to": "Source"},
    {"type": "MENTIONS", "from": "Article", "to": "Entity"}
  ]
}
```

The framework then creates the graph automatically from this schema.
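To make the schema-to-graph step concrete, here is a minimal sketch of how such a schema could be turned into parameterized Cypher `MERGE` patterns. The `relationship_cypher` helper and the `$from_id`/`$to_id` parameter names are illustrative assumptions, not the package's actual code.

```python
# Hypothetical sketch: build one MERGE pattern per relationship in the
# LLM-proposed schema, matching endpoints by each node type's id field.
def relationship_cypher(schema):
    id_fields = {nt["label"]: nt["id_field"] for nt in schema["node_types"]}
    queries = []
    for rel in schema["relationships"]:
        src, dst = rel["from"], rel["to"]
        queries.append(
            f"MATCH (a:{src} {{{id_fields[src]}: $from_id}}), "
            f"(b:{dst} {{{id_fields[dst]}: $to_id}}) "
            f"MERGE (a)-[:{rel['type']}]->(b)"
        )
    return queries

schema = {
    "node_types": [
        {"label": "Article", "id_field": "url"},
        {"label": "Source", "id_field": "name"},
    ],
    "relationships": [
        {"type": "PUBLISHED_BY", "from": "Article", "to": "Source"},
    ],
}
queries = relationship_cypher(schema)
```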
### Automatic Entity Extraction

For each record, the LLM extracts entities.

Input text:

```text
"White House considers vetting AI models.
The Biden administration is exploring..."
```

The LLM extracts:

```json
{
  "organizations": ["White House", "Biden administration"],
  "technologies": ["AI models"],
  "people": []
}
```

The framework creates nodes and relationships for these entities.
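A sketch of that last step, under assumed internals: one `Entity` node per extracted name plus a `MENTIONS` edge back to the article. The `entities_to_graph` helper and its tuple shapes are hypothetical, purely to show how the extraction dict maps onto graph elements.

```python
# Hypothetical sketch: flatten the LLM's extraction dict into node and
# edge tuples that a graph writer could then MERGE into Neo4j.
def entities_to_graph(article_url, extracted):
    nodes, edges = [], []
    for category, names in extracted.items():
        for name in names:
            nodes.append(("Entity", {"name": name, "category": category}))
            edges.append((article_url, "MENTIONS", name))
    return nodes, edges

extracted = {
    "organizations": ["White House", "Biden administration"],
    "technologies": ["AI models"],
    "people": [],
}
nodes, edges = entities_to_graph("https://example.com/a1", extracted)
# 3 entity nodes and 3 MENTIONS edges
```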
## Performance & Costs

### Processing Time Estimates

| Dataset Size | Processing Time | With Embeddings |
|---|---|---|
| 100 records | ~2 minutes | ~5 minutes |
| 1,000 records | ~15 minutes | ~40 minutes |
| 10,000 records | ~2 hours | ~6 hours |

*Estimates assume GPT-4 and standard embedding models.*
### Cost Estimates (OpenAI)

For 1,000 records:

- Schema analysis: ~$0.01
- Entity extraction: ~$2.00
- Embeddings (optional): ~$0.02
- **Total: ~$2.03**
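A quick sanity check on these figures: entity extraction dominates at roughly $0.002 per record, and assuming token usage is uniform, the per-record cost extrapolates linearly to larger datasets. The arithmetic below is only that assumption worked through, not a published price.

```python
# Sanity-check the cost estimates above and extrapolate linearly.
schema_analysis = 0.01    # one-time, per dataset
entity_extraction = 2.00  # for 1,000 records
embeddings = 0.02         # optional, for 1,000 records

total_1k = schema_analysis + entity_extraction + embeddings
per_record = entity_extraction / 1_000
# Linear extrapolation (assumes uniform token usage per record):
total_10k = schema_analysis + per_record * 10_000 + embeddings * 10
```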
### Neo4j Requirements

- Free tier: 200 MB (suitable for roughly 10-50k articles)
- Production: Aura Pro, $65/month (8 GB+)
## Target Users

**Primary:**

- Data scientists who know Python but not Neo4j
- Backend developers building features quickly
- ML engineers needing knowledge graphs for RAG

**Secondary:**

- Researchers creating quick prototypes
- Startups building MVPs fast
- Students learning about graphs
## Value Proposition

**For developers:**

- 90% less code required
- Zero graph expertise needed
- Production-ready in minutes

**For businesses:**

- Faster time to market
- Lower development costs
- Easier maintenance

**For projects:**

- Rapid prototyping
- Easy experimentation
- A scalable foundation
## Getting Started

```bash
# Install (when available)
pip install auto-graph
```

```python
from autograph import GraphBuilder

builder = GraphBuilder(
    llm_api_key="sk-...",
    neo4j_uri="bolt://localhost:7687",
    neo4j_password="password",
)
builder.load_data("your_data.json").build_graph()
```
## Summary

Auto Graph Builder transforms any dataset into a knowledge graph with minimal code.

- **Zero configuration** - the LLM handles schema design
- **Smart extraction** - entities are detected automatically
- **Production ready** - optimized and scalable
- **Developer friendly** - works like familiar Python libraries

Built for developers who want graphs, not graph expertise.