Convert unstructured documents into knowledge graphs
unstructured2graph
Convert unstructured documents into knowledge graphs within Memgraph.
Overview
unstructured2graph transforms unstructured data (PDFs, URLs, and other documents) into a knowledge graph stored in Memgraph, which can then power Graph Retrieval-Augmented Generation (GraphRAG) applications. It combines:
- Unstructured - Parse and chunk diverse document formats
- LightRAG - Extract entities and relationships using LLMs
- Memgraph - Store and query your knowledge graph
Installation
Install from source:
git clone https://github.com/memgraph/ai-toolkit.git
cd ai-toolkit/unstructured2graph
pip install -e .
For full document support (PDF, DOCX, etc.):
pip install -e ".[all-docs]"
Quick Start
import asyncio

from memgraph_toolbox.api.memgraph import Memgraph
from lightrag_memgraph import MemgraphLightRAGWrapper
from unstructured2graph import from_unstructured, create_index


async def main():
    memgraph = Memgraph(user_agent="unstructured2graph")
    create_index(memgraph, "Chunk", "hash")

    lightrag = MemgraphLightRAGWrapper()
    await lightrag.initialize(working_dir="./lightrag_storage")

    # Ingest documents from URLs or local files
    await from_unstructured(
        sources=["https://example.com/doc.pdf", "./local_file.md"],
        memgraph=memgraph,
        lightrag_wrapper=lightrag,
        link_chunks=True,  # Create NEXT relationships between chunks
    )

    await lightrag.afinalize()


asyncio.run(main())
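The Quick Start indexes `Chunk` nodes on a `hash` property and uses `link_chunks=True` to create NEXT relationships between chunks. As a rough illustration of that idea only, not the library's actual implementation, the sketch below hashes chunk text and pairs each chunk with the one that follows it. The helper names `chunk_hash` and `next_pairs`, and the choice of SHA-256, are assumptions made for this example.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Return a stable identifier for a chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def next_pairs(chunks: list[str]) -> list[tuple[str, str]]:
    """Pair each chunk hash with the hash of the chunk that follows it."""
    hashes = [chunk_hash(c) for c in chunks]
    return list(zip(hashes, hashes[1:]))


chunks = ["Intro paragraph.", "Method details.", "Conclusion."]
pairs = next_pairs(chunks)

# Print each pair in a Cypher-like notation, shortening hashes for readability.
for src, dst in pairs:
    print(f"(:Chunk {{hash: '{src[:8]}'}})-[:NEXT]->(:Chunk {{hash: '{dst[:8]}'}})")
```

Three chunks yield two NEXT pairs, and a single chunk or an empty list yields none, which is the behaviour one would expect from sequential linking.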
Key Features
| Feature | Description |
|---|---|
| Multi-format parsing | PDFs, URLs, HTML, Markdown, DOCX, and more via Unstructured |
| Automatic chunking | Smart document chunking with configurable options |
| Entity extraction | LLM-powered entity and relationship extraction via LightRAG |
| Vector search | Built-in support for embedding generation and vector indices |
| GraphRAG queries | Combine vector search with graph traversal for enhanced retrieval |
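To make the last row of the table concrete, here is a toy, in-memory sketch of the GraphRAG pattern: rank chunks by vector similarity, then follow graph links from the best chunks to the entities they mention. It does not use Memgraph or the library at all; the embeddings, the `mentions` mapping, and the `retrieve` helper are hypothetical stand-ins chosen for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: chunk embeddings and chunk -> entity links (all hypothetical).
embeddings = {
    "chunk-1": [1.0, 0.0],
    "chunk-2": [0.0, 1.0],
    "chunk-3": [0.7, 0.7],
}
mentions = {
    "chunk-1": ["Memgraph"],
    "chunk-2": ["LightRAG"],
    "chunk-3": ["Memgraph", "Unstructured"],
}


def retrieve(query_vec: list[float], top_k: int = 1) -> dict[str, list[str]]:
    """Vector search for the top_k chunks, then expand to linked entities."""
    ranked = sorted(
        embeddings,
        key=lambda chunk: cosine(query_vec, embeddings[chunk]),
        reverse=True,
    )
    return {chunk: mentions[chunk] for chunk in ranked[:top_k]}


result = retrieve([0.9, 0.1], top_k=2)
print(result)
```

In a real deployment the similarity step would presumably be served by a vector index and the expansion step by a graph traversal query; the split into two stages is the point of the sketch.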
API Reference
Document Processing
- parse_source(source, partition_kwargs) - Parse a single file or URL into chunks
- make_chunks(sources, partition_kwargs) - Process multiple sources into ChunkedDocument objects
- from_unstructured(sources, memgraph, lightrag_wrapper, ...) - Full ingestion pipeline
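The functions above accept either a URL or a local file as a source. How the library tells them apart is not documented here; the following is only a minimal sketch of one plausible approach using the standard library. The helpers `is_url` and `split_sources` are hypothetical and are not part of unstructured2graph.

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Treat http(s) sources as URLs; everything else as a local path."""
    return urlparse(source).scheme in ("http", "https")


def split_sources(sources: list[str]) -> tuple[list[str], list[str]]:
    """Separate a mixed list of sources into (urls, local_files)."""
    urls = [s for s in sources if is_url(s)]
    files = [s for s in sources if not is_url(s)]
    return urls, files


urls, files = split_sources(["https://example.com/doc.pdf", "./local_file.md"])
print("URLs:", urls)
print("Files:", files)
```

Checking the scheme explicitly, rather than only testing for a non-empty scheme, avoids misreading Windows drive letters such as `C:` as URL schemes.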
Graph Operations
- create_nodes_from_list(memgraph, nodes, label, batch_size) - Batch insert nodes
- connect_chunks_to_entities(memgraph, chunk_label, entity_label) - Link entities to source chunks
- link_nodes_in_order(memgraph, ...) - Create sequential relationships between chunks
- create_vector_search_index(memgraph, label, property) - Create vector index for similarity search
- compute_embeddings(memgraph, label) - Generate embeddings for nodes
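Batch insertion, as in create_nodes_from_list, usually means splitting the node list into fixed-size groups and sending each group as a parameter to a single Cypher statement. The sketch below shows that general pattern under stated assumptions: `batched` and `create_nodes_query` are hypothetical helpers, and the `UNWIND ... CREATE ... SET n += row` statement is a common Cypher idiom rather than the query the library is known to run.

```python
from typing import Any, Iterator


def batched(rows: list[dict[str, Any]], batch_size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def create_nodes_query(label: str) -> str:
    """Build a parameterised statement; labels cannot be query parameters,
    so the label is validated before being interpolated."""
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    return f"UNWIND $rows AS row CREATE (n:{label}) SET n += row"


nodes = [{"hash": f"h{i}", "text": f"chunk {i}"} for i in range(5)]
query = create_nodes_query("Chunk")

# Each batch would be sent as the $rows parameter of one query execution.
for batch in batched(nodes, batch_size=2):
    print(query, "| rows in batch:", len(batch))
```

Passing rows as a parameter keeps the query text constant across batches and avoids interpolating node data into the statement.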
Documentation
For detailed usage examples and getting started guides, check out the official documentation:
👉 unstructured2graph Documentation
Requirements
- Python 3.10+
- Memgraph database instance
LLM API Key
This library uses LightRAG for entity and relationship extraction, which requires an LLM API key. Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY="your-api-key"
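Because extraction depends on this variable, it can help to fail early with a clear message before starting a long ingestion run. The following is a minimal sketch of such a check; `require_api_key` is a hypothetical helper, not something the library provides.

```python
import os
from collections.abc import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the API key from env, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running ingestion.")
    return key


# Demonstrated with an explicit mapping so the example does not depend on
# the real environment; call require_api_key() to check os.environ instead.
key = require_api_key({"OPENAI_API_KEY": "your-api-key"})
print("API key found, length:", len(key))
```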