
Convert unstructured documents into knowledge graphs

Project description

unstructured2graph

Convert unstructured documents into knowledge graphs within Memgraph.

Overview

unstructured2graph transforms unstructured data from sources such as PDFs, web pages (URLs), and other documents into a knowledge graph stored in Memgraph, which can then power Graph Retrieval-Augmented Generation (GraphRAG) applications. It combines:

  • Unstructured - Parse and chunk diverse document formats
  • LightRAG - Extract entities and relationships using LLMs
  • Memgraph - Store and query your knowledge graph

Installation

Install from source:

git clone https://github.com/memgraph/ai-toolkit.git
cd ai-toolkit/unstructured2graph
pip install -e .

For full document support (PDF, DOCX, etc.):

pip install -e ".[all-docs]"

Quick Start

import asyncio
from memgraph_toolbox.api.memgraph import Memgraph
from lightrag_memgraph import MemgraphLightRAGWrapper
from unstructured2graph import from_unstructured, create_index

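async def main():
    # Create the Memgraph client and index Chunk nodes on their hash property
    memgraph = Memgraph(user_agent="unstructured2graph")
    create_index(memgraph, "Chunk", "hash")

    # Initialize LightRAG, which performs the LLM-based entity extraction
    lightrag = MemgraphLightRAGWrapper()
    await lightrag.initialize(working_dir="./lightrag_storage")

    try:
        # Ingest documents from URLs or local files
        await from_unstructured(
            sources=["https://example.com/doc.pdf", "./local_file.md"],
            memgraph=memgraph,
            lightrag_wrapper=lightrag,
            link_chunks=True,  # Create NEXT relationships between consecutive chunks
        )
    finally:
        # Finalize LightRAG even if ingestion raises an exception
        await lightrag.afinalize()


if __name__ == "__main__":
    asyncio.run(main())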
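The Quick Start indexes `Chunk` nodes on a `hash` property before ingesting anything. The sketch below illustrates why such an index is useful, under an assumption this page does not confirm: that each chunk's `hash` is a SHA-256 hex digest of its text. The helpers `chunk_hash` and `merge_chunk_params` and the query string are hypothetical, not part of unstructured2graph. A deterministic hash lets a Cypher `MERGE` on `hash` reuse an existing chunk node when identical text is ingested again, and the index turns that lookup into an index seek instead of a scan over every `Chunk` node.

```python
import hashlib

def chunk_hash(text: str) -> str:
    """Deterministic SHA-256 hex digest of a chunk's text (assumed hashing scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical Cypher: MERGE matches on the indexed `hash` property, so
# re-ingesting identical text reuses the existing node instead of duplicating it.
MERGE_CHUNK_QUERY = (
    "MERGE (c:Chunk {hash: $hash}) "
    "ON CREATE SET c.text = $text "
    "RETURN c"
)

def merge_chunk_params(text: str) -> dict:
    """Build the parameter map for MERGE_CHUNK_QUERY."""
    return {"hash": chunk_hash(text), "text": text}

# Identical text always yields identical parameters, hence the same node.
params = merge_chunk_params("Memgraph is a graph database.")
```

The design choice here is content addressing: identity comes from the text itself, so re-running ingestion over the same sources is idempotent at the chunk level.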

Key Features

| Feature | Description |
|---|---|
| Multi-format parsing | PDFs, URLs, HTML, Markdown, DOCX, and more via Unstructured |
| Automatic chunking | Splits parsed documents into chunks, with configurable options |
| Entity extraction | LLM-powered entity and relationship extraction via LightRAG |
| Vector search | Built-in support for embedding generation and vector indices |
| GraphRAG queries | Combine vector search with graph traversal for enhanced retrieval |
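A GraphRAG query typically uses vector search to find the chunks most similar to a question, then traverses the graph to pull in related context. The pure-Python sketch below illustrates that idea on toy data; it is not the library's query code. It assumes each chunk has an embedding vector and that consecutive chunks are joined by `NEXT` relationships (as created with `link_chunks=True`), and it expands each top-k hit by one hop along those edges. The names `cosine` and `graph_rag_retrieve` are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def graph_rag_retrieve(
    query_vec: list[float],
    embeddings: dict[str, list[float]],
    next_edges: list[tuple[str, str]],
    k: int = 1,
) -> list[str]:
    """Vector step: rank chunk ids by similarity and keep the top k.
    Graph step: append each seed's neighbours along NEXT edges (either direction)."""
    ranked = sorted(embeddings, key=lambda cid: cosine(query_vec, embeddings[cid]), reverse=True)
    seeds = ranked[:k]

    # Undirected adjacency, so both the previous and the next chunk count as context
    neighbours: dict[str, set[str]] = {}
    for src, dst in next_edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    result = list(seeds)
    for cid in seeds:
        for n in sorted(neighbours.get(cid, set())):
            if n not in result:
                result.append(n)
    return result

embeddings = {"c1": [1.0, 0.0], "c2": [0.7, 0.7], "c3": [0.0, 1.0]}
next_edges = [("c1", "c2"), ("c2", "c3")]
context = graph_rag_retrieve([0.0, 1.0], embeddings, next_edges, k=1)
```

Here `c3` is the best vector match, and the graph step adds its neighbour `c2`, context that pure similarity ranking with `k=1` would have dropped.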

API Reference

Document Processing

  • parse_source(source, partition_kwargs) - Parse a single file or URL into chunks
  • make_chunks(sources, partition_kwargs) - Process multiple sources into ChunkedDocument objects
  • from_unstructured(sources, memgraph, lightrag_wrapper, ...) - Full ingestion pipeline
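`parse_source` and `make_chunks` rely on Unstructured for parsing and chunking. To show what "chunking" produces, here is a deliberately naive stand-in that greedily packs blank-line-separated paragraphs into chunks under a character budget. It is not Unstructured's algorithm, which operates on typed document elements such as titles and narrative text; `naive_chunks` and `max_chars` are hypothetical names.

```python
def naive_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most
    max_chars characters. A single paragraph longer than max_chars is kept
    whole as its own chunk rather than split mid-paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = para  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\nDetails paragraph.\n\nSummary paragraph."
pieces = naive_chunks(doc, max_chars=40)
```

Keeping paragraphs intact trades uniform chunk sizes for chunks that do not cut sentences in half, which tends to give the downstream LLM extraction cleaner input.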

Graph Operations

  • create_nodes_from_list(memgraph, nodes, label, batch_size) - Batch insert nodes
  • connect_chunks_to_entities(memgraph, chunk_label, entity_label) - Link entities to source chunks
  • link_nodes_in_order(memgraph, ...) - Create sequential relationships between chunks
  • create_vector_search_index(memgraph, label, property) - Create vector index for similarity search
  • compute_embeddings(memgraph, label) - Generate embeddings for nodes
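Batch inserts and sequential linking map naturally onto Cypher `UNWIND`, which processes a list parameter in a single query. The sketch below builds the kind of queries and parameters that helpers like `create_nodes_from_list` and `link_nodes_in_order` might send; it is an assumption about the approach, not the library's source. `batched`, `create_nodes_query`, `next_pairs`, and `LINK_NEXT_QUERY` are hypothetical, and chunks are assumed to be identified by the `hash` property indexed in the Quick Start.

```python
def batched(items: list[dict], batch_size: int) -> list[list[dict]]:
    """Split rows into consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def create_nodes_query(label: str) -> str:
    """One query per batch: UNWIND creates a node for every row in $rows.
    Labels cannot be Cypher parameters, so `label` must come from trusted code."""
    return f"UNWIND $rows AS row CREATE (n:{label}) SET n = row"

def next_pairs(ids: list[str]) -> list[dict]:
    """Pair each chunk id with its successor, preserving document order."""
    return [{"src": a, "dst": b} for a, b in zip(ids, ids[1:])]

# Hypothetical query: create (a)-[:NEXT]->(b) for every consecutive pair
LINK_NEXT_QUERY = (
    "UNWIND $pairs AS p "
    "MATCH (a:Chunk {hash: p.src}) "
    "MATCH (b:Chunk {hash: p.dst}) "
    "CREATE (a)-[:NEXT]->(b)"
)

rows = [{"hash": f"h{i}", "text": f"chunk {i}"} for i in range(5)]
batches = batched(rows, batch_size=2)
pairs = next_pairs([r["hash"] for r in rows])
```

With `batch_size=1000`, ingesting 2,500 chunks would take three node-creation round trips (1,000 + 1,000 + 500 rows) instead of 2,500.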

Documentation

For detailed usage examples and getting started guides, check out the official documentation:

👉 unstructured2graph Documentation

Requirements

  • Python 3.10+
  • Memgraph database instance

LLM API Key

This library uses LightRAG for entity and relationship extraction, which requires an LLM API key. Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="your-api-key"
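A small pre-flight check can make a missing key fail immediately with a clear message, before any documents are parsed. The helper below, `require_env`, is hypothetical and not part of unstructured2graph; it only reads the environment variable named above.

```python
import os

def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return the variable's value, or raise RuntimeError if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; LightRAG needs it for entity and relationship extraction."
        )
    return value
```

Calling `require_env()` as the first statement of `main()` in the Quick Start would stop the run before any ingestion work starts.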



Download files


Source Distribution

unstructured2graph-0.1.5.tar.gz (9.8 kB)

Uploaded Source

Built Distribution


unstructured2graph-0.1.5-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file unstructured2graph-0.1.5.tar.gz.

File metadata

  • Download URL: unstructured2graph-0.1.5.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for unstructured2graph-0.1.5.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 87521284eec23809287929d9e934a6fc3f192a751e3f5ec8dc53f2c6de2d6031 |
| MD5 | 38bf99cfcdbebe8cc7d2dfa2a262099e |
| BLAKE2b-256 | 212d8ebd6bb5c21847a92cdead2416f9a8fbe394ccfa2162bd27a669430934c8 |


File details

Details for the file unstructured2graph-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: unstructured2graph-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for unstructured2graph-0.1.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40e932ab2eed44936b8e7dd0f67732c6a619479b71b91a9bb2583d0ce7a1d27a |
| MD5 | addb3383238ba2285bc62ad3e13b02ca |
| BLAKE2b-256 | 52a9ad140a1af6e75ffefa9a23c7786d117fe6bd8efce0092f7c4ae4a4dc9db3 |

