
Graph generation and storage library with update tracking

This library builds a graph of entities and relations incrementally and records every change in append-only logs. It also supports optional community tagging of entities (for example, through a community_id property).

Architecture

The library follows a modular architecture with the following components:

Core Components

  1. GraphBuilder

    • Main class for building and managing the graph.
    • Supports incremental entity and relation updates.
    • Maintains append-only logs for all changes.
    • Integrates with configurable indexers for efficient lookups.
  2. GraphCompactor

    • Processes the append-only logs to create compacted representations.
    • Merges entity updates based on timestamps.
    • Builds adjacency lists for efficient graph traversal.
    • Supports sharding for large datasets.
  3. Indexers

    • Abstract interface for entity indexing.
    • Implementations:
      • SQLiteIndexer: Persistent storage using SQLite.
      • MemoryIndexer: In-memory storage with JSON serialization.

Architecture Diagram

```mermaid
graph TD
    subgraph "Graph Builder"
        GB[GraphBuilder] --> |add_entity| EL[Entity Log]
        GB --> |add_relation| RL[Relation Log]
        GB --> |index| IDX[Indexer]
    end

    subgraph "Graph Compactor"
        GC[GraphCompactor] --> |read| EL
        GC --> |read| RL
        GC --> |write| ES[Entity Shards]
        GC --> |write| AL[Adjacency Lists]
    end

    subgraph "Storage"
        EL --> |append-only| LOGS[Logs]
        RL --> |append-only| LOGS
        IDX --> |persist| DB[(SQLite DB)]
        ES --> |sharded| STORAGE[Storage]
        AL --> |compacted| STORAGE
    end

    classDef primary fill:#E6D5AC,stroke:#D4B483,stroke-width:2px,color:#000
    classDef secondary fill:#D4E6AC,stroke:#B4D483,stroke-width:2px,color:#000
    classDef storage fill:#ACD4E6,stroke:#83B4D4,stroke-width:2px,color:#000

    class GB,GC primary
    class EL,RL,ES,AL secondary
    class LOGS,DB,STORAGE storage
```

Storage Structure

```text
output_dir/
├── entities/              # Compacted entity shards
├── relations/             # Relation data
├── logs/                  # Append-only update logs
│   ├── entity_updates.jsonl
│   └── relation_updates.jsonl
├── adjacency/             # Compacted adjacency lists
└── index.db               # SQLite index (if using SQLiteIndexer)
```
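The log files use the .jsonl extension, which suggests one JSON record per line. Below is a minimal sketch of that append-only pattern with UTC timestamps. The record shape (`entity_id`, `properties`, plus a `timestamp` field added on write) and the helper names `append_update` and `read_updates` are assumptions for illustration; the library's actual record schema is not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_update(log_path: Path, record: dict) -> None:
    """Append one timestamped update as a single JSON line; never rewrite."""
    stamped = {**record, "timestamp": datetime.now(timezone.utc).isoformat()}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file, so history is preserved.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(stamped) + "\n")


def read_updates(log_path: Path) -> list[dict]:
    """Read every update back in the order it was written."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Since nothing is ever overwritten, two updates to the same entity both stay in the log; resolving them into a single current state is the job of a later compaction step.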

Data Model

  • Entities: Nodes in the graph with properties
  • Relations: Directed edges between entities with properties
  • Updates: Timestamped changes to entities and relations
  • Shards: Partitioned storage for efficient processing

Usage

Basic Usage

```python
from graph_builder.storage_manager import GraphBuilder
from graph_builder.config import GraphBuilderConfig

config = GraphBuilderConfig(output_dir="graph_output")
graph = GraphBuilder(config)

# Ingest entities, optionally tagged with a community
graph.add_entity(1, {"name": "Alice", "type": "Person", "community_id": "team_alpha"})
graph.add_entity(2, {"name": "Bob", "type": "Person", "community_id": "team_alpha"})

# Create a relation
graph.add_relation(100, 1, 2, {"type": "FRIENDS_WITH"})

graph.finalize()
```

Compaction

```python
from graph_builder import GraphCompactor

# Compact the graph data
compactor = GraphCompactor(base_dir="graph_output")
compactor.compact_entities()  # Merge entity updates
compactor.build_adjacency()   # Build adjacency lists
```
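`compact_entities()` and `build_adjacency()` are described as merging entity updates by timestamp and building adjacency lists. The sketch below shows one plausible reading of those two steps on in-memory records: last-writer-wins merging per property, and a map from each source entity to its outgoing edges. It is not the library's implementation; the record fields (`entity_id`, `timestamp`, `properties`, `relation_id`, `source`, `target`) and both function names are assumptions. It also relies on every timestamp being an ISO-8601 UTC string in the same format, so that plain string ordering is chronological.

```python
from collections import defaultdict


def compact_entities(updates: list[dict]) -> dict[int, dict]:
    """Fold timestamped entity updates into one property dict per entity.

    Updates are applied in timestamp order, so for any property that appears
    in several updates the most recent value wins (last-writer-wins).
    """
    merged: dict[int, dict] = {}
    # sorted() is stable, so updates sharing a timestamp keep their log order.
    for update in sorted(updates, key=lambda u: u["timestamp"]):
        merged.setdefault(update["entity_id"], {}).update(update["properties"])
    return merged


def build_adjacency(relations: list[dict]) -> dict[int, list[tuple[int, int]]]:
    """Map each source entity to its outgoing (target, relation_id) edges."""
    adjacency: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for rel in relations:
        adjacency[rel["source"]].append((rel["target"], rel["relation_id"]))
    return dict(adjacency)
```

Merging per property means a later update that only changes `name` keeps earlier properties such as `type`; replacing whole records instead would drop them.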

Features

  • Incremental Updates: Support for timestamped updates to entities and relations
  • Efficient Storage: Append-only logs with periodic compaction
  • Flexible Indexing: Choose between SQLite or in-memory indexing
  • Sharding: Support for large datasets through sharding
  • Timestamp Tracking: All changes are tracked with UTC timestamps

Installation

```bash
pip install graph-builder
```

Requirements

  • Python 3.12+
  • SQLite3 (for SQLiteIndexer)

License

MIT License


Download files

Source Distribution

graph_builder-0.2.0.tar.gz (12.8 kB)

Uploaded Source

Built Distribution

graph_builder-0.2.0-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file graph_builder-0.2.0.tar.gz.

File metadata

  • Download URL: graph_builder-0.2.0.tar.gz
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for graph_builder-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d22b80a64666a532fe7604a9d9145fb82f3a1871cd55259378bcfc166d90019e
MD5 1964d986a2277e29233a02ee9a4f74aa
BLAKE2b-256 6d2a3cbd3db94a0aecd13eeae9ce6db5f7f2c0d54a3cab359f2974fa86b5ffbb

Provenance

The following attestation bundles were made for graph_builder-0.2.0.tar.gz:

Publisher: publish.yml on beanone/graph_builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file graph_builder-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: graph_builder-0.2.0-py3-none-any.whl
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for graph_builder-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a1082b77224abc60baaaed4bdded089af7a6c09622933023d7b856c83546a139
MD5 40ccc31daeafe9ff284fc872f512a3bf
BLAKE2b-256 c0853581bd06e92c3b6c79ec799f2c07e4b7fb580c7e05f1e69d4be7b0ceb43b

Provenance

The following attestation bundles were made for graph_builder-0.2.0-py3-none-any.whl:

Publisher: publish.yml on beanone/graph_builder

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
