Graph generation and storage library with update tracking
This library provides fast graph traversal and lookup over file-based storage that is sharded and indexed. It also supports community exploration.
Features
- Efficient reading of graph data from JSONL files
- Support for multiple indexing strategies (SQLite and Memory)
- Entity caching for improved performance
- Adjacency list-based neighbor lookup
- Property-based entity search
- Community lookup
- Configurable cache size
Architecture
The library is organized into the following components:
Core Components
- GraphReader: Main class for reading and querying graph data
- GraphReaderConfig: Configuration class for customizing reader behavior
Indexers
The library supports multiple indexing strategies through a plugin architecture:
- BaseIndexer: Abstract base class for indexers
- SQLiteIndexer: SQLite-based indexing for persistent storage
- MemoryIndexer: In-memory indexing for faster access
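The plugin arrangement can be pictured with a small sketch. The class and method names below (`SketchIndexer`, `SketchMemoryIndexer`, `SketchSQLiteIndexer`, `build`, `lookup`) and the table layout are hypothetical and chosen for illustration; they are not the library's actual `BaseIndexer` interface, which this description does not spell out.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[int, str, int]  # (entity_id, shard_name, byte_offset), an assumed shape


class SketchIndexer(ABC):
    """Hypothetical indexer interface: maps an entity ID to where its record lives."""

    @abstractmethod
    def build(self, locations: Iterable[Location]) -> None:
        """Index (entity_id, shard_name, byte_offset) triples."""

    @abstractmethod
    def lookup(self, entity_id: int) -> Optional[Tuple[str, int]]:
        """Return (shard_name, byte_offset), or None if the ID is unknown."""


class SketchMemoryIndexer(SketchIndexer):
    """Keeps the whole index in a dict: fast, but rebuilt on every start."""

    def __init__(self) -> None:
        self._index: Dict[int, Tuple[str, int]] = {}

    def build(self, locations):
        for entity_id, shard_name, byte_offset in locations:
            self._index[entity_id] = (shard_name, byte_offset)

    def lookup(self, entity_id):
        return self._index.get(entity_id)


class SketchSQLiteIndexer(SketchIndexer):
    """Stores the index in SQLite so it can persist between runs."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entity_index ("
            "entity_id INTEGER PRIMARY KEY, shard_name TEXT, byte_offset INTEGER)"
        )

    def build(self, locations):
        with self._conn:  # commits on success
            self._conn.executemany(
                "INSERT OR REPLACE INTO entity_index VALUES (?, ?, ?)", locations
            )

    def lookup(self, entity_id):
        row = self._conn.execute(
            "SELECT shard_name, byte_offset FROM entity_index WHERE entity_id = ?",
            (entity_id,),
        ).fetchone()
        return tuple(row) if row else None


sample = [(1, "shard_0.jsonl", 0), (2, "shard_0.jsonl", 57)]
indexer = SketchMemoryIndexer()
indexer.build(sample)
print(indexer.lookup(2))
```

Because both classes share one interface, a reader built this way could switch between them without changing its lookup code, which is the trade-off the two strategies above describe: persistence versus speed.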
Data Structure
The library expects data to be organized in the following directory structure:
base_dir/
├── entities/
│ └── shard_*.jsonl
├── relations/
│ └── shard_*.jsonl
└── adjacency/
└── adjacency.jsonl
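To make the layout concrete, the sketch below writes a tiny dataset in that shape to a temporary directory and reads it back using only the standard library. The record fields (`id`, `name`, `source`, `target`, `neighbors`) are assumptions for illustration; the actual JSONL schema is not specified here.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list) -> None:
    """Write one JSON object per line, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_shards(directory: Path, pattern: str = "shard_*.jsonl"):
    """Yield every record from every matching file, in sorted file order."""
    for shard in sorted(directory.glob(pattern)):
        with shard.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


base_dir = Path(tempfile.mkdtemp())
write_jsonl(base_dir / "entities" / "shard_0.jsonl", [{"id": 1, "name": "Alice"}])
write_jsonl(base_dir / "entities" / "shard_1.jsonl", [{"id": 2, "name": "Bob"}])
write_jsonl(base_dir / "relations" / "shard_0.jsonl", [{"source": 1, "target": 2}])
write_jsonl(base_dir / "adjacency" / "adjacency.jsonl", [{"id": 1, "neighbors": [2]}])

# Assumed schema: entities keyed by "id", adjacency rows listing neighbor IDs.
entities = {record["id"]: record for record in read_jsonl_shards(base_dir / "entities")}
adjacency = {
    record["id"]: record["neighbors"]
    for record in read_jsonl_shards(base_dir / "adjacency", "adjacency.jsonl")
}
print(entities[1]["name"], adjacency[1])
```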
Architecture Diagram
graph TD
GR[GraphReader]
GC[GraphReaderConfig]
BI[BaseIndexer]
SI[SQLiteIndexer]
MI[MemoryIndexer]
EF[Entity Files]
RF[Relation Files]
AF[Adjacency File]
DB[(SQLite DB)]
GR --> GC
GR --> BI
SI --> BI
MI --> BI
GR --> EF
GR --> RF
GR --> AF
SI --> DB
style GR fill:#e6f3ff,stroke:#000000,stroke-width:2px,color:#000000
style GC fill:#e6f3ff,stroke:#000000,stroke-width:2px,color:#000000
style BI fill:#fff2e6,stroke:#000000,stroke-width:2px,color:#000000
style SI fill:#fff2e6,stroke:#000000,stroke-width:2px,color:#000000
style MI fill:#fff2e6,stroke:#000000,stroke-width:2px,color:#000000
style EF fill:#f0fff0,stroke:#000000,stroke-width:2px,color:#000000
style RF fill:#f0fff0,stroke:#000000,stroke-width:2px,color:#000000
style AF fill:#f0fff0,stroke:#000000,stroke-width:2px,color:#000000
style DB fill:#f0fff0,stroke:#000000,stroke-width:2px,color:#000000
Installation
pip install beanone-graph
Usage
from graph_reader import GraphReader, GraphReaderConfig
config = GraphReaderConfig(base_dir="graph_output")
reader = GraphReader(config)
# Get an entity
entity = reader.get_entity(1)
print("Entity:", entity)
# Get neighbors
neighbors = reader.get_neighbors(1)
print("Neighbors:", neighbors)
# Search
matches = reader.search_by_property("name", "Alice")
print("Matches:", matches)
# Get entity's community
community = reader.get_entity_community(1)
print("Community:", community)
# Get members of a community
members = reader.get_community_members("team_alpha")
print("Members:", members)
Search Queries
The library supports search queries that combine comparison operators with logical conditions. The examples below are based on a real graph structure:
Basic Property Search
# Simple equality
results = reader.search_by_property("name", "effect")
# Case-insensitive search
results = reader.search_by_property("name", "nuclear", case_sensitive=False)
# Numeric comparison
results = reader.search_by_property("community", 155, operator="==")
Complex Queries
# Multiple conditions with AND
query = "name:effect AND community:155"
results = reader.search(query)
# Multiple conditions with OR
query = "name:effect OR name:nuclear"
results = reader.search(query)
# Combining AND and OR
query = "community:155 AND (levels.0:>=3 OR levels.1:>=17)"
results = reader.search(query)
# Array value search
query = "keywords:protein OR keywords:formula"
results = reader.search(query)
# Multiple properties
query = "community:155 AND keywords:protein AND levels.0:>=3"
results = reader.search(query)
Search Operators
The following operators are supported:
- : - Equals (default)
  - For arrays: checks if the value is in the array
  - Example: keywords:protein matches if "protein" is in the keywords array
- :@ - Array membership
  - Checks if the property value is in the specified array
  - Example: type:@['user', 'admin'] matches if type is either 'user' or 'admin'
- == - Equals (explicit)
- != - Not equals
- > - Greater than
- >= - Greater than or equal
- < - Less than
- <= - Less than or equal
- AND - Logical AND
- OR - Logical OR
- ( and ) - Grouping
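As a rough illustration of these operator semantics, the sketch below evaluates a single `field:condition` term against one entity dictionary. It is not the library's parser: the names `matches_term`, `resolve`, and `coerce` and the sample entity are hypothetical, and `AND`, `OR`, grouping, `:@`, and wildcards are left out to keep it short.

```python
import operator
from typing import Any

# Two-character symbols come first so ">=" is not mistaken for ">".
COMPARATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def resolve(entity: dict, dotted_path: str) -> Any:
    """Follow dot notation such as 'levels.0' through dicts and lists."""
    value: Any = entity
    for part in dotted_path.split("."):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


def coerce(text: str) -> Any:
    """Treat the right-hand side as a number when it parses as one."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def matches_term(entity: dict, term: str) -> bool:
    """Evaluate one term such as 'levels.0:>=3' or 'keywords:protein'."""
    field, _, condition = term.partition(":")
    try:
        actual = resolve(entity, field)
    except (KeyError, IndexError, ValueError, TypeError):
        return False  # missing or non-indexable property
    for symbol, compare in COMPARATORS:
        if condition.startswith(symbol):
            try:
                return compare(actual, coerce(condition[len(symbol):]))
            except TypeError:
                return False  # e.g. ordering a string against a number
    expected = coerce(condition)
    # A bare ':' means equality, or membership when the property is an array.
    if isinstance(actual, list):
        return expected in actual
    return actual == expected


# Hypothetical entity shaped like the examples in this section.
entity = {
    "name": "effect",
    "community": 155,
    "keywords": ["protein", "formula"],
    "levels": [5, 20, 0],
}
print(matches_term(entity, "levels.0:>=3"), matches_term(entity, "keywords:protein"))
```

The sketch assumes `levels` is stored as a list; if it were a mapping keyed by strings, `resolve` would fall through to the dictionary branch instead.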
Search Examples by Use Case
Entity Search
# Find entities by community and level
query = "community:155 AND levels.0:>=3"
# Find entities by keywords (array contains)
query = "keywords:protein AND keywords:formula"
# Find entities by name pattern
query = "name:*nuclear*"
# Find entities with multiple keywords
query = "keywords:protein AND keywords:home AND keywords:formula"
# Find entities by type (array membership)
query = "type:@['user', 'admin']"
Level-based Search
# Find entities with high level 0 values
query = "levels.0:>=50"
# Find entities with specific level combinations
query = "levels.0:>=3 AND levels.1:>=17"
# Find entities with no higher level connections
query = "levels.3:0 AND levels.4:0"
Community Search
# Find entities in specific communities
query = "community:155 OR community:245"
# Find entities by community and keywords
query = "community:155 AND keywords:protein"
# Find entities by community and level distribution
query = "community:155 AND levels.0:>=3 AND levels.1:>=17"
# Find entities with specific keyword combinations
query = "keywords:protein AND keywords:home AND keywords:formula"
Best Practices
- Use parentheses to group complex conditions
- Combine related conditions with AND
- Use OR for alternative values
- Use numeric operators for ranges
- Consider case sensitivity for text searches
- For array properties:
  - Use : to check if a value exists in an array
  - Use :@ to check if a property value is in a specified array
  - Combine multiple array conditions with AND to find entities with all specified values
- Use wildcards (*) for pattern matching in text fields
- Use dot notation for nested properties (e.g., levels.0)
- Combine community and level information for precise filtering
- Use keyword arrays for semantic search
Configuration
The GraphReaderConfig class supports the following parameters:
- base_dir: Base directory containing the graph data
- indexer_type: Type of indexer to use ("sqlite" or "memory")
- cache_size: Maximum number of entities to cache in memory
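`cache_size` caps how many entities stay in memory. The eviction policy is not documented here; the sketch below assumes a least-recently-used (LRU) policy purely for illustration, and `EntityCacheSketch` is a hypothetical name, not a class exported by the library.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class EntityCacheSketch:
    """Bounded cache that evicts the least recently used entity first (assumed policy)."""

    def __init__(self, loader: Callable[[Hashable], Any], cache_size: int = 1000) -> None:
        self._loader = loader  # called on a cache miss, e.g. to read from disk
        self._cache_size = cache_size
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, entity_id: Hashable) -> Any:
        if entity_id in self._entries:
            self._entries.move_to_end(entity_id)  # mark as most recently used
            return self._entries[entity_id]
        entity = self._loader(entity_id)
        self._entries[entity_id] = entity
        if len(self._entries) > self._cache_size:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return entity


loads = []  # records which IDs actually hit the (simulated) disk


def load_from_disk(entity_id):
    loads.append(entity_id)
    return {"id": entity_id}


cache = EntityCacheSketch(load_from_disk, cache_size=2)
for entity_id in (1, 2, 1, 3, 2):
    cache.get(entity_id)
print(loads)
```

With a cache size of 2, the second request for entity 1 is served from memory, while entity 2 is evicted when entity 3 arrives and has to be loaded again. A larger `cache_size` trades memory for fewer such reloads.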
License
This project is licensed under the MIT License - see the LICENSE file for details.