Skip to main content

A high-performance graph database library with Python bindings written in Rust

Project description

KGLite

PyPI version Python versions License: MIT

An embedded knowledge graph engine for Python.

Use it for: local analytics, ETL pipelines, notebooks, embedding in apps, fast prototyping. Not for: multi-user server deployments, cross-call transactions, HA/replication.

Embedded, in-process No server, no network; import and go
In-memory Persistence via save()/load() snapshots
Cypher subset Querying + mutations; returns dict or DataFrame
Single-label nodes Each node has exactly one type
Single-threaded Designed for single-threaded use (see Threading)

Requirements: Python 3.10+ (CPython) | macOS (ARM/Intel), Linux (x86_64/aarch64), Windows (x86_64) | pandas >= 1.5

pip install kglite

Feature Matrix

Feature Status
Embedded / in-process Yes
Cypher queries (MATCH, CREATE, SET, DELETE, MERGE, ...) Yes
DataFrame output (to_df=True) Yes
Graph algorithms (shortest path, centrality, communities) Yes
Persistence (binary snapshots) Yes
Multi-label nodes No
Transactions across calls No
Concurrency / thread safety No
Server mode No

Quick Start

import kglite

graph = kglite.KnowledgeGraph()

# Create nodes and relationships
graph.cypher("CREATE (:Person {name: 'Alice', age: 28, city: 'Oslo'})")
graph.cypher("CREATE (:Person {name: 'Bob', age: 35, city: 'Bergen'})")
graph.cypher("CREATE (:Person {name: 'Charlie', age: 42, city: 'Oslo'})")
graph.cypher("""
    MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
    CREATE (a)-[:KNOWS]->(b)
""")

# Query — returns list[dict]
result = graph.cypher("""
    MATCH (p:Person) WHERE p.age > 30
    RETURN p.name AS name, p.city AS city
    ORDER BY p.age DESC
""")
for row in result:
    print(row['name'], row['city'])

# Or get a pandas DataFrame
df = graph.cypher("MATCH (p:Person) RETURN p.name, p.age ORDER BY p.age", to_df=True)

# Mutations return stats
result = graph.cypher("CREATE (:Person {name: 'Dave', age: 22})")
print(result['stats'])  # {'nodes_created': 1, 'relationships_created': 0, ...}

# Check graph size
print(graph.graph_info())  # {'node_count': 4, 'edge_count': 1, ...}

# Persist to disk and reload
graph.save("my_graph.kgl")
loaded = kglite.load("my_graph.kgl")

Loading Data from DataFrames

For bulk loading (thousands of rows), use the fluent API:

import pandas as pd

users_df = pd.DataFrame({
    'user_id': [1001, 1002, 1003],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 35, 42]
})

graph.add_nodes(data=users_df, node_type='User', unique_id_field='user_id', node_title_field='name')

edges_df = pd.DataFrame({'source_id': [1001, 1002], 'target_id': [1002, 1003]})
graph.add_connections(data=edges_df, connection_type='KNOWS', source_type='User',
                      source_id_field='source_id', target_type='User', target_id_field='target_id')

graph.cypher("MATCH (u:User) WHERE u.age > 30 RETURN u.name, u.age")

Core Concepts

Nodes have four built-in fields: id (unique within type), title (display name), type (label), plus arbitrary properties. Each node has exactly one type — labels(n) returns a string, not a list.

Relationships connect two nodes with a type (e.g., :KNOWS) and optional properties. The Cypher API calls them "relationships"; the fluent API calls them "connections" — they're the same thing.

Return shape. Read queries return list[dict] — each dict is one row keyed by column alias. Mutation queries (CREATE, SET, DELETE, REMOVE, MERGE) return {'stats': {...}} with counts like nodes_created, properties_set, etc. Mutations with a RETURN clause return {'rows': [...], 'stats': {...}}. Pass to_df=True to get a pandas DataFrame instead (read queries only).

Atomicity. Each cypher() call is atomic at the statement level — if any clause fails, the graph remains unchanged (copy-on-write internally). There are no multi-statement transactions: two separate cypher() calls are independent. Durability only via explicit save().

Selections (fluent API only) are lightweight views — a set of node indices that flow through chained operations like type_filter().filter().traverse(). They don't copy data. Use explain() to see the pipeline.

Tombstones. DELETE leaves empty slots in the internal graph storage. After heavy deletion, check graph.graph_info()['fragmentation_ratio'] and call graph.vacuum() if it exceeds 0.3 (see Graph Maintenance).


Table of Contents


When to Use What

Interface Best For Key Benefits
Cypher (recommended) Ad-hoc queries, exploration, analytics, mutations Standard syntax, declarative, familiar to Neo4j users
Fluent API (advanced) Bulk loading from DataFrames, multi-step pipelines Chainable operations, explain(), computed properties
Pattern matching (specialized) Quick structural checks without full Cypher overhead Lightweight, minimal parsing

Start with Cypher for most tasks. Use the fluent API when bulk-loading from pandas DataFrames or building pipelines that store intermediate computed properties. Use pattern matching for simple structural queries where you don't need WHERE/RETURN clauses.


Common Recipes

Upsert with MERGE

graph.cypher("""
    MERGE (p:Person {email: 'alice@example.com'})
    ON CREATE SET p.created = '2024-01-01', p.name = 'Alice'
    ON MATCH SET p.last_seen = '2024-01-15'
""")

Top-K Nodes by Centrality

top_nodes = graph.pagerank(top_k=10)
for node in top_nodes:
    print(f"{node['title']}: {node['score']:.3f}")

2-Hop Neighborhood

graph.cypher("""
    MATCH (me:Person {name: 'Alice'})-[:KNOWS*2]-(fof:Person)
    WHERE fof <> me
    RETURN DISTINCT fof.name
""")

Export Subgraph

subgraph = (
    graph.type_filter('Person')
    .filter({'name': 'Alice'})
    .expand(hops=2)
    .to_subgraph()
)
subgraph.export('alice_network.graphml', format='graphml')

Create Index for Speed

graph.create_index('Product', 'category')

# ~3x faster on 100k+ node graphs (equality only; depends on selectivity)
result = graph.cypher("MATCH (p:Product) WHERE p.category = 'Electronics' RETURN p.name")

Parameterized Queries

graph.cypher(
    "MATCH (p:Person) WHERE p.city = $city AND p.age > $min_age RETURN p.name",
    params={'city': 'Oslo', 'min_age': 25}
)

Delete Subgraph

graph.cypher("""
    MATCH (u:User) WHERE u.status = 'inactive'
    DETACH DELETE u
""")

Aggregation with Relationship Properties

graph.cypher("""
    MATCH (p:Person)-[r:RATED]->(m:Movie)
    RETURN p.name, avg(r.score) AS avg_rating, count(m) AS movies_rated
    ORDER BY avg_rating DESC
""")

Cypher Queries

A substantial Cypher subset. See the Supported Cypher Subset table for exact coverage.

Single-label note: Each node has exactly one type. labels(n) returns a string, not a list. SET n:OtherLabel is not supported.

result = graph.cypher("""
    MATCH (p:Person)-[:KNOWS]->(f:Person)
    WHERE p.age > 30 AND f.city = 'Oslo'
    RETURN p.name AS person, f.name AS friend, p.age AS age
    ORDER BY p.age DESC
    LIMIT 10
""")

# Read queries → list[dict]
for row in result:
    print(f"{row['person']} knows {row['friend']}")

# Pass to_df=True for a DataFrame
df = graph.cypher("MATCH (n:Person) RETURN n.name, n.age ORDER BY n.age", to_df=True)

WHERE Clause

# Comparisons: =, <>, <, >, <=, >=
graph.cypher("MATCH (n:Product) WHERE n.price >= 500 RETURN n.title, n.price")

# Boolean operators: AND, OR, NOT
graph.cypher("MATCH (n:Person) WHERE n.age > 25 AND NOT n.city = 'Oslo' RETURN n.name")

# Null checks
graph.cypher("MATCH (n:Person) WHERE n.email IS NOT NULL RETURN n.name")

# String predicates: CONTAINS, STARTS WITH, ENDS WITH
graph.cypher("MATCH (n:Person) WHERE n.name CONTAINS 'ali' RETURN n.name")

# IN lists
graph.cypher("MATCH (n:Person) WHERE n.city IN ['Oslo', 'Bergen'] RETURN n.name")

# Regex matching with =~
graph.cypher("MATCH (n:Person) WHERE n.name =~ '(?i)^ali.*' RETURN n.name")
graph.cypher("MATCH (n:Person) WHERE n.email =~ '.*@example\\.com$' RETURN n.name")

Relationship Properties

Relationships can have properties. Access them with r.property syntax:

# Create relationships with properties
graph.cypher("""
    MATCH (p:Person {name: 'Alice'}), (m:Movie {title: 'Inception'})
    CREATE (p)-[:RATED {score: 5, comment: 'Excellent'}]->(m)
""")

# Access, filter, aggregate, sort by relationship properties
graph.cypher("MATCH (p)-[r:RATED]->(m) RETURN p.name, r.score, r.comment, type(r)")
graph.cypher("MATCH (p)-[r:RATED]->(m) WHERE r.score >= 4 RETURN p.name, m.title")
graph.cypher("MATCH (p)-[r:RATED]->(m) RETURN avg(r.score) AS avg_rating")
graph.cypher("MATCH ()-[r:RATED]->(m) RETURN m.title, r.score ORDER BY r.score DESC")

Aggregation

graph.cypher("MATCH (n:Person) RETURN n.city, count(*) AS population ORDER BY population DESC")
graph.cypher("MATCH (n:Person) RETURN avg(n.age) AS avg_age, min(n.age), max(n.age)")

# DISTINCT
graph.cypher("MATCH (n:Person) RETURN DISTINCT n.city")
graph.cypher("MATCH (n:Person) RETURN count(DISTINCT n.city) AS unique_cities")

WITH Clause

graph.cypher("""
    MATCH (p:Person)-[:KNOWS]->(f:Person)
    WITH p, count(f) AS friend_count
    WHERE friend_count > 3
    RETURN p.name, friend_count
    ORDER BY friend_count DESC
""")

OPTIONAL MATCH

Left outer join — keeps rows even when no match:

graph.cypher("""
    MATCH (p:Person)
    OPTIONAL MATCH (p)-[:KNOWS]->(f:Person)
    RETURN p.name, count(f) AS friends
""")

Built-in Functions

Function Description
toUpper(expr) Convert to uppercase
toLower(expr) Convert to lowercase
toString(expr) Convert to string
toInteger(expr) Convert to integer
toFloat(expr) Convert to float
size(expr) Length of string or list
type(r) Relationship type
id(n) Node ID
labels(n) Node type (string, not list — single-label)
coalesce(a, b, ...) First non-null argument
length(p) Path hop count
nodes(p) Nodes in a path
relationships(p) Relationships in a path

Arithmetic

graph.cypher("MATCH (n:Product) RETURN n.title, n.price * 1.25 AS price_with_tax")

CASE Expressions

# Generic form
graph.cypher("""
    MATCH (n:Person)
    RETURN n.name,
           CASE WHEN n.age >= 18 THEN 'adult' ELSE 'minor' END AS category
""")

# Simple form
graph.cypher("""
    MATCH (n:Person)
    RETURN n.name,
           CASE n.city WHEN 'Oslo' THEN 'capital' WHEN 'Bergen' THEN 'west coast' ELSE 'other' END AS region
""")

List Comprehensions

[x IN list WHERE predicate | expression] syntax:

# Map: double each number
graph.cypher("UNWIND [1] AS _ RETURN [x IN [1, 2, 3, 4, 5] | x * 2] AS doubled")
# [2, 4, 6, 8, 10]

# Filter only
graph.cypher("UNWIND [1] AS _ RETURN [x IN [1, 2, 3, 4, 5] WHERE x > 3] AS filtered")
# [4, 5]

# Filter + map
graph.cypher("UNWIND [1] AS _ RETURN [x IN [1, 2, 3, 4, 5] WHERE x > 3 | x * 2] AS result")
# [8, 10]

# With collect() — transform aggregated values
graph.cypher("""
    MATCH (p:Person)
    WITH collect(p.name) AS names
    RETURN [x IN names | toUpper(x)] AS upper_names
""")

Note: List comprehensions require at least one row in the pipeline. Use UNWIND [1] AS _ or a preceding MATCH/WITH to provide the row context.

Map Projections

n {.prop1, .prop2, alias: expr} syntax — select specific properties from a node:

# Select only name and age (returns a dict per row)
graph.cypher("MATCH (p:Person) RETURN p {.name, .age} AS info")
# [{'info': {'name': 'Alice', 'age': 30}}, {'info': {'name': 'Bob', 'age': 25}}]

# Mix shorthand properties with computed values
graph.cypher("""
    MATCH (p:Person)-[:WORKS_AT]->(c:Company)
    RETURN p {.name, .age, company: c.name} AS info
""")
# [{'info': {'name': 'Alice', 'age': 30, 'company': 'Acme'}}, ...]

# System properties (id, type) work too
graph.cypher("MATCH (p:Person) RETURN p {.name, .type, .id} AS info LIMIT 1")
# [{'info': {'name': 'Alice', 'type': 'Person', 'id': 1}}]

Parameters

graph.cypher(
    "MATCH (n:Person) WHERE n.age > $min_age RETURN n.name, n.age",
    params={'min_age': 25}
)

# Parameters in inline pattern properties
graph.cypher(
    "MATCH (n:Person {name: $name}) RETURN n.age",
    params={'name': 'Alice'}
)

# Parameters with DataFrame output
df = graph.cypher(
    "MATCH (n:Person) WHERE n.age > $min_age RETURN n.name, n.age ORDER BY n.age",
    params={'min_age': 20}, to_df=True
)

UNWIND

Expand a list into rows:

graph.cypher("UNWIND [1, 2, 3] AS x RETURN x, x * 2 AS doubled")

UNION

graph.cypher("""
    MATCH (n:Person) WHERE n.city = 'Oslo' RETURN n.name AS name
    UNION
    MATCH (n:Person) WHERE n.age > 30 RETURN n.name AS name
""")

Variable-Length Paths

# 1 to 3 hops
graph.cypher("MATCH (a:Person)-[:KNOWS*1..3]->(b:Person) WHERE a.name = 'Alice' RETURN b.name")

# Exact 2 hops
graph.cypher("MATCH (a:Person)-[:KNOWS*2]->(b:Person) RETURN a.name, b.name")

WHERE EXISTS

Check for subpattern existence. The outer variable (p) is bound from MATCH. Both brace { } and parenthesis (( )) syntax are supported:

# Brace syntax
graph.cypher("MATCH (p:Person) WHERE EXISTS { (p)-[:KNOWS]->(:Person) } RETURN p.name")

# Parenthesis syntax (equivalent)
graph.cypher("MATCH (p:Person) WHERE EXISTS((p)-[:KNOWS]->(:Person)) RETURN p.name")

# Negation
graph.cypher("""
    MATCH (p:Person)
    WHERE NOT EXISTS { (p)-[:PURCHASED]->(:Product) }
    RETURN p.name
""")

# With property filter in inner pattern
graph.cypher("""
    MATCH (p:Person)
    WHERE EXISTS { (p)-[:KNOWS]->(:Person {city: 'Oslo'}) }
    RETURN p.name
""")

shortestPath()

Directed BFS shortest path between two nodes:

result = graph.cypher("""
    MATCH p = shortestPath((a:Person {name: 'Alice'})-[:KNOWS*..10]->(b:Person {name: 'Dave'}))
    RETURN length(p), nodes(p), relationships(p), a.name, b.name
""")
# [{'length(p)': 3, 'nodes(p)': [...], ...}]

# No path → empty list (not an error)

Path functions: length(p) returns hop count, nodes(p) returns node list, relationships(p) returns edge type list.

CREATE / SET / DELETE / REMOVE / MERGE

# CREATE — returns stats
result = graph.cypher("CREATE (n:Person {name: 'Alice', age: 30, city: 'Oslo'})")
print(result['stats']['nodes_created'])  # 1

# CREATE relationship between existing nodes
graph.cypher("""
    MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
    CREATE (a)-[:KNOWS]->(b)
""")

# SET — update properties
result = graph.cypher("MATCH (n:Person {name: 'Bob'}) SET n.age = 26, n.city = 'Stavanger'")
print(result['stats']['properties_set'])  # 2

# DELETE — plain DELETE errors if node has relationships; DETACH removes all
graph.cypher("MATCH (n:Person {name: 'Alice'}) DETACH DELETE n")

# REMOVE — remove properties (id/type are immutable)
graph.cypher("MATCH (n:Person {name: 'Alice'}) REMOVE n.city")

# MERGE — match or create
graph.cypher("""
    MERGE (n:Person {name: 'Alice'})
    ON CREATE SET n.created = 'today'
    ON MATCH SET n.updated = 'today'
""")

Error example: DELETE on a node with relationships returns: "Cannot delete node with existing relationships. Use DETACH DELETE to remove the node and all its relationships."

Mutation Semantics

Atomicity: Each cypher() call is atomic at the statement level — if any clause fails, the graph remains unchanged (copy-on-write internally). There are no multi-statement transactions; two separate cypher() calls are independent. Durability only via explicit save().

Index maintenance: Property and composite indexes are updated automatically by all mutation operations (CREATE, SET, DELETE, REMOVE, MERGE).

DataFrame Output

df = graph.cypher("""
    MATCH (p:Person)-[:KNOWS]->(f:Person)
    WITH p, count(f) AS friends
    RETURN p.name, p.city, friends
    ORDER BY friends DESC
""", to_df=True)

EXPLAIN

Prefix any Cypher query with EXPLAIN to see the query plan without executing it:

plan = graph.cypher("""
    EXPLAIN
    MATCH (p:Person)
    OPTIONAL MATCH (p)-[:KNOWS]->(f:Person)
    WITH p, count(f) AS friends
    RETURN p.name, friends
""")
print(plan)
# Query Plan:
#   1. NodeScan (MATCH) :Person
#   2. FusedOptionalMatchAggregate (optimized OPTIONAL MATCH + count)
#   3. Projection (RETURN) [p.name, friends]
# Optimizations: optional_match_fusion=1

Returns a string (not data). Mutation queries with EXPLAIN are not executed.

Supported Cypher Subset

Category Supported
Clauses MATCH, OPTIONAL MATCH, WHERE, RETURN, WITH, ORDER BY, SKIP, LIMIT, UNWIND, UNION/UNION ALL, CREATE, SET, DELETE, DETACH DELETE, REMOVE, MERGE, EXPLAIN
Patterns Node (n:Type), relationship -[:REL]->, variable-length *1..3, undirected -[:REL]-, properties {key: val}, p = shortestPath(...)
WHERE =, <>, <, >, <=, >=, =~ (regex), AND, OR, NOT, IS NULL, IS NOT NULL, IN [...], CONTAINS, STARTS WITH, ENDS WITH, EXISTS { pattern }, EXISTS(( pattern ))
RETURN n.prop, r.prop, AS aliases, DISTINCT, arithmetic +/-/*//, map projections n {.prop1, .prop2}
Aggregation count(*), count(expr), sum, avg/mean, min, max, collect, std
Expressions CASE WHEN...THEN...ELSE...END, $param, [x IN list WHERE ... | expr]
Functions toUpper, toLower, toString, toInteger, toFloat, size, length, type, id, labels, coalesce, nodes(p), relationships(p)
Mutations CREATE (n:Label {props}), CREATE (a)-[:TYPE]->(b), SET n.prop = expr, DELETE, DETACH DELETE, REMOVE n.prop, MERGE ... ON CREATE SET ... ON MATCH SET
Not supported CALL/stored procedures, FOREACH, subqueries, SET n:Label (label mutation), REMOVE n:Label, multi-label

Advanced API: Data Management

Expand section

For most use cases, use Cypher queries. The fluent API below is for bulk operations from DataFrames or complex data pipelines.

Adding Nodes

products_df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'title': ['Laptop', 'Phone', 'Tablet'],
    'price': [999.99, 699.99, 349.99],
    'stock': [45, 120, 30]
})

graph.add_nodes(
    data=products_df,
    node_type='Product',
    unique_id_field='product_id',
    node_title_field='title',
    columns=['product_id', 'title', 'price', 'stock'],
    column_types={'launch_date': 'datetime'},  # explicit type hints (see Working with Dates)
    conflict_handling='update'  # 'update' | 'replace' | 'skip' | 'preserve'
)

Property Mapping

When adding nodes, unique_id_field and node_title_field are renamed to id and title. The original column names no longer exist as properties.

Your DataFrame Column Stored As Why
unique_id_field (e.g., user_id) id Canonical identifier
node_title_field (e.g., name) title Display/label field
All other columns Same name Preserved as-is
# After adding with unique_id_field='user_id', node_title_field='name':
graph.type_filter('User').filter({'user_id': 1001})  # WRONG — field was renamed
graph.type_filter('User').filter({'id': 1001})        # CORRECT

Use explain() to verify node counts at each step:

result = graph.type_filter('User').filter({'id': 1001})
print(result.explain())
# TYPE_FILTER User (1000 nodes) -> FILTER (1 nodes)

Retrieving Nodes

products = graph.type_filter('Product')
products.get_nodes()                       # all properties
products.get_properties(['price', 'stock'])  # specific properties
products.get_titles()                       # just titles

Working with Dates

graph.add_nodes(
    data=estimates_df,
    node_type='Estimate',
    unique_id_field='estimate_id',
    node_title_field='name',
    column_types={'valid_from': 'datetime', 'valid_to': 'datetime'}
)

graph.type_filter('Estimate').filter({'valid_from': {'>=': '2020-06-01'}})
graph.type_filter('Estimate').valid_at('2020-06-15')
graph.type_filter('Estimate').valid_during('2020-01-01', '2020-06-30')

Creating Connections

purchases_df = pd.DataFrame({
    'user_id': [1001, 1001, 1002],
    'product_id': [101, 103, 102],
    'date': ['2023-01-15', '2023-02-10', '2023-01-20'],
    'quantity': [1, 2, 1]
})

graph.add_connections(
    data=purchases_df,
    connection_type='PURCHASED',
    source_type='User',
    source_id_field='user_id',
    target_type='Product',
    target_id_field='product_id',
    columns=['date', 'quantity']
)

Note: source_type and target_type each refer to a single node type. To connect nodes of the same type, set both to the same value (e.g., source_type='Person', target_type='Person').

Batch Property Updates

result = graph.type_filter('Prospect').filter({'status': 'Inactive'}).update({
    'is_active': False,
    'deactivation_reason': 'status_inactive'
})

updated_graph = result['graph']
print(f"Updated {result['nodes_updated']} nodes")

Advanced API: Querying

Expand section

For most queries, prefer Cypher. The fluent API below is for building reusable query chains or when you need explain() and selection-based workflows.

Filtering

graph.type_filter('Product').filter({'price': 999.99})
graph.type_filter('Product').filter({'price': {'<': 500.0}, 'stock': {'>': 50}})
graph.type_filter('Product').filter({'id': {'in': [101, 103]}})
graph.type_filter('Product').filter({'category': {'is_null': True}})

# Orphan nodes (no connections)
graph.filter_orphans(include_orphans=True)

Sorting

graph.type_filter('Product').sort('price')
graph.type_filter('Product').sort('price', ascending=False)
graph.type_filter('Product').sort([('stock', False), ('price', True)])

Traversing the Graph

alice = graph.type_filter('User').filter({'title': 'Alice'})
alice_products = alice.traverse(connection_type='PURCHASED', direction='outgoing')

# Filter and sort traversal targets
expensive = alice.traverse(
    connection_type='PURCHASED',
    filter_target={'price': {'>=': 500.0}},
    sort_target='price',
    max_nodes=10
)

# Get connection information
alice.get_connections(include_node_properties=True)

Set Operations

n3 = graph.type_filter('Prospect').filter({'geoprovince': 'N3'})
m3 = graph.type_filter('Prospect').filter({'geoprovince': 'M3'})

n3.union(m3)                    # all nodes from both (OR)
n3.intersection(m3)             # nodes in both (AND)
n3.difference(m3)               # nodes in n3 but not m3
n3.symmetric_difference(m3)     # nodes in exactly one (XOR)

Pattern Matching

Expand section

For simpler pattern-based queries without full Cypher clause support:

results = graph.match_pattern(
    '(p:Play)-[:HAS_PROSPECT]->(pr:Prospect)-[:BECAME_DISCOVERY]->(d:Discovery)'
)

for match in results:
    print(f"Play: {match['p']['title']}, Discovery: {match['d']['title']}")

# With property conditions
graph.match_pattern('(u:User)-[:PURCHASED]->(p:Product {category: "Electronics"})')

# Limit results for large graphs
graph.match_pattern('(a:Person)-[:KNOWS]->(b:Person)', max_matches=100)

Graph Algorithms

Expand section

Shortest Path

path = graph.shortest_path(source_type='Person', source_id=1, target_type='Person', target_id=100)
for node in path:
    print(f"{node['type']}: {node['title']}")

All Paths

paths = graph.all_paths(
    source_type='Play', source_id=1,
    target_type='Wellbore', target_id=100,
    max_hops=4
)

Connected Components

components = graph.connected_components()
# Returns list of lists: [[node_indices...], [node_indices...], ...]
print(f"Found {len(components)} connected components")
print(f"Largest component: {len(components[0])} nodes")

graph.are_connected(source_type='Person', source_id=1, target_type='Person', target_id=100)

Centrality Algorithms

All centrality methods return a list of dicts with type, title, id, and score keys, sorted by score descending.

graph.betweenness_centrality(top_k=10)
graph.betweenness_centrality(normalized=True, sample_size=500)
graph.pagerank(top_k=10, damping_factor=0.85)
graph.degree_centrality(top_k=10)
graph.closeness_centrality(top_k=10)

Community Detection

Identify clusters of densely connected nodes.

# Louvain modularity optimization (recommended)
result = graph.louvain_communities()
# {'communities': {0: [{title, type, id}, ...], 1: [...]},
#  'modularity': 0.45, 'num_communities': 2}

for comm_id, members in result['communities'].items():
    names = [m['title'] for m in members]
    print(f"Community {comm_id}: {names}")

# With edge weights and resolution tuning
result = graph.louvain_communities(weight_property='strength', resolution=1.5)

# Label propagation (faster, less precise)
result = graph.label_propagation(max_iterations=100)

Node Degrees

degrees = graph.type_filter('Person').get_degrees()
# Returns: {'Alice': 5, 'Bob': 3, ...}

Spatial Operations

Expand section

Bounding Box

graph.type_filter('Discovery').within_bounds(
    lat_field='latitude', lon_field='longitude',
    min_lat=58.0, max_lat=62.0, min_lon=1.0, max_lon=5.0
)

Distance Queries (Haversine)

graph.type_filter('Wellbore').near_point_km(
    center_lat=60.5, center_lon=3.2, max_distance_km=50.0,
    lat_field='latitude', lon_field='longitude'
)

WKT Geometry Intersection

graph.type_filter('Field').intersects_geometry(
    'POLYGON((1 58, 5 58, 5 62, 1 62, 1 58))',
    geometry_field='wkt_geometry'
)

Point-in-Polygon

graph.type_filter('Block').contains_point(lat=60.5, lon=3.2, geometry_field='wkt_geometry')

Analytics

Expand section

Statistics

price_stats = graph.type_filter('Product').statistics('price')
unique_cats = graph.type_filter('Product').unique_values(property='category', max_length=10)

Calculations

graph.type_filter('Product').calculate(expression='price * 1.1', store_as='price_with_tax')

graph.type_filter('User').traverse('PURCHASED').calculate(
    expression='sum(price * quantity)', store_as='total_spent'
)

graph.type_filter('User').traverse('PURCHASED').count(store_as='product_count', group_by_parent=True)

Connection Aggregation

graph.type_filter('Discovery').traverse('EXTENDS_INTO').calculate(
    expression='sum(share_pct)',
    aggregate_connections=True
)

Supported: sum, avg/mean, min, max, count, std.


Schema and Indexes

Expand section

Schema Definition

graph.define_schema({
    'nodes': {
        'Prospect': {
            'required': ['npdid_prospect', 'prospect_name'],
            'optional': ['prospect_status'],
            'types': {'npdid_prospect': 'integer', 'prospect_name': 'string'}
        }
    },
    'connections': {
        'HAS_ESTIMATE': {'source': 'Prospect', 'target': 'ProspectEstimate'}
    }
})

errors = graph.validate_schema()
schema = graph.get_schema()

Indexes

Indexes accelerate equality lookups only (WHERE n.prop = value). Range conditions (<, >, <=, >=) always scan.

graph.create_index('Prospect', 'prospect_geoprovince')
graph.create_composite_index('Person', ['city', 'age'])

graph.list_indexes()
graph.drop_index('Prospect', 'prospect_geoprovince')

Indexes are maintained automatically by all mutation operations.


Import and Export

Expand section

Saving and Loading

graph.save("my_graph.kgl")
loaded_graph = kglite.load("my_graph.kgl")

Portability: Save files use bincode serialization and are not guaranteed portable across OS, CPU architecture, or library versions. Always re-export via a portable format (GraphML, CSV) when sharing across machines. Each file includes a format version and the library version that wrote it — check with graph_info()['format_version'] and graph_info()['library_version'] after loading. If the internal data structures change between releases, loading will fail with a clear version mismatch error rather than silent corruption.

Export Formats

graph.export('my_graph.graphml', format='graphml')  # Gephi, yEd
graph.export('my_graph.gexf', format='gexf')        # Gephi native
graph.export('my_graph.json', format='d3')           # D3.js
graph.export('my_graph.csv', format='csv')           # creates _nodes.csv + _edges.csv

graphml_string = graph.export_string(format='graphml')

Subgraph Extraction

subgraph = (
    graph.type_filter('Company')
    .filter({'title': 'Acme Corp'})
    .expand(hops=2)
    .to_subgraph()
)
subgraph.export('acme_network.graphml', format='graphml')

Performance

Expand section

Tips

  1. Batch operations — add nodes/connections in batches, not individually
  2. Specify columns — only include columns you need to reduce memory
  3. Filter by type firsttype_filter() before filter() for narrower scans
  4. Create indexes — on frequently filtered equality conditions (~3x on 100k+ nodes; depends on selectivity)
  5. Use lightweight methodsnode_count(), indices(), get_node_by_id() skip property materialization
  6. Cypher LIMIT — use LIMIT to avoid scanning entire result sets

Lightweight Methods

Method Returns Speed
node_count() Integer count (total graph if no filter, selection count after filter) Fastest
indices() List of node indices Fast
id_values() List of ID values Fast
get_ids() List of {id, title, type} dicts Medium
get_nodes() List of full node dicts Slowest

Lightweight path methods: shortest_path_length(), shortest_path_indices(), shortest_path_ids().

Performance Model

kglite is optimized for knowledge graph workloads — complex multi-step queries on heterogeneous, property-rich graphs. Operations have overhead compared to raw graph algorithms because they build selections, materialize Python dicts, and support the full query API.

Speed claims caveat: The "~3x" index speedup was measured on equality-filtered queries over 100k+ node graphs. Actual improvement depends on graph size, selectivity, and property cardinality. On small graphs (<1k nodes) the overhead of index lookup may not be noticeable. Always benchmark on your own data.

Threading

Designed for single-threaded use. The Rust code does not release the Python GIL during operations. If you share a graph instance across threads, guard access with your own lock.


Graph Maintenance

Expand section

After heavy mutation workloads (DELETE, REMOVE), the internal graph storage accumulates tombstones. Use graph_info() to monitor storage health.

Diagnostics

info = graph.graph_info()
# {'node_count': 950, 'node_capacity': 1000, 'node_tombstones': 50,
#  'edge_count': 2800, 'fragmentation_ratio': 0.05,
#  'type_count': 3, 'property_index_count': 2, 'composite_index_count': 0}

Vacuum — Compact Storage

if info['fragmentation_ratio'] > 0.3:
    result = graph.vacuum()
    print(f"Reclaimed {result['tombstones_removed']} slots, remapped {result['nodes_remapped']} nodes")

vacuum() rebuilds the graph with contiguous indices and rebuilds all indexes. Resets the current selection — call between query chains.

Reindex — Rebuild Indexes

graph.reindex()

Recovery tool, not routine maintenance. Indexes are maintained automatically by all mutations. Use reindex() only if you suspect corruption (e.g., after a crash during save()).

Recommended Workflow

info = graph.graph_info()
if info['fragmentation_ratio'] > 0.3:
    graph.vacuum()

Operation Reports

Expand section

Operations that modify the graph return detailed reports:

report = graph.add_nodes(data=df, node_type='Product', unique_id_field='product_id')
print(f"Created {report['nodes_created']} nodes in {report['processing_time_ms']}ms")

if report['has_errors']:
    print(f"Errors: {report['errors']}")

Node report fields: operation, timestamp, nodes_created, nodes_updated, nodes_skipped, processing_time_ms, has_errors, errors.

Connection report fields: connections_created, connections_skipped, property_fields_tracked.

graph.get_last_report()
graph.get_operation_index()
graph.get_report_history()

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

kglite-0.5.1-cp313-cp313-win_amd64.whl (1.7 MB view details)

Uploaded CPython 3.13Windows x86-64

kglite-0.5.1-cp313-cp313-manylinux_2_35_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.35+ x86-64

kglite-0.5.1-cp313-cp313-macosx_11_0_arm64.whl (1.8 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

kglite-0.5.1-cp313-cp313-macosx_10_12_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.13macOS 10.12+ x86-64

kglite-0.5.1-cp312-cp312-win_amd64.whl (1.7 MB view details)

Uploaded CPython 3.12Windows x86-64

kglite-0.5.1-cp312-cp312-manylinux_2_35_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.35+ x86-64

kglite-0.5.1-cp312-cp312-macosx_11_0_arm64.whl (1.8 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

kglite-0.5.1-cp312-cp312-macosx_10_12_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.12macOS 10.12+ x86-64

kglite-0.5.1-cp311-cp311-win_amd64.whl (1.7 MB view details)

Uploaded CPython 3.11Windows x86-64

kglite-0.5.1-cp311-cp311-manylinux_2_35_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.35+ x86-64

kglite-0.5.1-cp311-cp311-macosx_11_0_arm64.whl (1.8 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

kglite-0.5.1-cp311-cp311-macosx_10_12_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.11macOS 10.12+ x86-64

kglite-0.5.1-cp310-cp310-win_amd64.whl (1.7 MB view details)

Uploaded CPython 3.10Windows x86-64

kglite-0.5.1-cp310-cp310-manylinux_2_35_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.35+ x86-64

kglite-0.5.1-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

kglite-0.5.1-cp310-cp310-macosx_10_12_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.10macOS 10.12+ x86-64

File details

Details for the file kglite-0.5.1-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: kglite-0.5.1-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kglite-0.5.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 d7e5a91103e46f9a9a16dea0577dc04e8121de33f47100699b11423f58c51dbf
MD5 7d2c2a9f753d08bdbddb639b6d309641
BLAKE2b-256 5319650e96d5132c176db86eae5918a600b377a85c48f00479096af78e1bcc57

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp313-cp313-manylinux_2_35_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp313-cp313-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 d0b7c577c208597acdd0cd42d2b8e222e2cb4b2cf1c93919c453026eb30e7487
MD5 d5b6acc926f2b41d304cbac7b821eb35
BLAKE2b-256 d594476840464077e5b0978f8c7d85d96e72b4fbbfe9df0b5ca386035882955d

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 15bde68a84b90132647e3c05d75ea22525255ee101089a13ea2bda2578d2d16e
MD5 e106f02b78c2fff6dd58f50785c3d711
BLAKE2b-256 881cd59176dafc5dfb77e465cfae5498d3b6d80fdbc530033f598ba212ab10da

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 26f96cb6f1db997f18ecbe73fbdcb7af2f28b592f35621855ee3f7179a39d51b
MD5 c68349f55714d77ee9838ed0975c6881
BLAKE2b-256 496c80efbc36ead905201ae4e64ee33192e062edde05aacf875dd7702dea07d4

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: kglite-0.5.1-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kglite-0.5.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 1a42f8e2832a6be85aa1c04f01c00a4b75b54952ceef4efb83bae6c57ba76af4
MD5 3b80ef10259e7d0c8114a05631637e24
BLAKE2b-256 e5c2a888fe7fdfbb585512f5780b8cbace70602daa79fdf30bbfcf852b6c2228

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp312-cp312-manylinux_2_35_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp312-cp312-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 6018870028c8e4e3bf2a987c82985dca7e37751da0bc9e932352f3d0bd6107d0
MD5 670378b435e225402093ef7955c9ddfa
BLAKE2b-256 ea7e54667f936940794c0d9caa435957f7c1734e77855d1d17fc49a62ba5c371

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 25cb3f6c18e26b4dcb1b89ccdaec63e46307d54b8f4da01d38439c35cdf45638
MD5 78293ad0f5be6136b956a85423088446
BLAKE2b-256 e9213d57eb30088dcff96a02cbaaa51972d97e2ea16c3e565d784fa2b4b8892d

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 78369c368dcd5b4f298ff209b46d5bf25694188e24771c8bc5737737e8c4fa4c
MD5 445973d13689a70bc9dd92979072e44e
BLAKE2b-256 14d9a2c3482d56cf04eccdb54f1ff8c27c6a612d24828b86c73d00b4e5cdaf53

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: kglite-0.5.1-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kglite-0.5.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a1e62af3c7a5f125164e77814228758775ae4822e53adb880c3d1216e7ead470
MD5 0309474ecadbc5c2436f78823b6a8d8e
BLAKE2b-256 366d8149ac9737b29f0593f7ada46af6a9846c926027dc2b620536a1b2bbd6a9

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp311-cp311-manylinux_2_35_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp311-cp311-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 5895691eae4be8793f9ac28528a0cc19537e9b63e2ccd2887088d3293f5fa070
MD5 e7736d8f429035c5f17d937a9914697e
BLAKE2b-256 1fa36b15285a5e2c226c6c60d1451cacdfc715821893c03744552da89288c9b1

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1650913340e3dae82b131f4fd5200b4e0039de34754c7a2771595fb35e771ab6
MD5 8e55784c25d8d53c47be365a36d766b0
BLAKE2b-256 ad7dee9ab667a8f311bfdd550670346bbe0fc8ec76fc5ccb08234d6602b53e07

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 eb5bcbff672e201a2ea21aaea81e72f2e4942f4ca19f7ae5a3a1918693086da6
MD5 445acf62eed8bb4c4d53dd71171d1ab6
BLAKE2b-256 8e95264c70094db5513e24839841daf19113ad2748ff6b9af624b84a0c34b1d0

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: kglite-0.5.1-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kglite-0.5.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 8d08be01f32004812cdbfa80c2b2275902e7f3a21b81a3a8224269ae6d131e1c
MD5 8b2ff6d68630e0c94fc12286df981191
BLAKE2b-256 82296b2a663a30f3e61b6f04e3aded947acf55ab64366ea7b886944e949688c8

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp310-cp310-manylinux_2_35_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 85701e5e5b986b6032d1b9e49c2f7e70c3bd22d312ce27081847589a42c69429
MD5 73377c0f6cd0001ff341d3d1259a2902
BLAKE2b-256 298afe4204b3f83b8d08bbc3e75881512c41a4866f8430c9f6cbbe399d62525b

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0531bda3ed132d3dcc58ac4acec343735337a9af049e1821368e92adb93735ad
MD5 b7525782be45d324c66ebffa0b537d8d
BLAKE2b-256 141272f0f2a8eaca47a4f14fee72ed8c2d94525f99439bbb81f2cd5edb32ad66

See more details on using hashes here.

File details

Details for the file kglite-0.5.1-cp310-cp310-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for kglite-0.5.1-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 50ef2949cb5b57cb3e095f34c9f1c73d7c13e7f55f2786c21f387d66b077fc4b
MD5 74209980951e468173e25f6e20ade24a
BLAKE2b-256 d6e567ff5caaf6b83cc21b159ae94afa31f0a73e2d33bcbe67ed2c005c0aa477

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page