A graph analytics engine built directly on Apache Arrow

Lynxes

A Fast, Zero-Copy Graph Analytics Engine Built Natively on Apache Arrow.

Why Lynxes | Quickstart | API Overview | Architecture

Lynxes is a fast, lazily evaluated graph analytics engine. Instead of wrapping general-purpose Python data structures, it implements a graph-native engine directly over Arrow, so queries never pay the overhead of going through NetworkX or igraph.

Why Lynxes

  • Zero-Copy Arrow Backing: NodeFrame and EdgeFrame directly own Apache Arrow RecordBatch data. No intermediate copies, no Pandas/Polars dependency.
  • Graph Structure as a First-Class Citizen: EdgeFrame always maintains a Compressed Sparse Row (CSR) index. Neighbor lookups are O(degree) from day one, with no full table scans.
  • Lazy by Default: No computation happens until you call .collect(). The built-in optimizer runs Predicate Pushdown, Projection Pushdown, Traversal Pruning, Subgraph Caching, and Early Termination before execution.
  • Language-Agnostic Core: The query engine, storage engine, and graph algorithms are written entirely in Rust. Python is a thin PyO3 wrapper designed to add no overhead of its own.
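
To make the CSR claim above concrete, here is a minimal pure-Python sketch of a Compressed Sparse Row index. It is not Lynxes code: the function names (build_csr, neighbors) and the integer node ids are assumptions made for illustration, and the real index lives in Rust over Arrow buffers.

```python
from typing import List, Sequence, Tuple


def build_csr(n_nodes: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR index (offsets, targets) from (src, dst) pairs.

    Hypothetical helper, not part of the Lynxes API.
    """
    # Count the out-degree of every source node.
    offsets = [0] * (n_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # A prefix sum turns per-node counts into slice boundaries.
    for i in range(n_nodes):
        offsets[i + 1] += offsets[i]
    # Scatter each destination into its source node's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Out-neighbors of `node`: one slice, so the cost is O(degree)."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets)                          # [0, 2, 3, 3, 4]
print(neighbors(offsets, targets, 0))   # [1, 2]
```

The lookup touches only the slice belonging to one node, which is why it scales with that node's degree rather than with the total number of edges.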

Quickstart

Install

pip install lynxes
# or
uv add lynxes

Build from source

git clone https://github.com/your-org/lynxes
cd lynxes/py-lynxes
uv run maturin develop --release

Python API

import lynxes as lx

# Load from .gf text, .gfb binary, or Parquet
g = lx.read_gf("graph.gf")
# g = lx.read_parquet_graph("nodes.parquet", "edges.parquet")
# g = lx.read_gfb("graph.gfb")

# Build a lazy plan — nothing executes yet
result = (
    g.lazy()
    .filter_nodes(lx.col("age") > 25)
    .expand("KNOWS", hops=2, direction="out")
    .aggregate_neighbors("KNOWS", lx.count().alias("friend_count"))
    .sort("friend_count", descending=True)
    .limit(10)
    .collect()
)

print(result)
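
The idea behind "nothing executes yet" can be sketched without Lynxes at all. The class below is a hypothetical stand-in (LazyRows is not a Lynxes type) that only records steps and runs them when collect() is called; it works on plain dictionaries rather than Arrow data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


@dataclass
class LazyRows:
    """Hypothetical lazy frame: records steps, runs them on collect()."""

    source: List[Row]
    steps: List[Step] = field(default_factory=list)

    def _then(self, name: str, fn: Callable[[List[Row]], List[Row]]) -> "LazyRows":
        # Return a new plan so earlier plans stay reusable.
        return LazyRows(self.source, self.steps + [(name, fn)])

    def filter(self, pred: Callable[[Row], bool]) -> "LazyRows":
        return self._then("filter", lambda rows: [r for r in rows if pred(r)])

    def sort(self, key: str, descending: bool = False) -> "LazyRows":
        return self._then(
            "sort", lambda rows: sorted(rows, key=lambda r: r[key], reverse=descending)
        )

    def limit(self, n: int) -> "LazyRows":
        return self._then("limit", lambda rows: rows[:n])

    def explain(self) -> str:
        # Describe the recorded plan without running it.
        return " -> ".join(["scan"] + [name for name, _ in self.steps])

    def collect(self) -> List[Row]:
        # Only here does any work happen.
        rows = list(self.source)
        for _, fn in self.steps:
            rows = fn(rows)
        return rows


people = [
    {"name": "alice", "age": 31},
    {"name": "bob", "age": 22},
    {"name": "carol", "age": 45},
]
plan = LazyRows(people).filter(lambda r: r["age"] > 25).sort("age", descending=True).limit(1)
print(plan.explain())   # scan -> filter -> sort -> limit
print(plan.collect())   # [{'name': 'carol', 'age': 45}]
```

Because the plan is just data until collect() runs, an optimizer has the chance to reorder or prune steps first.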

Pattern Matching

Cypher-like pattern matching over the lazy execution engine:

result = (
    g.lazy()
    .match_pattern(
        [
            lx.node("person", "Person"),
            lx.edge("WORKS_AT"),
            lx.node("company", "Company"),
        ],
        where_=lx.col("person.age") > 25,
    )
    .collect()
)
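
For readers unfamiliar with pattern matching, the following sketch shows what a single-hop pattern of the shape (Person)-[WORKS_AT]->(Company) means on a toy in-memory graph. The data layout and the match_one_hop helper are assumptions for illustration only; they say nothing about how Lynxes evaluates patterns internally.

```python
# Toy graph: node id -> properties, plus (src, type, dst) edge triples.
nodes = {
    "alice": {"label": "Person", "age": 31},
    "bob": {"label": "Person", "age": 22},
    "acme": {"label": "Company"},
}
edges = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
]


def match_one_hop(nodes, edges, src_label, edge_type, dst_label, where=None):
    """Return (src, dst) pairs matching (src_label)-[edge_type]->(dst_label).

    Hypothetical helper; `where` is an optional predicate over both endpoints.
    """
    matches = []
    for src, etype, dst in edges:
        if etype != edge_type:
            continue
        if nodes[src]["label"] != src_label or nodes[dst]["label"] != dst_label:
            continue
        if where is None or where(nodes[src], nodes[dst]):
            matches.append((src, dst))
    return matches


result = match_one_hop(
    nodes, edges, "Person", "WORKS_AT", "Company",
    where=lambda person, company: person["age"] > 25,
)
print(result)  # [('alice', 'acme')]
```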

Graph Algorithms

# PageRank
pr = g.pagerank()                          # → NodeFrame with 'pagerank' column

# Shortest path
path = g.shortest_path("alice", "charlie") # → ["alice", "bob", "charlie"]

# Connected components
cc = g.connected_components()              # → NodeFrame with 'component_id' column

# Betweenness centrality
bc = g.betweenness_centrality()

# Community detection (Louvain / Label Propagation)
cm = g.community_detection()
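
As a reference point for two of the calls above, here is a plain-Python sketch of unweighted shortest path and connected components, both via breadth-first search. It assumes an adjacency-dict graph and treats components as weakly connected (edge direction ignored); whether Lynxes uses the same conventions is not stated in this README.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Unweighted shortest path via BFS; returns [] if dst is unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


def connected_components(adj):
    """Map each node to a component id, ignoring edge direction."""
    undirected = {}
    for u, vs in adj.items():
        undirected.setdefault(u, set())
        for v in vs:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)
    component = {}
    next_id = 0
    for start in undirected:
        if start in component:
            continue
        component[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in undirected[node]:
                if nxt not in component:
                    component[nxt] = next_id
                    queue.append(nxt)
        next_id += 1
    return component


adj = {"alice": ["bob"], "bob": ["charlie"], "charlie": [], "dave": ["erin"]}
print(shortest_path(adj, "alice", "charlie"))  # ['alice', 'bob', 'charlie']
print(connected_components(adj))
```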

Remote Connectors

# Neo4j (Cypher)
g = lx.read_neo4j("bolt://localhost:7687", "neo4j", "password")

# ArangoDB (AQL)
g = lx.read_arangodb(
    endpoint="http://localhost:8529",
    database="mydb",
    graph="social",
    vertex_collection="persons",
    edge_collection="knows",
)

# SPARQL endpoint
g = lx.read_sparql(
    endpoint="https://dbpedia.org/sparql",
    node_template="SELECT ?id WHERE { ?id a <Thing> }",
    edge_template="SELECT ?s ?o WHERE { ?s ?p ?o }",
)

Distributed Graph Partitioning

# Partition a large graph across N shards
pg = g.partition(4, strategy="hash")   # or "range" / "label"
print(pg.n_shards)                     # 4
print(pg.stats())                      # imbalance ratio, boundary edges, …

# BFS across shard boundaries
nodes, edges = pg.distributed_expand(["alice"], hops=2, direction="out")

# Merge shards back into one GraphFrame
merged = pg.merge()
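
To illustrate what a "hash" strategy and the reported statistics could look like, here is a small sketch. It assumes CRC32 as the hash and defines imbalance as the largest shard size divided by the mean shard size; both choices are illustrative assumptions, not the definitions Lynxes uses.

```python
import zlib


def shard_of(node_id: str, n_shards: int) -> int:
    # crc32 is stable across processes, unlike Python's salted built-in hash().
    return zlib.crc32(node_id.encode("utf-8")) % n_shards


def partition_stats(nodes, edges, n_shards):
    """Shard sizes, boundary-edge count, and imbalance (max size / mean size)."""
    sizes = [0] * n_shards
    for node in nodes:
        sizes[shard_of(node, n_shards)] += 1
    # A boundary edge has its endpoints in different shards.
    boundary = sum(
        1 for src, dst in edges if shard_of(src, n_shards) != shard_of(dst, n_shards)
    )
    mean = len(nodes) / n_shards
    imbalance = max(sizes) / mean if mean else 0.0
    return {"sizes": sizes, "boundary_edges": boundary, "imbalance": imbalance}


nodes = ["alice", "bob", "charlie", "dave", "erin", "frank"]
edges = [("alice", "bob"), ("bob", "charlie"), ("dave", "erin"), ("erin", "frank")]
stats = partition_stats(nodes, edges, n_shards=4)
print(stats)
```

Boundary edges are the ones a cross-shard traversal has to follow between shards, so a partitioning with fewer of them generally needs less coordination.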

CLI

# Inspect a .gfb file
lynxes inspect graph.gfb

# Convert formats
lynxes convert graph.gf graph.gfb

# Run a filter query
lynxes query graph.gfb --filter "age > 25" --limit 10

API Overview

Top-level functions

| Function | Description |
| --- | --- |
| lx.read_gf(path) | Load a .gf text graph |
| lx.read_gfb(path) | Load a .gfb binary graph |
| lx.read_parquet_graph(nodes, edges) | Load from Parquet files |
| lx.read_neo4j(uri, user, password) | Connect to Neo4j |
| lx.read_arangodb(...) | Connect to ArangoDB |
| lx.read_sparql(endpoint, ...) | Connect to SPARQL endpoint |
| lx.col(name) | Create a column expression |
| lx.count() / lx.sum(e) / lx.mean(e) | Aggregation expressions |
| lx.node(alias, label?) | Pattern node descriptor |
| lx.edge(type?) | Pattern edge descriptor |
| lx.partition_graph(g, n) | Partition a GraphFrame |

GraphFrame methods

| Method | Returns |
| --- | --- |
| .lazy() | LazyGraphFrame |
| .nodes() / .edges() | NodeFrame / EdgeFrame |
| .node_count() / .edge_count() | int |
| .subgraph(ids) / .subgraph_by_label(l) | GraphFrame |
| .pagerank(...) | NodeFrame |
| .shortest_path(src, dst) | list[str] |
| .connected_components() | NodeFrame |
| .betweenness_centrality() | NodeFrame |
| .community_detection() | NodeFrame |
| .partition(n, strategy) | PartitionedGraph |
| .write_gf(path) / .write_gfb(path) | |
| .write_parquet_graph(nodes, edges) | |

LazyGraphFrame methods

| Method | Description |
| --- | --- |
| .filter_nodes(expr) | Keep nodes matching expression |
| .filter_edges(expr) | Keep edges matching expression |
| .select_nodes(cols) / .select_edges(cols) | Project columns |
| .expand(type?, hops, direction) | BFS graph traversal |
| .aggregate_neighbors(type, agg) | Aggregate over neighbor edges |
| .match_pattern(steps, where_?) | Cypher-like pattern matching |
| .sort(by, descending) | Sort result |
| .limit(n) | Cap result size |
| .explain() | Print logical plan |
| .collect() | Execute → GraphFrame |
| .collect_nodes() | Execute → NodeFrame |
| .collect_edges() | Execute → EdgeFrame |
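
The .expand row above describes a hop-limited BFS. As a rough illustration of the semantics one might expect from its type, hops, and direction parameters, here is a pure-Python sketch. The expand function below is hypothetical, and the choice to include the seed nodes in the result is an assumption rather than documented Lynxes behavior.

```python
def expand(edges, seeds, hops, direction="out", edge_type=None):
    """Return all nodes within `hops` steps of `seeds` (seeds included).

    `edges` is a list of (src, type, dst) triples; `direction` is
    "out", "in", or "both". Hypothetical helper for illustration.
    """
    if direction not in ("out", "in", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    # Build the adjacency needed for the requested direction only.
    adj = {}
    for src, etype, dst in edges:
        if edge_type is not None and etype != edge_type:
            continue
        if direction in ("out", "both"):
            adj.setdefault(src, []).append(dst)
        if direction in ("in", "both"):
            adj.setdefault(dst, []).append(src)
    visited = set(seeds)
    frontier = list(seeds)
    # One BFS level per hop.
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for nxt in adj.get(node, []):
                if nxt not in visited:
                    visited.add(nxt)
                    next_frontier.append(nxt)
        frontier = next_frontier
    return visited


edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "charlie"),
    ("charlie", "KNOWS", "dave"),
    ("alice", "WORKS_AT", "acme"),
]
reached = expand(edges, ["alice"], hops=2, direction="out", edge_type="KNOWS")
print(sorted(reached))  # ['alice', 'bob', 'charlie']
```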

Architecture

Lynxes is organized as a multi-crate Rust workspace with a thin Python layer on top:

py-lynxes/                ← Python package (maturin / PyO3)
  src/lynxes/             ← lynxes Python namespace
  tests/unit/             ← pytest integration tests
  tests/benchmark/        ← NetworkX / igraph comparisons

crates/
  lynxes/                 ← Umbrella re-export crate
  lynxes-core/            ← Arrow frames, CSR index, algorithms,
  │                           expression types, logical plan, optimizer
  lynxes-plan/            ← Logical plan re-exports (thin)
  lynxes-io/              ← File I/O (.gf parser, .gfb binary, Parquet)
  lynxes-connect/         ← Remote connectors (Neo4j, ArangoDB,
  │                           SPARQL, Arrow Flight, GFConnector)
  lynxes-lazy/            ← LazyGraphFrame + query executor
  lynxes-python/          ← PyO3 binding crate (_lynxes.so)
  lynxes-cli/             ← `lynxes` command-line tool

Execution Pipeline

Python call
    │
    ▼
LazyGraphFrame (plan tree)
    │
    ▼
Optimizer ──── PredicatePushdown
            ── ProjectionPushdown
            ── TraversalPruning
            ── SubgraphCaching
            ── EarlyTermination
    │
    ▼
Executor ─────────────────────────────────────┐
    │                                         │
    ▼                                         ▼
NodeFrame / EdgeFrame                  CSR Index (O(degree))
(Arrow RecordBatch)                    BFS / Traversal / Algorithms
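
One of the optimizer stages in the diagram, predicate pushdown, can be sketched on a toy plan. The example below represents a plan as a list of (operation, argument) pairs and moves a filter ahead of a preceding sort, which is safe because filtering and sorting commute. The plan encoding and function names are assumptions for illustration; the actual optimizer operates on a Rust LogicalPlan.

```python
def push_filters_down(plan):
    """Move each filter ahead of any sort steps directly before it."""
    optimized = []
    for step in plan:
        optimized.append(step)
        i = len(optimized) - 1
        # Filtering commutes with sorting, so the swap preserves the result.
        # It does not commute with limit, so the loop stops at anything else.
        while step[0] == "filter" and i > 0 and optimized[i - 1][0] == "sort":
            optimized[i - 1], optimized[i] = optimized[i], optimized[i - 1]
            i -= 1
    return optimized


def execute(rows, plan):
    """Run a toy plan over a list of dict rows."""
    for op, arg in plan:
        if op == "filter":
            rows = [r for r in rows if arg(r)]
        elif op == "sort":
            rows = sorted(rows, key=lambda r: r[arg])
        elif op == "limit":
            rows = rows[:arg]
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return rows


def older_than_25(row):
    return row["age"] > 25


rows = [{"age": a} for a in (40, 18, 33, 25)]
plan = [("sort", "age"), ("filter", older_than_25), ("limit", 2)]
optimized = push_filters_down(plan)

print([op for op, _ in optimized])   # ['filter', 'sort', 'limit']
print(execute(rows, optimized))      # [{'age': 33}, {'age': 40}]
```

Running the filter first means the sort handles two rows instead of four; on large frames the same reordering shrinks the input of every later step.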

Crate Dependency Graph

lynxes-python ──┐
lynxes-cli    ──┤
                ├──► lynxes-lazy ──► lynxes-connect ──┐
                │                                      ├──► lynxes-io ──┐
                │                                      └──► lynxes-plan ─┤
                │                                                        ├──► lynxes-core
                └───────────────────────────────────────────────────────►┘
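
The dependency diagram above implies a bottom-up build order. The sketch below encodes one reading of the arrows as a dictionary (each crate mapped to the crates it depends on) and derives a valid order with the standard library's graphlib. The edge list is an interpretation of the ASCII diagram, not something extracted from the workspace's Cargo manifests.

```python
from graphlib import TopologicalSorter

# crate -> crates it depends on (one reading of the diagram above)
deps = {
    "lynxes-python": {"lynxes-lazy", "lynxes-core"},
    "lynxes-cli": {"lynxes-lazy", "lynxes-core"},
    "lynxes-lazy": {"lynxes-connect"},
    "lynxes-connect": {"lynxes-io", "lynxes-plan"},
    "lynxes-io": {"lynxes-core"},
    "lynxes-plan": {"lynxes-core"},
    "lynxes-core": set(),
}

# static_order() yields every crate after all of its dependencies.
build_order = list(TopologicalSorter(deps).static_order())
print(build_order[0])  # lynxes-core
print(build_order)
```

Only the first entry is fixed by the graph; crates with no ordering constraint between them (for example lynxes-io and lynxes-plan) may appear in either order.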

Documentation Map

  • DESIGN.md — In-depth architectural design and engine principles
  • docs/spec/ — Feature and restructure specifications
  • py-lynxes/tests/benchmark/ — Performance benchmarks vs NetworkX / igraph

Contributing

Please read DESIGN.md first. Core principles that are non-negotiable:

  1. Never wrap Polars: NodeFrame/EdgeFrame own Arrow RecordBatch directly
  2. CSR is mandatory: EdgeFrame always holds a CSR index; no linear scan fallbacks
  3. Lazy by default: All operations build a LogicalPlan; execution happens only on .collect()
  4. No optimization without measurement: Run cargo bench before claiming speedups

Download files

Source Distribution

  • lynxes-1.3.3.tar.gz (240.9 kB): Source

Built Distributions

  • lynxes-1.3.3-cp310-abi3-win_amd64.whl (5.1 MB): CPython 3.10+, Windows x86-64
  • lynxes-1.3.3-cp310-abi3-manylinux_2_28_x86_64.whl (5.8 MB): CPython 3.10+, manylinux: glibc 2.28+ x86-64
  • lynxes-1.3.3-cp310-abi3-manylinux_2_28_aarch64.whl (5.5 MB): CPython 3.10+, manylinux: glibc 2.28+ ARM64
  • lynxes-1.3.3-cp310-abi3-macosx_11_0_arm64.whl (5.0 MB): CPython 3.10+, macOS 11.0+ ARM64

File details

Details for the file lynxes-1.3.3.tar.gz.

File metadata

  • Download URL: lynxes-1.3.3.tar.gz
  • Upload date:
  • Size: 240.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lynxes-1.3.3.tar.gz
Algorithm Hash digest
SHA256 42c5d4f9eadd78364a37d8d66958cd11cf2c982d21c908a19c4f1195827fb406
MD5 f8dd7b0b886447eae5290fc16fc53cf2
BLAKE2b-256 fc65ef701b75d57e031867c6e601c32974035eb6bf5fb5aa6a19d12c25c5c5c6

See more details on using hashes here.
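
A downloaded file can be checked against a published SHA256 digest with the Python standard library alone. This is a generic sketch: the helper names and the local file path in the usage note are assumptions, and the expected digest is whichever value is listed for the file you downloaded.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, matches_digest("lynxes-1.3.3.tar.gz", "<digest from the table above>") would return True only for an unmodified download, assuming the archive was saved under that name in the current directory.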

Provenance

The following attestation bundles were made for lynxes-1.3.3.tar.gz:

Publisher: release.yml on eastlighting1/Lynxes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lynxes-1.3.3-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: lynxes-1.3.3-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 5.1 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lynxes-1.3.3-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 50a700b10dda0c9a5d90f36f72f4595d4b6d21abdf7d5661d21daab7476c262e
MD5 5c8148b9886b381e6b67b3d513d3aa88
BLAKE2b-256 1a3e6674fc9f22d7008d48cd3f22a162ad1e112ab0d92cff705f17df4df2213e

Provenance

The following attestation bundles were made for lynxes-1.3.3-cp310-abi3-win_amd64.whl:

Publisher: release.yml on eastlighting1/Lynxes

File details

Details for the file lynxes-1.3.3-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for lynxes-1.3.3-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0f4c724bf8e1070e957f805f07ca9c40b2f254feb5c1df8d5d894fed50ceb33b
MD5 ca3b549b1da6be39b0f9af3fc4c57af7
BLAKE2b-256 67af087aa1c43dca9547652d1e7fd0dae96641a03eb3116ef8feddf38fce65f6

Provenance

The following attestation bundles were made for lynxes-1.3.3-cp310-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on eastlighting1/Lynxes

File details

Details for the file lynxes-1.3.3-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for lynxes-1.3.3-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f0990bdacbf4aa1096d50cfa77ed81acabce726968879a34eb5d93495a0c3336
MD5 2232f5b9fb04b183017e5b17062502e8
BLAKE2b-256 d16a8da4c9f2536caac67bd112c772a4114bdb2daea5f7016a4dad3f4a374755

Provenance

The following attestation bundles were made for lynxes-1.3.3-cp310-abi3-manylinux_2_28_aarch64.whl:

Publisher: release.yml on eastlighting1/Lynxes

File details

Details for the file lynxes-1.3.3-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for lynxes-1.3.3-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3687c6867b829caeecc8fd21fdba2c4b715935ccceab62f6923a4287a76c17f9
MD5 915eb80a604bb121af6dbaedb3b9ddc2
BLAKE2b-256 7749a4629f651f449cb6ff3ba6f44ce663ea8c6cb8a534b1f6deca4ad27a67fd

Provenance

The following attestation bundles were made for lynxes-1.3.3-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on eastlighting1/Lynxes
