Skip to main content

Create and transform graphs

Project description

linked

Create and transform graphs

To install: pip install linked

mini_dot_to_graph_jdict

Make graphs from the (mini) dot language.

>>> from linked import mini_dot_to_graph_jdict
>>> mini_dot_to_graph_jdict('''
... 1 -> 2
... 2, 3 -> 5, 6, 7
... ''')
{'nodes': [{'id': '1'},
{'id': '2'},
{'id': '5'},
{'id': '6'},
{'id': '7'},
{'id': '3'}],
'links': [{'source': '1', 'target': '2'},
{'source': '2', 'target': '5'},
{'source': '2', 'target': '6'},
{'source': '2', 'target': '7'},
{'source': '3', 'target': '5'},
{'source': '3', 'target': '6'},
{'source': '3', 'target': '7'}]}

linked.datasrc.vectors

Graph construction from high-dimensional vector data for network analysis and visualization.

This module provides tools to convert point/vector data into graph data (edge lists) suitable for network analysis, visualization, and dimensionality reduction. It focuses on sparse graph construction from large datasets (50K-500K points) with high dimensionality (up to 3000+ features).

Quick Start

import numpy as np
from linked import knn_graph, mutual_knn_graph, adaptive_knn_graph

# Generate sample data
vectors = np.random.rand(1000, 128)  # 1000 points, 128 dimensions

# Build a k-NN graph
graph = knn_graph(vectors, n_neighbors=15)
print(graph.shape)  # (15000, 3) - source, target, weight

# Build a mutual k-NN graph (more robust for high dimensions)
graph = mutual_knn_graph(vectors, n_neighbors=15, ensure_connectivity="mst")

# Build an adaptive k-NN graph (UMAP-style)
graph = adaptive_knn_graph(vectors, n_neighbors=15, local_connectivity=1)

Graph Construction Methods

1. k-Nearest Neighbors Graph

Connects each point to its k nearest neighbors.

from linked import knn_graph

graph = knn_graph(
    vectors,
    n_neighbors=15,           # Number of neighbors per point
    metric="euclidean",       # Distance metric
    mode="distance",          # "distance" or "connectivity"
    include_self=False,       # Include self-loops
    approximate=True,         # Use approximate nearest neighbors
    n_jobs=-1                 # Parallel processing
)

Output Format: Array of shape (n_edges, 3) with columns [source, target, weight] or (n_edges, 2) for connectivity mode.

2. Mutual k-Nearest Neighbors Graph

More robust for high-dimensional data. An edge exists only if both points are in each other's k-NN.

from linked import mutual_knn_graph

graph = mutual_knn_graph(
    vectors,
    n_neighbors=15,
    metric="euclidean",
    mode="distance",
    approximate=True,
    ensure_connectivity="mst",  # "none", "mst", or "nn"
    n_jobs=-1
)

Connectivity Options:

  • "none": May result in disconnected components
  • "mst": Add minimum spanning tree edges to ensure connectivity
  • "nn": Connect isolated vertices to their nearest neighbor

3. Epsilon-Neighborhood Graph

Connects all pairs of points within a radius.

from linked import epsilon_graph

graph = epsilon_graph(
    vectors,
    radius=0.5,              # Maximum distance for connections
    metric="euclidean",
    mode="distance"
)

4. Adaptive k-NN Graph

Uses local density scaling (UMAP-style) to adaptively weight edges based on local structure.

from linked import adaptive_knn_graph

graph = adaptive_knn_graph(
    vectors,
    n_neighbors=15,
    metric="euclidean",
    local_connectivity=1,    # Min strong connections per point
    bandwidth=1.0,           # Gaussian kernel bandwidth
    approximate=True
)

5. Random Graph Generation

Generate synthetic graphs for testing and benchmarking.

from linked import random_graph

# Erdos-Renyi: random edges with probability p
graph = random_graph(n_nodes=100, model="erdos_renyi", p=0.1, seed=42)

# Barabasi-Albert: preferential attachment
graph = random_graph(n_nodes=100, model="barabasi_albert", m=3, seed=42)

# Watts-Strogatz: small-world network
graph = random_graph(n_nodes=100, model="watts_strogatz", k=6, p=0.1, seed=42)

# Add random weights
graph = random_graph(n_nodes=100, model="erdos_renyi", p=0.1, weighted=True)

Sklearn-Style Estimators

For integration with sklearn pipelines:

from linked import KNNGraphEstimator, MutualKNNGraphEstimator

# Create estimator
estimator = KNNGraphEstimator(n_neighbors=15, approximate=True)

# Fit and transform
graph = estimator.fit_transform(vectors)

Available estimators:

  • KNNGraphEstimator
  • MutualKNNGraphEstimator
  • AdaptiveKNNGraphEstimator

Distance Metrics

All functions support various distance metrics via sklearn:

  • "euclidean" (default)
  • "manhattan"
  • "cosine"
  • "minkowski"
  • "chebyshev"
  • And many more from scipy.spatial.distance

Performance Considerations

Approximate vs. Exact

For large datasets, use approximate=True to enable approximate nearest neighbors via pynndescent:

# Fast approximate search
graph = knn_graph(vectors, n_neighbors=15, approximate=True)

# Slower exact search
graph = knn_graph(vectors, n_neighbors=15, approximate=False)

Approximate methods are typically 10-100x faster with 90-99% accuracy.

Parallel Processing

Use n_jobs=-1 to utilize all CPU cores:

graph = knn_graph(vectors, n_neighbors=15, n_jobs=-1)

Memory Management

For very large datasets (>500K points), consider:

  1. Using sparse output formats
  2. Processing in batches
  3. Using approximate methods
  4. Reducing n_neighbors

Output Format

All graph construction functions return numpy arrays:

With weights (mode="distance"):

array([[0, 5, 0.342],    # source=0, target=5, weight=0.342
       [0, 8, 0.521],    # source=0, target=8, weight=0.521
       [1, 3, 0.234],    # source=1, target=3, weight=0.234
       ...])

Without weights (mode="connectivity"):

array([[0, 5],    # source=0, target=5
       [0, 8],    # source=0, target=8
       [1, 3],    # source=1, target=3
       ...])

Integration with Visualization

Example: Use with force-directed layout:

import numpy as np
from linked import adaptive_knn_graph
import networkx as nx
import matplotlib.pyplot as plt

# Build graph
vectors = np.random.rand(100, 50)
edges = adaptive_knn_graph(vectors, n_neighbors=10)

# Convert to NetworkX
G = nx.Graph()
for source, target, weight in edges:
    G.add_edge(int(source), int(target), weight=weight)

# Layout and visualize
pos = nx.spring_layout(G)
nx.draw(G, pos, node_size=50, width=0.5)
plt.show()

Best Practices

  1. High-dimensional data: Use mutual_knn_graph or adaptive_knn_graph for better structure preservation

  2. Sparse graphs: Start with smaller n_neighbors (5-15) and increase if needed

  3. Connectivity: Use ensure_connectivity="mst" for visualization tasks

  4. Large datasets: Enable approximate=True and set appropriate n_jobs

  5. Distance metrics: Choose based on your data:

    • Euclidean: General-purpose, works well for normalized data
    • Cosine: Good for high-dimensional sparse data (e.g., text embeddings)
    • Manhattan: Robust to outliers

References

The algorithms implemented are based on recent research:

  • McInnes et al. (2018): UMAP - Uniform Manifold Approximation and Projection
  • Tang et al. (2016): LargeVis - Visualizing Large-scale High-dimensional Data
  • Jarvis & Patrick (1973): Clustering Using a Similarity Measure Based on Shared Near Neighbors
  • Belkin & Niyogi (2003): Laplacian Eigenmaps

For more details, see the included research document.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

linked-0.0.6.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

linked-0.0.6-py3-none-any.whl (17.8 kB view details)

Uploaded Python 3

File details

Details for the file linked-0.0.6.tar.gz.

File metadata

  • Download URL: linked-0.0.6.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for linked-0.0.6.tar.gz
Algorithm Hash digest
SHA256 b940b35f6cae633930d4916794631bf0b647cae6e906829997ef8eac702709ff
MD5 deed9fd0b01f8d6d246b72be9e9699b3
BLAKE2b-256 f636b0f26b7397056633fbacb65e4d48994d4ebc31e228cc28f6e8966705c630

See more details on using hashes here.

File details

Details for the file linked-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: linked-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 17.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for linked-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 bb53fefe015aedb2e901c54934c7a363bc32840ed13961e07813d940c08ebc11
MD5 f81f485f08a4b4acfd1c24d9b95201bf
BLAKE2b-256 50dfe092024137c4871008494a244da1bff9e53f37f82abca0ee8a7932f2331a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page