Create and transform graphs
Project description
linked
Create and transform graphs
To install: pip install linked
mini_dot_to_graph_jdict
Make graphs from the (mini) dot language.
>>> from linked import mini_dot_to_graph_jdict
>>> mini_dot_to_graph_jdict('''
... 1 -> 2
... 2, 3 -> 5, 6, 7
... ''')
{'nodes': [{'id': '1'},
{'id': '2'},
{'id': '5'},
{'id': '6'},
{'id': '7'},
{'id': '3'}],
'links': [{'source': '1', 'target': '2'},
{'source': '2', 'target': '5'},
{'source': '2', 'target': '6'},
{'source': '2', 'target': '7'},
{'source': '3', 'target': '5'},
{'source': '3', 'target': '6'},
{'source': '3', 'target': '7'}]}
linked.datasrc.vectors
Graph construction from high-dimensional vector data for network analysis and visualization.
This module provides tools to convert point/vector data into graph data (edge lists) suitable for network analysis, visualization, and dimensionality reduction. It focuses on sparse graph construction from large datasets (50K-500K points) with high dimensionality (up to 3000+ features).
Quick Start
import numpy as np
from linked import knn_graph, mutual_knn_graph, adaptive_knn_graph
# Generate sample data
vectors = np.random.rand(1000, 128) # 1000 points, 128 dimensions
# Build a k-NN graph
graph = knn_graph(vectors, n_neighbors=15)
print(graph.shape) # (15000, 3) - source, target, weight
# Build a mutual k-NN graph (more robust for high dimensions)
graph = mutual_knn_graph(vectors, n_neighbors=15, ensure_connectivity="mst")
# Build an adaptive k-NN graph (UMAP-style)
graph = adaptive_knn_graph(vectors, n_neighbors=15, local_connectivity=1)
Graph Construction Methods
1. k-Nearest Neighbors Graph
Connects each point to its k nearest neighbors.
from linked import knn_graph
graph = knn_graph(
vectors,
n_neighbors=15, # Number of neighbors per point
metric="euclidean", # Distance metric
mode="distance", # "distance" or "connectivity"
include_self=False, # Include self-loops
approximate=True, # Use approximate nearest neighbors
n_jobs=-1 # Parallel processing
)
Output Format: Array of shape (n_edges, 3) with columns [source, target, weight] or (n_edges, 2) for connectivity mode.
2. Mutual k-Nearest Neighbors Graph
More robust for high-dimensional data. An edge exists only if both points are in each other's k-NN.
from linked import mutual_knn_graph
graph = mutual_knn_graph(
vectors,
n_neighbors=15,
metric="euclidean",
mode="distance",
approximate=True,
ensure_connectivity="mst", # "none", "mst", or "nn"
n_jobs=-1
)
Connectivity Options:
"none": May result in disconnected components"mst": Add minimum spanning tree edges to ensure connectivity"nn": Connect isolated vertices to their nearest neighbor
3. Epsilon-Neighborhood Graph
Connects all pairs of points within a radius.
from linked import epsilon_graph
graph = epsilon_graph(
vectors,
radius=0.5, # Maximum distance for connections
metric="euclidean",
mode="distance"
)
4. Adaptive k-NN Graph
Uses local density scaling (UMAP-style) to adaptively weight edges based on local structure.
from linked import adaptive_knn_graph
graph = adaptive_knn_graph(
vectors,
n_neighbors=15,
metric="euclidean",
local_connectivity=1, # Min strong connections per point
bandwidth=1.0, # Gaussian kernel bandwidth
approximate=True
)
5. Random Graph Generation
Generate synthetic graphs for testing and benchmarking.
from linked import random_graph
# Erdos-Renyi: random edges with probability p
graph = random_graph(n_nodes=100, model="erdos_renyi", p=0.1, seed=42)
# Barabasi-Albert: preferential attachment
graph = random_graph(n_nodes=100, model="barabasi_albert", m=3, seed=42)
# Watts-Strogatz: small-world network
graph = random_graph(n_nodes=100, model="watts_strogatz", k=6, p=0.1, seed=42)
# Add random weights
graph = random_graph(n_nodes=100, model="erdos_renyi", p=0.1, weighted=True)
Sklearn-Style Estimators
For integration with sklearn pipelines:
from linked import KNNGraphEstimator, MutualKNNGraphEstimator
# Create estimator
estimator = KNNGraphEstimator(n_neighbors=15, approximate=True)
# Fit and transform
graph = estimator.fit_transform(vectors)
Available estimators:
KNNGraphEstimatorMutualKNNGraphEstimatorAdaptiveKNNGraphEstimator
Distance Metrics
All functions support various distance metrics via sklearn:
"euclidean"(default)"manhattan""cosine""minkowski""chebyshev"- And many more from
scipy.spatial.distance
Performance Considerations
Approximate vs. Exact
For large datasets, use approximate=True to enable approximate nearest neighbors via pynndescent:
# Fast approximate search
graph = knn_graph(vectors, n_neighbors=15, approximate=True)
# Slower exact search
graph = knn_graph(vectors, n_neighbors=15, approximate=False)
Approximate methods are typically 10-100x faster with 90-99% accuracy.
Parallel Processing
Use n_jobs=-1 to utilize all CPU cores:
graph = knn_graph(vectors, n_neighbors=15, n_jobs=-1)
Memory Management
For very large datasets (>500K points), consider:
- Using sparse output formats
- Processing in batches
- Using approximate methods
- Reducing
n_neighbors
Output Format
All graph construction functions return numpy arrays:
With weights (mode="distance"):
array([[0, 5, 0.342], # source=0, target=5, weight=0.342
[0, 8, 0.521], # source=0, target=8, weight=0.521
[1, 3, 0.234], # source=1, target=3, weight=0.234
...])
Without weights (mode="connectivity"):
array([[0, 5], # source=0, target=5
[0, 8], # source=0, target=8
[1, 3], # source=1, target=3
...])
Integration with Visualization
Example: Use with force-directed layout:
import numpy as np
from linked import adaptive_knn_graph
import networkx as nx
import matplotlib.pyplot as plt
# Build graph
vectors = np.random.rand(100, 50)
edges = adaptive_knn_graph(vectors, n_neighbors=10)
# Convert to NetworkX
G = nx.Graph()
for source, target, weight in edges:
G.add_edge(int(source), int(target), weight=weight)
# Layout and visualize
pos = nx.spring_layout(G)
nx.draw(G, pos, node_size=50, width=0.5)
plt.show()
Best Practices
-
High-dimensional data: Use
mutual_knn_graphoradaptive_knn_graphfor better structure preservation -
Sparse graphs: Start with smaller
n_neighbors(5-15) and increase if needed -
Connectivity: Use
ensure_connectivity="mst"for visualization tasks -
Large datasets: Enable
approximate=Trueand set appropriaten_jobs -
Distance metrics: Choose based on your data:
- Euclidean: General-purpose, works well for normalized data
- Cosine: Good for high-dimensional sparse data (e.g., text embeddings)
- Manhattan: Robust to outliers
References
The algorithms implemented are based on recent research:
- McInnes et al. (2018): UMAP - Uniform Manifold Approximation and Projection
- Tang et al. (2016): LargeVis - Visualizing Large-scale High-dimensional Data
- Jarvis & Patrick (1973): Clustering Using a Similarity Measure Based on Shared Near Neighbors
- Belkin & Niyogi (2003): Laplacian Eigenmaps
For more details, see the included research document.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file linked-0.0.6.tar.gz.
File metadata
- Download URL: linked-0.0.6.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b940b35f6cae633930d4916794631bf0b647cae6e906829997ef8eac702709ff
|
|
| MD5 |
deed9fd0b01f8d6d246b72be9e9699b3
|
|
| BLAKE2b-256 |
f636b0f26b7397056633fbacb65e4d48994d4ebc31e228cc28f6e8966705c630
|
File details
Details for the file linked-0.0.6-py3-none-any.whl.
File metadata
- Download URL: linked-0.0.6-py3-none-any.whl
- Upload date:
- Size: 17.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bb53fefe015aedb2e901c54934c7a363bc32840ed13961e07813d940c08ebc11
|
|
| MD5 |
f81f485f08a4b4acfd1c24d9b95201bf
|
|
| BLAKE2b-256 |
50dfe092024137c4871008494a244da1bff9e53f37f82abca0ee8a7932f2331a
|