Semantic Networks from Embeddings

Semnet: Graph structures from embeddings

Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph

Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over embedded documents, images, and more.

Semnet uses Annoy to find each embedding's approximate nearest neighbors efficiently, rather than exhaustively computing every pair-wise distance, then constructs NetworkX graphs whose edges represent relationships between similar embeddings.
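
Semnet's own code is not reproduced here. To illustrate the idea, the sketch below builds the same kind of graph by brute force with NumPy and NetworkX. It assumes that edges join pairs whose cosine similarity is at least thresh and that each edge stores that similarity under a "weight" attribute; the function name threshold_graph is hypothetical. Because it compares every pair, it has exactly the quadratic cost that Annoy's approximate search avoids, so treat it as a small-data reference rather than a replacement.

```python
import numpy as np
import networkx as nx


def threshold_graph(embeddings, labels, thresh=0.3):
    """Hypothetical brute-force stand-in for a semantic network builder.

    Assumption: an edge joins every pair whose cosine similarity is >= thresh.
    This is an O(n^2) comparison, which an approximate index such as Annoy avoids.
    """
    X = np.asarray(embeddings, dtype=float)
    # Normalise rows so that dot products equal cosine similarities
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T

    G = nx.Graph()
    for i, label in enumerate(labels):
        G.add_node(i, label=label)

    # Upper-triangle indices (k=1) skip self-pairs and duplicate (j, i) pairs
    rows, cols = np.triu_indices(len(X), k=1)
    keep = sims[rows, cols] >= thresh
    for i, j in zip(rows[keep], cols[keep]):
        G.add_edge(int(i), int(j), weight=float(sims[i, j]))
    return G


# Toy 2-D "embeddings": the first two point in almost the same direction
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
G = threshold_graph(emb, ["cat on mat", "cat sat on mat", "Python"], thresh=0.8)
print(list(G.edges(data=True)))  # only nodes 0 and 1 are linked
```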

Use cases

Semnet may be used for:

Deduplication: remove duplicate or near-duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
Clustering: find groups of similar documents via community detection algorithms
Recommendation systems: account for relationships between items, and take advantage of graph structures such as communities and paths in search and retrieval-augmented generation (RAG)
Knowledge graph construction: build networks of related concepts or entities; because the result is a regular NetworkX graph, it's easy to add further entities
Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
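
The first two use cases map directly onto standard NetworkX algorithms. The sketch below uses hypothetical helper names and assumes a graph like the one the Quick Start builds (integer nodes carrying a "label" attribute). It deduplicates by keeping one representative per connected component, and clusters with Louvain community detection (louvain_communities, available in recent NetworkX releases).

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def deduplicate(G, label_key="label"):
    """Keep one label per connected component (hypothetical helper).

    Assumes each node has a text label under label_key, as in the Quick Start.
    The representative is the highest-degree node, i.e. the record with the
    most near-duplicates in its group.
    """
    kept = []
    for component in nx.connected_components(G):
        representative = max(component, key=G.degree)
        kept.append(G.nodes[representative][label_key])
    return kept


def cluster(G, seed=0):
    """Group nodes into communities via Louvain modularity optimisation."""
    return louvain_communities(G, seed=seed)


# Toy graph: three spellings of one name, plus an unrelated document
G = nx.Graph()
for i, text in enumerate(["Donald Trump", "Donald J. Trump", "Trump, Donald", "I love Python"]):
    G.add_node(i, label=text)
G.add_edges_from([(0, 1), (0, 2)])
print(deduplicate(G))  # ['Donald Trump', 'I love Python']

# Two tightly knit triangles joined by a single bridging edge
H = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(cluster(H))
```

Connected components are transitive: if A resembles B and B resembles C, all three collapse together even when A and C differ, so deduplication usually calls for a stricter thresh than exploration or clustering.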

Because it exposes the full NetworkX and Annoy APIs, Semnet offers plenty of room for experimentation, depending on your use case. Check out the examples for inspiration.

Quick Start

from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer
import networkx as nx

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build the semantic graph from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Analyze the graph
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Connected components: {nx.number_connected_components(G)}")

# Find similar document groups
for component in nx.connected_components(G):
    if len(component) > 1:
        similar_docs = [G.nodes[i]["label"] for i in component]
        print(f"Similar documents: {similar_docs}")

# Calculate centrality measures.
# Degree centrality isn't very informative in this small example, but is shown here for demonstration
centrality = nx.degree_centrality(G)
for node, cent_value in centrality.items():
    print(f"Document: {G.nodes[node]['label']}, Degree Centrality: {cent_value:.4f}")
    G.nodes[node]["degree_centrality"] = cent_value

# Export to pandas
nodes_df, edges_df = sem.to_pandas(G)

Installation

pip install semnet

For development:

git clone https://github.com/specialprocedures/semnet.git
cd semnet
pip install -e ".[dev]"

Configuration Options

SemanticNetwork Parameters

metric: Distance metric for the Annoy index ('angular', 'euclidean', etc.) (default: 'angular')
n_trees: Number of trees for the Annoy index (more trees give better accuracy but a slower build and a larger index) (default: 10)
thresh: Similarity threshold (0.0 to 1.0); larger values give sparser networks (default: 0.3)
top_k: Maximum number of neighbors to check per document (default: 100)
verbose: Show progress bars and logging (default: False)
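
With the default 'angular' metric, Annoy reports distances rather than similarities: its angular distance is sqrt(2(1 - cos(u, v))), the Euclidean distance between the normalised vectors. The hypothetical helpers below convert between the two, assuming thresh is read as a cosine similarity; how Semnet itself maps thresh onto Annoy distances is not documented here.

```python
import math


def angular_to_cosine(d):
    """Convert Annoy's 'angular' distance to cosine similarity (hypothetical helper).

    Annoy's angular distance is sqrt(2 * (1 - cos(u, v))), so it ranges from 0 to 2.
    """
    return 1.0 - d * d / 2.0


def cosine_to_angular(s):
    """Inverse of angular_to_cosine, clamped to guard against rounding above 1."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - s)))


# Assuming thresh is a cosine similarity, 0.3 corresponds to an angular distance of about 1.18
print(cosine_to_angular(0.3))
```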

Method Parameters

fit(embeddings, labels=None, ids=None, node_data=None):
- embeddings: required array of pre-computed embeddings with shape (n_docs, embedding_dim)
- labels: optional text labels/documents for the embeddings
- ids: optional custom IDs for the embeddings
- node_data: optional dictionary of additional data to attach to nodes
transform(thresh=None, top_k=None): builds the graph, optionally overriding thresh and top_k
fit_transform(embeddings, labels=None, ids=None, node_data=None, thresh=None, top_k=None): combined fit and transform
to_pandas(graph): export a NetworkX graph to pandas DataFrames (nodes and edges)
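
The Quick Start unpacks to_pandas into nodes_df and edges_df, but their exact columns are not listed here. If you want comparable tables from a graph you have modified yourself, a rough equivalent can be sketched with pandas and NetworkX's own to_pandas_edgelist. The helper graph_to_frames is hypothetical, and Semnet's real column names may differ.

```python
import networkx as nx
import pandas as pd


def graph_to_frames(G):
    """Export a graph as (nodes_df, edges_df) DataFrames (hypothetical helper)."""
    # One row per node; node attributes such as "label" become columns
    nodes_df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")
    nodes_df.index.name = "node"
    # One row per edge: "source", "target", plus edge attributes such as "weight"
    edges_df = nx.to_pandas_edgelist(G)
    return nodes_df, edges_df


G = nx.Graph()
G.add_node(0, label="The cat sat on the mat")
G.add_node(1, label="A cat was sitting on a mat")
G.add_edge(0, 1, weight=0.92)
nodes_df, edges_df = graph_to_frames(G)
print(nodes_df)
print(edges_df)
```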

Performance Tips

Use the "angular" metric for cosine similarity (default and recommended)
Increase n_trees for better accuracy (try 50-100 for large datasets), at the cost of a slower build and a larger index
Decrease top_k if you have memory constraints
Use smaller embedding models for speed, e.g. "all-MiniLM-L6-v2"
Use larger models for accuracy, e.g. "BAAI/bge-large-en-v1.5"
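
To judge whether raising n_trees is worth the extra build time, one option is to measure recall on a sample: compare the neighbor lists your Annoy index returns (for example from AnnoyIndex.get_nns_by_item, dropping the query item itself) against exact brute-force neighbors. The helper names below are hypothetical, and the exact search is quadratic, so run it on a sample rather than the full corpus.

```python
import numpy as np


def exact_neighbors(embeddings, k):
    """Brute-force top-k neighbors by cosine similarity, excluding each item itself."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # never return an item as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k].tolist()


def recall_at_k(approx, exact):
    """Mean fraction of the exact neighbors that the approximate search also found."""
    if len(approx) != len(exact):
        raise ValueError("approx and exact must cover the same query items")
    hits = [len(set(a) & set(e)) / len(e) for a, e in zip(approx, exact)]
    return sum(hits) / len(hits)


exact = exact_neighbors([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=1)
print(exact)  # [[1], [0], [1]]
print(recall_at_k([[1, 2, 3], [0, 2, 4]], [[1, 2, 5], [0, 2, 4]]))  # about 0.833
```

If recall stays low as you add trees, increasing top_k (or Annoy's search_k at query time) may help more than further trees.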

Requirements

Python 3.8+
networkx
annoy
numpy
pandas
tqdm

Project origin and statement on the use of AI

I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.

Whilst using semantic networks for graph analysis in some forthcoming research, I decided to package some of my code for others to use.

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:

Bootstrap scaffolding, tests, documentation, examples and typing
Refactor the core methods in the style of the scikit-learn API
Add functionality for convenient analysis of graph structures and to support custom embeddings

Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. Potential future additions:

Better examples showcasing network analysis on large corpora
Integration with graph visualization tools
Performance optimizations for very large datasets

License

MIT License

Citation

If you use Semnet in academic work, please cite:

@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}

Release history

0.1.9 (Mar 18, 2026)
0.1.8 (Mar 17, 2026)
0.1.6 (Oct 30, 2025)
0.1.3 (Oct 24, 2025), this version

Download files

Source Distribution

semnet-0.1.3.tar.gz (1.5 MB)

Uploaded Oct 24, 2025 Source

Built Distribution

semnet-0.1.3-py3-none-any.whl (9.9 kB)

Uploaded Oct 24, 2025 Python 3

File details

Details for the file semnet-0.1.3.tar.gz.

File metadata

Download URL: semnet-0.1.3.tar.gz
Upload date: Oct 24, 2025
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for semnet-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`0b7ad6500457a799fc972f9f57a3b7be39bc94d390a3ef9dcfeff1dbcc4be897`
MD5	`ff84f58c0638950fb54190d17fd1b9f4`
BLAKE2b-256	`97fbc14855e488cc0b9f9d8f52d3c1e8bb5b495e6f2f1f5f51aa3ad82e6ca791`

File details

Details for the file semnet-0.1.3-py3-none-any.whl.

File metadata

Download URL: semnet-0.1.3-py3-none-any.whl
Upload date: Oct 24, 2025
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for semnet-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80b77cecd320f9654d06db16b74099b2f0396d74695f19633231dd6ed5fa0139`
MD5	`8761608f9dcd4518057b3e0d3871f58f`
BLAKE2b-256	`b4c3a7f443740c00878cc781dff874d5bde6c3bb798aa91adca70ed1a1ef793d`
