Semantic Networks from Embeddings
Semnet: efficient graph structures from embeddings
Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph
Introduction
Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.
Semnet uses Annoy, an approximate nearest-neighbour library, for efficient pair-wise distance calculations, so a network over a million embeddings can be built in under ten minutes on consumer hardware.
Graphs are returned as NetworkX objects, opening up a wide range of algorithms for downstream use.
The name "Semnet" derives from semantic network[^1], as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio, or even graphs).
[^1]: Technically speaking, a Semantic Similarity Network (SSN).
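To make the idea concrete, here is a minimal sketch of a similarity network built by brute force: every pair of vectors is compared, and an edge is kept when the cosine similarity reaches a cutoff. This is not Semnet's implementation (Semnet relies on Annoy rather than an all-pairs loop); the helper names `cosine_similarity` and `build_similarity_edges` and the `min_similarity` parameter are invented for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_similarity_edges(embeddings, min_similarity):
    """Return (i, j, similarity) for every pair at or above the cutoff.

    Hypothetical helper: it compares all pairs, so cost grows quadratically
    with the number of embeddings.
    """
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= min_similarity:
            edges.append((i, j, sim))
    return edges


# Toy 2-D "embeddings": the first two point the same way, the third does not.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = build_similarity_edges(toy_embeddings, min_similarity=0.8)
```

Only the first two vectors end up connected. The quadratic loop is fine for a handful of documents, which is exactly why an approximate index is attractive at larger scales.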
Semnet may be used for:
- Graph algorithms: enrich your data with communities, centrality and much more for downstream use in search, RAG and context engineering
- Deduplication: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
- Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case.
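As an illustration of the deduplication use case, the sketch below treats each connected component of a similarity graph as one group of duplicates and keeps a single representative per group. The graph is built by hand to stand in for Semnet output; the `label` node attribute, the `canonical_labels` helper and the "longest label wins" rule are assumptions made for this example, not Semnet behaviour.

```python
import networkx as nx

# Hand-built stand-in for a similarity graph; edges link near-duplicates.
G = nx.Graph()
G.add_nodes_from(
    [
        (0, {"label": "Donald Trump"}),
        (1, {"label": "Donald J. Trump"}),
        (2, {"label": "Joe Biden"}),
    ]
)
G.add_edge(0, 1)


def canonical_labels(graph):
    """Map each node to one representative label per connected component."""
    mapping = {}
    for component in nx.connected_components(graph):
        labels = [graph.nodes[n]["label"] for n in component]
        # Assumed rule: longest label wins, ties broken by string order.
        representative = max(labels, key=lambda s: (len(s), s))
        for n in component:
            mapping[n] = representative
    return mapping


mapping = canonical_labels(G)
deduplicated = sorted(set(mapping.values()))
```

Connected components are the simplest grouping rule and can chain loosely related records together, so a stricter similarity cutoff or a community-detection step may suit noisier data better.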
Check out the launch blog for more about Semnet and the examples for inspiration.
Installation
pip install semnet
Quick Start
from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer
# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]
# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)
# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True) # Larger values give sparser networks
# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)
# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)
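Because the result is a regular NetworkX graph, standard NetworkX functions can be applied to it directly. The sketch below uses a small hand-built graph (`G_demo`, a made-up name) in place of the Semnet output so that it runs without downloading a model. The `community` and `degree_centrality` attribute names are choices made for this example only.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hand-built stand-in for the graph returned by fit_transform:
# a triangle (0, 1, 2) and a separate pair (3, 4).
G_demo = nx.Graph()
G_demo.add_edges_from([(0, 1), (1, 2), (0, 2), (3, 4)])

# Group nodes into communities by greedy modularity maximisation.
communities = [set(c) for c in greedy_modularity_communities(G_demo)]

# Degree centrality: the share of other nodes each node is connected to.
centrality = nx.degree_centrality(G_demo)

# Store the results on the nodes so they travel with the graph.
for community_id, members in enumerate(communities):
    for node in members:
        G_demo.nodes[node]["community"] = community_id
nx.set_node_attributes(G_demo, centrality, "degree_centrality")
```

Writing the results back as node attributes keeps everything in one object, which is convenient if the graph is later exported for search, visualisation or further analysis.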
Requirements
- Python 3.8+
- networkx
- annoy
- numpy
- pandas
- tqdm
Recommended for examples:
- sentence-transformers
- cosmograph
Project origin
I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.
Semnet started life as a few functions I'd been using for deduplication and disambiguation of structured output from LLMs. I could see a number of potential uses for my code, so I decided to package it up for others to use.
Statement on the use of AI
I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.
I then used GitHub Copilot in VS Code to:
- Bootstrap scaffolding, tests, documentation, examples and typing
- Refactor the core methods in the style of the scikit-learn API
- Add additional functionality, e.g., the ability to pass custom data to nodes
- Walk me through deployment to Read the Docs and PyPI
Roadmap
Semnet is a relatively simple project focused on core graph construction functionality. I don't have much in the way of immediate plans to expand it, but I can see the potential for a few future additions:
- Performance optimisations for very large datasets
- Utilities for deduplication, as that's my main use case
- Integration with graph visualisation tools
License
MIT License
Citation
If you use Semnet in academic work, please cite:
@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}