Semantic Networks from Embeddings

Semnet: efficient graph structures from embeddings

Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph

Introduction

Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.

Semnet uses Annoy for approximate nearest-neighbour search, which avoids exhaustive pair-wise distance calculations and allows million-embedding network construction in under ten minutes on consumer hardware.
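For intuition, the basic idea behind a similarity network can be sketched with an exhaustive comparison in plain Python. This is not Semnet's implementation: it compares every pair of embeddings, which is exactly the quadratic cost that an approximate nearest-neighbour library such as Annoy is meant to avoid. The function names and the `min_similarity` parameter are illustrative assumptions, not part of Semnet's API.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_edges(embeddings, min_similarity):
    """Return (i, j, similarity) for every pair at or above min_similarity.

    Exhaustive O(n^2) comparison, for illustration only.
    """
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= min_similarity:
            edges.append((i, j, sim))
    return edges


# Toy 2-D "embeddings": items 0 and 1 point in nearly the same direction,
# item 2 is orthogonal to item 0.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]

edges = similarity_edges(toy_embeddings, min_similarity=0.8)
print(edges)  # only the pair (0, 1) clears the cut-off
```

With three items this is instant, but the number of pairs grows quadratically with the collection size, so the brute-force approach quickly becomes impractical for large corpora.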

Graphs are returned as NetworkX objects, opening up a wide range of algorithms for downstream use.

The name "Semnet" derives from semantic network[^1], as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio, or even graphs).

[^1]: Technically speaking, a Semantic Similarity Network (SSN).

Semnet may be used for:

  • Graph algorithms: enrich your data with communities, centrality and much more for downstream use in search, RAG and context engineering
  • Deduplication: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
  • Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
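
As a rough sketch of the first two uses, the snippet below applies standard NetworkX calls to a small hand-built graph that stands in for Semnet's output. The `label` node attribute, the example records and the edge weights are all assumptions made for illustration; check the attributes on your own graph before relying on them. Treating each connected component as one group of duplicates is just one simple deduplication strategy, not necessarily the one Semnet's author uses.

```python
import networkx as nx

# Hand-built stand-in for a similarity graph. The "label" attribute name
# and the edge weights are made up for this example.
G = nx.Graph()
G.add_nodes_from([
    (0, {"label": "Donald Trump"}),
    (1, {"label": "Donald J. Trump"}),
    (2, {"label": "United Nations"}),
    (3, {"label": "The United Nations"}),
    (4, {"label": "NetworkX"}),
])
G.add_edge(0, 1, weight=0.93)
G.add_edge(2, 3, weight=0.88)

# Deduplication: treat each connected component as one group of duplicates.
groups = [sorted(component) for component in nx.connected_components(G)]

# Keep one representative per group (here, simply the lowest node id).
representatives = [G.nodes[group[0]]["label"] for group in groups]

# Enrichment: attach a graph metric to every node for downstream use.
centrality = nx.degree_centrality(G)
nx.set_node_attributes(G, centrality, name="degree_centrality")

print(groups)
print(representatives)
```

Connected components merge records transitively (if A links to B and B links to C, all three end up in one group), so a stricter similarity cut-off or a community-detection algorithm may suit noisier data better.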

Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case.

Check out the launch blog for more about Semnet and the examples for inspiration.

Installation

pip install semnet

Quick Start

from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)

Requirements

  • Python 3.8+
  • networkx
  • annoy
  • numpy
  • pandas
  • tqdm

Recommended for examples:

  • sentence-transformers
  • cosmograph

Project origin

I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.

Semnet started life as a few functions I'd been using for deduplication and disambiguation of structured output from LLMs. I could see a number of potential uses for my code, so I decided to package it up for others to use.

Statement on the use of AI

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:

  • Bootstrap scaffolding, tests, documentation, examples and typing
  • Refactor the core methods in the style of the scikit-learn API
  • Add additional functionality, e.g., the ability to pass custom data to nodes
  • Walk me through deployment to Read the Docs and PyPI

Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. I don't have much in the way of immediate plans to expand it, but I can see the potential for a few future additions:

  • Performance optimizations for very large datasets
  • Utilities for deduplication, as that's my main use case
  • Integration with graph visualization tools

License

MIT License

Citation

If you use Semnet in academic work, please cite:

@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}
