Semantic Networks from Embeddings

Semnet: efficient graph structures from embeddings

Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph

Introduction

Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.

Semnet uses Annoy for approximate nearest-neighbour search rather than comparing every pair of embeddings, which allows a network of a million embeddings to be built in under ten minutes on consumer hardware.
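To make the underlying idea concrete, here is a minimal brute-force sketch of a thresholded similarity graph using only the standard library. It is not Semnet's implementation: it assumes that an edge joins two items whose cosine similarity is at or above a threshold, and the function names `cosine_similarity` and `threshold_edges` are hypothetical. It compares every pair, which is exactly the cost that an index such as Annoy is used to avoid.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_edges(embeddings, thresh):
    """Return (i, j, similarity) for every pair at or above `thresh`.

    Brute force: this checks all n * (n - 1) / 2 pairs, so it only
    scales to small collections. Assumed edge rule, for illustration.
    """
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= thresh:
            edges.append((i, j, sim))
    return edges


# Toy 2-D "embeddings": items 0 and 1 point almost the same way, item 2 does not.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = threshold_edges(toy_embeddings, thresh=0.8)
print(edges)
```

With these toy vectors only the pair (0, 1) clears the threshold. Raising the threshold removes edges, so higher values give sparser graphs.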

Graphs are returned as NetworkX objects, opening up a wide range of algorithms for downstream use.

The name "Semnet" derives from semantic network[^1], as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio, or even graphs).

[^1]: Technically speaking, a Semantic Similarity Network (SSN).

Semnet may be used for:

  • Graph algorithms: enrich your data with communities, centrality and much more for downstream use in search, RAG and context engineering
  • Deduplication: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
  • Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
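As a rough illustration of the deduplication and graph-algorithm uses listed above, the sketch below works on a small hand-built NetworkX graph rather than on Semnet output. The record labels and similarity weights are made-up example data, and treating each connected component as one group of duplicates is an assumed strategy, not necessarily the one Semnet's author uses.

```python
import networkx as nx

# Hypothetical similarity edges between records: (label_a, label_b, similarity).
similar_pairs = [
    ("Donald Trump", "Donald J. Trump", 0.95),
    ("Donald J. Trump", "President Trump", 0.91),
    ("Joe Biden", "Joseph R. Biden", 0.93),
]

G = nx.Graph()
G.add_node("Angela Merkel")  # a record with no near-duplicates
G.add_weighted_edges_from(similar_pairs)

# Treat each connected component as one group of duplicate records.
groups = [sorted(component) for component in nx.connected_components(G)]

# Pick the best-connected record in each group as its representative.
# Groups are sorted, so ties go to the alphabetically first label.
centrality = nx.degree_centrality(G)
representatives = sorted(
    max(group, key=lambda name: centrality[name]) for group in groups
)

print(representatives)
```

The same pattern should carry over to any NetworkX graph, so community detection or other centrality measures could be swapped in for the connected-components step.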

Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case.

Check out the launch blog for more about Semnet and the examples for inspiration.

Installation

pip install semnet

Quick Start

from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)

Requirements

  • Python 3.8+
  • networkx
  • annoy
  • numpy
  • pandas
  • tqdm

Recommended for examples:

  • sentence-transformers
  • cosmograph

Project origin

I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.

Semnet started life as a few functions I'd been using for deduplication and disambiguation of structured output from LLMs. I could see a number of potential uses for my code, so I decided to package it up for others to use.

Statement on the use of AI

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:

  • Bootstrap scaffolding, tests, documentation, examples and typing
  • Refactor the core methods in the style of the scikit-learn API
  • Add additional functionality, e.g., the ability to pass custom data to nodes
  • Walk me through deployment to Read the Docs and PyPI

Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. I don't have much in the way of immediate plans to expand it, but I can see the potential for a few future additions:

  • Performance optimisations for very large datasets
  • Utilities for deduplication, as that's my main use case
  • Integration with graph visualisation tools

License

MIT License

Citation

If you use Semnet in academic work, please cite:

@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}
