kenon

Semantic and co-occurrence graphs for midsized texts. Kenon builds weighted graphs from text using only corpus-internal statistics; no neural models or external training data are required. It supports co-occurrence windows, TF-IDF similarity, PMI embeddings, and backbone extraction with the disparity filter.
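
To illustrate what "corpus-internal statistics" can mean in practice, the following is a minimal standard-library sketch of pointwise mutual information (PMI) computed from pair counts alone. It is not kenon's implementation (kenon delegates PMI to chronowords); the function name `pmi_scores` and the toy counts are hypothetical, and the sketch assumes each unordered pair observation counts once in each direction when estimating probabilities.

```python
import math
from collections import Counter


def pmi_scores(pair_counts):
    """Return log2 PMI for each unordered pair, using only the pair counts."""
    total = sum(pair_counts.values())
    # Marginal count of a word: how many pair observations it takes part in.
    word_counts = Counter()
    for (a, b), n in pair_counts.items():
        word_counts[a] += n
        word_counts[b] += n
    scores = {}
    for (a, b), n in pair_counts.items():
        # Assumption: every observation is counted in both directions, so
        # p(a, b) = n / (2 * total) and p(a) = word_counts[a] / (2 * total).
        scores[(a, b)] = math.log2(n * 2 * total / (word_counts[a] * word_counts[b]))
    return scores


# Hypothetical toy counts, not taken from any real corpus.
pair_counts = {("cat", "mat"): 4, ("cat", "dog"): 1, ("dog", "park"): 3}
scores = pmi_scores(pair_counts)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies would predict get a positive score; pairs that co-occur less often get a negative one.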

Installation

uv add kenon
python -m spacy download en_core_web_sm

Quickstart

from kenon import (
    Tokenizer,
    get_stopwords,
    build_cooccurrence_graph,
    extract_backbone,
)

# 1. Tokenize
tokenizer = Tokenizer("en_core_web_sm", lemmatize=True)
tokens = tokenizer.flat_tokens("The cat sat on the mat. The dog ran in the park.")

# 2. Build graph
stopwords = get_stopwords("english")
graph = build_cooccurrence_graph(tokens, window=2, stopwords=stopwords)

# 3. Extract backbone
backbone = extract_backbone(graph, min_alpha_ptile=0.3, min_degree=2)
print(f"Backbone: {backbone.number_of_nodes()} nodes, {backbone.number_of_edges()} edges")
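
To make step 2 more concrete, here is a minimal standard-library sketch of window-based co-occurrence counting. It is not kenon's implementation: the function name `cooccurrence_counts` is hypothetical, and it assumes that `window` means "pair each token with the next `window` tokens" and that stopwords are removed before windowing. kenon's actual behavior may differ on both points.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2, stopwords=frozenset()):
    """Count unordered token pairs that fall within `window` positions of each other."""
    # Assumption: stopwords are dropped first, so the window spans
    # only the remaining tokens.
    kept = [t for t in tokens if t not in stopwords]
    counts = Counter()
    for i, left in enumerate(kept):
        # Pair the token at position i with the next `window` tokens.
        for right in kept[i + 1 : i + 1 + window]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) map to one undirected edge.
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Hand-written lemmas standing in for tokenizer output.
tokens = ["cat", "sit", "on", "mat", "dog", "run", "in", "park"]
edges = cooccurrence_counts(tokens, window=2, stopwords={"on", "in"})
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each key in `edges` corresponds to one weighted, undirected edge; the count is the edge weight.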

Features

  • Tokenization: spaCy-backed sentence splitting, tokenization, and lemmatization
  • Stopwords: Merged NLTK + sklearn stopword lists with custom extensions
  • Embeddings: Count vectors, TF-IDF, and PMI (via chronowords) — all corpus-internal
  • Co-occurrence graphs: Skip-gram window co-occurrence with collocation detection
  • Semantic graphs: Cosine similarity graphs from any embedder
  • Backbone extraction: Disparity filter for statistically significant edges
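
As a rough illustration of the last feature, the disparity filter can be sketched with the standard library alone. For a node with degree k, an edge that carries a share p of the node's total weight gets the score alpha = (1 - p)^(k - 1), and the edge is kept when alpha falls below a threshold for at least one endpoint. The code below is a sketch under those assumptions, not kenon's implementation: the names `disparity_alphas` and `backbone_edges` are hypothetical, it uses a fixed `threshold`, and it does not try to reproduce kenon's `min_alpha_ptile` or `min_degree` parameters.

```python
def disparity_alphas(weights):
    """Map each undirected edge (u, v) to the smaller of its two endpoint alphas."""
    # Collect per-node neighbor weights to derive degree and strength.
    neighbors = {}
    for (u, v), w in weights.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    def alpha(node, other):
        k = len(neighbors[node])
        if k < 2:
            # A node with a single edge gives no evidence either way.
            return 1.0
        share = neighbors[node][other] / sum(neighbors[node].values())
        return (1.0 - share) ** (k - 1)

    return {(u, v): min(alpha(u, v), alpha(v, u)) for (u, v) in weights}


def backbone_edges(weights, threshold=0.05):
    """Keep edges whose alpha is below `threshold` for at least one endpoint."""
    alphas = disparity_alphas(weights)
    return {edge: w for edge, w in weights.items() if alphas[edge] < threshold}


# Hypothetical toy graph: one heavy edge and four light ones around hub "a".
weights = {("a", "b"): 96, ("a", "c"): 1, ("a", "d"): 1, ("a", "e"): 1, ("a", "f"): 1}
kept = backbone_edges(weights)
print(kept)
```

In this toy graph only the heavy edge between "a" and "b" survives, because it carries far more of the hub's weight than a uniform split across five edges would predict.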

Documentation

See the docs/ directory for the full API reference and examples.

Made by

Crow Intelligence

License

MIT
