Dictionary-first extraction, alias canonicalization, and lightweight reranking utilities for technical documents.


skeinrank-core

skeinrank-core is the lightweight Python SDK and CLI for deterministic local terminology canonicalization.

It is the zero-friction entry point to SkeinRank: no Governance API, Elasticsearch, RabbitMQ, Celery, Docker, OpenRouter token, or ML dependencies are required.

30-second demo

import skeinrank

print(skeinrank.canonicalize("k8s pg timeout"))
# kubernetes postgresql timeout

print(skeinrank.extract("sev1 on kube after deploy"))
# ['critical incident', 'kubernetes', 'deployment']

The module-level helpers use a built-in platform_ops_demo dictionary, so the first call works without a file. The demo dictionary is small enough to inspect but broad enough to cover infrastructure, incidents, CI/CD, search, RAG, and context-dependent company jargon.

The same built-in dictionary also demonstrates why context matters:

import skeinrank

print(skeinrank.canonicalize("pg timeout"))
# postgresql timeout

print(skeinrank.canonicalize("pg layout"))
# page layout

print(skeinrank.canonicalize("pg dashboard"))
# product group
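Context-dependent resolution like the pg examples above can be approximated with neighbor-word cues. The following is a toy sketch that assumes a hypothetical CONTEXT_RULES table (cue words per ambiguous alias) and a one-token window on each side. SkeinRank's own dictionary format and disambiguation logic are not shown here and may work differently.

```python
# Hypothetical cue table: an ambiguous alias resolves to a different
# canonical term depending on nearby words. Not SkeinRank's data model.
CONTEXT_RULES = {
    "pg": [
        ({"layout", "margin", "header"}, "page"),
        ({"timeout", "migration", "replica"}, "postgresql"),
    ],
}
# Fallback used when no cue word is found next to the alias.
DEFAULTS = {"pg": "postgresql"}


def canonicalize_with_context(text: str) -> str:
    """Resolve ambiguous aliases using a +/-1 token window (whitespace is normalized)."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if key not in CONTEXT_RULES:
            out.append(tok)
            continue
        # Neighboring tokens, excluding the alias itself.
        window = {t.lower() for t in tokens[max(0, i - 1): i + 2]} - {key}
        replacement = DEFAULTS.get(key, tok)
        for cues, canonical in CONTEXT_RULES[key]:
            if window & cues:
                replacement = canonical
                break
        out.append(replacement)
    return " ".join(out)


print(canonicalize_with_context("pg timeout"))  # postgresql timeout
print(canonicalize_with_context("pg layout"))   # page layout
```

A fixed window keeps the behavior deterministic, which matches the local, dictionary-first approach described here; a real vocabulary would need larger windows or multi-word aliases.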

CLI from a source checkout:

poetry run skeinrank canonicalize "k8s pg timeout" --text
poetry run skeinrank extract "sev1 on kube after deploy" --text --compact

Install from a checkout

cd packages/skeinrank-core
poetry install
poetry run pytest -q

Legacy reranking modules remain in the source tree for compatibility, but the package no longer exposes heavyweight ML install extras. The local SDK facade, demo dictionary, CLI, and document helpers do not require ML dependencies.

Public Python facade

Use SkeinRank when you want to pass a dictionary in code:

from skeinrank import SkeinRank

sr = SkeinRank({
    "kubernetes": ["k8s", "kube", "kuber"],
    "postgresql": ["pg", "postgres", "psql"],
})

print(sr.canonicalize("kuber timeout on pg"))
# kubernetes timeout on postgresql

print(sr.extract("kuber timeout on pg"))
# ['kubernetes', 'postgresql']
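A dictionary of canonical terms to aliases has to be turned into a lookup before matching. The sketch below shows one common approach: invert the mapping, then match whole words with a single case-insensitive regex, trying the longest alias first. It is a standalone illustration, not SkeinRank's matcher; build_matcher, replace_aliases, and collect_canonicals are hypothetical helpers.

```python
import re


def build_matcher(terms: dict[str, list[str]]):
    """Invert {canonical: [aliases]} into an alias -> canonical lookup plus one regex."""
    lookup: dict[str, str] = {}
    for canonical, aliases in terms.items():
        for alias in [canonical, *aliases]:
            lookup[alias.lower()] = canonical
    # Longest aliases first, so a multi-word alias wins over a shorter one it starts with.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def replace_aliases(text, pattern, lookup):
    # Replace every whole-word alias with its canonical term.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


def collect_canonicals(text, pattern, lookup):
    # Unique canonical terms, in order of first appearance.
    found = []
    for m in pattern.finditer(text):
        canonical = lookup[m.group(0).lower()]
        if canonical not in found:
            found.append(canonical)
    return found


pattern, lookup = build_matcher({
    "kubernetes": ["k8s", "kube", "kuber"],
    "postgresql": ["pg", "postgres", "psql"],
})
print(replace_aliases("kuber timeout on pg", pattern, lookup))     # kubernetes timeout on postgresql
print(collect_canonicals("kuber timeout on pg", pattern, lookup))  # ['kubernetes', 'postgresql']
```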

Use explain=True when you need offsets, slots, and highlighted evidence:

result = sr.extract("k8s rollout uses pg", explain=True)

print(result.canonical_values)
print(result.matches[0].alias)
print(result.matches[0].highlighted_fragment)
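For intuition about what explain-style output carries, here is a minimal sketch that records offsets and builds a highlighted fragment with a configurable amount of surrounding context. The ExplainedMatch dataclass, the explain function, and the context_chars parameter are hypothetical stand-ins loosely modeled on the fields named above, not the library's actual result types.

```python
import re
from dataclasses import dataclass


@dataclass
class ExplainedMatch:  # hypothetical shape, not SkeinRank's TermMatch
    alias: str
    canonical: str
    start: int
    end: int
    highlighted_fragment: str


def explain(text: str, aliases: dict[str, str], context_chars: int = 20):
    """aliases maps lowercase alias -> canonical term."""
    alternatives = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    matches = []
    for m in pattern.finditer(text):
        start, end = m.span()
        # Take up to context_chars characters on each side of the match.
        left = text[max(0, start - context_chars):start]
        right = text[end:end + context_chars]
        fragment = f"{left}<mark>{m.group(0)}</mark>{right}"
        matches.append(
            ExplainedMatch(m.group(0), aliases[m.group(0).lower()], start, end, fragment)
        )
    return matches


for match in explain("k8s rollout uses pg", {"k8s": "kubernetes", "pg": "postgresql"}, 5):
    print(match.canonical, match.start, match.end, match.highlighted_fragment)
```

Offsets index into the original string, so callers can highlight or slice the source text without re-running the matcher.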

The same facade can load a full SkeinRank dictionary from a JSON or YAML file:

from skeinrank import SkeinRank

sr = SkeinRank.from_file("company.dictionary.yaml")
print(sr.canonicalize("k8s rollout uses pg database"))

Built-in demo dictionary and examples

The built-in platform_ops_demo dictionary contains more than 30 canonical terms and more than 80 aliases across platform operations, incidents, CI/CD, search, RAG, and SkeinRank concepts. It is intentionally not a production vocabulary; it is a compact first-touch dictionary for demos, tests, tutorials, and screenshots.

Useful demo phrases:

Input	Output
`k8s pg timeout`	`kubernetes postgresql timeout`
`sev1 on kube after pg migration`	`critical incident on kubernetes after postgresql database migration`
`gha deploy hit rmq latency spike`	`github actions`, `deployment`, `message queue`, `latency`
`pg layout`	`page layout`
`pg dashboard`	`product group`

Examples live in ../../examples/sdk:

zero_friction_demo.py runs the facade from Python.
platform_ops_demo.dictionary.json exports the built-in dictionary in the public dictionary shape.

Dictionary-first SDK

The lower-level dictionary SDK remains available for callers that already work with the Governance API export format or the skeinrank-migrate dictionary shape.

from skeinrank import load_dictionary, extract_terms, canonicalize_text

dictionary = load_dictionary("../../examples/migration/console_dictionary.example.json")

result = extract_terms(
    "This instruction helps deploy 500 k8s servers backed by Postgres.",
    dictionary=dictionary,
)

print(result.canonical_values)  # ['kubernetes', 'postgresql']

canonicalized = canonicalize_text(
    "k8s rollout uses pg database",
    dictionary=dictionary,
)
print(canonicalized.text)  # kubernetes rollout uses postgresql database

Stable public exports include:

SkeinRank, canonicalize(...), extract(...), demo_dictionary(...), demo_dictionary_payload(...)
Dictionary, DictionaryTerm, DictionaryAlias, DictionaryStopListEntry
load_dictionary(...), validate_dictionary(...)
extract_terms(...), canonicalize_text(...)
ExtractionResult, TermMatch, CanonicalizedText

The matcher is deterministic and local. It honors the active/deprecated status of terms and aliases, applies profile-level and global stop lists, returns match offsets, and includes evidence snippets with <mark>...</mark> highlights.
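Status and stop-list handling can be pictured as a filter applied to the alias table before matching. This is a sketch under stated assumptions: the (alias, canonical, status) record layout, the stop-list sets, and the rules that deprecated aliases are ignored and stop-listed aliases are never matched are all hypothetical; SkeinRank's actual semantics may differ.

```python
import re

# Hypothetical alias records: (alias, canonical, status).
ALIASES = [
    ("k8s", "kubernetes", "active"),
    ("kuber", "kubernetes", "deprecated"),
    ("pg", "postgresql", "active"),
]
GLOBAL_STOP_LIST = {"pg"}  # e.g. an alias considered too ambiguous company-wide
PROFILE_STOP_LIST: set[str] = set()


def active_lookup(aliases, stop_lists):
    """Keep only active aliases that no stop list blocks (assumed semantics)."""
    blocked = {entry.lower() for stop_list in stop_lists for entry in stop_list}
    return {
        alias.lower(): canonical
        for alias, canonical, status in aliases
        if status == "active" and alias.lower() not in blocked
    }


def extract_filtered(text, lookup):
    if not lookup:
        return []  # an empty alternation would match the empty string everywhere
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    found = []
    for m in pattern.finditer(text):
        canonical = lookup[m.group(0).lower()]
        if canonical not in found:
            found.append(canonical)
    return found


lookup = active_lookup(ALIASES, [GLOBAL_STOP_LIST, PROFILE_STOP_LIST])
print(extract_filtered("kuber and k8s on pg", lookup))  # ['kubernetes']
```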

Document text extraction utilities

Local document helpers can extract text before running the SDK matcher. They do not require the Governance API, Elasticsearch, Celery, or a database.

from skeinrank import load_document_text, extract_terms_from_document

text = load_document_text("incident-runbook.md")
result = extract_terms_from_document(
    "incident-runbook.md",
    dictionary="../../examples/migration/console_dictionary.example.json",
)

print(result.document.file_name)
print(result.extraction.canonical_values)

Supported formats without extra dependencies:

text-like files: .txt, .md, .rst, .log, .csv, .tsv, .json, .jsonl, .yaml, .yml
.html / .htm with scripts/styles ignored
.docx via a small stdlib ZIP/XML reader
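Ignoring scripts and styles in HTML can be done with the standard library alone. The sketch below uses html.parser and tracks whether the parser is inside a script or style element; html_to_text is a hypothetical helper and not necessarily how load_document_text handles HTML.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = "<style>p{color:red}</style><h1>Runbook</h1><p>k8s pg timeout</p>"
print(html_to_text(page))
```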

PDF extraction is supported when the caller installs pypdf in the environment. The core package does not require it by default.
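The .docx bullet above mentions a small stdlib ZIP/XML reader. A .docx file is a ZIP archive whose main body lives in word/document.xml, with paragraph text split across w:t runs. A minimal sketch of that idea follows; docx_to_text is a hypothetical helper and ignores headers, footers, footnotes, and other document parts.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path) -> str:
    """Return one line per non-empty paragraph; path may be a filename or file object."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's visible text is split across <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```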

Local CLI

Validate a dictionary exported from the Governance API or used by skeinrank-migrate:

poetry run skeinrank validate-dictionary ../../examples/migration/console_dictionary.example.json
poetry run skeinrank validate-dictionary ../../examples/migration/console_dictionary.example.yaml --json

Run zero-config demo extraction/canonicalization:

poetry run skeinrank extract "k8s rollout uses pg database" --text --compact
poetry run skeinrank canonicalize "k8s rollout uses pg database" --text

Print or export the built-in demo dictionary:

poetry run skeinrank demo-dictionary --compact
poetry run skeinrank demo-dictionary --output ../../examples/sdk/platform_ops_demo.dictionary.json

Run the example script:

poetry run python ../../examples/sdk/zero_friction_demo.py

Run against a specific dictionary file:

poetry run skeinrank extract "k8s rollout uses pg database" \
  --text \
  --dictionary ../../examples/migration/console_dictionary.example.json

poetry run skeinrank canonicalize incident-runbook.md \
  --dictionary ../../examples/migration/console_dictionary.example.json \
  --output incident-runbook.canonicalized.txt

Extract plain text from a document before matching:

poetry run skeinrank document-text incident-runbook.docx --output incident-runbook.txt

The CLI returns JSON for extract, raw text by default for canonicalize and document-text, and supports --output, --compact, --max-matches, and --context-chars where relevant.

Attribute extraction and enrichment

The older attribute/profile API is still available for advanced local enrichment workflows.

from skeinrank import build_attribute_profile, enrich_texts

profile = build_attribute_profile(
    profile_id="company_terms",
    aliases={
        "kubernetes": ["k8s", "kube", "kuber"],
        "postgresql": ["pg", "postgres", "psql"],
    },
    slots={
        "kubernetes": "TOOL",
        "postgresql": "DB",
    },
    snapshot_version="company_terms@v1",
)

rows = enrich_texts(
    [
        {"id": "doc-1", "text": "k8s timeout after upgrade"},
        {"id": "doc-2", "text": "pg latency spike"},
    ],
    profile=profile,
)

print(rows[0]["canonical_values"])

Use this layer when you need profile templates, fuzzy alias fallback, richer passport/debug traces, or JSONL enrichment helpers.
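As a rough picture of JSONL enrichment with slots, the sketch below reads one JSON object per line, attaches canonical values, and groups them by slot label. The enrich_jsonl function, the "slots" output key, and the "UNSLOTTED" fallback label are hypothetical; the real enrich_texts output and its passport/debug traces are richer and may be shaped differently.

```python
import json
import re


def enrich_jsonl(lines, aliases, slots):
    """Hypothetical: each input line is a JSON object with "id" and "text"."""
    lookup = {a.lower(): c for c, names in aliases.items() for a in [c, *names]}
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, sorted(lookup, key=len, reverse=True))) + r")\b",
        re.IGNORECASE,
    )
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        row = json.loads(line)
        values = []
        for m in pattern.finditer(row["text"]):
            canonical = lookup[m.group(0).lower()]
            if canonical not in values:
                values.append(canonical)
        # Group canonical values by slot label, e.g. {"TOOL": ["kubernetes"]}.
        by_slot: dict[str, list[str]] = {}
        for value in values:
            by_slot.setdefault(slots.get(value, "UNSLOTTED"), []).append(value)
        yield json.dumps({**row, "canonical_values": values, "slots": by_slot})


rows = ['{"id": "doc-1", "text": "k8s timeout after upgrade"}',
        '{"id": "doc-2", "text": "pg latency spike"}']
aliases = {"kubernetes": ["k8s", "kube", "kuber"], "postgresql": ["pg", "postgres", "psql"]}
for out in enrich_jsonl(rows, aliases, {"kubernetes": "TOOL", "postgresql": "DB"}):
    print(out)
```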

Publishing checklist

The package is published through the manual publish-skeinrank-core GitHub Actions workflow. The recommended flow is:

Build and test locally.
Publish to TestPyPI.
Install from TestPyPI in a clean environment.
Publish to PyPI only after the TestPyPI smoke test passes.

Local packaging checks:

poetry install
poetry run pytest -q
poetry build
poetry run python -m pip install --upgrade twine
poetry run twine check dist/*

See docs/PUBLISHING.md for the full release checklist.

Public API policy

Only symbols re-exported from skeinrank.__init__ should be treated as stable public API. Internal modules may change without notice.

Release history

Version	Release date
0.11.0	Jun 8, 2026
0.10.0 (this version)	Jun 7, 2026
0.0.16	May 10, 2026
0.0.1	Jan 5, 2026

Download files

Source Distribution

skeinrank-0.10.0.tar.gz (67.7 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

skeinrank-0.10.0-py3-none-any.whl (80.5 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file skeinrank-0.10.0.tar.gz.

File metadata

Download URL: skeinrank-0.10.0.tar.gz
Upload date: Jun 7, 2026
Size: 67.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for skeinrank-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`b8cc2961eed5ffafe989a12d9810779b0f53d05590d20c943cdb390fc0d3286b`
MD5	`d29b27ee96082e399ba640f6accec1a6`
BLAKE2b-256	`53200404fe63d557e585ea05feb184ec0dfc905fa5169ce988d7cc27de84f182`
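The SHA256 digests listed for each file can be checked locally before installing. A minimal sketch using only hashlib, assuming the sdist has been downloaded to the working directory; sha256_of and verify are illustrative helper names.

```python
import hashlib

# SHA256 digest of skeinrank-0.10.0.tar.gz, copied from the table above.
EXPECTED = "b8cc2961eed5ffafe989a12d9810779b0f53d05590d20c943cdb390fc0d3286b"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED) -> bool:
    # hexdigest() is lowercase; normalize the expected value to match.
    return sha256_of(path) == expected.lower()
```

For example, verify("skeinrank-0.10.0.tar.gz") returns True only if the downloaded sdist matches the published digest. pip's hash-checking mode (--require-hashes with --hash=sha256:... entries in a requirements file) enforces the same comparison at install time.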

Provenance

The following attestation bundles were made for skeinrank-0.10.0.tar.gz:

Publisher: publish-skeinrank-core.yml on SkeinRank/skeinrank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: skeinrank-0.10.0.tar.gz
- Subject digest: b8cc2961eed5ffafe989a12d9810779b0f53d05590d20c943cdb390fc0d3286b
- Sigstore transparency entry: 1741900558
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: SkeinRank/skeinrank@b11fdbb67419dac073a691dddd15e9a27cd3797f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SkeinRank
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-skeinrank-core.yml@b11fdbb67419dac073a691dddd15e9a27cd3797f
- Trigger Event: workflow_dispatch

File details

Details for the file skeinrank-0.10.0-py3-none-any.whl.

File metadata

Download URL: skeinrank-0.10.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 80.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for skeinrank-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2eeff5871e04d3d06485e05df63fb21fbe45157d72bb55fc19863c2f9f056ce6`
MD5	`5e36be5fa22860f4847e1a507cfe9118`
BLAKE2b-256	`0359508c8693e366c5d88c1f2d525adc9fae825aa23bb9ce6ddf81cacfd60506`

Provenance

The following attestation bundles were made for skeinrank-0.10.0-py3-none-any.whl:

Publisher: publish-skeinrank-core.yml on SkeinRank/skeinrank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: skeinrank-0.10.0-py3-none-any.whl
- Subject digest: 2eeff5871e04d3d06485e05df63fb21fbe45157d72bb55fc19863c2f9f056ce6
- Sigstore transparency entry: 1741900615
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: SkeinRank/skeinrank@b11fdbb67419dac073a691dddd15e9a27cd3797f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SkeinRank
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-skeinrank-core.yml@b11fdbb67419dac073a691dddd15e9a27cd3797f
- Trigger Event: workflow_dispatch
