
This package helps data scientists and data analysts capture notes while they work through data science tasks. The captured notes can then be searched and analyzed.

Knowledge Management for Data Science (KMDS)

Capture, organize, and reuse knowledge from your data science experiments.


🌟 What is KMDS?

KMDS is a Python-based tool for systematic knowledge management in data science and analytics projects. It helps you document the incremental process of exploration, data preparation, and model development — capturing context, decisions, and rationale so that valuable insights are not lost over time.

The Problem It Solves

Experimental work generates a stream of decisions and findings. The context and rationale behind each step are often documented ad-hoc, if at all. When it is time to revisit a question or build on earlier work, the research trail has gone cold. KMDS fixes this by providing a structured, ontology-backed way to log, search, and share your findings.

Who Can Use KMDS?

KMDS was originally designed for data scientists writing Python. Recent additions to the CLI and natural-language tooling mean it is now practical for a broader set of users:

User                How they interact with KMDS
Data scientist      Python API, notebooks, CLI (full access to all features)
Software developer  CLI tools and Python API for automating knowledge capture in pipelines
Business analyst    CLI commands and natural-language ingestion (no ontology code required)

🎥 Watch a quick overview of KMDS: YouTube Video


✨ Key Features

  • Structured Observation Capture: Log findings from exploration, data representation, modeling choice, and model selection stages using Python or the CLI.
  • Natural Language Ingestion: Describe a finding in plain English — KMDS classifies it, extracts structured entities, and optionally logs it to the knowledge base. No ontology code required.
  • Ontology-Backed Knowledge Base: Store and reload workflow knowledge as RDF/OWL artifacts that can be shared across projects and teams.
  • Semantic Search: Build a vector index from your knowledge base and retrieve relevant findings with natural-language queries.
  • LLM Search Orchestrator: Route natural-language questions to structured KMDS search templates with automatic semantic fallback.
  • CLI-First Usability: Every major feature is accessible as a command-line tool — usable by developers and analysts without writing notebook code.
  • Simple Reporting Surface: Load observations into tabular form for review, sharing, and downstream analysis.

🚀 Getting Started

1. Installation

Install KMDS in your Python environment:

pip install kmds

2. Usage

As you work through your analysis, log your findings to KMDS. The sections below show examples for the CLI and the Python API.

3. Quick Summary Logging (CLI)

KMDS now supports logging exploratory observations directly from a free-text project summary. This is useful for business analysts and other non-developers who want to capture findings quickly.

Application workflow example (explicit, non-interactive):

kmds-summary-log \
  --summary "This is a daily reporting workflow for support operations. Missing category labels were found in intake data." \
  --workflow-name "support_reporting_intake" \
  --workflow-type application \
  --project-file ./support_reporting_intake.xml \
  --create-project \
  --no-prompt

Ambiguous summary example (interactive prompt):

kmds-summary-log \
  --summary "Project kickoff notes for the upcoming quarter." \
  --workflow-name "quarterly_kickoff_notes" \
  --project-file ./quarterly_kickoff_notes.xml \
  --create-project

In the ambiguous case, KMDS will ask whether the workflow is application or experimental, then continue logging exploratory observations.
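
For pipelines that call kmds-summary-log from Python, the two invocations above could be assembled by a small helper. This is a minimal sketch, not part of KMDS: the function name build_summary_log_command is hypothetical, and it uses only the flags shown above. It adds --no-prompt only when a workflow type is supplied, on the assumption that a summary without an explicit type may still need the interactive prompt.

```python
import shlex
from typing import List, Optional


def build_summary_log_command(
    summary: str,
    workflow_name: str,
    project_file: str,
    workflow_type: Optional[str] = None,
    create_project: bool = True,
) -> List[str]:
    """Assemble argv for kmds-summary-log (hypothetical helper, not a KMDS API)."""
    cmd = [
        "kmds-summary-log",
        "--summary", summary,
        "--workflow-name", workflow_name,
    ]
    if workflow_type is not None:
        cmd += ["--workflow-type", workflow_type]
    cmd += ["--project-file", project_file]
    if create_project:
        cmd.append("--create-project")
    if workflow_type is not None:
        # The workflow type is explicit, so the interactive prompt is skipped.
        cmd.append("--no-prompt")
    return cmd


explicit = build_summary_log_command(
    "Missing category labels were found in intake data.",
    "support_reporting_intake",
    "./support_reporting_intake.xml",
    workflow_type="application",
)
# Print a shell-quoted form for review before running anything.
print(shlex.join(explicit))
```

The returned list can be passed to subprocess.run(cmd, check=True) once KMDS is installed and the command looks right.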

4. Export Executive Summary (CLI)

You can export a non-technical executive summary from a KMDS project file.

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.txt

Optional LLM mode (falls back to local summary if API/model is unavailable):

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.txt \
  --use-llm \
  --model gemini-1.5-flash

Markdown output option:

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.md \
  --format markdown
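
The three variants above differ only in a few flags, so a script could select them from its own settings. The sketch below is illustrative only: exec_summary_command is a hypothetical helper, the output naming scheme (project file stem plus _exec_summary) is an assumption, and --format is passed only for markdown because that is the only value shown above. Whether --use-llm and --format markdown can be combined is not confirmed here.

```python
from pathlib import Path
from typing import List, Optional


def exec_summary_command(
    project_file: str,
    markdown: bool = False,
    llm_model: Optional[str] = None,
) -> List[str]:
    """Assemble argv for kmds-exec-summary (hypothetical helper, not a KMDS API)."""
    project = Path(project_file)
    # Assumed naming scheme: <project stem>_exec_summary.md or .txt
    suffix = ".md" if markdown else ".txt"
    output = project.with_name(project.stem + "_exec_summary" + suffix)
    cmd = [
        "kmds-exec-summary",
        "--project-file", str(project),
        "--output-file", str(output),
    ]
    if llm_model is not None:
        # Optional LLM mode, using the flags from the example above.
        cmd += ["--use-llm", "--model", llm_model]
    if markdown:
        cmd += ["--format", "markdown"]
    return cmd


cmd = exec_summary_command("./support_reporting_intake.xml", markdown=True)
print(" ".join(cmd))
```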

5. Natural Language Observation Ingestion

KMDS can classify a free-form natural language statement into the existing KMDS observation schema, extract structured entities, and either return a summary or log the result into a KMDS knowledge base.

Summary mode example:

kmds-observe \
  --text "The model accuracy dropped by 5% after pruning on 2026-04-20." \
  --mode summary \
  --output-format json

Log mode example for a new project:

kmds-observe \
  --text "Missing values were observed in the customer_age field during intake validation." \
  --mode log \
  --workflow-name "support_reporting_intake" \
  --project-file ./support_reporting_intake.xml \
  --workflow-type application \
  --create-project

Python API example:

from kmds.utils.natural_language_observation import map_text_to_observation

mapping = map_text_to_observation(
    "We engineered a rolling 7 day demand feature from timestamped order counts."
)

print(mapping.workflow_family)
print(mapping.observation_type)
print(mapping.extracted_entities)
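
To process several statements at once, the call above can be wrapped in a loop. This is a sketch under stated assumptions: collect_mappings is a hypothetical helper, it reads only the three attributes printed above, and it takes the mapping function as an argument so that it can be tried with a stand-in when KMDS is not installed. The stand-in returns placeholder values, not real KMDS output.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List


def collect_mappings(
    texts: Iterable[str], mapper: Callable[[str], Any]
) -> List[Dict[str, Any]]:
    """Map each statement and keep the fields shown in the example above."""
    rows = []
    for text in texts:
        mapping = mapper(text)
        rows.append(
            {
                "text": text,
                "workflow_family": mapping.workflow_family,
                "observation_type": mapping.observation_type,
                "extracted_entities": mapping.extracted_entities,
            }
        )
    return rows


try:
    from kmds.utils.natural_language_observation import (
        map_text_to_observation as mapper,
    )
except ImportError:
    # Stand-in with placeholder values so the sketch runs without KMDS.
    def mapper(text: str) -> SimpleNamespace:
        return SimpleNamespace(
            workflow_family="unknown",
            observation_type="unknown",
            extracted_entities={},
        )


rows = collect_mappings(
    [
        "We engineered a rolling 7 day demand feature from timestamped order counts.",
        "Missing values were observed in the customer_age field during intake validation.",
    ],
    mapper,
)
for row in rows:
    print(row["observation_type"], "|", row["text"])
```

Passing the mapper in, rather than importing it inside the helper, keeps the loop easy to test in isolation.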

6. Semantic Search (CLI)

Build a vector index from a KMDS knowledge base and retrieve relevant findings with a natural-language query. No API key required.

kmds-search \
  --kb ./support_reporting_intake.xml \
  --query "What data quality issues were found?" \
  --n-results 5

Or from the Python API:

from kmds.search import SemanticIndex

idx = SemanticIndex()
idx.build("./support_reporting_intake.xml")
results = idx.search("What data quality issues were found?", n_results=5)
for r in results:
    print(r["obs_type"], "|", r["finding"])
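
For a quick report, the search results can be grouped by observation type. The sketch below assumes only what the loop above shows, namely that each result is a dictionary with "obs_type" and "finding" keys. The helper name group_findings and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_findings(results: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group finding texts by their observation type, preserving order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for r in results:
        grouped[r["obs_type"]].append(r["finding"])
    return dict(grouped)


# Hypothetical results, shaped like the dictionaries printed above.
sample = [
    {"obs_type": "type_a", "finding": "first finding"},
    {"obs_type": "type_b", "finding": "second finding"},
    {"obs_type": "type_a", "finding": "third finding"},
]

for obs_type, findings in group_findings(sample).items():
    print(f"{obs_type}: {len(findings)} finding(s)")
```

With a real index, the list returned by idx.search(...) would take the place of sample.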

7. LLM Search Orchestrator (CLI)

Ask a free-form question. The orchestrator uses an LLM to route it to the best KMDS observation-query template, executes the template, and synthesises a plain-English answer. It falls back to semantic search automatically.

export GOOGLE_API_KEY="your-api-key"
kmds-ask \
  --kb ./support_reporting_intake.xml \
  --query "What assumptions drove the final model selection?"
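
The example above sets GOOGLE_API_KEY, while kmds-search needs no API key. A script might therefore choose between the two commands based on the environment. This is a minimal sketch, assuming that kmds-ask should only be attempted when the key is present; choose_query_command is a hypothetical helper and uses only the flags shown in this document.

```python
import os
from typing import List, Mapping, Optional


def choose_query_command(
    kb: str,
    query: str,
    env: Optional[Mapping[str, str]] = None,
    n_results: int = 5,
) -> List[str]:
    """Return argv for kmds-ask if an API key is set, else for kmds-search."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        # Key available: use the LLM search orchestrator.
        return ["kmds-ask", "--kb", kb, "--query", query]
    # No key: use semantic search, which needs no API key.
    return [
        "kmds-search",
        "--kb", kb,
        "--query", query,
        "--n-results", str(n_results),
    ]


print(
    choose_query_command(
        "./support_reporting_intake.xml",
        "What assumptions drove the final model selection?",
    )
)
```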

The full documentation covers custom LLM functions, available routing templates, and output formats.

This repository includes two detailed examples:


🤝 Contributing

We welcome contributions! If you have an idea for a new feature or would like to report a bug, please open an issue. If you'd like to contribute code, please fork the repository and submit a pull request.


📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.


📞 Contact

If you have questions or are interested in the following, please schedule a meeting:

  • Help with a data analysis task for your use case.
  • Developing a custom ontology-based solution.
  • Integrating KMDS with other tools in your data science stack.
