KMDS is a package that helps data scientists and data analysts capture notes while they work through data science tasks. The captured notes can then be searched and analyzed.

Project description

Knowledge Management for Data Science (KMDS)

Capture, organize, and reuse knowledge from your data science experiments.

🌟 What is KMDS?

KMDS is a Python-based tool for systematic knowledge management in data science and analytics projects. It helps you document the incremental process of exploration, data preparation, and model development — capturing context, decisions, and rationale so that valuable insights are not lost over time.

The Problem It Solves

Experimental work generates a stream of decisions and findings. The context and rationale behind each step are often documented ad hoc, if at all. By the time someone revisits a question or builds on earlier work, the research trail has gone cold. KMDS addresses this by providing a structured, ontology-backed way to log, search, and share your findings.

Who Can Use KMDS?

KMDS was originally designed for data scientists writing Python. Recent additions to the CLI and natural-language tooling mean it is now practical for a broader set of users:

User	How they interact with KMDS
Data scientist	Python API, notebooks, CLI — full access to all features
Software developer	CLI tools and Python API for automating knowledge capture in pipelines
Business analyst	CLI commands and natural-language ingestion — no ontology code required

🎥 Watch a quick overview of KMDS: YouTube Video

✨ Key Features

Structured Observation Capture: Log findings from exploration, data representation, modeling choice, and model selection stages using Python or the CLI.
Natural Language Ingestion: Describe a finding in plain English — KMDS classifies it, extracts structured entities, and optionally logs it to the knowledge base. No ontology code required.
Ontology-Backed Knowledge Base: Store and reload workflow knowledge as RDF/OWL artifacts that can be shared across projects and teams.
Semantic Search: Build a vector index from your knowledge base and retrieve relevant findings with natural-language queries.
LLM Search Orchestrator: Route natural-language questions to structured KMDS search templates with automatic semantic fallback.
CLI-First Usability: Every major feature is accessible as a command-line tool — usable by developers and analysts without writing notebook code.
Simple Reporting Surface: Load observations into tabular form for review, sharing, and downstream analysis.

🚀 Getting Started

1. Installation

Install KMDS in your Python environment:

pip install kmds

2. Usage

As you work through your analysis, log your findings to KMDS. The sections below show examples for the CLI and the Python API.

3. Quick Summary Logging (CLI)

KMDS now supports logging exploratory observations directly from a free-text project summary. This is useful for business analysts and other non-developers who want to capture findings quickly.

Application workflow example (explicit, non-interactive):

kmds-summary-log \
  --summary "This is a daily reporting workflow for support operations. Missing category labels were found in intake data." \
  --workflow-name "support_reporting_intake" \
  --workflow-type application \
  --project-file ./support_reporting_intake.xml \
  --create-project \
  --no-prompt

Ambiguous summary example (interactive prompt):

kmds-summary-log \
  --summary "Project kickoff notes for the upcoming quarter." \
  --workflow-name "quarterly_kickoff_notes" \
  --project-file ./quarterly_kickoff_notes.xml \
  --create-project

In the ambiguous case, KMDS will ask whether the workflow is application or experimental, then continue logging exploratory observations.
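
For pipeline automation, the two invocations above could be assembled from Python. The following is a minimal sketch, not part of KMDS: the helper name `build_summary_log_cmd` is hypothetical, and it assumes that `--no-prompt` should be added whenever a workflow type is given explicitly. Only flags shown in the examples above are used.

```python
import shlex

# The two workflow types named in this section.
VALID_WORKFLOW_TYPES = {"application", "experimental"}


def build_summary_log_cmd(summary, workflow_name, project_file,
                          workflow_type=None, create_project=False):
    """Return the argument list for one kmds-summary-log call (hypothetical helper)."""
    cmd = [
        "kmds-summary-log",
        "--summary", summary,
        "--workflow-name", workflow_name,
        "--project-file", project_file,
    ]
    if workflow_type is not None:
        if workflow_type not in VALID_WORKFLOW_TYPES:
            raise ValueError(f"unknown workflow type: {workflow_type!r}")
        # Assumption: with an explicit type there is nothing to ask,
        # so the interactive prompt is suppressed.
        cmd += ["--workflow-type", workflow_type, "--no-prompt"]
    if create_project:
        cmd.append("--create-project")
    return cmd


cmd = build_summary_log_cmd(
    summary="Missing category labels were found in intake data.",
    workflow_name="support_reporting_intake",
    project_file="./support_reporting_intake.xml",
    workflow_type="application",
    create_project=True,
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where KMDS is installed. Leaving `workflow_type` unset keeps the interactive behaviour described above.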

4. Export Executive Summary (CLI)

You can export a non-technical executive summary from a KMDS project file.

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.txt

Optional LLM mode (falls back to local summary if API/model is unavailable):

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.txt \
  --use-llm \
  --model gemini-1.5-flash

Markdown output option:

kmds-exec-summary \
  --project-file ./support_reporting_intake.xml \
  --output-file ./support_reporting_exec_summary.md \
  --format markdown
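
If summaries are exported for many project files, the output path and flags can be derived in one place. This is a sketch under stated assumptions: the helper `build_exec_summary_cmd` and the `<project>_exec_summary` naming scheme are hypothetical, and the source does not say whether `--use-llm` and `--format` can be combined, so treat that combination as untested.

```python
from pathlib import Path


def build_exec_summary_cmd(project_file, markdown=False, use_llm=False, model=None):
    """Return the argument list for one kmds-exec-summary call (hypothetical helper)."""
    project = Path(project_file)
    # Hypothetical naming scheme: <project stem>_exec_summary.<ext> next to the project file.
    suffix = ".md" if markdown else ".txt"
    output = project.with_name(project.stem + "_exec_summary" + suffix)

    cmd = [
        "kmds-exec-summary",
        "--project-file", project_file,
        "--output-file", str(output),
    ]
    if markdown:
        cmd += ["--format", "markdown"]
    if use_llm:
        cmd.append("--use-llm")
        if model:
            cmd += ["--model", model]
    return cmd


plain = build_exec_summary_cmd("./support_reporting_intake.xml")
md = build_exec_summary_cmd("./support_reporting_intake.xml", markdown=True)
print(plain)
print(md)
```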

5. Natural Language Observation Ingestion

KMDS can classify a free-form natural language statement into the existing KMDS observation schema, extract structured entities, and either return a summary or log the result into a KMDS knowledge base.

Summary mode example:

kmds-observe \
  --text "The model accuracy dropped by 5% after pruning on 2026-04-20." \
  --mode summary \
  --output-format json

Log mode example for a new project:

kmds-observe \
  --text "Missing values were observed in the customer_age field during intake validation." \
  --mode log \
  --workflow-name "support_reporting_intake" \
  --project-file ./support_reporting_intake.xml \
  --workflow-type application \
  --create-project

Python API example:

from kmds.utils.natural_language_observation import map_text_to_observation

mapping = map_text_to_observation(
    "We engineered a rolling 7 day demand feature from timestamped order counts."
)

print(mapping.workflow_family)
print(mapping.observation_type)
print(mapping.extracted_entities)
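
When several statements are mapped at once, it can help to see how they distribute across observation types. The sketch below assumes only that the mapping object exposes `workflow_family` and `observation_type` as hashable values, as the attribute access above suggests. The helper `tally_observation_types`, the stand-in mapper, and its labels are invented for illustration so the example runs without KMDS installed.

```python
from collections import Counter
from types import SimpleNamespace


def tally_observation_types(texts, mapper):
    """Count statements per (workflow_family, observation_type) pair."""
    counts = Counter()
    for text in texts:
        mapping = mapper(text)
        counts[(mapping.workflow_family, mapping.observation_type)] += 1
    return counts


def fake_mapper(text):
    """Stand-in for map_text_to_observation; the labels here are made up."""
    if "feature" in text.lower():
        return SimpleNamespace(workflow_family="experimental",
                               observation_type="data_representation")
    return SimpleNamespace(workflow_family="experimental",
                           observation_type="exploratory")


notes = [
    "We engineered a rolling 7 day demand feature from timestamped order counts.",
    "Order volume peaks on Mondays.",
    "A lagged price feature was added.",
]
counts = tally_observation_types(notes, fake_mapper)
for key, n in counts.most_common():
    print(key, n)
```

In real use, `map_text_to_observation` would be passed as `mapper` in place of `fake_mapper`.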

6. Semantic Search (CLI)

Build a vector index from a KMDS knowledge base and retrieve relevant findings with a natural-language query. No API key required.

kmds-search \
  --kb ./support_reporting_intake.xml \
  --query "What data quality issues were found?" \
  --n-results 5

Or from the Python API:

from kmds.search import SemanticIndex

idx = SemanticIndex()
idx.build("./support_reporting_intake.xml")
results = idx.search("What data quality issues were found?", n_results=5)
for r in results:
    print(r["obs_type"], "|", r["finding"])
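
Search results can be regrouped for review. The following sketch assumes each result is a dict with the `obs_type` and `finding` keys used in the loop above. The helper `group_findings` is hypothetical, and the sample results are illustrative data, not actual KMDS output.

```python
from collections import defaultdict


def group_findings(results):
    """Group finding texts by observation type, preserving result order."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["obs_type"]].append(r["finding"])
    return dict(grouped)


# Illustrative stand-in for the list returned by idx.search(...).
sample_results = [
    {"obs_type": "exploratory", "finding": "Missing category labels in intake data."},
    {"obs_type": "data_representation", "finding": "Dates normalised to UTC."},
    {"obs_type": "exploratory", "finding": "Missing values in customer_age."},
]

grouped = group_findings(sample_results)
for obs_type, findings in grouped.items():
    print(obs_type)
    for finding in findings:
        print("  -", finding)
```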

7. LLM Search Orchestrator (CLI)

Ask a free-form question. The orchestrator uses an LLM to route it to the best-matching KMDS observation-query template, executes the template, and synthesises a plain-English answer. It falls back to semantic search automatically.

export GOOGLE_API_KEY="your-api-key"
kmds-ask \
  --kb ./support_reporting_intake.xml \
  --query "What assumptions drove the final model selection?"
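
The route-then-fall-back pattern described above can be pictured with a small sketch. This is not the KMDS implementation: every name here (`answer_question`, `keyword_route`, the template names) is hypothetical, a keyword matcher stands in for the LLM router, and the fallback conditions (no matching template, or a template that returns nothing) are assumptions.

```python
def answer_question(question, route, templates, semantic_search):
    """Try a structured template first, then fall back to semantic search."""
    name = route(question)
    handler = templates.get(name)
    if handler is not None:
        rows = handler()
        if rows:
            return {"source": f"template:{name}", "rows": rows}
    # Assumed fallback conditions: no template matched, or it returned nothing.
    return {"source": "semantic", "rows": semantic_search(question)}


def keyword_route(question):
    """Toy router standing in for the LLM; returns a template name or None."""
    q = question.lower()
    if "model" in q:
        return "model_selection"
    if "quality" in q:
        return "data_quality"
    return None


# Hypothetical templates returning canned rows for illustration.
templates = {
    "model_selection": lambda: ["Chose the simpler model for interpretability."],
    "data_quality": lambda: [],
}


def fake_semantic_search(question):
    return [f"semantic hit for: {question}"]


for q in ["What assumptions drove the final model selection?",
          "What data quality issues were found?",
          "Who owns the intake process?"]:
    print(answer_question(q, keyword_route, templates, fake_semantic_search))
```

Passing the router and search function as arguments keeps the control flow testable without an API key.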

The full documentation covers custom LLM functions, available routing templates, and output formats.

This repository includes two detailed examples:

Analytics Example: Evaluates the effectiveness of a ticket resolution help desk.
Machine Learning Example: Uses Principal Component Analysis (PCA) to summarize online store sales activity.
- Notebooks
- Infographic

🤝 Contributing

We welcome contributions! If you have an idea for a new feature or would like to report a bug, please open an issue. If you'd like to contribute code, please fork the repository and submit a pull request.

📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

📞 Contact

If you have questions or are interested in the following, please schedule a meeting:

Help with a data analysis task for your use case.
Developing a custom ontology-based solution.
Integrating KMDS with other tools in your data science stack.

Project details

Release history

Version	Release date
0.3.3 (this version)	Apr 20, 2026
0.3.2	Apr 20, 2026
0.3.1	Feb 2, 2026
0.3.0	Feb 2, 2026
0.2.16	May 15, 2024
0.2.15	May 4, 2024
0.2.14	Mar 2, 2024
0.2.13	Mar 2, 2024
0.2.12	Feb 28, 2024
0.2.11	Feb 27, 2024
0.2.10	Feb 27, 2024
0.2.9	Feb 27, 2024
0.2.7	Feb 18, 2024
0.1.0	Feb 2, 2026

Download files

Source Distribution

kmds-0.3.3.tar.gz (5.6 MB view details)

Uploaded Apr 20, 2026 Source

Built Distribution

kmds-0.3.3-py3-none-any.whl (5.8 MB view details)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file kmds-0.3.3.tar.gz.

File metadata

Download URL: kmds-0.3.3.tar.gz
Upload date: Apr 20, 2026
Size: 5.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for kmds-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`6ffacebac2a34024cce06b282da85657acb5b86fae07d945de9d399a59681523`
MD5	`ded854efa3fb0469cc908d533e388890`
BLAKE2b-256	`e192268f343213d61d64fd8300ef3a05c328ff2b677a94083fa075a30f6490ac`

File details

Details for the file kmds-0.3.3-py3-none-any.whl.

File metadata

Download URL: kmds-0.3.3-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 5.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for kmds-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c8d754ea6809875710814df90ed5d2ad024628994fd20c1b0042bf41fcf7e1b1`
MD5	`bb366f3b9e4ae568e432038b91d60666`
BLAKE2b-256	`69de7b7bb7526d87c13247557b40257ab5b914c728549b9654c8d4cca16c8fc1`
