Repository-grounded KMDS helper that analyzes project artifacts and builds a KMDS knowledge graph

KMDS Data Helper: Repo Architect Framework

A modular, multi-persona framework for analyzing data science repositories. It uses local LLMs (via Ollama) to synthesize insights from documentation, data schemas, and Jupyter notebooks.

📂 Project Structure

KMDS Data Helper follows a strict modular architecture to separate concerns:

  • src/kmds_data_helper/: Core logic modules (Config, Processing, LLM, Engine).
  • documents/: Project documentation (.pdf, .txt).
  • data/: Data assets (CSVs), kept separate from generated output.
  • notebooks/: Experimental code (.ipynb).
  • output/: Isolated directory for generated reports.

🛠️ Installation & Setup

  1. Environment: Activate the project's local virtual environment before running anything.
    source .venv/bin/activate
    
  2. LLM Engine: Ollama must be running locally with the qwen2.5-coder:7b model available (for example, pulled with ollama pull qwen2.5-coder:7b).
  3. Dependencies:
    pip install rich ollama dataprofiler pymupdf4llm nbformat pyyaml
    

⚙️ Configuration

The framework is controlled by kmds_config.yaml in the project root. You can toggle persona behaviors (Scientist, Tech Lead, Architect) and change paths without editing any Python code.
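As a rough illustration of how a file like this might be consumed, the sketch below reads kmds_config.yaml with PyYAML (already in the dependency list) and reports which personas are switched on. The personas / enabled key layout and both function names are assumptions made for this example; the real schema is whatever the shipped kmds_config.yaml defines.

```python
from pathlib import Path


def load_config(path="kmds_config.yaml"):
    """Read the YAML config into a dict (empty dict for an empty file)."""
    import yaml  # PyYAML; imported lazily so enabled_personas() works without it

    with Path(path).open(encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def enabled_personas(config):
    """Return names of personas whose (hypothetical) 'enabled' flag is true."""
    personas = config.get("personas", {})
    return [name for name, opts in personas.items() if opts.get("enabled", False)]


# Hypothetical config content, written as the dict that safe_load would return.
example = {
    "personas": {
        "scientist": {"enabled": True},
        "tech_lead": {"enabled": False},
        "architect": {"enabled": True},
    }
}
print(enabled_personas(example))  # ['scientist', 'architect']
```

Keeping the toggles in data rather than code is what lets a persona be disabled by editing one YAML line.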

🚀 Usage

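Start the main orchestrator from the project root. The command below serves the app object from the api module with uvicorn, and --reload restarts the server whenever source files change: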

uv run uvicorn api:app --reload

🧪 Tests

uv run pytest tests/test_personas.py

📦 Packaged Usage (v1)

This first version assumes a fixed repository structure. A user can install the package, run the knowledge-graph aggregator in a cloned repo, and produce a KMDS knowledge graph.

Required folders in the cloned repo

  • documents/
  • notebooks/
  • data_dictionary/
  • output/
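A pre-flight check for this layout can be sketched in a few lines of standard-library Python. This is not the package's own validation code; the function name missing_folders is hypothetical, and only the four folder names come from the list above.

```python
from pathlib import Path

# Folder names taken from the "Required folders" list.
REQUIRED_FOLDERS = ("documents", "notebooks", "data_dictionary", "output")


def missing_folders(workspace="."):
    """Return the required folder names that are absent from the workspace."""
    root = Path(workspace)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]


missing = missing_folders(".")
if missing:
    print("Missing folder(s):", ", ".join(missing))
else:
    print("All required folders are present.")
```

Running this from the root of the cloned repo before invoking the aggregator shows which directories still need to be created.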

Expected helper output artifacts

At least one of these files should exist in output/:

  • full_service_report.json
  • kmds_summary.json
  • kmds_strategic_summary.json
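To see which of these artifacts are actually present, a small helper along the following lines may be useful. It is a sketch under the assumption that the files sit directly in output/; find_artifacts is a hypothetical name, not part of the package.

```python
from pathlib import Path

# File names taken from the "Expected helper output artifacts" list.
EXPECTED_ARTIFACTS = (
    "full_service_report.json",
    "kmds_summary.json",
    "kmds_strategic_summary.json",
)


def find_artifacts(output_dir="output"):
    """Return paths of the expected artifacts that exist in output_dir."""
    root = Path(output_dir)
    return [root / name for name in EXPECTED_ARTIFACTS if (root / name).is_file()]


found = find_artifacts("output")
if found:
    for path in found:
        print("Found:", path)
else:
    print("No helper output files found in output/")
```

An empty result corresponds to the situation in which none of the expected JSON artifacts are available for ingestion.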

Install

From the project root:

pip install -e .

Generate knowledge graph from helper outputs

kmds-kb --workspace . --project-file project_knowledge_graph.xml --mode auto

The command validates the required folders, ingests the helper output artifacts, and writes:

  • project_knowledge_graph.xml

Adapter command (direct use)

You can also run the output adapter directly for a single file:

kmds-analyze --input output/full_service_report.json --project-file project_knowledge_graph.xml --create-project --workflow-name kmds_project_workflow --mode auto

Backward-compatible template script

If you still invoke the template script directly, that path remains supported:

python kb_aggregator.py --workspace . --project-file project_knowledge_graph.xml --mode auto

Common failure messages

  • Missing folder(s): one or more required directories are absent.
  • No helper output files found: none of the expected JSON artifacts are present in output/.
  • Project file already exists in create mode: rerun with update mode or choose a new target path.
