Repository-grounded KMDS helper that analyzes project artifacts and builds a KMDS knowledge graph


KMDS Data Helper: Repo Architect Framework

A modular, multi-persona framework for analyzing data science repositories. It uses local LLMs (via Ollama) to synthesize insights from documentation, data schemas, and Jupyter notebooks.

📂 Project Structure

KMDS-Helper follows a strict modular architecture to separate concerns:

  • src/kmds_data_helper/: Core logic modules (Config, Processing, LLM, Engine).
  • documents/: Project documentation (.pdf, .txt).
  • data/: Physical data assets (CSVs), kept separate from output.
  • notebooks/: Experimental code (.ipynb).
  • output/: Isolated directory for generated reports.

🛠️ Installation & Setup

  1. Environment: Activate the local virtual environment before running anything.
    source .venv/bin/activate
    
  2. LLM Engine: Requires Ollama running locally with the qwen2.5-coder:7b model.
  3. Dependencies:
    pip install rich ollama dataprofiler pymupdf4llm nbformat pyyaml
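
Before running the framework, it can help to confirm that Ollama is reachable and that the model is present. The sketch below is not part of KMDS-Helper. It assumes Ollama is listening on its default local address (http://localhost:11434) and queries its /api/tags endpoint; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODEL = "qwen2.5-coder:7b"


def model_available(tags_payload: dict, model: str = REQUIRED_MODEL) -> bool:
    """Return True if `model` is listed in an /api/tags response payload."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def check_ollama() -> str:
    """Return a one-line, human-readable status for the required model."""
    try:
        payload = fetch_tags()
    except OSError as exc:  # connection refused, timeout, and similar errors
        return f"Ollama does not appear to be reachable: {exc}"
    if model_available(payload):
        return f"{REQUIRED_MODEL}: found"
    return f"{REQUIRED_MODEL}: not found locally"
```

Calling print(check_ollama()) from a Python shell reports the status without raising if the server is down.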
    

⚙️ Configuration

The framework is controlled by kmds_config.yaml in the root directory. You can toggle persona behaviors (Scientist, Tech Lead, Architect) and change paths without editing Python code.

🚀 Usage

Run the main orchestrator from the project root:

python3 main.py

📦 Packaged Usage (v1)

This first version assumes a fixed repository structure. A user can install the package, run the knowledge-graph aggregator in a cloned repo, and produce a KMDS knowledge graph.

Required folders in the cloned repo

  • documents/
  • notebooks/
  • data_dictionary/
  • output/
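
A quick pre-flight check for the folders listed above could look like the following. This is a minimal sketch, not the package's own validation logic; the function name missing_folders is hypothetical.

```python
from pathlib import Path

# Folder names taken from the list of required folders above.
REQUIRED_FOLDERS = ("documents", "notebooks", "data_dictionary", "output")


def missing_folders(workspace: str = ".") -> list[str]:
    """Return the required folder names that are absent under `workspace`."""
    root = Path(workspace)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]
```

An empty list means every required folder is present in the cloned repo.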

Expected helper output artifacts

At least one of these files should exist in output/:

  • full_service_report.json
  • kmds_summary.json
  • kmds_strategic_summary.json
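
To see which of these artifacts are actually present before generating the graph, a small helper along these lines may be useful. It is an illustrative sketch only; find_helper_outputs is a hypothetical name, not a function exported by the package.

```python
from pathlib import Path

# File names taken from the list of expected helper output artifacts above.
EXPECTED_OUTPUTS = (
    "full_service_report.json",
    "kmds_summary.json",
    "kmds_strategic_summary.json",
)


def find_helper_outputs(workspace: str = ".") -> list[Path]:
    """Return the expected artifact paths that exist in <workspace>/output/."""
    output_dir = Path(workspace) / "output"
    return [
        output_dir / name
        for name in EXPECTED_OUTPUTS
        if (output_dir / name).is_file()
    ]
```

If the returned list is empty, none of the expected JSON artifacts have been generated yet.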

Install

From the project root:

pip install -e .

Generate knowledge graph from helper outputs

kmds-kb --workspace . --project-file project_knowledge_graph.xml --mode auto

The command validates the required folders, ingests the helper output artifacts, and writes:

  • project_knowledge_graph.xml
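
Once the file has been written, a schema-agnostic way to get a first look at it is to count its element tags. The sketch below makes no assumption about the KMDS XML schema: it only parses the file with the standard library and tallies whatever tag names it finds. The function name tag_counts is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_path: str) -> Counter:
    """Parse an XML file and count how often each element tag occurs."""
    tree = ET.parse(xml_path)
    # tree.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in tree.iter())
```

For example, tag_counts("project_knowledge_graph.xml").most_common(10) lists the ten most frequent element types in the generated graph.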

Adapter command (direct use)

You can also run the output adapter directly for a single file:

kmds-analyze --input output/full_service_report.json --project-file project_knowledge_graph.xml --create-project --workflow-name kmds_project_workflow --mode auto
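
When scripting the adapter, it is usually safer to build the command as an argument list than as a shell string. The sketch below only reassembles the flags shown in the command above and changes nothing about their meaning; build_analyze_command is a hypothetical helper, and the default values simply mirror that example.

```python
import shlex


def build_analyze_command(
    input_file: str,
    project_file: str = "project_knowledge_graph.xml",
    workflow_name: str = "kmds_project_workflow",
    mode: str = "auto",
) -> list[str]:
    """Build the kmds-analyze argument list using the flags shown above."""
    return [
        "kmds-analyze",
        "--input", input_file,
        "--project-file", project_file,
        "--create-project",
        "--workflow-name", workflow_name,
        "--mode", mode,
    ]


def as_shell_string(command: list[str]) -> str:
    """Render an argument list as a safely quoted shell command for logging."""
    return shlex.join(command)
```

The resulting list can be passed to subprocess.run(command, check=True) from the project root once the package is installed.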

Backward-compatible template script

If you invoke the template script directly, that path remains supported:

python kb_aggregator.py --workspace . --project-file project_knowledge_graph.xml --mode auto

Common failure messages

  • Missing folder(s): one or more required directories are absent.
  • No helper output files found: none of the expected JSON artifacts are present in output/.
  • Project file already exists in create mode: rerun with update mode or choose a new target path.


