Modern Data Centric AI system for Large Language Models

DataFlow Knowledge Graph

Knowledge graph data preparation with DataFlow style operators and pipelines

DataFlow Knowledge Graph: An LLM-Driven Knowledge Graph Processing Library

Build, enrich, reason over, and operationalize knowledge graphs with composable operators.

GitHub | Documentation | Chinese README

1. 🤖 Overview

DataFlow-KG (short for DataFlow Knowledge Graph) is an LLM-driven knowledge graph processing library built on top of the DataFlow ecosystem. It is designed to provide reusable, extensible, and modular operators for knowledge graph construction, reasoning, retrieval, querying, and domain-specific applications. The original DataFlow project provides a clean, elegant, and highly extensible foundation for building practical data-centric LLM workflows.

Rather than treating KG workflows as isolated scripts, DataFlow-KG organizes graph capabilities into operator packages by graph type and application scenario. These operators can be composed into larger pipelines, including but not limited to:

knowledge graph construction
graph reasoning
graph retrieval
domain-specific knowledge graph applications

DataFlow-KG aims to serve as a unified infrastructure layer for research and development on graph-centric LLM applications.

2. ✨ Key Features

2.1. Modular Operator Library for KG Workflows

DataFlow-KG provides reusable operators that can be flexibly composed into pipelines for graph construction, graph enrichment, reasoning, retrieval, and task-specific graph processing. Operators are not standalone utilities. They are designed to be assembled into end-to-end workflows, enabling scalable and reproducible graph data engineering.
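
The idea of composing small operators into a pipeline can be illustrated with a minimal, self-contained sketch. Everything here (`Triple`, `Operator`, `Pipeline`, `normalize_entities`, `deduplicate`) is a hypothetical name invented for illustration and is not part of the DataFlow-KG API; the real operators are LLM-driven and considerably richer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical types for illustration only; these are NOT DataFlow-KG classes.
Triple = tuple[str, str, str]  # (head, relation, tail)
Operator = Callable[[list[Triple]], list[Triple]]


def normalize_entities(triples: list[Triple]) -> list[Triple]:
    """Strip whitespace everywhere and lowercase head/tail entity names."""
    return [(h.strip().lower(), r.strip(), t.strip().lower()) for h, r, t in triples]


def deduplicate(triples: list[Triple]) -> list[Triple]:
    """Drop repeated triples while preserving first-seen order."""
    return list(dict.fromkeys(triples))


@dataclass
class Pipeline:
    """Runs operators in order, feeding each one the previous output."""
    operators: list[Operator]

    def run(self, triples: Iterable[Triple]) -> list[Triple]:
        data = list(triples)
        for op in self.operators:
            data = op(data)
        return data


raw = [
    (" Marie Curie", "born_in", "Warsaw"),
    ("marie curie", "born_in", "warsaw "),
    ("Warsaw", "capital_of", "Poland"),
]
pipeline = Pipeline([normalize_entities, deduplicate])
result = pipeline.run(raw)
print(result)
```

Because each operator shares the same input and output shape, operators can be reordered or swapped without touching the others, which is the property the composable design relies on.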

2.2. Unified Support for Multiple KG Paradigms

The library supports a broad range of graph settings in one framework, including general KG, commonsense KG, temporal KG, multimodal KG, hyper-relational KG, Graph RAG, and domain-specific KGs. As an extension of DataFlow, DataFlow-KG follows the same design philosophy of composable operators and pipeline-based processing, making it easy to integrate with broader data preparation workflows.

2.3. Research-to-Application Coverage

The framework is designed for both research scenarios and practical vertical applications, supporting graph processing tasks from foundational KG construction to specialized domain deployment.

3. 🔍 Installation

3.1. Create and activate a Python environment

conda create -n dfkg python=3.10
conda activate dfkg

3.2. Install DataFlow-KG

pip install uv
uv pip install dataflow-kg

To enable local GPU inference, install the vllm extra instead, in an environment created as in 3.1. Quoting the package spec keeps shells such as zsh from interpreting the square brackets:

pip install uv
uv pip install "dataflow-kg[vllm]"

DataFlow-KG supports Python >= 3.10.

3.3. Verify the installation

Check that the installation succeeded with:

dfkg -v

If the installation succeeded and you are on the latest release, the output looks something like this:

open-dataflow-kg codebase version: 0.9.0
        Checking for updates...
        Local version:  0.9.0
        PyPI newest version:  0.9.0
        You are using the latest version: 0.9.0.

In addition, the dfkg env command can be used to inspect the current hardware and software environment, which is useful for bug reporting:

dfkg env
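
The same checks can also be approximated from Python using only the standard library. This is a minimal sketch, not a replacement for `dfkg -v` or `dfkg env`; the helper names `python_is_supported` and `installed_version` are hypothetical, and the distribution name `dataflow-kg` is taken from the install command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 10)  # from this README: "DataFlow-KG supports Python >= 3.10"


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "dataflow-kg"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("dataflow-kg version:", installed_version())
```

A result of `None` from `installed_version()` suggests the package was installed into a different environment than the one currently active.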

4. 🚀 Quickstart

DataFlow-KG follows a three-stage workflow: code generation, custom modification, and script execution. In practice, you initialize a project with the CLI, customize the generated pipeline script if needed, and then run the Python file to execute your workflow.

You can get started in three steps.

4.1. Initialize a project

Run the following command in an empty directory:

dfkg init

4.2. Choose a pipeline type

Pipelines with the same name across different folders are usually incremental variants with different dependency requirements:

Directory	Required Resources
`api_pipelines`	CPU + LLM API
`gpu_pipelines`	CPU + LLM API + local GPU

Tip: If you are new to DataFlow-KG, start with api_pipelines. Later, if you have a local GPU, you can replace LLMServing with a local model backend.
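
The choice between the two directories can be sketched as a small helper. This is only an illustration under stated assumptions: the function names are hypothetical, and looking for an `nvidia-smi` binary on the PATH is a rough heuristic for an NVIDIA GPU, not something DataFlow-KG itself is known to do.

```python
import shutil


def suggest_pipeline_dir(has_local_gpu: bool) -> str:
    """Map available resources to one of the two generated directories."""
    return "gpu_pipelines" if has_local_gpu else "api_pipelines"


def nvidia_gpu_probably_present() -> bool:
    """Rough heuristic: treat a discoverable `nvidia-smi` binary as a sign of a GPU."""
    return shutil.which("nvidia-smi") is not None


print(suggest_pipeline_dir(nvidia_gpu_probably_present()))
```

Starting from `api_pipelines` remains the safer default even on GPU machines, since it has fewer dependencies.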

4.3. Run your first pipeline

Go into any pipeline directory, for example:

cd api_pipelines

Open one of the generated Python pipeline files. In most cases, you only need to check two configurations:

4.3.1. Input data path

self.storage = FileStorage(
    first_entry_file_name="<path_to_dataset>"
)

By default, this points to the provided example dataset, so you can run it directly. You can also replace it with your own dataset path.

4.3.2. LLM serving configuration

If you are using an API-based serving backend, set the API key first.

Linux / macOS

export DF_API_KEY=sk-xxxxx

Windows CMD

set DF_API_KEY=sk-xxxxx

PowerShell

$env:DF_API_KEY="sk-xxxxx"

Then run the pipeline script:

python xxx_pipeline.py
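
Since the two settings above are the usual sources of a failed first run, a short preflight check before launching the script may help. This is a sketch under assumptions: `preflight` is a hypothetical helper, not part of DataFlow-KG, and it only verifies that `DF_API_KEY` is non-empty and that the dataset path points to an existing file.

```python
import os
from pathlib import Path


def preflight(dataset_path: str, env=os.environ) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    # The README instructs setting DF_API_KEY for API-based serving backends.
    if not env.get("DF_API_KEY"):
        problems.append("DF_API_KEY is not set")
    # The path passed as first_entry_file_name should point to a real file.
    if not Path(dataset_path).is_file():
        problems.append(f"dataset not found: {dataset_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight("<path_to_dataset>"):
        print("preflight:", problem)
```

Passing `env` as a parameter keeps the check easy to exercise without modifying the real process environment.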

5. 📚 License

DataFlow-KG is released under the Apache License 2.0.

6. 🎓 Citation

If you use DataFlow-KG in your research, please cite:

@misc{dataflowkg2026,
  title={DataFlow-KG: LLM-Driven Knowledge Graph Processing Library},
  author={DataFlow-KG Team},
  year={2026},
  howpublished={\url{https://github.com/OpenDCAI/DataFlow-KG}}
}
