Typed, YAML-defined data flows with built-in incremental processing across major SQL warehouses.
NexusLabData - CORE Library
YAML- and Python-based data projects for extraction, ingestion, transformation, and consumption — with built-in incremental processing across multiple databases and engines.
Status: alpha. This repository is a read-only public mirror of an actively developed internal project. External pull requests are not accepted yet — please open an issue for bugs and feature requests.
What it is
nld-core (NexusLabData core) gives you a unified way to manage a data project — whether it targets
a single database or spans multiple databases and engines. You describe your structures (typed
schemas) and flows (how data is extracted, ingested, transformed, and consumed) in YAML, pick a
connector, and the framework runs them consistently everywhere.
It ships with standards that make the experience smoother across every project:
- Structure templates and field templates — consistent, reusable schema definitions.
- Standard incremental strategies — "process only what changed" works the same way everywhere.
- Execution and incremental standard logging — monitor what ran and where each delta stopped.
Quickstart
# Install with the connector extra you need (PostgreSQL shown here)
pip install "nld-core[postgres]"
Create a project, declare a flow, and run it:
# nld_project.yml
name: my_data_project
version: '0.0.1'
# flows/my_flow.yml
name: my_flow
task: my_project.tasks.MyDataTask
data_connectors:
  source: source_connector
target_structure: source.my_table
# my_project/tasks.py
from typing import ClassVar

from nld.flow.incremental.no_increment.logic import NO_INCREMENT_FLOW_INCREMENTAL_LOGIC
from nld.flow.task import DataFlowTask


class MyDataTask(DataFlowTask):
    """Minimal data flow task."""

    _INCREMENTAL_LOGIC: ClassVar = NO_INCREMENT_FLOW_INCREMENTAL_LOGIC
    init_params = ["source_connector"]

    def run_flow(self) -> None:
        # Your transformation logic here.
        ...
# Execute the flow
nld flow execute --name my_flow
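The `task:` entry in the flow file is a dotted path to a Python class. How nld-core resolves that path internally is not described here. The following is only a sketch of the usual standard-library approach, using a hypothetical helper named `resolve_dotted_path` and demonstrated on a standard-library class rather than on a real task.

```python
from importlib import import_module


def resolve_dotted_path(path: str) -> object:
    """Import 'package.module.Name' and return the attribute called Name.

    Hypothetical helper for illustration; not part of nld-core.
    """
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.Name', got {path!r}")
    module = import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated on a standard-library class so the snippet runs anywhere.
resolved = resolve_dotted_path("collections.OrderedDict")
```

With this convention, the module that defines the task class must be importable from the environment in which the flow is executed.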
Core concepts
Each concept has a detailed guide in the nld-agents marketplace (the nld-core-usage plugin).
| Concept | What it is | Guide |
|---|---|---|
| Flow | A unit of data movement/transformation, defined in YAML and backed by a DataFlowTask (Python) or a SQL definition. Flows declare their connectors, target structure, and predecessors, and the framework orders and runs them. | nld-core-usage:guide-flows |
| Structure | A typed schema — fields with data types, lengths, and characterisations (primary key, unique, functional key, …). Structures can be deployed to a database and diffed against the live schema. | nld-core-usage:guide-structures |
| Connector | A storage abstraction over a database (which also brings a query engine), an object store, or a file store: PostgreSQL, Snowflake, BigQuery, DuckDB, S3, Azure Blob, or the local file system. The same flow runs against any connector. | nld-core-usage:guide-connections |
| Incremental | Strategies (by_key, by_source_tst, no_increment) backed by persisted state and watermarks, so each run propagates only the data that changed at the source. | nld-core-usage:guide-incremental |
| Execution monitoring | Every flow run and its steps are recorded — status (succeeded / warning / failed), start and end time, the requestor, and the load strategy — to a state backend you can query to see what ran and whether it succeeded. | nld-core-usage:how-to-get-execution-info |
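The Structure row mentions diffing a declared structure against the live schema. As a rough illustration of what such a diff might report, here is a self-contained sketch. It is not nld-core's implementation: `Field`, `diff_structure`, and the result keys are hypothetical, and the "live" fields are hard-coded rather than read from a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical typed field: name, data type, and a primary-key flag."""
    name: str
    data_type: str
    primary_key: bool = False


def diff_structure(declared: list[Field], live: list[Field]) -> dict[str, list[str]]:
    """Compare declared fields with live ones and report differences by name."""
    declared_by_name = {f.name: f for f in declared}
    live_by_name = {f.name: f for f in live}
    shared = declared_by_name.keys() & live_by_name.keys()
    return {
        "missing_in_live": sorted(declared_by_name.keys() - live_by_name.keys()),
        "unexpected_in_live": sorted(live_by_name.keys() - declared_by_name.keys()),
        "changed": sorted(n for n in shared if declared_by_name[n] != live_by_name[n]),
    }


declared = [
    Field("id", "integer", primary_key=True),
    Field("email", "varchar(255)"),
    Field("created_at", "timestamp"),
]
live = [
    Field("id", "integer", primary_key=True),
    Field("email", "varchar(100)"),
    Field("legacy_flag", "boolean"),
]

result = diff_structure(declared, live)
```

Here `result` flags `created_at` as missing in the live schema, `legacy_flag` as unexpected, and `email` as changed because its length differs.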
Supported connectors
| Connector | Install extra |
|---|---|
| PostgreSQL | postgres |
| Snowflake | snowflake |
| BigQuery | bigquery |
| DuckDB | duckdb |
| S3 | s3_blob_storage |
| Azure Blob Storage | azure_blob_storage |
| Local File System | built-in |
Install several at once:
pip install "nld-core[postgres,snowflake,bigquery,duckdb]"
CLI
nld flow execute --name <flow_name> # run a flow
nld flow info --name <flow_name> # inspect a flow
nld flow deps --name <flow_name> # flow dependency graph as JSON
nld flow state execution get-state <flow_name> # inspect persisted execution state
nld connection list # list configured connections
nld connection get-structure --connection-name <name> # extract schema from a live database
nld structure info --name <name> # inspect a structure
nld project info # project overview
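Flows declare predecessors, and a dependency graph implies a run order. The output format of `nld flow deps` is not shown here, so the sketch below works on a hand-written, hypothetical mapping of flow name to predecessor names and uses the standard-library `graphlib` module to derive one valid order. The flow names are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each flow maps to the set of flows that must run first.
predecessors = {
    "load_orders": {"extract_orders"},
    "load_customers": {"extract_customers"},
    "build_sales_mart": {"load_orders", "load_customers"},
}

# static_order() yields every flow after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())

for flow_name in order:
    print(flow_name)
```

Several orders can be valid for the same graph, and `graphlib` raises `CycleError` if the predecessors form a loop.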
Requirements
- Python >= 3.12
Build NLD projects with agents
We maintain a Claude Code marketplace of skills that help you scaffold and build a complete NLD data project — data-platform conventions, connectors, flows, and incremental strategies:
- NLD agents marketplace: https://github.com/nexuslab-data-agents/nld-agents
It bundles the standard skills our team uses for the data platform, so an agent can help you go from an empty repo to working flows that follow the NLD conventions.
Where to next
- Issues / feature requests: https://github.com/nexuslab-data/nld-core/issues
License
Apache-2.0. See LICENSE.md.
File details
Details for the file nld_core-0.1.1a1-py3-none-any.whl.
File metadata
- Download URL: nld_core-0.1.1a1-py3-none-any.whl
- Upload date:
- Size: 590.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3cd95195e2d68990011b12edf6b727dded4a62ac165e7d94e0511540ae8aed9 |
| MD5 | e7903821341751a5bdeaec4ffdbe8e7e |
| BLAKE2b-256 | d40e62cb7264fe1521ed081402ed4140dbe7e7e6891df1ea00fd2fa75a01750b |