
Typed, YAML-defined data flows with built-in incremental processing across major SQL warehouses.


NexusLabData - CORE Library

YAML- and Python-based data projects for extraction, ingestion, transformation, and consumption — with built-in incremental processing across multiple databases and engines.


Status: alpha. This repository is a read-only public mirror of an actively developed internal project. External pull requests are not accepted yet — please open an issue for bugs and feature requests.

What it is

nld-core (NexusLabData core) gives you a unified way to manage a data project — whether it targets a single database or spans multiple databases and engines. You describe your structures (typed schemas) and flows (how data is extracted, ingested, transformed, and consumed) in YAML, pick a connector, and the framework runs them consistently everywhere.

It ships with standards that make the experience smoother across every project:

  • Structure templates and field templates: consistent, reusable schema definitions.
  • Standard incremental strategies: "process only what changed" works the same way everywhere.
  • Standard execution and incremental logging: monitor what ran and where each delta stopped.
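To make "process only what changed" and "where each delta stopped" concrete, here is a minimal, library-independent sketch of a timestamp watermark. It is a conceptual illustration only, not nld-core's implementation: `Row`, `select_delta`, and the in-memory `source` list are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    """Hypothetical source row carrying a last-modified timestamp."""

    key: int
    updated_at: datetime


def select_delta(rows, watermark):
    """Return the rows changed after the watermark, plus the new watermark."""
    if watermark is None:
        delta = list(rows)  # first run: no state yet, so every row is new
    else:
        delta = [r for r in rows if r.updated_at > watermark]
    # Keep the previous watermark when nothing changed.
    new_watermark = max((r.updated_at for r in delta), default=watermark)
    return delta, new_watermark


source = [
    Row(1, datetime(2024, 1, 1)),
    Row(2, datetime(2024, 1, 2)),
    Row(3, datetime(2024, 1, 3)),
]

# First run processes everything and records where the delta stopped.
first_delta, watermark = select_delta(source, None)

# A new row arrives at the source; the second run picks up only that row.
source.append(Row(4, datetime(2024, 1, 4)))
second_delta, watermark = select_delta(source, watermark)
print([r.key for r in second_delta])  # → [4]
```

In a real project the watermark would be persisted between runs rather than held in a variable; the sketch keeps it in memory to stay self-contained.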

Quickstart

# Install with the connector extra you need (PostgreSQL shown here)
pip install "nld-core[postgres]"

Create a project, declare a flow, and run it:

# nld_project.yml
name: my_data_project
version: '0.0.1'

# flows/my_flow.yml
name: my_flow
task: my_project.tasks.MyDataTask
data_connectors:
  source: source_connector
target_structure: source.my_table

# my_project/tasks.py
from typing import ClassVar

from nld.flow.incremental.no_increment.logic import NO_INCREMENT_FLOW_INCREMENTAL_LOGIC
from nld.flow.task import DataFlowTask


class MyDataTask(DataFlowTask):
    """Minimal data flow task."""

    _INCREMENTAL_LOGIC: ClassVar = NO_INCREMENT_FLOW_INCREMENTAL_LOGIC
    init_params = ["source_connector"]

    def run_flow(self) -> None:
        # Your transformation logic here.
        ...

# Execute the flow
nld flow execute --name my_flow
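
If you prefer to trigger the run from a Python script (for example from a scheduler), one option is to shell out to the CLI command shown above. This is a minimal sketch: `build_execute_command` and `execute_flow` are hypothetical helper names, not part of nld-core, and treating a non-zero exit code as a failure is an assumption about the CLI.

```python
import subprocess


def build_execute_command(flow_name: str) -> list[str]:
    """Build the argument list for `nld flow execute --name <flow_name>`."""
    return ["nld", "flow", "execute", "--name", flow_name]


def execute_flow(flow_name: str) -> int:
    """Run the flow through the CLI and return the process exit code.

    Assumption: the CLI signals failure with a non-zero exit code.
    """
    completed = subprocess.run(build_execute_command(flow_name), check=False)
    return completed.returncode
```

Calling `execute_flow("my_flow")` requires the `nld` executable to be on the PATH of the environment where the script runs.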

Core concepts

Each concept has a detailed guide in the nld-agents marketplace (the nld-core-usage plugin).

| Concept | What it is | Guide |
| --- | --- | --- |
| Flow | A unit of data movement/transformation, defined in YAML and backed by a DataFlowTask (Python) or a SQL definition. Flows declare their connectors, target structure, and predecessors, and the framework orders and runs them. | nld-core-usage:guide-flows |
| Structure | A typed schema: fields with data types, lengths, and characterisations (primary key, unique, functional key, …). Structures can be deployed to a database and diffed against the live schema. | nld-core-usage:guide-structures |
| Connector | A storage abstraction over a database (which also brings a query engine), an object storage, or a file storage: PostgreSQL, Snowflake, BigQuery, DuckDB, S3, Azure Blob, or the local file system. The same flow runs against any connector. | nld-core-usage:guide-connections |
| Incremental | Strategies (by_key, by_source_tst, no_increment) backed by persisted state and watermarks, so each run propagates only the data that changed at the source. | nld-core-usage:guide-incremental |
| Execution monitoring | Every flow run and its steps are recorded (status: succeeded / warning / failed; start and end time; the requestor; the load strategy) to a state backend you can query to see what ran and whether it succeeded. | nld-core-usage:how-to-get-execution-info |
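
As an illustration of what diffing a structure against a live schema can involve, the sketch below compares two plain dictionaries of field names and types. It is a conceptual example under stated assumptions, not the nld-core API: `diff_structure`, the report keys, and the sample fields are all hypothetical.

```python
def diff_structure(declared: dict[str, str], live: dict[str, str]) -> dict[str, list]:
    """Compare declared field types with the column types found in the database."""
    missing = sorted(set(declared) - set(live))  # declared but not deployed
    extra = sorted(set(live) - set(declared))    # deployed but not declared
    changed = sorted(
        (name, live[name], declared[name])       # (field, live type, declared type)
        for name in set(declared) & set(live)
        if declared[name] != live[name]
    )
    return {"missing_in_db": missing, "extra_in_db": extra, "type_changed": changed}


# Hypothetical declared structure and live table definition.
declared = {"id": "integer", "email": "varchar(255)", "created_at": "timestamp"}
live = {"id": "integer", "email": "varchar(100)", "legacy_flag": "boolean"}

report = diff_structure(declared, live)
print(report["missing_in_db"])  # → ['created_at']
print(report["extra_in_db"])    # → ['legacy_flag']
print(report["type_changed"])   # → [('email', 'varchar(100)', 'varchar(255)')]
```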

Supported connectors

| Connector | Install extra |
| --- | --- |
| PostgreSQL | postgres |
| Snowflake | snowflake |
| BigQuery | bigquery |
| DuckDB | duckdb |
| S3 | s3_blob_storage |
| Azure Blob Storage | azure_blob_storage |
| Local File System | built-in |

Install several at once:

pip install "nld-core[postgres,snowflake,bigquery,duckdb]"

CLI

nld flow execute --name <flow_name>          # run a flow
nld flow info --name <flow_name>             # inspect a flow
nld flow deps --name <flow_name>             # flow dependency graph as JSON
nld flow state execution get-state <flow_name>   # inspect persisted execution state
nld connection list                          # list configured connections
nld connection get-structure --connection-name <name>   # extract schema from a live database
nld structure info --name <name>             # inspect a structure
nld project info                             # project overview
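
Since `nld flow deps` emits the dependency graph as JSON, a script can derive a valid execution order from it. The sketch below assumes a JSON object that maps each flow name to a list of its predecessors; that shape, the flow names, and the `execution_order` helper are assumptions made for illustration, so check the real output before relying on it.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical dependency graph: flow name -> list of predecessor flows.
deps_json = """
{
  "load_orders": ["extract_orders"],
  "load_customers": ["extract_customers"],
  "build_report": ["load_orders", "load_customers"]
}
"""


def execution_order(raw: str) -> list[str]:
    """Return the flows in an order where every predecessor runs first."""
    graph = json.loads(raw)
    # TopologicalSorter expects a mapping of node -> predecessors,
    # and raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(deps_json)
print(order)
```

The relative order of independent flows (here the two extract flows) is not guaranteed; only the predecessor constraints are.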

Requirements

  • Python >= 3.12

Build NLD projects with agents

We maintain a Claude Code marketplace of skills that help you scaffold and build a complete NLD data project: data-platform conventions, connectors, flows, and incremental strategies.

It bundles the standard skills our team uses for the data platform, so an agent can help you go from an empty repo to working flows that follow the NLD conventions.

License

Apache-2.0. See LICENSE.md.
