Command-line compiler for DataForge Core projects

DataForge helps data analysts and engineers build and extend data solutions by leveraging modern software engineering principles.

Understanding DataForge

DataForge lets you write inline functions as single-column SQL expressions instead of CTEs, procedural scripts, or set-based models.

For an overview of the underlying concepts, check out this introduction blog.

Each function:

  • is pure, with no side effects
  • returns a single column
  • is composable with other functions
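
As a loose analogy only (this is plain Python, not DataForge syntax), the three properties above can be pictured as small functions that each derive one value from a row and build on one another. The column names and function names here are hypothetical.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
ColumnFn = Callable[[Row], Any]


# Pure: reads the row, changes nothing, returns exactly one value.
def net_price(row: Row) -> float:
    return round(row["price"] * (1 - row["discount"]), 2)


# Composable: reuses net_price instead of repeating its logic.
def tax(row: Row) -> float:
    return round(net_price(row) * row["tax_rate"], 2)


def total(row: Row) -> float:
    return round(net_price(row) + tax(row), 2)


def derive(row: Row, columns: Dict[str, ColumnFn]) -> Row:
    """Return a new row with one added column per function; the input row is not mutated."""
    return {**row, **{name: fn(row) for name, fn in columns.items()}}


order = {"price": 100.0, "discount": 0.1, "tax_rate": 0.2}
enriched = derive(order, {"net_price": net_price, "tax": tax, "total": total})
print(enriched)
```

Because each function yields a single value and has no side effects, adding a new derived column in this sketch means adding one function rather than editing an existing query.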

DataForge software engineering principles:

These principles make DataForge projects easy to modify and extend, even with thousands of integrated pipelines.

Explore the Core CLI or learn more about how Core powers DataForge Cloud.

Requirements

DataForge Core is a code framework and command-line tool for developing transformation functions and compiling them into executable Spark SQL.

To run the CLI you will need:

  • Java 8 or higher
  • A PostgreSQL v14+ server with a dedicated empty database
    • Check out our friends over at Tembo
  • Python version 3.12+
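
The Java and Python requirements above can be checked from a script before installing. The following is a minimal sketch, not part of DataForge: the helper names are made up for illustration, and it only reports which Java is on PATH rather than parsing its major version.

```python
import shutil
import subprocess
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 12)  # taken from the requirements list above


def python_ok(version: Sequence[int] = sys.version_info) -> bool:
    """True when the given interpreter version is at least Python 3.12."""
    return tuple(version[:2]) >= MIN_PYTHON


def java_version_text() -> Optional[str]:
    """Return the first line printed by `java -version`, or None if Java is unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        # `-version` is accepted by Java 8 and later; it writes to stderr.
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


print("Python OK:", python_ok())
print("Java:", java_version_text() or "not found on PATH")
```

The PostgreSQL requirement is not covered here, since checking it needs a live server and credentials.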

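The CLI also includes an integration to run the compiled code in Databricks. To support this you need a Databricks SQL Warehouse; the configuration step below asks for its server hostname, HTTP path, access token, catalog name, and schema name.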

Installation and Quickstart

  • Open a new command line window

  • Validate Java and Python are installed correctly:

    > java --version
    openjdk 21.0.3 2024-04-16 LTS
    
    > python --version
    Python 3.12.3
    
  • Install DataForge by running:

    > pip install dataforge-core
    Collecting dataforge-core...
    Installing collected packages: dataforge-core
    Successfully installed dataforge-core...
    
  • Validate installation:

    > dataforge --version
    dataforge-core 1.0.0
    
  • Configure connections and credentials to Postgres and optionally Databricks

    > dataforge --configure
    Enter postgres connection string: postgresql://postgres:<postgres-server-url>:5432/postgres
    Do you want to configure Databricks SQL Warehouse connection (y/n)? y
    Enter Server hostname: <workspace-url>.cloud.databricks.com
    Enter HTTP path: /sql/1.0/warehouses/<warehouse-guid>
    Enter access token: <token-guid>
    Enter catalog name: <unity_catalog_name>
    Enter schema name: <schema_in_catalog_name>
    Connecting to Databricks SQL Warehouse <workspace-url>.cloud.databricks.com
    Databricks connection validated successfully
    Profile saved in C:\Users...
    
  • Navigate to an empty folder and initialize project structure and sample files:

    > dataforge --init
    Initialized project in C:\Users...
    
  • Deploy DataForge structures to Postgres

    > dataforge --seed
    All objects in schema(s) log,meta in postgres database will be deleted. Do you want to continue (y/n)? y
    Initializing database..
    Database initialized
    
  • Build sample project

    > dataforge --build
    Validating project path C:\Users...
    Started import with id 1
    Importing project files...
    <list of files>
    Files parsed
    Loading objects...
    Objects loaded
    Expressions validated
    Generated 8 source queries
    Generated 1 output queries
    Generated run.sql
    Import completed successfully
    
  • Execute in Databricks

    > dataforge --run
    Connecting to Databricks SQL Warehouse <workspace-url>.cloud.databricks.com
    Executing query
    Execution completed successfully
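
The configure step above prompts for a Postgres connection string. Standard PostgreSQL connection URIs take the form postgresql://user:password@host:port/dbname; whether the prompt expects exactly this form is an assumption, not something this page confirms. A small sketch for assembling such a URI safely when the password contains reserved characters (the helper name, host, and password below are hypothetical):

```python
from urllib.parse import quote, unquote, urlsplit


def postgres_uri(user: str, password: str, host: str,
                 port: int = 5432, database: str = "postgres") -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    # Characters such as '@' or ':' in a password would otherwise be read as delimiters.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


uri = postgres_uri("postgres", "p@ss:word", "db.example.com")
print(uri)

# Round-trip check: the pieces can be recovered with the standard library parser.
parts = urlsplit(uri)
print(parts.hostname, parts.port, unquote(parts.password or ""))
```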
    

Commands

-h, --help                     Display this help message and exit
-v, --version                  Display the installed DataForge version
-c, --configure                Connect to Postgres database and optionally Databricks SQL Warehouse
-s, --seed                     Deploy tables and scripts to Postgres database
-i, --init [Project Path]      Initialize project folder structure with sample code
-b, --build [Project Path]     Compile code, store results in Postgres, and generate target SQL files
-r, --run [Project Path]       Run compiled project on Databricks SQL Warehouse
-p, --profile [Profile Path]   Update path of stored credentials profile file

Download files

Source Distribution

dataforge_core-1.5.2.tar.gz (71.5 kB)

Uploaded Source

Built Distribution

dataforge_core-1.5.2-py3-none-any.whl (75.0 kB)

Uploaded Python 3

File details

Details for the file dataforge_core-1.5.2.tar.gz.

File metadata

  • Download URL: dataforge_core-1.5.2.tar.gz
  • Size: 71.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataforge_core-1.5.2.tar.gz

Algorithm    Hash digest
SHA256       30dc1a77b5802a9eae25bbf029caa7782940bc3d6bc7e19a0c80165f50d69341
MD5          3ee2dd8cc1e448886c3a1c6609bdb938
BLAKE2b-256  21669044bba5fba30c78f28cc19b894d0a04fedee0ab14861790cc90f4415565
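
A downloaded archive can be compared against the SHA256 digest listed above. This is a minimal sketch using only the Python standard library; the helper names are illustrative, and the self-check at the end runs on a throwaway file so the example works without the real archive.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 digest listed above for dataforge_core-1.5.2.tar.gz
EXPECTED_SHA256 = "30dc1a77b5802a9eae25bbf029caa7782940bc3d6bc7e19a0c80165f50d69341"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check on a temporary file containing b"abc".
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    demo_digest = sha256_of(sample)
    demo_mismatch = matches(sample, EXPECTED_SHA256)

print(demo_digest, demo_mismatch)
```

After downloading the source distribution, `matches(Path("dataforge_core-1.5.2.tar.gz"), EXPECTED_SHA256)` should return True for an intact file, assuming the archive sits in the current directory.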

Provenance

The following attestation bundles were made for dataforge_core-1.5.2.tar.gz:

Publisher: python-publish.yml on dataforgelabs/dataforge-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataforge_core-1.5.2-py3-none-any.whl.

File metadata

  • Download URL: dataforge_core-1.5.2-py3-none-any.whl
  • Size: 75.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataforge_core-1.5.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       ce01d6cdd2e1ef613f89fca0dfed9b9cff961d50de9ca7754c36e3197903e7d7
MD5          cfbe76b1e238f57b2b3f1960111443b9
BLAKE2b-256  637d2f85f289d37f834277e2999e3053f52bccee7b5e1f06557beeb8e91756f0

Provenance

The following attestation bundles were made for dataforge_core-1.5.2-py3-none-any.whl:

Publisher: python-publish.yml on dataforgelabs/dataforge-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
