Command line compiler for DataForge Core projects
Project description
DataForge helps data analysts and engineers build and extend data solutions by leveraging modern software engineering principles.
Understanding DataForge
DataForge enables writing inline functions using single-column SQL expressions rather than CTEs, procedural scripts, or set-based models.
For an overview of the underlying concepts, check out the introductory blog post.
Each function:
- is pure, with no side effects
- returns a single column
- is composable with other functions

For example, an expression such as round(price * quantity, 2) fits this model (column names illustrative): it is deterministic, produces exactly one column, and its result can be referenced by other functions.
These software engineering principles keep DataForge projects easy to modify and extend, even with thousands of integrated pipelines.
Explore the Core CLI or learn more about how Core powers DataForge Cloud.
Requirements
DataForge Core is a code framework and command-line tool for developing transformation functions and compiling them into executable Spark SQL.
To run the CLI you will need:
- Java 8 or higher
  - Amazon Corretto is a great option
- A PostgreSQL v14+ server with a dedicated empty database
  - Check out our friends over at Tembo
- Python version 3.12+
The CLI also includes an integration to run the compiled code on Databricks. To support this you need:
- A Databricks workspace with a SQL Warehouse
- A personal access token, plus the Unity Catalog catalog and schema the warehouse should write to (these are the values requested by dataforge --configure)
Installation and Quickstart
- Open a new command line window
- Validate Java and Python are installed correctly:
  > java --version
  openjdk 21.0.3 2024-04-16 LTS
  > python --version
  Python 3.12.3
- Install DataForge by running:
  > pip install dataforge-core
  Collecting dataforge-core...
  Installing collected packages: dataforge-core
  Successfully installed dataforge-core...
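  The transcript on this page was captured on an earlier release. To install the exact version documented here, you can pin it explicitly:
  > pip install dataforge-core==1.2.0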
- Validate installation:
  > dataforge --version
  dataforge-core 1.0.0
- Configure connections and credentials for Postgres and, optionally, Databricks:
  > dataforge --configure
  Enter postgres connection string: postgresql://postgres:<postgres-server-url>:5432/postgres
  Do you want to configure Databricks SQL Warehouse connection (y/n)? y
  Enter Server hostname: <workspace-url>.cloud.databricks.com
  Enter HTTP path: /sql/1.0/warehouses/<warehouse-guid>
  Enter access token: <token-guid>
  Enter catalog name: <unity_catalog_name>
  Enter schema name: <schema_in_catalog_name>
  Connecting to Databricks SQL Warehouse <workspace-url>.cloud.databricks.com
  Databricks connection validated successfully
  Profile saved in C:\Users...
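  Note: a standard PostgreSQL connection URI has the general form below (placeholders illustrative), so adjust the string to match your server and credentials:
  postgresql://<user>:<password>@<host>:5432/<database>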
- Navigate to an empty folder and initialize the project structure and sample files:
  > dataforge --init
  Initialized project in C:\Users...
- Deploy DataForge structures to Postgres:
  > dataforge --seed
  All objects in schema(s) log,meta in postgres database will be deleted. Do you want to continue (y/n)? y
  Initializing database..
  Database initialized
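  (Optional) To confirm the deployment, any Postgres client can list the newly created log and meta schemas; for example, with psql (connection string illustrative):
  > psql "postgresql://<user>:<password>@<host>:5432/<database>" -c "\dn"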
- Build the sample project:
  > dataforge --build
  Validating project path C:\Users...
  Started import with id 1
  Importing project files...
  <list of files>
  Files parsed
  Loading objects...
  Objects loaded
  Expressions validated
  Generated 8 source queries
  Generated 1 output queries
  Generated run.sql
  Import completed successfully
- Execute in Databricks:
  > dataforge --run
  Connecting to Databricks SQL Warehouse <workspace-url>.cloud.databricks.com
  Executing query
  Execution completed successfully
Commands
Command | Description
---|---
-h, --help | Display this help message and exit
-v, --version | Display the installed DataForge version
-c, --configure | Connect to the Postgres database and, optionally, a Databricks SQL Warehouse
-s, --seed | Deploy tables and scripts to the Postgres database
-i, --init [Project Path] | Initialize the project folder structure with sample code
-b, --build [Project Path] | Compile code, store results in Postgres, and generate target SQL files
-r, --run [Project Path] | Run the compiled project on a Databricks SQL Warehouse
-p, --profile [Profile Path] | Update the path of the stored credentials profile file
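Taken together, a typical end-to-end session chains these commands (the project path is illustrative; the quickstart above omits it and works from the current directory):

> dataforge --configure
> dataforge --seed
> dataforge --init my_project
> dataforge --build my_project
> dataforge --run my_project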
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
dataforge_core-1.2.0.tar.gz (65.1 kB)
Built Distribution
dataforge_core-1.2.0-py3-none-any.whl (68.6 kB)
File details
Details for the file dataforge_core-1.2.0.tar.gz.
File metadata
- Download URL: dataforge_core-1.2.0.tar.gz
- Upload date:
- Size: 65.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | c7d4eba83644b575657c57a4e632c47da9af4b088db357954068f759a0634d36
MD5 | 0ca55d84fb9e7f8f96a6573307a1eff8
BLAKE2b-256 | f947c4287b5a522f58551f487d800472d37804786579d972b2437b7883f7d31c
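To check a downloaded archive against the published digest, you can compute the SHA256 locally (Linux/macOS):

> sha256sum dataforge_core-1.2.0.tar.gz
c7d4eba83644b575657c57a4e632c47da9af4b088db357954068f759a0634d36  dataforge_core-1.2.0.tar.gz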
Provenance
The following attestation bundles were made for dataforge_core-1.2.0.tar.gz:
Publisher: python-publish.yml on dataforgelabs/dataforge-core
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dataforge_core-1.2.0.tar.gz
- Subject digest: c7d4eba83644b575657c57a4e632c47da9af4b088db357954068f759a0634d36
- Sigstore transparency entry: 148928111
- Sigstore integration time:
File details
Details for the file dataforge_core-1.2.0-py3-none-any.whl.
File metadata
- Download URL: dataforge_core-1.2.0-py3-none-any.whl
- Upload date:
- Size: 68.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 383402d3c1acd7a7084725b7a088ecf309263d08a677dd4f2b04a5e4f151ff20
MD5 | 71904b9df2adefc36294ec44dabecc6f
BLAKE2b-256 | 2a482d9477ef5e3bca51b07b418ada4250b7737bc961eb3676f97a1b8e51b726
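Alternatively, pip can compute the same digest directly, which works on any platform with pip installed:

> pip hash dataforge_core-1.2.0-py3-none-any.whl
dataforge_core-1.2.0-py3-none-any.whl:
--hash=sha256:383402d3c1acd7a7084725b7a088ecf309263d08a677dd4f2b04a5e4f151ff20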
Provenance
The following attestation bundles were made for dataforge_core-1.2.0-py3-none-any.whl:
Publisher: python-publish.yml on dataforgelabs/dataforge-core
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dataforge_core-1.2.0-py3-none-any.whl
- Subject digest: 383402d3c1acd7a7084725b7a088ecf309263d08a677dd4f2b04a5e4f151ff20
- Sigstore transparency entry: 148928112
- Sigstore integration time: