
ADF-style data ingestion for Databricks — pick a source (Volume, ADLS, S3, Database, REST API), fill a few fields, run


DashIngest — Databricks Library


Part of the Dashlibs suite — Databricks libraries built for business users.

Azure Data Factory (ADF)-style data ingestion: pick a source kind, fill in a few plain fields, and run. No hand-written abfss:// URIs or JDBC connection strings are needed.

Installation

%pip install dash-ingest

Quick Start

import dashingest
dashingest.launch()   # Opens interactive UI in your Databricks notebook

Or drive it directly from code:

from dashingest import ADLSSource, IngestTarget, run_ingestion

source = ADLSSource(storage_account="myacct", container="raw", path="sales/2024.csv")
target = IngestTarget(table="main.bronze.sales", write_mode="merge", merge_keys=["order_id"])
result = run_ingestion(source, target)
result.display()

Sources

| Kind | What you provide |
| --- | --- |
| Databricks Volume | catalog, schema, volume, path |
| ADLS Gen2 | storage account, container, path |
| Amazon S3 | bucket, path |
| DBFS | path |
| Database (JDBC) | engine (postgres/mysql/sqlserver/oracle/snowflake), host, database, table or query |
| REST API | URL, optional JSON path to the records |
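
To make concrete what the plain fields stand in for, here is a rough sketch of the locations they typically expand to. The helper names are hypothetical and not part of dashingest. The URI shapes are the standard ones for ADLS Gen2, Unity Catalog Volumes, and S3, and the library's internals may build them differently.

```python
def adls_uri(storage_account: str, container: str, path: str) -> str:
    """Standard ADLS Gen2 form: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def volume_path(catalog: str, schema_name: str, volume: str, path: str) -> str:
    """Unity Catalog Volume paths live under /Volumes/<catalog>/<schema>/<volume>/."""
    return f"/Volumes/{catalog}/{schema_name}/{volume}/{path.lstrip('/')}"


def s3_uri(bucket: str, path: str) -> str:
    """Plain s3:// URI for a bucket and object key."""
    return f"s3://{bucket}/{path.lstrip('/')}"


# Hypothetical usage with the field values from the Quick Start example.
print(adls_uri("myacct", "raw", "sales/2024.csv"))
# abfss://raw@myacct.dfs.core.windows.net/sales/2024.csv
```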

File format (csv/json/parquet/excel/avro/orc/text) is inferred from the path's extension if not set explicitly — most ingestions need zero format options.
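
Extension-based inference of this kind can be sketched as a small lookup. The mapping and the function name below are assumptions for illustration. dashingest's actual table of extensions, and its behaviour for unknown extensions, may differ.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed extension-to-format table; not taken from the library.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".xlsx": "excel",
    ".xls": "excel",
    ".avro": "avro",
    ".orc": "orc",
    ".txt": "text",
}


def infer_format(path: str, explicit: Optional[str] = None) -> str:
    """Return the explicit format if given, else infer it from the file extension."""
    if explicit:
        return explicit
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer format from {path!r}; set it explicitly") from None


print(infer_format("sales/2024.csv"))                   # csv
print(infer_format("export.dat", explicit="parquet"))   # parquet
```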

File format readers

Each format has its own options dataclass with real per-format defaults — not a generic options dict. Excel gets the most coverage, since vanilla Spark has no native Excel reader and a raw file path alone doesn't tell it which sheet to read, where the header starts, or whether the workbook is password-protected:

from dashingest import ExcelReaderOptions, VolumeSource

source = VolumeSource(
    catalog="main", schema_name="bronze", volume="landing",
    path="regional_sales.xlsx",
    reader_options=ExcelReaderOptions(
        sheet_name="Q1 Actuals",
        header_row=2,              # skips two title/banner rows above the header
        workbook_password="secret",  # optional
    ),
)

Set sheet_names=["Jan", "Feb", "Mar"] instead of sheet_name to read and stack several same-shaped sheets into one DataFrame — the common "one tab per month" spreadsheet layout.
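
Conceptually, stacking same-shaped sheets is a row-wise concatenation that only makes sense when every sheet has the same columns. The sketch below shows that idea on plain Python rows. The function is hypothetical, and the library may handle mismatched sheets differently (for example, by filling missing columns rather than failing).

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def stack_sheets(sheets: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate rows from several sheets, insisting on identical column order."""
    stacked: List[Row] = []
    expected: Optional[Tuple[str, ...]] = None
    for name, rows in sheets.items():
        for row in rows:
            columns = tuple(row.keys())
            if expected is None:
                expected = columns  # the first row seen defines the shape
            elif columns != expected:
                raise ValueError(f"Sheet {name!r} has columns {columns}, expected {expected}")
            stacked.append(dict(row))
    return stacked


workbook = {
    "Jan": [{"region": "EU", "sales": 10}, {"region": "US", "sales": 12}],
    "Feb": [{"region": "EU", "sales": 11}],
}
print(len(stack_sheets(workbook)))  # 3
```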

CsvReaderOptions (delimiter, quote/escape chars, encoding, null markers, date/timestamp formats, parse mode), JsonReaderOptions, ParquetReaderOptions/OrcReaderOptions (schema merging), and TextReaderOptions are also available — pass any of them via reader_options= on a source.
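
As an illustration of the "typed options instead of a generic dict" idea, the sketch below maps a small dataclass onto Spark's CSV reader option names. The class `CsvOptionsSketch` and its field names are hypothetical and are not the real `CsvReaderOptions` signature. The option keys (`sep`, `quote`, `escape`, `encoding`, `nullValue`, `dateFormat`, `timestampFormat`, `mode`, `header`) are the ones Spark's CSV data source accepts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CsvOptionsSketch:
    """Hypothetical stand-in for a per-format options dataclass."""
    delimiter: str = ","
    quote: str = '"'
    escape: str = "\\"
    encoding: str = "UTF-8"
    null_value: Optional[str] = None
    date_format: Optional[str] = None
    timestamp_format: Optional[str] = None
    mode: str = "PERMISSIVE"  # Spark also accepts DROPMALFORMED and FAILFAST
    header: bool = True

    def to_spark_options(self) -> Dict[str, str]:
        """Translate the typed fields into Spark CSV reader options."""
        options = {
            "sep": self.delimiter,
            "quote": self.quote,
            "escape": self.escape,
            "encoding": self.encoding,
            "mode": self.mode,
            "header": str(self.header).lower(),
        }
        optional = {
            "nullValue": self.null_value,
            "dateFormat": self.date_format,
            "timestampFormat": self.timestamp_format,
        }
        # Only pass options the caller actually set, so Spark's defaults apply otherwise.
        options.update({key: value for key, value in optional.items() if value is not None})
        return options


opts = CsvOptionsSketch(delimiter=";", null_value="NA").to_spark_options()
print(opts["sep"], opts["nullValue"])  # ; NA
```

A dictionary like this could then be handed to Spark with `spark.read.options(**opts).csv(path)`.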

Write modes

append · overwrite · merge (upsert into Delta by merge_keys, with schema evolution where the runtime supports it).
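
For readers unfamiliar with Delta upserts, merge mode corresponds roughly to a `MERGE INTO` statement keyed on `merge_keys`. The builder below is a minimal sketch of that statement, not the library's implementation. The function name and the `incoming` view name are assumptions.

```python
from typing import Sequence


def build_merge_sql(target_table: str, source_view: str, merge_keys: Sequence[str]) -> str:
    """Build a Delta MERGE statement that updates matching rows and inserts new ones."""
    if not merge_keys:
        raise ValueError("merge requires at least one merge key")
    on_clause = " AND ".join(f"t.`{key}` = s.`{key}`" for key in merge_keys)
    return (
        f"MERGE INTO {target_table} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# 'incoming' is a hypothetical temp view holding the freshly read source rows.
print(build_merge_sql("main.bronze.sales", "incoming", ["order_id"]))
```

On Delta, schema evolution during a merge is typically controlled by the `spark.databricks.delta.schema.autoMerge.enabled` setting, which is one reason support can depend on the runtime.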

Test Connection & Preview

Both the UI and the API let you check a source before committing to a full run — the same pattern ADF's linked-service "Test Connection" and dataset "Data preview" use:

from dashingest import test_connection, preview

test_connection(source).display()   # reachability/credentials check, no data read
preview(source, limit=10)            # pandas DataFrame of the first N rows

test_connection runs a lightweight check per source kind: SELECT 1 for databases, an HTTP request for REST APIs, and a filesystem existence check for Volumes, ADLS, S3, and DBFS. The filesystem check needs no dbutils; it uses Spark's Hadoop filesystem API directly, so it works the same way across all four.

Advanced database & REST options

DatabaseSource supports SSL, JDBC fetch size, parallel reads (split a large table by partition_column across num_partitions), and a raw connection_properties escape hatch:

from dashingest import DatabaseSource

source = DatabaseSource(
    engine="postgresql", host="db.internal", database="analytics",
    table="events", user="svc", password="...",
    ssl=True, num_partitions=8, partition_column="id",
    lower_bound=0, upper_bound=10_000_000,
)
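
To show what a partitioned read does with these four values, the sketch below reproduces, in simplified form, how Spark's JDBC reader turns `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` into one WHERE clause per partition. It assumes dashingest passes the values through to Spark unchanged, which the source does not state. Recent Spark versions also compute the stride with extra precision and may reduce the partition count, so treat the output as approximate.

```python
from typing import List


def partition_predicates(column: str, lower_bound: int, upper_bound: int,
                         num_partitions: int) -> List[str]:
    """Simplified version of Spark's JDBC range split (non-negative integer bounds)."""
    if num_partitions < 2 or lower_bound >= upper_bound:
        raise ValueError("need at least 2 partitions and lower_bound < upper_bound")
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    predicates: List[str] = []
    current = lower_bound
    for i in range(num_partitions):
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if upper is None:
            predicates.append(lower)  # last partition is open-ended above
        elif lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")  # first one also takes NULLs
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


for predicate in partition_predicates("id", 0, 10_000_000, 8):
    print(predicate)
```

Note that in Spark the bounds only set the stride and do not filter rows: the first and last partitions are open-ended, so values outside the bounds are still read.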

RestApiSource supports auth (bearer / api_key / basic) and pagination (page_param or cursor-based, up to max_pages):

from dashingest import RestApiSource

source = RestApiSource(
    url="https://api.example.com/records",
    auth_type="bearer", bearer_token="...",
    pagination="cursor", cursor_json_path="meta.next_cursor", max_pages=50,
)
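
Cursor-based pagination with a page cap generally follows the loop sketched below: fetch a page, collect its records, read the next cursor from a dotted JSON path, and stop when the cursor is empty or `max_pages` is reached. All names here (`get_by_path`, `fetch_all`, `fetch_page`, the `data` records path) are hypothetical. The HTTP call is replaced by an in-memory lookup so the example runs anywhere.

```python
from typing import Any, Callable, Dict, List, Optional


def get_by_path(payload: Any, dotted_path: str) -> Any:
    """Follow a dotted path such as 'meta.next_cursor' through nested dicts."""
    current = payload
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def fetch_all(fetch_page: Callable[[Optional[str]], Dict[str, Any]],
              records_path: str, cursor_path: str, max_pages: int) -> List[Any]:
    """Collect records page by page until the cursor runs out or max_pages is hit."""
    records: List[Any] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        records.extend(get_by_path(page, records_path) or [])
        cursor = get_by_path(page, cursor_path)
        if not cursor:
            break
    return records


# Fake API: maps the cursor sent by the client to the page the server would return.
PAGES = {
    None: {"data": [1, 2], "meta": {"next_cursor": "a"}},
    "a": {"data": [3], "meta": {"next_cursor": "b"}},
    "b": {"data": [4], "meta": {"next_cursor": None}},
}

print(fetch_all(PAGES.get, "data", "meta.next_cursor", max_pages=50))  # [1, 2, 3, 4]
```

The cap matters because a misbehaving API that always returns a cursor would otherwise loop forever.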

Part of Dashlibs

| Library | Purpose |
| --- | --- |
| dash-dq | Data Quality |
| dash-synthetic | Synthetic Data Generation |
| dash-ml | ML Lifecycle Management |
| dash-ingest | Data Ingestion |
| dash-gov | Data Governance |
| dash-ontology | Ontology & Lineage for AI |

License

Apache 2.0
