
A lakeFS provider package built by Treeverse.

Project description


lakeFS Airflow provider

The lakeFS Airflow provider enables smooth integration of lakeFS with Airflow DAGs. Use the provider to create branches, commit objects, wait for files to be written, and more.

For a usage example, check out the example DAG.

What is lakeFS?

lakeFS is an open-source layer that delivers resilience and manageability to object-storage-based data lakes.

With lakeFS you can build repeatable, atomic, and versioned data lake operations - from complex ETL jobs to data science and analytics.

lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage as its underlying storage service. It is API compatible with S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto.
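Because lakeFS is S3 API compatible, clients can address objects with the usual bucket/key scheme, where the "bucket" is the repository and the first path segment is the branch (or commit). A minimal sketch of that path convention (the helper name is illustrative, not a lakeFS API):

```python
def lakefs_s3_path(repo: str, branch: str, key: str) -> str:
    """Build an S3-style URI for an object in a lakeFS repository.

    lakeFS exposes repositories as buckets; the branch (or a commit ID)
    is the first component of the object key.
    """
    return f"s3://{repo}/{branch}/{key}"

# The same logical file on two branches resolves to two distinct URIs:
print(lakefs_s3_path("my-repo", "main", "events/2023/data.parquet"))
print(lakefs_s3_path("my-repo", "experiment", "events/2023/data.parquet"))
```

This is why existing S3-speaking tools (Spark, Athena, Presto) can work against lakeFS without code changes beyond pointing at the lakeFS endpoint.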

For more information see the official lakeFS documentation.

Capabilities

Development Environment for Data

  • Experimentation - try tools, upgrade versions and evaluate code changes in isolation.
  • Reproducibility - go back to any point in time to a consistent version of your data lake.
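The branch/commit semantics behind isolation and reproducibility can be illustrated with a toy in-memory store (purely illustrative; the real lakeFS engine stores commit metadata over object storage, not in Python dicts):

```python
class ToyVersionedStore:
    """A toy illustration of branch/commit semantics over a key-value store."""

    def __init__(self):
        self.commits = {}                # commit_id -> immutable snapshot
        self.branches = {"main": None}   # branch name -> latest commit_id
        self.staging = {"main": {}}      # uncommitted changes per branch
        self._next = 0

    def branch(self, name, source="main"):
        # A new branch starts from the source branch's latest commit.
        self.branches[name] = self.branches[source]
        self.staging[name] = {}

    def put(self, branch, key, value):
        self.staging[branch][key] = value

    def commit(self, branch):
        base = self.commits.get(self.branches[branch], {})
        snapshot = {**base, **self.staging[branch]}
        cid = f"c{self._next}"
        self._next += 1
        self.commits[cid] = snapshot
        self.branches[branch] = cid
        self.staging[branch] = {}
        return cid

    def get(self, ref, key):
        # ref may be a branch name or a commit id (reproducibility).
        cid = self.branches.get(ref, ref)
        return self.commits.get(cid, {}).get(key)

store = ToyVersionedStore()
store.put("main", "table.csv", "v1")
c1 = store.commit("main")
store.branch("experiment")
store.put("experiment", "table.csv", "v2")
store.commit("experiment")
print(store.get("main", "table.csv"))   # isolation: main still sees "v1"
print(store.get(c1, "table.csv"))       # reproducibility: read at commit c1
```

Writes on `experiment` never leak into `main` until an explicit merge, and any historical commit ID remains readable - the two properties the bullets above rely on.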

Continuous Data Integration

  • Ingest new data safely by enforcing best practices - make sure new data sources adhere to your lake's standards, such as format and schema requirements and naming conventions.
  • Metadata validation - prevent breaking changes from entering the production data environment.
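Ingesting to an isolated branch gives you a place to run checks like these before merging. A hedged sketch of a schema and naming check (the rules and function names below are illustrative examples, not lakeFS APIs):

```python
import re

# Example conventions for an incoming dataset (assumptions for this sketch):
EXPECTED_SCHEMA = {"user_id": int, "event": str, "ts": float}
NAMING_PATTERN = re.compile(r"^[a-z0-9_/]+\.parquet$")

def validate_ingest(path: str, sample_record: dict) -> list:
    """Return a list of violations; an empty list means the ingest passes."""
    problems = []
    if not NAMING_PATTERN.match(path):
        problems.append(f"path {path!r} violates naming convention")
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in sample_record:
            problems.append(f"missing field {field!r}")
        elif not isinstance(sample_record[field], ftype):
            problems.append(f"field {field!r} has wrong type")
    return problems

ok = validate_ingest("events/2023/batch_01.parquet",
                     {"user_id": 7, "event": "click", "ts": 1.0})
bad = validate_ingest("Events/Batch-01.csv", {"user_id": "7"})
print(ok)    # [] - passes
print(bad)   # naming, type, and missing-field violations
```

Run against the ingest branch, a non-empty result blocks the merge, keeping the breaking change out of production.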

Continuous Data Deployment

  • Instantly revert changes to data - if low-quality data is exposed to your consumers, you can instantly revert to a former, consistent snapshot of your data lake.
  • Enforce cross-collection consistency - expose several collections of data that must stay synchronized to consumers in one atomic, revertible action.
  • Prevent data quality issues by enabling:
    • Testing of production data before exposing it to users / consumers.
    • Testing of intermediate results in your DAG to avoid cascading quality issues.
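The deploy-or-revert flow above can be sketched as a gate: write to a staging branch, test, and only merge on success. Everything here is a stand-in for the corresponding lakeFS branch operations, not their real API:

```python
production: dict = {"daily_report": "2023-01-01"}

def promote_if_valid(data: dict, quality_check) -> str:
    """Illustrative gate: stage new data, test it, and 'merge' to
    production only if the check passes (atomic update stand-in)."""
    staging = dict(data)              # stand-in for writing to a branch
    if quality_check(staging):
        production.update(staging)    # stand-in for an atomic merge
        return "merged"
    return "discarded"                # stand-in for dropping the branch

status = promote_if_valid(
    {"daily_report": "2023-01-02"},
    quality_check=lambda d: bool(d.get("daily_report")),
)
print(status, production)   # merged {'daily_report': '2023-01-02'}
```

Because consumers only ever read the "production" side, a failed check leaves them on the last known-good snapshot, and a bad merge can be reverted to the previous commit.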

Publishing

The repository includes a GitHub workflow that is triggered on a publish event and builds and pushes the package to PyPI.

Use the following steps to release:

  • Update setup.py with the new package version
  • Update CHANGELOG.md with the changes for the new release
  • Create a GitHub release with a semver tag (vX.X.X)
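The setup.py version bump in the first step can be scripted; a minimal sketch, assuming the common `version="X.Y.Z"` keyword form (the helper is illustrative, not part of this repository):

```python
import re

def bump_version(setup_py: str, new_version: str) -> str:
    """Replace the first version="..." argument in setup.py contents."""
    return re.sub(r'version\s*=\s*["\'][^"\']+["\']',
                  f'version="{new_version}"', setup_py, count=1)

src = 'setup(name="airflow-provider-lakefs", version="0.48.0")'
print(bump_version(src, "0.49.0"))
# setup(name="airflow-provider-lakefs", version="0.49.0")
```

Keeping the setup.py version, CHANGELOG entry, and the vX.X.X release tag in sync is what lets the publish workflow push an unambiguous artifact to PyPI.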

Community

Stay up to date and get lakeFS support via:

  • Slack (to get help from our team and other users).
  • Twitter (follow for updates and news)
  • YouTube (learn from video tutorials)
  • Contact us (for anything)

More information

Licensing

lakeFS is completely free and open source and licensed under the Apache 2.0 License.

Project details


Download files

Download the file for your platform.

Source Distribution

airflow-provider-lakefs-0.48.0.tar.gz (16.2 kB)

Uploaded Source

Built Distribution


airflow_provider_lakefs-0.48.0-py3-none-any.whl (24.4 kB)

Uploaded Python 3

File details

Details for the file airflow-provider-lakefs-0.48.0.tar.gz.

File metadata

  • Download URL: airflow-provider-lakefs-0.48.0.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for airflow-provider-lakefs-0.48.0.tar.gz
Algorithm Hash digest
SHA256 2529c02e09724ef88c96bfc2104fafd463547a43344f03fafc5e67335101dee8
MD5 e07762617c8b31335bda51e14011a296
BLAKE2b-256 6b6c728b3c6789496230b56d86af908d44235f73b6d9ea072617eb8509cd8d74

See more details on using hashes here.
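The published hashes let you verify a downloaded artifact before installing it. A minimal sketch using Python's hashlib (the filename and expected digest are from this page; the bytes below are a stand-in, so the comparison will be False here):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

expected = "2529c02e09724ef88c96bfc2104fafd463547a43344f03fafc5e67335101dee8"

# In practice:
# data = open("airflow-provider-lakefs-0.48.0.tar.gz", "rb").read()
data = b"stand-in bytes, not the real archive"
print(sha256_hex(data) == expected)   # False for the stand-in data
```

pip can also enforce this automatically via `--require-hashes` with hashes pinned in a requirements file.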

File details

Details for the file airflow_provider_lakefs-0.48.0-py3-none-any.whl.

File metadata

File hashes

Hashes for airflow_provider_lakefs-0.48.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b12feff07ec559ab69c32791db636f9137e723dd0c3116989ca5a541a6456a34
MD5 5a41ae0e628503dd1a86637783bf7057
BLAKE2b-256 d29e6b0a2bd63d2f5fe5a25493964c4bdbea441814fc991fcd50379a2b1b9d68

See more details on using hashes here.
