Lighthouse for Python: a package facilitating the creation of data pipelines.

pyhouse

This is a port of Lighthouse, a Scala library that facilitates the creation of data pipelines based on Apache Spark. It also comes with some related convenience functions, such as integrations with the AWS Parameter Store.

This port is targeted at Python and PySpark. It is not an exact port of the Scala code: we add what we need as we go along.

Usage

One of this library’s main uses is to build a class-based data catalog that supports chaining of sources. For example, if you had a dataset in a text file that needed to be transformed (cleaned, reduced to some statistic, …), you could write:

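from pyspark.sql import SparkSession

from pyhouse.datalake.file_system_data_link import FileSystemDataLink


def get_spark():
    # Assumed helper (not part of pyhouse): return the active SparkSession,
    # creating one if none exists yet.
    return SparkSession.builder.getOrCreate()


# Describe where the dataset lives and how it is read and written.
link = FileSystemDataLink(
    environment="dev",
    session=get_spark(),
    path="s3://bucket-foo/file-bar.csv",
    format="csv",
    savemode="errorifexists",
    partitioned_by=("some-key", "another-key"),
    options={"header": True, "sep": "\t"},
)

# Read the dataset and show the number of rows per client.
link.read().groupBy("client").count().show()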

The advantage of such data links becomes clear when several of them are combined in a single module (the “catalog”): that module is the one source of truth that many scripts can refer to, and hardcoded paths scattered across scripts become a thing of the past.
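As a rough illustration of the catalog idea, the sketch below keeps the description of every dataset in one module and lets scripts look datasets up by name. It deliberately avoids importing pyhouse or Spark so that it runs on its own: LinkSpec, CATALOG, get_link_spec, the dataset names and the second path are all hypothetical and are not part of the pyhouse API. Only the field names mirror the keyword arguments used in the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class LinkSpec:
    """Hypothetical stand-in that holds the settings of one data link."""
    path: str
    format: str
    savemode: str = "errorifexists"
    partitioned_by: Tuple[str, ...] = ()
    options: Dict[str, object] = field(default_factory=dict)


# The "catalog": the only place where paths and formats are written down.
CATALOG: Dict[str, LinkSpec] = {
    "clients_raw": LinkSpec(
        path="s3://bucket-foo/file-bar.csv",
        format="csv",
        partitioned_by=("some-key", "another-key"),
        options={"header": True, "sep": "\t"},
    ),
    # Hypothetical second dataset, added only to show several entries.
    "clients_clean": LinkSpec(
        path="s3://bucket-foo/clients-clean",
        format="parquet",
    ),
}


def get_link_spec(name: str) -> LinkSpec:
    """Return the spec registered under `name`, or raise a descriptive KeyError."""
    try:
        return CATALOG[name]
    except KeyError:
        known = ", ".join(sorted(CATALOG))
        raise KeyError(f"unknown dataset {name!r}; known datasets: {known}") from None


# A script refers to a dataset by name instead of hardcoding its path.
spec = get_link_spec("clients_raw")
print(spec.path, spec.format)
```

In a real project the catalog entries would presumably be FileSystemDataLink instances (or factories for them) rather than plain specs, built with the environment and session that each script supplies.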
