Common code for Python projects involving GCP, Pandas, and Spark.

Falgueras 🪴

Development framework for Python projects involving GCP, Pandas, and Spark.

The main goal is to accelerate development of data-driven projects by providing a common framework for developers with different backgrounds: software engineers, big data engineers and data scientists.

Installation

pip install falgueras (requires Python >= 3.10)

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service account key file to enable GCP services.
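
A minimal sketch of a start-up check for this variable, assuming it should point to a key file on disk. The helper name credentials_path is hypothetical and is not part of falgueras.

```python
import os
from pathlib import Path


def credentials_path(env=None):
    """Return the key file path stored in GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper, not part of falgueras. Raises RuntimeError when the
    variable is unset or does not point to an existing file.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

Calling credentials_path() before creating any GCP client turns a missing or mistyped path into an early, readable error.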

Run local Spark applications on Windows from IntelliJ

Try fast, fail fast, learn fast.

For local Spark execution on Windows, the following environment variables must be set:

  • SPARK_HOME: path to the Spark installation (version spark-3.5.2-bin-hadoop3).
  • HADOOP_HOME: same value as SPARK_HOME.
  • JAVA_HOME: path to the Java installation (Java SDK 11 recommended).
  • PATH: append %HADOOP_HOME%\bin and %JAVA_HOME%\bin.

%HADOOP_HOME%\bin must contain the files winutils.exe and hadoop.dll, which have to be downloaded separately.

Additionally, call findspark.init() at the beginning of the script so that the required environment variables are set and the Spark dependencies are added to sys.path.
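
The checklist above can be sketched as a small pre-flight check that runs before Spark starts. The function names check_spark_env and init_local_spark are hypothetical and not part of falgueras. findspark is imported late so that the check itself only needs the standard library.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME")
REQUIRED_HADOOP_FILES = ("winutils.exe", "hadoop.dll")


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        bin_dir = Path(hadoop_home) / "bin"
        for file_name in REQUIRED_HADOOP_FILES:
            if not (bin_dir / file_name).is_file():
                problems.append(f"{file_name} is missing from {bin_dir}")
    return problems


def init_local_spark():
    """Validate the environment, then let findspark set up sys.path."""
    problems = check_spark_env()
    if problems:
        raise RuntimeError("; ".join(problems))
    import findspark  # third-party package, imported only when needed

    findspark.init()
```

Calling init_local_spark() as the first statement of a script reports a missing variable or file before Spark itself fails.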

Connect to BigQuery from Spark

As shown in spark_session_utils.py, the SparkSession must include the jar com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.41.1 in order to communicate with BigQuery.
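
One common way to attach the connector is the spark.jars.packages setting, which makes Spark resolve the Maven coordinate at start-up. The sketch below assumes that approach. It is not necessarily what spark_session_utils.py does, and both function names and the default application name are hypothetical.

```python
BIGQUERY_CONNECTOR = (
    "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.41.1"
)


def bigquery_spark_conf(extra_packages=()):
    """Build the Spark config entry that pulls in the BigQuery connector."""
    packages = [BIGQUERY_CONNECTOR, *extra_packages]
    # spark.jars.packages takes a comma-separated list of Maven coordinates.
    return {"spark.jars.packages": ",".join(packages)}


def build_spark_session(app_name="falgueras-local"):
    """Create a SparkSession with the connector attached (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily

    builder = SparkSession.builder.appName(app_name)
    for key, value in bigquery_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Keeping the configuration in a plain dictionary lets it be inspected or extended without starting a JVM.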

Packages

falgueras.common

Code shared by the other packages, plus utility functions for datetime, json, enums, and logging.

falgueras.gcp

The functionalities of various Google Cloud Platform (GCP) services are encapsulated within custom client classes. This approach enhances clarity and promotes better encapsulation.

For instance, Google Cloud Storage (GCS) operations are wrapped in the gcp.GcsClient class, which has an attribute that holds the actual storage.Client object from GCS. Multiple GcsClient instances can share the same storage.Client object.
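
The wrapping pattern described above might look roughly like this. GcsClientSketch and its blob_exists method are hypothetical and the real gcp.GcsClient API may differ. The only library calls assumed are Client.bucket, Bucket.blob and Blob.exists from google-cloud-storage.

```python
class GcsClientSketch:
    """Hypothetical wrapper; the real gcp.GcsClient may expose a different API."""

    def __init__(self, client):
        # `client` is expected to be a google.cloud.storage.Client instance.
        # It is stored, not created, so several wrappers can share one client.
        self.client = client

    def blob_exists(self, bucket_name, blob_name):
        """Return True if the object exists in the given bucket."""
        return self.client.bucket(bucket_name).blob(blob_name).exists()


# Usage, assuming google-cloud-storage is installed and credentials are set:
#   from google.cloud import storage
#   shared = storage.Client()
#   raw_zone = GcsClientSketch(shared)
#   curated_zone = GcsClientSketch(shared)  # reuses the same storage.Client
```

Injecting the client also makes the wrapper easy to test with a stand-in object.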

falgueras.pandas

Pandas-related code.

The pandas_repo.py file provides a modular and extensible framework for handling pandas DataFrame operations across various storage systems. Using the PandasRepo abstract base class and PandasRepoProtocol, it standardizes read and write operations while enabling custom implementations for specific backends such as BigQuery (BqPandasRepo). These implementations encapsulate backend-specific logic, allowing users to interact with data sources using a consistent interface.
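
A minimal sketch of how an abstract base class and a protocol can standardize read and write. The class names PandasRepo and PandasRepoProtocol follow the text above, but the method signatures, InMemoryPandasRepo and copy_data are assumptions made for illustration and may not match pandas_repo.py.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class PandasRepoProtocol(Protocol):
    """Structural type: anything with read() and write() qualifies."""

    def read(self) -> "pd.DataFrame": ...

    def write(self, df: "pd.DataFrame") -> None: ...


class PandasRepo(ABC):
    """Nominal base class that backend-specific repos inherit from."""

    @abstractmethod
    def read(self) -> "pd.DataFrame": ...

    @abstractmethod
    def write(self, df: "pd.DataFrame") -> None: ...


class InMemoryPandasRepo(PandasRepo):
    """Hypothetical backend that keeps the DataFrame in memory."""

    def __init__(self):
        self._df = None

    def read(self):
        if self._df is None:
            raise LookupError("nothing has been written yet")
        return self._df

    def write(self, df):
        self._df = df


def copy_data(source: PandasRepoProtocol, target: PandasRepoProtocol) -> None:
    """Move data between any two repos, whatever their backends are."""
    target.write(source.read())
```

Code such as copy_data depends only on the shared interface, so swapping a BigQuery-backed repo for another backend does not change the calling code.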

falgueras.spark

Spark-related code.

In the same way as the pandas_repo.py file, the spark_repo.py file provides a modular and extensible framework for handling Spark DataFrame operations across various storage systems. Using the SparkRepo abstract base class and SparkRepoProtocol, it standardizes read and write operations while enabling custom implementations for specific backends such as BigQuery (BqSparkRepo). These implementations encapsulate backend-specific logic, allowing users to interact with data sources using a consistent interface.
