Solution for DS Team
Project description
pipelinesds
Pipelinesds is a library of functions for use in Kubeflow pipelines, including the following:
pipeliner.py
get_results_from_bq()
Returns SQL results from BigQuery as a DataFrame.
- Parameters:
  - bq_client: BigQuery client.
  - bq_storage_client: BigQuery Storage client.
  - table: Name of the table/view to get data from.
  - where_clause: Optional SQL WHERE clause to filter data.
- Returns:
  - pd.DataFrame: Data from the view/table.
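As a rough illustration of what a function with this signature might do, the sketch below builds a SELECT over the table/view, appends the optional filter, and downloads the result. It assumes bq_client is a google.cloud.bigquery.Client and bq_storage_client a BigQuery Storage read client; it also assumes the filter may be passed with or without the WHERE keyword. This is not the library's actual implementation.

```python
def get_results_from_bq(bq_client, bq_storage_client, table, where_clause=None):
    """Sketch: SELECT * from `table` (optionally filtered) and return a DataFrame."""
    sql = f"SELECT * FROM `{table}`"
    if where_clause and where_clause.strip():
        clause = where_clause.strip()
        # Assumption: accept the filter with or without a leading WHERE keyword.
        if not clause.upper().startswith("WHERE "):
            clause = f"WHERE {clause}"
        sql = f"{sql} {clause}"
    # Client.query() starts a query job; QueryJob.to_dataframe() downloads the
    # rows, using the BigQuery Storage API when a storage client is supplied.
    return bq_client.query(sql).to_dataframe(bqstorage_client=bq_storage_client)
```

The filter is interpolated as raw SQL, so this only makes sense for trusted, pipeline-defined clauses; BigQuery query parameters would be the safer choice for user-supplied values.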
delete_old_data()
Deletes old data from a BigQuery table.
- Parameters:
  - bq_client: BigQuery client.
  - table: Name of the table/view to delete data from.
  - where_clause: SQL WHERE clause to filter data for deletion.
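A minimal sketch of the deletion step, assuming bq_client is a google.cloud.bigquery.Client and that the filter may be given with or without the WHERE keyword. BigQuery DML requires a WHERE clause on DELETE, so the sketch rejects an empty filter up front rather than risk clearing the table; that guard is an illustrative choice, not documented library behavior.

```python
def delete_old_data(bq_client, table, where_clause):
    """Sketch: delete the rows of `table` that match `where_clause`."""
    clause = (where_clause or "").strip()
    # Assumption: tolerate a leading WHERE keyword in the supplied filter.
    if clause.upper().startswith("WHERE "):
        clause = clause[len("WHERE "):].strip()
    if not clause:
        # BigQuery DELETE needs a WHERE clause; an empty filter is refused.
        raise ValueError("where_clause must not be empty")
    sql = f"DELETE FROM `{table}` WHERE {clause}"
    # result() blocks until the DML job finishes and raises if it failed.
    bq_client.query(sql).result()
```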
write_dataframe_to_bq()
Writes a DataFrame to a BigQuery table.
- Parameters:
  - bq_client: BigQuery client.
  - df: DataFrame to write.
  - table_id: Table in BigQuery to write the DataFrame to.
  - write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
  - job_config: Configuration for the load job.
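One plausible way the parameters fit together, sketched under the assumption that bq_client is a google.cloud.bigquery.Client and job_config a google.cloud.bigquery.LoadJobConfig: the requested write disposition is set on the job config, and the DataFrame is loaded with Client.load_table_from_dataframe. The validation of write_disposition is an added illustration, not documented behavior.

```python
VALID_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}


def write_dataframe_to_bq(bq_client, df, table_id, write_disposition, job_config):
    """Sketch: load `df` into `table_id` using the given write disposition."""
    if write_disposition not in VALID_DISPOSITIONS:
        raise ValueError(f"unsupported write_disposition: {write_disposition!r}")
    # WRITE_APPEND adds rows, WRITE_TRUNCATE replaces the table contents,
    # WRITE_EMPTY fails if the destination table already contains data.
    job_config.write_disposition = write_disposition
    load_job = bq_client.load_table_from_dataframe(df, table_id, job_config=job_config)
    # Wait for the load job to finish; raises if the load failed.
    return load_job.result()
```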
read_gcs_file()
Reads a file from a specific path on Google Cloud Storage.
- Parameters:
  - gcs_client: Google Cloud Storage client.
  - bucket_name: Name of the bucket on GCS where the file is stored.
  - destination_blob_name: Path of the file within the bucket.
- Returns:
  - object: The object read from the file.
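A minimal sketch of the read path, assuming gcs_client is a google.cloud.storage.Client. The description does not say how the returned "object" is decoded, so this sketch simply returns the blob's raw bytes; a caller could then apply, for example, json.loads for a JSON file.

```python
def read_gcs_file(gcs_client, bucket_name, destination_blob_name):
    """Sketch: download gs://bucket_name/destination_blob_name as raw bytes."""
    # bucket() and blob() only build local references; no request is sent yet.
    blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
    # download_as_bytes() fetches the object's contents.
    return blob.download_as_bytes()
```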
save_gcs_file()
Saves content to a specific path on Google Cloud Storage.
- Parameters:
  - gcs_client: Google Cloud Storage client.
  - bucket_name: Name of the bucket on GCS where the file will be saved.
  - destination_blob_name: Path in the bucket to save the file to.
  - content: The content to be saved.
  - content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
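The matching write path might look like the sketch below, again assuming a google.cloud.storage.Client. Blob.upload_from_string accepts str or bytes, and content_type becomes the object's Content-Type metadata, which is what lets an HTML report, for instance, render directly in a browser.

```python
def save_gcs_file(gcs_client, bucket_name, destination_blob_name, content, content_type):
    """Sketch: upload `content` to gs://bucket_name/destination_blob_name."""
    blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
    # Uploads str or bytes and stores content_type as the object's metadata.
    blob.upload_from_string(content, content_type=content_type)
```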
tester.py
test_data()
Runs a test suite against the data to detect data issues.
- Parameters:
  - current_data: Current data to test.
  - reference_data: Reference data.
  - config_file: Tests configuration file.
  - stage: Stage of the pipeline ('test_input' or 'test_output').
- Returns:
  - pd.DataFrame: Test results.
check_data_drift()
Checks the current data for drift relative to the reference data.
- Parameters:
  - current_data: Current data to check.
  - reference_data: Reference data.
  - config_file: Tests configuration file.
- Returns:
  - pd.DataFrame: Test results.
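The drift method is not documented. One common choice for numeric columns is the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the empirical CDFs of the reference and current samples. The sketch below uses it with a hypothetical JSON config ({"columns": [...], "ks_threshold": ...}) and plain dicts of column lists in place of DataFrames; the real function may use a different test entirely.

```python
import json


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every copy of x so ties are handled correctly.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink,
    # so the maximum has already been recorded.
    return d


def check_data_drift(current_data, reference_data, config_file):
    """Sketch: flag numeric columns whose distribution has shifted.

    current_data / reference_data: dicts of column name -> list of numbers.
    config_file: JSON such as {"columns": ["score"], "ks_threshold": 0.2}
    (hypothetical format).
    """
    with open(config_file) as fh:
        cfg = json.load(fh)
    threshold = cfg.get("ks_threshold", 0.1)
    results = []
    for col in cfg["columns"]:
        d = ks_statistic(current_data[col], reference_data[col])
        results.append({"column": col, "ks_statistic": round(d, 4), "drift": d > threshold})
    return results
```

A fixed threshold on the statistic keeps the sketch simple; a p-value from a KS test (for example scipy.stats.ks_2samp) would account for sample size.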
send_email_with_table()
Sends an email with an HTML table.
- Parameters:
  - credentials_frame: DataFrame with credentials.
  - subject: Subject of the email.
  - html_table: Data to send in the email.
  - receiver_email: Email address to send the email to.
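The layout of credentials_frame is not documented, so the sketch below treats it as a mapping with hypothetical keys sender_email, password, smtp_host and smtp_port; with a one-row DataFrame, credentials_frame.iloc[0] would provide that kind of key access. It builds a multipart message with a plain-text fallback and an HTML part (an HTML table could come from pandas DataFrame.to_html()) and sends it with the standard library over SMTP with SSL.

```python
import smtplib
from email.message import EmailMessage


def build_table_email(sender, subject, html_table, receiver_email):
    """Build a message whose HTML part wraps `html_table`."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = receiver_email
    # Plain-text fallback first, then the HTML version as an alternative part.
    msg.set_content("This message contains an HTML table; view it in an HTML-capable client.")
    msg.add_alternative(f"<html><body>{html_table}</body></html>", subtype="html")
    return msg


def send_email_with_table(credentials_frame, subject, html_table, receiver_email):
    """Sketch: send `html_table` by SMTP over SSL.

    credentials_frame is used as a mapping with hypothetical keys
    'sender_email', 'password', 'smtp_host' and 'smtp_port'.
    """
    creds = credentials_frame
    msg = build_table_email(creds["sender_email"], subject, html_table, receiver_email)
    with smtplib.SMTP_SSL(creds["smtp_host"], int(creds["smtp_port"])) as server:
        server.login(creds["sender_email"], creds["password"])
        server.send_message(msg)
```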
Download files
Source Distribution
- pipelinesds-0.0.2.tar.gz (5.0 kB)
Built Distribution
- pipelinesds-0.0.2-py3-none-any.whl (4.9 kB)
File details
Details for the file pipelinesds-0.0.2.tar.gz.
File metadata
- Download URL: pipelinesds-0.0.2.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0ce637a351946825cd35abfcbe4302288fb021b0d4309b0dbd2581f63c6cd197
MD5 | 39d2bda9956b782465d04464fbe66960
BLAKE2b-256 | 05fcded810c3fcf8301bc78b2605ba9b75cd80c0e60618d2ad61c2d8b9ada3c3
File details
Details for the file pipelinesds-0.0.2-py3-none-any.whl.
File metadata
- Download URL: pipelinesds-0.0.2-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 01d75cbe400617c45ff1e4537a9fa7040735e4e45399947b14b2962fbdcbc0c8
MD5 | 3df89218f1ec4c32dbce93bf29efe88d
BLAKE2b-256 | f853bfb20d2755af1b663aa6623ff064471654df5be72c2011e8c4a6b333507c