PyCondusco

Overview

pycondusco lets you run a function iteratively, passing it the rows of a dataframe or the results of a query.

We call the functions that pycondusco runs "pipelines". A pipeline is a function that accepts a set of named parameters and does something based on their values.
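As a minimal sketch of that definition, the function below is a pipeline in this sense: it takes one set of parameters (assumed here to be a dictionary, as in the examples that follow) and acts on the values. The name `greet` and the keys are hypothetical and are not part of pycondusco.

```python
# A hypothetical pipeline: takes one dict of parameters and acts on its values.
def greet(params):
    message = "Hello, {first_name} {last_name}".format(**params)
    print(message)
    return message

# Calling it by hand with a single parameter row; pycondusco automates
# repeating this call over many rows.
greet({"first_name": "First", "last_name": "Last"})
```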

The most common use case for pycondusco is data pipelines. For data pipelines that primarily run SQL queries, we can template the queries with a library (e.g. pystache), so that parametrized values are kept separate from the query logic. We can then render the query with the appropriate values:

import pycondusco
from pycondusco.run_pipeline import run_pipeline
import pystache

json_string = '{"first_name": "First", "last_name":"Last"}'

params = [
    {
        'k1':'v1',
        'k2':'v2',
    },
    {
        'k1':'v1',
        'k2': json_string,
    },
]

def pipeline(params):
    # params is one row of parameters; render it into the template
    print(pystache.render('k1 value is {{k1}}, k2 is {{k2}}', params))

run_pipeline(pipeline,params)
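To make the SQL use case concrete, here is a rough sketch of templating one query per parameter row. It uses a tiny regular-expression substitute in place of pystache so that it runs with the standard library alone. `render_simple`, the table name, and the column names are illustrative assumptions; real code would call `pystache.render` inside the pipeline and hand the pipeline to `run_pipeline`.

```python
import re

def render_simple(template, params):
    """Replace each {{key}} with str(params[key]); a stand-in for pystache.render."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)

# Query logic lives in the template; the values live in the parameter rows.
QUERY = "SELECT COUNT(*) FROM {{table}} WHERE country = '{{country}}'"

def pipeline(params):
    sql = render_simple(QUERY, params)
    print(sql)  # a real pipeline would send this to the database
    return sql

rows = [
    {"table": "orders", "country": "US"},
    {"table": "orders", "country": "DE"},
]

# Plain loop standing in for run_pipeline(pipeline, rows).
rendered = [pipeline(row) for row in rows]
```

Because the values are substituted into the query text, this pattern suits trusted parameters such as table names or configuration values, not untrusted user input.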

pycondusco extends the above design pattern in two ways:

  • The user can provide a query, and each row of its results is passed to the pipeline in turn.
  • Any parameter that is a JSON string is converted to an object before being passed to the pipeline.
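The second point can be sketched roughly as follows. This is not pycondusco's actual implementation: `decode_json_values` and `run_pipeline_sketch` are hypothetical names, and the rule that only JSON objects and arrays are converted is an assumption made for the illustration.

```python
import json

def decode_json_values(row):
    """Return a copy of row with JSON object/array strings parsed."""
    decoded = {}
    for key, value in row.items():
        parsed = value
        if isinstance(value, str):
            try:
                candidate = json.loads(value)
            except ValueError:
                candidate = None  # not JSON: keep the original string
            # Assumption: only objects and arrays are converted, so a
            # string such as "42" stays a string.
            if isinstance(candidate, (dict, list)):
                parsed = candidate
        decoded[key] = parsed
    return decoded

def run_pipeline_sketch(pipeline, rows):
    """Call pipeline once per row, after decoding JSON values."""
    return [pipeline(decode_json_values(row)) for row in rows]

rows = [
    {"k1": "v1", "k2": "v2"},
    {"k1": "v1", "k2": '{"first_name": "First", "last_name": "Last"}'},
]
seen = run_pipeline_sketch(lambda params: params, rows)
print(seen[1]["k2"]["first_name"])  # First
```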

Functions

| Function | Description |
| --- | --- |
| run_pipeline(pipeline, parameters) | Iteratively passes each row of parameters to a pipeline, converting any JSON parameters to objects |
| run_pipeline_gbq(pipeline, query, project) | Calls run_pipeline with the results of query executed via BigQuery |
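The query-driven variant follows the same idea: run a query, turn each result row into a dictionary keyed by column name, and call the pipeline with it. The sketch below uses the standard library's `sqlite3` purely as a stand-in for BigQuery so that it runs without credentials. `run_pipeline_query_sketch`, its argument order, and the `jobs` table are hypothetical, and JSON conversion is left out for brevity.

```python
import sqlite3

def run_pipeline_query_sketch(pipeline, connection, query):
    """Run query and call pipeline once per result row, passed as a dict."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return [pipeline(dict(zip(columns, row))) for row in cursor]

# In-memory database with a small, made-up parameter table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE jobs (name TEXT, threshold INTEGER)")
connection.executemany(
    "INSERT INTO jobs VALUES (?, ?)", [("daily", 10), ("weekly", 70)]
)

def pipeline(params):
    line = "{name}: threshold {threshold}".format(**params)
    print(line)
    return line

lines = run_pipeline_query_sketch(
    pipeline, connection, "SELECT name, threshold FROM jobs ORDER BY name"
)
```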

Installation

pip install pycondusco
export GOOGLE_APPLICATION_CREDENTIALS="<FILE_PATH_TO_CREDENTIALS>"
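Google's client libraries read `GOOGLE_APPLICATION_CREDENTIALS` to locate a service-account key file, which is presumably why the variable is set here for the BigQuery-backed function. A small, optional check along the following lines can catch a missing or mistyped path before any query is sent; the helper name `credentials_file` is hypothetical.

```python
import os

def credentials_file(environ=os.environ):
    """Return the configured key-file path, or None if unset or missing."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and os.path.isfile(path):
        return path
    return None

if credentials_file() is None:
    print("GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file")
```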

Features

  • Name-based substitution of query results, including JSON values, into pipelines. pycondusco iterates through the rows of the parameters dataframe and calls the pipeline once per row:
import pystache
from google.cloud import bigquery
import pycondusco
from pycondusco.run_pipeline_gbq import run_pipeline_gbq

client = bigquery.Client()

def pipeline(params):
    query = """
      SELECT
        {{#list}}
          SUM(CASE WHEN author.name ='{{name}}' THEN 1 ELSE 0 END) as n_{{name_clean}},
        {{/list}}
        repo_name
      FROM `bigquery-public-data.github_repos.sample_commits`
      GROUP BY repo_name
    """

    query_job = client.query(pystache.render(query, params))
    results = query_job.result()  # Waits for job to complete.
    for row in results:
        print(dict(row.items()))


query = """
   SELECT CONCAT('[',
   STRING_AGG(
     CONCAT('{\"name\":\"',name,'\",'
       ,'\"name_clean\":\"', REGEXP_REPLACE(name, r'[^[:alpha:]]', ''),'\"}'
     )
   ),
   ']') as list
   FROM (
     SELECT author.name,
       COUNT(commit) n_commits
     FROM `bigquery-public-data.github_repos.sample_commits`
     GROUP BY 1
     ORDER BY 2 DESC
     LIMIT 10
   )
"""

run_pipeline_gbq(pipeline, client, query)
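To see how the two queries fit together, it may help to trace one row by hand. The outer query returns a single row whose `list` column is a JSON array; once that string is decoded, the `{{#list}}` section of the pipeline's template repeats its body for each element. The sketch below reproduces that expansion with plain Python instead of pystache, and the author names are made up.

```python
import json

# What the single result row might look like: one column named "list"
# holding a JSON array (the author names are invented for illustration).
row = {
    "list": '[{"name":"Ada L.","name_clean":"AdaL"},'
            '{"name":"Bob","name_clean":"Bob"}]'
}

# After JSON decoding, params["list"] is a list of dicts.
params = {"list": json.loads(row["list"])}

# Hand-rolled equivalent of the {{#list}} ... {{/list}} section.
line = "SUM(CASE WHEN author.name ='{name}' THEN 1 ELSE 0 END) as n_{name_clean},"
columns = [line.format(**item) for item in params["list"]]
print("\n".join(columns))
```

Note that Mustache's double braces HTML-escape values, so a name containing characters such as `&` or `<` would render differently in pystache than in this hand-rolled version.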

Download files

Source Distribution

pycondusco-0.1.0.tar.gz (3.9 kB, source)

Built Distribution

pycondusco-0.1.0-py2-none-any.whl (19.7 kB, Python 2)
