
Supercharge BigQuery with BigFunctions


Upgrade your data impact
with 100+ ready-to-use BigQuery Functions
(+ build a catalog of functions)

Website | GitHub



🔍️ 1. What is BigFunctions?

BigFunctions is:

  • a framework to build a governed catalog of powerful BigQuery functions at YOUR company;
  • 100+ open-source functions to supercharge BigQuery that you can call directly (no install) or redeploy in YOUR catalog.


💡 2. Why BigFunctions?

As a data-analyst
You'll have new powers! (such as loading data from any source or activating your data through reverse ETL).

As an analytics-engineer
You'll feel at home with the BigFunctions style, which imitates dbt's (a YAML standard and a CLI).
You'll love getting more things done through SQL.

As a data-engineer
You'll easily apply software-engineering best practices such as unit testing, CI/CD, pull request validation, continuous deployment, etc.
You'll love not reinventing the wheel, thanks to functions already developed by the community.

As a central data-team player in a large company
You'll be proud to provide a governed catalog of curated functions to your 10,000+ employees, with a shared and maintainable effort.

As a security champion
You'll enjoy being able to validate the code of functions before deployment thanks to your git validation workflow, CI testing, binary authorization, etc.

As an open-source lover
You'll be able to contribute so that a problem solved for you is solved for everyone.


👀 3. Call public BigFunctions without install from your GCP project

Every BigFunction defined by a 'yaml' file in the bigfunctions folder of the GitHub repo is automatically deployed to public datasets, so you can call it directly from your BigQuery project without installing anything.

Give it a try! Execute this SQL query from your GCP Project 👀:

select bigfunctions.eu.faker("name", "it_IT")

Explore all available bigfunctions here.
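To run the same query from Python instead of the BigQuery console, a minimal sketch using the `google-cloud-bigquery` client library could look like the one below. It assumes the library is installed and that application-default credentials are configured (see section 5.3). `build_call` and `call_bigfunction` are hypothetical helper names, and every argument is sent as a STRING query parameter, which matches the faker example but not functions that take other types.

```python
"""Call a public BigFunction from Python (illustrative sketch, not part of bigfun)."""


def build_call(function: str, n_args: int) -> str:
    """Build a SELECT statement calling `function` with named query parameters."""
    placeholders = ", ".join(f"@arg{i}" for i in range(n_args))
    return f"select {function}({placeholders})"


def call_bigfunction(function: str, *args: str):
    """Run the call in BigQuery and return the single scalar result.

    Assumes google-cloud-bigquery is installed and application-default
    credentials are available; every argument is sent as a STRING parameter.
    """
    # Imported lazily so build_call can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(f"arg{i}", "STRING", value)
            for i, value in enumerate(args)
        ]
    )
    rows = client.query(build_call(function, len(args)), job_config=job_config).result()
    # A scalar SELECT returns exactly one row with one column.
    return next(iter(rows))[0]
```

For example, `call_bigfunction("bigfunctions.eu.faker", "name", "it_IT")` runs `select bigfunctions.eu.faker(@arg0, @arg1)` with the two values bound as parameters, which avoids quoting issues when the arguments come from user input.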


🚀 4. Deploy BigFunctions in your GCP project

You can also deploy any bigfunction in your project! To deploy my_bigfunction, defined in the bigfunctions/my_bigfunction.yaml file, simply run:

bigfun deploy my_bigfunction

Details about the bigfun command line are given below.


💥 5. bigfun CLI

The bigfun CLI (command-line interface) facilitates BigFunctions development, testing, deployment, documentation and monitoring.

5.1 Install bigfun 🛠️

Clone the repo and from the repo directory run:

virtualenv venv
. venv/bin/activate
pip install --editable .

5.2 Use bigfun 🔥

$ bigfun --help
Usage: bigfun [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  deploy  Deploy BIGFUNCTION
  doc     Generate, serve and publish documentation
  test    Test BIGFUNCTION

5.3 Deploy your first function 👨‍💻

  1. Make sure the gcloud command is installed on your computer
  2. Activate the application-default account with gcloud auth application-default login. A browser window should open, and you should be prompted to log into your Google account. Once you've done that, bigfun will use these OAuth credentials to connect to BigQuery through the BigQuery Python client!
  3. Get or create a DATASET where you have permission to edit data and where the function will be deployed.
  4. The DATASET must belong to a PROJECT in which you have permission to run BigQuery queries.
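The first two prerequisites can be checked from Python before running bigfun deploy. The helper below is a hypothetical preflight check, not part of bigfun: it looks for gcloud on the PATH with `shutil.which` and, if the `google-auth` package is installed, asks `google.auth.default()` whether application-default credentials can be found. Dataset and project permissions (steps 3 and 4) are not checked.

```python
import shutil
from typing import Callable, List, Optional


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_adc: Optional[Callable[[], bool]] = None,
) -> List[str]:
    """Return descriptions of missing prerequisites (an empty list means all good).

    `which` and `has_adc` can be swapped out, e.g. for testing.
    """
    missing = []
    if which("gcloud") is None:
        missing.append("gcloud CLI not found on PATH")
    check_adc = has_adc or _application_default_credentials_found
    if not check_adc():
        missing.append(
            "no application-default credentials: run `gcloud auth application-default login`"
        )
    return missing


def _application_default_credentials_found() -> bool:
    """True if google-auth can locate application-default credentials."""
    try:
        import google.auth
        from google.auth.exceptions import DefaultCredentialsError
    except ImportError:
        # google-auth (installed with google-cloud-bigquery) is not available.
        return False
    try:
        google.auth.default()
    except DefaultCredentialsError:
        return False
    return True
```

Calling `print(missing_prerequisites())` prints an empty list when both checks pass, and one message per missing item otherwise.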

You can now deploy the is_email_valid function with:

bigfun deploy is_email_valid

The first time you run this command it will ask for PROJECT and DATASET.

Your inputs are written to a config.yaml file in the current directory so that you won't be asked again (unless you delete the entries in config.yaml). You can also override this config at deploy time: bigfun deploy is_email_valid --project=PROJECT --dataset=DATASET.
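In a CI/CD pipeline, where nobody can answer the PROJECT and DATASET prompts, passing --project and --dataset explicitly avoids depending on config.yaml. The sketch below deploys several functions in a row with Python's `subprocess` module. The `deploy_command` and `deploy_all` helper names are illustrative assumptions; only the `bigfun deploy NAME --project=... --dataset=...` syntax comes from this README.

```python
import subprocess
from typing import List, Sequence


def deploy_command(name: str, project: str, dataset: str) -> List[str]:
    """Build the bigfun invocation for one function with PROJECT and DATASET given explicitly."""
    return ["bigfun", "deploy", name, f"--project={project}", f"--dataset={dataset}"]


def deploy_all(names: Sequence[str], project: str, dataset: str) -> None:
    """Deploy each function in turn; check=True raises CalledProcessError on the first failure."""
    for name in names:
        subprocess.run(deploy_command(name, project, dataset), check=True)
```

For example, `deploy_all(["is_email_valid"], "my-project", "my_dataset")` runs `bigfun deploy is_email_valid --project=my-project --dataset=my_dataset`. Passing the arguments as a list (rather than one shell string) avoids shell-quoting problems.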

Test it with 👀:

select PROJECT.DATASET.is_email_valid('paul.marcombes@unytics.io')

5.4 Deploy your first JavaScript function that depends on npm packages 👽

Deploying a JavaScript function that depends on npm packages has a few requirements on top of the ones above:

  1. Each npm package needs to be installed on your machine and bundled into a single file. For that, you need to install Node.js.
  2. The bundled JS file is uploaded to a Cloud Storage bucket to which you must have write access. You are asked for the bucket name when you run bigfun deploy. Users of your functions must have read access to the bucket.
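Users need read access to the bucket because a BigQuery JavaScript UDF loads external code from Cloud Storage at query time, through the `library` option of `CREATE FUNCTION`. The sketch below builds such a statement as a string to illustrate that BigQuery DDL; it is not necessarily the exact statement bigfun generates, and the function name, arguments, body and gs:// path are placeholders.

```python
from typing import Sequence, Tuple


def js_udf_ddl(
    function: str,
    args: Sequence[Tuple[str, str]],
    returns: str,
    body: str,
    library_uri: str,
) -> str:
    """Build a CREATE FUNCTION statement for a JavaScript UDF that loads a bundled library.

    `args` is a list of (name, BigQuery type) pairs. The body is wrapped in a
    raw triple-quoted string, so it must not itself contain three double quotes.
    """
    if not library_uri.startswith("gs://"):
        raise ValueError("BigQuery loads JavaScript libraries from Cloud Storage (gs://) URIs")
    arg_list = ", ".join(f"{name} {type_}" for name, type_ in args)
    return (
        f"CREATE OR REPLACE FUNCTION `{function}`({arg_list})\n"
        f"RETURNS {returns}\n"
        "LANGUAGE js\n"
        f'OPTIONS (library = ["{library_uri}"])\n'
        f'AS r"""\n{body}\n"""'
    )
```

Every query that calls the resulting function makes BigQuery fetch the gs:// file, which is why read access is required for callers and not only for the person who deploys.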

You can now deploy the render_template function with:

bigfun deploy render_template

Test it with 👀:

select PROJECT.DATASET.render_template('Hello {{ user }}', json '{"user": "James"}')

5.5 Deploy your first remote function ⚡️

Deploying a remote function (e.g. a Python function) has additional requirements on top of the ones in the Deploy your first function section:

  1. A Cloud Run service is deployed to host the code (as seen here), so you must have permission to deploy a Cloud Run service in your project PROJECT.
  2. The gcloud CLI is used directly to deploy the service (with gcloud run deploy), so make sure you are logged in with gcloud by running: gcloud auth login. A browser window should open again, and you should be prompted to log into your Google account. WARNING: yes, you have to authenticate twice: once for the BigQuery Python client (to deploy any function, including remote ones, as seen above) and once now for gcloud (to deploy a Cloud Run service).
  3. A BigQuery remote connection is created to link BigQuery with the Cloud Run service, so you need permission to create a remote connection. The BigQuery Connection Admin and BigQuery Admin roles grant this permission.
  4. Google automatically creates a service account along with the BigQuery remote connection, and BigQuery uses this service account to invoke the Cloud Run service. You therefore need permission to authorize this service account to invoke the Cloud Run service. This permission is included in the roles/run.admin role.
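The Cloud Run service behind a remote function receives batched calls from BigQuery. In the BigQuery remote-function contract, each HTTP POST carries a JSON body whose `calls` field is a list of argument lists (one per row), and the service answers with a `replies` list of the same length and order, or with an `errorMessage` field on failure. The framework-free handler below sketches that contract for a toy uppercase function; it is not the BigFunctions service code, and the HTTP layer (for example Flask) is left out.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def handle_remote_call(request_body: str, fn: Callable[..., Any]) -> Dict[str, Any]:
    """Apply `fn` to every row of a BigQuery remote-function request body.

    Returns {"replies": [...]} (one reply per call, same order) on success,
    or {"errorMessage": "..."} on failure. The HTTP layer should send the
    error payload with a non-200 status code.
    """
    try:
        calls: List[List[Any]] = json.loads(request_body)["calls"]
        return {"replies": [fn(*args) for args in calls]}
    except Exception as exc:  # report the failure to BigQuery instead of crashing
        return {"errorMessage": f"{type(exc).__name__}: {exc}"}


def to_upper(text: Optional[str]) -> Optional[str]:
    """Toy scalar function: SQL NULL arrives as None and is returned unchanged."""
    return None if text is None else text.upper()
```

Keeping the row-level function (`to_upper`) separate from the batching logic means the same handler can wrap any Python function whose arguments and result are JSON-serializable.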

You can now deploy the faker function with:

bigfun deploy faker

Test it with 👀:

select PROJECT.DATASET.faker("name", "it_IT")

👋 6. Contribute

BigFunctions is fully open-source. Any contribution is more than welcome 🤗!




Download files


Source Distribution

bigfunctions-0.1.tar.gz (25.5 kB)

File details

Details for the file bigfunctions-0.1.tar.gz.

File metadata

  • Download URL: bigfunctions-0.1.tar.gz
  • Upload date:
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.2

File hashes

Hashes for bigfunctions-0.1.tar.gz
Algorithm Hash digest
SHA256 4a3645346c077b5c3c5bf7807fa3ef7f7afe881e21fba40e4258efc29e9d553d
MD5 a43d841143bb56778962e668c510e58e
BLAKE2b-256 8cc8a4a5676ff89cb3705f9f004c4c68930df7b724c7e3eba433d0ec425738c0

