
Fabric BigQuery Data Sync Utility

Project description

Fabric Sync for BigQuery

This project is provided as an accelerator to help mirror or synchronize data from Google BigQuery to Fabric. The primary use cases for this accelerator are:

  • BigQuery customers who wish to continue to leverage their existing investments and data estate while optimizing their Power BI experience and reducing overall analytics TCO
  • BigQuery customers who wish to migrate all or part of their data estate to Microsoft Fabric

Project Overview

For many of our customers, the native mirroring capabilities in Fabric are among the most exciting features of the platform. While Fabric supports a growing number of mirroring sources, BigQuery is not yet among them. That gap is the motivation for this accelerator.

The goal of this accelerator is to simplify synchronizing data from Google BigQuery to Microsoft Fabric, with an emphasis on reliability, performance, and cost optimization. It is implemented in Spark (PySpark) and draws on many concepts common to ETL frameworks. It goes beyond a typical ETL framework, however, in that it uses BigQuery metadata to determine the most efficient way to synchronize data between the two platforms.

Version 2.2.3

This release adds:

  • Multi-region configurations
  • Dataset-level materialization configuration
  • Initial load optimization for time-ingestion tables
  • Logging improvements
    • SyncLogger.set_level(SyncLogLevel.DEBUG)
  • Improved threading error handling
  • Experimental install-time configurations:
    • GCP Billing Export configuration

Getting Started

The accelerator includes an automated installer that can set up your Fabric workspace and install all required dependencies automatically. To use the installer:

  1. Download the current version Installer notebook
  2. Import the installer into your Fabric Workspace
  3. Attach the installer to a Lakehouse within the Workspace
  4. Upload your GCP Service Account credential json file to OneLake
  5. Update the configuration parameters:
    • loader_name - a custom name for the sync operation, used in dashboards and reports (e.g., HR Data Sync, BQ Sales Transaction Mirror)
    • metadata_lakehouse - name of the Lakehouse used to drive the Fabric Sync process
    • target_lakehouse - name of the Lakehouse your BQ data will be synced to
    • gcp_project_id - the GCP billing project ID that contains the in-scope dataset
    • gcp_dataset_id - the target BQ dataset name/ID
    • gcp_credential_path - the File API path to your JSON credential file (e.g., /lakehouse/default/Files/my-credential-file.json)
    • enable_schemas - flag to enable Fabric Lakehouse schemas (schemas are REQUIRED for Mirrored Databases)
    • target_type - Fabric LAKEHOUSE or MIRRORED_DATABASE
    • create_spark_environment - flag to create a Fabric Spark environment as part of installation
    • spark_environment_name - name for the Fabric Spark Environment item
  6. Run the installer notebook
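As a rough sketch, the configuration parameters from step 5 might be collected in the installer notebook's parameter cell like this. The parameter names match the list above; all values shown are illustrative placeholders, not defaults from the accelerator:

```python
# Hypothetical parameter cell for the installer notebook.
# Names follow the documented parameter list; values are placeholders.
installer_config = {
    "loader_name": "BQ Sales Transaction Mirror",   # friendly name shown in dashboards/reports
    "metadata_lakehouse": "FabricSyncMetadata",     # lakehouse that drives the sync process
    "target_lakehouse": "BQSalesData",              # lakehouse the BQ data lands in
    "gcp_project_id": "my-gcp-billing-project",     # GCP billing project containing the dataset
    "gcp_dataset_id": "sales_transactions",         # target BigQuery dataset
    "gcp_credential_path": "/lakehouse/default/Files/my-credential-file.json",
    "enable_schemas": True,                         # required when targeting a Mirrored Database
    "target_type": "MIRRORED_DATABASE",             # or "LAKEHOUSE"
    "create_spark_environment": True,
    "spark_environment_name": "FabricSyncEnv",
}

# Minimal sanity checks before running the installer
assert installer_config["target_type"] in {"LAKEHOUSE", "MIRRORED_DATABASE"}
assert installer_config["gcp_credential_path"].endswith(".json")
```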

The installer performs the following actions:

  • Creates the Fabric Sync metadata Lakehouse, if it does not exist
  • Creates the Fabric Sync mirror target (LAKEHOUSE or MIRRORED_DATABASE), if it does not exist
  • Creates the metadata tables and required metadata
  • Downloads the correct version of the BQ Spark connector based on your configured Spark runtime
  • Creates an initial default user configuration file based on your config parameters
  • Creates a Fabric Spark Environment with required libraries, if configured
  • Installs a fully configured and ready-to-run Fabric-Sync-Notebook into your workspace

Automatic Version Upgrades

The Fabric Sync Accelerator upgrades itself automatically as new runtime versions are released. If you install the FabricSync package from PyPI and allow the latest version of the package to be pulled, the accelerator will keep your metastore and configuration up-to-date automatically.

Note that behaviors and defaults for existing configurations do not change. Any updates to default behaviors will only apply to new configurations or when manually changed. Performance optimizations will apply to all configurations. Please see the Release Log for the latest.
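For example, in a Fabric notebook the package can be pulled from PyPI either unpinned (always resolving to the latest release, which enables the automatic upgrade behavior described above) or pinned to a specific release. The version number shown is illustrative:

```shell
# Unpinned: always resolves to the latest FabricSync release on PyPI
%pip install FabricSync

# Pinned: locks the environment to a specific release (example version)
%pip install FabricSync==2.2.8
```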

Features & Capabilities

The accelerator provides an ever-growing set of capabilities that either offer feature parity with BigQuery or enhance and optimize the overall synchronization process. Below is an overview of some of the core capabilities:

  • Multi-Project/Multi-Dataset/Multi-Region sync support
  • Support for Fabric Lakehouse and Fabric Mirror Database destinations
  • Table pattern-match filters to include/exclude tables during discovery
  • Table & Partition expiration based on BigQuery configuration
  • Syncing support for Views & Materialized Views
  • Support for handling tables with required partition filters
  • BigQuery connector configuration for alternative billing and materialization targets
  • Rename BigQuery tables and map to specific Lakehouse targets
  • Rename or convert data types using table-level column mapping
  • Shape BigQuery source with an alternate source sql query and/or source predicate
  • Complex-type (STRUCT/ARRAY) handling/flattening
  • Support for Delta schema evolution for evolving BigQuery table/view schemas
  • Override BigQuery native partitioning with a partitioning schema optimized for the Lakehouse (Delta partitioning)
  • Automatic Lakehouse table maintenance on synced tables
  • Detailed process telemetry that tracks data movement and pairs with native Delta Time Travel capabilities

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Download files

Download the file for your platform.

Source Distribution

fabricsync-2.2.8.tar.gz (133.6 kB)

Uploaded Source

Built Distribution


fabricsync-2.2.8-py3-none-any.whl (144.8 kB)

Uploaded Python 3

File details

Details for the file fabricsync-2.2.8.tar.gz.

File metadata

  • Download URL: fabricsync-2.2.8.tar.gz
  • Upload date:
  • Size: 133.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fabricsync-2.2.8.tar.gz
  • SHA256: 282334b76df26d0c96052770ea412cc1ba11a08de89e1993981fed4d1ac54e60
  • MD5: 2cc9cde82f5cf1fe65932f003f14a28e
  • BLAKE2b-256: 728afdf19d0d186d9923f6e3e8270476b4f7a147e699c5ca4e1be7c59cdd03f0

See more details on using hashes here.
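As a sketch, a downloaded artifact can be verified against the published SHA256 digest with Python's standard library. The filename and digest below come from the listing above:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "282334b76df26d0c96052770ea412cc1ba11a08de89e1993981fed4d1ac54e60"
# After downloading the sdist, compare digests before installing:
# assert sha256_of("fabricsync-2.2.8.tar.gz") == expected
```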

Provenance

The following attestation bundles were made for fabricsync-2.2.8.tar.gz:

Publisher: python-publish.yml on microsoft/FabricBQSync

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fabricsync-2.2.8-py3-none-any.whl.

File metadata

  • Download URL: fabricsync-2.2.8-py3-none-any.whl
  • Upload date:
  • Size: 144.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fabricsync-2.2.8-py3-none-any.whl
  • SHA256: 3dd05ee7a6ee4de67c74b652ac6ce2c6ce0b2d6116c3b4da03cebd08925b5752
  • MD5: 18e4ccdef1943a305d4330cd1ba37954
  • BLAKE2b-256: 0f46fb862c4c36507368b9e04209fb6e55bbf55d2260b97bd2294d25065953c6

See more details on using hashes here.

Provenance

The following attestation bundles were made for fabricsync-2.2.8-py3-none-any.whl:

Publisher: python-publish.yml on microsoft/FabricBQSync

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
