A Django plugin for exporting CMS data to Google BigQuery.
A Django application that provides a convenient way to export data from your Django models to Google BigQuery.
Features
Exports Django model data to BigQuery tables
Processes data in configurable batch sizes to manage memory usage
Handles date/time formats and UUID fields automatically
Allows custom field transformations with a simple decorator
Validates that model fields match BigQuery table schema
Provides retry mechanisms for resilient exports
Supports incremental exports with date filtering
Handles potential exceptions during data export with detailed error reporting
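As a rough illustration of what batching and automatic date/UUID handling involve, here is a minimal standard-library sketch. It is not the library's implementation: `serialize_value` and `iter_batches` are hypothetical helper names, and the rows are plain dictionaries rather than Django model instances.

```python
import datetime
import uuid
from itertools import islice


def serialize_value(value):
    """Convert values that JSON cannot carry natively into strings."""
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    return value


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Five sample rows (plain dicts standing in for model instances).
rows = [
    {"id": uuid.UUID(int=n), "published": datetime.date(2024, 1, n)}
    for n in range(1, 6)
]

# Serialize every value, two rows at a time.
batches = [
    [{key: serialize_value(value) for key, value in row.items()} for row in batch]
    for batch in iter_batches(rows, batch_size=2)
]
```

Only one batch is held in memory at a time when the batches are consumed lazily, which is the point of a configurable batch size.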
Installation
pip install django-bigquery-exporter
Requirements
Python 3.8+
Django
google-cloud-bigquery
google-api-python-client
Authentication
You need to authenticate with Google Cloud to use BigQuery. There are two main ways:
Using environment variables (recommended for production):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
Providing credentials directly in code (useful for development):
exporter = MyExporter(
    project="your-google-cloud-project-id",
    credentials="/path/to/your/credentials.json",
)
Basic Usage
Create a subclass of BigQueryExporter and define the necessary attributes:
from bigquery_exporter.base import BigQueryExporter, custom_field
from myapp.models import Book

class BookExporter(BigQueryExporter):
    model = Book
    fields = ['id', 'title', 'author', 'publication_date', 'genre', 'rating']
    batch = 1000
    table_name = 'your_project.your_dataset.books'
    replace_nulls_with_empty = False

    @custom_field
    def genre(self, instance):
        """Custom field to transform the genre into a structured format"""
        return {
            'code': instance.genre,
            'name': instance.get_genre_display()
        }
Then, export the data:
exporter = BookExporter()
exporter.export()
Available Properties
- model: Django model to export (required). Default: None
- fields: List of field names to export (required). Default: []
- batch: Number of records to process in each batch. Default: 1000
- table_name: Full BigQuery table name (required). Default: ''
- replace_nulls_with_empty: Whether to replace None values with empty strings. Default: False
- include_pull_date: Whether to include the pull date in the export. Default: False
- pull_date_field_name: Name of the field that stores the export timestamp. Default: 'pull_date'
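To make the effect of replace_nulls_with_empty, include_pull_date, and pull_date_field_name concrete, the sketch below builds a single row from a dictionary. `build_row` is a hypothetical helper written for illustration; how the exporter applies these options internally is an assumption, not something this README documents.

```python
import datetime


def build_row(record, fields, replace_nulls_with_empty=False,
              include_pull_date=False, pull_date_field_name="pull_date",
              pull_date=None):
    """Hypothetical helper: assemble one export row from a dict record."""
    row = {}
    for name in fields:
        value = record.get(name)
        # Optionally swap None for an empty string.
        if value is None and replace_nulls_with_empty:
            value = ""
        row[name] = value
    # Add the export timestamp only when the option is on and a date is given.
    if include_pull_date and pull_date is not None:
        row[pull_date_field_name] = pull_date.isoformat()
    return row


record = {"id": 1, "title": "Dune", "genre": None}
stamp = datetime.datetime(2024, 5, 1, 12, 0, 0)

plain = build_row(record, ["id", "title", "genre"])
stamped = build_row(
    record,
    ["id", "title", "genre"],
    replace_nulls_with_empty=True,
    include_pull_date=True,
    pull_date_field_name="export_date",
    pull_date=stamp,
)
```

With the defaults, `genre` stays None and no timestamp column is added; with both options enabled, `genre` becomes an empty string and the row gains an `export_date` value.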
Available Methods
define_queryset()
Define the queryset to export. Override this method to filter or order your data:
import datetime

def define_queryset(self):
    # Only export books published in the last year
    one_year_ago = datetime.date.today() - datetime.timedelta(days=365)
    return self.model.objects.filter(publication_date__gte=one_year_ago).order_by('id')
export(pull_date=None, queryset=None)
Export data to BigQuery.
pull_date: Optional timestamp to record when the data was exported (only included if include_pull_date=True)
queryset: Optional queryset to override the default. Useful for backfilling specific data.
from datetime import datetime

# Standard export; export() returns a list of errors for rows that failed
exporter = BookExporter()
errors = exporter.export()
if errors:
    print(f"Encountered {len(errors)} errors during export")

# Export with specified pull_date
exporter.export(pull_date=datetime.now())

# Backfilling specific data
historical_queryset = Book.objects.filter(
    publication_date__year=2020
).order_by('id')
exporter.export(queryset=historical_queryset)
table_has_data(pull_date=None)
Check whether the BigQuery table has data. When pull_date is provided and include_pull_date is True, it checks for rows with that specific pull date. Otherwise, it checks whether the table has any data at all.
import datetime

exporter = BookExporter()

# Check with explicit pull date (only works if include_pull_date=True)
pull_date = datetime.datetime.now()
if not exporter.table_has_data(pull_date):
    exporter.export(pull_date=pull_date)
else:
    print("Data already exported for today")

# Check for any data
if not exporter.table_has_data():
    exporter.export()
else:
    print("Table already has data")
Dependency Injection
Django BigQuery Exporter supports injection of the BigQuery client for better testability and flexibility:
# Injecting a custom BigQuery client
from google.cloud import bigquery

custom_client = bigquery.Client(project='my-project')
exporter = BookExporter(
    client=custom_client
)
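For tests, an injected client can be a small test double instead of a real `bigquery.Client`. The sketch below assumes the code under test sends rows through `insert_rows_json`, a method that exists on the google-cloud-bigquery client; whether the exporter uses that method is an assumption. `FakeBigQueryClient` and `send_batch` are hypothetical names.

```python
class FakeBigQueryClient:
    """Test double that records rows instead of sending them to BigQuery."""

    def __init__(self, failing_indexes=()):
        self.failing_indexes = set(failing_indexes)
        self.calls = []

    def insert_rows_json(self, table, json_rows):
        # Mirrors the return shape of the real method: an empty list means
        # every row was accepted, otherwise one entry per rejected row.
        self.calls.append((table, list(json_rows)))
        return [
            {"index": i, "errors": [{"reason": "invalid", "message": "rejected by fake"}]}
            for i in range(len(json_rows))
            if i in self.failing_indexes
        ]


def send_batch(client, table, rows):
    """Stand-in for exporter code that depends only on the injected client."""
    return client.insert_rows_json(table, rows)


client = FakeBigQueryClient(failing_indexes=[1])
result = send_batch(client, "my_project.bookstore.books", [{"id": 1}, {"id": 2}])
```

Because the fake records every call, a test can assert on exactly which rows would have been sent without touching the network.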
Custom Fields
Use the @custom_field decorator to create methods that transform data during export:
@custom_field
def full_name(self, instance):
    return f"{instance.first_name} {instance.last_name}"

@custom_field
def category_details(self, instance):
    # Return complex nested data
    return {
        'id': instance.category_id,
        'name': instance.category.name,
        'parent': instance.category.parent.name if instance.category.parent else None
    }
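One way a decorator like this can work is to tag the method and let the exporter prefer tagged methods over plain attribute lookup. The sketch below illustrates that idea only; it redefines a stand-in `custom_field` and a hypothetical `SketchExporter` class, and makes no claim about how the library's own decorator is implemented.

```python
from types import SimpleNamespace


def custom_field(method):
    """Mark a method as the source of an exported field (illustrative only)."""
    method.is_custom_field = True
    return method


class SketchExporter:
    fields = []

    def row_for(self, instance):
        row = {}
        for name in self.fields:
            candidate = getattr(type(self), name, None)
            if getattr(candidate, "is_custom_field", False):
                # A tagged method computes the value from the instance.
                row[name] = candidate(self, instance)
            else:
                # Otherwise read the attribute straight off the instance.
                row[name] = getattr(instance, name)
        return row


class PersonExporter(SketchExporter):
    fields = ["id", "full_name"]

    @custom_field
    def full_name(self, instance):
        return f"{instance.first_name} {instance.last_name}"


person = SimpleNamespace(id=7, first_name="Ada", last_name="Lovelace")
row = PersonExporter().row_for(person)
```

The design keeps field names in one list while allowing any of them to be overridden by a method of the same name, which matches how `genre` is overridden in the examples in this README.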
Complete Example
Here’s a complete example with a Book model:
import datetime

from django.db.models import Avg

from bigquery_exporter.base import BigQueryExporter, custom_field
from myapp.models import Book

class BookExporter(BigQueryExporter):
    model = Book
    batch = 1000
    table_name = 'my_project.bookstore.books'
    fields = [
        'id', 'title', 'author', 'publication_date', 'is_bestseller',
        'genre', 'page_count', 'created_at', 'updated_at', 'rating'
    ]
    # Pull date configuration
    include_pull_date = True  # Include pull date in the export
    pull_date_field_name = 'export_date'  # Custom field name

    def define_queryset(self):
        # Only export books updated in the last 30 days
        thirty_days_ago = datetime.date.today() - datetime.timedelta(days=30)
        return Book.objects.filter(updated_at__gte=thirty_days_ago).order_by('id')

    @custom_field
    def genre(self, instance):
        """Return both the code and display name for the genre"""
        GENRES = {
            'SFF': 'Science Fiction & Fantasy',
            'MYS': 'Mystery',
            'ROM': 'Romance',
            # ... other genres
        }
        return {
            'code': instance.genre,
            'name': GENRES.get(instance.genre, 'Unknown')
        }

    @custom_field
    def rating(self, instance):
        """Calculate and return the average rating"""
        avg_rating = instance.reviews.aggregate(avg=Avg('rating'))['avg'] or 0
        return round(avg_rating, 1)

# In a task or management command
def export_books_to_bigquery(force_export=False):
    pull_date = datetime.datetime.now()
    exporter = BookExporter(
        project='my-gcp-project',
        credentials='/path/to/credentials.json'
    )
    # Check if data already exists for today (skip the check with force_export=True)
    if exporter.table_has_data(pull_date) and not force_export:
        print(f"Data already exists for {pull_date.date()}, skipping export")
        return
    # Perform the export
    errors = exporter.export(pull_date=pull_date)
    if errors:
        print(f"Export completed with {len(errors)} errors")
    else:
        print("Successfully exported books to BigQuery")
Error Handling
The export() method returns a list of error objects for any failed row insertions. Each error includes:
The row index
The error message
The affected data
You can use this information to log errors or retry specific records.
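As a minimal sketch of acting on that list, the snippet below counts errors by message and collects the affected rows for a later retry. The exact shape of the error objects is not specified here, so the dictionary keys `index`, `message`, and `data` are assumptions, and `summarize_errors` is a hypothetical helper.

```python
from collections import Counter


def summarize_errors(errors):
    """Count errors per message and collect the rows that could be retried."""
    counts = Counter(error["message"] for error in errors)
    retry_rows = [error["data"] for error in errors]
    return counts, retry_rows


# Hypothetical error objects; the key names are assumptions.
sample_errors = [
    {"index": 3, "message": "Invalid date", "data": {"id": 4}},
    {"index": 9, "message": "Invalid date", "data": {"id": 10}},
    {"index": 12, "message": "Missing required field", "data": {"id": 13}},
]

counts, retry_rows = summarize_errors(sample_errors)
for message, count in counts.most_common():
    print(f"{count} row(s) failed: {message}")
```

Grouping by message first tends to show whether failures share one cause, such as a schema mismatch, before any rows are retried.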
Best Practices
Always define an ordering in define_queryset() when using batching: without a stable order, successive batches can skip or repeat rows
Set appropriate batch sizes based on your model’s complexity
Use custom fields to preprocess data before export
Implement idempotency checks with table_has_data()
Use the queryset parameter for backfilling historical data rather than modifying your exporter class
Consider using dependency injection for the BigQuery client for better testability
Catch and handle GoogleAPICallError and BigQueryExporterError exceptions
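One possible way to handle such exceptions at the call site is a small retry wrapper with exponential backoff, sketched below. It is separate from whatever retry behaviour the library provides internally. `export_with_retry` is a hypothetical helper, and `TransientError` is a stand-in so the example runs without Google libraries; in real use the `retry_on` argument could be `google.api_core.exceptions.GoogleAPICallError`.

```python
import time


def export_with_retry(export, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call export(), retrying on the given exception types with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return export()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error.
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


class TransientError(Exception):
    """Stand-in for an API error such as GoogleAPICallError."""


# Simulate an export that fails twice and then succeeds with no row errors.
outcomes = [TransientError("timeout"), TransientError("timeout"), []]


def flaky_export():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays = []  # Record the delays instead of actually sleeping.
errors = export_with_retry(flaky_export, retry_on=TransientError, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the wrapper easy to test; production code would leave the default `time.sleep` and pass something like `exporter.export` as the callable.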
License
This project is licensed under the MIT License - see the LICENSE file for details.