Work with BigQuery data as you do with lists in Python
Project description
Introduction
This project is a BigQuery helper that lets you query data from BigQuery without worrying about memory limitations. It makes working with BigQuery data as easy as working with lists in Python.
Installation
python3.8 -m pip install bq-iterate
Usage
from bq_iterate import BqQueryRowIterator, batchify_iterator

query = "select * from <project_id>.<dataset_id>.<table_id>"

row_iterator = BqQueryRowIterator(query=query, batch_size=2000000)  # choose a batch_size that fits into your memory
batches = batchify_iterator(row_iterator, batch_slice=50000)  # choose a batch_slice that fits into your memory

data = []
for batch in batches:
    # do your batch processing here
    data.append(len(batch))
print(sum(data))
What happens behind the scenes

bq_iterate provides two things:

- Two classes, BqQueryRowIterator and BqTableRowIterator. Each behaves like a regular iterator but holds only <batch_size> elements in memory at a time; when you access element <batch_size + 1>, the iterator fetches the next <batch_size> elements from BigQuery.
- A function, batchify_iterator, which takes an iterator and yields slices of it as lists of <batch_slice> elements. <batch_slice> can even be larger than <batch_size>, although common sense suggests it should be smaller; it doesn't matter, because batchify_iterator materializes only one list of <batch_slice> elements per batch it yields, and since it is a generator, each batch is freed from memory once consumed.
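To illustrate the slicing behaviour, here is a simplified, self-contained sketch of what a function like batchify_iterator does, using itertools.islice on a plain Python iterator (the function name batchify here is illustrative, not the library's actual implementation):

```python
from itertools import islice

def batchify(iterator, batch_slice):
    """Yield successive lists of at most batch_slice elements from iterator.

    Only one batch exists in memory at a time; once the consumer moves on
    to the next batch, the previous list can be garbage-collected.
    """
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, batch_slice))
        if not batch:
            return
        yield batch

# Slice a 10-element iterator into batches of at most 4 elements.
batches = list(batchify(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because the batches are produced lazily, the same pattern works whether the underlying iterator is a small range or a BigQuery row iterator streaming millions of rows.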