Simple, high-speed batch data reader & preprocessor for ML applications.
Faucet ML is a Python package that enables high-speed mini-batch data reading & preprocessing from BigQuery for machine learning model training.
Faucet ML is designed for cases where:
- Datasets are too large to fit into memory
- Model training requires mini-batches of data (SGD-based algorithms)
Features:
- High-speed batch data reading from BigQuery
- Automatic feature identification and preprocessing via PyTorch
- Integration with Feast feature store (coming soon)
pip install faucetml
More about Faucet
Many training datasets are too large to fit in memory, but model training would benefit from using all of the training data. Naively issuing one query per mini-batch of data is unnecessarily expensive due to round-trip network costs. Faucet is a library that solves these issues by:
- Fetching large "chunks" of data in non-blocking background threads
  - where chunks are much larger than mini-batches, but still fit in memory
- Caching chunks locally
- Returning mini-batches from cached chunks in O(1) time
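
The approach listed above can be sketched roughly as follows. This is not Faucet's actual implementation: `ChunkedBatchReader`, `fetch_chunk`, `num_chunks`, and `prefetch` are hypothetical names chosen for illustration, and `fetch_chunk` stands in for whatever query would pull one chunk from the data warehouse. A background thread fills a bounded queue with chunks while the caller slices mini-batches out of the chunk already in memory, so the cost of returning a batch does not depend on the size of the dataset.

```python
import queue
import threading
from typing import Callable, List, Optional


class ChunkedBatchReader:
    """Illustrative sketch only; not the Faucet ML API."""

    def __init__(
        self,
        fetch_chunk: Callable[[int], List],  # hypothetical: returns rows of chunk i
        num_chunks: int,
        batch_size: int,
        prefetch: int = 1,  # how many chunks may wait in memory
    ) -> None:
        self._fetch_chunk = fetch_chunk
        self._num_chunks = num_chunks
        self._batch_size = batch_size
        self._queue: queue.Queue = queue.Queue(maxsize=prefetch)
        self._current: List = []
        self._offset = 0
        self._done = False
        # Fetch chunks in the background so the training loop is not blocked.
        self._worker = threading.Thread(target=self._fill, daemon=True)
        self._worker.start()

    def _fill(self) -> None:
        for i in range(self._num_chunks):
            # put() blocks while the queue is full, which bounds memory use.
            self._queue.put(self._fetch_chunk(i))
        self._queue.put(None)  # sentinel: no more chunks

    def get_batch(self) -> Optional[List]:
        # Move to the next cached chunk once the current one is used up.
        while not self._done and self._offset >= len(self._current):
            next_chunk = self._queue.get()
            if next_chunk is None:
                self._done = True
            else:
                self._current, self._offset = next_chunk, 0
        if self._done:
            return None
        start = self._offset
        self._offset += self._batch_size
        # Slicing the in-memory chunk costs the same regardless of dataset size.
        return self._current[start:self._offset]
```

With a larger `prefetch`, more chunks are held in memory at once in exchange for fewer stalls when the network is slow; the right value depends on chunk size and available RAM.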
See the examples for detailed IPython notebooks showing how to use Faucet.

    # initialize the client
    fml = get_client(
        datastore="bigquery",
        credential_path="bq_creds.json",
        table_name="my_training_table",
        ds="2020-01-20",
        epochs=2,
        batch_size=1024,
        chunk_size=1024 * 10000,
        test_split_percent=20,
    )


    # train & test
    for epoch in range(2):

        # training loop
        fml.prep_for_epoch()
        batch = fml.get_batch()
        while batch is not None:
            train(batch)
            batch = fml.get_batch()

        # evaluation loop
        fml.prep_for_eval()
        batch = fml.get_batch(eval=True)
        while batch is not None:
            test(batch)
            batch = fml.get_batch(eval=True)

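
The fetch-until-`None` pattern in the loops above can be wrapped in a small generator if a `for` loop reads better. The helper below is a hypothetical convenience, not part of Faucet ML; it assumes only what the example above shows, namely that `get_batch()` (optionally with `eval=True`) returns `None` when no batches remain.

```python
from typing import Any, Iterator


def iter_batches(client: Any, evaluation: bool = False) -> Iterator[Any]:
    """Yield batches from `client` until its get_batch() returns None.

    Hypothetical helper: `client` is any object exposing get_batch() as
    used in the example above.
    """

    def fetch() -> Any:
        # Pass eval=True only for evaluation batches, as in the loops above.
        return client.get_batch(eval=True) if evaluation else client.get_batch()

    batch = fetch()
    # Compare with `is not None` rather than truthiness or ==, since a batch
    # may be a tensor or DataFrame for which those checks are ambiguous.
    while batch is not None:
        yield batch
        batch = fetch()
```

Under that assumption, the training loop would become `for batch in iter_batches(fml): train(batch)` and the evaluation loop `for batch in iter_batches(fml, evaluation=True): test(batch)`, with `prep_for_epoch()` and `prep_for_eval()` still called beforehand.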
- [ ] Support more data warehouses (Redshift, Hive, etc.)
- [ ] Support reading features & preprocessing specs from Feast
Suggestions for other features? Open an issue and let us know.