A Python interface for managing Parquet tables stored on Google Drive.
Authentication Setup
- Go to https://console.cloud.google.com/
- Create a new project
- In the left sidebar, go to APIs & Services -> Library
- Search for and enable the Google Drive API
- Go to APIs & Services -> Credentials -> + Create credentials -> OAuth client ID -> Configure consent screen
- External user type
- Go back to ... -> OAuth client ID
- Select Desktop App
- Download the credentials JSON file
- Go to https://console.cloud.google.com/apis/credentials/consent
- Add your email under test users
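Before wiring the downloaded file into the config, it can help to confirm that it is a Desktop App client rather than a web client. The sketch below relies on the fact that Google stores Desktop App credentials under a top-level "installed" key. The helper name is_desktop_client_file is hypothetical and is not part of DataGateway.

```python
import json
from pathlib import Path


def is_desktop_client_file(path):
    """Return True if the JSON file looks like a Desktop App OAuth client.

    Hypothetical helper for illustration; not part of DataGateway.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Desktop App clients are stored under "installed"; web clients use "web".
    section = data.get("installed")
    return isinstance(section, dict) and {"client_id", "client_secret"} <= section.keys()
```

If the function returns False, the most likely cause is that a different application type was selected when creating the OAuth client ID.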
Config dictionary:
config = {
    # ID of the root Drive folder: the characters after the final '/' in the folder's URL
    "folder_id": "<folder-id>",
    # Path to the downloaded OAuth client secrets JSON file
    "client_config_file": "client_secrets.json",
}
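The folder_id value can be copied by hand, or pulled out of the folder URL programmatically. The following is a minimal sketch using only the standard library. The function folder_id_from_url and the EXAMPLE_FOLDER_ID placeholder are illustrative assumptions, not part of DataGateway.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the last path segment of a Drive folder URL.

    Hypothetical helper for illustration; not part of DataGateway.
    """
    # urlparse separates the path from any query string such as "?usp=sharing".
    path = urlparse(url).path.rstrip("/")
    folder_id = path.rsplit("/", 1)[-1]
    if not folder_id:
        raise ValueError(f"no folder ID found in {url!r}")
    return folder_id


config = {
    # EXAMPLE_FOLDER_ID is a placeholder, not a real folder.
    "folder_id": folder_id_from_url(
        "https://drive.google.com/drive/folders/EXAMPLE_FOLDER_ID"
    ),
    "client_config_file": "client_secrets.json",
}
```

Parsing the URL rather than splitting the raw string keeps trailing slashes and sharing parameters out of the ID.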
DataGateway Class Documentation
The DataGateway class provides a simple interface to manage tabular data stored as Parquet files on Google Drive. Each table is organized inside its own folder within a specified root folder.
Initialization
DataGateway(config: dict)
- config (dict): Configuration dictionary containing:
  - folder_id (str): Google Drive folder ID where tables are stored.
  - client_config_file (str): Path to the OAuth client secrets JSON file.
Methods
put(table_name: str, df: pandas.DataFrame, overwrite: bool = False)
Uploads the given DataFrame as a Parquet file inside the folder for the specified table. Also uploads a metadata Parquet file containing the DataFrame’s info.
- table_name (str): Name of the table.
- df (pandas.DataFrame): DataFrame to upload.
- overwrite (bool, optional): Whether to overwrite existing files. Default is False.
Raises FileExistsError if the table exists and overwrite is False.
get(table_name: str) -> pandas.DataFrame
Downloads and returns the DataFrame for the specified table.
- table_name (str): Name of the table.
Raises FileNotFoundError if the table or data file does not exist.
meta(table_name: str) -> str
Retrieves the metadata string (from the DataFrame’s .info()) for the specified table.
- table_name (str): Name of the table.
Raises FileNotFoundError if the metadata file does not exist.
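For context, pandas prints the .info() summary to standard output by default, so it has to be redirected into a buffer to obtain it as a string. The sketch below shows one way to do that with the buf parameter of DataFrame.info. How DataGateway captures and stores this text internally is not shown in this document, so the helper info_text and the sample data are assumptions for illustration only.

```python
import io

import pandas as pd


def info_text(df):
    """Capture the text that df.info() would otherwise print.

    Hypothetical helper for illustration; not part of DataGateway.
    """
    buffer = io.StringIO()
    df.info(buf=buffer)  # write the summary into the buffer instead of stdout
    return buffer.getvalue()


# Made-up sample data, used only to produce a summary.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
text = info_text(df)
```

The resulting string lists the column names, non-null counts, and dtypes, which is the kind of summary meta() is described as returning.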
list() -> list[str]
Returns a list of all table folder names inside the root folder.
delete(table_name: str)
Deletes the entire folder for the specified table, including all data and metadata files. Prompts the user for confirmation before deleting.
- table_name (str): Name of the table to delete.
If the folder does not exist, prints a message and aborts the operation.
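The method contract above can be exercised without Drive access by using a small in-memory stand-in, which is handy for unit tests. The class InMemoryGateway below is hypothetical: it is not the library's implementation, it stores arbitrary Python objects instead of Parquet files, it omits meta(), it skips the confirmation prompt in delete(), and the sorted order returned by list() is a choice made here rather than documented behaviour.

```python
class InMemoryGateway:
    """Hypothetical stand-in that mimics the documented method contract."""

    def __init__(self):
        self._tables = {}

    def put(self, table_name, df, overwrite=False):
        # Mirror the documented error when the table exists and overwrite is False.
        if table_name in self._tables and not overwrite:
            raise FileExistsError(f"table {table_name!r} already exists")
        self._tables[table_name] = df

    def get(self, table_name):
        if table_name not in self._tables:
            raise FileNotFoundError(f"table {table_name!r} not found")
        return self._tables[table_name]

    def list(self):
        return sorted(self._tables)

    def delete(self, table_name):
        # The documented delete() asks for confirmation; this stand-in does not.
        if table_name not in self._tables:
            print(f"table {table_name!r} does not exist; nothing deleted")
            return
        del self._tables[table_name]


gateway = InMemoryGateway()
gateway.put("sales", [{"id": 1}])
try:
    gateway.put("sales", [{"id": 2}])  # rejected: table already exists
except FileExistsError:
    gateway.put("sales", [{"id": 2}], overwrite=True)
```

Code written against these method names and exceptions should behave the same way when the stand-in is swapped for a real DataGateway instance, provided the real class follows the behaviour documented above.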
Notes
- Each table is stored within its own dedicated folder inside the root folder identified by folder_id.
- Data and metadata files are stored in Parquet format for efficiency.
- The metadata file contains the output of the DataFrame .info() method saved as Parquet.
- Authentication uses OAuth with offline access and token refreshing handled transparently.