A Python interface for managing Parquet tables stored on Google Drive.

Authentication Setup

  1. Go to https://console.cloud.google.com/
  2. Create a new project
  3. In the left sidebar, open APIs & Services -> Library
  4. Search for and enable the Google Drive API
  5. APIs & Services -> Credentials -> + Create credentials -> OAuth client ID -> Configure consent screen
    • External user type
  6. Go back to ... -> OAuth client ID
  7. Select Desktop App
  8. Download the credentials JSON file
  9. Go to https://console.cloud.google.com/apis/credentials/consent
  10. Add your email address under Test users

Config dictionary:

config = {
    # Replace with the string after the final '/' in the Drive folder URL
    "folder_id": "<your-folder-id>",
    # Path to the credentials JSON file downloaded in step 8
    "client_config_file": "client_secrets.json",
}
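
As a rough sketch of how the folder ID could be pulled out of a Drive URL, the helper below takes the last path segment and ignores any query string. The function name `folder_id_from_url` and the example URL are hypothetical and not part of this package.

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the last path segment of a Drive folder URL.

    Query strings such as ?usp=sharing are ignored because only the
    URL path is inspected.
    """
    path = urlparse(url).path.rstrip("/")
    folder_id = path.rsplit("/", 1)[-1]
    if not folder_id:
        raise ValueError(f"no folder ID found in {url!r}")
    return folder_id


# Hypothetical URL; the ID below is made up for illustration.
url = "https://drive.google.com/drive/folders/1AbCdEfGhIjKlMnOp?usp=sharing"

config = {
    "folder_id": folder_id_from_url(url),
    "client_config_file": "client_secrets.json",
}
print(config["folder_id"])
```

Copying the ID by hand works just as well; the helper only guards against pasting the whole URL or a trailing query string into the config.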

DataGateway Class Documentation

The DataGateway class provides a simple interface to manage tabular data stored as Parquet files on Google Drive. Each table is organized inside its own folder within a specified root folder.


Initialization

DataGateway(config: dict)

  • config (dict): Configuration dictionary containing:
    • folder_id (str): Google Drive folder ID where tables are stored.
    • client_config_file (str): Path to the OAuth client secrets JSON file.
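
Before constructing the class, it can help to sanity-check the dictionary. The sketch below is not part of the package: `check_config` is a hypothetical helper that verifies both keys are present and that the secrets file looks like a Desktop App credential, which Google wraps in a top-level "installed" object.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("folder_id", "client_config_file")


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in the config; empty means OK."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if problems:
        return problems

    secrets_path = Path(config["client_config_file"])
    if not secrets_path.is_file():
        return [f"client secrets file not found: {secrets_path}"]

    try:
        secrets = json.loads(secrets_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"client secrets file is not valid JSON: {secrets_path}"]

    # Desktop App credentials downloaded from the Cloud console are
    # wrapped in a top-level "installed" object.
    if not isinstance(secrets, dict) or "installed" not in secrets:
        problems.append('client secrets JSON has no "installed" section')
    return problems
```

If the returned list is empty, the same dictionary can be passed to DataGateway(config) as described above.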

Methods

put(table_name: str, df: pandas.DataFrame, overwrite: bool = False)

Uploads the given DataFrame as a Parquet file inside the folder for the specified table. It also uploads a metadata Parquet file containing the output of the DataFrame’s .info() method.

  • table_name (str): Name of the table.
  • df (pandas.DataFrame): DataFrame to upload.
  • overwrite (bool, optional): Whether to overwrite existing files. Default is False.

Raises FileExistsError if the table exists and overwrite is False.
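
The overwrite behaviour can be illustrated without touching Google Drive. In the sketch below, `FakeGateway` is an in-memory stand-in written only for this example (it is not the real class), plain lists stand in for DataFrames, and `put_if_absent` is a hypothetical helper built on the documented FileExistsError.

```python
class FakeGateway:
    """In-memory stand-in that mimics the documented put() behaviour."""

    def __init__(self):
        self.tables = {}

    def put(self, table_name, df, overwrite=False):
        if table_name in self.tables and not overwrite:
            raise FileExistsError(table_name)
        self.tables[table_name] = df


def put_if_absent(gateway, table_name, df) -> bool:
    """Upload df only when the table does not exist yet.

    Returns True if the table was created, False if it was left untouched.
    """
    try:
        gateway.put(table_name, df)  # overwrite defaults to False
    except FileExistsError:
        return False
    return True


gw = FakeGateway()
first = put_if_absent(gw, "prices", [1, 2, 3])   # table is new
second = put_if_absent(gw, "prices", [9, 9, 9])  # table already exists
gw.put("prices", [4, 5, 6], overwrite=True)      # explicit replacement
print(first, second, gw.tables["prices"])
```

With a real DataGateway instance, the same helper should apply as long as put() raises FileExistsError as documented.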

get(table_name: str) -> pandas.DataFrame

Downloads and returns the DataFrame for the specified table.

  • table_name (str): Name of the table.

Raises FileNotFoundError if the table or data file does not exist.

meta(table_name: str) -> str

Retrieves the metadata string (from the DataFrame’s .info()) for the specified table.

  • table_name (str): Name of the table.

Raises FileNotFoundError if the metadata file does not exist.

list() -> list[str]

Returns a list of all table folder names inside the root folder.
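
Combining list() and get() gives a simple way to pull every table at once. The sketch below assumes only the documented method names and the FileNotFoundError raised by get(); `FakeGateway` and `load_all` are hypothetical names used for illustration, and integers stand in for DataFrames.

```python
class FakeGateway:
    """In-memory stand-in for the documented list() and get() methods."""

    def __init__(self, tables, broken=()):
        self._tables = dict(tables)
        self._broken = set(broken)  # folders whose data file is missing

    def list(self):
        return sorted(self._tables)

    def get(self, table_name):
        if table_name in self._broken or table_name not in self._tables:
            raise FileNotFoundError(table_name)
        return self._tables[table_name]


def load_all(gateway):
    """Fetch every listed table; collect names whose data file is missing."""
    loaded, missing = {}, []
    for name in gateway.list():
        try:
            loaded[name] = gateway.get(name)
        except FileNotFoundError:
            missing.append(name)
    return loaded, missing


gw = FakeGateway({"prices": 1, "trades": 2, "empty": 3}, broken=["empty"])
loaded, missing = load_all(gw)
print(sorted(loaded), missing)
```

Catching FileNotFoundError per table keeps one incomplete folder from aborting the whole load.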

delete(table_name: str)

Deletes the entire folder for the specified table, including all data and metadata files. Prompts the user for confirmation before deleting.

  • table_name (str): Name of the table to delete.

If the folder does not exist, prints a message and aborts the operation.
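
The confirm-then-delete flow described above can be sketched as follows. This is not the package's actual implementation: `delete_with_confirmation` is a hypothetical function that operates on a plain dictionary, and the prompt wording is an assumption. Passing the prompt function as a parameter makes the flow usable in scripts and tests where no one is at the keyboard.

```python
def delete_with_confirmation(tables: dict, table_name: str, ask=input) -> bool:
    """Delete a table only after an explicit 'y' or 'yes' answer.

    Returns True if the table was deleted, False otherwise.
    """
    if table_name not in tables:
        print(f"Table '{table_name}' does not exist; nothing deleted.")
        return False

    answer = ask(f"Delete '{table_name}' and all its files? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        print("Deletion cancelled.")
        return False

    del tables[table_name]
    return True


# Scripted answers replace the interactive prompt in this demo.
tables = {"prices": [1, 2, 3], "trades": [4, 5, 6]}
kept = delete_with_confirmation(tables, "prices", ask=lambda _: "n")
gone = delete_with_confirmation(tables, "trades", ask=lambda _: "yes")
print(kept, gone, sorted(tables))
```

Because the real delete() prompts interactively, calling it from an unattended job may block while waiting for input.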


Notes

  • Each table is stored within its own dedicated folder inside the root folder identified by folder_id.
  • Data and metadata files are stored in Parquet format for efficiency.
  • The metadata file contains the output of the DataFrame .info() method saved as Parquet.
  • Authentication uses OAuth with offline access; token refreshing is handled transparently.
