Skip to main content

Python Table Functions for DuckDB

Project description

DuckTables: Python Functions for DuckDB

DuckTables is Python library that provides out of the box functions for the DuckDB PyTables extension.

Installation

pip install ducktables

Authentication

DuckTables interacts with a number of different services, and each has their own authentication and configuration mechanism. Note that you only have to configure the services that you plan on using.

AWS

Follow the auth instructions for Boto3, the library that ducktables uses to interface with AWS apis. Note that if you're already using boto3 or the AWS CLI, your existing authentication configuration will be utilized, and you don't need to do anything.

Github

When starting your DuckDB client (such as the DuckDB cli), set an environment variable named GITHUB_ACCESS_TOKEN. From the command line, this will look like:

GITHUB_ACCESS_TOKEN=<token> duckdb -unsigned

To obtain an API token, log into github.com, and create one here.

OpenAI

To use OpenAI functions you will need a valid API key and organization id. If you're logged in, you can create an API key here, and your organization id can be found here.

To specify these values, you'll need to set two environment variables when running DuckDB: OPENAI_ORG_ID and OPENAI_API_KEY. This may look something like:

OPENAI_ORG_ID=<orid> OPENAI_API_KEY=<apikey> duckdb -unsigned

Google

Google related functions assume you have a json applications credentials file on disk where you're running DuckDB. The user or service account assoicated with the file will need to have the necessary permissions to interact with services associated with any functions you may use.

To specify the path to the credentials file, set an evironment variable named GOOGLE_APPLICATION_CREDENTIALS when running DuckDB. That may look something like:

GOOGLE_APPLICATION_CREDENTIALS=~/.gcloud/some-file.json duckdb -unsigned

Example Queries

These queries assume you've already installed the PyTables extension in a DuckDB session. See the extensions installation guide if you have not.

AWS

Note that all queries will assume to run in the region you have specified in your configuration file. Set the AWS_DEFAULT_REGION to override this behavior.

EC2 Instances

Returns the results of a DescribeInstances API call.

    SELECT * FROM pytable('aws:ec2_instances',
      columns = {
        'instance_id': 'VARCHAR',
        'name': 'VARCHAR',
        'instance_type': 'VARCHAR',
        'state': 'VARCHAR',
        'key_pair': 'VARCHAR',
        'platform': 'VARCHAR',
        'architecture': 'VARCHAR',
        'vpc_id': 'VARCHAR',
        'subnet_id': 'VARCHAR',
        'public_dns': 'VARCHAR',
        'public_ip': 'VARCHAR',
        'private_dns': 'VARCHAR',
        'private_ip': 'VARCHAR'
        });

S3 Buckets

Returns the results of a ListBuckets API call.

SELECT * FROM pytable('aws:s3_buckets', 
  columns={'name': 'VARCHAR', 'creation_date': 'VARCHAR'}
);

S3 Objects

This only includes metadata about the objects themselves, not their contents. See the httpfs DuckDB extension if you're interested in loading data from objects on S3.

Note that the second argument, the prefix path, is optional and can be omitted.

SELECT * FROM pytable('aws:s3_objects', 'bucket-name', 'foo/bar/prefix',
  columns = { 'key': 'VARCHAR', 'last_modified': 'VARCHAR', 'size': 'INT', 'storage_class': 'VARCHAR'}
);

Github

Enumerates all repositories for the named user or organization.

SELECT * FROM pytable('ghub:repos_for', 'duckdb',
  columns = {'repo': 'VARCHAR', 'description': 'VARCHAR', 'language': 'VARCHAR'}
);

Open AI

ChatGPT

Given a prompt and a number of desired responses, will generate a table with the model's response.

SELECT * FROM pytable('chatgpt:prompt', 'Write a limerick poem about how much you love SQL', 2,
  columns = {'index': 'INT','message': 'VARCHAR', 'message_role': 'VARCHAR', 'finish_reason': 'VARCHAR'},
);

Google

Sheets

Pulls data from a Google Sheet. Note that you'll either need user account authorization, or if you use a service account, the account will need read permissions on the sheet in question if it is private.

This example query pulls from a Google provided example spreadsheet. You can see the sheet here. Note that despite this sheet being public, you will still need Google Cloud authorization configured on your local machine.

SELECT *
FROM pytable('google:sheet', '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms', 'Class Data!A2:F31',
  columns = {
    'name': 'VARCHAR', 'gender': 'VARCHAR', 'class_level': 'VARCHAR',
    'state': 'VARCHAR', 'major': 'VARCHAR', 'extracurricular': 'VARCHAR'}
);

Analytics

This query has three required parameters. The ViewID you wish to query, and the start/end date range for your query. See this post for an explanation of finding your view ID. Start and end dates should be strings in a 'YYYY-MM-DD' format. Note you can alos override the set of Dimensions and Metrics, but this requires understanding how to adjust the columns argument to account for the change in schema that will occur as a result.

    SELECT * FROM pytable('ducktables.google.analytics:metrics', '<view-id>', '<start-date>', '<end-date>',
      columns = {
        -- Dimensions
        'date': 'VARCHAR', 'hour': 'VARCHAR', 'pagePath': 'VARCHAR', 'source': 'VARCHAR', 'medium': 'VARCHAR',
        -- Metrics
        'sessions': 'INT', 'users': 'INT', 'pageviews': 'INT', 'avgSessionDuration': 'FLOAT', 'bounceRate': 'FLOAT'
        });

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ducktables-0.1.3.tar.gz (9.8 kB view details)

Uploaded Source

Built Distribution

ducktables-0.1.3-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

File details

Details for the file ducktables-0.1.3.tar.gz.

File metadata

  • Download URL: ducktables-0.1.3.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.5

File hashes

Hashes for ducktables-0.1.3.tar.gz
Algorithm Hash digest
SHA256 291912e4e0c542504e5f0d83876900e44e86628c09be973abc1c3d674268297c
MD5 a8893bd64cfb328dc32e83c6b95ce894
BLAKE2b-256 d5142128e25c60cc33a12ddb8a22ba838bbfa0df8af22a6ca82363dd0dbdc421

See more details on using hashes here.

File details

Details for the file ducktables-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ducktables-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.5

File hashes

Hashes for ducktables-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 23e8eb8d52f90e871eb05a8fd238881ba309b89cde7d181536e084bf09127db4
MD5 ae1d7674a816ec6a2803865025a5cace
BLAKE2b-256 bf705fdc721d7e1b38bd4642a81af777ea3af87b95257a901839df4b7dc6305d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page