
Convert database tables to parquet files.

Project description

Library to convert PostgreSQL data to parquet files

This package was created to convert PostgreSQL data to parquet format. It has four major functions: three that export data (a WRDS PostgreSQL table, any PostgreSQL table, or a whole PostgreSQL schema) and an "update" function that fetches new data only if necessary.

  • wrds_pg_to_pq(): Exports a WRDS PostgreSQL table to a parquet file.
  • db_to_pq(): Exports a PostgreSQL table to a parquet file.
  • db_schema_to_pq(): Exports a PostgreSQL schema to parquet files.
  • wrds_update_pq(): A variant of wrds_pg_to_pq() that compares the "last modified" value of the relevant SAS file on WRDS with that of the local parquet file, and gets new data from the WRDS PostgreSQL server only when needed.
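The decision that wrds_update_pq() makes can be sketched as a simple comparison. This is a minimal illustration under stated assumptions, not db2pq's code: the hypothetical function needs_update() takes the remote "last modified" timestamp and the one recorded for the local parquet file (None if no local file exists), and it assumes an update is needed whenever the remote value is newer. How db2pq actually reads and stores these values is not shown.

```python
from datetime import datetime
from typing import Optional


def needs_update(remote_modified: datetime,
                 local_modified: Optional[datetime]) -> bool:
    """Hypothetical helper: decide whether to fetch fresh data from WRDS.

    remote_modified stands in for the "last modified" value of the SAS file
    on WRDS; local_modified stands in for the value recorded with the local
    parquet file, or None if there is no local file yet.
    """
    if local_modified is None:
        # No local parquet file, so the data must be fetched.
        return True
    # Assumption: fetch only when the remote file is newer than the local copy.
    return remote_modified > local_modified


if __name__ == "__main__":
    remote = datetime(2024, 6, 1, 12, 0)
    print(needs_update(remote, None))
    print(needs_update(remote, datetime(2024, 5, 1)))
    print(needs_update(remote, datetime(2024, 6, 1, 12, 0)))
```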

Requirements

1. Python

The software uses Python 3 and depends on Ibis, pyarrow (the Python API for the Apache Arrow libraries), and Paramiko. pip installs these dependencies automatically:

pip install db2pq --upgrade
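To confirm that the dependencies landed in the active environment, a short script can query installed versions with the standard library's importlib.metadata. This is a sketch: the names are the PyPI distribution names (Ibis is published as ibis-framework), and installed_versions() is a hypothetical helper, not part of db2pq.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# PyPI distribution names of the dependencies listed above.
DEPENDENCIES = ("ibis-framework", "pyarrow", "paramiko")


def installed_versions(names: Iterable[str] = DEPENDENCIES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```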

2. A WRDS ID

To use public-key authentication to access WRDS, set up a public key (following hints taken from here) and copy that key to the WRDS server from the terminal on your computer. (Note that this code assumes you have a directory .ssh in your home directory on WRDS. If not, log into WRDS via SSH, then type mkdir ~/.ssh to create it.) Here's code to create the key and send it to WRDS:

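# Create an RSA key pair (accept the default location ~/.ssh/id_rsa)
ssh-keygen -t rsa
# Append the public key to ~/.ssh/authorized_keys on WRDS (WRDS_ID must be set, or type your ID in its place)
cat ~/.ssh/id_rsa.pub | ssh $WRDS_ID@wrds-cloud-sshkey.wharton.upenn.edu "cat >> ~/.ssh/authorized_keys"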

Use an empty passphrase in setting up the key so that the scripts can run without user intervention.
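Because the scripts must log in without user intervention, it can help to test the key non-interactively. The sketch below builds an ssh command with the OpenSSH option BatchMode=yes, which disables password and passphrase prompts, so the command succeeds only if key-based login works on its own. The helper names are hypothetical, and the sketch assumes WRDS permits running a trivial remote command (true) over SSH. BatchMode also suppresses host-key confirmation, so connect interactively once first to accept the server's host key.

```python
import os
import shlex
import subprocess
from typing import List

WRDS_HOST = "wrds-cloud-sshkey.wharton.upenn.edu"


def key_auth_check_command(wrds_id: str, host: str = WRDS_HOST) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    # "true" is a no-op remote command; success means the login itself worked.
    return ["ssh", "-o", "BatchMode=yes", f"{wrds_id}@{host}", "true"]


def key_auth_works(wrds_id: str) -> bool:
    """Run the check; True if ssh exits with status 0."""
    result = subprocess.run(key_auth_check_command(wrds_id), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    cmd = key_auth_check_command(os.environ.get("WRDS_ID", "your_wrds_id"))
    print(shlex.join(cmd))
```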

3. Environment variables

Environment variables that the code uses include:

  • WRDS_ID: Your WRDS ID.
  • DATA_DIR: The local repository for parquet files.

One can set these environment variables in (say) ~/.zprofile:

export WRDS_ID="iangow"
export DATA_DIR="$HOME/Dropbox/pq_data"

As an alternative to setting these environment variables, you can pass their values to the functions above as the arguments wrds_id and data_dir, respectively.
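The fallback from arguments to environment variables might look like the following. resolve_settings() is a hypothetical helper written for illustration, not db2pq's API: it prefers explicit arguments, otherwise reads WRDS_ID and DATA_DIR, and expands a leading ~ in case the shell left it unexpanded (the shell does not expand ~ inside double quotes).

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def resolve_settings(wrds_id: Optional[str] = None,
                     data_dir: Optional[str] = None) -> Tuple[str, Path]:
    """Hypothetical helper: explicit arguments win, else environment variables."""
    wrds_id = wrds_id or os.environ.get("WRDS_ID")
    data_dir = data_dir or os.environ.get("DATA_DIR")
    if not wrds_id:
        raise ValueError("Set WRDS_ID or pass wrds_id")
    if not data_dir:
        raise ValueError("Set DATA_DIR or pass data_dir")
    # expanduser() turns a literal leading "~" into the home directory.
    return wrds_id, Path(data_dir).expanduser()
```

For example, resolve_settings("iangow", "~/Dropbox/pq_data") returns the ID unchanged and a path under the user's home directory, whatever the environment contains.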

Report bugs

Author: Ian Gow, iandgow@gmail.com

Download files

Source Distribution

db2pq-0.0.8.tar.gz (8.4 kB)

Built Distribution

db2pq-0.0.8-py3-none-any.whl (7.6 kB)

File details

Details for the file db2pq-0.0.8.tar.gz.

File metadata

  • Download URL: db2pq-0.0.8.tar.gz
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for db2pq-0.0.8.tar.gz
Algorithm Hash digest
SHA256 92fef593d44e64dfd085e995f9870e16cdf4495fa0959d6b1756bc1d60d45519
MD5 e10946706e87127e1681e7ce8563c624
BLAKE2b-256 4a975b76f5b4a4a5cc058725e9fecc7d7fc764b8ea0d5894fd968bf07389621d

File details

Details for the file db2pq-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: db2pq-0.0.8-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for db2pq-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 668522d6c3524fc1117b8e19033f8818b211ee92332ab25501764b081a558609
MD5 9a37f615e003235f65b4359d840b23b2
BLAKE2b-256 292b959f26d3846f4352c5ea573f40c365d877bb01833cd9de9ac63c3e7e1dab
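The published digests can be used to check a downloaded file. A minimal sketch with Python's hashlib: sha256_of() streams the file in chunks so it need not fit in memory, and verify() compares the result with a published SHA256 value. The function names are illustrative, not part of any tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("db2pq-0.0.8-py3-none-any.whl")
    expected = "668522d6c3524fc1117b8e19033f8818b211ee92332ab25501764b081a558609"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can also enforce hashes itself: in hash-checking mode (pip install --require-hashes -r requirements.txt), each requirement line carries a --hash=sha256:<digest> option.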