Convert database tables to parquet tables.

Library to convert PostgreSQL data to parquet files

This package converts PostgreSQL data to parquet format. It has four major functions: three that export data to parquet, plus an "update" function that fetches new data only if necessary.

  • wrds_pg_to_pq(): Exports a WRDS PostgreSQL table to a parquet file.
  • db_to_pq(): Exports a PostgreSQL table to a parquet file.
  • db_schema_to_pq(): Exports a PostgreSQL schema to parquet files.
  • wrds_update_pq(): A variant on wrds_pg_to_pq() that compares the "last modified" value for the relevant SAS file with that of the local parquet file before getting new data from the WRDS PostgreSQL server.
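
The "update only if necessary" idea behind wrds_update_pq() can be sketched as a comparison of two timestamps. This is a minimal illustration, not the package's implementation: the helper name needs_update and the use of datetime objects are assumptions, and how db2pq actually obtains and stores the "last modified" values is not described here.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_update(remote_modified: datetime,
                 local_modified: Optional[datetime]) -> bool:
    """Hypothetical helper: decide whether the local parquet file is stale."""
    # No local timestamp means there is no usable local copy, so fetch the data.
    if local_modified is None:
        return True
    # Refresh only when the remote source is strictly newer than the local copy.
    return remote_modified > local_modified


# Illustrative timestamps (made up for this example).
remote = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
stale = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)

print(needs_update(remote, None))    # no local file yet
print(needs_update(remote, stale))   # local copy is older than the remote source
print(needs_update(remote, remote))  # local copy is already current
```

Skipping the download when the timestamps match is what makes it cheap to rerun an update script regularly.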

Requirements

1. Python

The software uses Python 3 and depends on Ibis, pyarrow (the Python API for the Apache Arrow libraries), and Paramiko. These dependencies are installed automatically when you install the package with pip:

pip install db2pq --upgrade

2. A WRDS ID

To access WRDS using public-key authentication, set up a public key and copy it to the WRDS server from the terminal on your computer. (This code assumes you have a .ssh directory in your home directory on WRDS. If not, log into WRDS via SSH and run mkdir ~/.ssh to create it.) The following commands create the key and send it to WRDS:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh $WRDS_ID@wrds-cloud-sshkey.wharton.upenn.edu "cat >> ~/.ssh/authorized_keys"

Use an empty passphrase in setting up the key so that the scripts can run without user intervention.

3. Environment variables

Environment variables that the code uses include:

  • WRDS_ID: Your WRDS ID.
  • DATA_DIR: The local repository for parquet files.

One can set these environment variables in (say) ~/.zprofile:

export WRDS_ID="iangow"
export DATA_DIR="~/Dropbox/pq_data"

Instead of setting these environment variables, you can pass the values directly to the functions above using the arguments wrds_id and data_dir, respectively.
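
The precedence between arguments and environment variables can be sketched as follows. This is an illustration under stated assumptions, not db2pq's code: the helpers resolve_setting and resolve_data_dir are hypothetical, and whether the package expands "~" itself is an assumption. The expansion is shown because a shell does not expand "~" inside double quotes, so a value such as "~/Dropbox/pq_data" reaches Python unexpanded.

```python
import os
from typing import Optional


def resolve_setting(value: Optional[str], env_var: str) -> str:
    """Hypothetical helper: prefer the explicit argument, else the environment."""
    if value is not None:
        return value
    try:
        return os.environ[env_var]
    except KeyError:
        raise ValueError(
            f"Pass a value or set the {env_var} environment variable"
        ) from None


def resolve_data_dir(data_dir: Optional[str] = None) -> str:
    """Hypothetical helper: resolve DATA_DIR and expand a leading '~'."""
    return os.path.expanduser(resolve_setting(data_dir, "DATA_DIR"))


# Simulate the exports from ~/.zprofile for this example.
os.environ["WRDS_ID"] = "iangow"
os.environ["DATA_DIR"] = "~/Dropbox/pq_data"

print(resolve_setting(None, "WRDS_ID"))        # falls back to the environment
print(resolve_setting("other_id", "WRDS_ID"))  # explicit argument wins
print(resolve_data_dir())                      # DATA_DIR with "~" expanded
```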

Report bugs

Author: Ian Gow, iandgow@gmail.com
