
Convert database tables to parquet tables.

Project description

Library to convert PostgreSQL data to parquet files

This package was created to convert PostgreSQL data to parquet format. It has four major functions: three export functions, plus an "update" function that fetches new data only if necessary.

  • wrds_pg_to_pq(): Exports a WRDS PostgreSQL table to a parquet file.
  • db_to_pq(): Exports a PostgreSQL table to a parquet file.
  • db_schema_to_pq(): Exports a PostgreSQL schema to parquet files.
  • wrds_update_pq(): A variant on wrds_pg_to_pq() that checks the "last modified" value for the relevant SAS file against that of the local parquet before getting new data from the WRDS PostgreSQL server.
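
The "update only if necessary" idea behind wrds_update_pq() can be sketched as a timestamp comparison. This is a minimal illustration, not the package's actual implementation: the helper name needs_update and its arguments are hypothetical, and it assumes both "last modified" values are already available as timezone-aware datetimes.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_update(remote_modified: datetime,
                 local_modified: Optional[datetime]) -> bool:
    """Return True when the local parquet file should be refreshed.

    Hypothetical helper: db2pq's real logic may differ.
    """
    # No local file yet (or no recorded timestamp): always fetch.
    if local_modified is None:
        return True
    # Fetch only when the remote data is strictly newer than the local copy.
    return remote_modified > local_modified


remote = datetime(2024, 10, 1, tzinfo=timezone.utc)
local = datetime(2024, 9, 1, tzinfo=timezone.utc)

print(needs_update(remote, local))   # True
print(needs_update(local, remote))   # False
print(needs_update(remote, None))    # True
```

Using a strict comparison means that a local file carrying the same timestamp as the remote data is left alone, which avoids repeated downloads of unchanged tables.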

Requirements

1. Python

The software uses Python 3 and depends on Ibis, pyarrow (the Python API for the Apache Arrow libraries), and Paramiko. These dependencies are installed automatically when you install the package with pip:

pip install db2pq --upgrade
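
After installing, one way to confirm that the dependencies are visible to your Python environment is a quick lookup of their import names. This is only a sketch: the helper missing_modules is hypothetical and not part of db2pq, and it checks that each module can be located without actually importing it.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (Ibis is imported as "ibis").
REQUIRED_MODULES = ("ibis", "pyarrow", "paramiko")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```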

2. A WRDS ID

To use public-key authentication to access WRDS, set up a public key and copy it to the WRDS server from the terminal on your computer. (Note that this code assumes you have a directory .ssh in your home directory on the WRDS server. If not, log into WRDS via SSH, then type mkdir ~/.ssh to create it.) Here's code to create the key and send it to WRDS:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh $WRDS_ID@wrds-cloud-sshkey.wharton.upenn.edu "cat >> ~/.ssh/authorized_keys"

Use an empty passphrase when setting up the key so that the scripts can run without user intervention.

3. Environment variables

Environment variables that the code uses include:

  • WRDS_ID: Your WRDS ID.
  • DATA_DIR: The local repository for parquet files.

One can set these environment variables in (say) ~/.zprofile:

export WRDS_ID="iangow"
export DATA_DIR="$HOME/Dropbox/pq_data"

Instead of setting these environment variables, you can pass the values as the wrds_id and data_dir arguments, respectively, of the functions above.
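
The "argument first, environment variable as fallback" pattern described above can be sketched as follows. The helpers resolve_setting and resolve_data_dir are hypothetical names used for illustration; how db2pq resolves these values internally may differ.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_setting(value: Optional[str], env_var: str) -> str:
    """Prefer an explicit argument; otherwise read the environment variable."""
    if value is not None:
        return value
    try:
        return os.environ[env_var]
    except KeyError:
        raise ValueError(f"Pass a value explicitly or set {env_var}") from None


def resolve_data_dir(data_dir: Optional[str] = None) -> Path:
    """Resolve the parquet directory from the argument or DATA_DIR."""
    # expanduser() turns a leading "~" into the home directory; the shell
    # does not expand a "~" that appears inside double quotes.
    return Path(resolve_setting(data_dir, "DATA_DIR")).expanduser()


os.environ["WRDS_ID"] = "iangow"
print(resolve_setting(None, "WRDS_ID"))        # iangow
print(resolve_setting("other_id", "WRDS_ID"))  # other_id
```

Raising an error when neither source provides a value makes a missing setting visible immediately rather than at connection time.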

Report bugs

Author: Ian Gow, iandgow@gmail.com
