Convert database tables to parquet tables.
Library to convert PostgreSQL data to parquet files
This package was created to convert PostgreSQL data to parquet format. It has four major functions: one for each of three kinds of source (a WRDS PostgreSQL table, a PostgreSQL table, and a PostgreSQL schema), plus an "update" function that fetches new data only when necessary.
wrds_pg_to_pq()
: Exports a WRDS PostgreSQL table to a parquet file.
db_to_pq()
: Exports a PostgreSQL table to a parquet file.
db_schema_to_pq()
: Exports a PostgreSQL schema to parquet files.
wrds_update_pq()
: A variant on wrds_pg_to_pq() that checks the "last modified" value for the relevant SAS file against that of the local parquet file before getting new data from the WRDS PostgreSQL server.
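The "update only if necessary" idea behind wrds_update_pq() can be illustrated with a small standalone sketch. This is not the package's implementation: the helper name needs_update and the use of datetime values are assumptions made for illustration, and the sketch only shows the comparison of a remote "last modified" value against a local one.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_update(remote_modified: datetime, local_modified: Optional[datetime]) -> bool:
    """Hypothetical helper: True when the local parquet file is missing or older."""
    if local_modified is None:
        # No local parquet file yet, so the data must be fetched.
        return True
    # Fetch only when the source was modified after the local copy.
    return remote_modified > local_modified


# Illustrative timestamps (made up for this example).
remote = datetime(2024, 5, 1, tzinfo=timezone.utc)
stale_local = datetime(2024, 4, 1, tzinfo=timezone.utc)
fresh_local = datetime(2024, 5, 1, tzinfo=timezone.utc)

print(needs_update(remote, None))         # True
print(needs_update(remote, stale_local))  # True
print(needs_update(remote, fresh_local))  # False
```

Skipping the download when the local copy is already current avoids re-fetching large tables that have not changed.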
Requirements
1. Python
The software uses Python 3 and depends on Ibis, pyarrow (Python API for Apache Arrow libraries), and Paramiko.
These dependencies are installed when you use pip:
pip install db2pq --upgrade
2. A WRDS ID
To use public-key authentication to access WRDS, set up a public key, then copy that key to the WRDS server from the terminal on your computer.
(Note that this code assumes you have a directory .ssh in your home directory on the WRDS server. If not, log into WRDS via SSH, then type mkdir ~/.ssh to create it.)
Here's code to create the key and send it to WRDS:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh $WRDS_ID@wrds-cloud-sshkey.wharton.upenn.edu "cat >> ~/.ssh/authorized_keys"
Use an empty passphrase in setting up the key so that the scripts can run without user intervention.
3. Environment variables
Environment variables that the code uses include:
WRDS_ID
: Your WRDS ID.
DATA_DIR
: The local repository for parquet files.
One can set these environment variables in (say) ~/.zprofile:
export WRDS_ID="iangow"
export DATA_DIR="~/Dropbox/pq_data"
As an alternative to setting these environment variables, the values can be passed as the arguments wrds_id and data_dir, respectively, of the functions above.
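The precedence described here (an explicit argument first, the environment variable otherwise) can be sketched as follows. The helpers resolve_setting and resolve_data_dir are hypothetical names used only for illustration and are not part of the package. Because the shell does not expand "~" inside double quotes, the sketch expands it in Python; whether the package does the same is an assumption.

```python
import os
from typing import Optional


def resolve_setting(value: Optional[str], env_name: str) -> str:
    """Hypothetical helper: prefer an explicit argument, else the environment variable."""
    if value is None:
        value = os.environ.get(env_name)
    if value is None:
        raise ValueError(f"Pass a value or set the {env_name} environment variable.")
    return value


def resolve_data_dir(data_dir: Optional[str] = None) -> str:
    # The shell leaves "~" unexpanded inside double quotes, so expand it here.
    return os.path.expanduser(resolve_setting(data_dir, "DATA_DIR"))


# Set the variables in-process so the example runs on its own.
os.environ["WRDS_ID"] = "iangow"
os.environ["DATA_DIR"] = "~/Dropbox/pq_data"

print(resolve_setting(None, "WRDS_ID"))        # value taken from the environment
print(resolve_setting("other_id", "WRDS_ID"))  # explicit argument takes precedence
print(resolve_data_dir())                      # "~" expanded to the home directory
```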
Report bugs
Author: Ian Gow, iandgow@gmail.com