Convert WRDS SAS data to PostgreSQL, parquet, or CSV.
Project description
Library to convert WRDS SAS data
This package was created to convert WRDS SAS data to modern data formats. This package has three major functions, one for each of three popular data formats.
wrds_update(): Imports WRDS SAS data into a PostgreSQL database.wrds_update_pq(): Converts WRDS SAS data to parquet files.wrds_update_csv(): Converts WRDS SAS data to gzipped CSV files.
This package was primarily designed to handle WRDS data, but some support is provided for importing a local SAS file (*.sas7dbat) into a PostgreSQL database.
Functions prefixed with wrds_ are designed to pull data from WRDS via SSH.
For local SAS datasets, use functions that accept fpath (e.g., sas_to_pandas() and related helpers).
Requirements
1. Python
The software uses Python 3 and depends on SQLAlchemy and Paramiko. Some helper functions return Pandas DataFrames; Pandas is optional unless those functions are used.
2. A WRDS ID
To access WRDS non-interactively (e.g., from Python scripts), you must use SSH public-key authentication.
WRDS provides a dedicated SSH endpoint for key-based authentication:
wrds-cloud-sshkey.wharton.upenn.edu
Step 1: Generate a modern SSH key (recommended)
WRDS supports modern SSH key types. We recommend ed25519:
ssh-keygen -t ed25519 -C "your_wrds_id@wrds"
Accept the default location (~/.ssh/id_ed25519).
You may use a passphrase if your SSH agent is running. For unattended jobs (cron / CI), an empty passphrase may be required.
Step 2: Install the public key on WRDS
Copy your public key to the WRDS SSH-key host:
cat ~/.ssh/id_ed25519.pub | \
ssh your_wrds_id@wrds-cloud-sshkey.wharton.upenn.edu \
"mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
If ~/.ssh does not exist on WRDS, the command above will create it.
Step 3: (Recommended) Configure SSH
Add an entry to ~/.ssh/config:
Host wrds
HostName wrds-cloud-sshkey.wharton.upenn.edu
User your_wrds_id
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
You can now connect with:
ssh wrds
This configuration is also used automatically by paramiko, enabling
password-less access from Python.
Troubleshooting
If SSH still prompts for a password, run:
ssh -vvv wrds
and confirm that publickey appears in the list of authentication methods.
wrds2pg uses paramiko to execute SAS code on WRDS via SSH.
Password-based authentication will not work in unattended scripts.
3. PostgreSQL
For the wrds_update() function, you should have write access to a PostgreSQL database to store the data.
4. Environment variables
Environment variables that the code can use include:
PGDATABASE: The name of the PostgreSQL database you use.PGUSER: Your username on the PostgreSQL database.PGHOST: Where the PostgreSQL database is to be found (this will belocalhostif it's on the same machine as you're running the code on)WRDS_ID: Your WRDS ID.DATA_DIR: The local repository for parquet files.CSV_DIR: The local repository for compressed CSV files.
You can set these environment variables in (say) ~/.zprofile:
export PGHOST="localhost"
export PGDATABASE="crsp"
export WRDS_ID="iangow"
export PGUSER="igow"
Using wrds_update().
Two arguments table_name and schema are required.
1. WRDS Settings
Set WRDS_ID using either wrds_id=your_wrds_id in the function call or the environment variable WRDS_ID.
2. Environment variables
The wrds_update() function will use the environment variables PGHOST, PGDATABASE, and PGUSER if you have set them.
Otherwise, you need to provide values as arguments to wrds_udpate().
The default for PGPORT is 5432.
3. Table settings
To tailor your request, specify the following arguments:
fix_missing: set toTrueto fix missing values. This addresses special missing values, which SAS'sPROC EXPORTdumps as strings. The default isFalse.fix_cr: set toTrueto fix characters. Default value isFalse.drop: specify columns to be dropped using SAS syntax (e.g.,drop="id name"will drop columnsidandname).obs: specify the maximum number of observations to download (e.g.,obs=10will import the first 10 rows from the table on WRDS).rename: rename columns (e.g.,rename="fee=mngt_fee"renamesfeetomngt_fee).force: set toTrueto force update. Default value isFalse.
Importing local SAS data into PostgreSQL
The software can also upload a local SAS file to PostgreSQL.
You need to have local SAS in order to use this function.
Use fpath to specify the path to the file to be imported.
Examples
This software is available from PyPI.
To install of wrds2pg from there:
pip3 install wrds2pg
To install the development version wrds2pg from Github:
sudo -H pip3 install git+https://github.com/iangow/wrds2pg --upgrade
Example usage:
from wrds2pg import wrds_update
# 1. Download crsp.mcti from wrds and upload to pg as crps.mcti
# Simplest version
wrds_update(table_name="mcti", schema="crsp")
# Tailored arguments
wrds_update(table_name="mcti", schema="crsp", host=your_pghost,
dbname=your_pg_database,
fix_missing=True, fix_cr=True, drop="b30ret b30ind", obs=10,
rename="caldt=calendar_date", force=True)
Report bugs
Author: Ian Gow, iandgow@gmail.com Contributors: Jingyu Zhang, jingyu.zhang@chicagobooth.edu, Evan Jo.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wrds2pg-1.0.37.tar.gz.
File metadata
- Download URL: wrds2pg-1.0.37.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b1962f107db26370887b8f67dfaf0588f5eb7d13f298327b87bc5325a129126e
|
|
| MD5 |
fc1cddc0b4ac0a654e6008c8d3e2c133
|
|
| BLAKE2b-256 |
653017995610c58073f248b0842b034a9053c5ab3f53653c12534abfe9d61414
|
File details
Details for the file wrds2pg-1.0.37-py3-none-any.whl.
File metadata
- Download URL: wrds2pg-1.0.37-py3-none-any.whl
- Upload date:
- Size: 23.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a5391d81b19cefc3fa33cd1597a7c7be7ffe9632159e9232e17312a16b4acae9
|
|
| MD5 |
675c4612d2d763d6f1ccb47fce677681
|
|
| BLAKE2b-256 |
ddedb3c422e7c44c92ba11c80043f45076edc1e609ff1ddffc18b78f8b69cb54
|