Skip to main content

Convert WRDS SAS data to PostgreSQL, parquet, or CSV.

Project description

Library to convert WRDS SAS data

This package was created to convert WRDS SAS data to modern data formats. This package has three major functions, one for each of three popular data formats.

  • wrds_update(): Imports WRDS SAS data into a PostgreSQL database.
  • wrds_update_pq(): Converts WRDS SAS data to parquet files.
  • wrds_update_csv(): Converts WRDS SAS data to gzipped CSV files.

This package was primarily designed to handle WRDS data, but some support is provided for importing a local SAS file (*.sas7dbat) into a PostgreSQL database. Functions prefixed with wrds_ are designed to pull data from WRDS via SSH. For local SAS datasets, use functions that accept fpath (e.g., sas_to_pandas() and related helpers).

Requirements

1. Python

The software uses Python 3 and depends on SQLAlchemy and Paramiko. Some helper functions return Pandas DataFrames; Pandas is optional unless those functions are used.

2. A WRDS ID

To access WRDS non-interactively (e.g., from Python scripts), you must use SSH public-key authentication.

WRDS provides a dedicated SSH endpoint for key-based authentication:

wrds-cloud-sshkey.wharton.upenn.edu

Step 1: Generate a modern SSH key (recommended)

WRDS supports modern SSH key types. We recommend ed25519:

ssh-keygen -t ed25519 -C "your_wrds_id@wrds"

Accept the default location (~/.ssh/id_ed25519).

You may use a passphrase if your SSH agent is running. For unattended jobs (cron / CI), an empty passphrase may be required.

Step 2: Install the public key on WRDS

Copy your public key to the WRDS SSH-key host:

cat ~/.ssh/id_ed25519.pub | \
ssh your_wrds_id@wrds-cloud-sshkey.wharton.upenn.edu \
  "mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
   cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

If ~/.ssh does not exist on WRDS, the command above will create it.

Step 3: (Recommended) Configure SSH

Add an entry to ~/.ssh/config:

Host wrds
    HostName wrds-cloud-sshkey.wharton.upenn.edu
    User your_wrds_id
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes

You can now connect with:

ssh wrds

This configuration is also used automatically by paramiko, enabling password-less access from Python.

Troubleshooting

If SSH still prompts for a password, run:

ssh -vvv wrds

and confirm that publickey appears in the list of authentication methods.

wrds2pg uses paramiko to execute SAS code on WRDS via SSH. Password-based authentication will not work in unattended scripts.

3. PostgreSQL

For the wrds_update() function, you should have write access to a PostgreSQL database to store the data.

4. Environment variables

Environment variables that the code can use include:

  • PGDATABASE: The name of the PostgreSQL database you use.
  • PGUSER: Your username on the PostgreSQL database.
  • PGHOST: Where the PostgreSQL database is to be found (this will be localhost if it's on the same machine as you're running the code on)
  • WRDS_ID: Your WRDS ID.
  • DATA_DIR: The local repository for parquet files.
  • CSV_DIR: The local repository for compressed CSV files.

You can set these environment variables in (say) ~/.zprofile:

export PGHOST="localhost"
export PGDATABASE="crsp"
export WRDS_ID="iangow"
export PGUSER="igow"

Using wrds_update().

Two arguments table_name and schema are required.

1. WRDS Settings

Set WRDS_ID using either wrds_id=your_wrds_id in the function call or the environment variable WRDS_ID.

2. Environment variables

The wrds_update() function will use the environment variables PGHOST, PGDATABASE, and PGUSER if you have set them. Otherwise, you need to provide values as arguments to wrds_udpate(). The default for PGPORT is 5432.

3. Table settings

To tailor your request, specify the following arguments:

  • fix_missing: set to True to fix missing values. This addresses special missing values, which SAS's PROC EXPORT dumps as strings. The default is False.
  • fix_cr: set to True to fix characters. Default value is False.
  • drop: specify columns to be dropped using SAS syntax (e.g., drop="id name" will drop columns id and name).
  • obs: specify the maximum number of observations to download (e.g., obs=10 will import the first 10 rows from the table on WRDS).
  • rename: rename columns (e.g., rename="fee=mngt_fee" renames fee to mngt_fee).
  • force: set to True to force update. Default value is False.

Importing local SAS data into PostgreSQL

The software can also upload a local SAS file to PostgreSQL. You need to have local SAS in order to use this function. Use fpath to specify the path to the file to be imported.

Examples

This software is available from PyPI. To install of wrds2pg from there:

pip3 install wrds2pg

To install the development version wrds2pg from Github:

sudo -H pip3 install git+https://github.com/iangow/wrds2pg --upgrade

Example usage:

from wrds2pg import wrds_update

# 1. Download crsp.mcti from wrds and upload to pg as crps.mcti
# Simplest version
wrds_update(table_name="mcti", schema="crsp")

# Tailored arguments 
wrds_update(table_name="mcti", schema="crsp", host=your_pghost, 
	dbname=your_pg_database, 
	fix_missing=True, fix_cr=True, drop="b30ret b30ind", obs=10, 
	rename="caldt=calendar_date", force=True)

Report bugs

Author: Ian Gow, iandgow@gmail.com Contributors: Jingyu Zhang, jingyu.zhang@chicagobooth.edu, Evan Jo.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wrds2pg-1.0.37.tar.gz (22.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wrds2pg-1.0.37-py3-none-any.whl (23.9 kB view details)

Uploaded Python 3

File details

Details for the file wrds2pg-1.0.37.tar.gz.

File metadata

  • Download URL: wrds2pg-1.0.37.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for wrds2pg-1.0.37.tar.gz
Algorithm Hash digest
SHA256 b1962f107db26370887b8f67dfaf0588f5eb7d13f298327b87bc5325a129126e
MD5 fc1cddc0b4ac0a654e6008c8d3e2c133
BLAKE2b-256 653017995610c58073f248b0842b034a9053c5ab3f53653c12534abfe9d61414

See more details on using hashes here.

File details

Details for the file wrds2pg-1.0.37-py3-none-any.whl.

File metadata

  • Download URL: wrds2pg-1.0.37-py3-none-any.whl
  • Upload date:
  • Size: 23.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for wrds2pg-1.0.37-py3-none-any.whl
Algorithm Hash digest
SHA256 a5391d81b19cefc3fa33cd1597a7c7be7ffe9632159e9232e17312a16b4acae9
MD5 675c4612d2d763d6f1ccb47fce677681
BLAKE2b-256 ddedb3c422e7c44c92ba11c80043f45076edc1e609ff1ddffc18b78f8b69cb54

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page