finsets

Download and process datasets commonly used in finance research

Each module handles a different data source. Almost all submodules (other than utility ones) have a get_raw_data function that downloads the raw data and a process_raw_data function that processes it into a pandas.DataFrame whose index is either:

  • A pandas.Period date reflecting the frequency of the data (for time-series datasets), or
  • A pandas.MultiIndex with a panel identifier in the first dimension and a pandas.Period date in the second dimension (for panel datasets).

The period date in the index will be named following the pattern Xdate where X is the string literal representing the frequency of the data (e.g. Mdate for monthly data, Qdate for quarterly data, Ydate for annual data).
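As a rough illustration of the two index layouts described above, the sketch below builds them by hand with plain pandas. It does not call finsets; the column name `value`, the panel identifier name `firm_id`, and all numbers are made up for the example.

```python
import pandas as pd

# Time-series layout: a monthly PeriodIndex named "Mdate".
ts = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},  # illustrative numbers only
    index=pd.period_range("2020-01", periods=3, freq="M", name="Mdate"),
)

# Panel layout: panel identifier in the first level, annual Period date in the second.
# "firm_id" is a hypothetical identifier name chosen for this sketch.
panel_index = pd.MultiIndex.from_product(
    [["A", "B"], pd.period_range("2020", periods=2, freq="Y")],
    names=["firm_id", "Ydate"],
)
panel = pd.DataFrame({"value": [10, 11, 20, 21]}, index=panel_index)

print(ts.index.name)
print(list(panel.index.names))
```

With this layout, the date level can be selected by name regardless of which dataset the frame came from, as long as the data frequency is known.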

Documentation site.

GitHub page.

Install

pip install finsets

How to use

import finsets as fds

or

from finsets import fred, wrds, papers

Below, we very briefly describe each submodule. For more details, please see the documentation of each submodule (they provide a lot more functionality than presented here).

WRDS

Downloads and processes datasets from Wharton Research Data Services (WRDS).

Each WRDS module handles a different library in WRDS (e.g. compa module for the Compustat Annual CCM file, crspm for the CRSP Monthly Stock file, etc.).

Before you use any of the wrds modules, you need to create a pgpass file with your WRDS credentials. To do that, run

from finsets.wrds import wrds_api
db = wrds_api.Connection()

This will prompt you for your WRDS username and password. After you enter your credentials, if you don’t have a pgpass file already set up, it will ask you if you want to do that. Hit y and it will be automatically created for you. After this, you will never have to input your WRDS password.

You will still have to supply your WRDS username to functions that retrieve data from WRDS (all of them have a wrds_username parameter). If you don’t want to be prompted for the username for every download, save it under a WRDS_USERNAME environment variable:

  • On Windows, in a Command Prompt:
    • setx WRDS_USERNAME "your_wrds_username_here"
  • On Linux, in a terminal:
    • echo 'export WRDS_USERNAME="your_wrds_username_here"' >> ~/.bashrc && source ~/.bashrc
  • On macOS, since macOS Catalina:
    • echo 'export WRDS_USERNAME="your_wrds_username_here"' >> ~/.zshrc && source ~/.zshrc
  • On macOS, prior to macOS Catalina:
    • echo 'export WRDS_USERNAME="your_wrds_username_here"' >> ~/.bash_profile && source ~/.bash_profile
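To confirm that the variable set by one of the commands above is visible to Python, a quick check along these lines may help. It uses only the standard library and does not touch finsets; note that on Windows, setx affects only processes started after the command was run.

```python
import os

# os.environ.get returns None when the variable is not set for this process.
username = os.environ.get("WRDS_USERNAME")

if username is None:
    print("WRDS_USERNAME is not set; open a new terminal (or restart your IDE) and try again.")
else:
    print(f"WRDS_USERNAME is set to {username!r}")
```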

The functions in the wrds modules close their database connections to WRDS automatically. However, if you open a connection manually, as above (with wrds_api.Connection()), remember to close it when you are done. In our example above:

db.close()
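One way to make sure close() is called even when a query raises is the standard library's contextlib.closing, which works with any object that has a close() method. The sketch below uses a hypothetical FakeConnection class as a stand-in so that it runs without WRDS credentials; with a real connection you would wrap the object returned by wrds_api.Connection() instead.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a connection object; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conn = FakeConnection()
try:
    # closing() calls conn.close() on exit from the with block, error or not.
    with closing(conn) as db:
        raise RuntimeError("simulated query failure")
except RuntimeError:
    pass

print(conn.closed)  # True
```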

Check the wrds_utils module for an introduction to some of the main utilities that come with the wrds package.

FRED

Downloads and processes datasets from the St. Louis FRED.

To use the functions in the fred module, you’ll need an API key from the St. Louis FRED.

Get one here and store it in your environment variables under the name FRED_API_KEY.

Alternatively, you can supply the API key directly as the api_key parameter in each function in the fred module.
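The two ways of supplying the key can be pictured with a small helper like the one below. This is only a sketch: resolve_fred_api_key is a hypothetical function, not part of finsets, and the precedence shown (explicit argument first, environment variable second) is an assumption rather than documented behavior.

```python
import os


def resolve_fred_api_key(api_key=None):
    """Hypothetical helper: prefer an explicit key, else fall back to FRED_API_KEY."""
    key = api_key or os.environ.get("FRED_API_KEY")
    if not key:
        raise ValueError("No FRED API key found: pass api_key=... or set FRED_API_KEY.")
    return key
```

Failing early with a clear message, as here, is usually easier to debug than an authentication error returned by the remote service.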

gdp = fred.fred.get_raw_data(['GDP'])
gdp['info']
id realtime_start realtime_end title observation_start observation_end frequency frequency_short units units_short seasonal_adjustment seasonal_adjustment_short last_updated popularity notes
0 GDP 2023-11-15 2023-11-15 Gross Domestic Product 1947-01-01 2023-07-01 Quarterly Q Billions of Dollars Bil. of $ Seasonally Adjusted Annual Rate SAAR 2023-10-26 07:55:01-05 92 BEA Account Code: A191RC Gross domestic produ...
gdp['Q']
                  GDP
1947-01-01    243.164
1947-04-01    245.968
1947-07-01    249.585
1947-10-01    259.745
1948-01-01    265.742
...               ...
2022-07-01  25994.639
2022-10-01  26408.405
2023-01-01  26813.601
2023-04-01  27063.012
2023-07-01  27623.543

307 rows × 1 columns
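The raw output above is indexed by calendar dates. To show how such a frame could be moved to the Qdate convention described at the top of this page, here is a minimal sketch using plain pandas. It rebuilds the first four rows of the output by hand and is not the package's process_raw_data implementation, which may differ.

```python
import pandas as pd

# First four rows of the raw quarterly GDP output shown above.
raw = pd.DataFrame(
    {"GDP": [243.164, 245.968, 249.585, 259.745]},
    index=pd.to_datetime(["1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01"]),
)

# Convert timestamps to quarterly periods and name the index "Qdate".
processed = raw.copy()
processed.index = raw.index.to_period("Q").rename("Qdate")

print(processed)
```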

PAPERS

Downloads and processes datasets made available by the authors of academic papers.

Each papers module handles a different paper. The module's name is made up of the authors' last names and the publication year, separated by underscores. If a paper has more than two authors, the names of all authors after the first are replaced by 'etal'. For example, the module for the paper "Firm-Level Political Risk: Measurement and Effects" (2019) by Tarek A. Hassan, Stephan Hollander, Laurence van Lent, and Ahmed Tahoun is named hassan_etal_2019.

papers.hassan_etal_2019.list_all_vars().head()
    name
0   gvkey
1   date
2   PRisk
3   NPRisk
4   Risk
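The module naming convention described above can be sketched as a small function. paper_module_name is a hypothetical helper written for this illustration, not something finsets provides, and lowercasing the names is an assumption inferred from the hassan_etal_2019 example.

```python
def paper_module_name(last_names, year):
    """Hypothetical helper that applies the naming convention described above."""
    names = [name.lower() for name in last_names]
    if len(names) > 2:
        # More than two authors: keep the first and replace the rest with "etal".
        names = [names[0], "etal"]
    return "_".join(names + [str(year)])


print(paper_module_name(["Hassan", "Hollander", "van Lent", "Tahoun"], 2019))
# hassan_etal_2019
```

For a paper with one or two authors, the same function keeps every last name, so the made-up inputs ["Smith", "Jones"] and 2020 would give smith_jones_2020.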
