
Extends pandas with common functions used in finance and economics research

Project description

pandasmore

The full documentation site is here, and the GitHub page is here.

Here is a short description of some of the main functions (more details below and in the documentation):

  • setup_tseries: cleans up dates and sets them as the index
  • setup_panel: cleans up dates and panel IDs and sets them as the index (panel ID, period date)
  • lag: robust lagging that accounts for panel structure, unsorted or duplicate dates, and gaps in the time series

Install

pip install pandasmore

How to use

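First, we set up an example dataset to showcase the functions in this module. It is deliberately messy: it contains missing firm IDs, missing dates, a duplicated date (2010-02) and a gap (there is no 2010-03 observation).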

import pandas as pd
import numpy as np
import pandasmore as pdm

# 3 firm IDs x 5 dates = 15 rows; A and B are random, so your values will differ
raw = pd.DataFrame(np.random.rand(15, 2),
                   columns=list('AB'),
                   index=pd.MultiIndex.from_product(
                       [[1, 2, np.nan], [np.nan, '2010-01', '2010-02', '2010-02', '2010-04']],
                       names=['firm_id', 'date'])
                   ).reset_index()
raw
    firm_id     date         A         B
0       1.0      NaN  0.249370  0.926335
1       1.0  2010-01  0.282501  0.513859
2       1.0  2010-02  0.804278  0.307171
3       1.0  2010-02  0.828895  0.746789
4       1.0  2010-04  0.569099  0.331814
5       2.0      NaN  0.533977  0.823457
6       2.0  2010-01  0.207558  0.401378
7       2.0  2010-02  0.086001  0.959371
8       2.0  2010-02  0.054230  0.993980
9       2.0  2010-04  0.062525  0.200272
10      NaN      NaN  0.091012  0.635409
11      NaN  2010-01  0.866369  0.972394
12      NaN  2010-02  0.432087  0.837597
13      NaN  2010-02  0.878219  0.148009
14      NaN  2010-04  0.820386  0.834821
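To see which rows the setup functions have to deal with, the firm_id/date grid of raw can be rebuilt without the random columns and checked with plain pandas. This is an illustrative check using only standard pandas calls, not part of pandasmore; grid and the counter names are hypothetical.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild only the id/date columns of `raw` (same 3 x 5 grid, no random values).
grid = pd.DataFrame(
    list(itertools.product([1, 2, np.nan],
                           [np.nan, "2010-01", "2010-02", "2010-02", "2010-04"])),
    columns=["firm_id", "date"],
)

missing_id = grid["firm_id"].isna().sum()    # rows without a panel id
missing_date = grid["date"].isna().sum()     # rows without a date
complete = grid.dropna(subset=["firm_id", "date"])
# Extra copies of an (id, date) pair that already appeared earlier.
dup_keys = complete.duplicated(subset=["firm_id", "date"]).sum()
```

The 8 complete rows minus the 2 repeated (firm_id, date) pairs leave 6 rows, which matches the six rows that setup_panel returns in this example.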
df = pdm.setup_tseries(raw.query('firm_id==1'),
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
            date      dtdate  firm_id         A         B
Mdate
2010-01  2010-01  2010-01-01      1.0  0.282501  0.513859
2010-02  2010-02  2010-02-01      1.0  0.828895  0.746789
2010-04  2010-04  2010-04-01      1.0  0.569099  0.331814
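The cleaning that setup_tseries performs here can be approximated in plain pandas. The sketch below is inferred only from the output above (the row with a missing date is dropped, the last of the two 2010-02 rows is kept, and a monthly Period index named Mdate is added next to a dtdate timestamp column); it is not pandasmore's actual implementation, and setup_tseries_sketch is a hypothetical name. Column order may differ from the library's output.

```python
import pandas as pd

# Made-up stand-in for raw.query('firm_id==1'): a missing date, a duplicated
# month (2010-02) and no 2010-03 observation.
raw1 = pd.DataFrame({
    "firm_id": [1.0] * 5,
    "date": [None, "2010-01", "2010-02", "2010-02", "2010-04"],
    "A": [0.1, 0.2, 0.3, 0.4, 0.5],
})


def setup_tseries_sketch(df, time_var, time_var_format, freq):
    """Hypothetical approximation of pdm.setup_tseries, inferred from its output."""
    out = df.dropna(subset=[time_var]).copy()
    # Parse the raw strings into timestamps ...
    out["dtdate"] = pd.to_datetime(out[time_var], format=time_var_format)
    # ... and into periods of the requested frequency ("M" -> "Mdate").
    period_col = f"{freq}date"
    out[period_col] = out["dtdate"].dt.to_period(freq)
    # Keep the last row when a period appears more than once.
    out = out.drop_duplicates(subset=period_col, keep="last")
    return out.set_index(period_col).sort_index()


ts = setup_tseries_sketch(raw1, "date", "%Y-%m", "M")
```

Keeping the last duplicate mirrors what the example output shows (0.828895 comes from the second 2010-02 row); whether pandasmore always does this is an assumption.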
df = pdm.setup_panel(raw,
                        panel_ids='firm_id',
                        time_var='date', time_var_format="%Y-%m",
                        freq='M')
df
                    date      dtdate         A         B
firm_id Mdate
1       2010-01  2010-01  2010-01-01  0.282501  0.513859
        2010-02  2010-02  2010-02-01  0.828895  0.746789
        2010-04  2010-04  2010-04-01  0.569099  0.331814
2       2010-01  2010-01  2010-01-01  0.207558  0.401378
        2010-02  2010-02  2010-02-01  0.054230  0.993980
        2010-04  2010-04  2010-04-01  0.062525  0.200272
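For panels, the same idea extends to a (panel ID, period) index: rows with a missing ID or date are dropped, and duplicates are removed per firm rather than globally. The sketch below is a hypothetical approximation inferred from the output above, not pandasmore's code; setup_panel_sketch and raw_demo are made-up names, and the integer cast assumes the IDs are whole numbers (the output above displays 1 and 2 rather than 1.0 and 2.0).

```python
import pandas as pd

# Made-up stand-in for `raw`: two firms, one row without a firm ID,
# one row without a date, a duplicated 2010-02 for firm 1, no 2010-03.
raw_demo = pd.DataFrame({
    "firm_id": [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, None],
    "date": [None, "2010-01", "2010-02", "2010-02", "2010-04",
             "2010-01", "2010-02", "2010-04", "2010-01"],
    "A": [0.0, 0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.4, 9.9],
})


def setup_panel_sketch(df, panel_ids, time_var, time_var_format, freq):
    """Hypothetical approximation of pdm.setup_panel, inferred from its output."""
    out = df.dropna(subset=[panel_ids, time_var]).copy()
    # Assumes whole-number IDs, as displayed in the example output.
    out[panel_ids] = out[panel_ids].astype("int64")
    out["dtdate"] = pd.to_datetime(out[time_var], format=time_var_format)
    period_col = f"{freq}date"
    out[period_col] = out["dtdate"].dt.to_period(freq)
    # One row per (panel id, period): keep the last duplicate within each firm.
    out = out.drop_duplicates(subset=[panel_ids, period_col], keep="last")
    return out.set_index([panel_ids, period_col]).sort_index()


panel = setup_panel_sketch(raw_demo, "firm_id", "date", "%Y-%m", "M")
```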
pdm.lag(df['A'])
firm_id  Mdate  
1        2010-01         NaN
         2010-02    0.282501
         2010-04         NaN
2        2010-01         NaN
         2010-02    0.207558
         2010-04         NaN
Name: A_lag1, dtype: float64
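The NaN for 2010-04 is the point of a gap-aware lag: there is no 2010-03 observation, so no one-month lag exists. A positional shift within each firm would instead copy the 2010-02 value. The sketch below contrasts the two in plain pandas; lag_sketch is a hypothetical helper that moves every observation one period forward and re-aligns it on (firm, period), which reproduces the behaviour shown above but is not necessarily how pdm.lag is implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical panel in the (firm_id, Mdate) layout above; no 2010-03 rows.
idx = pd.MultiIndex.from_product(
    [[1, 2], pd.PeriodIndex(["2010-01", "2010-02", "2010-04"], freq="M")],
    names=["firm_id", "Mdate"],
)
a = pd.Series([0.1, 0.2, 0.4, 1.1, 1.2, 1.4], index=idx, name="A")

# Positional lag within each firm: 2010-04 wrongly picks up the 2010-02 value.
naive = a.groupby(level="firm_id").shift(1)


def lag_sketch(s, n=1):
    """Hypothetical gap-aware lag: value from exactly n periods earlier, same id."""
    id_name, time_name = s.index.names
    moved = s.reset_index()
    # Move each observation n periods forward in time ...
    moved[time_name] = moved[time_name] + n
    # ... then align it back onto the original (id, period) pairs.
    out = moved.set_index([id_name, time_name])[s.name].reindex(s.index)
    return out.rename(f"{s.name}_lag{n}")


lagged = lag_sketch(a)
```

Because the lagged values are matched on the period itself rather than on row position, unsorted rows and missing months cannot leak a wrong value into the lag, and the firm level in the index keeps values from crossing between firms.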



Download files

Source distribution: pandasmore-0.0.6.tar.gz (24.5 kB)

Built distribution: pandasmore-0.0.6-py3-none-any.whl (11.4 kB, Python 3)
