Extends pandas with common functions used in finance and economics research
pandasmore
The full documentation site is here, and the GitHub page is here.
Here is a short description of some of the main functions (more details below and in the documentation):
- setup_tseries: cleans up dates and sets them as the index
- setup_panel: cleans up dates and panel IDs and sets them as the index (panel ID, period date)
- lag: robust lagging that accounts for panel structure, unsorted or duplicate dates, and gaps in the time series
Install
pip install pandasmore
How to use
First, we set up an example dataset to showcase the functions in this module.
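import pandas as pd
import numpy as np
import pandasmore as pdm

# Deliberately messy example: firm ids 1, 2 and a missing id, crossed with five
# dates that include a missing date, a duplicated 2010-02 and no 2010-03.
# The values are random, so your numbers will differ from the table below.
raw = pd.DataFrame(np.random.rand(15, 2),
                   columns=list('AB'),
                   index=pd.MultiIndex.from_product(
                       [[1, 2, np.nan], [np.nan, '2010-01', '2010-02', '2010-02', '2010-04']],
                       names=['firm_id', 'date'])
                   ).reset_index()
raw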
| | firm_id | date | A | B |
|---|---|---|---|---|
| 0 | 1.0 | NaN | 0.249370 | 0.926335 |
| 1 | 1.0 | 2010-01 | 0.282501 | 0.513859 |
| 2 | 1.0 | 2010-02 | 0.804278 | 0.307171 |
| 3 | 1.0 | 2010-02 | 0.828895 | 0.746789 |
| 4 | 1.0 | 2010-04 | 0.569099 | 0.331814 |
| 5 | 2.0 | NaN | 0.533977 | 0.823457 |
| 6 | 2.0 | 2010-01 | 0.207558 | 0.401378 |
| 7 | 2.0 | 2010-02 | 0.086001 | 0.959371 |
| 8 | 2.0 | 2010-02 | 0.054230 | 0.993980 |
| 9 | 2.0 | 2010-04 | 0.062525 | 0.200272 |
| 10 | NaN | NaN | 0.091012 | 0.635409 |
| 11 | NaN | 2010-01 | 0.866369 | 0.972394 |
| 12 | NaN | 2010-02 | 0.432087 | 0.837597 |
| 13 | NaN | 2010-02 | 0.878219 | 0.148009 |
| 14 | NaN | 2010-04 | 0.820386 | 0.834821 |
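Before cleaning anything, it can help to count the problems that raw was built to contain. The sketch below uses only plain pandas and is not part of pandasmore; it rebuilds a frame with the same layout as raw (same ids and dates, seeded random values) so it runs on its own.

```python
import numpy as np
import pandas as pd

# Same layout as raw: 3 firm ids (one missing) x 5 dates
# (one missing, 2010-02 repeated, 2010-03 absent). Values are seeded noise.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "firm_id": np.repeat([1.0, 2.0, np.nan], 5),
    "date": [np.nan, "2010-01", "2010-02", "2010-02", "2010-04"] * 3,
    "A": rng.random(15),
    "B": rng.random(15),
})

# Rows that cannot be placed in a (panel id, date) index at all
missing_id = raw["firm_id"].isna()
missing_date = raw["date"].isna()

# Among the usable rows, (firm_id, date) pairs that occur more than once
usable = raw[~missing_id & ~missing_date]
dup_pairs = usable.duplicated(subset=["firm_id", "date"], keep=False)

print("rows with missing firm_id:", int(missing_id.sum()))                 # 5
print("rows with missing date:", int(missing_date.sum()))                  # 3
print("rows in duplicated (firm_id, date) pairs:", int(dup_pairs.sum()))   # 4
```

The missing ids and dates explain why the cleaned frames in the examples that follow have fewer rows than raw, and the duplicated 2010-02 pairs mean one observation per period has to be chosen.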
df = pdm.setup_tseries(raw.query('firm_id==1'),
                       time_var='date', time_var_format="%Y-%m",
                       freq='M')
df
| Mdate | date | dtdate | firm_id | A | B |
|---|---|---|---|---|---|
| 2010-01 | 2010-01 | 2010-01-01 | 1.0 | 0.282501 | 0.513859 |
| 2010-02 | 2010-02 | 2010-02-01 | 1.0 | 0.828895 | 0.746789 |
| 2010-04 | 2010-04 | 2010-04-01 | 1.0 | 0.569099 | 0.331814 |
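Comparing this output with raw, setup_tseries appears to have dropped the row with a missing date, parsed date into a datetime (dtdate) and a monthly period (Mdate), and kept the later of the two 2010-02 rows. The plain-pandas sketch below reproduces those visible effects under exactly those assumptions. It is not the library's implementation; the helper name setup_tseries_sketch and the f"{freq}date" naming rule for the index are hypothetical. The input values are the firm 1 rows of the raw table above.

```python
import numpy as np
import pandas as pd


def setup_tseries_sketch(df, time_var, time_var_format, freq):
    """Hypothetical approximation of what pdm.setup_tseries appears to do."""
    out = df.copy()
    # Parse the raw date strings; missing or unparseable values become NaT
    out["dtdate"] = pd.to_datetime(out[time_var], format=time_var_format, errors="coerce")
    out = out.dropna(subset=["dtdate"])
    # Convert to a period at the requested frequency (e.g. 'M' -> monthly)
    period_name = f"{freq}date"  # assumed naming, matching 'Mdate' above
    out[period_name] = out["dtdate"].dt.to_period(freq)
    # Keep the last observation when a period appears more than once
    out = out.drop_duplicates(subset=period_name, keep="last")
    return out.set_index(period_name).sort_index()


raw_firm1 = pd.DataFrame({
    "firm_id": [1.0] * 5,
    "date": [np.nan, "2010-01", "2010-02", "2010-02", "2010-04"],
    "A": [0.249370, 0.282501, 0.804278, 0.828895, 0.569099],
})
ts = setup_tseries_sketch(raw_firm1, time_var="date", time_var_format="%Y-%m", freq="M")
print(ts)
```

Using a pandas Period for the index (rather than the raw string or a timestamp) is what makes "the previous month" well defined later on, which the lag example relies on.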
df = pdm.setup_panel(raw,
                     panel_ids='firm_id',
                     time_var='date', time_var_format="%Y-%m",
                     freq='M')
df
| firm_id | Mdate | date | dtdate | A | B |
|---|---|---|---|---|---|
| 1 | 2010-01 | 2010-01 | 2010-01-01 | 0.282501 | 0.513859 |
| | 2010-02 | 2010-02 | 2010-02-01 | 0.828895 | 0.746789 |
| | 2010-04 | 2010-04 | 2010-04-01 | 0.569099 | 0.331814 |
| 2 | 2010-01 | 2010-01 | 2010-01-01 | 0.207558 | 0.401378 |
| | 2010-02 | 2010-02 | 2010-02-01 | 0.054230 | 0.993980 |
| | 2010-04 | 2010-04 | 2010-04-01 | 0.062525 | 0.200272 |
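setup_panel shows the same pattern with one more step: rows with a missing firm_id are dropped as well, and duplicates are resolved within each (firm_id, Mdate) pair. A rough plain-pandas equivalent follows, again an assumption-based sketch with a hypothetical helper name (setup_panel_sketch) rather than pandasmore's code. The input values are copied from the raw table above.

```python
import numpy as np
import pandas as pd


def setup_panel_sketch(df, panel_ids, time_var, time_var_format, freq):
    """Hypothetical approximation of what pdm.setup_panel appears to do."""
    # Rows without a panel id cannot be placed in the panel
    out = df.dropna(subset=[panel_ids]).copy()
    out["dtdate"] = pd.to_datetime(out[time_var], format=time_var_format, errors="coerce")
    out = out.dropna(subset=["dtdate"])
    period_name = f"{freq}date"  # assumed naming, matching 'Mdate' above
    out[period_name] = out["dtdate"].dt.to_period(freq)
    # One row per (panel id, period): keep the last duplicate within each pair
    out = out.drop_duplicates(subset=[panel_ids, period_name], keep="last")
    return out.set_index([panel_ids, period_name]).sort_index()


raw = pd.DataFrame({
    "firm_id": np.repeat([1.0, 2.0, np.nan], 5),
    "date": [np.nan, "2010-01", "2010-02", "2010-02", "2010-04"] * 3,
    "A": [0.249370, 0.282501, 0.804278, 0.828895, 0.569099,
          0.533977, 0.207558, 0.086001, 0.054230, 0.062525,
          0.091012, 0.866369, 0.432087, 0.878219, 0.820386],
})
panel = setup_panel_sketch(raw, panel_ids="firm_id", time_var="date",
                           time_var_format="%Y-%m", freq="M")
print(panel)
```

One visible difference: this sketch leaves firm_id as a float (1.0, 2.0), whereas the setup_panel output above shows integer ids.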
pdm.lag(df['A'])
firm_id Mdate
1 2010-01 NaN
2010-02 0.282501
2010-04 NaN
2 2010-01 NaN
2010-02 0.207558
2010-04 NaN
Name: A_lag1, dtype: float64
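The NaN at 2010-04 is what the gap handling is for: 2010-03 is missing, so there is no previous-month value, whereas a plain positional shift within each firm would pull in the 2010-02 value instead. The sketch below contrasts groupby(...).shift with a period-aware lag that matches each row to the row exactly n periods earlier. It is an illustrative reimplementation, not the code behind pdm.lag; it assumes the last index level holds pandas Periods, and the helper name period_aware_lag is hypothetical (the _lag1 suffix mirrors the Series name in the output above). The values are copied from the setup_panel output.

```python
import pandas as pd


def period_aware_lag(s: pd.Series, n: int = 1) -> pd.Series:
    """Hypothetical sketch: value from exactly n periods earlier, NaN if that period is absent."""
    names = list(s.index.names)
    time_level = names[-1]  # assumes the last index level holds pandas Periods
    frame = s.rename("value").reset_index()
    # Move every observation forward n periods, then align with the original index
    frame[time_level] = frame[time_level] + n
    shifted = frame.set_index(names)["value"]
    return shifted.reindex(s.index).rename(f"{s.name}_lag{n}")


idx = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2, 2],
     pd.PeriodIndex(["2010-01", "2010-02", "2010-04"] * 2, freq="M")],
    names=["firm_id", "Mdate"],
)
a = pd.Series([0.282501, 0.828895, 0.569099, 0.207558, 0.054230, 0.062525],
              index=idx, name="A")

naive = a.groupby(level="firm_id").shift(1)  # ignores the missing 2010-03
robust = period_aware_lag(a)                  # NaN where t-1 is absent
print(pd.concat([a, naive.rename("naive_shift"), robust], axis=1))
```

The naive shift assigns 0.828895 to firm 1 in 2010-04, silently treating a two-month gap as one period; the period-aware version returns NaN there, matching the pdm.lag output above.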