pandas-ml-common

Augment pandas DataFrame with methods for machine learning

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
- Python :: 3.8
Topic
- Software Development :: Build Tools

Project description

The pandas ml common module

This module holds the common extensions and utilities shared by the pandas ml quant stack. Feel free to study the examples as well.

easy joining of data frames with multi indexes

from pandas_ml_common import pd, np

df1 = pd.DataFrame({"a": np.random.random(10), "b": np.random.random(10)})
print(df1.inner_join(df1, prefix_left='A', prefix='B', force_multi_index=True).to_markdown())

	('A', 'a')	('A', 'b')	('B', 'a')	('B', 'b')
0	0.907892	0.726913	0.907892	0.726913
1	0.602275	0.134278	0.602275	0.134278
2	0.264399	0.207429	0.264399	0.207429
3	0.559751	0.816759	0.559751	0.816759
4	0.951172	0.797524	0.951172	0.797524
5	0.504332	0.51996	0.504332	0.51996
6	0.765235	0.17908	0.765235	0.17908
7	0.388691	0.644103	0.388691	0.644103
8	0.663636	0.678879	0.663636	0.678879
9	0.291603	0.0164627	0.291603	0.0164627
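
For readers who want to see what such a join produces without the extension method, the following is a minimal sketch in plain pandas. It assumes that the result is roughly an index-aligned inner join with an extra top column level per side; the library's actual implementation may differ, and the names `left`, `right` and `joined` are purely illustrative.

```python
import numpy as np
import pandas as pd

# Two small frames sharing the same index (illustrative data only)
rng = np.random.default_rng(0)
left = pd.DataFrame({"a": rng.random(3), "b": rng.random(3)})
right = left.copy()

# keys=... adds an outer column level ("A" / "B"),
# join="inner" keeps only index labels present in both frames
joined = pd.concat([left, right], axis=1, keys=["A", "B"], join="inner")

print(joined.columns.tolist())
```

The tuple column labels printed here have the same shape as the `('A', 'a')` style headers in the table above.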

access columns with regex

df4 = pd.DataFrame({"a_22_a": np.random.random(1), "b_21_b": np.random.random(1)})
df4._[r'.*\d+_.']

	a_22_a	b_21_b
0	0.22039	0.0374084
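
A comparable selection can be sketched in plain pandas with `DataFrame.filter`. This is only an approximation: `filter` applies `re.search` to each column label, and whether the `df._[...]` accessor uses search, match or full-match semantics is not stated here. The extra column `plain` is made up to show a non-matching label.

```python
import pandas as pd

df = pd.DataFrame({"a_22_a": [0.1], "b_21_b": [0.2], "plain": [0.3]})

# Keep every column whose label contains digits followed by "_" and one more character
selected = df.filter(regex=r".*\d+_.")

print(selected.columns.tolist())
```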

easy access to multi-level indexes

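# the level values below belong to the joined frame, which has two column levels
joined = df1.inner_join(df1, prefix_left='A', prefix='B', force_multi_index=True)
joined.unique_level_columns(0)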

['A', 'B']

df1.add_multi_index('Z', axis=1)
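
The two operations can be approximated with plain pandas as follows. This is a sketch under the assumption that `unique_level_columns(0)` lists the distinct labels of the first column level and that `add_multi_index('Z', axis=1)` wraps the existing columns under a new top level; the frames used are illustrative.

```python
import pandas as pd

# Tuple keys give the frame two column levels
df = pd.DataFrame({("A", "a"): [1, 2], ("A", "b"): [3, 4], ("B", "a"): [5, 6]})

# Distinct labels of the outermost column level, in order of appearance
top_level = df.columns.get_level_values(0).unique().tolist()

# Wrap a flat frame under a new top-level label "Z"
flat = pd.DataFrame({"a": [1, 2]})
wrapped = pd.concat([flat], axis=1, keys=["Z"])

print(top_level)
print(wrapped.columns.tolist())
```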

data splitting, sampling and folding (aka cross validation)

from pandas_ml_common import Sampler, XYWeight, random_splitter

df2 = pd.DataFrame({"c": np.random.random(10)})
sampler = Sampler(XYWeight(df1, df2), splitter=random_splitter(0.5))

for batches in sampler.sample_for_training():
    for batch in batches:
        print(batch)
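
To make the idea of a random splitter concrete, here is a small stand-alone sketch in NumPy and pandas. It does not use `Sampler` or `random_splitter`; the helper `random_split` is hypothetical, and treating `0.5` as the test fraction is an assumption about what the argument means.

```python
import numpy as np
import pandas as pd

def random_split(index, test_size, seed=None):
    """Split an index into (train, test) parts at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(index))
    n_test = int(round(len(index) * test_size))
    # Sort the positions so each part keeps the original row order
    test_pos = np.sort(shuffled[:n_test])
    train_pos = np.sort(shuffled[n_test:])
    return index[train_pos], index[test_pos]

features = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})
labels = pd.DataFrame({"c": np.arange(10.0) % 2})

train_idx, test_idx = random_split(features.index, test_size=0.5, seed=0)

# Features and labels are sliced with the same index so rows stay aligned
x_train, y_train = features.loc[train_idx], labels.loc[train_idx]
x_test, y_test = features.loc[test_idx], labels.loc[test_idx]
```

Splitting on the index rather than on the data keeps features, labels and any sample weights aligned, which is presumably why the sampler takes them together.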

access to nested numpy arrays in data frame columns (df._.values)

df3 = pd.DataFrame({"a": [[1, 2], [3, 4], [5, 6]]})
df3._.values

array([[1, 2],
       [3, 4],
       [5, 6]])
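
For comparison, plain pandas returns a one-dimensional object array for such a column, so the nested lists have to be stacked explicitly. The sketch below shows one way to do that; it illustrates the result, not how `df._.values` is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [[1, 2], [3, 4], [5, 6]]})

# to_numpy() yields shape (3,): an object array holding three Python lists
raw = df["a"].to_numpy()

# Converting the lists in one go produces a regular 2-D array of shape (3, 2)
stacked = np.array(df["a"].tolist())

print(raw.shape, stacked.shape)
```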

dynamic method call providing suitable *args and **kwargs (dependency injection)

from pandas_ml_common import call_callable_dynamic_args

def adder(a, b):
    return a + b

call_callable_dynamic_args(adder, a=12, b=10, c='illegal')

22
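
The behaviour shown above, where the unexpected keyword `c` is silently dropped, can be sketched with the standard library's `inspect` module. The helper name `call_with_matching_kwargs` is hypothetical, and the real function may handle more cases (for example positional arguments).

```python
import inspect

def call_with_matching_kwargs(func, **available):
    """Call func with only those keyword arguments it can accept (hypothetical helper)."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts everything, so pass all of it through
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**available)
    accepted = {name: value for name, value in available.items() if name in params}
    return func(**accepted)

def adder(a, b):
    return a + b

result = call_with_matching_kwargs(adder, a=12, b=10, c="ignored")
print(result)
```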

numpy utils

from pandas_ml_common import np_nans

np_nans((3, 3))

array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]])
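
In plain NumPy the same array can be built with `np.full`; this is shown only as an equivalent, not as the helper's actual implementation.

```python
import numpy as np

# A 3x3 float array in which every element is NaN
nans = np.full((3, 3), np.nan)

print(nans.shape, np.isnan(nans).all())
```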


from pandas_ml_common import temp_seed

with temp_seed(42):
    print(np.random.random(2))

np.random.random(2)


[0.37454012 0.95071431]
array([0.69373278, 0.69790163])
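
A context manager with this behaviour can be sketched by saving and restoring NumPy's global random state. The name `seeded` is hypothetical, and the sketch assumes the helper works on the legacy global generator (`np.random.seed`), which is what the example output suggests.

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def seeded(seed):
    """Temporarily seed NumPy's global generator, then restore the previous state."""
    state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        # Restore even if the body raises, so later draws are unaffected
        np.random.set_state(state)

with seeded(42):
    first = np.random.random(2)

with seeded(42):
    second = np.random.random(2)

print(np.array_equal(first, second))
```

Draws inside the block are reproducible, while draws outside continue from wherever the global generator was before the block.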

serialization utils

from pandas_ml_common import serializeb, deserializeb

deserializeb(serializeb(np.array([1, 2, 3])))

array([1, 2, 3])
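
A byte-level round trip of this kind can be sketched with the standard `pickle` module. Which serializer `serializeb` uses is not stated here, so `pickle` is a stand-in, and the helper names `to_bytes` and `from_bytes` are hypothetical.

```python
import pickle
import numpy as np

def to_bytes(obj):
    """Serialize an object to bytes (stand-in using pickle)."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

def from_bytes(payload):
    """Restore an object from bytes; only use this on trusted data."""
    return pickle.loads(payload)

original = np.array([1, 2, 3])
payload = to_bytes(original)
restored = from_bytes(payload)

print(type(payload).__name__, restored)
```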

re-scalings

from pandas_ml_common import ReScaler

x = np.arange(0, 1, .1)
rescaler = ReScaler((0, 1), (5, -5))

rescaler(x)

array([ 5.,  4.,  3.,  2.,  1.,  0., -1., -2., -3., -4.])
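
The output above is consistent with a linear map from the domain `(0, 1)` onto the range `(5, -5)`, that is y = t0 + (x - d0) * (t1 - t0) / (d1 - d0). The sketch below implements only that formula; the function name `rescale` is hypothetical, and whether `ReScaler` additionally clips values outside the domain is not covered.

```python
import numpy as np

def rescale(x, domain, target):
    """Map values linearly from the interval `domain` onto the interval `target`."""
    d0, d1 = domain
    t0, t1 = target
    x = np.asarray(x, dtype=float)
    # A reversed target such as (5, -5) simply yields a negative slope
    return t0 + (x - d0) * (t1 - t0) / (d1 - d0)

y = rescale(np.arange(0, 1, 0.1), (0, 1), (5, -5))
print(np.round(y, 6))
```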

Release history

Version	Release date
0.2.7 (this version)	Aug 23, 2021
0.2.6	Jun 19, 2021
0.2.5	Jun 11, 2021
0.2.4	May 28, 2021
0.2.3	Apr 29, 2021
0.2.2	Feb 10, 2021
0.2.1	Feb 6, 2021
0.2.0	Jan 29, 2021
0.1.15	Sep 8, 2020
0.1.14.3	Aug 9, 2020
0.1.14.2	Aug 9, 2020
0.1.14.1	Aug 9, 2020
0.1.14	Aug 9, 2020
0.1.13	Aug 9, 2020
0.1.12	Jul 27, 2020
0.1.10	Jun 22, 2020
0.1.9	Jun 14, 2020
0.1.8	May 23, 2020
0.1.7	May 23, 2020
0.1.6	May 20, 2020
0.1.5	May 12, 2020
0.1.4	Apr 26, 2020
0.1.3	Apr 11, 2020
0.1.2	Apr 4, 2020
0.1.1	Mar 20, 2020
0.1.0	Mar 8, 2020

Download files

Source Distribution

pandas-ml-common-0.2.7.tar.gz (252.2 kB)

Uploaded Aug 23, 2021 Source

Hashes for pandas-ml-common-0.2.7.tar.gz
Algorithm	Hash digest
SHA256	`f42829bd945ec16a51f1bba8f547d247beefda956c8884199f330b7afcca6beb`
MD5	`0196b1c6f4a9e65dfdeed3ef2c93c1da`
BLAKE2b-256	`48801e8aa8748128ad2d2bd5cd7287c1b03439d71b1d999a819f7a5c738c148c`