Utilities for data manipulation including creation of DAGs and tables
pyg-npy
Install with pip from https://pypi.org/project/pyg-npy/ (`pip install pyg-npy`)
or with conda from https://anaconda.org/yoavgit/pyg-npy (`conda install -c yoavgit pyg-npy`).
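A quick utility to save pandas DataFrames as .npy files.
It supports appending, with light checks that the column names match and on the index.
For simple reads, writes and appends it is several times faster than writing parquet or using pystore: in the timings below, roughly 3-10x faster than parquet and 5-20x faster than pystore.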
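To make the idea concrete, here is a minimal sketch of saving a DataFrame as .npy files with append and light checks. It is not pyg_npy's implementation or file format: the helper names `save_frame`, `load_frame` and `append_frame` and the three-file folder layout are hypothetical, and the sketch assumes numeric values and a numeric or datetime index, so nothing needs pickling. For simplicity, appending rewrites the files.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical layout (NOT pyg_npy's actual format): one folder per frame holding
# data.npy, index.npy and columns.npy. Assumes numeric values and a numeric or
# datetime index, so np.load never needs allow_pickle=True.


def save_frame(df, folder):
    """Write the values, index and column names of `df` as three .npy files."""
    os.makedirs(folder, exist_ok=True)
    np.save(os.path.join(folder, "data.npy"), df.to_numpy())
    np.save(os.path.join(folder, "index.npy"), df.index.to_numpy())
    np.save(os.path.join(folder, "columns.npy"), np.asarray(df.columns, dtype=str))


def load_frame(folder):
    """Rebuild the DataFrame written by save_frame."""
    data = np.load(os.path.join(folder, "data.npy"))
    index = np.load(os.path.join(folder, "index.npy"))
    columns = np.load(os.path.join(folder, "columns.npy")).tolist()
    return pd.DataFrame(data, index=index, columns=columns)


def append_frame(df, folder):
    """Append `df` after light checks on column names and index order.

    For simplicity this reloads and rewrites all three files instead of
    appending in place.
    """
    old = load_frame(folder)
    if list(old.columns) != list(df.columns):
        raise ValueError("column names do not match the stored frame")
    if len(old) and len(df) and not df.index[0] > old.index[-1]:
        raise ValueError("new index values must come after the stored ones")
    save_frame(pd.concat([old, df]), folder)


if __name__ == "__main__":
    folder = os.path.join(tempfile.mkdtemp(), "frame")
    idx = pd.date_range("2020-01-01", periods=5, freq="D")
    ts = pd.DataFrame(np.arange(10.0).reshape(5, 2), index=idx, columns=["a", "b"])
    save_frame(ts.iloc[:3], folder)
    append_frame(ts.iloc[3:], folder)
    print(load_frame(folder))
```

The two checks mirror the ones described above (matching column names, an index that keeps increasing); dtype or index-type checks would be extensions.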
```python
import datetime

import numpy as np
import pandas as pd
import pystore
from pyg_npy import pd_to_npy, pd_read_npy

pystore.set_path("c:/temp/pystore")
store = pystore.store('mydatastore')
collection = store.collection('NASDAQ')

# a small 100 x 10 frame for the write and read timings
arr = np.random.normal(0,1,(100,10))
df = pd.DataFrame(arr, columns = list('abcdefghij'))

# a 10,000-row daily time series with an increasing index for the append timings
dates = [datetime.datetime(2020,1,1) + datetime.timedelta(i) for i in range(-10000,0)]
ts = pd.DataFrame(np.random.normal(0,1,(10000,10)), dates, columns = list('abcdefghij'))
```
```ipython
### write
%timeit collection.write('TEST', df, overwrite = True)
# 19.5 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.to_parquet('c:/temp/test.parquet')
# 9.53 ms ± 650 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd_to_npy(df, 'c:/temp/test.npy')
# 947 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
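`%timeit` and `%time` are IPython magics, so the timings here need IPython or Jupyter. Below is a minimal plain-Python sketch using the standard-library `timeit` module. It assumes that the mean and population standard deviation of per-call times over several runs is close enough to what `%timeit` reports; `bench` and `fmt` are hypothetical helper names.

```python
import os
import statistics
import tempfile
import timeit

import numpy as np
import pandas as pd


def bench(fn, number=100, repeat=7):
    """Time `fn` roughly like IPython's %timeit: `repeat` runs of `number` calls.

    Returns (mean, std) of the per-call time in seconds across the runs.
    """
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [t / number for t in runs]
    # population standard deviation, which also works when repeat == 1
    return statistics.mean(per_call), statistics.pstdev(per_call)


def fmt(mean, std):
    """Format (mean, std) in microseconds, in the style of %timeit output."""
    return f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop"


if __name__ == "__main__":
    df = pd.DataFrame(np.random.normal(0, 1, (100, 10)), columns=list("abcdefghij"))
    path = os.path.join(tempfile.gettempdir(), "bench_test.npy")
    # baseline only: raw np.save/np.load of the values, without index or column names
    print("np.save:", fmt(*bench(lambda: np.save(path, df.to_numpy()))))
    print("np.load:", fmt(*bench(lambda: np.load(path))))
```

To time pyg_npy itself, pass `lambda: pd_to_npy(df, 'c:/temp/test.npy')` or `lambda: pd_read_npy('c:/temp/test.npy')` as `fn`. Absolute numbers depend on the machine and disk, so the ratios between methods are more meaningful than the raw times.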
```ipython
### read
%timeit collection.item('TEST').to_pandas()
# 7.7 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.read_parquet('c:/temp/test.parquet')
# 2.85 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd_read_npy('c:/temp/test.npy')
# 847 µs ± 39.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
```ipython
### append
# appended data must have increasing index values, so we write the first 100 rows of ts
# and then append the remaining 99 chunks of 100 rows in order
collection.write('TEST2', ts.iloc[:100], overwrite = True)
%time len([collection.append('TEST2', ts.iloc[i*100: i*100+100], npartitions=2) for i in range(1,100)])
# Wall time: 12.1 s

pd_to_npy(ts.iloc[:100], 'c:/temp/test.npy')
%time len([pd_to_npy(ts.iloc[i*100: i*100+100], 'c:/temp/test.npy', 'a', True) for i in range(1,100)])
# Wall time: 2.14 s
```
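Part of why appending to a .npy file can be cheap is the format itself: for a C-ordered 2-D array, new rows simply follow the existing bytes, so only the shape stored in the header has to change. The sketch below shows that idea with `numpy.lib.format`. We have not checked that pyg_npy appends this way; `npy_append_rows` is a hypothetical name, and the sketch only handles format version 1.0 files whose header length stays the same after the shape grows.

```python
import io
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def npy_append_rows(path, rows):
    """Append rows to a C-ordered 2-D .npy file in place.

    Sketch only, not necessarily how pyg_npy appends. Assumes format
    version 1.0 (what np.save writes for ordinary arrays) and that `rows`
    has the same dtype and column count as the stored array.
    """
    rows = np.ascontiguousarray(rows)
    with open(path, "r+b") as f:
        if npy_format.read_magic(f) != (1, 0):
            raise ValueError("this sketch only handles .npy format version 1.0")
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
        header_len = f.tell()  # magic string + header; the raw data starts here
        if fortran_order or len(shape) != 2:
            raise ValueError("expected a C-ordered 2-D array")
        if rows.ndim != 2 or rows.shape[1] != shape[1] or rows.dtype != dtype:
            raise ValueError("appended rows must match the stored dtype and column count")
        # Serialise a header with the grown shape and make sure it has the same
        # length as the old one, so overwriting it cannot shift the data.
        new_header = io.BytesIO()
        npy_format.write_array_header_1_0(new_header, {
            "descr": npy_format.dtype_to_descr(dtype),
            "fortran_order": False,
            "shape": (shape[0] + rows.shape[0], shape[1]),
        })
        if len(new_header.getvalue()) != header_len:
            raise ValueError("header length changed; rewrite the whole file instead")
        f.seek(0)
        f.write(new_header.getvalue())
        # In C order the new rows go directly after the existing bytes.
        f.seek(0, os.SEEK_END)
        f.write(rows.tobytes())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rows.npy")
    np.save(path, np.arange(6.0).reshape(3, 2))
    npy_append_rows(path, np.array([[6.0, 7.0], [8.0, 9.0]]))
    print(np.load(path))
```

A DataFrame wrapper built on this would also need to append the index values the same way and check column names before touching the file.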
File details for pyg-npy-0.0.9.tar.gz (source distribution)
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes
Algorithm | Hash digest
---|---
SHA256 | 1b71643e78f291be92d0e7c52584f05172881fec9846739bc8e10aca4c115ff9
MD5 | 88cf983ff97b430532ed961da9f8ea45
BLAKE2b-256 | 67656c67e08b27a15c6d54cd352f316994d8a21aa1025b6f5a901fab8647f664
File details for pyg_npy-0.0.9-py3-none-any.whl (built distribution)
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d6f3944b8f3adb9a60499d6d8bdd5fa38ad054d884c5e86a1e066a1be4438d2
MD5 | e1f21bffd88d99c461fbb760959de3c2
BLAKE2b-256 | 503e0f5739e600c9b0968cc210ed63ba6aaf782055eb3cac5709671da87221cb