Utilities for data manipulation including creation of DAGs and tables
pyg-npy
Install with pip from https://pypi.org/project/pyg-npy/
or with conda from https://anaconda.org/yoavgit/pyg-npy
A lightweight utility for saving pandas DataFrames as .npy files.
It supports appending, and performs light checks that column names and the index are consistent.
For simple read, write and append operations it is roughly 5-10 times faster than writing parquet files or using pystore (see the IPython benchmarks below).
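The basic idea of storing a DataFrame as .npy can be sketched with plain NumPy: keep the values in one .npy file and the index and column labels in companion .npy files. This is not pyg_npy's actual file layout, which is not described here. The helper names `save_frame`/`load_frame` and the `<path>.index.npy` / `<path>.columns.npy` naming are hypothetical, and the sketch assumes a frame with a single numeric dtype, a numeric or datetime index, and a path ending in `.npy`.

```python
import numpy as np
import pandas as pd


def _label_paths(path):
    # Hypothetical companion files holding the index and the column labels.
    return path + ".index.npy", path + ".columns.npy"


def save_frame(df, path):
    """Write df's values to `path` (should end in '.npy') and its labels alongside."""
    index_path, columns_path = _label_paths(path)
    np.save(path, df.to_numpy())
    np.save(index_path, df.index.to_numpy())
    # Store labels as a fixed-width unicode array so np.load needs no pickling.
    np.save(columns_path, np.asarray(df.columns, dtype=str))


def load_frame(path):
    """Rebuild the DataFrame written by save_frame."""
    index_path, columns_path = _label_paths(path)
    return pd.DataFrame(
        np.load(path),
        index=np.load(index_path),
        columns=np.load(columns_path),
    )
```

Because `np.load` refuses pickled object arrays by default, this layout only round-trips numeric or datetime data; string-valued columns or indices would need a different encoding.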
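```python
# Benchmark setup (run in IPython/Jupyter: %timeit and %time are IPython magics)
import datetime

import numpy as np
import pandas as pd
import pystore
from pyg_npy import pd_to_npy, pd_read_npy

pystore.set_path("c:/temp/pystore")
store = pystore.store('mydatastore')
collection = store.collection('NASDAQ')

# small frame: 100 rows x 10 float columns with a default integer index
arr = np.random.normal(0, 1, (100, 10))
df = pd.DataFrame(arr, columns=list('abcdefghij'))

# time series: 10,000 daily rows ending 2019-12-31, used for the append test
dates = [datetime.datetime(2020, 1, 1) + datetime.timedelta(i) for i in range(-10000, 0)]
ts = pd.DataFrame(np.random.normal(0, 1, (10000, 10)), dates, columns=list('abcdefghij'))
```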
### write
```python
%timeit collection.write('TEST', df, overwrite=True)
# 19.5 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.to_parquet('c:/temp/test.parquet')
# 9.53 ms ± 650 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd_to_npy(df, 'c:/temp/test.npy')
# 947 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
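`%timeit` only exists in IPython. In a plain Python script a comparable "mean ± std. dev. of 7 runs" figure can be computed with the standard-library `timeit` module. The sketch below is an approximation of what `%timeit` reports, not a reimplementation of it; `time_call` is a hypothetical helper, and it times `np.save` on a 100x10 array as a stand-in. With pyg_npy installed, `lambda: pd_to_npy(df, path)` could be timed the same way. Absolute numbers depend on the machine and disk, so they will not match the figures above.

```python
import os
import statistics
import tempfile
import timeit

import numpy as np


def time_call(fn, repeat=7):
    """Return (mean, std) seconds per call, roughly like IPython's %timeit."""
    timer = timeit.Timer(fn)
    number, _ = timer.autorange()  # choose a loop count that takes >= 0.2 s
    per_loop = [t / number for t in timer.repeat(repeat=repeat, number=number)]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


if __name__ == "__main__":
    arr = np.random.normal(0, 1, (100, 10))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.npy")
        mean, std = time_call(lambda: np.save(path, arr))
    print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop")
```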
### read
```python
%timeit collection.item('TEST').to_pandas()
# 7.7 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.read_parquet('c:/temp/test.parquet')
# 2.85 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd_read_npy('c:/temp/test.npy')
# 847 µs ± 39.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
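One plausible reason .npy reads are cheap is the format itself: a short header followed by the raw array bytes, so NumPy can copy the buffer directly or memory-map it without any decoding. The sketch below illustrates this with `np.load(..., mmap_mode="r")`. Whether `pd_read_npy` uses memory mapping is not stated here, so treat this as background on the .npy format rather than a description of pyg_npy.

```python
import os
import tempfile

import numpy as np

arr = np.random.normal(0, 1, (10_000, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.npy")
    np.save(path, arr)

    # Full read: parse the header, then read the raw float64 buffer into memory.
    full = np.load(path)

    # Memory-mapped read: only the header is parsed up front; rows are paged
    # in from disk when they are accessed.
    mapped = np.load(path, mmap_mode="r")
    last_rows = np.array(mapped[-5:])  # copy the slice out of the map

    # Everything in the file that is not array data is the (padded) header.
    header_bytes = os.path.getsize(path) - arr.nbytes
    del mapped  # release the map so the temporary directory can be removed
```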
### append
```python
# Appended data must have an increasing index, so this test uses the date-indexed frame ts:
# write the first 100 rows, then append the remaining 9,900 rows in 99 chunks of 100.
collection.write('TEST2', ts.iloc[:100], overwrite=True)
%time len([collection.append('TEST2', ts.iloc[i*100: i*100+100], npartitions=2) for i in range(1, 100)])
# Wall time: 12.1 s
pd_to_npy(ts.iloc[:100], 'c:/temp/test.npy')
%time len([pd_to_npy(ts.iloc[i*100: i*100+100], 'c:/temp/test.npy', 'a', True) for i in range(1, 100)])
# Wall time: 2.14 s
```
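The light checks on append can be sketched as follows: before appending, confirm that the new chunk has the same column labels as the stored data and that its index starts after the last stored index. This is a simplified illustration, not pyg_npy's implementation: `append_frame`, the companion-file layout and the "index must increase" rule are assumptions, and it reloads and rewrites the whole file on every call, so each append costs time proportional to the total number of rows.

```python
import os

import numpy as np
import pandas as pd


def append_frame(df, path):
    """Append df to the data stored at `path` (hypothetical layout, path ends in '.npy').

    Values live in `path`, the index in `path + '.index.npy'` and the column
    labels in `path + '.columns.npy'`.
    """
    index_path, columns_path = path + ".index.npy", path + ".columns.npy"
    new_columns = np.asarray(df.columns, dtype=str)
    if not os.path.exists(path):
        # First write: nothing to check against.
        values, index = df.to_numpy(), df.index.to_numpy()
    else:
        old_columns = np.load(columns_path)
        if not np.array_equal(old_columns, new_columns):
            raise ValueError(f"column mismatch: {list(old_columns)} vs {list(new_columns)}")
        old_index = np.load(index_path)
        if len(old_index) and len(df) and not (df.index[0] > old_index[-1]):
            raise ValueError("new index must start after the last stored index")
        values = np.concatenate([np.load(path), df.to_numpy()])
        index = np.concatenate([old_index, df.index.to_numpy()])
    np.save(path, values)
    np.save(index_path, index)
    np.save(columns_path, new_columns)
```

Calling it once with `ts.iloc[:100]` and then with each later 100-row chunk mirrors the append benchmark above; a chunk that goes back in time or renames a column raises `ValueError` instead of silently corrupting the file.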
Download files
File details
Details for the file pyg-npy-0.0.9.tar.gz.
File metadata
- Download URL: pyg-npy-0.0.9.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b71643e78f291be92d0e7c52584f05172881fec9846739bc8e10aca4c115ff9 |
| MD5 | 88cf983ff97b430532ed961da9f8ea45 |
| BLAKE2b-256 | 67656c67e08b27a15c6d54cd352f316994d8a21aa1025b6f5a901fab8647f664 |
File details
Details for the file pyg_npy-0.0.9-py3-none-any.whl.
File metadata
- Download URL: pyg_npy-0.0.9-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d6f3944b8f3adb9a60499d6d8bdd5fa38ad054d884c5e86a1e066a1be4438d2 |
| MD5 | e1f21bffd88d99c461fbb760959de3c2 |
| BLAKE2b-256 | 503e0f5739e600c9b0968cc210ed63ba6aaf782055eb3cac5709671da87221cb |
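A downloaded file can be checked against the digests listed above with Python's standard `hashlib`. The sketch streams the file in chunks and compares the SHA256 and BLAKE2b-256 values (PyPI's BLAKE2b-256 is BLAKE2b with a 32-byte digest); MD5 could be added the same way with `hashlib.md5`. The names `EXPECTED`, `digest_of` and `verify` are hypothetical helpers, and the digest strings are copied from the tables above.

```python
import hashlib

# Digests listed above for the two pyg-npy 0.0.9 distribution files.
EXPECTED = {
    "pyg-npy-0.0.9.tar.gz": {
        "sha256": "1b71643e78f291be92d0e7c52584f05172881fec9846739bc8e10aca4c115ff9",
        "blake2b-256": "67656c67e08b27a15c6d54cd352f316994d8a21aa1025b6f5a901fab8647f664",
    },
    "pyg_npy-0.0.9-py3-none-any.whl": {
        "sha256": "7d6f3944b8f3adb9a60499d6d8bdd5fa38ad054d884c5e86a1e066a1be4438d2",
        "blake2b-256": "503e0f5739e600c9b0968cc210ed63ba6aaf782055eb3cac5709671da87221cb",
    },
}

# BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
_CONSTRUCTORS = {
    "sha256": hashlib.sha256,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def digest_of(path, algorithm="sha256", chunk_size=1 << 16):
    """Hex digest of the file at `path`, read in chunks to bound memory use."""
    h = _CONSTRUCTORS[algorithm]()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, filename):
    """True if every listed digest for `filename` matches the file at `path`."""
    return all(digest_of(path, algo) == expected
               for algo, expected in EXPECTED[filename].items())
```

For example, `verify("pyg-npy-0.0.9.tar.gz", "pyg-npy-0.0.9.tar.gz")` run in the download directory should return `True` for an intact source distribution.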