Utilities for data manipulation including creation of DAGs and tables

Project description

pyg-npy

pip install from https://pypi.org/project/pyg-npy/

conda install from https://anaconda.org/yoavgit/pyg-npy

A quick utility to save dataframes as npy files.

It supports append, with light checks on the index and on whether the column names match.

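For simple read/write/append, the timings below show it running roughly 3 to 20 times faster than parquet or pystore: about 10x (parquet) and 20x (pystore) on write, about 3x and 9x on read, and about 6x faster than pystore on append.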
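To make the idea of "save a dataframe as numpy arrays, with append and light checks" concrete, here is a minimal sketch built only on numpy and pandas. It is not the pyg-npy implementation: the helper names `frame_to_npz` and `frame_from_npz`, the `mode` and `check` parameters, and the choice of a single `.npz` archive are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def frame_from_npz(path):
    """Hypothetical reader: rebuild a DataFrame from the three stored arrays."""
    with np.load(path) as stored:
        return pd.DataFrame(
            stored["data"],
            index=stored["index"],
            columns=list(stored["columns"]),
        )


def frame_to_npz(df, path, mode="w", check=True):
    """Hypothetical writer: mode 'w' overwrites, mode 'a' appends to an existing file."""
    if mode == "a" and os.path.exists(path):
        old = frame_from_npz(path)
        if check:
            # Light check 1: the column names must match exactly.
            if list(old.columns) != list(df.columns):
                raise ValueError("column names do not match the stored frame")
            # Light check 2: new rows must start after the last stored index value.
            if len(old) and len(df) and df.index[0] <= old.index[-1]:
                raise ValueError("appended index must be increasing")
        df = pd.concat([old, df])
    # Values, index and column names are stored as three plain numpy arrays.
    np.savez(
        path,
        data=df.to_numpy(),
        index=df.index.to_numpy(),
        columns=np.array(list(df.columns), dtype=str),
    )


# Usage: write three rows, then append the next three.
ts = pd.DataFrame(
    np.arange(12.0).reshape(6, 2),
    index=pd.date_range("2020-01-01", periods=6),
    columns=list("ab"),
)
path = os.path.join(tempfile.mkdtemp(), "sketch.npz")
frame_to_npz(ts.iloc[:3], path)
frame_to_npz(ts.iloc[3:], path, mode="a")
back = frame_from_npz(path)
```

This sketch rewrites the whole file on every append, which keeps it short but is slower than a true in-place append; it is meant only to show the kind of checks involved.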

import numpy as np; import pandas as pd
from pyg_npy import pd_to_npy, pd_read_npy
import pystore
import datetime

pystore.set_path("c:/temp/pystore")
store = pystore.store('mydatastore')
collection = store.collection('NASDAQ')
arr = np.random.normal(0,1,(100,10))
df = pd.DataFrame(arr, columns = list('abcdefghij'))
dates = [datetime.datetime(2020,1,1) + datetime.timedelta(i) for i in range(-10000,0)]
ts = pd.DataFrame(np.random.normal(0,1,(10000,10)), dates, columns = list('abcdefghij'))

### write
%timeit collection.write('TEST', df, overwrite = True)
19.5 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.to_parquet('c:/temp/test.parquet')
9.53 ms ± 650 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd_to_npy(df, 'c:/temp/test.npy')
947 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


### read
%timeit collection.item('TEST').to_pandas()
7.7 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd.read_parquet('c:/temp/test.parquet')
2.85 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd_read_npy('c:/temp/test.npy')
847 µs ± 39.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

### append
# appending requires data with an increasing index, so the date-indexed frame ts is used here:

collection.write('TEST2', ts.iloc[:100], overwrite = True)
%time len([collection.append('TEST2', ts.iloc[i*100: i*100+100], npartitions=2) for i in range(1,100)])
Wall time: 12.1 s

pd_to_npy(ts.iloc[:100], 'c:/temp/test.npy')
%time len([pd_to_npy(ts.iloc[i*100: i*100+100], 'c:/temp/test.npy', 'a', True) for i in range(1,100)])
Wall time: 2.14 s
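The speedups implied by the timings above can be worked out directly. The short script below only restates the measured numbers from this page and divides them; the dictionary layout and variable names are illustrative.

```python
# Timings copied from the benchmarks above, converted to seconds.
timings = {
    "write": {"pystore": 19.5e-3, "parquet": 9.53e-3, "pyg_npy": 947e-6},
    "read": {"pystore": 7.7e-3, "parquet": 2.85e-3, "pyg_npy": 847e-6},
    "append": {"pystore": 12.1, "pyg_npy": 2.14},
}

# Speedup of pyg_npy relative to each alternative, per operation.
speedups = {
    op: {
        name: round(seconds / row["pyg_npy"], 1)
        for name, seconds in row.items()
        if name != "pyg_npy"
    }
    for op, row in timings.items()
}

for op, row in speedups.items():
    for name, factor in row.items():
        print(f"{op:6s} vs {name:8s}: {factor}x")
```

These ratios come from a single machine and small frames (100 to 10,000 rows), so they are indicative rather than general.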

Download files

Source Distribution

pyg-npy-0.0.9.tar.gz (8.2 kB)

Uploaded Source

Built Distribution

pyg_npy-0.0.9-py3-none-any.whl (7.9 kB)

Uploaded Python 3
