Faster loading of pandas data frames by saving them as numpy arrays and pickling their meta info (row+column names, column dtype info).
Project description
numpickle
Install
pip install numpickle
Usage
import pandas as pd
import numpickle as npl
# create example data frame with non-numeric and numeric columns
df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']])
df.columns = ["A", "B", "C"]
df.index = ["row1", "row2"]
df
#       A  B  C
# row1  1  2  a
# row2  3  4  b
df.dtypes
# A     int64
# B     int64
# C    object
# dtype: object
# save data frame as numpy array and pickle row and column names
# into helper pickle file "/home/user/test.npy.pckl"
npl.save_numpickle(df, "/home/user/test.npy")
# load the saved data
df_ = npl.load_numpickle("/home/user/test.npy")
df_
#       A  B  C
# row1  1  2  a
# row2  3  4  b
df_.dtypes
# A     int64
# B     int64
# C    object
# dtype: object
# compare element-wise; all(df == df_) would only iterate over the column names
bool((df == df_).all().all())
# True
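The idea from the description (values stored as a NumPy array, row and column names plus dtypes stored in a pickle) can be sketched as follows. This is an illustration under stated assumptions, not numpickle's actual implementation: `save_frame` and `load_frame` are hypothetical names, and the layout of the metadata dictionary is made up for this example.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd


def save_frame(df, path):
    """Hypothetical helper: values go to `path`, metadata to `path + '.pckl'`."""
    # A frame with mixed dtypes becomes an object array, which np.save
    # stores via pickle (allow_pickle defaults to True when saving).
    with open(path, "wb") as fh:
        np.save(fh, df.to_numpy())
    meta = {
        "index": df.index,
        "columns": df.columns,
        "dtypes": df.dtypes.to_dict(),
    }
    with open(path + ".pckl", "wb") as fh:
        pickle.dump(meta, fh)


def load_frame(path):
    """Hypothetical helper: rebuild the frame from the two files."""
    # allow_pickle=True is needed to read object arrays; only load trusted files.
    values = np.load(path, allow_pickle=True)
    with open(path + ".pckl", "rb") as fh:
        meta = pickle.load(fh)
    df = pd.DataFrame(values, index=meta["index"], columns=meta["columns"])
    # Restore the per-column dtypes that were lost in the object array.
    return df.astype(meta["dtypes"])


# Round trip with the example frame from the usage section.
df = pd.DataFrame(
    [[1, 2, "a"], [3, 4, "b"]],
    columns=["A", "B", "C"],
    index=["row1", "row2"],
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.npy")
    save_frame(df, path)
    restored = load_frame(path)

# Raises an AssertionError if values, labels, or dtypes differ.
pd.testing.assert_frame_equal(df, restored)
```

`pd.testing.assert_frame_equal` is used here because it checks values, index, column names, and dtypes in one call, which makes it a stricter round-trip check than an element-wise comparison.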
Download files
Source Distribution
numpickle-0.1.2.post7.tar.gz (2.3 kB)
Built Distribution
Hashes for numpickle-0.1.2.post7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 574501bbd11a32bc929817da32eaa3b24afeafcafbb6ee910b2743114b2c9290
MD5 | 2aeea9d321b7f4ff0305b1218851f9fe
BLAKE2b-256 | d8e230570b4b79c98706eeaab6118a2b1cb706bbcc4a1cf4c371d8bcf0e313fa