es_pandas
Read, write and update large-scale pandas DataFrames with ElasticSearch.
Requirements
This package should work on Python 3 (>=3.4), and ElasticSearch should be version 6.x or 7.x (>=6.8).
Installation
The package is hosted on PyPI and can be installed with pip:
pip install es_pandas
Usage
import time
import pandas as pd
from es_pandas import es_pandas
# Information about the es cluster
es_host = 'localhost:9200'
index = 'demo'
# create es_pandas instance
ep = es_pandas(es_host)
# Example data frame
df = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 128)],
                   'Num': [x for x in range(31)],
                   'Date': pd.date_range(start='2019/01/01', end='2019/01/31')})
# optionally initialize the template
doc_type = 'demo'
ep.init_es_tmpl(df, doc_type)
# Example of writing data to es; the template is created and put to es automatically if it does not exist
ep.to_es(df, index)
time.sleep(10)
# Example of reading data from es
df = ep.to_pandas(index)
print(df.head())
# return only certain fields from es
heads = ['Num', 'Date']
df = ep.to_pandas(index, heads=heads)
print(df.head())
# set the dtype of certain columns
dtype = {'Num': 'float', 'Alpha': object}
df = ep.to_pandas(index, dtype=dtype)
print(df.dtypes)
df2 = pd.DataFrame({'Alpha': [chr(i) for i in range(97, 129)],
                    'Num': [x for x in range(32)],
                    'Date': pd.date_range(start='2019/01/01', end='2019/02/01')})
df2.loc[df2['Num'] == 10, ['Alpha']] = 'change'
# Example of updating data in es
ep.to_es_dev(df2, index, 'Num')
More about update
The to_es_dev(df, index, key_col, ignore_cols=[]) function is available if you want to write or update data with ElasticSearch. to_es_dev writes df to es if index does not exist. Otherwise it reads data from ElasticSearch in batches and compares it with df by merging the two on key_col. If you want to ignore some columns when comparing, set them with the ignore_cols parameter. Moreover, new records in df will be written to ElasticSearch.
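The merge-based comparison described above can be illustrated with plain pandas, without an ElasticSearch cluster. This is only a rough sketch of the idea, not the code es_pandas actually runs. The helper name diff_on_key is hypothetical, and it assumes that the key values are unique in the existing data and that compared columns contain no missing values (NaN never compares equal to NaN, so such rows would be reported as changed).

```python
import pandas as pd


def diff_on_key(existing, incoming, key_col, ignore_cols=()):
    """Split `incoming` into rows that are new and rows whose values changed.

    Hypothetical helper for illustration only; not part of es_pandas.
    Assumes `key_col` values are unique in `existing`.
    """
    # Columns that take part in the comparison.
    compare_cols = [c for c in incoming.columns
                    if c != key_col and c not in ignore_cols]

    # Left-merge on the key; `_merge` tells whether the key already exists.
    merged = incoming.merge(existing[[key_col] + compare_cols], on=key_col,
                            how='left', suffixes=('', '_old'), indicator=True)

    is_new = (merged['_merge'] == 'left_only').to_numpy()

    # A row changed if any compared column differs from its stored value.
    changed = pd.Series(False, index=merged.index)
    for col in compare_cols:
        changed |= merged[col] != merged[col + '_old']
    is_changed = changed.to_numpy() & ~is_new

    # Boolean arrays select rows of `incoming` by position.
    return incoming[is_new], incoming[is_changed]


# Stand-in for data already stored in the index.
existing = pd.DataFrame({'Num': [0, 1, 2], 'Alpha': ['a', 'b', 'c']})
# Data to write: one modified row (Num == 1) and one new row (Num == 3).
incoming = pd.DataFrame({'Num': [0, 1, 2, 3],
                         'Alpha': ['a', 'change', 'c', 'd']})

new_rows, changed_rows = diff_on_key(existing, incoming, 'Num')
print(new_rows['Num'].tolist())      # [3]
print(changed_rows['Num'].tolist())  # [1]
```

In this sketch, passing ignore_cols=['Alpha'] would leave no columns to compare, so only the new row would be reported.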
License
(c) 2019 Frank